[ https://issues.apache.org/jira/browse/HUDI-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu reassigned HUDI-281:
-------------------------------

    Assignee:     (was: Raymond Xu)

> HiveSync failure through Spark when useJdbc is set to false
> -----------------------------------------------------------
>
>                 Key: HUDI-281
>                 URL: https://issues.apache.org/jira/browse/HUDI-281
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Hive Integration, Spark Integration, Usability
>            Reporter: Udit Mehrotra
>            Priority: Major
>              Labels: query-eng, user-support-issues
>             Fix For: 0.11.0, 0.10.1
>
>
> Table creation with Hive sync through Spark fails when I set *useJdbc* to
> *false*. Currently I had to modify the code to set *useJdbc* to *false*,
> because there is no *DataSourceOption* through which I can specify this
> field when running Hudi code (a sketch of what such an option could look
> like follows the stack trace below).
> Here is the failure:
> {noformat}
> java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState.start(Lorg/apache/hudi/org/apache/hadoop_hive/conf/HiveConf;)Lorg/apache/hadoop/hive/ql/session/SessionState;
>   at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:527)
>   at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:517)
>   at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:507)
>   at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:272)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
>   at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
> {noformat}
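> As referenced above, here is a minimal sketch of what a write with such an
> option could look like. The existing hive-sync options are real; the
> *hoodie.datasource.hive_sync.use_jdbc* key is an assumption, named only to
> show the shape of the *DataSourceOption* this issue is asking for, and
> *inputDF* stands for any source DataFrame:
> {code:scala}
> import org.apache.spark.sql.{SaveMode, SparkSession}
>
> val spark = SparkSession.builder().appName("hudi-hive-sync-example").getOrCreate()
> val inputDF = spark.read.json("s3://bucket/input") // any source data
>
> inputDF.write
>   .format("org.apache.hudi")
>   .option("hoodie.table.name", "my_table")
>   .option("hoodie.datasource.write.recordkey.field", "id")
>   .option("hoodie.datasource.write.precombine.field", "ts")
>   .option("hoodie.datasource.hive_sync.enable", "true")
>   .option("hoodie.datasource.hive_sync.table", "my_table")
>   // Hypothetical option: expose useJdbc here instead of requiring a code change.
>   .option("hoodie.datasource.hive_sync.use_jdbc", "false")
>   .mode(SaveMode.Append)
>   .save("s3://bucket/hudi/my_table")
> {code}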
> As for the failure itself: I was expecting this to fail through Spark,
> because *hive-exec* is not shaded inside *hudi-spark-bundle*, while
> *HiveConf* is shaded and relocated. This *SessionState* comes from the
> spark-hive jar, and it obviously does not accept the relocated *HiveConf*.
> We in *EMR* are running into the same problem when trying to integrate with
> the Glue Catalog. For this we have to create the Hive metastore client
> through *Hive.get(conf).getMSC()* instead of how it is done now, so that
> alternate metastore implementations can be created (see the sketch at the
> end of this description). However, because hive-exec is not shaded while
> HiveConf is relocated, we run into the same issue there.
> Shading *hive-exec* would not be recommended either, because it is itself
> an uber jar that shades a lot of dependencies, all of which would end up in
> the *hudi-spark-bundle* jar. We would not want to go down that route. That
> is why we suggest considering removing the shading of Hive libraries
> altogether.
> We could add a *Maven profile* to keep the shading available, but the
> unshaded build would have to be the default; otherwise the default build
> will fail when *useJdbc* is set to false, and again later when we commit
> the *Glue Catalog* changes.
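> To make the Glue Catalog point above concrete, a minimal sketch of the
> suggested client-creation path, assuming the stock Hive API (the helper
> name is ours; the key detail is that *Hive.get(conf)* lets Hive pick the
> metastore client implementation, which only works if the *HiveConf* passed
> in is the original, unrelocated class):
> {code:scala}
> import org.apache.hadoop.hive.conf.HiveConf
> import org.apache.hadoop.hive.metastore.IMetaStoreClient
> import org.apache.hadoop.hive.ql.metadata.Hive
>
> // Sketch: let Hive decide which metastore client to build from the conf,
> // instead of instantiating HiveMetaStoreClient directly. On EMR this is
> // what would allow the Glue Catalog client to be plugged in. It breaks
> // today because the bundle hands Hive a relocated HiveConf.
> def createMetaStoreClient(conf: HiveConf): IMetaStoreClient =
>   Hive.get(conf).getMSC()
> {code}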