pvary commented on pull request #1505:
URL: https://github.com/apache/iceberg/pull/1505#issuecomment-699450888


   > Okay, I understand what you're saying about `ReflectionStorageHandler` now (for Spark). But I don't think we need it because the Spark code interacts with the table through the Iceberg library. It won't try to instantiate the storage handler because that's specific to the Hive integration.
   
   We need the build changes because Spark SQL uses Hive code to interact with Hive tables. The test failures highlighted this.
   
   You can repro by reverting the `build.gradle` changes and setting `ENGINE_HIVE_ENABLED_DEFAULT` to `true`:
   ```
   ./gradlew :iceberg-spark3:test --tests org.apache.iceberg.spark.sql.TestCreateTable
   ```
   
   One of the exceptions I got:
   ```
       java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
           at org.apache.hadoop.hive.ql.metadata.Table.getStorageHandler(Table.java:297)
           at org.apache.spark.sql.hive.client.HiveClientImpl.convertHiveTableToCatalogTable(HiveClientImpl.scala:465)
           at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:424)
           at scala.Option.map(Option.scala:230)
           at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:424)
           at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
           at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
           at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
           at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
           at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:422)
           at org.apache.spark.sql.hive.client.HiveClient.getTable(HiveClient.scala:90)
           at org.apache.spark.sql.hive.client.HiveClient.getTable$(HiveClient.scala:89)
           at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:90)
           at org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:120)
           at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$getTable$1(HiveExternalCatalog.scala:719)
           at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
           at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:719)
           at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
           at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:446)
           at org.apache.spark.sql.execution.command.DropTableCommand.run(ddl.scala:226)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
           at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
           at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
           at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
           at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
           at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
           at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
           at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
           at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
           at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
           at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
           at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
           at org.apache.iceberg.spark.SparkTestBase.sql(SparkTestBase.java:83)
           at org.apache.iceberg.spark.sql.TestCreateTable.dropTestTable(TestCreateTable.java:47)
   ```
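   The failure mode in the trace is worth spelling out: `Table.getStorageHandler` loads the storage handler class named in the table properties by reflection, so if `HiveIcebergStorageHandler` is not on Spark's classpath the load fails and surfaces as the `HiveException` above. A minimal, self-contained sketch (this `StorageHandlerLoadDemo` class is hypothetical, not Iceberg or Hive code) of that reflection step:
   ```java
   // Hypothetical demo: mimics the reflective class load that
   // Table.getStorageHandler performs for the handler class name stored in
   // the table metadata. When the class is missing from the classpath, the
   // load throws ClassNotFoundException, which Hive wraps into
   // "Error in loading storage handler".
   public class StorageHandlerLoadDemo {
       public static void main(String[] args) {
           String handlerClass = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler";
           try {
               Class.forName(handlerClass, true,
                   Thread.currentThread().getContextClassLoader());
               System.out.println("storage handler loaded");
           } catch (ClassNotFoundException e) {
               // This branch is what the stack trace above corresponds to when
               // iceberg-mr is not a dependency of the Spark test classpath.
               System.out.println("failed to load storage handler: " + e.getMessage());
           }
       }
   }
   ```
   This is why the `build.gradle` change matters: the class name travels through the Hive metastore as a string, so nothing fails at compile time, only at this reflective load.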

