pvary commented on pull request #1505:
URL: https://github.com/apache/iceberg/pull/1505#issuecomment-698477361


   > > > The changes look good. Just a quick sanity check: what happens in Hive 
if the serde or storage handler class is missing? Will this break metastore 
operations on the table?
   > > > I remember that a missing class used to break any query that loaded 
the table, including DROP TABLE. So you couldn't even clean up problems.
   > > 
   > > 
   > > That was my fear as well, but did not have time to test them out yet. It 
might even cause issues for users who are not using the Hive tables, just want 
to use HiveCatalog to store the table versions.
   > 
   > I'm pretty sure that if you try to perform any operations on tables with 
the StorageHandler set on them, and you don't have that class on the classpath, 
it will fail with an error. I've been in exactly the situation you describe 
where I couldn't issue a DROP TABLE for a table that was created by a test 
storage handler that no longer existed.
   
   The tests caught the exact same problem:
   ```
   org.apache.iceberg.spark.sql.TestAlterTable > testSetTableProperties[2] 
FAILED
       java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage 
handler.org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
           at 
org.apache.hadoop.hive.ql.metadata.Table.getStorageHandler(Table.java:297)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.convertHiveTableToCatalogTable(HiveClientImpl.scala:465)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:424)
           at scala.Option.map(Option.scala:230)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:424)
   ```
   
   My first ideas:
   1. Check that the classes are on the classpath before creating the table? - 
This would only fix the problem if the creator/reader of the table is both have 
the same things on the classpath.
   2. Move the HiveIcebergStorageHandler / HiveIcebergSerDe to their own 
package and try to shave of the dependencies - Seems like a hard solution and 
we will eventually fall back to Reflection
   3. Go full reflection: Create a reflection based SerDe, StorageHandler which 
checks the classpath for the actual classes and if successful wraps them?
   
   @rdblue, @massdosage: What do you think about the 3rd option?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to