Kontinuation commented on issue #5752: URL: https://github.com/apache/iceberg/issues/5752#issuecomment-1244864514
This appears to be an issue in the [trino-iceberg plugin](https://github.com/trinodb/trino/blob/385/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperations.java#L92-L98), which throws `CommitFailedException` in a situation where it should not.

Committing a transaction takes roughly two steps:

1. Write the metadata of the new snapshot to file storage.
2. Update the catalog to point to the location of the new metadata.

Step 2 is where `CommitFailedException` can be thrown after the snapshot metadata has already been persisted to file storage. This issue is about whether we can roll back step 1 when step 2 fails.

Step 2 can fail in many ways. In some cases we can be sure that the update did not take effect and that the table's metadata location remains unchanged after the failure. Examples of such failures are `CommitFailedException` and `org.apache.hadoop.hive.metastore.api.AlreadyExistsException`. In these cases we can safely roll back step 1.

For most other types of failure, such as the `java.net.SocketTimeoutException` mentioned in this issue, we don't know whether the catalog was actually updated. We should not blindly roll back step 1 in these cases. The Hive metastore catalog implementation in Iceberg [does a second check](https://github.com/apache/iceberg/blob/0.14.x/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L278-L288) when an unexpected exception is raised during the metastore update.

As far as I understand, `CommitFailedException` in Iceberg indicates a known commit failure, where the catalog is guaranteed to have been left unmodified (most of the time it is thrown during metadata consistency validation). It should not be thrown for socket errors that occur while updating Hive metadata.
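
To make the distinction concrete, here is a minimal sketch of the decision described above. It is not the Iceberg or Trino code: `CommitSketch`, `KnownCommitFailure`, `Catalog`, `Outcome`, and `FakeCatalog` are all hypothetical names invented for illustration, and the "second check" is reduced to re-reading a single pointer.

```java
// All names here are hypothetical; this is not the Iceberg or Trino API.
class CommitSketch {
    /** Stand-in for failures that guarantee the catalog was left unmodified. */
    static class KnownCommitFailure extends RuntimeException {
        KnownCommitFailure(String message) { super(message); }
    }

    enum Outcome { COMMITTED, ROLLED_BACK, UNKNOWN }

    /** Minimal catalog abstraction: a pointer to the current metadata file. */
    interface Catalog {
        void updatePointer(String newLocation) throws Exception; // step 2
        String currentPointer();
    }

    /**
     * Runs step 2 for metadata that step 1 has already written.
     * deleteNewMetadata stands for the rollback of step 1.
     */
    static Outcome commit(Catalog catalog, String newLocation, Runnable deleteNewMetadata) {
        try {
            catalog.updatePointer(newLocation);
            return Outcome.COMMITTED;
        } catch (KnownCommitFailure e) {
            // The catalog is known to be unchanged, so rolling back step 1 is safe.
            deleteNewMetadata.run();
            return Outcome.ROLLED_BACK;
        } catch (Exception e) {
            // Ambiguous failure (for example a socket timeout): check the catalog again.
            if (newLocation.equals(catalog.currentPointer())) {
                return Outcome.COMMITTED;
            }
            // We still cannot prove the update failed, so keep the metadata file.
            return Outcome.UNKNOWN;
        }
    }

    /** In-memory catalog used only to exercise the sketch. */
    static class FakeCatalog implements Catalog {
        String pointer = "v1.metadata.json";
        final Exception failure;          // thrown by updatePointer, or null for success
        final boolean applyBeforeFailing; // whether the update lands before the failure

        FakeCatalog(Exception failure, boolean applyBeforeFailing) {
            this.failure = failure;
            this.applyBeforeFailing = applyBeforeFailing;
        }

        @Override
        public void updatePointer(String newLocation) throws Exception {
            if (failure == null || applyBeforeFailing) {
                pointer = newLocation;
            }
            if (failure != null) {
                throw failure;
            }
        }

        @Override
        public String currentPointer() {
            return pointer;
        }
    }
}
```

In this sketch, a `java.net.SocketTimeoutException` from `updatePointer` never triggers `deleteNewMetadata`: the result is `COMMITTED` if the pointer turns out to have moved, and `UNKNOWN` otherwise. Only the known-failure path deletes the new metadata file.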
