liqinrae opened a new issue, #5752: URL: https://github.com/apache/iceberg/issues/5752
### Apache Iceberg version

0.14.0 (latest release)

### Query engine

Trino

### Please describe the bug 🐞

Hey folks, we are running Trino 385 and Iceberg 0.14.0 on a self-managed EKS cluster on AWS, and we have frequently been seeing a missing-manifest-list-file error whenever an HMS read times out, similar to https://github.com/trinodb/trino/issues/12581:

```
"message":"java.sql.SQLException: Query failed (#20220830_200538_00217_v3bfb): Failed to get status for file: s3://our_bucket/our_table/metadata/snap-5248356702593051690-1-4bab0c8e-543c-41bd-99a7-62dcb1864795.avro
```

We have enabled S3 object versioning and CloudTrail, so we can confirm that the manifest list file existed and was deleted by Trino (see the version-listing sketch after the timeline below). From our Trino log, it looks like the Iceberg library removes the manifest list file in the event of a commit failure even though the snapshot was committed successfully. Here is our query log:

1. We execute an `INSERT ... SELECT ... FROM` statement (QueryId: 20220830_195613_00200).

2. Iceberg performs a base table scan with snapshot ID 7266396334490458249:

   ```
   2022-08-30T19:56:15.027Z INFO Query-20220830_195613_00200_v3bfb-2519 org.apache.iceberg.BaseTableScan Scanning table "our_schema"."our_table" snapshot 7266396334490458249 created at 2022-08-30 16:37:05.638 with filter true
   ```

3. Trino successfully commits snapshot 5248356702593051690:

   ```
   2022-08-30T19:56:17.780Z INFO 20220830_195613_00200_v3bfb.0.0.0-32-257 org.apache.iceberg.SnapshotProducer Committed snapshot 5248356702593051690 (MergeAppend)
   ```

4. Despite the successful commit, Trino reports a commit failure with `org.apache.iceberg.exceptions.CommitFailedException` (this stack trace is taken from Splunk, so the execution order might not be accurate):

   ```
   org.apache.iceberg.exceptions.CommitFailedException: Failed to commit to table our_schema.our_table
       at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.commit(AbstractIcebergTableOperations.java:150)
       at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:386)
       at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
       at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
       at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
       at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
       at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:378)
       at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:268)
       at io.trino.plugin.iceberg.IcebergMetadata.finishInsert(IcebergMetadata.java:752)
       at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:504)
       at io.trino.metadata.MetadataManager.finishInsert(MetadataManager.java:892)
       at io.trino.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3962)
       at io.trino.operator.TableFinishOperator.getOutput(TableFinishOperator.java:319)
       at io.trino.operator.Driver.processInternal(Driver.java:410)
       at io.trino.operator.Driver.lambda$process$10(Driver.java:313)
       at io.trino.operator.Driver.tryWithLock(Driver.java:698)
       at io.trino.operator.Driver.process(Driver.java:305)
       at io.trino.operator.Driver.processForDuration(Driver.java:276)
       at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1092)
       at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
       at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:488)
       at io.trino.$gen.Trino_385_sfdc_2_4d4b287____20220830_184755_2.run(Unknown Source)
       at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: io.trino.spi.TrinoException: bdmpresto-hive-002.presto-hive-metastore.uip.aws-esvc1-useast2.aws.sfdc.is:80: java.net.SocketTimeoutException: Read timed out
       at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.alterTable(ThriftHiveMetastore.java:1161)
       at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.alterTable(BridgingHiveMetastore.java:340)
       at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.replaceTable(BridgingHiveMetastore.java:219)
       at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.replaceTable(CachingHiveMetastore.java:523)
       at io.trino.plugin.iceberg.catalog.hms.HiveMetastoreTableOperations.commitToExistingTable(HiveMetastoreTableOperations.java:93)
       ... 25 more
   ```

5. As a result of the `CommitFailedException`, Iceberg retries the commit four times and every attempt fails. The retries are exhausted and the query eventually fails:

   ```
   2022-08-30T19:58:18.386Z WARN 20220830_195613_00200_v3bfb.0.0.0-32-257 org.apache.iceberg.util.Tasks Retrying task after failure: Failed to commit to table our_schema.our_table
   2022-08-30T20:00:18.894Z WARN 20220830_195613_00200_v3bfb.0.0.0-32-257 org.apache.iceberg.util.Tasks Retrying task after failure: Failed to commit to table our_schema.our_table
   2022-08-30T20:02:19.543Z WARN 20220830_195613_00200_v3bfb.0.0.0-32-257 org.apache.iceberg.util.Tasks Retrying task after failure: Failed to commit to table our_schema.our_table
   2022-08-30T20:04:20.385Z WARN 20220830_195613_00200_v3bfb.0.0.0-32-257 org.apache.iceberg.util.Tasks Retrying task after failure: Failed to commit to table our_schema.our_table
   2022-08-30T20:05:32.872Z ERROR SplitRunner-151-257 io.trino.execution.executor.TaskExecutor Error processing Split 20220830_195613_00200_v3bfb.0.0.0-32 (start = 4349124.505863, wall = 557971 ms, cpu = 7 ms, wait = 7 ms, calls = 27)
   2022-08-30T20:05:32.876Z ERROR remote-task-callback-1373 io.trino.execution.scheduler.PipelinedStageExecution Pipelined stage execution for stage 20220830_195613_00200_v3bfb.0 failed
   2022-08-30T20:05:32.878Z ERROR stage-scheduler io.trino.execution.StageStateMachine Stage 20220830_195613_00200_v3bfb.0 failed
   ```

6. After this query failure, we start to see the missing-manifest-list error, for example in query 20220830_200538_00217_v3bfb-2732. When performing the base table scan, the metadata file now points at the new snapshot committed in step 3, yet the manifest list file is missing, so the query fails:

   ```
   2022-08-30T20:05:39.366Z INFO Query-20220830_200538_00217_v3bfb-2732 org.apache.iceberg.BaseTableScan Scanning table "sdb_trace_non_prod_hourly"."statement_text" snapshot 5248356702593051690 created at 2022-08-30 19:56:17.773 with filter true
   "message":"java.sql.SQLException: Query failed (#20220830_200538_00217_v3bfb): Failed to get status for file: s3://our_bucket/our_table/metadata/snap-5248356702593051690-1-4bab0c8e-543c-41bd-99a7-62dcb1864795.avro
   ```

So Iceberg successfully committed the snapshot change, but because of the `CommitFailedException`, the manifest list file that was created along with the snapshot was deleted.
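For reference, a deletion like this is straightforward to confirm on a versioned bucket. The following is a minimal sketch, assuming the AWS SDK for Java v1 is on the classpath; the bucket and key are the redacted placeholders from the error message above, not real values:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListVersionsRequest;
import com.amazonaws.services.s3.model.S3VersionSummary;

public class CheckManifestListVersions {
    public static void main(String[] args) {
        // Placeholder bucket/key taken from the redacted error message above.
        String bucket = "our_bucket";
        String key = "our_table/metadata/snap-5248356702593051690-1-4bab0c8e-543c-41bd-99a7-62dcb1864795.avro";

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // List every version of the manifest list object. With versioning
        // enabled, a delete marker as the latest version shows the file was
        // written successfully and deleted afterwards, not never written.
        for (S3VersionSummary v : s3.listVersions(
                new ListVersionsRequest().withBucketName(bucket).withPrefix(key)).getVersionSummaries()) {
            System.out.printf("versionId=%s deleteMarker=%s latest=%s lastModified=%s%n",
                    v.getVersionId(), v.isDeleteMarker(), v.isLatest(), v.getLastModified());
        }
    }
}
```

This matches what we observed: the object was written during the commit and subsequently deleted by Trino.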
We dove deeper into the Iceberg library and suspect this is where Iceberg removes the manifest list in the event of a commit failure: [commitSimpleTransaction](https://github.com/apache/iceberg/blob/cc7e6d9fa259707a5c690b3534ea6b5c7da3fc05/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L402) calls [cleanUpOnCommitFailure](https://github.com/apache/iceberg/blob/cc7e6d9fa259707a5c690b3534ea6b5c7da3fc05/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L443) when the commit fails, and [cleanAll](https://github.com/apache/iceberg/blob/cc7e6d9fa259707a5c690b3534ea6b5c7da3fc05/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L435) then removes the manifest list file of the new snapshot update (a self-contained sketch of this hazard follows the questions below).

Can someone in the community help us understand:

1. Why would a `CommitFailedException` be thrown when the snapshot has already been successfully committed?
2. When a `CommitFailedException` is thrown, is the commit state known or unknown? If the state is treated as known, removing the manifest list after the snapshot has actually been committed corrupts the table. Should Iceberg avoid removing the manifest list file when the metadata file has already been updated (see the `CommitStateUnknownException` sketch at the end)?

Thanks in advance!
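As promised above, here is a self-contained sketch of the hazard. Every name in it (`FakeMetastore`, `filesWrittenThisCommit`) is an illustrative stand-in rather than Iceberg or Trino internals; the point is only that a read timeout can make an applied commit look like a failed one, so the subsequent cleanup deletes a file that the live metadata now references:

```java
import java.util.ArrayList;
import java.util.List;

public class CommitCleanupHazard {

    // Stand-in for HMS: the update is applied, but the acknowledgment is lost.
    static class FakeMetastore {
        String tablePointer = "metadata-v1.json";

        void alterTable(String newPointer) throws Exception {
            tablePointer = newPointer;             // the commit takes effect...
            throw new Exception("Read timed out"); // ...but the client never learns it
        }
    }

    public static void main(String[] args) {
        FakeMetastore hms = new FakeMetastore();
        List<String> filesWrittenThisCommit = new ArrayList<>();
        filesWrittenThisCommit.add("snap-5248356702593051690.avro"); // new manifest list

        try {
            hms.alterTable("metadata-v2.json");
        } catch (Exception e) {
            // Treating the timeout as a definite failure triggers cleanup, which
            // deletes a manifest list that the committed metadata now references.
            filesWrittenThisCommit.forEach(f -> System.out.println("DELETING " + f));
        }

        // Prints "metadata-v2.json": the commit succeeded, yet the manifest
        // list it points at has just been deleted, i.e. a corrupted table.
        System.out.println("table pointer = " + hms.tablePointer);
    }
}
```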

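On question 2 specifically: Iceberg does have `org.apache.iceberg.exceptions.CommitStateUnknownException` for ambiguous outcomes, and if we read `SnapshotProducer` correctly, it rethrows that exception without running `cleanAll`. The sketch below is our own hedged illustration (the `doMetastoreAlterTable` helper is hypothetical, not Trino's actual code) of classifying a socket timeout as an unknown state rather than a failed commit, which would keep the new manifest list alive:

```java
import java.net.SocketTimeoutException;

import org.apache.iceberg.exceptions.CommitFailedException;
import org.apache.iceberg.exceptions.CommitStateUnknownException;

public class ClassifyCommitOutcome {

    void commitToExistingTable() {
        try {
            doMetastoreAlterTable(); // hypothetical stand-in for the Thrift alterTable call
        } catch (SocketTimeoutException e) {
            // Ambiguous outcome: the HMS may have applied the update before the
            // response was lost. "Unknown" should make Iceberg skip file cleanup.
            throw new CommitStateUnknownException(e);
        } catch (RuntimeException e) {
            // Only failures that provably did not take effect should be reported
            // as failed, since CommitFailedException is what triggers cleanup.
            throw new CommitFailedException(e, "Failed to commit: %s", e.getMessage());
        }
    }

    private void doMetastoreAlterTable() throws SocketTimeoutException {
        throw new SocketTimeoutException("Read timed out");
    }
}
```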