[I] Encountered ERROR RewriteDataFilesCommitManager: Cannot commit groups [iceberg]

via GitHub Thu, 18 Jan 2024 23:01:19 -0800


a8356555 opened a new issue, #9521:
URL: https://github.com/apache/iceberg/issues/9521


   ### Query engine
   
   Spark
   
   ### Question
   
   I am currently using Flink to stream data into an Iceberg table. The Flink 
job writes to the Iceberg table every minute, which produces too many small 
files, so I run the following Spark procedure for maintenance.
   ```sql
   CALL glue_catalog.system.rewrite_data_files(
       table => 'my_db.my_table', 
       options => map(
           'max-concurrent-file-group-rewrites', '4',
           'partial-progress.enabled', 'true')
       )
   ```
   
   However, during the maintenance process, I encountered the following error:
   ```
   24/01/17 18:12:29 ERROR RewriteDataFilesCommitManager: Cannot commit groups [RewriteFileGroup{info=FileGroupInfo{globalIndex=350, partitionIndex=1, partition=PartitionData{date_utc8=19483}}, numRewrittenFiles=163, numAddedFiles=2, numRewrittenBytes=140070183}, ...], attempting to clean up written files
   org.apache.iceberg.exceptions.CommitFailedException: Cannot commit GlueCatalog.bitopro_ods.ods_bitopro_mysql_order_matches because base metadata location 's3://production-data-glue-iceberg-warehouse/bitopro_ods.db/ods_bitopro_mysql_order_matches/metadata/66157-79e49b17-b6c6-432b-9e06-c38b2150c312.metadata.json' is not same as the current Glue location 's3://production-data-glue-iceberg-warehouse/bitopro_ods.db/ods_bitopro_mysql_order_matches/metadata/66160-9b65de57-7904-4e08-b215-f88f00b8c66d.metadata.json'
        at org.apache.iceberg.aws.glue.GlueTableOperations.checkMetadataLocation(GlueTableOperations.java:272)
        at org.apache.iceberg.aws.glue.GlueTableOperations.doCommit(GlueTableOperations.java:158)
        at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
        at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:390)
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
        at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
        at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:364)
        at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:78)
        at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:100)
        at org.apache.iceberg.actions.RewriteDataFilesCommitManager$CommitService.commitOrClean(RewriteDataFilesCommitManager.java:134)
        at org.apache.iceberg.actions.BaseCommitService.commitReadyCommitGroups(BaseCommitService.java:205)
        at org.apache.iceberg.actions.BaseCommitService.offer(BaseCommitService.java:133)
        at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.lambda$doExecuteWithPartialProgress$4(RewriteDataFilesSparkAction.java:355)
        at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
        at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
        at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
   ```
   This looks like a concurrent write conflict between the Flink job and the 
rewrite, but the Flink job cannot be stopped. How can I resolve this error?
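
   One possible mitigation, offered as a sketch and not a confirmed fix: the 
stack trace passes through `Tasks$Builder.runTaskWithRetry`, which suggests the 
commit was retried before failing. Assuming the retries were exhausted while the 
Flink job kept advancing the table metadata, raising Iceberg's commit retry 
table properties and committing the rewrite in smaller batches may help. The 
property values below are illustrative assumptions, not tuned recommendations, 
and `my_db.my_table` is the placeholder name from the call above.
   ```sql
   -- Assumption: the rewrite commit lost the race to the Flink writer
   -- more often than the table's retry settings allow.
   -- Give each commit more attempts and a longer initial backoff.
   ALTER TABLE glue_catalog.my_db.my_table SET TBLPROPERTIES (
       'commit.retry.num-retries' = '10',
       'commit.retry.min-wait-ms' = '1000'
   );

   -- Same maintenance call as above, with the rewrite split into more
   -- partial-progress commits so a failed commit discards fewer groups.
   CALL glue_catalog.system.rewrite_data_files(
       table => 'my_db.my_table',
       options => map(
           'max-concurrent-file-group-rewrites', '4',
           'partial-progress.enabled', 'true',
           'partial-progress.max-commits', '20')
   );
   ```
   With partial progress enabled, file groups whose commit fails are cleaned 
up and can be picked up again by a later run, so rerunning the procedure after 
a failure should be safe under these assumptions.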


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

