szehon-ho commented on issue #10312:
URL: https://github.com/apache/iceberg/issues/10312#issuecomment-2261085249
Yeah, I think @RussellSpitzer is right: we should rely on the validation error to prevent this scenario, i.e. T1 should not be able to commit successfully.
I need to understand one thing:
> When I set use-starting-sequence-number = false for rewriteDataFiles, Thread 1 compact data files failed at t4. stacktrace:
>
> Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new delete for replaced data file: GenericDataFile{content=data, file_path=/var/folders/5z/dqrlv_ts0wqf36vd39bb384h0000gn/T/junit17491575750166086656/9f77fae8-d62a-426d-971f-a342b6775c44/test_schema/test_table/data/00000-2-52ae94aa-b796-4c42-bf9c-92d36c52e522-00001.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=1, file_size_in_bytes=407, column_sizes=null, value_counts=org.apache.iceberg.util.SerializableMap@0, null_value_counts=org.apache.iceberg.util.SerializableMap@1, nan_value_counts=org.apache.iceberg.util.SerializableMap@0, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@e1782, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=null}
>     at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:50)
>     at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:418)
>     at org.apache.iceberg.MergingSnapshotProducer.validateNoNewDeletesForDataFiles(MergingSnapshotProducer.java:367)
>     at org.apache.iceberg.BaseRewriteFiles.validate(BaseRewriteFiles.java:108)
>     at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:175)
>     at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:296)
>     at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
>     at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
>     at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
>     at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:295)
>     at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitFileGroups(RewriteDataFilesCommitManager.java:89)
>     at org.apache.iceberg.actions.RewriteDataFilesCommitManager.commitOrClean(RewriteDataFilesCommitManager.java:110)
>     at org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction.doExecute(RewriteDataFilesSparkAction.java:291)
>     ... 8 more
> your process is in use-starting-sequence-number = true ?
> I test with use-starting-sequence-number = true and compact failed(apache iceberg1.4.3):
> Exception in thread "main" org.apache.iceberg.exceptions.ValidationException: Cannot commit, found new delete for replaced data file: GenericDataFile ...
From the conversation above, it seems we get the ValidationException in both code paths, doesn't it?
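To make the question concrete, here is a toy Java model of the race. It assumes that the concurrent writer's commit adds a delete targeting a data file that T1 is replacing, which is what the message "found new delete for replaced data file" suggests. This is not Iceberg's validateNoNewDeletesForDataFiles; it is only a hedged sketch of the conflict that method reports. `RewriteValidationSketch`, `DeleteFile`, `canCommitRewrite` and the file names are all hypothetical. The sketch only models deletes that name one specific data file (position-delete style).

```java
import java.util.List;
import java.util.Set;

/**
 * Toy model (hypothetical, NOT Iceberg's code) of the conflict check behind
 * "Cannot commit, found new delete for replaced data file".
 */
final class RewriteValidationSketch {

  /** Hypothetical delete file that targets exactly one data file path. */
  record DeleteFile(String targetDataFile, long sequenceNumber) {}

  /**
   * Returns true if the rewrite may commit. A delete counts as "new" when its
   * sequence number is greater than that of the rewrite's starting snapshot,
   * i.e. it was committed after the rewrite began.
   */
  static boolean canCommitRewrite(
      long startingSequenceNumber,
      Set<String> replacedDataFiles,
      List<DeleteFile> committedDeletes,
      boolean useStartingSequenceNumber) {
    // Assumption of this sketch: the flag only decides which sequence number the
    // rewritten files receive; it does not excuse a new delete aimed at a file
    // being replaced, so both settings reach the same verdict here.
    for (DeleteFile delete : committedDeletes) {
      boolean isNew = delete.sequenceNumber() > startingSequenceNumber;
      if (isNew && replacedDataFiles.contains(delete.targetDataFile())) {
        return false; // the real commit reports a ValidationException in this situation
      }
    }
    return true;
  }
}
```

For example, suppose a rewrite starts at sequence number 1 and replaces `data/00000-2.parquet`, and a delete for that file is then committed at sequence number 2. In that case `canCommitRewrite` returns false whether the flag is true or false. That matches both stack traces quoted above.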
--
This is an automated message from the Apache Git Service.