WantingDU opened a new issue, #13552:
URL: https://github.com/apache/iceberg/issues/13552
### Apache Iceberg version
1.9.1 (latest release)
### Query engine
Flink
### Please describe the bug 🐞
We are using Apache Iceberg with Flink 1.20.1 and tables in V2 format, and we
have observed that jobs cannot restore from checkpoints because of a missing
fileSequenceNumber. The Preconditions.checkNotNull in
SplitComparators#fileSequenceNumber always fails during restoration.
We have verified that the sequence-number field is present in the snapshots
in our metadata.json.
However, we always encounter the following exception during job restoration:
```
Caused by: java.lang.NullPointerException: Invalid file sequence number: null. Doesn't support splits written with V1 format: IcebergSourceSplit{files=[SplitScanTask{file=<xxx>.parquet, start=4, length=4416967}], fileOffset=0, recordOffset=0}
    at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:1008)
    at org.apache.iceberg.flink.source.split.SplitComparators.lambda$fileSequenceNumber$96288a6c$1(SplitComparators.java:45)
    at java.base/java.util.PriorityQueue.siftUpUsingComparator(Unknown Source)
    at java.base/java.util.PriorityQueue.siftUp(Unknown Source)
    at java.base/java.util.PriorityQueue.offer(Unknown Source)
    at java.base/java.util.PriorityQueue.add(Unknown Source)
    at org.apache.iceberg.flink.source.assigner.DefaultSplitAssigner.lambda$new$0(DefaultSplitAssigner.java:54)
    at java.base/java.util.ArrayList.forEach(Unknown Source)
    at org.apache.iceberg.flink.source.assigner.DefaultSplitAssigner.<init>(DefaultSplitAssigner.java:54)
    at org.apache.iceberg.flink.source.assigner.OrderedSplitAssignerFactory.createAssigner(OrderedSplitAssignerFactory.java:44)
    at org.apache.iceberg.flink.source.IcebergSource.createEnumerator(IcebergSource.java:217)
    at org.apache.iceberg.flink.source.IcebergSource.restoreEnumerator(IcebergSource.java:193)
    at org.apache.iceberg.flink.source.IcebergSource.restoreEnumerator(IcebergSource.java:88)
    at org.apache.flink.runtime.source.coordinator.SourceCoordinator.resetToCheckpoint(SourceCoordinator.java:511)
    at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:429)
```
We suspect that the fileSequenceNumber field is lost during the
serialization and deserialization handled by FileScanTaskParser, so the
splits restored from the checkpoint no longer carry it.
This issue may be related to [GitHub Issue #13320](https://github.com/apache/iceberg/issues/13320).
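
To illustrate the failure mode we suspect, here is a minimal, self-contained sketch. It does not use any Iceberg or Flink classes: `Split`, `BY_SEQUENCE`, `restore`, and `restoreFails` are hypothetical stand-ins, and the null check only mirrors the shape of the check visible in the stack trace. It shows that a priority queue ordered by sequence number throws a NullPointerException as soon as a second split is added and one of the compared splits has lost its sequence number, which matches the `PriorityQueue.siftUpUsingComparator` frame above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.PriorityQueue;

final class SequenceNumberRestoreSketch {

    /** Hypothetical stand-in for a restored split; NOT the Iceberg IcebergSourceSplit class. */
    static final class Split {
        final String file;
        final Long fileSequenceNumber; // boxed, so it can be null after a lossy round trip

        Split(String file, Long fileSequenceNumber) {
            this.file = file;
            this.fileSequenceNumber = fileSequenceNumber;
        }
    }

    // Assumption: rejects a missing sequence number the same way the reported check does.
    private static long requireSequence(Split split) {
        return Objects.requireNonNull(
                split.fileSequenceNumber,
                "Invalid file sequence number: null for " + split.file);
    }

    /** Orders splits by file sequence number and fails on null. */
    static final Comparator<Split> BY_SEQUENCE =
            (a, b) -> Long.compare(requireSequence(a), requireSequence(b));

    /** Rebuilds the pending-split queue from restored state, as an ordered assigner might. */
    static PriorityQueue<Split> restore(List<Split> splits) {
        PriorityQueue<Split> pending = new PriorityQueue<>(BY_SEQUENCE);
        splits.forEach(pending::add); // the comparator runs from the second add onward
        return pending;
    }

    /** Returns true if rebuilding the queue throws a NullPointerException. */
    static boolean restoreFails(List<Split> splits) {
        try {
            restore(splits);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Split> intact = Arrays.asList(new Split("b.parquet", 7L), new Split("a.parquet", 3L));
        List<Split> lossy = Arrays.asList(new Split("a.parquet", 3L), new Split("b.parquet", null));

        System.out.println("intact state fails: " + restoreFails(intact));
        System.out.println("lossy state fails: " + restoreFails(lossy));
    }
}
```

In this sketch a state containing a single split would not fail, because the queue never calls the comparator for its first element. That would be consistent with the exception surfacing only when the restored state holds at least two pending splits, although we have not confirmed this against the real assigner.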
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]