rdblue commented on code in PR #10962:
URL: https://github.com/apache/iceberg/pull/10962#discussion_r1861332491
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -924,6 +934,7 @@ public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot) {
                         != ManifestWriter.UNASSIGNED_SEQ) // filter out unassigned in rewritten manifests
             .reduce(base.lastSequenceNumber(), Math::min);
+    long minDataSequenceNumber = Math.min(minNewFileSequenceNumber, minExistingDataSequenceNumber);
Review Comment:
The logic for calculating the min sequence number is getting too long to
embed here. I think it should be moved to a separate private method:
```java
private long minDataSequenceNumber() {
  // Smallest data sequence number among the data files added by this commit.
  long minAddedDataSequenceNumber =
      addedDataFiles().stream()
          .map(ContentFile::dataSequenceNumber)
          .filter(Objects::nonNull)
          .filter(seq -> seq >= 0)
          .reduce(base.nextSequenceNumber(), Math::min);
  // Smallest minimum sequence number among the existing (filtered) manifests.
  long minExistingDataSequenceNumber =
      filtered.stream()
          .map(ManifestFile::minSequenceNumber)
          // filter out unassigned in rewritten manifests
          .filter(seq -> seq != ManifestWriter.UNASSIGNED_SEQ)
          .reduce(base.lastSequenceNumber(), Math::min);
  long minDataSequenceNumber =
      Math.min(minAddedDataSequenceNumber, minExistingDataSequenceNumber);
  return Math.min(minDataSequenceNumber, newDataFilesDataSequenceNumber);
}
```
This also fixes the awkwardness of deciding whether to use
`base.nextSequenceNumber()` or `newDataFilesDataSequenceNumber`.
I also agree with @amogh-jahagirdar that checking `addedDataFiles()` is not
currently necessary, but it seems like a good idea to guard against this issue
recurring if we change the code again. This bug was likely introduced when we
added `newDataFilesDataSequenceNumber`.
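For anyone who wants to try the reduction outside of Iceberg, here is a minimal, self-contained sketch of the same min calculation. It is only an illustration under stated assumptions: `MinDataSequenceSketch`, the plain `List<Long>` inputs, and the `UNASSIGNED_SEQ` value of `-1` are hypothetical stand-ins, not Iceberg's classes or constants, and the final `Math.min` against `newDataFilesDataSequenceNumber` is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class MinDataSequenceSketch {
  // Hypothetical stand-in for ManifestWriter.UNASSIGNED_SEQ; the value is an assumption.
  static final long UNASSIGNED_SEQ = -1L;

  /**
   * @param addedSeqs data sequence numbers of added data files (null means not yet assigned)
   * @param manifestMinSeqs minimum sequence number of each existing manifest
   * @param nextSequenceNumber fallback when no added file carries a usable sequence number
   * @param lastSequenceNumber fallback when no existing manifest carries an assigned one
   */
  static long minDataSequenceNumber(
      List<Long> addedSeqs,
      List<Long> manifestMinSeqs,
      long nextSequenceNumber,
      long lastSequenceNumber) {
    // Skip null and negative values, then take the min, starting from the fallback.
    long minAdded =
        addedSeqs.stream()
            .filter(Objects::nonNull)
            .filter(seq -> seq >= 0)
            .reduce(nextSequenceNumber, Math::min);
    // Skip manifests whose sequence number is still unassigned.
    long minExisting =
        manifestMinSeqs.stream()
            .filter(seq -> seq != UNASSIGNED_SEQ)
            .reduce(lastSequenceNumber, Math::min);
    return Math.min(minAdded, minExisting);
  }

  public static void main(String[] args) {
    // Arrays.asList is used because List.of rejects null elements.
    List<Long> added = Arrays.asList((Long) null, 7L, 5L);
    List<Long> existing = Arrays.asList(UNASSIGNED_SEQ, 4L, 9L);
    System.out.println(minDataSequenceNumber(added, existing, 10L, 9L));
  }
}
```

Because each `reduce` starts from its fallback value, an empty or fully filtered-out stream simply yields that fallback, so the method needs no special case for commits that add no data files.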