pvary commented on code in PR #15656:
URL: https://github.com/apache/iceberg/pull/15656#discussion_r2946383074
##########
core/src/main/java/org/apache/iceberg/CherryPickOperation.java:
##########
@@ -218,17 +222,43 @@ private static void validateReplacedPartitions(
parentId == null || isCurrentAncestor(meta, parentId),
"Cannot cherry-pick overwrite, based on non-ancestor of the current
state: %s",
parentId);
- try (CloseableIterable<DataFile> newFiles =
- SnapshotUtil.newFilesBetween(
- parentId, meta.currentSnapshot().snapshotId(), meta::snapshot,
io)) {
- for (DataFile newFile : newFiles) {
- ValidationException.check(
- !replacedPartitions.contains(newFile.specId(),
newFile.partition()),
- "Cannot cherry-pick replace partitions with changed partition:
%s",
- newFile.partition());
+ List<Snapshot> snapshots =
+ Lists.newArrayList(
+ SnapshotUtil.ancestorsBetween(
+ meta.currentSnapshot().snapshotId(), parentId,
meta::snapshot));
+ if (!snapshots.isEmpty()) {
+ Iterable<CloseableIterable<DataFile>> addedFileTasks =
+ Iterables.concat(
+ Iterables.transform(
+ snapshots,
+ snap -> {
+ Iterable<ManifestFile> ownedManifests =
+ Iterables.filter(
+ snap.dataManifests(io),
+ m -> Objects.equals(m.snapshotId(),
snap.snapshotId()));
+ return Iterables.transform(
+ ownedManifests,
+ manifest -> {
+ CloseableIterable<ManifestEntry<DataFile>> entries
=
+ ManifestFiles.read(manifest, io,
meta.specsById()).entries();
+ CloseableIterable<ManifestEntry<DataFile>> added =
+ CloseableIterable.filter(
+ entries, e -> e.status() ==
ManifestEntry.Status.ADDED);
+ return CloseableIterable.transform(added, e ->
e.file().copy());
+ });
+ }));
Review Comment:
I'm not really happy with this formatting. The variable names are lost in
the middle of the lines and I find this hard to read.
Either do a full inline if we thing it is trivial to understand
```suggestion
Iterable<CloseableIterable<DataFile>> addedFileTasks =
Iterables.concat(
Iterables.transform(
snapshots,
snap ->
Iterables.transform(
Iterables.filter(
snap.dataManifests(io),
m -> Objects.equals(m.snapshotId(),
snap.snapshotId())),
manifest ->
CloseableIterable.transform(
CloseableIterable.filter(
ManifestFiles.read(manifest, io,
meta.specsById())
.entries(),
e -> e.status() ==
ManifestEntry.Status.ADDED),
e -> e.file().copy()))));
```
or extract some of the iterator generations to methods, like:
```
private static CloseableIterable<DataFile> addedFiles(
ManifestFile manifest, FileIO io, Map<Integer, PartitionSpec>
specsById) {
CloseableIterable<ManifestEntry<DataFile>> entries =
ManifestFiles.read(manifest, io, specsById).entries();
CloseableIterable<ManifestEntry<DataFile>> added =
CloseableIterable.filter(entries, e -> e.status() ==
ManifestEntry.Status.ADDED);
return CloseableIterable.transform(added, e -> e.file().copy());
}
private static Iterable<ManifestFile> manifestsCreatedBy(Snapshot
snapshot, FileIO io) {
return Iterables.filter(
snapshot.dataManifests(io),
m -> Objects.equals(m.snapshotId(), snapshot.snapshotId()));
}
```
And then:
```
Iterable<CloseableIterable<DataFile>> addedFileTasks =
Iterables.concat(
Iterables.transform(
snapshots,
snap -> Iterables.transform(
manifestsCreatedBy(snap, io),
manifest -> addedFiles(manifest, io,
meta.specsById()))));
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]