aokolnychyi commented on code in PR #11158:
URL: https://github.com/apache/iceberg/pull/11158#discussion_r1792639080
##########
core/src/main/java/org/apache/iceberg/FastAppend.java:
##########
@@ -215,7 +213,7 @@ private List<ManifestFile> writeNewManifests() throws IOException {
}
if (newManifests == null && !newFiles.isEmpty()) {
- this.newManifests = writeDataManifests(newFiles, spec);
+ this.newManifests = writeDataManifests(Lists.newArrayList(newFiles), spec);
Review Comment:
What about modifying `writeDataManifests` to accept `Collection` and moving
the list creation to `divide`?
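A rough sketch of what this suggestion could look like. The class name, the `groupSize` parameter, and the shape of `divide` are hypothetical stand-ins, not the real Iceberg signatures:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in; not the actual Iceberg classes or method signatures.
class WriteManifestsSketch {

  // Accepts any Collection, so a caller holding a Set does not need to copy it first.
  static <T> List<List<T>> writeDataManifests(Collection<T> files, int groupSize) {
    return divide(files, groupSize);
  }

  // The only step that needs index-based access, so the list copy lives here.
  static <T> List<List<T>> divide(Collection<T> files, int groupSize) {
    if (groupSize <= 0) {
      throw new IllegalArgumentException("groupSize must be positive");
    }
    List<T> list = new ArrayList<>(files);
    List<List<T>> groups = new ArrayList<>();
    for (int start = 0; start < list.size(); start += groupSize) {
      int end = Math.min(start + groupSize, list.size());
      groups.add(new ArrayList<>(list.subList(start, end)));
    }
    return groups;
  }
}
```

With this shape the call site in `FastAppend` could pass `newFiles` directly, and the copy would happen in one place.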
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -533,4 +531,51 @@ private Pair<InclusiveMetricsEvaluator, StrictMetricsEvaluator> metricsEvaluator
return metricsEvaluators.get(partition);
}
}
+
+ private class FilesToDeleteHolder {
Review Comment:
Is there any way we can do this differently? In theory, we can add another
abstract method, similar to how we handle manifest writers.
```
protected abstract Set<F> newFileSet();
protected abstract ManifestWriter<F> newManifestWriter(PartitionSpec spec);
protected abstract ManifestReader<F> newManifestReader(ManifestFile manifest);
```
One caveat is that this method would be called to initialize an instance
field. That is generally considered bad practice, but the implementations
will be stateless, so it will work. We could pass a `Supplier<Set<F>>`
instead, but I am not sure that is better. In either case, we need to find a
way to avoid having both sets of files here. That will also reduce the number
of changes.
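A minimal sketch of the abstract-method option, to make the caveat concrete. Only `newFileSet()` comes from the comment above; the class names and the `delete` / `isMarkedForDeletion` helpers are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; only newFileSet() is taken from the review comment.
abstract class FilterManagerSketch<F> {
  // Calls an overridable method while the instance is still being constructed.
  // This is usually discouraged, but it is safe as long as implementations
  // do not read any subclass state.
  private final Set<F> filesToDelete = newFileSet();

  protected abstract Set<F> newFileSet();

  void delete(F file) {
    filesToDelete.add(file);
  }

  boolean isMarkedForDeletion(F file) {
    return filesToDelete.contains(file);
  }
}

// A stateless implementation: newFileSet() touches no fields of this class.
class StringFilterManagerSketch extends FilterManagerSketch<String> {
  @Override
  protected Set<String> newFileSet() {
    return new HashSet<>();
  }
}
```

The base class then holds a single set typed by `F`, and each subclass decides which set implementation backs it.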
##########
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java:
##########
@@ -372,8 +367,14 @@ private boolean manifestHasDeletedFiles(
for (ManifestEntry<F> entry : reader.liveEntries()) {
F file = entry.file();
+
+ // add path-based delete to set of files to be deleted
+ if (deletePaths.contains(CharSequenceWrapper.wrap(file.path()))) {
Review Comment:
Why do we wrap? `deletePaths` is a `CharSequenceSet`.
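To illustrate the point, here is a simplified stand-in for a set that normalizes `CharSequence` values internally. It is an assumption about the general pattern, not the actual `CharSequenceSet` implementation:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in; not the real CharSequenceSet.
class CharSeqSetSketch {
  private final Set<String> values = new HashSet<>();

  // Normalizes on the way in, so callers never wrap or convert.
  boolean add(CharSequence value) {
    return values.add(value.toString());
  }

  // Normalizes on lookup too, so a String and a StringBuilder
  // with the same characters are treated as equal.
  boolean contains(CharSequence value) {
    return values.contains(value.toString());
  }
}
```

If the set already handles the conversion like this, wrapping at the call site is redundant.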
##########
core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java:
##########
@@ -81,8 +83,8 @@ abstract class MergingSnapshotProducer<ThisT> extends SnapshotProducer<ThisT> {
// update data
private final Map<PartitionSpec, List<DataFile>> newDataFilesBySpec = Maps.newHashMap();
- private final CharSequenceSet newDataFilePaths = CharSequenceSet.empty();
- private final CharSequenceSet newDeleteFilePaths = CharSequenceSet.empty();
+ private final DataFileSet newDataFiles = DataFileSet.create();
+ private final DeleteFileSet newDeleteFiles = DeleteFileSet.create();
Review Comment:
Do we need these extra collections? Can't we use sets in
`newDataFilesBySpec` and `newDeleteFilesBySpec`?
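A small sketch of this alternative, assuming each file belongs to exactly one spec. `Integer` spec IDs and `String` paths are hypothetical stand-ins for `PartitionSpec` and `DataFile`:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Integer and String stand in for PartitionSpec and DataFile.
class FilesBySpecSketch {
  // Set values handle de-duplication, so no separate tracking collection is needed.
  private final Map<Integer, Set<String>> newDataFilesBySpec = new HashMap<>();

  // Returns true only if the file was not already tracked for this spec.
  boolean add(int specId, String file) {
    return newDataFilesBySpec.computeIfAbsent(specId, id -> new LinkedHashSet<>()).add(file);
  }

  // Total number of distinct files across all specs.
  int totalFiles() {
    return newDataFilesBySpec.values().stream().mapToInt(Set::size).sum();
  }
}
```

Note that de-duplication here is per spec; that matches a global check only under the one-spec-per-file assumption stated above.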
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]