coderfender commented on code in PR #12824:
URL: https://github.com/apache/iceberg/pull/12824#discussion_r2082637481
##########
core/src/main/java/org/apache/iceberg/actions/BinPackRewriteFilePlanner.java:
##########
@@ -199,30 +214,48 @@ protected long defaultTargetFileSize() {
public FileRewritePlan<FileGroupInfo, FileScanTask, DataFile, RewriteFileGroup> plan() {
StructLikeMap<List<List<FileScanTask>>> plan = planFileGroups();
RewriteExecutionContext ctx = new RewriteExecutionContext();
- Stream<RewriteFileGroup> groups =
- plan.entrySet().stream()
- .filter(e -> !e.getValue().isEmpty())
- .flatMap(
- e -> {
- StructLike partition = e.getKey();
- List<List<FileScanTask>> scanGroups = e.getValue();
- return scanGroups.stream()
- .map(
- tasks -> {
- long inputSize = inputSize(tasks);
- return newRewriteGroup(
- ctx,
- partition,
- tasks,
- inputSplitSize(inputSize),
- expectedOutputFiles(inputSize));
- });
- })
- .sorted(RewriteFileGroup.comparator(rewriteJobOrder));
+ List<RewriteFileGroup> selectedFileGroups = new ArrayList<>();
+ AtomicInteger fileCountRunner = new AtomicInteger();
+ plan.entrySet().stream()
Review Comment:
@pvary, I pushed a commit that moves the pruning logic to right after we get
the fileScanTasks from the scan API. The upside is that the implementation is
simpler than the approach above. Also note that the file scan tasks being
pruned are always guaranteed to be random, since we prune before grouping by
partition. Let me know whether you think this is clearer than the previous
approach.
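
To make the ordering concrete, here is a rough sketch of "prune first, then group by partition". It is not the code in the commit: `Task`, `prune`, `groupByPartition`, and `maxFiles` are hypothetical names, and it assumes pruning means keeping at most `maxFiles` tasks in whatever order the scan returned them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PruneBeforeGrouping {
  // Hypothetical stand-in for a file scan task: just a partition key and a size.
  static final class Task {
    final String partition;
    final long sizeBytes;

    Task(String partition, long sizeBytes) {
      this.partition = partition;
      this.sizeBytes = sizeBytes;
    }
  }

  // Step 1 (assumption): keep at most maxFiles tasks, in scan order,
  // before any partition grouping happens.
  static List<Task> prune(List<Task> scanned, int maxFiles) {
    if (maxFiles < 0) {
      throw new IllegalArgumentException("maxFiles must be >= 0");
    }
    int keep = Math.min(maxFiles, scanned.size());
    return new ArrayList<>(scanned.subList(0, keep));
  }

  // Step 2: group only the surviving tasks by partition,
  // preserving the order in which partitions were first seen.
  static Map<String, List<Task>> groupByPartition(List<Task> tasks) {
    Map<String, List<Task>> groups = new LinkedHashMap<>();
    for (Task task : tasks) {
      groups.computeIfAbsent(task.partition, k -> new ArrayList<>()).add(task);
    }
    return groups;
  }
}
```

Because the cut happens on the flat task list, which tasks get dropped depends only on the scan order and not on how partitions are later grouped or sorted.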
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]