SteNicholas commented on code in PR #7159:
URL: https://github.com/apache/hudi/pull/7159#discussion_r1025443367
##########
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/FlinkSizeBasedClusteringPlanStrategy.java:
##########
@@ -70,9 +70,11 @@ protected Stream<HoodieClusteringGroup> buildClusteringGroupsForPartition(String
       // check if max size is reached and create new group, if needed.
       // in now, every clustering group out put is 1 file group.
       if (totalSizeSoFar >= writeConfig.getClusteringTargetFileMaxBytes() && !currentGroup.isEmpty()) {
-        LOG.info("Adding one clustering group " + totalSizeSoFar + " max bytes: "
-            + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size());
-        fileSliceGroups.add(Pair.of(currentGroup, 1));
+        if (currentGroup.size() > 1 || (!StringUtils.isNullOrEmpty(writeConfig.getClusteringSortColumns()) && currentGroup.size() == 1)) {

Review Comment:
   If the group's only file has already been clustered, there is no need to add it to `fileSliceGroups`. IMO, the check for whether a group in `fileSliceGroups` contains only a single, already-clustered file could be moved out of `buildClusteringGroupsForPartition`, so that this change doesn't modify the clustering plan strategy of the Flink and Spark engines.
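
   A rough sketch of that suggestion, not Hudi code: the same predicate as in the diff, applied as a separate filtering step after the groups have been built. `ClusteringGroupFilter`, `Slice`, and `dropNoOpGroups` are hypothetical stand-ins, and `sortColumns` plays the role of `writeConfig.getClusteringSortColumns()`.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ClusteringGroupFilter {

  /** Hypothetical stand-in for a file slice; this is not Hudi's FileSlice. */
  public static final class Slice {
    final String fileId;
    final long sizeBytes;

    public Slice(String fileId, long sizeBytes) {
      this.fileId = fileId;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Hypothetical post-processing step applied to the already-built groups.
   * A group is kept when it merges more than one slice, or when it holds
   * exactly one slice and sort columns are configured (the rewrite would
   * still change the data layout). Everything else is dropped.
   */
  public static List<List<Slice>> dropNoOpGroups(List<List<Slice>> groups, String sortColumns) {
    boolean sortingRequested = sortColumns != null && !sortColumns.isEmpty();
    return groups.stream()
        .filter(group -> group.size() > 1 || (sortingRequested && group.size() == 1))
        .collect(Collectors.toList());
  }
}
```

   With the check in one place like this, the grouping loop itself can stay as it was, and each engine's plan strategy only needs to call the filter on the result.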