codope commented on issue #4170: URL: https://github.com/apache/hudi/issues/4170#issuecomment-990053669
@rubenssoto Were all the files in the screenshot created by clustering? From the screenshot, it looks like one output file group is being created per input slice (all fileGroupIds in the screenshot are unique). However, based on the configs, the [number of output groups](https://github.com/apache/hudi/blob/68f8597b12edebba759b942f6337540f21a2db96/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/SparkSizeBasedClusteringPlanStrategy.java#L79-L86) should have been 2.

Can you paste the clustering plan from the requested replacecommit in the timeline? Alternatively, you could share the INFO-level logs and look for the lines produced by these statements:

```
LOG.info("Adding one clustering group " + totalSizeSoFar + " max bytes: "
    + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size()
    + " output groups: " + numOutputGroups);
LOG.info("Adding final clustering group " + totalSizeSoFar + " max bytes: "
    + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size()
    + " output groups: " + numOutputGroups);
```
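If the driver log is large, a small scanner can pull out just those entries. The following is a minimal sketch, not part of Hudi: the class name `ClusteringLogScan` and the method `scan` are hypothetical, and the regular expression is only inferred from the string concatenation in the two `LOG.info` statements quoted above, so it assumes the numeric values are printed as plain digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hudi class.
final class ClusteringLogScan {

    // Inferred from the two LOG.info concatenations; assumes plain-digit values.
    private static final Pattern GROUP_LINE = Pattern.compile(
        "Adding (one|final) clustering group (\\d+) max bytes: (\\d+)"
            + " num input slices: (\\d+) output groups: (\\d+)");

    /**
     * Returns one entry per matching log line:
     * index 0 = number of input slices, index 1 = number of output groups.
     */
    static List<long[]> scan(List<String> logLines) {
        List<long[]> result = new ArrayList<>();
        for (String line : logLines) {
            // find() rather than matches(): real log lines carry a
            // timestamp/level/logger prefix before the message.
            Matcher m = GROUP_LINE.matcher(line);
            if (m.find()) {
                result.add(new long[] {
                    Long.parseLong(m.group(4)),
                    Long.parseLong(m.group(5))
                });
            }
        }
        return result;
    }
}
```

Feeding it the lines of the driver log (for example via `java.nio.file.Files.readAllLines`) gives one pair per planned clustering group. If every pair shows one input slice and one output group, that would match what the screenshot suggests.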