[GitHub] [hudi] codope commented on issue #4170: [SUPPORT] Understanding Clustering Behavior

GitBox Thu, 09 Dec 2021 09:17:04 -0800


codope commented on issue #4170:
URL: https://github.com/apache/hudi/issues/4170#issuecomment-990053669



   @rubenssoto Were all the files in the screenshot created by clustering? From the screenshot, it looks like one output file group is being created per input slice (all fileGroupIds in the screenshot are unique). However, based on your configs, the [number of output groups](https://github.com/apache/hudi/blob/68f8597b12edebba759b942f6337540f21a2db96/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/plan/strategy/SparkSizeBasedClusteringPlanStrategy.java#L79-L86) should have been 2.
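   Because the question hinges on how many output file groups a clustering group should produce, here is a simplified sketch of size-based planning. It assumes greedy packing of file slices until a group would exceed `maxBytesInGroup`, and `ceil(groupBytes / targetFileMaxBytes)` output file groups per clustering group. The class, record, and method names are hypothetical, and this is not Hudi's actual `SparkSizeBasedClusteringPlanStrategy` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of size-based clustering planning.
// Assumptions: slices are packed greedily until adding the next one would
// exceed maxBytesInGroup, and each clustering group is planned to write
// ceil(groupBytes / targetFileMaxBytes) output file groups.
public class ClusteringGroupSketch {

    // One planned clustering group: input slices packed, bytes, expected outputs.
    public record Group(int numInputSlices, long totalBytes, int numOutputGroups) {}

    // Round up so a partially filled target file still counts as an output group.
    public static int numOutputGroups(long groupBytes, long targetFileMaxBytes) {
        return (int) Math.ceil(groupBytes / (double) targetFileMaxBytes);
    }

    public static List<Group> plan(List<Long> sliceSizes, long maxBytesInGroup, long targetFileMaxBytes) {
        List<Group> groups = new ArrayList<>();
        int slicesInGroup = 0;
        long bytesInGroup = 0;
        for (long size : sliceSizes) {
            // Close the current group if this slice would push it past the cap.
            if (slicesInGroup > 0 && bytesInGroup + size > maxBytesInGroup) {
                groups.add(new Group(slicesInGroup, bytesInGroup,
                        numOutputGroups(bytesInGroup, targetFileMaxBytes)));
                slicesInGroup = 0;
                bytesInGroup = 0;
            }
            slicesInGroup++;
            bytesInGroup += size;
        }
        // Emit the last, possibly partially filled, group.
        if (slicesInGroup > 0) {
            groups.add(new Group(slicesInGroup, bytesInGroup,
                    numOutputGroups(bytesInGroup, targetFileMaxBytes)));
        }
        return groups;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Illustrative values only, not the configs from this issue.
        List<Long> slices = List.of(100 * mb, 100 * mb, 100 * mb, 100 * mb);
        for (Group g : plan(slices, 1024 * mb, 300 * mb)) {
            // prints "4 input slices -> 2 output file groups"
            System.out.println(g.numInputSlices() + " input slices -> "
                    + g.numOutputGroups() + " output file groups");
        }
    }
}
```

   Under this model, the ratio of output file groups to input slices depends mainly on the target file size relative to the slice sizes and the per-group byte cap, which is why the configured values matter when comparing against the screenshot.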
   
   Can you paste the clustering plan from the requested replacecommit in the timeline? Alternatively, you could share INFO-level logs and look for these lines:
   ```java
   LOG.info("Adding one clustering group " + totalSizeSoFar + " max bytes: "
       + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);

   LOG.info("Adding final clustering group " + totalSizeSoFar + " max bytes: "
       + writeConfig.getClusteringMaxBytesInGroup() + " num input slices: " + currentGroup.size() + " output groups: " + numOutputGroups);
   ```
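   To pull the numbers out of those INFO lines when scanning a large log, a small regex-based parser could help. This is a hypothetical helper (the class name, record, and sample line are made up for illustration); it assumes the message text matches the format produced by the `LOG.info` calls quoted above and ignores any logger prefix before it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the numbers from the clustering-plan INFO
// lines. Assumes the message text matches the quoted LOG.info format.
public class ClusteringLogParser {

    public record GroupLog(boolean isFinal, long totalBytes, long maxBytes,
                           int numInputSlices, int numOutputGroups) {}

    // find() is used below, so any logger prefix before the message is skipped.
    private static final Pattern LINE = Pattern.compile(
            "Adding (one|final) clustering group (\\d+) max bytes: (\\d+)"
            + " num input slices: (\\d+) output groups: (\\d+)");

    public static Optional<GroupLog> parse(String logLine) {
        Matcher m = LINE.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new GroupLog(
                m.group(1).equals("final"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Integer.parseInt(m.group(4)),
                Integer.parseInt(m.group(5))));
    }

    public static void main(String[] args) {
        // Made-up sample line for illustration, not taken from this issue.
        String sample = "INFO Adding final clustering group 419430400 max bytes: "
                + "1073741824 num input slices: 4 output groups: 2";
        parse(sample).ifPresent(System.out::println);
    }
}
```

   Comparing `numInputSlices` with `numOutputGroups` across the parsed lines would show directly whether the planner intended one output file group per slice.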


-- 
This is an automated message from the Apache Git Service.
