weimingdiit commented on code in PR #7362:
URL: https://github.com/apache/hudi/pull/7362#discussion_r1043981502
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -179,6 +179,20 @@ public class HoodieCompactionConfig extends HoodieConfig {
           + "record size estimate compute dynamically based on commit metadata. "
           + " This is critical in computing the insert parallelism and bin-packing inserts into small files.");
 
+  public static final ConfigProperty<String> COPY_ON_WRITE_RECORD_DYNAMIC_SAMPLE_MAXNUM = ConfigProperty
+      .key("hoodie.copyonwrite.record.dynamic.sample.maxnum")
+      .defaultValue(String.valueOf(100))
+      .withDocumentation("Although dynamic sampling is adopted, if the record size assumed by the user is unreasonable during the first write execution, "
+          + "files that are too large or too small will be generated. Therefore, sampling is conducted from the data set during the first write operation. "
+          + "In order to ensure performance, this parameter controls the absolute value of sampling.");

Review Comment:
> just do the records sampling and get the estimated size is okey ?

Yes. It is only used to estimate the record size accurately on the first write, when there is no commit metadata yet for the dynamic estimate to build on.
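To make the reply concrete, here is a minimal sketch of what "sample the records and take the estimated size" could look like, assuming records are available as serialized `byte[]` values. It is hypothetical and not the code in this PR or in Hudi: the class `RecordSizeSampler`, the method `estimateAvgRecordSize`, and its parameters are illustrative names. `maxSamples` stands in for the proposed `hoodie.copyonwrite.record.dynamic.sample.maxnum` cap, and `fallbackEstimate` stands in for the user-supplied record size estimate. The sampling technique, reservoir sampling (Algorithm R), is a swapped-in choice that keeps the sample uniform while bounding memory to `maxSamples` sizes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class RecordSizeSampler {

  /**
   * Hypothetical helper (not Hudi's API): estimates the average serialized
   * record size in bytes from a uniform sample of at most maxSamples records.
   * maxSamples plays the role of the proposed sample.maxnum cap; if nothing
   * can be sampled, the user-supplied fallbackEstimate is returned unchanged.
   */
  public static long estimateAvgRecordSize(Iterator<byte[]> records, int maxSamples,
                                           long fallbackEstimate, long seed) {
    if (maxSamples <= 0) {
      return fallbackEstimate; // sampling disabled: trust the user's estimate
    }
    List<Integer> reservoir = new ArrayList<>(maxSamples); // sizes of sampled records
    Random random = new Random(seed);
    long seen = 0;
    while (records.hasNext()) {
      int size = records.next().length;
      seen++;
      if (reservoir.size() < maxSamples) {
        reservoir.add(size); // the first maxSamples records fill the reservoir
      } else {
        // Algorithm R: keep the new record with probability maxSamples / seen
        long j = (long) (random.nextDouble() * seen);
        if (j < maxSamples) {
          reservoir.set((int) j, size);
        }
      }
    }
    if (reservoir.isEmpty()) {
      return fallbackEstimate; // empty input: nothing to sample
    }
    long total = 0;
    for (int s : reservoir) {
      total += s;
    }
    return total / reservoir.size(); // integer average in bytes
  }

  public static void main(String[] args) {
    List<byte[]> data = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      data.add(("record-" + i).getBytes(StandardCharsets.UTF_8));
    }
    long estimate = estimateAvgRecordSize(data.iterator(), 100, 1024L, 42L);
    System.out.println("estimated avg record size = " + estimate + " bytes");
  }
}
```

The cap is what keeps this cheap: however large the first batch is, only `maxSamples` sizes are held at once, which matches the stated intent of bounding the sample "to ensure performance".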