amrishlal opened a new pull request, #8645: URL: https://github.com/apache/hudi/pull/8645
### Change Logs

Calculate and output file size stats of data files that were modified in the half-open interval [start date, end date), where the start date is set with the --start-date parameter and the end date with the --end-date parameter. The --num-days parameter can be used instead to select data files modified over the last --num-days days. If --start-date is specified, --num-days is ignored. If none of the date parameters are set, stats are computed over all data files of all partitions in the table.

By default, only table level file size stats are printed. If the --partition-status option is used, partition level file size stats are printed as well.

The following stats are calculated:

* Number of files
* Total table size
* Minimum file size
* Maximum file size
* Average file size
* Median file size
* p50 file size
* p90 file size
* p95 file size
* p99 file size

Sample spark-submit command:

> ./bin/spark-submit \
> --class org.apache.hudi.utilities.TableSizeStats \
> $HUDI_DIR/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.14.0-SNAPSHOT.jar \
> --base-path <base-path> \
> --num-days <number-of-days>

### Impact

Offline utility

### Risk level (write none, low, medium or high below)

low

### Documentation Update

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
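
For readers who want to see the selection rule and the listed stats concretely, here is a minimal Python sketch of the behaviour described in the Change Logs. It is not the code of `TableSizeStats`: the function names (`resolve_window`, `in_window`, `nearest_rank`, `file_size_stats`) are hypothetical, the nearest-rank percentile method is an assumption (the utility may use a different estimator), and the handling of `--end-date` together with `--num-days` is also assumed, since the description does not spell it out.

```python
from datetime import date, timedelta
from statistics import mean, median


def resolve_window(start_date=None, end_date=None, num_days=None, today=None):
    """Return (start, end) for the half-open interval [start, end).

    None on either side means "unbounded". Assumption: an end date, if
    given, still applies when only num_days is set.
    """
    today = today or date.today()
    if start_date is not None:
        # A start date takes precedence, so num_days is ignored.
        return start_date, end_date
    if num_days is not None:
        return today - timedelta(days=num_days), end_date
    # No date parameters: every file qualifies.
    return None, end_date


def in_window(modified, start, end):
    """True if `modified` lies in [start, end): start inclusive, end exclusive."""
    return (start is None or modified >= start) and (end is None or modified < end)


def nearest_rank(sorted_sizes, p):
    """Nearest-rank percentile (assumed method): smallest value with at
    least p percent of the values at or below it."""
    rank = max(1, (p * len(sorted_sizes) + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_sizes[rank - 1]


def file_size_stats(sizes):
    """Compute the stats listed in the description for a list of file sizes in bytes."""
    ordered = sorted(sizes)
    if not ordered:
        return {"count": 0}
    stats = {
        "count": len(ordered),
        "total": sum(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "avg": mean(ordered),
        "median": median(ordered),
    }
    for p in (50, 90, 95, 99):
        stats[f"p{p}"] = nearest_rank(ordered, p)
    return stats


if __name__ == "__main__":
    # Made-up (modification date, size in bytes) pairs, for illustration only.
    files = [(date(2023, 5, d), d * 1024) for d in range(1, 11)]
    start, end = resolve_window(start_date=date(2023, 5, 3), end_date=date(2023, 5, 8))
    selected = [size for modified, size in files if in_window(modified, start, end)]
    print(file_size_stats(selected))
```

Note that the median and p50 entries can differ under this sketch: for an even number of files, `statistics.median` averages the two middle values, while nearest-rank returns an actual file size. Whether the two coincide in the utility's output depends on the estimator it uses.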