[ https://issues.apache.org/jira/browse/HUDI-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620607#comment-17620607 ]
sivabalan narayanan commented on HUDI-1570:
-------------------------------------------

Add a FAQ on how to fetch the record size for a given table.

> Add Avg record size in commit metadata
> --------------------------------------
>
>                 Key: HUDI-1570
>                 URL: https://issues.apache.org/jira/browse/HUDI-1570
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Utilities
>            Reporter: sivabalan narayanan
>            Assignee: Jonathan Vexler
>            Priority: Major
>             Fix For: 0.13.0
>
>         Attachments: Screen Shot 2021-01-31 at 7.05.55 PM.png
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Many users want to know the average record size of their data in Hudi
> storage. They need this so that they can deduce their bloom config values.
> As of now, there is no easy way for the end user to fetch the record size.
> Even with hudi-cli, we can work it out from the commit metadata, but only
> through a rough calculation. It would be better to store the avg record size
> (total bytes written / total records written) with the WriteStats, as well
> as in the commit metadata. In hudi-cli, we could then expose this info along
> with "commit showpartitions", or add another command such as
> "commit showmetadata".
> As of now, we can calculate the avg size as bytes written / records written
> from the commit metadata.
> !Screen Shot 2021-01-31 at 7.05.55 PM.png!
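
The rough calculation described in the issue (total bytes written / total records written) can be sketched in Python as below. This is only an illustration, not the implementation proposed in the ticket. It assumes the commit metadata has been loaded as JSON and that it holds per-partition write stats under "partitionToWriteStats", with "totalWriteBytes" and "numWrites" on each stat; those field names are assumptions here and should be checked against the commit files of your Hudi version. The sample values are made up.

```python
import json


def avg_record_size(commit_metadata: dict) -> float:
    """Return total bytes written / total records written for one commit.

    Assumed layout (verify against your own commit files):
    {"partitionToWriteStats": {partition: [{"totalWriteBytes": ..., "numWrites": ...}]}}
    """
    total_bytes = 0
    total_records = 0
    # Sum the write stats of every file written in every partition.
    for stats in commit_metadata.get("partitionToWriteStats", {}).values():
        for stat in stats:
            total_bytes += stat.get("totalWriteBytes", 0)
            total_records += stat.get("numWrites", 0)
    if total_records == 0:
        raise ValueError("no records written in this commit")
    return total_bytes / total_records


# Made-up sample standing in for the contents of a commit metadata file.
sample = json.loads("""
{
  "partitionToWriteStats": {
    "2021/01/30": [{"totalWriteBytes": 1000, "numWrites": 10}],
    "2021/01/31": [{"totalWriteBytes": 2000, "numWrites": 10},
                   {"totalWriteBytes": 3000, "numWrites": 20}]
  }
}
""")

print(avg_record_size(sample))
```

The figure is an average over the bytes written to storage, so it reflects the compressed on-disk size per record rather than the in-memory size.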