cxzl25 opened a new pull request, #1816:
URL: https://github.com/apache/orc/pull/1816

   ### What changes were proposed in this pull request?
   Add a summary option (`-s`, `--summary`) to the `sizes` command that reports the total number of files, the total file size, and the total number of rows.
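   The aggregation behind the summary can be pictured as a simple fold over per-file statistics. The Java sketch below is hypothetical: `SizesSummarySketch`, `FileStats`, `Summary`, and `summarize` are illustrative names, not the actual orc-tools code, and the sample files and numbers are made up. In a real tool, the per-file size would plausibly come from the file system (for example Hadoop's `FileStatus.getLen()`), and the row count from the ORC `Reader.getNumberOfRows()`.

   ```java
   import java.util.List;

   // Hypothetical sketch of the totals printed by `sizes -s`;
   // names and structure are illustrative, not the orc-tools implementation.
   class SizesSummarySketch {

     // Per-file facts the tool would need: on-disk length and row count.
     record FileStats(String path, long sizeInBytes, long numberOfRows) {}

     // The three totals shown in the summary output.
     record Summary(int totalFiles, long totalSizes, long totalRows) {}

     // Sum the per-file sizes and row counts into one overall summary.
     static Summary summarize(List<FileStats> files) {
       long sizes = 0;
       long rows = 0;
       for (FileStats f : files) {
         sizes += f.sizeInBytes();
         rows += f.numberOfRows();
       }
       return new Summary(files.size(), sizes, rows);
     }

     public static void main(String[] args) {
       // Made-up example input: two files with illustrative sizes and row counts.
       List<FileStats> files = List.of(
           new FileStats("part-0.orc", 1_000L, 10L),
           new FileStats("part-1.orc", 2_500L, 30L));
       Summary s = summarize(files);
       System.out.println("Total Files: " + s.totalFiles());
       System.out.println("Total Sizes: " + s.totalSizes());
       System.out.println("Total Rows: " + s.totalRows());
     }
   }
   ```

   For the made-up input above, this prints `Total Files: 2`, `Total Sizes: 3500`, and `Total Rows: 40`. Summing `long` values keeps multi-gigabyte totals such as the one in the sample output below from overflowing.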
   
   ### Why are the changes needed?
   When computing the size of each field, the command reports only the field's percentage of the total and its average size per row; it does not report the overall totals (number of files, total bytes, and total rows).
   
   ### How was this patch tested?
   Tested locally:
   
   ```bash
   java -jar orc-tools-2.1.0-SNAPSHOT-uber.jar sizes -h
   usage: sizes
    -h,--help              Print help message
    -i,--ignoreExtension   Ignore ORC file extension
    -s,--summary           Summarize the number of files, file size, and
                           number of file lines
   ```
   
   ```bash
   java -jar orc-tools-2.1.0-SNAPSHOT-uber.jar sizes -s
   ```
   
   ```
   Total Files: 5
   Total Sizes: 4803687270
   Total Rows: 39820045
   Percent  Bytes/Row  Name
     26.41  31.86
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.