omalley opened a new pull request #925: URL: https://github.com/apache/orc/pull/925
### What changes were proposed in this pull request?

This patch adds a new tool that accounts for the total size of a set of ORC files. For files written by >= ORC 1.5, you'll get a column breakdown of the file. There are some virtual columns that are included:

- _index: the indexes that are used for skipping inside the stripe
- _data: the data in files written prior to ORC 1.5
- _stripe_footer: the stripe metadata
- _file_footer: the file metadata
- _padding: padding added to align stripes to HDFS block boundaries

I also added a new method on TypeDescription that gets the full field name, which is the inverse of findSubtype.

### Why are the changes needed?

The tool helps diagnose the compression of a set of files.

### How was this patch tested?

I added a test of the new TypeDescription.getFullFieldName. I ran the tool over some of the examples and some multiple-terabyte directories of production ORC files.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
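To illustrate what "the inverse of findSubtype" means for the new getFullFieldName method, here is a minimal sketch. It assumes a struct-only schema and uses a hypothetical `SchemaNode` class, which is not the ORC `TypeDescription` class and does not reproduce how the patch implements it. The method names mirror the ones mentioned above only to show the round trip between a dotted name and a node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a struct-only schema node (an assumption for
// illustration; this is NOT org.apache.orc.TypeDescription).
public class SchemaNode {
    private final String name;   // field name within the parent; unused for the root
    private SchemaNode parent;   // null for the root
    private final List<SchemaNode> children = new ArrayList<>();

    public SchemaNode(String name) {
        this.name = name;
    }

    // Attaches a child field and returns this node so calls can be chained.
    public SchemaNode addField(SchemaNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Forward direction: resolve a dotted name such as "address.city" to a node.
    public SchemaNode findSubtype(String dottedName) {
        SchemaNode current = this;
        for (String part : dottedName.split("\\.")) {
            SchemaNode next = null;
            for (SchemaNode child : current.children) {
                if (child.name.equals(part)) {
                    next = child;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalArgumentException("Unknown field: " + dottedName);
            }
            current = next;
        }
        return current;
    }

    // Inverse direction: walk up to the root collecting names, then join them.
    public String getFullFieldName() {
        List<String> parts = new ArrayList<>();
        for (SchemaNode node = this; node.parent != null; node = node.parent) {
            parts.add(node.name);
        }
        Collections.reverse(parts);
        return String.join(".", parts);
    }
}
```

In this sketch, calling `root.findSubtype(name).getFullFieldName()` gives back `name` for any valid dotted field name, which is the round-trip property the description refers to.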
