[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-4340:
-----------------------------
    Attachment:     (was: HIVE-4340-java-only.4.patch.txt)

> ORC should provide raw data size
> --------------------------------
>
>                 Key: HIVE-4340
>                 URL: https://issues.apache.org/jira/browse/HIVE-4340
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>    Affects Versions: 0.11.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt,
>                      HIVE-4340.3.patch.txt
>
>
> ORC's SerDe currently does nothing, and hence does not calculate a raw data
> size. WriterImpl, however, has enough information to provide one.
> WriterImpl should compute a raw data size for each row, aggregate the sizes
> per stripe, and record the total in the stripe information, as RC currently
> does in its key header. It should also allow the FileSinkOperator access to
> the size per row.
> FileSinkOperator should be able to get the raw data size from either the
> SerDe or the RecordWriter when the RecordWriter can provide it.
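
The bookkeeping described in the issue (a size per row, summed per stripe, with the per-row figure exposed to the caller) could look roughly like the sketch below. This is not the attached patch and not Hive or ORC code: RawDataSizeSketch, RawDataSizeProvider, StripeSizeTracker, and the per-type byte counts in estimateSize are all hypothetical names and assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types exist in Hive or ORC. */
public class RawDataSizeSketch {

    /** Hypothetical hook a writer could expose so a caller can read sizes. */
    public interface RawDataSizeProvider {
        long getLastRowRawDataSize();
        long getTotalRawDataSize();
    }

    /** Accumulates per-row sizes into per-stripe totals. */
    public static class StripeSizeTracker implements RawDataSizeProvider {
        private final List<Long> stripeSizes = new ArrayList<>();
        private long currentStripeSize = 0;
        private long flushedTotal = 0;
        private long lastRowSize = 0;

        /** Computes the raw size of one row and adds it to the open stripe. */
        public void addRow(Object[] row) {
            long size = 0;
            for (Object field : row) {
                size += estimateSize(field);
            }
            lastRowSize = size;
            currentStripeSize += size;
        }

        /** Closes the open stripe and records its aggregated size. */
        public void flushStripe() {
            stripeSizes.add(currentStripeSize);
            flushedTotal += currentStripeSize;
            currentStripeSize = 0;
        }

        /** Sizes of the stripes flushed so far, in write order. */
        public List<Long> getStripeSizes() {
            return new ArrayList<>(stripeSizes);
        }

        @Override
        public long getLastRowRawDataSize() {
            return lastRowSize;
        }

        @Override
        public long getTotalRawDataSize() {
            return flushedTotal + currentStripeSize;
        }

        /** Assumed uncompressed sizes per type; real accounting may differ. */
        static long estimateSize(Object field) {
            if (field == null) {
                return 0;
            } else if (field instanceof Boolean) {
                return 1;
            } else if (field instanceof Integer) {
                return 4;
            } else if (field instanceof Long || field instanceof Double) {
                return 8;
            } else if (field instanceof String) {
                return ((String) field).getBytes(StandardCharsets.UTF_8).length;
            }
            // Fallback for types this sketch does not model.
            return field.toString().length();
        }
    }
}
```

In this sketch, a caller in the role of FileSinkOperator would check whether the writer implements the hypothetical RawDataSizeProvider and, if so, read getLastRowRawDataSize() after each write instead of asking the SerDe.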