[ https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636677#comment-13636677 ]
Kevin Wilfong commented on HIVE-4340:
-------------------------------------

https://reviews.facebook.net/D10179

> ORC should provide raw data size
> --------------------------------
>
>                 Key: HIVE-4340
>                 URL: https://issues.apache.org/jira/browse/HIVE-4340
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 0.11.0
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>
> ORC's SerDe currently does nothing, and hence does not calculate a raw data
> size. WriterImpl, however, has enough information to provide one.
> WriterImpl should compute a raw data size for each row, aggregate the sizes
> per stripe, and record the total in the stripe information, as RC currently
> does in its key header. It should also give the FileSinkOperator access to
> the size per row.
> FileSinkOperator should be able to get the raw data size from either the
> SerDe or, when the RecordWriter can provide it, the RecordWriter.
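
For readers following the description, here is a minimal sketch of the proposed flow in Java. It is not the patch under review and does not use Hive's or ORC's real classes: `RawSizeProvider`, `StripeWriter`, `rawSizeOf`, and `resolveRawDataSize` are hypothetical names, and the sizing rules (4 bytes per Integer, 8 per Long, UTF-8 byte length per String, 0 for null) are assumptions made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawDataSizeSketch {

    /** Hypothetical capability: a writer that can report the raw size of the rows it wrote. */
    public interface RawSizeProvider {
        long getRawDataSize();
    }

    /** Simplified stand-in for a stripe-oriented writer; this is NOT ORC's WriterImpl. */
    public static class StripeWriter implements RawSizeProvider {
        private final int rowsPerStripe;
        private final List<Long> stripeRawSizes = new ArrayList<>();
        private long currentStripeRawSize;
        private int rowsInCurrentStripe;
        private long totalRawSize;

        public StripeWriter(int rowsPerStripe) {
            if (rowsPerStripe <= 0) {
                throw new IllegalArgumentException("rowsPerStripe must be positive");
            }
            this.rowsPerStripe = rowsPerStripe;
        }

        /** Adds a row and returns its raw size, so the caller can see the size per row. */
        public long addRow(Object[] row) {
            long size = rawSizeOf(row);
            currentStripeRawSize += size;
            totalRawSize += size;
            rowsInCurrentStripe++;
            if (rowsInCurrentStripe == rowsPerStripe) {
                flushStripe();
            }
            return size;
        }

        /** Flushes any partially filled stripe. */
        public void close() {
            if (rowsInCurrentStripe > 0) {
                flushStripe();
            }
        }

        /** Records the aggregated size for the finished stripe and resets the counters. */
        private void flushStripe() {
            stripeRawSizes.add(currentStripeRawSize);
            currentStripeRawSize = 0;
            rowsInCurrentStripe = 0;
        }

        public List<Long> getStripeRawSizes() {
            return stripeRawSizes;
        }

        @Override
        public long getRawDataSize() {
            return totalRawSize;
        }
    }

    /** Assumed sizing rules, for illustration only. */
    public static long rawSizeOf(Object[] row) {
        long size = 0;
        for (Object field : row) {
            if (field == null) {
                continue; // assumption: nulls contribute nothing
            } else if (field instanceof Integer) {
                size += 4;
            } else if (field instanceof Long) {
                size += 8;
            } else if (field instanceof String) {
                size += ((String) field).getBytes(StandardCharsets.UTF_8).length;
            } else {
                throw new IllegalArgumentException("unsupported field type: " + field.getClass());
            }
        }
        return size;
    }

    /** Prefers the writer's figure when the writer can provide one, else falls back to the SerDe's. */
    public static long resolveRawDataSize(Object recordWriter, long serDeRawDataSize) {
        if (recordWriter instanceof RawSizeProvider) {
            return ((RawSizeProvider) recordWriter).getRawDataSize();
        }
        return serDeRawDataSize;
    }
}
```

The last method mirrors the final sentence of the description: the consumer asks the writer first and uses the SerDe's number only when the writer has none to offer.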