[
https://issues.apache.org/jira/browse/HIVE-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815210#comment-13815210
]
Owen O'Malley commented on HIVE-5562:
-------------------------------------
+1
> Provide stripe level column statistics in ORC
> ---------------------------------------------
>
> Key: HIVE-5562
> URL: https://issues.apache.org/jira/browse/HIVE-5562
> Project: Hive
> Issue Type: New Feature
> Components: File Formats
> Affects Versions: 0.13.0
> Reporter: Prasanth J
> Assignee: Prasanth J
> Labels: orcfile
> Fix For: 0.13.0
>
> Attachments: HIVE-5562.1.patch.txt, HIVE-5562.2.patch.txt
>
>
> ORC maintains two levels of column statistics: index statistics (one set for
> every row group) and file-level column statistics for the entire file. It is
> useful to also have stripe-level column statistics, which sit between the
> index and file statistics. The reason to maintain stripe-level statistics is
> that the current input split computation logic is based on stripe boundaries.
> So if stripe-level statistics are available and a stripe doesn't satisfy a
> predicate condition, then that entire stripe (and hence its split) can be
> eliminated from split computation.
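
The elimination step described in the issue could look roughly like the sketch below. It is an illustration only: StripePruningSketch, StripeStats, mightMatch and selectStripes are hypothetical names and are not taken from the attached patches or from ORC's actual reader API. It assumes a single long column, per-stripe min/max values, and a closed range predicate lo <= col <= hi.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative, not ORC's or Hive's real classes.
class StripePruningSketch {

    /** Min/max of one long column within one stripe (stand-in for stripe-level statistics). */
    static final class StripeStats {
        final int stripeIndex;
        final long min;
        final long max;

        StripeStats(int stripeIndex, long min, long max) {
            this.stripeIndex = stripeIndex;
            this.min = min;
            this.max = max;
        }
    }

    /** True if the stripe's [min, max] range overlaps the predicate range [lo, hi]. */
    static boolean mightMatch(StripeStats stats, long lo, long hi) {
        return stats.max >= lo && stats.min <= hi;
    }

    /** Returns the indexes of stripes that cannot be ruled out by their statistics. */
    static List<Integer> selectStripes(List<StripeStats> stripes, long lo, long hi) {
        List<Integer> kept = new ArrayList<>();
        for (StripeStats stats : stripes) {
            if (mightMatch(stats, lo, hi)) {
                kept.add(stats.stripeIndex);
            }
            // Otherwise no row in the stripe can satisfy the predicate,
            // so no split needs to be generated for it.
        }
        return kept;
    }
}
```

Min/max statistics can only prove that a stripe holds no matching rows. A stripe that is kept may still turn out to contain none, so the predicate still has to be evaluated on the rows that are read.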
--
This message was sent by Atlassian JIRA
(v6.1#6144)