[ https://issues.apache.org/jira/browse/HIVE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731681#comment-13731681 ]
Gopal V commented on HIVE-4246:
-------------------------------
The IN() implementation currently does a linear search over the predicate leaves.
Since we are only checking the range and not actual membership, it would be
better to store the IN() values as a sorted list and perform a binary search.
In most cases this enables a fast path that compares against the list's min/max.
In the corner case, the binary search places the block's min and max at the same
insertion point and matches no element; no IN() value falls inside the block's
range, so we can skip the block.
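
To make the suggestion concrete, here is a minimal sketch of the range check, assuming the IN() values are plain longs. The class name InListRangeCheck and the method canSkip are hypothetical and are not taken from the patch. It only shows the idea of comparing the insertion points of a block's min and max in the sorted list.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not from the HIVE-4246 patch): decides whether a block
 * whose column statistics are [min, max] can be skipped for an IN() predicate.
 */
public class InListRangeCheck {
    // Sorted copy of the IN() literals; sorted once, searched many times.
    private final long[] sortedLiterals;

    public InListRangeCheck(long[] literals) {
        this.sortedLiterals = literals.clone();
        Arrays.sort(this.sortedLiterals);
    }

    /** Returns true if no IN() literal lies within [min, max]. */
    public boolean canSkip(long min, long max) {
        int n = sortedLiterals.length;
        if (n == 0) {
            return true; // an empty IN() list matches nothing
        }
        // Fast path: the block lies entirely outside the list's own min/max.
        if (max < sortedLiterals[0] || min > sortedLiterals[n - 1]) {
            return true;
        }
        // Arrays.binarySearch returns the index if the key is found,
        // otherwise -(insertionPoint) - 1.
        int lo = Arrays.binarySearch(sortedLiterals, min);
        if (lo >= 0) {
            return false; // min is itself an IN() literal
        }
        int hi = Arrays.binarySearch(sortedLiterals, max);
        if (hi >= 0) {
            return false; // max is itself an IN() literal
        }
        // Same insertion point for min and max: no literal sits between them.
        return lo == hi;
    }
}
```

With literals {10, 20, 30}, a block with statistics [11, 19] can be skipped, while a block with [15, 25] cannot, because 20 falls inside its range. Each check costs two binary searches instead of a scan over every literal.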
> Implement predicate pushdown for ORC
> ------------------------------------
>
> Key: HIVE-4246
> URL: https://issues.apache.org/jira/browse/HIVE-4246
> Project: Hive
> Issue Type: New Feature
> Components: File Formats
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: HIVE-4246.D11415.1.patch
>
>
> By using the push down predicates from the table scan operator, ORC can skip
> over 10,000 rows at a time that won't satisfy the predicate. This will help a
> lot, especially if the file is sorted by the column that is used in the
> predicate.