[ 
https://issues.apache.org/jira/browse/HIVE-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811604#comment-13811604
 ] 

Prasanth J commented on HIVE-5632:
----------------------------------

[~ehans] Sorry about my comment about in-memory skips above. For every 10,000 
rows (or the number of rows configured via orc.row.index.stride), ORC creates 
the disk ranges (byte ranges) that need to be read. Only the disk ranges that 
satisfy the min/max conditions are read. 
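
To illustrate the idea, here is a minimal Python sketch, not the actual ORC 
reader code. The RowGroup class, the may_match and disk_ranges_to_read 
functions, the byte offsets and the merging of adjacent ranges are all 
hypothetical, chosen only to show how min/max values per row group can decide 
which byte ranges are read for a predicate of the form lo <= x <= hi.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for one row-index entry of a single column.

    One entry would cover orc.row.index.stride rows (10,000 by default).
    """
    min_value: int
    max_value: int
    offset: int   # first byte of the group's data (made-up layout)
    length: int   # number of bytes the group occupies


def may_match(group: RowGroup, lo: int, hi: int) -> bool:
    """True if the group's [min, max] interval overlaps [lo, hi]."""
    return group.max_value >= lo and group.min_value <= hi


def disk_ranges_to_read(groups: List[RowGroup], lo: int, hi: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges for groups that may satisfy lo <= x <= hi.

    Groups whose min/max rule out the predicate contribute no range.
    Adjacent ranges are merged here as an illustrative assumption.
    """
    ranges: List[Tuple[int, int]] = []
    for g in groups:
        if not may_match(g, lo, hi):
            continue  # min/max excludes this group, so its bytes are skipped
        start, end = g.offset, g.offset + g.length
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extend the previous range
        else:
            ranges.append((start, end))
    return ranges


# Four made-up row groups of 4096 bytes each, with disjoint value ranges.
groups = [
    RowGroup(min_value=0, max_value=99, offset=0, length=4096),
    RowGroup(min_value=100, max_value=199, offset=4096, length=4096),
    RowGroup(min_value=200, max_value=299, offset=8192, length=4096),
    RowGroup(min_value=300, max_value=399, offset=12288, length=4096),
]

# Predicate 150 <= x <= 250 can only match the second and third groups.
print(disk_ranges_to_read(groups, 150, 250))  # [(4096, 12288)]
```

In this sketch the first and last row groups are never read, because their 
min/max values cannot satisfy the predicate.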

> Eliminate splits based on SARGs using stripe statistics in ORC
> --------------------------------------------------------------
>
>                 Key: HIVE-5632
>                 URL: https://issues.apache.org/jira/browse/HIVE-5632
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>         Attachments: HIVE-5632.1.patch.txt, HIVE-5632.2.patch.txt, 
> HIVE-5632.3.patch.txt, orc_split_elim.orc
>
>
> HIVE-5562 provides stripe level statistics in ORC. Stripe level statistics 
> combined with predicate pushdown in ORC (HIVE-4246) can be used to eliminate 
> the stripes (and thereby splits) that don't satisfy the predicate condition. 
> This can greatly reduce unnecessary reads.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
