[ https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-4478:
--------------------------------

    Affects Version/s: 0.12.0
    
> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
>                 Key: HIVE-4478
>                 URL: https://issues.apache.org/jira/browse/HIVE-4478
>             Project: Hive
>          Issue Type: Sub-task
>          Components: File Formats
>    Affects Versions: 0.12.0
>            Reporter: Eric Hanson
>            Assignee: Prasanth J
>         Attachments: HIVE-4478.1.patch.txt, HIVE-4478.2.git.patch.txt
>
>
> Currently, the ORC stripe metadata contains the min and max values for
> each column in the stripe; these will be used for stripe elimination.
> However, one more piece of per-column, per-stripe metadata is needed: a
> boolean noNulls flag (true/false). It can help speed up vectorized query
> execution by as much as 30%.
> The vectorized query execution (QE) code keeps a boolean flag called
> noNulls for each column vector. If the flag is true, all null-checking
> logic is skipped for that column in a VectorizedRowBatch whenever an
> operation is performed on it. For simple filters and arithmetic
> expressions, this can save on the order of 30% of the execution time.
> Once this noNulls stripe metadata is available, the vectorized iterator
> (reader) for ORC can be updated to skip loading the isNull bitmap
> entirely when a column has no nulls, and to set the noNulls flag for each
> column vector efficiently.
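
The description above can be illustrated with a minimal sketch. It is not
Hive's actual code: LongColumn, StripeColumnStats, addScalar and fillBatch
are hypothetical names made up for this example, standing in for a column
vector, the proposed per-stripe metadata, a vectorized arithmetic
expression, and the ORC reader. It only assumes what the description
states: a noNulls flag per column vector, an isNull bitmap, and a
per-stripe noNulls value.

```java
import java.util.Arrays;

public class NoNullsSketch {

    /** Simplified stand-in for a column vector of longs (hypothetical, not Hive's class). */
    public static final class LongColumn {
        public final long[] vector;
        public final boolean[] isNull;
        public boolean noNulls = true;

        public LongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Hypothetical per-stripe column metadata: min, max, plus the proposed flag. */
    public static final class StripeColumnStats {
        public final long min;
        public final long max;
        public final boolean noNulls;

        public StripeColumnStats(long min, long max, boolean noNulls) {
            this.min = min;
            this.max = max;
            this.noNulls = noNulls;
        }
    }

    /** Reader side: when the stripe reports noNulls, the null bitmap is never touched. */
    public static void fillBatch(StripeColumnStats stats, long[] values,
                                 boolean[] nullBitmap, LongColumn col) {
        System.arraycopy(values, 0, col.vector, 0, values.length);
        if (stats.noNulls) {
            col.noNulls = true;   // isNull entries are ignored while this is true
        } else {
            System.arraycopy(nullBitmap, 0, col.isNull, 0, values.length);
            col.noNulls = false;
        }
    }

    /** Expression side: out = in + scalar, with a null-check-free fast path. */
    public static void addScalar(LongColumn in, long scalar, LongColumn out, int n) {
        out.noNulls = in.noNulls;
        if (in.noNulls) {
            // Fast path: a tight loop with no per-row branch on isNull.
            for (int i = 0; i < n; i++) {
                out.vector[i] = in.vector[i] + scalar;
            }
        } else {
            // Slow path: propagate nulls and only compute non-null rows.
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.vector[i] = in.vector[i] + scalar;
                }
            }
        }
    }

    public static void main(String[] args) {
        LongColumn in = new LongColumn(3);
        fillBatch(new StripeColumnStats(1, 3, true), new long[] {1, 2, 3}, null, in);
        LongColumn out = new LongColumn(3);
        addScalar(in, 10, out, 3);
        System.out.println(Arrays.toString(out.vector) + " noNulls=" + out.noNulls);
    }
}
```

In this sketch the saving comes from two places: fillBatch never reads a
null bitmap for a stripe flagged noNulls, and addScalar runs a loop with no
per-row null branch. How large the saving is in practice depends on the
real reader and expression code.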

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
