[ https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley reassigned HIVE-4478:
-----------------------------------
Assignee: Prasanth J (was: Owen O'Malley)
> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
> Key: HIVE-4478
> URL: https://issues.apache.org/jira/browse/HIVE-4478
> Project: Hive
> Issue Type: Sub-task
> Components: File Formats
> Reporter: Eric Hanson
> Assignee: Prasanth J
>
> Currently, the ORC stripe metadata contains the min and max values for
> each column in the stripe. These will be used for stripe elimination. However,
> one additional piece of metadata per column per stripe, noNulls
> (true/false), is needed to help speed up vectorized query execution by as
> much as 30%.
> The vectorized query execution (QE) code has a boolean flag called noNulls
> for each column vector. If it is true, all null-checking logic for that
> column is skipped when an operation is performed on the column in a
> VectorizedRowBatch. For simple filters and arithmetic expressions, this can
> save on the order of 30% of the execution time.
> Once this noNulls stripe metadata is available, the vectorized iterator
> (reader) for ORC can be updated to avoid the entire cost of loading the
> isNull bitmap and to set the noNulls flag for each column vector efficiently.
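
To make the proposal concrete, here is a minimal Java sketch of the two halves described above: a filter that skips per-row null checks when noNulls is true, and a reader that sets the flag straight from stripe metadata without decoding the null bitmap. All class and method names (NoNullsSketch, LongColumn, StripeColumnStats, readBatch, filterGreaterThan) are hypothetical stand-ins chosen for illustration. This is not the actual Hive or ORC API, and the real VectorizedRowBatch and column vector classes may be shaped differently.

```java
// Illustrative sketch only: every name here is hypothetical, not Hive/ORC API.
final class NoNullsSketch {

    /** Simplified stand-in for a vectorized column of longs. */
    static final class LongColumn {
        final long[] vector;
        final boolean[] isNull; // only consulted when noNulls is false
        boolean noNulls = true;

        LongColumn(int size) {
            vector = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Simplified stand-in for per-column, per-stripe metadata. */
    static final class StripeColumnStats {
        final long min;
        final long max;
        final boolean noNulls; // the flag this issue proposes to add

        StripeColumnStats(long min, long max, boolean noNulls) {
            this.min = min;
            this.max = max;
            this.noNulls = noNulls;
        }
    }

    /**
     * Reader side: fills a column from decoded values. When the stripe
     * metadata says noNulls, the present bitmap is never touched (it may
     * even be null); otherwise it is scanned to populate isNull.
     */
    static LongColumn readBatch(StripeColumnStats stats, long[] values, boolean[] present) {
        LongColumn col = new LongColumn(values.length);
        System.arraycopy(values, 0, col.vector, 0, values.length);
        if (!stats.noNulls) {
            for (int i = 0; i < values.length; i++) {
                col.isNull[i] = !present[i];
                if (col.isNull[i]) {
                    col.noNulls = false;
                }
            }
        }
        return col;
    }

    /**
     * Filter side: writes the indexes of rows with value > threshold into
     * selected and returns how many rows matched.
     */
    static int filterGreaterThan(LongColumn col, int size, long threshold, int[] selected) {
        int n = 0;
        if (col.noNulls) {
            // Fast path: no per-row null check inside the loop.
            for (int i = 0; i < size; i++) {
                if (col.vector[i] > threshold) {
                    selected[n++] = i;
                }
            }
        } else {
            // Slow path: every row pays for an isNull lookup.
            for (int i = 0; i < size; i++) {
                if (!col.isNull[i] && col.vector[i] > threshold) {
                    selected[n++] = i;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        StripeColumnStats stats = new StripeColumnStats(1, 9, true);
        LongColumn col = readBatch(stats, new long[] {1, 9, 5}, null);
        int[] selected = new int[3];
        int count = filterGreaterThan(col, 3, 4, selected);
        System.out.println(count + " rows selected");
    }
}
```

The sketch keeps the branch on noNulls outside the loop, so the fast path has no per-row null check at all. That is the kind of saving the description refers to, though the 30% figure is the reporter's estimate and is not something this toy example measures.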
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira