[ https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694879#comment-13694879 ]
Owen O'Malley commented on HIVE-4478:
-------------------------------------
Prasanth, I don't think we should add the suppress information to the
BitFieldWriter. Could you instead change StreamFactory.createStream to return
an OutStream instead of a PositionedOutputStream? Then you can hold onto the
present stream's OutWriter in the TreeWriter class, and TreeWriter.writeStream
can suppress the underlying OutStream directly.
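
To make the suggestion concrete, here is a rough Java sketch of the shape I have in mind. It is not the actual Hive code: SketchOutStream, SketchTreeWriter, flushStripe and the one-byte-per-row present encoding are all hypothetical stand-ins chosen to keep the example short. The only point is that the column writer keeps a direct reference to its present stream and suppresses it itself when no nulls were seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified stand-in for an ORC output stream; not Hive's OutStream.
class SketchOutStream {
    private final String name;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean suppressed = false;

    SketchOutStream(String name) {
        this.name = name;
    }

    void write(int b) {
        buffer.write(b);
    }

    // Mark this stream so that it is left out when the stripe is flushed.
    void suppress() {
        suppressed = true;
    }

    boolean isSuppressed() {
        return suppressed;
    }

    String getName() {
        return name;
    }
}

// Hypothetical column writer that holds a direct reference to its present stream.
class SketchTreeWriter {
    private final SketchOutStream present;
    private final SketchOutStream data;
    private boolean foundNulls = false;

    SketchTreeWriter(String column) {
        present = new SketchOutStream(column + ".present");
        data = new SketchOutStream(column + ".data");
    }

    // One byte per row keeps the sketch short; this is an assumption of the
    // example, not how the real present stream is encoded.
    void write(Integer value) {
        if (value == null) {
            foundNulls = true;
            present.write(0);
        } else {
            present.write(1);
            data.write(value); // only the low byte is kept, which is enough here
        }
    }

    // Returns the names of the streams that would be written for this stripe.
    List<String> flushStripe() {
        if (!foundNulls) {
            // No nulls were seen, so the present stream carries no information.
            present.suppress();
        }
        List<String> written = new ArrayList<>();
        for (SketchOutStream s : new SketchOutStream[] {present, data}) {
            if (!s.isSuppressed()) {
                written.add(s.getName());
            }
        }
        return written;
    }
}
```

With this shape the bit-level writer needs no knowledge of suppression; the decision lives in the one place that already knows whether the column contained nulls.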
> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
> Key: HIVE-4478
> URL: https://issues.apache.org/jira/browse/HIVE-4478
> Project: Hive
> Issue Type: Sub-task
> Components: File Formats
> Reporter: Eric Hanson
> Assignee: Prasanth J
> Attachments: HIVE-4478.1.patch.txt
>
>
> Currently, the stripe metadata for ORC contains the min and max value for
> each column in the stripe. This will be used for stripe elimination. However,
> an additional bit of metadata for each column for each stripe, noNulls
> (true/false), is needed to help speed up vectorized query execution by as
> much as 30%.
> The vectorized QE code has a Boolean flag for each column vector called
> noNulls. If this is true, all the null-checking logic is skipped for that
> column for a VectorizedRowBatch when an operation is performed on that
> column. For simple filters and arithmetic expressions, this can save on the
> order of 30% of the time.
> Once this noNulls stripe metadata is available, the vectorized iterator
> (reader) for ORC can be updated to avoid the expense of loading the isNull
> bitmap and to set the noNulls flag efficiently for each column vector.
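
As an illustration of the description above, the following Java sketch shows how a per-column noNulls flag can let an operation skip null checks, and how a reader could set that flag from stripe metadata without touching the isNull bitmap. It is a minimal sketch under stated assumptions: SketchLongColumn, SketchVectorOps, addScalar, load and the stripeNoNulls parameter are hypothetical names for this example, not the Hive vectorization or ORC reader APIs, and it makes no claim about the size of the speedup.

```java
// Hypothetical stand-in for a vectorized long column; not Hive's actual class.
class SketchLongColumn {
    final long[] vector;
    final boolean[] isNull;
    boolean noNulls = true;

    SketchLongColumn(int size) {
        vector = new long[size];
        isNull = new boolean[size];
    }
}

class SketchVectorOps {
    // Adds a scalar to the first n entries. When noNulls is set, the loop
    // contains no per-row null check at all.
    static void addScalar(SketchLongColumn col, long scalar, int n) {
        if (col.noNulls) {
            for (int i = 0; i < n; i++) {
                col.vector[i] += scalar;
            }
        } else {
            for (int i = 0; i < n; i++) {
                if (!col.isNull[i]) {
                    col.vector[i] += scalar;
                }
            }
        }
    }

    // Fills a column from stripe data. stripeNoNulls stands for the proposed
    // per-stripe flag (an assumption of this sketch); when it is true the
    // bitmap argument is never read and may be null.
    static void load(SketchLongColumn col, long[] values,
                     boolean[] stripeIsNull, boolean stripeNoNulls) {
        int n = values.length;
        System.arraycopy(values, 0, col.vector, 0, n);
        if (stripeNoNulls) {
            col.noNulls = true;
        } else {
            System.arraycopy(stripeIsNull, 0, col.isNull, 0, n);
            col.noNulls = false;
        }
    }
}
```

In this sketch the saving comes from two places: the reader never decodes the null bitmap for a stripe flagged as null-free, and every later operation on that column takes the branch-free loop.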