[
https://issues.apache.org/jira/browse/HIVE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696167#comment-13696167
]
Hudson commented on HIVE-4478:
------------------------------
Integrated in Hive-trunk-hadoop2 #263 (See
[https://builds.apache.org/job/Hive-trunk-hadoop2/263/])
HIVE-4478. In ORC remove ispresent stream from columns that contain no null
values in a stripe. (Prasanth Jayachandran via omalley) (Revision 1497912)
Result = FAILURE
omalley :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1497912
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
* /hive/trunk/ql/src/test/resources/orc-file-dump.out
> In ORC, add boolean noNulls flag to column stripe metadata
> ----------------------------------------------------------
>
> Key: HIVE-4478
> URL: https://issues.apache.org/jira/browse/HIVE-4478
> Project: Hive
> Issue Type: Sub-task
> Components: File Formats
> Affects Versions: 0.12.0
> Reporter: Eric Hanson
> Assignee: Prasanth J
> Fix For: 0.12.0
>
> Attachments: HIVE-4478.1.patch.txt, HIVE-4478.2.git.patch.txt
>
>
> Currently, the stripe metadata for ORC contains the min and max value for
> each column in the stripe. This will be used for stripe elimination. However,
> one additional piece of metadata per column per stripe, noNulls (true/false),
> is needed to help speed up vectorized query execution by as much as 30%.
> The vectorized QE code has a Boolean flag for each column vector called
> noNulls. If this is true, all the null-checking logic is skipped for that
> column for a VectorizedRowBatch when an operation is performed on that
> column. For simple filters and arithmetic expressions, this can save on the
> order of 30% of the time.
> Once this noNulls stripe metadata is available, the vectorized iterator
> (reader) for ORC can be updated to skip loading the isNull bitmap entirely
> and to set the noNulls flag for each column vector efficiently.
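
The fast path described in the issue can be illustrated with a minimal Java sketch. Every name in it (NoNullsSketch, LongColumn, load, filterGreaterThan) is hypothetical and chosen for illustration. It is not the Hive vectorization API or the ORC reader, and the stripeHasNoNulls parameter only stands in for the proposed per-stripe flag.

```java
import java.util.Arrays;

public class NoNullsSketch {

    /** Hypothetical, simplified stand-in for a column vector of longs. */
    static final class LongColumn {
        final long[] vector;    // column values for one batch
        final boolean[] isNull; // per-row null bitmap; unused when noNulls is true
        final boolean noNulls;  // true means no row in this batch is null

        LongColumn(long[] vector, boolean[] isNull, boolean noNulls) {
            this.vector = vector;
            this.isNull = isNull;
            this.noNulls = noNulls;
        }
    }

    /**
     * Builds a column for one batch. stripeHasNoNulls stands in for the
     * proposed stripe-level metadata (an assumption of this sketch): when it
     * is true, the null bitmap is never touched and noNulls is set directly.
     */
    static LongColumn load(long[] values, boolean[] decodedIsNull, boolean stripeHasNoNulls) {
        if (stripeHasNoNulls) {
            return new LongColumn(values, null, true);
        }
        return new LongColumn(values, decodedIsNull, false);
    }

    /** Writes the indices of rows with value > threshold into selected; returns the count. */
    static int filterGreaterThan(LongColumn col, long threshold, int[] selected) {
        int n = 0;
        if (col.noNulls) {
            // Fast path: no per-row null check at all.
            for (int i = 0; i < col.vector.length; i++) {
                if (col.vector[i] > threshold) {
                    selected[n++] = i;
                }
            }
        } else {
            // Slow path: every row needs a null check before the comparison.
            for (int i = 0; i < col.vector.length; i++) {
                if (!col.isNull[i] && col.vector[i] > threshold) {
                    selected[n++] = i;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long[] values = {3, 8, 12, 5};
        int[] selected = new int[values.length];

        LongColumn dense = load(values, null, true);
        int count = filterGreaterThan(dense, 4, selected);
        System.out.println(Arrays.toString(Arrays.copyOf(selected, count)));

        boolean[] nulls = {false, true, false, false};
        LongColumn sparse = load(values, nulls, false);
        count = filterGreaterThan(sparse, 4, selected);
        System.out.println(Arrays.toString(Arrays.copyOf(selected, count)));
    }
}
```

In this sketch the saving comes from two places: the loop body on the fast path has one branch fewer per row, and the loader never decodes a null bitmap for a stripe flagged as null-free.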