[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

Gopal V (JIRA) Wed, 07 Jan 2015 11:30:11 -0800

    [ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268071#comment-14268071 ]


Gopal V commented on HIVE-4639:
-------------------------------

Yes, we already capture that granularity across two pieces of state (effectively
a tri-state now: all_nulls, some_nulls, no_nulls).

We already encode all_nulls/no_values as "min=null/max=null". This patch adds the
"some_nulls/no_nulls" boolean on top of that, though the resulting tri-state is
encoded in a somewhat non-obvious way.
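To make the combination explicit, here is a minimal sketch of how a reader could
recover the tri-state from one index entry, assuming an integer column whose
min/max are nullable Longs. The names IndexEntryStats, NullState and nullState()
are hypothetical and are not ORC's actual statistics classes.

```java
// Hypothetical sketch: derive the all_nulls / some_nulls / no_nulls tri-state
// for one row-group index entry from min/max plus the proposed hasNull flag.
enum NullState { ALL_NULLS, SOME_NULLS, NO_NULLS }

final class IndexEntryStats {
    final Long min;         // null when the row group has no non-null values
    final Long max;         // null when the row group has no non-null values
    final boolean hasNull;  // the boolean proposed in HIVE-4639

    IndexEntryStats(Long min, Long max, boolean hasNull) {
        this.min = min;
        this.max = max;
        this.hasNull = hasNull;
    }

    NullState nullState() {
        // min=null/max=null already encodes "all nulls / no values".
        if (min == null && max == null) {
            return NullState.ALL_NULLS;
        }
        // Otherwise the boolean separates some_nulls from no_nulls.
        return hasNull ? NullState.SOME_NULLS : NullState.NO_NULLS;
    }
}
```

Neither field alone is enough: min/max only tell you whether any non-null value
exists, and the flag only tells you whether any null exists.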

Another thought: since we already have a whole long stream of IS_PRESENT bits, I
suspect storing the actual NULL count would be helpful if we need a heuristic for
row-level IS_NULL predicate evaluation on wide, de-normalized tables (i.e. read
the filter column first, then avoid creating large vector batches for the rest).
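A rough sketch of that heuristic, assuming the IS_PRESENT stream has already been
decoded into one boolean per row (false meaning NULL). NullCountHeuristic and the
maxSelectivity threshold are hypothetical illustrations, not an existing Hive or
ORC setting.

```java
// Hypothetical sketch: count NULLs from a decoded IS_PRESENT stream and use the
// count to decide whether to evaluate "col IS NULL" on the filter column first.
final class NullCountHeuristic {

    // Assumes present[i] == false means row i is NULL.
    static long nullCount(boolean[] present) {
        long nulls = 0;
        for (boolean p : present) {
            if (!p) {
                nulls++;
            }
        }
        return nulls;
    }

    // For "col IS NULL", the matching rows are exactly the NULLs. If that
    // fraction is small, read the filter column first and only build vector
    // batches for the remaining (wide) columns on the matching rows.
    // maxSelectivity is an assumed tuning knob.
    static boolean readFilterColumnFirst(long rowCount, long nullCount, double maxSelectivity) {
        if (rowCount == 0) {
            return false; // nothing to read either way
        }
        return (double) nullCount / rowCount <= maxSelectivity;
    }
}
```

A stored count would let the reader make this decision from the index alone,
without decoding IS_PRESENT first.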

> Add has null flag to ORC internal index
> ---------------------------------------
>
>                 Key: HIVE-4639
>                 URL: https://issues.apache.org/jira/browse/HIVE-4639
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-4639.1.patch
>
>
> It would enable more predicate pushdown if we added a flag to the index entry 
> recording if there were any null values in the column for the 10k rows.
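For context on how such a per-entry flag enables pushdown, here is a minimal
sketch of row-group selection for IS NULL / IS NOT NULL predicates, assuming one
index entry per row group (e.g. 10k rows). RowGroupSelector and its Entry type
are hypothetical stand-ins for the real index structures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick which row groups must be read for a null predicate,
// using one index entry per row group.
final class RowGroupSelector {

    static final class Entry {
        final boolean hasNull;     // any NULL in this row group (proposed flag)
        final boolean hasNonNull;  // any non-NULL value, i.e. min/max are set

        Entry(boolean hasNull, boolean hasNonNull) {
            this.hasNull = hasNull;
            this.hasNonNull = hasNonNull;
        }
    }

    // "col IS NULL": row groups without any NULL can be skipped.
    static List<Integer> rowGroupsForIsNull(List<Entry> index) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            if (index.get(i).hasNull) {
                selected.add(i);
            }
        }
        return selected;
    }

    // "col IS NOT NULL": row groups that are all NULL can be skipped.
    static List<Integer> rowGroupsForIsNotNull(List<Entry> index) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            if (index.get(i).hasNonNull) {
                selected.add(i);
            }
        }
        return selected;
    }
}
```

Without the flag, every row group with min/max set would have to be read for an
IS NULL predicate, since its nulls could not be ruled out.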



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
