[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268071#comment-14268071 ]
Gopal V commented on HIVE-4639:
-------------------------------

Yes, we have that granularity locked up in two states (as a tri-state, now - all_nulls, some_nulls, no_nulls).

We actually have all_nulls/no_values encoded as "min=null/max=null". This patch is the "some_nulls/no_nulls" boolean on top of that - though that information is in a somewhat non-obvious detail.

Another thought occurs: since we already have a whole long stream of IS_PRESENT, I suspect storing the actual NULL count would be somewhat helpful if we need a heuristic for IS_NULL row-level predicate evaluation on wide de-normalized tables (i.e., read the filter column first and then avoid creating large vector batches for the rest).

> Add has null flag to ORC internal index
> ---------------------------------------
>
>                 Key: HIVE-4639
>                 URL: https://issues.apache.org/jira/browse/HIVE-4639
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-4639.1.patch
>
>
> It would enable more predicate pushdown if we added a flag to the index entry
> recording if there were any null values in the column for the 10k rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
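
For readers following the tri-state described in the comment above, here is a minimal Java sketch of how the three states could be derived from an index entry and used for pushdown. It assumes only what the comment states: an entry with min=null/max=null means all_nulls (or no values), and the new boolean separates some_nulls from no_nulls. The names `NullStateSketch`, `NullState`, `classify`, `canSkipForIsNull` and `canSkipForIsNotNull` are hypothetical and are not taken from the ORC code or the attached patch.

```java
// Illustrative sketch only; not the ORC reader API.
final class NullStateSketch {

    // The tri-state named in the comment.
    enum NullState { ALL_NULLS, SOME_NULLS, NO_NULLS }

    // Derive the state from one index entry's statistics.
    // Assumption: min == null && max == null encodes all_nulls/no_values,
    // and hasNull is the boolean flag proposed in this issue.
    static NullState classify(Object min, Object max, boolean hasNull) {
        if (min == null && max == null) {
            return NullState.ALL_NULLS;
        }
        return hasNull ? NullState.SOME_NULLS : NullState.NO_NULLS;
    }

    // "col IS NULL" cannot match any row when the entry has no nulls.
    static boolean canSkipForIsNull(NullState state) {
        return state == NullState.NO_NULLS;
    }

    // "col IS NOT NULL" cannot match any row when every value is null.
    static boolean canSkipForIsNotNull(NullState state) {
        return state == NullState.ALL_NULLS;
    }

    public static void main(String[] args) {
        NullState state = classify(1L, 42L, false);
        System.out.println(state + ", skip for IS NULL: " + canSkipForIsNull(state));
    }
}
```

In the some_nulls case neither predicate can skip the entry. That is where a stored NULL count, as suggested in the comment, could feed a selectivity heuristic.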