[
https://issues.apache.org/jira/browse/HIVE-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009938#comment-14009938
]
Craig Condit commented on HIVE-1643:
------------------------------------
The patch as-is has a few issues...
First, at least in Hive 0.12, it interacts badly when multiple tables are
joined. I've seen cases where it was clear that Hive was attempting to push
down predicates for the wrong table. The HBase storage handler assumes that
any predicate it receives refers to a valid column, so when the column is
looked up and not found, the result is a NullPointerException. I suspect this
must be a bug in the query optimizer, but have not been able to determine
exactly where.
Second, the fallback behavior when a complex query predicate is passed down is
to punt on the entire expression, even if it could be partially evaluated (for
example rowkey >= 'A' AND rowkey < 'B' AND ([complex bit])). This leads to
unexpected full table scans in HBase. At the very least, the code should
handle the rowkey parts whenever possible. The fallback is easy to trigger by
accident: a single term using an operator that the storage handler has no case
for is enough.
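To illustrate the partial evaluation suggested above, here is a minimal sketch in Python of splitting a conjunction into the rowkey terms that can be pushed down and a residual that Hive would still have to evaluate itself. The classes Cmp, And and Opaque and the function split_predicate are hypothetical names for this example only; they are not Hive or HBase APIs, and the real handler works on Hive's own expression nodes.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical mini expression tree for illustration; not Hive classes.
@dataclass(frozen=True)
class Cmp:
    column: str
    op: str       # e.g. ">=", "<", "=", "LIKE"
    value: str

@dataclass(frozen=True)
class And:
    children: Tuple["Expr", ...]

@dataclass(frozen=True)
class Opaque:
    """Anything the handler has no case for (OR, NOT, function calls, ...)."""
    text: str

Expr = Union[Cmp, And, Opaque]

# Assumption: only simple comparisons on the row key are worth pushing down.
PUSHABLE_OPS = {"=", "<", "<=", ">", ">="}

def flatten_conjuncts(expr: Expr) -> List[Expr]:
    """Flatten nested ANDs into a flat list of top-level terms."""
    if isinstance(expr, And):
        terms: List[Expr] = []
        for child in expr.children:
            terms.extend(flatten_conjuncts(child))
        return terms
    return [expr]

def split_predicate(expr: Expr, key_column: str = "rowkey"):
    """Return (pushed, residual) instead of giving up on the whole expression.

    Only top-level AND terms are separated; anything under OR/NOT stays
    in the residual, so pushing the first list can never drop valid rows.
    """
    pushed, residual = [], []
    for term in flatten_conjuncts(expr):
        if (isinstance(term, Cmp)
                and term.column == key_column
                and term.op in PUSHABLE_OPS):
            pushed.append(term)
        else:
            residual.append(term)
    return pushed, residual
```

With this split, rowkey >= 'A' AND rowkey < 'B' AND ([complex bit]) would yield the two rowkey comparisons as pushed terms and the complex bit as the residual, so an unsupported operator in one term no longer forces a full table scan.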
Third, even when predicate pushdown works, it often causes secondary issues on
the HBase side. When no rowkey expression exists, evaluating the filters can
drive CPU usage on HBase very high, and can even cause HBase RPC timeouts if so
many rows are filtered out that no data is returned quickly enough. It would be
nice to be able to control (somehow) which expressions the code tries to push
down.
At our location, we didn't even try to port the patch to Hive 0.13 when we
upgraded, mainly due to issues #2 and #3. Fortunately, CTEs have allowed us to
ensure that only rowkey predicates get pushed down like so:
{noformat}
with a as (select ... from hbase_table where rowkey >= 'start' and rowkey <
'end') select * from a where ...;
{noformat}
It might be more useful for Hive-HBase integration to focus on ensuring that
rowkey predicates are always pushed down (except for things like OR/NOT
expressions, etc.) rather than trying to push down other types of expressions.
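As a rough sketch of what pushing down only rowkey predicates amounts to, the following Python function folds a list of rowkey comparisons into a single scan range. It relies on HBase scans treating the start row as inclusive and the stop row as exclusive by default. The function name scan_bounds and the (op, value) tuple shape are made up for this example, and it assumes row keys are byte strings whose unsigned byte order matches the intended comparison, which may not hold for binary-encoded numeric keys.

```python
from typing import Iterable, Optional, Tuple

def scan_bounds(
    terms: Iterable[Tuple[str, bytes]],
) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Fold ANDed rowkey comparisons into (start_row, stop_row).

    start_row is inclusive, stop_row is exclusive, None means unbounded.
    Appending a zero byte gives the smallest key strictly greater than
    `value`, which turns ">" into an inclusive start and "<=" into an
    exclusive stop.
    """
    start: Optional[bytes] = None
    stop: Optional[bytes] = None
    for op, value in terms:
        if op == ">=":
            lo, hi = value, None
        elif op == ">":
            lo, hi = value + b"\x00", None
        elif op == "<":
            lo, hi = None, value
        elif op == "<=":
            lo, hi = None, value + b"\x00"
        elif op == "=":
            lo, hi = value, value + b"\x00"
        else:
            # OR/NOT and other operators are deliberately not handled here.
            raise ValueError(f"unsupported rowkey operator: {op}")
        # Intersect with the range accumulated so far.
        if lo is not None and (start is None or lo > start):
            start = lo
        if hi is not None and (stop is None or hi < stop):
            stop = hi
    return start, stop
```

For the CTE above, the two rowkey terms would map directly to the start and stop rows of one scan, and every other predicate would be left for Hive to apply to the rows that come back.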
> support range scans and non-key columns in HBase filter pushdown
> ----------------------------------------------------------------
>
> Key: HIVE-1643
> URL: https://issues.apache.org/jira/browse/HIVE-1643
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.9.0
> Reporter: John Sichi
> Assignee: bharath v
> Labels: patch
> Attachments: HIVE-1643.patch, Hive-1643.2.patch, hbase_handler.patch
>
>
> HIVE-1226 added support for WHERE rowkey=3. We would like to support WHERE
> rowkey BETWEEN 10 and 20, as well as predicates on non-rowkeys (plus
> conjunctions etc). Non-rowkey conditions can't be used to filter out entire
> ranges, but they can be used to push the per-row filter processing as far
> down as possible.