Andrew Mains created HIVE-10545:
-----------------------------------
Summary: Implement predicate pushdown for queries over HBase snapshots
Key: HIVE-10545
URL: https://issues.apache.org/jira/browse/HIVE-10545
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Andrew Mains
Hive's HBase integration currently supports queries over HBase snapshots and
predicate pushdown for queries over HBase tables, but not predicate pushdown
for queries over HBase snapshots. This seems to be largely because the HBase
handler uses the `mapred` TableSnapshotInputFormat implementation, which
doesn't support passing a scan to the job, rather than the `mapreduce`
implementation, which does. Compare the `mapred` setup method, which accepts
only a column list:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.html#initTableSnapshotMapJob(java.lang.String,%20java.lang.String,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapred.JobConf,%20boolean,%20org.apache.hadoop.fs.Path
with the `mapreduce` one, which accepts a `Scan`:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path)
Hive should be able to switch to the `mapreduce` implementation (shimming
between the mapred and mapreduce APIs as needed) and thereby push predicates
down to the input format the same way HiveHBaseTableInputFormat does. This
should significantly improve performance for queries with range or equality
conditions on the row key, which are likely a common case.
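Pushing a row-key predicate down to a scan essentially means folding conditions such as `key = 'x'` or `key >= 'a' AND key < 'b'` into the start and stop rows of the `Scan` handed to the input format (in HBase 1.x, via `Scan.setStartRow` and `Scan.setStopRow`). Below is a minimal, dependency-free Java sketch of that folding, assuming HBase's conventions: inclusive start row, exclusive stop row, an empty array meaning unbounded, and unsigned lexicographic key order. The class, enum, and method names are hypothetical; this is not how Hive's handler actually implements it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hive's or HBase's actual code): folds predicates of
 * the form "rowkey <op> constant" into one scan range [startRow, stopRow),
 * following HBase Scan conventions: start row inclusive, stop row exclusive,
 * empty array = unbounded, keys ordered as unsigned lexicographic bytes.
 * Assumes comparison constants are non-empty.
 */
public class RowKeyRangeSketch {

    enum Op { EQ, GT, GE, LT, LE }

    static final byte[] UNBOUNDED = new byte[0];

    byte[] startRow = UNBOUNDED;
    byte[] stopRow = UNBOUNDED;

    /** Smallest key strictly greater than k: k followed by a single 0x00 byte. */
    static byte[] successor(byte[] k) {
        return Arrays.copyOf(k, k.length + 1); // the padded trailing byte is 0x00
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** ANDs one row-key predicate into the current range, narrowing it. */
    void and(Op op, byte[] value) {
        switch (op) {
            case EQ:
                raiseStart(value);
                lowerStop(successor(value));
                break;
            case GE:
                raiseStart(value);
                break;
            case GT:
                raiseStart(successor(value));
                break;
            case LT:
                lowerStop(value);
                break;
            case LE:
                lowerStop(successor(value));
                break;
        }
    }

    /** Keeps the larger of the current and candidate start rows. */
    private void raiseStart(byte[] candidate) {
        if (Arrays.compareUnsigned(candidate, startRow) > 0) {
            startRow = candidate;
        }
    }

    /** Keeps the smaller of the current and candidate stop rows (empty = unbounded). */
    private void lowerStop(byte[] candidate) {
        if (stopRow.length == 0 || Arrays.compareUnsigned(candidate, stopRow) < 0) {
            stopRow = candidate;
        }
    }

    /** True when the predicates contradict each other, so no row can match. */
    boolean isEmpty() {
        return stopRow.length != 0 && Arrays.compareUnsigned(startRow, stopRow) >= 0;
    }

    public static void main(String[] args) {
        // Equivalent of: WHERE key >= 'row100' AND key < 'row200'
        RowKeyRangeSketch range = new RowKeyRangeSketch();
        range.and(Op.GE, utf8("row100"));
        range.and(Op.LT, utf8("row200"));
        System.out.println(new String(range.startRow, StandardCharsets.UTF_8)
                + " .. " + new String(range.stopRow, StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `row100 .. row200`. A real implementation would also have to respect how Hive encodes the key column (string vs. binary storage), which this sketch ignores.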