Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
I found this: http://hortonworks.com/blog/hbase-via-hive-part-1/

From a performance perspective, there are things Hive can do today (i.e., not dependent on data types) to take advantage of HBase. There’s also the possibility of an HBase-aware Hive to make use of HBase tables as intermediate storage location (HIVE-3565 https://issues.apache.org/jira/browse/HIVE-3565), facilitating map-side joins against dimension tables loaded into HBase. Hive could make use of HBase’s natural indexed structure (HIVE-3634 https://issues.apache.org/jira/browse/HIVE-3634, HIVE-3727 https://issues.apache.org/jira/browse/HIVE-3727), potentially saving huge scans. Currently, the user doesn’t have (any?) control over the scans which are executed. Configuration on a per-job, or at least per-table basis should be enabled (HIVE-1233 https://issues.apache.org/jira/browse/HIVE-1233). That would enable an HBase-savvy user to provide Hive with hints regarding how it should interact with HBase. Support for simple split sampling of HBase tables (HIVE-3399 https://issues.apache.org/jira/browse/HIVE-3399) could also be easily done because HBase manages table partitions already.

On Thu, Jul 24, 2014 at 2:03 PM, Yang tedd...@gmail.com wrote:

If I do a join of a table based on a text file and a table based on HBase, and say the latter is very large, is Hive smart enough to use the HBase table's index to do the join, instead of implementing this as a regular MapReduce job, where each table is scanned fully, bucketed on join keys, and the matching items are then found in the reducer?

thanks
Yang
RE: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
I don't think the HBase-Hive integration is smart enough to use the index that exists in HBase, though that may depend on the version you are using. In my experience, there is a lot of room for improvement in the HBase-Hive integration, especially in pushing logic down into the HBase engine.

Yong

From: tedd...@gmail.com
Date: Thu, 24 Jul 2014 14:03:42 -0700
Subject: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
To: user@hive.apache.org

If I do a join of a table based on a text file and a table based on HBase, and say the latter is very large, is Hive smart enough to use the HBase table's index to do the join, instead of implementing this as a regular MapReduce job, where each table is scanned fully, bucketed on join keys, and the matching items are then found in the reducer?

thanks
Yang
Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
The following article about using Klout's Brickhouse library to access an HBase table as a map through its key might be useful.

http://brickhouseconfessions.wordpress.com/2013/08/06/squash-the-long-tail-with-brickhouses-hbase-udfs/

On Jul 24, 2014 8:56 PM, Andrew Mains andrew.ma...@kontagent.com wrote:

Agreed. As far as I can tell, there isn't any support for this currently. This JIRA (https://issues.apache.org/jira/browse/HIVE-3727, referenced in http://hortonworks.com/blog/hbase-via-hive-part-1/) seems relevant, but there's no recent work on it, and I imagine the patch included is out of date with trunk. Perhaps it's worth resurrecting?

Andrew

On 7/24/14, 4:45 PM, java8964 wrote:

I don't think the HBase-Hive integration is smart enough to use the index that exists in HBase, though that may depend on the version you are using. In my experience, there is a lot of room for improvement in the HBase-Hive integration, especially in pushing logic down into the HBase engine.

Yong

--
From: tedd...@gmail.com
Date: Thu, 24 Jul 2014 14:03:42 -0700
Subject: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?
To: user@hive.apache.org

If I do a join of a table based on a text file and a table based on HBase, and say the latter is very large, is Hive smart enough to use the HBase table's index to do the join, instead of implementing this as a regular MapReduce job, where each table is scanned fully, bucketed on join keys, and the matching items are then found in the reducer?

thanks
Yang
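
For readers following the thread, the two join strategies contrasted in the original question can be sketched in plain Python. This is only an illustration of the idea, not how Hive or the Brickhouse UDFs are implemented: a dict stands in for the HBase table keyed by row key, and the function names (reduce_side_join, lookup_join) and sample rows are hypothetical.

```python
from collections import defaultdict

def reduce_side_join(small_rows, big_rows):
    """Full-scan join: read both inputs entirely, bucket by key, match per bucket."""
    buckets = defaultdict(lambda: ([], []))
    for key, value in small_rows:
        buckets[key][0].append(value)
    for key, value in big_rows:       # every row of the large table is read
        buckets[key][1].append(value)
    joined = []
    for key, (left, right) in buckets.items():
        for l in left:
            for r in right:
                joined.append((key, l, r))
    return joined

def lookup_join(small_rows, big_table_get):
    """Keyed-lookup join: scan only the small input, fetch matches by row key."""
    joined = []
    for key, value in small_rows:
        match = big_table_get(key)    # stand-in for a point read by HBase row key
        if match is not None:
            joined.append((key, value, match))
    return joined

# Hypothetical sample data: a small text-file table and a larger keyed table.
text_rows = [("k1", "a"), ("k3", "c")]
hbase_like = {"k1": "x", "k2": "y", "k3": "z"}

full_scan = reduce_side_join(text_rows, hbase_like.items())
by_key = lookup_join(text_rows, hbase_like.get)
```

Both functions produce the same joined rows here, but the second touches the large table only once per row of the small one, which is the saving the question is asking about. Whether that is actually faster depends on how many lookups are needed relative to the size of the large table.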