Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Yang
I found a partial answer in this post:
http://hortonworks.com/blog/hbase-via-hive-part-1/



From a performance perspective, there are things Hive can do today (i.e.,
not dependent on data types) to take advantage of HBase. There’s also
the possibility of an HBase-aware Hive to make use of HBase tables as
intermediate storage location (HIVE-3565
https://issues.apache.org/jira/browse/HIVE-3565), facilitating map-side
joins against dimension tables loaded into HBase. Hive could make use of
HBase’s natural indexed structure (HIVE-3634
https://issues.apache.org/jira/browse/HIVE-3634, HIVE-3727
https://issues.apache.org/jira/browse/HIVE-3727), potentially saving huge
scans. Currently, the user doesn’t have (any?) control over the scans which
are executed. Configuration on a per-job, or at least per-table basis
should be enabled (HIVE-1233
https://issues.apache.org/jira/browse/HIVE-1233). That would enable
an HBase-savvy user to provide Hive with hints regarding how it should
interact with HBase. Support for simple split sampling of HBase tables (
HIVE-3399 https://issues.apache.org/jira/browse/HIVE-3399) could also be
easily done because HBase manages table partitions already.
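
To make the "potentially saving huge scans" point concrete, here is a minimal sketch of the difference between filtering after a full scan and seeking directly on sorted row keys. It is a toy model in plain Python, not Hive or HBase code: the `rows` list, the `user0000`-style keys, and both function names are made up for illustration. The only property it borrows from HBase is that rows are stored sorted by row key.

```python
import bisect

# Toy stand-in for an HBase table (hypothetical data): rows kept sorted
# by row key, which is the property a range scan relies on.
rows = sorted(
    ((f"user{i:04d}", {"score": i}) for i in range(1000)),
    key=lambda row: row[0],
)
keys = [key for key, _ in rows]


def full_scan(start, stop):
    """Read every row and filter afterwards; returns (matches, rows_read)."""
    matches = [(k, v) for k, v in rows if start <= k < stop]
    return matches, len(rows)


def range_scan(start, stop):
    """Seek to the start key and read up to the stop key (exclusive)."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return rows[lo:hi], hi - lo


full_result, full_read = full_scan("user0100", "user0110")
range_result, range_read = range_scan("user0100", "user0110")
print(full_read, range_read)
```

Both calls return the same rows; only the number of rows read differs. Whether Hive actually turns a given predicate into a bounded scan depends on the version and on the JIRAs listed above.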


On Thu, Jul 24, 2014 at 2:03 PM, Yang tedd...@gmail.com wrote:

 if I do a join of a table based on txt file and a table based on HBase,
 and say the latter is very large, is HIVE smart enough to utilize the HBase
 table's index to do the join, instead of implementing this as a regular map
 reduce job, where each table is scanned fully, bucketed on join keys, and
 then the matching items found out through the reducer?


 thanks
 Yang
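
For readers following the thread, the two strategies contrasted in the question can be sketched as follows. This is a conceptual model in plain Python under stated assumptions, not what Hive executes: `small_table`, `hbase_table`, and both function names are hypothetical, a dict stands in for lookups by HBase row key, and the join key is assumed to be the row key.

```python
from collections import defaultdict

# Hypothetical data: a small text-file table and a large HBase-backed
# table. The dict models lookup by row key; it is not an HBase client.
small_table = [("k3", "a"), ("k7", "b"), ("missing", "c")]
hbase_table = {f"k{i}": f"v{i}" for i in range(10_000)}


def reduce_side_join(small, big):
    """Scan both inputs fully, bucket rows by join key, match per bucket."""
    buckets = defaultdict(lambda: ([], []))
    big_rows_touched = 0
    for key, val in small:
        buckets[key][0].append(val)
    for key, val in big.items():
        buckets[key][1].append(val)
        big_rows_touched += 1
    joined = [
        (key, left, right)
        for key, (lefts, rights) in buckets.items()
        for left in lefts
        for right in rights
    ]
    return sorted(joined), big_rows_touched


def lookup_join(small, big):
    """Scan only the small input and fetch each match by row key."""
    joined = []
    big_rows_touched = 0
    for key, val in small:
        big_rows_touched += 1  # one point get per small-table row
        match = big.get(key)
        if match is not None:
            joined.append((key, val, match))
    return sorted(joined), big_rows_touched


scan_result, scan_touched = reduce_side_join(small_table, hbase_table)
get_result, get_touched = lookup_join(small_table, hbase_table)
print(scan_touched, get_touched)
```

The lookup variant only pays off when the text-file side is small relative to the HBase table, since each of its rows costs one get.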



RE: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread java8964
I don't think the HBase-Hive integration is smart enough to utilize the 
indexes that exist in HBase, though it may depend on the version you are 
using.
In my experience, there is a lot of room for improvement in the HBase-Hive 
integration, especially in pushing logic down into the HBase engine.
Yong

From: tedd...@gmail.com
Date: Thu, 24 Jul 2014 14:03:42 -0700
Subject: does the HBase-Hive integration support using HBase index (primary key 
or secondary index) in the JOIN implementation?
To: user@hive.apache.org

if I do a join of a table based on txt file and a table based on HBase, and say 
the latter is very large, is HIVE smart enough to utilize the HBase table's 
index to do the join, instead of implementing this as a regular map reduce job, 
where each table is scanned fully, bucketed on join keys, and then the matching 
items found out through the reducer?



thanks
Yang

Re: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread Juan Martin Pampliega
The following article about using Klout's Brickhouse library to access an
HBase table as a map through its key might be useful.
http://brickhouseconfessions.wordpress.com/2013/08/06/squash-the-long-tail-with-brickhouses-hbase-udfs/
On Jul 24, 2014 8:56 PM, Andrew Mains andrew.ma...@kontagent.com wrote:

  Agreed--as far as I can tell there isn't any support for this currently.

 This JIRA (https://issues.apache.org/jira/browse/HIVE-3727, referenced in
 http://hortonworks.com/blog/hbase-via-hive-part-1/) seems relevant, but
 there's no recent work on it, and I imagine the patch included is out of
 date with trunk.  Perhaps it's worth resurrecting?

 Andrew

 On 7/24/14, 4:45 PM, java8964 wrote:

 I don't think the HBase-Hive integration is smart enough to utilize the
 indexes that exist in HBase, though it may depend on the version you are
 using.

  In my experience, there is a lot of room for improvement in the
 HBase-Hive integration, especially in pushing logic down into the HBase engine.

  Yong

  --
 From: tedd...@gmail.com
 Date: Thu, 24 Jul 2014 14:03:42 -0700
 Subject: does the HBase-Hive integration support using HBase index
 (primary key or secondary index) in the JOIN implementation?
 To: user@hive.apache.org

 if I do a join of a table based on txt file and a table based on HBase,
 and say the latter is very large, is HIVE smart enough to utilize the HBase
 table's index to do the join, instead of implementing this as a regular map
 reduce job, where each table is scanned fully, bucketed on join keys, and
 then the matching items found out through the reducer?


  thanks
 Yang