Hi Wenxing,
Based on the use case you describe, you may want to take a look at Trafodion or
EsgynDB (the commercial version of Trafodion).
http://trafodion.incubator.apache.org/
Trafodion runs a very mature SQL engine on top of HBase/Hive, backed by 20
years of IP that Hewlett-Packard contributed to open source 2 years ago.
It supports many different JOIN types (hash joins, nested joins, merge joins)
with optimized overflow-to-disk mechanisms on top of an optimized pipelined
architecture. It also offers full indexing capabilities and an optimized row
format that will make your HBase tables a lot faster than they are when using
one cell per column.
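For readers less familiar with the join strategies named above, here is a
minimal in-memory Python sketch of a nested-loop join, a hash join and a
sort-merge join. It only illustrates the algorithms; it does not show how
Trafodion implements them (no overflow to disk, no pipelining, no indexes), and
the "customers"/"orders" tables and their columns are hypothetical examples.

```python
from collections import defaultdict


def nested_loop_join(left, right, lkey, rkey):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[lkey] == r[rkey]]


def hash_join(build, probe, bkey, pkey):
    """Build a hash table on the (ideally smaller) build side, then probe it."""
    table = defaultdict(list)
    for b in build:
        table[b[bkey]].append(b)
    # Each probe row does one expected O(1) lookup instead of scanning `build`.
    return [(b, p) for p in probe for b in table.get(p[pkey], [])]


def merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then advance two cursors in step."""
    left = sorted(left, key=lambda row: row[lkey])
    right = sorted(right, key=lambda row: row[rkey])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lkey], right[j][rkey]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == lk:
                j_end += 1
            # ...and pair it with every left row carrying the same key.
            while i < len(left) and left[i][lkey] == lk:
                out.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def pairs(result):
    """Reduce joined rows to sorted (customer id, order id) pairs for comparison."""
    return sorted((c["id"], o["oid"]) for c, o in result)


# Hypothetical example data: join customers.id = orders.cust.
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
orders = [{"oid": 10, "cust": 1}, {"oid": 11, "cust": 2},
          {"oid": 12, "cust": 1}, {"oid": 13, "cust": 4}]

print(pairs(nested_loop_join(customers, orders, "id", "cust")))  # [(1, 10), (1, 12), (2, 11)]
print(pairs(hash_join(customers, orders, "id", "cust")))         # [(1, 10), (1, 12), (2, 11)]
print(pairs(merge_join(customers, orders, "id", "cust")))        # [(1, 10), (1, 12), (2, 11)]
```

All three produce the same rows; they differ in cost. A hash join works best
when one input is small enough to hold in memory, while a merge join suits
inputs that are already sorted (or too large to hash) on the join key. A real
engine picks between them per query and must also handle inputs that do not fit
in memory.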
From a SQL capability standpoint for analytics queries, Trafodion can run the
full set of 99 TPC-DS queries.
Hope this helps,
Eric




-----Original Message-----
From: wenxing zheng [mailto:wenxing.zh...@gmail.com] 
Sent: Friday, September 29, 2017 7:24 AM
To: dev@hbase.apache.org
Subject: Re: Performance issue in the Join query on the HBase tables

Thanks, Ted.

We haven't tried Phoenix yet. In the performance tests on the official Phoenix
site, I didn't find any report on join queries, so I'm not sure whether it would
be much better.

On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you looked at Phoenix?
>
> https://phoenix.apache.org/joins.html
>
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng <wenxing.zh...@gmail.com> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, each with millions of rows (the rows are synced
> > from a MySQL DB via the binlog). For each HBase table we have a
> > corresponding external table in Hive that uses the storage handler
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> > we always stay in sync with the production DB and get random access by key.
> >
> > Now our business needs to run some analysis on those tables using join
> > queries. What is the best practice for this?
> >
> > In my experiments, I found that with Spark SQL on HBase or Hive, the job
> > ran very slowly and saturated the network bandwidth. However, the same
> > query works very well with Hive SQL run directly against Hive tables backed
> > by HDFS files (i.e., after making a copy of the data in HDFS files).
> >
> > I'd appreciate any advice on what the problem might be here and how to
> > optimize the job.
> >
> > Regards, Wenxing
> >
>
