Re: Performance issue in the Join query on the HBase tables

wenxing zheng Fri, 29 Sep 2017 05:24:08 -0700

Thanks, Ted.

We haven't tried Phoenix yet. In the performance tests on the official
Phoenix site, I couldn't find any report on Join queries, so I'm not sure
whether it would be much better.


On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu <[email protected]> wrote:

> Have you looked at Phoenix ?
>
> https://phoenix.apache.org/joins.html
>
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng <[email protected]>
> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, each with millions of rows (the rows are
> > synced from a MySQL DB via the binlog). For each HBase table, we have a
> > corresponding external table in Hive, stored by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> > we can always stay in sync with the production DB and provide random
> > access by key.
> >
> > Now our business needs to run some analysis on those tables using Join
> > queries. What's the best practice for doing this?
> >
> > From my experiments, I found that with Spark SQL on HBase or Hive, the
> > job ran very slowly and saturated the network bandwidth. But Hive SQL
> > works very well when run directly against HDFS files (after making a
> > copy of the data to HDFS files).
> >
> > I would appreciate any advice on what the problem might be here and how
> > to optimize the job.
> >
> > Regards, Wenxing
> >
>
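
For reference, the workaround mentioned in the original question (copying the HBase data to HDFS files and running the Join with Hive SQL against that copy) could be scripted roughly as below. This is only a sketch: the table names and the `_snapshot` suffix are hypothetical, since the thread does not name the actual tables, and the choice of ORC as the file format is an assumption. The script only builds the HiveQL statements; running them (for example through beeline) is left out.

```python
# Sketch: generate HiveQL that copies HBase-backed Hive external tables
# into native Hive tables stored as files on HDFS, so that Join queries
# scan HDFS files instead of reading through the HBase storage handler.

# Hypothetical names; the thread does not say what the tables are called.
HBASE_BACKED_TABLES = ["orders_hbase", "customers_hbase", "items_hbase"]


def snapshot_statement(source_table, target_table):
    """Return a CREATE TABLE ... AS SELECT statement that copies one table.

    ORC is an assumed format; any native Hive file format would do.
    """
    return (
        f"CREATE TABLE {target_table} STORED AS ORC AS "
        f"SELECT * FROM {source_table}"
    )


def build_snapshot_plan(tables, suffix="_snapshot"):
    """Return one copy statement per HBase-backed table."""
    return [snapshot_statement(t, t + suffix) for t in tables]


if __name__ == "__main__":
    for stmt in build_snapshot_plan(HBASE_BACKED_TABLES):
        print(stmt + ";")
```

The Join queries would then be written against the `_snapshot` tables. The trade-off is that the copy is only as fresh as the last snapshot, unlike the HBase-backed tables that stay in sync with the production DB.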
