Have you looked at Phoenix? It supports SQL joins on HBase tables:

https://phoenix.apache.org/joins.html

On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng <wenxing.zh...@gmail.com>
wrote:

> Dear all,
>
> I have 3 big HBase tables, each with millions of rows (the rows are synced
> from a MySQL DB via the binlog). For each HBase table we have a
> corresponding external table in Hive, stored by
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is that
> we always stay in sync with the production DB and get random access by
> key.
>
> Now our business needs to run some analysis on those tables with join
> queries. What's the best practice for doing this?
>
> In my experiments, I found that with Spark SQL on HBase or Hive, the job
> ran very slowly and saturated the network bandwidth. But Hive SQL works
> very well when run directly against HDFS files (after making a copy of
> the data to HDFS files).
>
> I would appreciate any advice on what the problem might be here and on
> how to optimize the job.
> Regards, Wenxing
>
