Thanks, Ted. We haven't tried Phoenix yet. In the performance tests on the official Phoenix site, I couldn't find any report on join queries, so I'm not sure whether it would be much better.
On Fri, Sep 29, 2017 at 8:01 PM, Ted Yu <[email protected]> wrote:
> Have you looked at Phoenix?
>
> https://phoenix.apache.org/joins.html
>
> On Fri, Sep 29, 2017 at 3:25 AM, wenxing zheng <[email protected]>
> wrote:
>
> > Dear all,
> >
> > I have 3 big HBase tables, each with millions of rows (the rows are
> > synced from a MySQL DB via the binlog). For each HBase table, we have a
> > corresponding external table in Hive, stored by
> > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'. The advantage is
> > that we always stay in sync with the production DB and get random
> > access by key.
> >
> > Now our business needs to run some analysis on those tables with join
> > queries. What's the best practice for this?
> >
> > In my experiments, Spark SQL on HBase or Hive ran very slowly and
> > saturated the network bandwidth. But Hive SQL works very well when run
> > directly against Hive tables stored as HDFS files (i.e., a copy of the
> > data made to HDFS).
> >
> > I'd appreciate any advice on what the problem might be here and how to
> > optimize the job.
> >
> > Regards,
> > Wenxing
