Sorry for the delay on this end…

Each region server has 24 GB of RAM, with 12 physical cores plus 12 virtual cores.  

Would you please provide appropriate configurations for an analytic and data 
load benchmark?  

I am hearing HBase 1.2 and the latest Phoenix release.  

If you are using open source HBase, would you be willing to attach your key XML 
files (hbase-site.xml) so that we are testing Phoenix in the best light?
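For reference, a minimal hbase-site.xml sketch of the kind of settings that matter for a comparison like this. The property names are standard HBase ones, but every value below is an illustrative placeholder assumption, not a recommendation (and the RegionServer heap itself would be set separately in hbase-env.sh):

```xml
<!-- Hypothetical hbase-site.xml fragment; values are illustrative only. -->
<configuration>
  <!-- Fraction of the RegionServer heap reserved for the block cache (read-heavy analytics). -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>
  <!-- Fraction of the heap reserved for memstores (write-heavy data load). -->
  <property>
    <name>hbase.regionserver.global.memstore.size</name>
    <value>0.4</value>
  </property>
  <!-- RPC handler threads; often raised on many-core region servers. -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>60</value>
  </property>
</configuration>
```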

Mujtaba, would you mind sharing your schema and any indexes that are 
appropriate?  
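To frame the question, here is a rough Phoenix DDL sketch for one TPC-H table (LINEITEM). The column set follows the TPC-H spec, but the salt bucket count and the secondary index are illustrative guesses, not the actual benchmark schema:

```sql
-- Hypothetical Phoenix schema sketch for TPC-H LINEITEM; not the actual benchmark DDL.
CREATE TABLE IF NOT EXISTS LINEITEM (
    L_ORDERKEY      BIGINT NOT NULL,
    L_LINENUMBER    INTEGER NOT NULL,
    L_PARTKEY       BIGINT,
    L_SUPPKEY       BIGINT,
    L_QUANTITY      DECIMAL(15,2),
    L_EXTENDEDPRICE DECIMAL(15,2),
    L_DISCOUNT      DECIMAL(15,2),
    L_TAX           DECIMAL(15,2),
    L_RETURNFLAG    CHAR(1),
    L_LINESTATUS    CHAR(1),
    L_SHIPDATE      DATE,
    L_COMMITDATE    DATE,
    L_RECEIPTDATE   DATE,
    L_SHIPINSTRUCT  CHAR(25),
    L_SHIPMODE      CHAR(10),
    L_COMMENT       VARCHAR(44),
    CONSTRAINT PK PRIMARY KEY (L_ORDERKEY, L_LINENUMBER)
) SALT_BUCKETS = 12;  -- illustrative; spreads monotonically increasing keys across regions

-- Example covered index for date-range queries such as TPC-H Q1.
CREATE INDEX IF NOT EXISTS IDX_LINEITEM_SHIPDATE
    ON LINEITEM (L_SHIPDATE)
    INCLUDE (L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX);
```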

We have a few copies of the TPC-H data floating around, but I appreciate the link.

Thanks,
John Leach

> On Aug 19, 2016, at 4:03 PM, <la...@apache.org> <la...@apache.org> wrote:
> 
> I think Stack is trying to help and was just asking whether Mujtaba did 
> something special to load the data (and perhaps how long it took for us and 
> on how many nodes we did it). (If it loaded fine for us and there was nothing 
> special we had to do, I agree that there's no way (or need) to troubleshoot 
> vendor-specific benchmark setups.)
> I also agree running some subset of TPC-* would be a boon for Phoenix and 
> boost its adoption.
> 
> At the same time Phoenix is moving at an incredible speed. 4.7 is already old 
> (considering the fixes in 4.8), 4.4 is _ancient_. In 4.9 (or 5.0) we'll have 
> column and dense encoding, which would speed up this type of query.
> 
> Now, Amit never replied about how their HBase is actually configured (heap 
> sizes, etc). Phoenix runs inside of the region server, and hence their 
> configuration is extremely important.
> -- Lars
> 
>      From: James Taylor <jamestay...@apache.org>
> To: "dev@phoenix.apache.org" <dev@phoenix.apache.org> 
> Sent: Friday, August 19, 2016 1:19 PM
> Subject: Re: Issues while Running Apache Phoenix against TPC-H data
> 
> On Fri, Aug 19, 2016 at 11:37 AM, Stack <st...@duboce.net> wrote:
> 
>> On Thu, Aug 18, 2016 at 5:54 PM, James Taylor <jamestay...@apache.org>
>> wrote:
>> 
>>> The data loaded fine for us.
>> 
>> 
>> Mind describing what you did to get it to work and with what versions and
>> configurations and with what TPC loading and how much of the workload was
>> supported? Was it a one-off project?
>> 
> 
> Mujtaba already kindly responded to this (about a week back on this
> thread). He was able to load the data for the benchmark onto one of our
> internal clusters. He didn't run the benchmarks. Sorry, but I don't have
> any more specific knowledge, but generally I think:
> - it's difficult for an open source project to troubleshoot environmental
> issues, and it's even more difficult if a user is using a vendor-specific
> distro. IMHO, if you ask an open source project for help, you should be using
> the artifacts that they produce (preferably the latest release).
> - using a three-node cluster for HBase is not ideal for benchmarking.
> - doing full table scans over large HBase tables will be slow.
> 
> 
>> 
>> 
>> 
>>> If TPC is not representative of real
>>> workloads, I'm not sure there's value in spending a lot of time running
>>> them.
>> 
>> 
>> I suppose the project could just ignore TPC but I'd suggest that Phoenix
>> put up a page explaining why TPC does not apply if this the case; i.e. it
>> is not representative of Phoenix work loads. When people see that Phoenix
>> is for "OLTP and analytical queries", they probably think the TPC loadings
>> will just work given their standing in the industry. Putting up a disavowal
>> with explanation will save folks time trying to make it work and it can
>> also be cited when folks try to run TPC against Phoenix and they have a bad
>> experience, say bad performance.
>> 
> 
> I haven't run the TPC benchmarks, so I have no idea how they perform. I
> work at Salesforce where we use Phoenix (among many other technologies) to
> support various big data use cases. The workloads I'm familiar with aren't
> similar to the TPC benchmarks, so they're not relevant for my work. But if
> TPC benchmarks are relevant for your work, it would be great if you pursued
> this. Or maybe we can get this "Phoenix" person you mentioned to do
> it (smile).
> 
> 
>> 
>> On the other hand, even if an artificial loading, unless Phoenix has a
>> better means of verifying all works, I'd think it would be a useful test to
>> run before release or on a nightly basis verifying no regression in
>> performance or in utility.
>> 
> 
> I think the community would welcome enhancing our existing regression test
> suite. If you're up for leading that effort, that'd be great.
> 
> Thanks,
> James
> 
> 
