It's TPC-DS, not -H, but this is what I was using way back when to run perf tests over Phoenix and the query server while I was developing on it. The first project generates and loads the data via MapReduce, and the second wraps JMeter to run queries in parallel.
https://github.com/ndimiduk/tpcds-gen
https://github.com/ndimiduk/phoenix-performance

Probably there's dust and bit-rot to brush off of both projects, but maybe it'll help someone looking for a starting point? Apologies, but I haven't had time to see what the speakers have shared about their setup.

-n

On Friday, August 19, 2016, Andrew Purtell <apurt...@apache.org> wrote:

> > Maybe there's such a test harness that already exists for TPC?
>
> TPC provides tooling but it's all proprietary. The generated data can be
> kept separately (Druid does it at least -
> http://druid.io/blog/2014/03/17/benchmarking-druid.html).
>
> I'd say there would be a one-time setup: generation of data sets of
> various sizes, conversion to compressed CSV, and upload to somewhere
> public (S3?). Not strictly necessary, but it would save everyone a lot of
> time and hassle to not have to download the TPC data generators and munge
> the output every time. For this one could use the TPC tools.
>
> Then, the most sensible avenue I think would be implementation of new
> Phoenix integration tests that consume that data and run uniquely tweaked
> queries (yeah - every datastore vendor must do that with TPC). Phoenix can
> use hbase-it and get the cluster and chaos tooling such as it is for free,
> but the upsert/initialization/bulk load and query tooling would be all
> Phoenix based: the CSV loader, the JDBC driver.
>
> On Fri, Aug 19, 2016 at 5:31 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
> > On Fri, Aug 19, 2016 at 3:01 PM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> > > > I have a long interest in 'canned' loadings. Interesting ones are
> > > > hard to come by. If Phoenix ran any or a subset of TPCs, I'd like to
> > > > try it.
> > >
> > > Likewise
> > >
> > > > But I don't want to be the first to try it. I am not a Phoenix
> > > > expert.
> > >
> > > Same here, I'd just email dev@phoenix with a report that TPC query XYZ
> > > didn't work and that would be as far as I could get.
> >
> > I don't think the first phase would require Phoenix experience. It's
> > more around the automation for running each TPC benchmark so the process
> > is repeatable:
> > - pulling in the data
> > - scripting the jobs
> > - having a test harness they run inside
> > - identifying the queries that don't work (ideally you wouldn't stop at
> >   the first error)
> > - filing JIRAs for these
> >
> > The entire framework could be built and tested using standard JDBC APIs,
> > and then initially run using MySQL or some other RDBMS before trying it
> > with Phoenix. Maybe there's such a test harness that already exists for
> > TPC?
> >
> > Then I think the next phase would require more Phoenix & HBase
> > experience:
> > - tweaking queries where possible given any limitations in Phoenix
> > - adding missing syntax (or potentially using the calcite branch which
> >   supports more)
> > - tweaking Phoenix schema declarations to optimize
> > - tweaking Phoenix & HBase configs to optimize
> > - determining which secondary indexes to add (though I think there's an
> >   academic paper on this, I can't seem to find it)
> >
> > Both phases would require a significant amount of time and effort. Each
> > benchmark would likely require unique tweaks.
> >
> > Thanks,
> > James
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
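For anyone wanting a concrete starting point: the harness James describes (run every query, don't stop at the first error, collect failures for JIRAs) is only a few lines around any DB-API/JDBC connection. The sketch below is a hypothetical illustration, not code from either project above; it uses Python's stdlib sqlite3 as the stand-in database, per the suggestion to develop against an ordinary RDBMS before pointing it at Phoenix, and the table and query names are invented.

```python
import sqlite3

def run_suite(conn, queries):
    """Run each (name, sql) pair; return (passed_names, failures).

    Failures are recorded as (name, error_message) and execution
    continues, so one unsupported query doesn't mask the rest.
    """
    passed, failed = [], []
    for name, sql in queries:
        try:
            conn.execute(sql).fetchall()
            passed.append(name)
        except sqlite3.Error as e:
            # Don't stop at the first error: note it for a JIRA, move on.
            failed.append((name, str(e)))
    return passed, failed

# Stand-in schema/data; real runs would point at TPC-DS tables loaded
# via the CSV bulk loader.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (ss_item_sk INTEGER, ss_net_paid REAL)")
conn.execute("INSERT INTO store_sales VALUES (1, 9.99), (2, 19.99)")

# Invented mini-queries; q2 uses a function sqlite lacks, mimicking a
# benchmark query that hits a missing-syntax gap in the target engine.
queries = [
    ("q1", "SELECT ss_item_sk, SUM(ss_net_paid) FROM store_sales "
           "GROUP BY ss_item_sk"),
    ("q2", "SELECT MEDIAN(ss_net_paid) FROM store_sales"),
]

passed, failed = run_suite(conn, queries)
print("passed:", passed)                   # ['q1']
print("failed:", [n for n, _ in failed])   # ['q2']
```

Swapping the connection for a MySQL or Phoenix one leaves run_suite unchanged, which is the point of building the framework on the standard driver APIs.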