It looks like you guys already have most of the TPC-H queries running, based on Enis's talk in Ireland this year. Very cool.
(Slide 20: Phoenix can execute most of the TPC-H queries!)

Regards,
John Leach

> On Aug 19, 2016, at 8:28 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> It's TPC-DS, not -H, but this is what I was using way back when to run
> perf tests over Phoenix and the query server while I was developing on
> it. The first project generates and loads the data via MapReduce, and
> the second tool wraps JMeter to run queries in parallel.
>
> https://github.com/ndimiduk/tpcds-gen
> https://github.com/ndimiduk/phoenix-performance
>
> There's probably dust and bit-rot to brush off both projects, but maybe
> they'll help someone looking for a starting point?
>
> Apologies, but I haven't had time to see what the speakers have shared
> about their setup.
>
> -n
>
> On Friday, August 19, 2016, Andrew Purtell <apurt...@apache.org> wrote:
>
>>> Maybe there's such a test harness that already exists for TPC?
>>
>> TPC provides tooling, but it's all proprietary. The generated data can
>> be kept separately (Druid does this, at least:
>> http://druid.io/blog/2014/03/17/benchmarking-druid.html).
>>
>> I'd say there would be a one-time setup: generation of data sets of
>> various sizes, conversion to compressed CSV, and upload to somewhere
>> public (S3?). Not strictly necessary, but it would save everyone a lot
>> of time and hassle not to have to download the TPC data generators and
>> munge the output every time. For this one could use the TPC tools.
>>
>> Then, the most sensible avenue, I think, would be implementing new
>> Phoenix integration tests that consume that data and run uniquely
>> tweaked queries (yeah - every datastore vendor must do that with TPC).
>> Phoenix can use hbase-it and get the cluster and chaos tooling, such as
>> it is, for free, but the upsert/initialization/bulk load and query
>> tooling would be all Phoenix-based: the CSV loader, the JDBC driver.
>>
>> On Fri, Aug 19, 2016 at 5:31 PM, James Taylor <jamestay...@apache.org>
>> wrote:
>>
>>> On Fri, Aug 19, 2016 at 3:01 PM, Andrew Purtell <apurt...@apache.org>
>>> wrote:
>>>
>>>>> I have a long interest in 'canned' loadings. Interesting ones are
>>>>> hard to come by. If Phoenix ran any or a subset of the TPCs, I'd
>>>>> like to try it.
>>>>
>>>> Likewise
>>>>
>>>>> But I don't want to be the first to try it. I am not a Phoenix
>>>>> expert.
>>>>
>>>> Same here. I'd just email dev@phoenix with a report that TPC query
>>>> XYZ didn't work, and that would be as far as I could get.
>>>
>>> I don't think the first phase would require Phoenix experience. It's
>>> more about the automation for running each TPC benchmark so the
>>> process is repeatable:
>>> - pulling in the data
>>> - scripting the jobs
>>> - having a test harness they run inside
>>> - identifying the queries that don't work (ideally you wouldn't stop
>>>   at the first error)
>>> - filing JIRAs for these
>>>
>>> The entire framework could be built and tested using standard JDBC
>>> APIs, and then initially run against MySQL or some other RDBMS before
>>> trying it with Phoenix. Maybe there's such a test harness that already
>>> exists for TPC?
>>>
>>> Then I think the next phase would require more Phoenix & HBase
>>> experience:
>>> - tweaking queries where possible given any limitations in Phoenix
>>> - adding missing syntax (or potentially using the calcite branch,
>>>   which supports more)
>>> - tweaking Phoenix schema declarations to optimize
>>> - tweaking Phoenix & HBase configs to optimize
>>> - determining which secondary indexes to add (I think there's an
>>>   academic paper on this, but I can't seem to find it)
>>>
>>> Both phases would require a significant amount of time and effort.
>>> Each benchmark would likely require unique tweaks.
>>>
>>> Thanks,
>>> James
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
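[Editor's note: the core of the first-phase harness James outlines - run every query, keep going past failures, and collect the ones that need JIRAs - can be sketched with standard DB-API calls. sqlite3 stands in here for a JDBC connection to MySQL or Phoenix; the query names and tables are hypothetical, and one query deliberately uses a function sqlite lacks so the error path is exercised:]

```python
# Sketch of the "don't stop at the first error" harness. sqlite3 is a
# stand-in for JDBC (MySQL first, then Phoenix); the query list is a
# made-up stand-in for the real TPC query set.
import sqlite3

def run_suite(conn, queries):
    """Run every (name, sql) pair, recording failures instead of aborting."""
    passed, failed = [], []
    for name, sql in queries:
        try:
            conn.execute(sql).fetchall()
            passed.append(name)
        except sqlite3.Error as exc:
            # In the real harness, this is where you'd file a JIRA
            # for the unsupported query instead of just recording it.
            failed.append((name, str(exc)))
    return passed, failed

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (o_totalprice REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (25.5,)])
    queries = [
        ("q1", "SELECT SUM(o_totalprice) FROM orders"),
        ("q2", "SELECT MEDIAN(o_totalprice) FROM orders"),  # unsupported here
        ("q3", "SELECT COUNT(*) FROM orders"),
    ]
    passed, failed = run_suite(conn, queries)
    print(f"{len(passed)} passed, {len(failed)} failed")
    for name, err in failed:
        print(f"  {name}: {err}")
```

[Because only the connection setup is driver-specific, the same loop runs unchanged against MySQL or Phoenix once pointed at a JDBC-backed connection, which is the portability James's plan relies on.]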