It's TPC-DS, not -H, but this is what I was using way back when to run perf tests over Phoenix and the query server while I was developing on it. The first project generates and loads the data via MapReduce, and the second wraps JMeter to run queries in parallel.
https://github.com/ndimiduk/tpcds-gen
https://github.com/ndimiduk/phoenix-performance

Probably there's dust and bit-rot to brush off of both projects, but maybe it'll help someone looking for a starting point? Apologies, but I haven't had time to see what the speakers have shared about their setup.

-n

On Friday, August 19, 2016, Andrew Purtell <apurt...@apache.org> wrote:

> > Maybe there's such a test harness that already exists for TPC?
>
> TPC provides tooling but it's all proprietary. The generated data can be
> kept separately (Druid does it at least -
> http://druid.io/blog/2014/03/17/benchmarking-druid.html).
>
> I'd say there would be a one-time setup: generation of data sets of
> various sizes, conversion to compressed CSV, and upload to somewhere
> public (S3?). Not strictly necessary, but it would save everyone a lot of
> time and hassle to not have to download the TPC data generators and munge
> the output every time. For this one could use the TPC tools.
>
> Then, the most sensible avenue I think would be implementation of new
> Phoenix integration tests that consume that data and run uniquely tweaked
> queries (yeah - every datastore vendor must do that with TPC). Phoenix can
> use hbase-it and get the cluster and chaos tooling such as it is for free,
> but the upsert/initialization/bulk load and query tooling would be all
> Phoenix based: the CSV loader, the JDBC driver.
>
> On Fri, Aug 19, 2016 at 5:31 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
> > On Fri, Aug 19, 2016 at 3:01 PM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> > > > I have a long interest in 'canned' loadings. Interesting ones are
> > > > hard to come by. If Phoenix ran any or a subset of TPCs, I'd like to
> > > > try it.
> > >
> > > Likewise
> > >
> > > > But I don't want to be the first to try it. I am not a Phoenix
> > > > expert.
> > >
> > > Same here, I'd just email dev@phoenix with a report that TPC query XYZ
> > > didn't work and that would be as far as I could get.
> >
> > I don't think the first phase would require Phoenix experience. It's
> > more around the automation for running each TPC benchmark so the process
> > is repeatable:
> > - pulling in the data
> > - scripting the jobs
> > - having a test harness they run inside
> > - identifying the queries that don't work (ideally you wouldn't stop at
> >   the first error)
> > - filing JIRAs for these
> >
> > The entire framework could be built and tested using standard JDBC APIs,
> > and then initially run using MySQL or some other RDBMS before trying it
> > with Phoenix. Maybe there's such a test harness that already exists for
> > TPC?
> >
> > Then I think the next phase would require more Phoenix & HBase
> > experience:
> > - tweaking queries where possible given any limitations in Phoenix
> > - adding missing syntax (or potentially using the calcite branch which
> >   supports more)
> > - tweaking Phoenix schema declarations to optimize
> > - tweaking Phoenix & HBase configs to optimize
> > - determining which secondary indexes to add (though I think there's an
> >   academic paper on this, I can't seem to find it)
> >
> > Both phases would require a significant amount of time and effort. Each
> > benchmark would likely require unique tweaks.
> >
> > Thanks,
> > James
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
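For anyone wanting a concrete starting point: the harness James describes (run every query, don't stop at the first error, collect failures for JIRAs) is only a few lines around any DB-API/JDBC connection. The sketch below is a hypothetical illustration, not code from either project above; it uses Python's stdlib sqlite3 as the stand-in database, per the suggestion to develop against an ordinary RDBMS before pointing it at Phoenix, and the table and query names are invented.

```python
import sqlite3

def run_suite(conn, queries):
    """Run each (name, sql) pair; return (passed_names, failures).

    Failures are recorded as (name, error_message) and execution
    continues, so one unsupported query doesn't mask the rest.
    """
    passed, failed = [], []
    for name, sql in queries:
        try:
            conn.execute(sql).fetchall()
            passed.append(name)
        except sqlite3.Error as e:
            # Don't stop at the first error: note it for a JIRA, move on.
            failed.append((name, str(e)))
    return passed, failed

# Stand-in schema/data; real runs would point at TPC-DS tables loaded
# via the CSV bulk loader.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (ss_item_sk INTEGER, ss_net_paid REAL)")
conn.execute("INSERT INTO store_sales VALUES (1, 9.99), (2, 19.99)")

# Invented mini-queries; q2 uses a function sqlite lacks, mimicking a
# benchmark query that hits a missing-syntax gap in the target engine.
queries = [
    ("q1", "SELECT ss_item_sk, SUM(ss_net_paid) FROM store_sales "
           "GROUP BY ss_item_sk"),
    ("q2", "SELECT MEDIAN(ss_net_paid) FROM store_sales"),
]

passed, failed = run_suite(conn, queries)
print("passed:", passed)                   # ['q1']
print("failed:", [n for n, _ in failed])   # ['q2']
```

Swapping the connection for a MySQL or Phoenix one leaves run_suite unchanged, which is the point of building the framework on the standard driver APIs.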