Re: Issues while Running Apache Phoenix against TPC-H data

James Taylor Fri, 19 Aug 2016 17:32:10 -0700

On Fri, Aug 19, 2016 at 3:01 PM, Andrew Purtell <apurt...@apache.org> wrote:


> > I have a long interest in 'canned' loadings. Interesting ones are hard to
> > come by. If Phoenix ran any or a subset of TPCs, I'd like to try it.
>
> Likewise
>
> > But I don't want to be the first to try it. I am not a Phoenix expert.
>
> Same here, I'd just email dev@phoenix with a report that TPC query XYZ
> didn't work and that would be as far as I could get.
>
I don't think the first phase would require Phoenix experience. It's more
about the automation for running each TPC benchmark so the process is
repeatable:
- pulling in the data
- scripting the jobs
- having a test harness they run inside
- identifying the queries that don't work (ideally you wouldn't stop at the
first error)
- filing JIRAs for these

The entire framework could be built and tested using standard JDBC APIs,
and then initially run using MySQL or some other RDBMS before trying it
with Phoenix. Maybe there's such a test harness that already exists for TPC?
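
To make the first phase a bit more concrete, here is a minimal sketch of what such a harness might look like using only the standard java.sql APIs. Nothing here is existing Phoenix or TPC tooling: the class name TpcQueryHarness, the QueryExecutor interface, and the sample query are hypothetical names made up for illustration. The JDBC URL is taken from the command line, and the matching driver (MySQL, Phoenix, or another) is assumed to be on the classpath. The one point it tries to show is that every query gets attempted and failures are recorded instead of aborting the run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical harness sketch; not part of Phoenix or any TPC kit.
final class TpcQueryHarness {

    /** Runs one SQL query and returns the number of rows it produced. */
    @FunctionalInterface
    public interface QueryExecutor {
        long execute(String sql) throws SQLException;
    }

    /** Outcome of a single query: a row count on success, an error message on failure. */
    public static final class Result {
        public final boolean ok;
        public final long rows;
        public final String error;

        private Result(boolean ok, long rows, String error) {
            this.ok = ok;
            this.rows = rows;
            this.error = error;
        }

        static Result success(long rows) { return new Result(true, rows, null); }
        static Result failure(String error) { return new Result(false, -1, error); }
    }

    /** Executor backed by a plain JDBC connection; it only counts the returned rows. */
    public static QueryExecutor jdbcExecutor(Connection conn) {
        return sql -> {
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                long n = 0;
                while (rs.next()) {
                    n++;
                }
                return n;
            }
        };
    }

    /** Attempts every query in order and records failures instead of stopping at the first one. */
    public static Map<String, Result> runAll(Map<String, String> queries, QueryExecutor executor) {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, String> q : queries.entrySet()) {
            try {
                results.put(q.getKey(), Result.success(executor.execute(q.getValue())));
            } catch (SQLException | RuntimeException e) {
                results.put(q.getKey(), Result.failure(e.toString()));
            }
        }
        return results;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: java TpcQueryHarness <jdbc-url>");
            return;
        }
        // Placeholder query; a real run would load the benchmark's own query set.
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("row-count", "SELECT COUNT(*) FROM lineitem");

        try (Connection conn = DriverManager.getConnection(args[0])) {
            runAll(queries, jdbcExecutor(conn)).forEach((name, r) ->
                System.out.println(name + ": " + (r.ok ? r.rows + " rows" : "FAILED " + r.error)));
        }
    }
}
```

The list of FAILED entries from a run like this would be the starting point for the JIRAs. Keeping the executor behind an interface is only a convenience so the loop can be exercised without a database; pointing it at a different RDBMS should just be a matter of changing the URL and driver.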

Then I think the next phase would require more Phoenix & HBase experience:
- tweaking queries where possible given any limitations in Phoenix
- adding missing syntax (or potentially using the calcite branch which
supports more)
- tweaking Phoenix schema declarations to optimize
- tweaking Phoenix & HBase configs to optimize
- determining which secondary indexes to add (though I think there's an
academic paper on this, I can't seem to find it)

Both phases would require a significant amount of time and effort. Each
benchmark would likely require unique tweaks.

Thanks,
James
