Re: Updates on Benchmarking and Optimization Research for Calcite

Michael Mior Wed, 21 Feb 2018 06:14:02 -0800

1. TPC-DS seems like a great starting point to me. SSB would also be a good
addition.


2. You can set hive.cbo.enabled to false in your Hive config to turn off the
cost-based optimizer.

3. Count me interested in general although I have limited time available in
the immediate future. I'd be interested in joining the call next week if
possible.
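
For point 2, a minimal sketch of one way to prepare paired runs of the same
query with the optimizer on and off. It assumes the setting can also be
toggled per session with a SET statement ahead of the query; the helper
name with_cbo and the sample query are hypothetical, for illustration only.

```python
def with_cbo(query: str, enabled: bool) -> str:
    """Prefix a HiveQL query with a session-level hive.cbo.enabled toggle."""
    flag = "true" if enabled else "false"
    # Normalize the trailing semicolon so the script has exactly one.
    body = query.strip().rstrip(";")
    return f"SET hive.cbo.enabled={flag};\n{body};\n"


# Placeholder query against a TPC-DS table; substitute a real benchmark query.
QUERY = "SELECT COUNT(*) FROM store_sales"

cbo_on = with_cbo(QUERY, True)    # script text for the optimized run
cbo_off = with_cbo(QUERY, False)  # script text for the unoptimized run
```

Each string could be saved to a file and run through whatever Hive client
you already use, timing the two runs separately.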

--
Michael Mior
mm...@apache.org

2018-02-20 16:12 GMT-05:00 Edmon Begoli <ebeg...@gmail.com>:

> Just a quick update on the progress of benchmarking setup for Calcite, and
> a call to you for feedback and participation:
>
> 1. Ashwin Vajantri, a member of my team, has installed Postgres and Hive
> on our servers, loaded the TPC-DS benchmark data, and run some test
> queries. He also installed Calcite on top of Postgres so we can compare
> performance through Calcite vs. native.
> (We have full documentation for all this in a Google Doc I shared with
> those interested in this work. We'll make it public once complete.)
>
> 2. Another colleague, Dr. Seung-Hwan Lim, is ready to look into more
> detailed benchmarking and optimization aspects, as well as into other
> engines that we work with and know -- MapD, Spark, Druid, Cassandra,
> or Flink.
>
> All this so far is based on, and in support of, the following JIRA issues:
> https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2168
> https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2169
>
> My questions to the community are:
>
> 1. Does anyone have any feedback on specific queries or engines we should
> target and start with?
>
> 2. How can we meaningfully turn the Hive optimizer on and off to measure
> its effect on performance?
>
> 3. Does anyone want to pitch in and help in any area?
>
> I am planning to schedule an online meeting next week so that those
> interested can connect and discuss.
>
