1. TPC-DS seems like a great starting point to me. SSB would also be a good addition.
2. You can set hive.cbo.enabled to false in your Hive config to turn off the cost-based optimizer.
3. Count me interested in general, although I have limited time available in the immediate future. I'd be interested in joining the call next week if possible.

--
Michael Mior
mm...@apache.org

2018-02-20 16:12 GMT-05:00 Edmon Begoli <ebeg...@gmail.com>:

> Just a quick update on the progress of the benchmarking setup for Calcite, and
> a call to you for feedback and participation:
>
> 1. We (Ashwin Vajantri, a member of my team) have installed Postgres and Hive
> on our servers, and he has loaded TPC-DS benchmark data and run some test
> queries. He also installed Calcite on top of Postgres so we can compare
> performance through Calcite vs. native.
> (We have full documentation for all this in a Google Doc I shared with
> those interested in this work. We'll make it public once complete.)
>
> 2. Another colleague, Dr. Seung-Hwan Lim, is ready to look into more
> detailed benchmarking and optimization aspects, as well as to look into
> other engines that we work with and know -- MapD, Spark, Druid, Cassandra,
> or Flink.
>
> All this so far is based on, and in support of, the following JIRA issues:
> https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2168
> https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2169
>
> My questions to the community are:
>
> 1. Does anyone have any feedback on specific queries or engines we want to
> target and start with?
>
> 2. How can we meaningfully turn the Hive optimizer on and off to measure
> the performance?
>
> 3. Does anyone want to pitch in and help in any area?
>
> I am planning to schedule an online meeting next week to connect and
> discuss for those interested.
>
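
P.S. On point 2 of my reply: besides the config file, hive.cbo.enabled can also be set per session with a SET statement, which may be handier for on/off comparisons. A minimal sketch, assuming the benchmark queries are submitted as plain HiveQL scripts; the helper names with_cbo and ab_scripts are hypothetical and not part of Hive or Calcite:

```python
def with_cbo(query: str, enabled: bool) -> str:
    """Prefix a HiveQL query with a session-level SET for hive.cbo.enabled.

    Hypothetical helper: it only builds the script text and does not
    connect to Hive or run anything.
    """
    flag = "true" if enabled else "false"
    # Normalize the query so it ends with exactly one semicolon.
    body = query.strip().rstrip(";").rstrip()
    return f"SET hive.cbo.enabled={flag};\n{body};\n"


def ab_scripts(query: str) -> dict:
    """Return the same query in two variants: optimizer on and optimizer off."""
    return {
        "cbo_on": with_cbo(query, True),
        "cbo_off": with_cbo(query, False),
    }


if __name__ == "__main__":
    # Placeholder query for illustration; substitute a real TPC-DS query.
    for name, script in ab_scripts("SELECT COUNT(*) FROM store_sales").items():
        print(f"-- {name}")
        print(script)
```

Each generated script could then be run through whatever client you already use for Hive, timing the two variants separately.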