Hi Jesus,

Many thanks for your inputs.
Best,
Ashwin

On Mon, Mar 5, 2018 at 12:41 PM, Jesus Camacho Rodriguez <jcama...@apache.org> wrote:

> Hi Ashwin,
>
> 1) It is important that table/column stats are available, so Calcite can
> correctly trigger its cost-based optimizations. You can do that either
> manually by running an ANALYZE... COMPUTE STATISTICS FOR COLUMNS statement,
> or by enabling hive.stats.autogather.
>
> 2) The Calcite-based optimizer is enabled by default, hence you do not need
> to set any other flag.
>
> Calcite will log messages during optimization, so if you set the correct
> logger level for Calcite (e.g. DEBUG), you will see messages, e.g., with
> the Calcite rules that have been triggered.
> In turn, optimization time for every optimization stage is recorded using
> PerfLogger, so you will be able to see this information in the logs (or you
> could add your own if you need to).
>
> If you have more questions about the Hive optimizer vs Calcite in general, I
> would suggest that you use the Hive dev list to ask them, as you may be
> able to get more help over there.
>
> -Jesús
>
>
> On 3/3/18, 7:40 AM, "AshwinKumar AshwinKumar" <aash...@g.clemson.edu>
> wrote:
>
>     Hello Dev Team,
>
>     I am trying to run queries on Apache HIVE by setting the flag
>     *hive.cbo.enabled* to true and also to false and then compare the
>     metrics.
>     I have a few questions regarding the same -
>     1. Do I need to set *hive.stats.autogather* (to gather the table
>     statistics) to true as well before turning on the CBO?
>     2. Are there any other flags which I need to set to activate the
>     Calcite CBO?
>
>     Also, could you please let me know the best way to obtain any
>     instrumentation data from the Calcite process?
>
>     Thanks,
>     Ashwin
>
> On Thu, Mar 1, 2018 at 2:26 AM, Riccardo Tommasini <
> riccardo.tommas...@polimi.it> wrote:
>
> > Hello,
> >
> > I can definitely help if you need me to do something.
> >
> > And I would also like to join the online meeting.
> >
> > Cheers,
> >
> > On 20 Feb 2018, 22:13 +0100, Edmon Begoli <ebeg...@gmail.com>, wrote:
> > Just a quick update on the progress of the benchmarking setup for
> > Calcite, and a call to you for feedback and participation:
> >
> > 1. We (Ashwin Vajantri, a member of my team) have installed Postgres and
> > Hive on our servers, and he has loaded TPC-DS benchmark data and run some
> > test queries. He also installed Calcite on top of Postgres so we can
> > compare performance through Calcite vs. native.
> > (We have full documentation for all this in a Google Doc I shared with
> > those interested in this work. We'll make it public once complete.)
> >
> > 2. Another colleague, Dr. Seung-Hwan Lim, is ready to look into more
> > detailed benchmarking and optimization aspects, as well as to look into
> > other engines that we work with and know -- MapD, Spark, Druid,
> > Cassandra, or Flink.
> >
> > All this so far is based on, and in support of, the following JIRA issues:
> > https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2168
> > https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2169
> >
> > My questions to the community are:
> >
> > 1. Does anyone have any feedback on specific queries or engines we want
> > to target, and start with?
> >
> > 2. How can we meaningfully turn the Hive optimizer on and off to measure
> > the performance?
> >
> > 3. Does anyone want to pitch in and help in any area?
> >
> > I am planning to schedule an online meeting next week to connect and
> > discuss for those interested.
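
For reference, the on/off comparison discussed in this thread (gather column statistics first, then run the same query with hive.cbo.enabled set to true and to false) could be sketched roughly as below. This is only a minimal sketch under stated assumptions: the helper names (stats_script, query_script, compare_cbo) and the `run` callable are hypothetical and not part of Hive or Calcite, `run` is assumed to submit a script to Hive by whatever client you use and block until it finishes, and store_sales is just an example TPC-DS table name. Only the ANALYZE statement and the hive.cbo.enabled flag come from the messages above. Wall-clock time measured this way includes client and cluster overhead, so it complements rather than replaces the PerfLogger output.

```python
import time
from typing import Callable, Dict, List


def stats_script(tables: List[str]) -> str:
    """HiveQL that gathers column statistics so the CBO has data to work with."""
    return "\n".join(
        f"ANALYZE TABLE {t} COMPUTE STATISTICS FOR COLUMNS;" for t in tables
    )


def query_script(query: str, cbo_enabled: bool) -> str:
    """HiveQL that sets hive.cbo.enabled for the session and then runs the query."""
    flag = "true" if cbo_enabled else "false"
    body = query.strip().rstrip(";")
    return f"SET hive.cbo.enabled={flag};\n{body};"


def compare_cbo(
    query: str, tables: List[str], run: Callable[[str], None]
) -> Dict[str, float]:
    """Run the query with the CBO on and off; return wall-clock seconds for each.

    `run` is a hypothetical callable (an assumption of this sketch) that
    submits a script to Hive and returns once it has completed.
    """
    run(stats_script(tables))  # statistics are gathered once and not timed
    timings: Dict[str, float] = {}
    for label, enabled in (("cbo_on", True), ("cbo_off", False)):
        start = time.perf_counter()
        run(query_script(query, enabled))
        timings[label] = time.perf_counter() - start
    return timings
```

In practice each configuration would be run several times, and the first run discarded, so that caching effects do not dominate the comparison.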