Re: Updates on Benchmarking and Optimization Research for Calcite

AshwinKumar AshwinKumar Wed, 07 Mar 2018 00:33:38 -0800

Hi Jesus,

Many thanks for your input.


Best,
Ashwin

On Mon, Mar 5, 2018 at 12:41 PM, Jesus Camacho Rodriguez <
jcama...@apache.org> wrote:

> Hi Ashwin,
>
> 1) It is important that table/column stats are available, so that Calcite
> can correctly trigger its cost-based optimizations. You can do that either
> manually, by running an ANALYZE... COMPUTE STATISTICS FOR COLUMNS statement,
> or, indeed, by enabling hive.stats.autogather.
>
> 2) The Calcite-based optimizer is enabled by default, hence you do not need
> to set any other flag.
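
For anyone reproducing this comparison, here is a minimal sketch (Python, purely illustrative) of how the two points above could be turned into a repeatable on/off script. The helper name build_session_script, the table store_sales, and the sample query are hypothetical placeholders and not part of Hive or Calcite. The sketch only produces HiveQL text and does not connect to a Hive server.

```python
# Sketch only: builds HiveQL text for a CBO on/off comparison.
# It does not connect to Hive. The table name and query used in the
# demo at the bottom are hypothetical placeholders.

def build_session_script(query, tables, cbo_enabled):
    """Return HiveQL that gathers stats, sets the CBO flag, then runs the query."""
    # Let Hive gather basic stats automatically on subsequent writes.
    lines = ["SET hive.stats.autogather=true;"]
    # Explicitly compute table and column statistics for already-loaded tables.
    for table in tables:
        lines.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS;")
        lines.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS;")
    # Toggle the Calcite-based optimizer for this session.
    flag = "true" if cbo_enabled else "false"
    lines.append(f"SET hive.cbo.enabled={flag};")
    # Normalize the query so it ends with exactly one semicolon.
    lines.append(query.rstrip().rstrip(";") + ";")
    return "\n".join(lines)


if __name__ == "__main__":
    query = "SELECT COUNT(*) FROM store_sales"  # placeholder query
    for enabled in (True, False):
        print(f"-- CBO enabled: {enabled}")
        print(build_session_script(query, ["store_sales"], enabled))
        print()
```

Each generated script could then be saved to a file and run in a fresh Hive session, so that the only difference between the two runs is the hive.cbo.enabled setting.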
>
> Calcite will log messages during optimization, so if you set the correct
> logger level for Calcite (e.g. DEBUG), you will see messages, e.g., with
> the Calcite rules that have been triggered.
> In turn, optimization time for every optimization stage is recorded using
> PerfLogger, so you will be able to see this information in the logs (or you
> could add your own if you need to).
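
As a rough illustration of pulling those timings out of the logs, here is a small Python sketch. It assumes PerfLogger end-of-stage lines look like `</PERFLOG method=... start=... end=... duration=... from=...>`. That shape is an assumption on my part and may differ between Hive versions, so please check it against your own logs. The function name stage_durations and the sample lines are made up for the example.

```python
import re
from collections import defaultdict

# ASSUMPTION: PerfLogger end-of-stage lines look like
#   </PERFLOG method=compile start=100 end=350 duration=250 from=some.Class>
# Verify this against the logs of your Hive version before relying on it.
PERFLOG_END = re.compile(r"</PERFLOG method=(?P<method>\S+) .*?duration=(?P<ms>\d+)")


def stage_durations(log_lines):
    """Sum the reported duration (in ms) per PerfLogger method name."""
    totals = defaultdict(int)
    for line in log_lines:
        match = PERFLOG_END.search(line)
        if match:
            totals[match.group("method")] += int(match.group("ms"))
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, for illustration only.
    sample = [
        "INFO  <PERFLOG method=compile from=some.Class>",
        "INFO  </PERFLOG method=parse start=100 end=130 duration=30 from=some.Class>",
        "INFO  </PERFLOG method=compile start=100 end=350 duration=250 from=some.Class>",
    ]
    for method, ms in sorted(stage_durations(sample).items()):
        print(f"{method}: {ms} ms")
```

Running this over the logs of a CBO-enabled and a CBO-disabled session would give per-stage totals that can be compared side by side.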
>
> If you have more questions about the Hive optimizer vs Calcite in general,
> I would suggest that you use the Hive dev list to ask them, as you may be
> able to get more help over there.
>
> -Jesús
>
>
> On 3/3/18, 7:40 AM, "AshwinKumar AshwinKumar" <aash...@g.clemson.edu>
> wrote:
>
>     Hello Dev Team,
>
>     I am trying to run queries on Apache Hive with the flag
>     *hive.cbo.enabled* set to true and then to false, and compare the
>     metrics.
>     I have a few questions about this:
>     1. Do I need to set *hive.stats.autogather* (to gather the table
>     statistics) to true as well before turning on the CBO?
>     2. Are there any other flags which I need to set to activate the
>     Calcite CBO?
>
>     Also, could you please let me know what is the best way to obtain any
>     instrumentation data from the Calcite process?
>
>     Thanks,
>     Ashwin
>
>     On Thu, Mar 1, 2018 at 2:26 AM, Riccardo Tommasini <
>     riccardo.tommas...@polimi.it> wrote:
>
>     > Hello,
>     >
>     > I can definitely help if you need me to do something.
>     >
>     > And I would also like to join the online meeting.
>     >
>     > Cheers,
>     >
>     > On 20 Feb 2018, 22:13 +0100, Edmon Begoli <ebeg...@gmail.com>, wrote:
>     > Just a quick update on the progress of the benchmarking setup for
>     > Calcite, and a call to you for feedback and participation:
>     >
>     > 1. We (Ashwin Vajantri, a member of my team) have installed Postgres
>     > and Hive on our servers, and he has loaded TPC-DS benchmark data and
>     > run some test queries. He also installed Calcite on top of Postgres
>     > so we can compare performance through Calcite vs. native.
>     > (We have full documentation for all this in a Google Doc I shared
>     > with those interested in this work. We'll make it public once
>     > complete.)
>     >
>     > 2. Another colleague, Dr. Seung-Hwan Lim, is ready to look into more
>     > detailed benchmarking and optimization aspects, as well as into other
>     > engines that we work with and know -- MapD, Spark, Druid, Cassandra,
>     > or Flink.
>     >
>     > All this so far is based on, and in support of, the following JIRA
>     > issues:
>     > https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2168
>     > https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2169
>     >
>     > My questions to the community are:
>     >
>     > 1. Does anyone have any feedback on specific queries or engines we
>     > should target and start with?
>     >
>     > 2. How can we meaningfully turn the Hive optimizer on and off to
>     > measure the performance?
>     >
>     > 3. Does anyone want to pitch in and help in any area?
>     >
>     > I am planning to schedule an online meeting next week so that those
>     > interested can connect and discuss.
>     >
>
>
>
>
