Re: Modify Calcite Planner in Hive to remove GROUP BY

Julian Hyde Wed, 26 Jun 2019 16:50:44 -0700

> Select count(*) from empty_table group by <constant> will produce NULL


Really? I thought it should produce zero rows.

Hsqldb:

> select count(*) from "foodmart"."days" where false group by true;
+-----------------+
|       C1        |
+-----------------+
+-----------------+
No rows selected (0.001 seconds)
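
For anyone who wants to reproduce the difference without Hsqldb, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: SQLite is swapped in for Hsqldb, the table name empty_table is taken from the quoted example, its single column x is made up, and the grouping constant is a string literal because SQLite reads an integer literal in GROUP BY as a column position.

```python
import sqlite3

# In-memory database; "empty_table" and its column "x" are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empty_table (x INTEGER)")

# Global aggregate (no GROUP BY): always exactly one row, even on empty input.
ungrouped = conn.execute("SELECT COUNT(*) FROM empty_table").fetchall()

# Grouped aggregate: one row per group, and an empty input has no groups.
# A string constant stands in for the constant group key.
grouped = conn.execute(
    "SELECT COUNT(*) FROM empty_table GROUP BY 'k'"
).fetchall()

print(ungrouped)  # [(0,)]
print(grouped)    # []

conn.close()
```

So, at least in SQLite, the grouped query returns no rows at all rather than NULL, while the ungrouped query returns a single row containing 0. Either way the two queries are not interchangeable on empty input.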


Julian


> On Jun 26, 2019, at 1:12 PM, Vineet Garg <vg...@apache.org> wrote:
> 
> Hello Krzysztof,
> 
> The rewrite you mention in Hive was done in HIVE-19674
> <https://issues.apache.org/jira/browse/HIVE-19674> to be able to push such
> group by to Druid. Currently there is no way to disable this rewrite.
> 
> As for removing GROUP BY <constant>, there are rules/rewrites which can
> reduce grouping keys by removing constants, but removing the whole GROUP BY
> is not safe since it can lead to a semantically different query.
> E.g. Select count(*) from empty_table group by <constant> will produce NULL
> but Select count(*) from empty_table will produce 0.
> 
> P.S. There was a bug in the HIVE-19674 patch, which was later fixed by
> HIVE-21539 <https://issues.apache.org/jira/browse/HIVE-21539>.
> 
> Regards,
> Vineet Garg
> 
> On Wed, Jun 26, 2019 at 7:08 AM Haisheng Yuan <h.y...@alibaba-inc.com>
> wrote:
> 
>> Calcite has a rule that does this work. But you can't remove the GROUP BY
>> clause if the constant is the only group key: the semantics are different
>> without a group key. Try it on an empty relation and you will see the
>> difference.
>> 
>> 
>> 
>> 
>> 
>> Thanks~
>> Haisheng Yuan
>> ------------------------------------------------------------------
>> From: Krzysztof Zarzycki <k.zarzy...@gmail.com>
>> Date: 2019-06-26 21:52:41
>> To: <dev@calcite.apache.org>
>> Subject: Modify Calcite Planner in Hive to remove GROUP BY <constant>
>> 
>> Hello,
>> 
>> While my question might look like it concerns Hive, I believe it is more
>> about Calcite. I need to add a Calcite planner rule to Hive that removes
>> the GROUP BY clause when it groups by a constant value (GROUP BY TRUE, more
>> precisely). As far as I can tell, the query is semantically the same.
>> Could anyone on this mailing list help me do this properly? While I'm
>> an experienced Java engineer, I have no clue how to achieve this.
>> I tried to modify the Hive code to do this myself, but unfortunately I got
>> only NullPointerExceptions.
>> 
>> 
>> More context below:
>> I want to use JdbcStorageHandler in Hive to connect to Apache Kylin and
>> forward queries there, and then put Tableau on top of Hive. Unfortunately,
>> the queries that Tableau sends to Hive, and that the Calcite planner then
>> rewrites for Kylin, cannot be handled by Kylin (which, BTW, uses Calcite as
>> well). I disabled some of the Hive optimizations, which fixed some of my
>> queries, but I'm stuck on one I cannot disable. Tableau generates a query
>> with "GROUP BY 1.000000...01", which Hive/Calcite translates to
>> "GROUP BY TRUE". Neither form can be handled by Kylin. My idea is to remove
>> the GROUP BY completely because, in my understanding, it's unnecessary.
>> 
>> I will be very grateful for your help,
>> Kind Regards,
>> Krzysztof
>> 
>>
