Re: Pushing down aggregates through rhs of a left join

Haisheng Yuan Fri, 31 Jul 2020 14:57:19 -0700

>  Is there interest
>  in PRing something like this to calcite, either as a new rule or part
>  of AggregateJoinTranspose?


Yes. It is better to be part of AggregateJoinTranspose.

On 2020/07/31 05:41:56, Alex Baden <alex.ba...@omnisci.com> wrote: 
> Hi all,
> 
> I have a query of the form:
> 
> SELECT a.x, SUM(b.y), SUM(b.z) FROM t1 a LEFT JOIN t2 b ON a.join_key
> = b.join_key GROUP BY a.x ORDER BY a.x;
> 
> If table b has a large number of duplicate keys for `join_key`, the
> left join can be very expensive to compute. Instead, we would like to
> run:
> 
> WITH t2g AS (SELECT b.join_key, SUM(b.y), SUM(b.z) FROM t2 b GROUP BY
> b.join_key)
> SELECT a.x, SUM(bg.y), SUM(bg.z) FROM t1 a LEFT JOIN t2g bg ON
> a.join_key = t2g.join_key GROUP BY a.x ORDER BY a.x;
> 
> Essentially, since we are only projecting aggregates from the rhs of
> the join, and the aggregate functions are associative, we can group by
> the join key to compute the aggregates up front, then join on the
> grouped results, and finally aggregate among join matches.
> 
> Looking at the comments of the AGGREGATE_JOIN_TRANSPOSE_RULE, I noted
> the following:
> // OUTER joins are supported for group by without aggregate functions
> 
> But based on the above, if we have a left join I believe we can
> transpose the aggregate and the join if the following conditions hold:
> 1) only expressions from the rhs of the join are aggregated
> 2) all aggregate functions from (1) are associative (can be split)
> 3) at least one expression from the lhs of the join is grouped
> 
> I am interested in implementing this rule (assuming the conditions
> above are strong enough to guarantee correctness). Is there interest
> in PRing something like this to calcite, either as a new rule or part
> of AggregateJoinTranspose?
> 
> Thanks,
> Alex
>

Re: Pushing down aggregates through rhs of a left join

Reply via email to