> Is there interest > in PRing something like this to calcite, either as a new rule or part > of AggregateJoinTranspose?
Yes. It is better to be part of AggregateJoinTranspose. On 2020/07/31 05:41:56, Alex Baden <alex.ba...@omnisci.com> wrote: > Hi all, > > I have a query of the form: > > SELECT a.x, SUM(b.y), SUM(b.z) FROM t1 a LEFT JOIN t2 b ON a.join_key > = b.join_key GROUP BY a.x ORDER BY a.x; > > If table b has a large number of duplicate keys for `join_key`, the > left join can be very expensive to compute. Instead, we would like to > run: > > WITH t2g AS (SELECT b.join_key, SUM(b.y), SUM(b.z) FROM t2 b GROUP BY > b.join_key) > SELECT a.x, SUM(bg.y), SUM(bg.z) FROM t1 a LEFT JOIN t2g bg ON > a.join_key = t2g.join_key GROUP BY a.x ORDER BY a.x; > > Essentially, since we are only projecting aggregates from the rhs of > the join, and the aggregate functions are associative, we can group by > the join key to compute the aggregates up front, then join on the > grouped results, and finally aggregate among join matches. > > Looking at the comments of the AGGREGATE_JOIN_TRANSPOSE_RULE, I noted > the following: > // OUTER joins are supported for group by without aggregate functions > > But based on the above, if we have a left join I believe we can > transpose the aggregate and the join if the following conditions hold: > 1) only expressions from the rhs of the join are aggregated > 2) all aggregate functions from (1) are associative (can be split) > 3) at least one expression from the lhs of the join is grouped > > I am interested in implementing this rule (assuming the conditions > above are strong enough to guarantee correctness). Is there interest > in PRing something like this to calcite, either as a new rule or part > of AggregateJoinTranspose? > > Thanks, > Alex >