Hi all, I have a query of the form:
SELECT a.x, SUM(b.y), SUM(b.z) FROM t1 a LEFT JOIN t2 b ON a.join_key = b.join_key GROUP BY a.x ORDER BY a.x; If table b has a large number of duplicate keys for `join_key`, the left join can be very expensive to compute. Instead, we would like to run: WITH t2g AS (SELECT b.join_key, SUM(b.y), SUM(b.z) FROM t2 b GROUP BY b.join_key) SELECT a.x, SUM(bg.y), SUM(bg.z) FROM t1 a LEFT JOIN t2g bg ON a.join_key = t2g.join_key GROUP BY a.x ORDER BY a.x; Essentially, since we are only projecting aggregates from the rhs of the join, and the aggregate functions are associative, we can group by the join key to compute the aggregates up front, then join on the grouped results, and finally aggregate among join matches. Looking at the comments of the AGGREGATE_JOIN_TRANSPOSE_RULE, I noted the following: // OUTER joins are supported for group by without aggregate functions But based on the above, if we have a left join I believe we can transpose the aggregate and the join if the following conditions hold: 1) only expressions from the rhs of the join are aggregated 2) all aggregate functions from (1) are associative (can be split) 3) at least one expression from the lhs of the join is grouped I am interested in implementing this rule (assuming the conditions above are strong enough to guarantee correctness). Is there interest in PRing something like this to calcite, either as a new rule or part of AggregateJoinTranspose? Thanks, Alex