Pushing down aggregates through rhs of a left join

Alex Baden Thu, 30 Jul 2020 22:42:43 -0700

Hi all,

I have a query of the form:


SELECT a.x, SUM(b.y), SUM(b.z) FROM t1 a LEFT JOIN t2 b ON a.join_key
= b.join_key GROUP BY a.x ORDER BY a.x;

If table b has a large number of duplicate keys for `join_key`, the
left join can be very expensive to compute. Instead, we would like to
run:

WITH t2g AS (SELECT b.join_key, SUM(b.y), SUM(b.z) FROM t2 b GROUP BY
b.join_key)
SELECT a.x, SUM(bg.y), SUM(bg.z) FROM t1 a LEFT JOIN t2g bg ON
a.join_key = t2g.join_key GROUP BY a.x ORDER BY a.x;

Essentially, since we are only projecting aggregates from the rhs of
the join, and the aggregate functions are associative, we can group by
the join key to compute the aggregates up front, then join on the
grouped results, and finally aggregate among join matches.

Looking at the comments of the AGGREGATE_JOIN_TRANSPOSE_RULE, I noted
the following:
// OUTER joins are supported for group by without aggregate functions

But based on the above, if we have a left join I believe we can
transpose the aggregate and the join if the following conditions hold:
1) only expressions from the rhs of the join are aggregated
2) all aggregate functions from (1) are associative (can be split)
3) at least one expression from the lhs of the join is grouped

I am interested in implementing this rule (assuming the conditions
above are strong enough to guarantee correctness). Is there interest
in PRing something like this to calcite, either as a new rule or part
of AggregateJoinTranspose?

Thanks,
Alex

Pushing down aggregates through rhs of a left join

Reply via email to