On Wed, Dec 28, 2022 at 6:18 AM Tom Lane <[email protected]> wrote:
> This patch is aimed at being smarter about cases where we have
> redundant GROUP BY entries, for example
>
> SELECT ... WHERE a.x = b.y GROUP BY a.x, b.y;
>
> It's clearly not necessary to perform grouping using both columns.
> Grouping by either one alone would produce the same results,
> assuming compatible equality semantics. I'm not sure how often
> such cases arise in the wild; but we have about ten of them in our
> regression tests, which makes me think it's worth the trouble to
> de-duplicate as long as it doesn't cost too much. And it doesn't,
> because PathKey construction already detects exactly this sort of
> redundancy. We need only do something with the knowledge.
While we are here, I wonder if we can do the same trick for
distinctClause, to cope with cases like
select distinct a.x, b.y from a, b where a.x = b.y;
And there is case from regression test 'select_distinct.sql' that can
benefit from this optimization.
--
-- Check mentioning same column more than once
--
EXPLAIN (VERBOSE, COSTS OFF)
SELECT count(*) FROM
(SELECT DISTINCT two, four, two FROM tenk1) ss;
Thanks
Richard