On Tue, Apr 7, 2026 at 10:31 PM Haibo Yan <[email protected]> wrote:
> Hi Hackers > > I’d like to start a patch series to add support for DISTINCT in plain > aggregate window functions. > > PostgreSQL currently rejects cases such as: > > > --------------------------------------------------------------------------------------------------------- > > count(DISTINCT x) OVER (PARTITION BY p) > > sum(DISTINCT x) OVER () > > > --------------------------------------------------------------------------------------------------------- > > My plan is to implement this incrementally, by frame class and by feature > dimension, rather than trying to solve every case in a single patch. > > For the first step, I’m posting patches 1-2 only and would appreciate your > review on those. > > Patch 1 is intentionally very small: > > > - add parse/deparse plumbing for DISTINCT in plain aggregate window > functions > - carry the information through WindowFunc > - preserve it in ruleutils / deparse > - but still reject execution > > Patch 1 by itself does not add user-visible execution support, so I think > it is best reviewed together with patch 2. > > Patch 2 adds the first real executor support: > > > - plain aggregate window functions only > - single-argument DISTINCT only > - whole-partition frames only > > That means support for cases where the frame is effectively the entire > partition, for example: > > > --------------------------------------------------------------------------------------------------------- > > count(DISTINCT x) OVER (PARTITION BY p) > sum(DISTINCT x) OVER () > avg(DISTINCT x) OVER ( > PARTITION BY p > ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > ) > > > --------------------------------------------------------------------------------------------------------- > > The executor approach in patch 2 is deliberately conservative: > > > - collect the partition’s aggregate inputs > - sort and deduplicate them > - feed the distinct values into the aggregate transition function > - finalize once > - reuse the cached result for all rows in the partition > > This avoids the much harder moving-frame cases for now. > > My proposed overall roadmap is below: > > Patch 1 > > > - parse/deparse plumbing only > - allow DISTINCT to be represented on plain aggregate window functions > - preserve it through deparse / view definition > - still reject execution > > Patch 2 > > > - executor support for whole-partition frames > - plain aggregate window functions only > - single-argument DISTINCT only > - sort-and-dedup implementation > > Patch 3 > > > - executor support for non-shrinking frames > - frames starting at UNBOUNDED PRECEDING with no EXCLUDE > - incremental hash-based seen-set > - covers default ORDER BY frame and supported ... CURRENT ROW / ... > FOLLOWING cases > > Patch 4 > > > - executor support for sliding ROWS frames > - refcounted DISTINCT state > - add/remove distinct contributions as rows enter and leave the frame > - fallback to restart/recompute for aggregates without inverse > transition support > > Patch 5 > > > - extend the sliding DISTINCT machinery to sliding RANGE and GROUPS > - keep the same refcounted model > - no EXCLUDE yet > > Patch 6 > > > - support EXCLUDE clauses > - likely correctness-first, with restart/recompute where incremental > maintenance is too awkward > > Patch 7 > > > - support multi-argument DISTINCT > - upgrade DISTINCT keys from single datum to tuple/composite key > representation > > Patch 8 > > > - support aggregate ORDER BY inside window aggregates > - left until last because it is orthogonal to frame-shape support and > substantially complicates both parse representation and executor behavior > > In short, the roadmap is: > > > 1. plumbing > 2. whole-partition > 3. non-shrinking > 4. sliding ROWS > 5. sliding RANGE / GROUPS > 6. EXCLUDE > 7. multi-arg DISTINCT > 8. aggregate ORDER BY > > For this posting, I’d especially appreciate feedback on: > > > - whether patch 1 + patch 2 is a reasonable first split > - whether whole-partition-only executor support is a good first > executable step > - whether the proposed long-term breakdown seems sensible > > Thanks in advance for any review or comments. > > Best regards, > > Haibo Yan > > > I’ve managed to finish the first sub-series adding initial support for DISTINCT in plain aggregate window functions. Patch 1 teaches the parser and deparser to accept DISTINCT in plain window aggregates. This is representation-only and does not change execution yet. Patch 2 adds executor support for the simplest case, whole-partition frames, using a sort-and-deduplicate path. Patch 3 extends that support to non-shrinking frames, where rows only enter the frame, by using an incremental hash-based seen-set instead of restarting the aggregate for each row. Please review. Regards, Haibo
v2-0003-Extend-DISTINCT-window-aggregates-to-grow-only-fr.patch
Description: Binary data
v2-0002-Support-DISTINCT-in-whole-partition-window-aggreg.patch
Description: Binary data
v2-0001-Parse-and-deparse-DISTINCT-in-window-aggregate-ca.patch
Description: Binary data
