Re: Questions regarding distinct operation implementation

David G. Johnston Thu, 01 Dec 2022 13:52:29 -0800

On Thu, Dec 1, 2022 at 2:37 PM David Rowley <[email protected]> wrote:


>
> The question is, what do you want to make work?  If you're not worried
> about supporting DISTINCT when there is an ORDER BY clause and the
> frame options are effectively ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING, then it's going to be much easier to make work.
> You never need to worry about rows dropping out of visibility in the
> frame. Simply all rows in the partition are in the frame.
>

I would definitely want the ability to have the output ordered and distinct
at the same time.

array_agg(distinct col) over (order by whatever)

Conceptually this seems like it can be trivially accomplished with a simple
lookup table, the key being the distinct column(s) and the value being a
counter - with the entry being removed when the counter goes to zero
(decreases happening each time a row goes out of scope).  The main concern,
I suspect, isn't implementation ability, it is speed and memory consumption.

I would expect the distinct output to be identical to the non-distinct
output except for duplicates removed.  Using array_agg as an example makes
seeing the distinction quite easy.

Thinking over the above a bit more, is something like this possible?

array_agg(distinct col order by col) over (order by whatever)

i.e., can we add order by within the aggregate to control its internal
ordering separately from the ordering needed for the window framing?

David J.

Re: Questions regarding distinct operation implementation

Reply via email to