Re: Questions regarding distinct operation implementation

Ankit Kumar Pandey Fri, 02 Dec 2022 12:10:20 -0800


On 02/12/22 00:40, Ankit Kumar Pandey wrote:

On 25/11/22 11:00, Ankit Kumar Pandey wrote:
On 25/11/22 02:14, David Rowley wrote:
On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey<[email protected]> wrote:
Please let me know any opinions on this.
I think if you're planning on working on this then step 1 would have
to be checking the SQL standard to see which set of rows it asks
implementations to consider for duplicate checks when deciding if the
transition should be performed or not.  Having not looked, I don't
know if this is the entire partition or just the rows in the current
frame.

Depending on what you want, an alternative today would be to run a
subquery to uniquify the rows the way you want and then do the window
function stuff.

David
Thanks David, these are excellent pointers, I will look into SQLstandard first and so on.
Hi,
Looking further into it, I am bit clear about expectations of havingdistinct in Windows Aggregates (although I couldn't got hands on SQLstandard as it is not in public domain but distinct in windowsaggregate is supported by Oracle and I am using it as reference).
For table (mytable):

id, name

1, A

1, A

10, B

3, A

1, A
/select avg(distinct id) over (partition by name)/ from mytable (inoracle db) yields:
2

2

2

2

10
From this, it is seen distinct is taken across the all rows in thepartition.
I also thought of using a subquery approach: /select avg(id) over(partition by name) from (select distinct(id), name from mytable)/
but this obviously doesn't yield right answer because result shouldcontain same number of rows as input. This implies we need to findpartition first and then remove duplicates within the partition.
Can we avoid any ordering/sort until existing logic finds if value isin frame (so as to respect any /order by/ clause given by user), andonce it is determined that tuple is in frame, skip the tuple if it isa duplicate? If aforementioned approach is right, question is how dowe check if it is duplicate? Should we create a lookup table (astuples coming to advance_windowaggregate can be in arbitrary order)?Or any other approach would be better?
Any opinion on this will be appreciated.

--
Regards,
Ankit Kumar Pandey

Hi,

I am still looking at this but unable to move ahead as I am not able touse prior use-cases (normal aggregates) to implement distinct in windowfunction because they both differ in design (and window function is bitunique in general). One approach (among others) that I thought was thatduring spool_tuples, rescan tuplestore and add new tuples only if theyare not already present. This is not very efficient because of multipleread operation on tuplestore, only for checking if tuple already existsand other issues (like tuplestore_in_memory forcing entire partition toget spooled in one go) etc.


Any ideas will be much appreciated.

Thanks.

--
Regards,
Ankit Kumar Pandey

Re: Questions regarding distinct operation implementation

Reply via email to