'count distinct' does not have that problem, whether in a group-by or not.
I'm still not sure these are equivalent queries, but maybe I'm just not seeing it.
Windowing makes sense when you need the whole window, or when you need
sliding windows to express the desired groups.
It may be unnecessary when your query needs only a summary statistic like
'max', not the window itself. It depends.
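To make the distinction concrete, here is a minimal plain-Python sketch. It does not use the Spark API; it only mimics the shape of the two approaches on a small, hypothetical list of (key, value) rows. In Spark terms the first function roughly corresponds to a group-by with a max aggregate, and the second to a max computed over a window partitioned by key and then de-duplicated. The function names and sample data are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sample rows: (group_key, value)
rows = [("a", 3), ("a", 7), ("b", 2), ("b", 5), ("b", 5)]


def max_by_group(rows):
    """Group-by style: keep a single running summary value per key."""
    best = {}
    for key, value in rows:
        if key not in best or value > best[key]:
            best[key] = value
    return best


def max_via_window(rows):
    """Window style: materialize every key's full partition, attach the
    partition max to each row, then collapse back to one row per key."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)
    annotated = [(key, value, max(partitions[key])) for key, value in rows]
    return {key: window_max for key, _value, window_max in annotated}


def count_distinct_by_group(rows):
    """'count distinct' as a plain aggregate: a set of seen values per key."""
    seen = defaultdict(set)
    for key, value in rows:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


if __name__ == "__main__":
    print(max_by_group(rows))             # group-by result
    print(max_via_window(rows))           # same answer, but held whole partitions
    print(count_distinct_by_group(rows))  # distinct values per key
```

Both max functions return the same answer, but the window version has to hold each key's whole partition before it can attach the max, while the group-by version keeps only one value per key. Note also that in Spark a window with no partitioning clause moves all rows into a single partition, which is the "entire dataset in a single window" situation raised below.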

On Sun, Feb 27, 2022 at 2:14 PM Bjørn Jørgensen <bjornjorgen...@gmail.com>
wrote:

> You are using distinct, which collects everything to the driver. So use
> the other one :)
>
> On Sun, 27 Feb 2022 at 21:00, Sid <flinkbyhe...@gmail.com> wrote:
>
>> Basically, I am trying two different approaches to the same problem, and
>> my concern is how they will behave with big data, i.e. millions of
>> records. Which one would be faster? Is using window functions a better
>> way, since it will load the entire dataset into a single window and do
>> the operations?
>>
>
>
