Re: [HACKERS] introduction of WIP window function patch

H . Harada Sat, 05 Jul 2008 11:41:44 -0700

Hi,

2008/7/6 Simon Riggs <[EMAIL PROTECTED]>:
>
> On Sat, 2008-07-05 at 16:20 +0200, Martijn van Oosterhout wrote:
>> On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
>
>> > http://umitanuki.net/pgsql/wfv01/design.html
>> >
>> > The problem is, as written in the "Things to discussed" section of the
>> > document, how you define window functions (e.g. RANK()). My idea is to
>> > treat them as specialized functions such as SET OF functions and mark
>> > it in pg_proc. But this doesn't resolve RANK() boundary problem.
>>
>> Actually, I would make RANK() and ROW_NUMBER() act more like
>> aggregates. ISTM you have two kinds of window functions:
>>
>> - aggregation: a result is calculated over a set and the result copied
>>   across all the rows.
>> - order depenadant: same as above, but the result is different for each
>>   row.
>>
>> I think you could make the latter work using the current aggregation
>> setup, just by calling the final_func for each row rather than just
>> once at the end.
>
> AFAICS there's no overlap between windowed aggregates and normal
> aggregates, so we can different infrastructure for each. I like the
> suggestion of doing it very similarly to current aggregates, but I would
> introduce a new function hook for windowed aggregates, wfunc.


I think there are two types of functions for windowed mode.
- windowed aggregate
this type of function is exactly same as normal aggregate. So we use
functions that have been in pgsql already. Actually in my patch above,
I didn't introduce any new function. This type of function includes
simply sum(), avg(), etc. which returns same values on a partition or
a window frame.

- windowed function
this is the NEW type of function. I guess we should add a new function
type to pgsql. This type of function includes rank(), rank_dense(),
row_number(), etc. Windowed functions returns different values per
tuple.

The difference between two types is if the function returns the same
value during a partition or different values.

So, windowed aggregate and normal aggregate overlap each other. How
you know which one is that you see OVER clause in SQL just after the
function call. When you see OVER after func(), and pg_proc says it's
an aggregate, it's a windowed aggregate. Otherwise, it's a windowed
function.

If I misunderstood about those definitions please correct me.

> Denserank is fairly simple
>
> CREATE AGGREGATE denserank()
> (
>        sfunc = increment
>        stype = bigint
>        initcond = 0
> )

I think this is ROW_NUMBER(), not DENSE_RANK(), isn't it?

> rank() is fairly complex because the state data must track 3 things:
> * the number of tuples seen so far (bigint)
> * the value of the last tuple seen (anyelement)
> * the rank of the last tuple seen (bigint)
>
> sfunc would compare the new value with the last value, if they match
> then we return the rank of the last tuple. If they don't match then we
> set the stored value and rank, then return the number of tuples seen so
> far as the rank.

Yeah, I think RANK() needs to know some or all of tuple of current
row. But it seems not to take any argument, which is the "boundary
problem" I said. Definitely RANK() or windowed function should know
about current tuple so that when to increment rank. But how? As
explicit argument? or specialized method?


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] introduction of WIP window function patch

Reply via email to