[HACKERS] window function v03 against HEAD
I happily announce that the first design of window function was finished and the patch against HEAD is released online. See http://umitanuki.net/pgsql/wfv03/design.html The window functions such like row_number(), rank(), dense_rank(), percent_rank() are experimentally defined within the patch as * It is formed as an aggregate function, seen in pg_aggregate * It does or does not have trans function. * It has a final function that is declared as VOLATILE. * Its final function is called multiple times during returning values. And ranking system is a bit ugly. The problem was how you know the ranking boundary. Inside rank_final(), you see WindowState node is pulled out from fcinfo-context and TupleTableSlot is used. Actually I am not sure if this is appropriate but have tried it, which did work. So as the first release, I guess it is better that the window functions are formulated but not open as user-defined functions. Also, since current design of the window functions is by aggregate, some of the SQL spec functions such like lag()/lead() was dropped. These functions need to reach for random rows from current row. For this needs, we might have to provide something like Window Object mechanism that allows random access to the current frame. There's no additional docs and tests, as well as lack of comments in source code. The make check will fail because some of the window functions do not have trans function. Reviews and comments are welcome. I'm eager to refine it. Regards, -- Hitoshi Harada window_function_v03_patch.tgz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] introduction of WIP window function patch
2008/7/17 David Fetter [EMAIL PROTECTED]: On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote: Hi, As I proposed a month before, I am working on window function. Although this work is at quite early step, I would like to introduce it since a part of it have been finished. If you can afford and are interested in it, please review the document and patch, or compile the applied source to execute an attached sample SQL. http://umitanuki.net/pgsql/wfv01/design.html Currently, only aggregation over window does work. I am planning to work for the combination of window and normal aggregation from now on, which I guess I can manage to do. The problem is, as written in the Things to discussed section of the document, how you define window functions (e.g. RANK()). My idea is to treat them as specialized functions such as SET OF functions and mark it in pg_proc. But this doesn't resolve RANK() boundary problem. I am so happy with any kind of comments, reviews or critiques. Regards, Many people have been waiting for years for this functionality. Thanks so much for doing this. When will you have another patch that applies against CVS HEAD? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate As you see the patch is not completed yet, lacking RANK() etc. I am now struggling for this design which I assume will finish by the end of this month. Since then, I will apply it against HEAD. So another patch to -hackers will be around first half of August, or by the next Commit Fest at worst. There must be more discussion about the RANK() design after the post. But I hope it will be in 8.4. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] introduction of WIP window function patch
2008/7/6 Simon Riggs [EMAIL PROTECTED]: I think there are two types of functions for windowed mode. - windowed aggregate this type of function is exactly same as normal aggregate. So we use functions that have been in pgsql already. Actually in my patch above, I didn't introduce any new function. This type of function includes simply sum(), avg(), etc. which returns same values on a partition or a window frame. - windowed function this is the NEW type of function. I guess we should add a new function type to pgsql. This type of function includes rank(), rank_dense(), row_number(), etc. Windowed functions returns different values per tuple. The difference between two types is if the function returns the same value during a partition or different values. So, windowed aggregate and normal aggregate overlap each other. How you know which one is that you see OVER clause in SQL just after the function call. When you see OVER after func(), and pg_proc says it's an aggregate, it's a windowed aggregate. Otherwise, it's a windowed function. If I misunderstood about those definitions please correct me. Yes, I understand that and I think Martijn does also. I've done some thinking and rooting around on this and I think I have a different proposal for you, different to what we just discussed. SQL2008 specifies window functions as * rank functions * distribution functions: percent_rank() and cume_dist() * rownumber() * ntile() * lead() and lag() * first, last and n-th value functions * inverse distribution functions (similar to n-th value, based upon distribution function results) plus window aggregate functions (the normal aggregates COUNT, SUM etc) Now looking through all of those, I don't see *any* window functions that need access to different datatypes, or actually need to see the values of the attributes. The normal aggregates work with windows identically to the way they do without windows, so no change needed there. AFAICS we could define all of the non-aggregate window functions on the above list *without* defining them as functions in pg_proc. That would be a benefit because the window functions are very powerful and we'd need to give them access to any/all tuples in the window. So that would mean we don't provide a mechanism for user-defined windowed aggregate functions at all. Which solves the discussion about how to pass generic info through to them (at least long enough to get the first implementation done). We do already have such functions in code, e.g. greatest(). Sure they need to be defined in code, but we don't need to come up with a generic API for them. If you disagree, think about how we'd implement lag() or ntile() and what info we'd need to pass them. Well, your idea is one of considerable choices. But I like pgsql's extensibility that enables pgsql more powerful DBMS. So, I design it as you propsed though trying to unify the function form somehow. Just idea, how about pass window object to a function? We'll provide window operation API then in the function you take window object through fcinfo: Datum func(PG_FUNCTION_ARGS) { Datum v; WindowObject w = get_window(fcinfo); HeapTuple htup_current = window_current_row(w); HeapTuple htup_prev = window_preceding(w, 1); /* do something */ PG_RETURN_DATUM(v); } so that a function access whole the window. APIs include - current row - preceding row - following row - current key - preceding key - following key - iterate for the window where key means ORDER BY values in OVER clause. Fortunately, my patch uses tuplestore/tuplesort to create window, which allows random access operation such above. Is there security/performance issue about this? -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] introduction of WIP window function patch
2008/7/6 Simon Riggs [EMAIL PROTECTED]: On Sun, 2008-07-06 at 17:39 +0900, H.Harada wrote: Is there security/performance issue about this? Performance, yes. If we need access to more rows than will fit within work_mem we have a problem and will need to restart sort. Giving random access to all tuples in the current window, whatever its size would be very costly, which is why we have optimized that access for merge joins. So we need to know how far back access is required, if any - think of that as an access window definition. Is this about tuplesort, not tuplestore? At a glance, tuplestore seems to be able to access tuples randomly without any performance problem, just fitting its pointer. So I thought planner should always insert Sort node before Window node to let Window not to sort, as I explained in the document. But anyways, I understand some kind of optimization mechanism to scroll in/out window is needed. I know I rattle on about performance, but with window functions it will be critical to their usability to have them perform well. We can already do the types of analysis that window functions allow, it just requires hand written procedures to do it. So the window functions must perform acceptably well against very large tables (i.e. much bigger than memory). I know, the probable use case is against large data set. There's no reason to add this feature if it is slower than self joins or other kludge methods. -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] introduction of WIP window function patch
Hi, As I proposed a month before, I am working on window function. Although this work is at quite early step, I would like to introduce it since a part of it have been finished. If you can afford and are interested in it, please review the document and patch, or compile the applied source to execute an attached sample SQL. http://umitanuki.net/pgsql/wfv01/design.html Currently, only aggregation over window does work. I am planning to work for the combination of window and normal aggregation from now on, which I guess I can manage to do. The problem is, as written in the Things to discussed section of the document, how you define window functions (e.g. RANK()). My idea is to treat them as specialized functions such as SET OF functions and mark it in pg_proc. But this doesn't resolve RANK() boundary problem. I am so happy with any kind of comments, reviews or critiques. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] introduction of WIP window function patch
Hi, 2008/7/6 Simon Riggs [EMAIL PROTECTED]: On Sat, 2008-07-05 at 16:20 +0200, Martijn van Oosterhout wrote: On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote: http://umitanuki.net/pgsql/wfv01/design.html The problem is, as written in the Things to discussed section of the document, how you define window functions (e.g. RANK()). My idea is to treat them as specialized functions such as SET OF functions and mark it in pg_proc. But this doesn't resolve RANK() boundary problem. Actually, I would make RANK() and ROW_NUMBER() act more like aggregates. ISTM you have two kinds of window functions: - aggregation: a result is calculated over a set and the result copied across all the rows. - order depenadant: same as above, but the result is different for each row. I think you could make the latter work using the current aggregation setup, just by calling the final_func for each row rather than just once at the end. AFAICS there's no overlap between windowed aggregates and normal aggregates, so we can different infrastructure for each. I like the suggestion of doing it very similarly to current aggregates, but I would introduce a new function hook for windowed aggregates, wfunc. I think there are two types of functions for windowed mode. - windowed aggregate this type of function is exactly same as normal aggregate. So we use functions that have been in pgsql already. Actually in my patch above, I didn't introduce any new function. This type of function includes simply sum(), avg(), etc. which returns same values on a partition or a window frame. - windowed function this is the NEW type of function. I guess we should add a new function type to pgsql. This type of function includes rank(), rank_dense(), row_number(), etc. Windowed functions returns different values per tuple. The difference between two types is if the function returns the same value during a partition or different values. So, windowed aggregate and normal aggregate overlap each other. How you know which one is that you see OVER clause in SQL just after the function call. When you see OVER after func(), and pg_proc says it's an aggregate, it's a windowed aggregate. Otherwise, it's a windowed function. If I misunderstood about those definitions please correct me. Denserank is fairly simple CREATE AGGREGATE denserank() ( sfunc = increment stype = bigint initcond = 0 ) I think this is ROW_NUMBER(), not DENSE_RANK(), isn't it? rank() is fairly complex because the state data must track 3 things: * the number of tuples seen so far (bigint) * the value of the last tuple seen (anyelement) * the rank of the last tuple seen (bigint) sfunc would compare the new value with the last value, if they match then we return the rank of the last tuple. If they don't match then we set the stored value and rank, then return the number of tuples seen so far as the rank. Yeah, I think RANK() needs to know some or all of tuple of current row. But it seems not to take any argument, which is the boundary problem I said. Definitely RANK() or windowed function should know about current tuple so that when to increment rank. But how? As explicit argument? or specialized method? -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] introduction of WIP window function patch
2008/7/5 Martijn van Oosterhout [EMAIL PROTECTED]: On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote: Hi, As I proposed a month before, I am working on window function. Very nice! http://umitanuki.net/pgsql/wfv01/design.html The problem is, as written in the Things to discussed section of the document, how you define window functions (e.g. RANK()). My idea is to treat them as specialized functions such as SET OF functions and mark it in pg_proc. But this doesn't resolve RANK() boundary problem. Actually, I would make RANK() and ROW_NUMBER() act more like aggregates. ISTM you have two kinds of window functions: - aggregation: a result is calculated over a set and the result copied across all the rows. - order depenadant: same as above, but the result is different for each row. So I agree the definition of these two types. I think you could make the latter work using the current aggregation setup, just by calling the final_func for each row rather than just once at the end. How do you know which type of two above is used in the same SQL syntax? -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: add window function to 8.4
This topic has been discussed on this list and many user expect that PostgreSQL implements it. I'd like to work on this feature and hope that we can include it on 8.4. Former discussions are here: http://archives.postgresql.org/pgsql-hackers/2004-11/msg01093.php http://archives.postgresql.org/pgsql-hackers/2007-01/msg00861.php How it works and examples: SELECT dept, empno, RANK() OVER(PARTITION BY dept ORDER BY age) as age_rank, RANK() OVER(PARTITION BY dept ORDER BY salary) as salary_rank, SUM(salary) OVER(PARTITION BY dept ORDER BY age) as run_total FROM employees ORDER BY 1, 3, 4; dept empno age_rank salary_rank run_total ENG 2 12 4 ENG 1 21 9 QA 3 12 3 QA 4 21 65000 (ref.: http://www.gavinsherry.org/talks/window_talk.pdf) My current idea and concept: - add window function and treat it specially such like aggregate function and setof function. - some features may be dropped at the first release, considering to support them later. - to formulate and to let it work properly are primary, performance optimization is secondary. From my survey around web and list archive, the points are: - what is window function rank(), rank_dense(), lead() and others? First of all, we should define the window function such like aggregate function. In my opinion, there are two types of functions in OVER () call context. One of them is aggregate, and the other is window (ranking) function. Sum() in a query like SELECT empno, sum(salary) OVER (PARTITION BY depno) FROM empsalary; is obviously aggregate function. This type of function can be used as it is now. Only executer will change its behavior. Current pgsql feature sets lack window function like rank(). This type of function must 1) have a context such as SETOF functions, 2) return values until executor says DONE, rather than function itself says DONE as in SETOF function, and 3) know about other tuples (mainly ORDER BY clause), as rank() doesn't take any arguments but knows the ranking boundary. I suppose that pgsql have new function system for these types called window function. Once we can define window function, users have extensibility to this type of function. - do we really need FRAME clause? From my survey, Oracle and DB2 have FRAME clause SELECT empno, sum(salary) OVER (ORDER BY empno ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM empsalary; while MS SQL Server doesn't (note that the literal from ROWS to CURRENT ROW is called FRAME clause). To implement FRAME clause is much more complicated than only PARTITION and ORDER clause support because we need random access to result tuples. Though we will be expected to support FAME clause in long-term, for the first release it might be able to be put off. Even though rank() doesn't support FRAME clause (only PARTITION and ORDER) it is so useful, more than now at least. - execution order In SQL:2003, the execution order is defined as where - group by - having - (windowing) * N - order by (outer, currently existing one) where windowing is partition by - order by - (framing) * N But Oracle seems that it has another order (windowing) * N - where - group by ... and so on. which is better for us? With Oracle's one you can say SELECT empno, rank() OVER (PARTITION BY depno ORDER BY saraly) AS topsalary FROM empsalary WHERE topsaraly 3 to get the top 3 people taking heighest salary. In the SQL standard, you should the nest query. I insist the first (standard) one is better because we may want use the result of normal aggregate in OVER clause. - plan and node Currently in my mind the execution strategy could be: 1. Where GroupBy Having | 2. SortBy partitionClause, orderByClause | 3. Window foreach partition: if not there_is_frame(): aggvalue = null foreach row in partition: aggvalue = agg_trans_func(aggvalue) aggvalue = agg_final_func(aggvalue) foreach row in partition: if has frame clause: aggvalue = null frame = make_frame() foreach row_in_frame: aggvalue = aggregate_trans_func(aggvalue) aggvalue = aggregate_final_func(aggvalue) set aggvalue to row val = window_func() set val to row goto 2. if another window remained | 4. SortBy ORDER BY clause (outer) Limit | 5. Output This pseudo code is quite simple and stupid. We may optimize it by splitting tasks with MergeJoin, etc. or think about process 2. that collect same PARTITION clauses to reduce sort operation. Or to use Hash Agg to create PARTITION. But let's be stupid at first. Optimization is secondary. References: description by Stefan DeBloch http://wwwdvs.informatik.uni-kl.de/courses/NEDM/WS0607/Vorlesungsunterlagen/NEDM.Chapter.06.Windows_and_Query_Functions_in_SQL.pdf via Wikipedia[Select (SQL)] http://en.wikipedia.org/wiki/Select_(SQL) BTW, what about Bizgres