Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
Bruce Momjian br...@momjian.us writes: Was this dealt with? Yes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
Tom Lane wrote: According to SQL2008 section 7.11 window clause, general rule 5, the default definition of window framing in a window that has an ordering clause but no framing (RANGE/ROWS) clause is that the window frame for a given row R runs from the first row of its partition through the last peer of R. Section 6.10's general rules define the results of LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the window frame of the current window, not its partition. Meanwhile, section 6.10 window function syntax rule 6 says that LEAD/LAG must use a window that has an ordering clause and no framing clause. This means that without an explicit framing clause, none of these functions can look beyond the last peer of the current row; and what's worse, LEAD/LAG seem to be explicitly forbidden from looking further than that even if we had an implementation of framing clauses. This seems to be less than sane. I would certainly expect that LEAD(x) gives you the next value of x regardless of peer-row status, since LAG(x) gives you the prior value of x regardless of peer row status. It is also simply bizarre for FIRST_VALUE to give you the partition's first row when LAST_VALUE doesn't give you the partition's last row. Are there any errata for SQL2008 yet? Can anyone check the actual behavior of DB2 or other DBMS's that claim to implement these functions? I notice that the current patch code seems to implement first/last/nth_value using the frame, but lead/lag using the partition, which doesn't conform to spec AFAICS ... but lead/lag on the frame doesn't actually appear to be a useful definition so I'd rather go with that than with what the letter of the spec seems to say. Lastly, for a simple aggregate used with an OVER clause, the current patch seems to define the aggregate as being taken over the frame rather than the partition, but I cannot find anything in SQL2008 that lends any support to *either* definition. Comments? This all seems rather badly broken. Was this dealt with? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
On Saturday 27 December 2008 20:32:10 Ron Mayer wrote: ISTM ISO should hire you guys (or the postgres project as a whole) to proof-read their specs before they publish them. The way it really works though, effectively, is that vendors hire ISO to publish their specs. Having a few inconsistencies in 2000 pages of language specification with 20 years of legacy around it isn't so bad IMO, considering that there are really only a handful of guys working on this with any intensity. If we cared enough, we could submit these sorts of issues to the committee for clarification or correction. If anyone is convinced enough about this particular case, I can try to relay it and see what happens. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
2008/12/27 Tom Lane t...@sss.pgh.pa.us: I notice that the current patch code seems to implement first/last/nth_value using the frame, but lead/lag using the partition, which doesn't conform to spec AFAICS ... but lead/lag on the frame doesn't actually appear to be a useful definition so I'd rather go with that than with what the letter of the spec seems to say. In 4.15, it says: The lead and lag functions each take three arguments, a value expression VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For each row R within the *window partition P of R* defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, for lead/lag, and returns the value of VE evaluated on the n-th row from the first (if FROM FIRST is specified or implied) or the last (if FROM LAST is specified) row of the *window frame* of R defined by a window structure descriptor for nth_value, added * by me. I understand lead/lag can affect all rows in the partition whereas first/last/nth_value does only rows in the frame. I guess that's why 6.10 rule 6.b forbids frame caluse in lead/lag but actually we can ignore frame though the frame is specified in the window attached with lead/lag, and it is better if you call aggregate with frame and lead/lag on the same window spec, which allows us to optimize it by calling them on the same node. It violates the spec but we'd better to extend the standard like the offset argument of lead/lag. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
Hitoshi Harada umi.tan...@gmail.com writes: 2008/12/27 Tom Lane t...@sss.pgh.pa.us: I notice that the current patch code seems to implement first/last/nth_value using the frame, but lead/lag using the partition, which doesn't conform to spec AFAICS ... In 4.15, it says: The lead and lag functions each take three arguments, a value expression VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For each row R within the *window partition P of R* defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, Well, that's interesting, but I think the controlling definition is in 6.10 general rule 1b, which very clearly states that the frame is to be used for lead/lag (and the adjacent rules say the same for all the other standard window functions). The wording in 4.15 does seem like evidence that the spec authors may have misspoke in 6.10, but we're never going to settle it from the text of the spec. Can anyone check what DB2 and Oracle do here? In any case, both sections agree that last_value works on the frame, which makes it effectively useless with the default frame definition. So I'm still thinking that we need at least a subset of frame support. I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED options for each end of the frame. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
2008/12/28 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: 2008/12/27 Tom Lane t...@sss.pgh.pa.us: I notice that the current patch code seems to implement first/last/nth_value using the frame, but lead/lag using the partition, which doesn't conform to spec AFAICS ... In 4.15, it says: The lead and lag functions each take three arguments, a value expression VE, an exact numeric literal OFFSET, and a value expression DEFAULT. For each row R within the *window partition P of R* defined by a window structure descriptor, the lag function returns the value of VE evaluated on a row that is OFFSET number of rows before R within P, Well, that's interesting, but I think the controlling definition is in 6.10 general rule 1b, which very clearly states that the frame is to be used for lead/lag (and the adjacent rules say the same for all the other standard window functions). The wording in 4.15 does seem like evidence that the spec authors may have misspoke in 6.10, but we're never going to settle it from the text of the spec. Can anyone check what DB2 and Oracle do here? I tested on Oracle 10.2.0, and the results are: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; DEPNAME EMPNO SALARY LEAD(SALARY,1)OVER(ORDERBYSALARY) LAG(SALARY,1)OVER(ORDERBYSALARY) FIRST_VALUE(SALARY)OVER(ORDERBYSALARY) LAST_VALUE(SALARY)OVER(ORDERBYSALARY) personnel5 35003900 35003500 personnel2 39004200350035003900 develop 7 42004500390035004200 develop 9 45004800420035004500 sales4 48004800450035004800 sales3 48005000480035004800 sales1 50005200480035005000 develop 10 52005200500035005200 develop 11 52006000520035005200 develop 8 6000 520035006000 which means the section 4.15 is true. Could anyone try DB2? In any case, both sections agree that last_value works on the frame, which makes it effectively useless with the default frame definition. So I'm still thinking that we need at least a subset of frame support. I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED options for each end of the frame. If we can afford it, I and many users are so glad with it. In my opinion it is not so painful if we had done concrete fundamentals of window functions, but isn't it up to our time for the release? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
Hitoshi Harada wrote: 2008/12/28 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: 2008/12/27 Tom Lane t...@sss.pgh.pa.us: which doesn't conform to spec AFAICS ... 4.15...says: interesting...6.10 general rule 1b, which very clearly states ... ... 4.15 does seem like evidence that the spec authors may have misspoke in 6.10 Oracle... results are: ... which means the section 4.15 is true ISTM ISO should hire you guys (or the postgres project as a whole) to proof-read their specs before they publish them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
Hitoshi Harada wrote: I tested on Oracle 10.2.0, and the results are: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; DEPNAMEEMPNO SALARY LEAD(SALARY,1)OVER(ORDERBYSALARY) LAG(SALARY,1)OVER(ORDERBYSALARY) FIRST_VALUE(SALARY)OVER(ORDERBYSALARY) LAST_VALUE(SALARY)OVER(ORDERBYSALARY) personnel 5 35003900 35003500 personnel 2 39004200350035003900 develop7 42004500390035004200 develop9 45004800420035004500 sales 4 48004800450035004800 sales 3 48005000480035004800 sales 1 50005200480035005000 develop10 52005200500035005200 develop11 52006000520035005200 develop8 6000 520035006000 which means the section 4.15 is true. Could anyone try DB2? DB2 9.5 results: Using the empsalary table from the regression test in the patch: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; personnel 5 3500390035003500 personnel 2 39004200350035003900 develop 7 42004500390035004200 develop 9 45004800420035004500 sales 4 48004800450035004800 sales 3 48005000480035004800 sales 1 50005200480035005000 develop 10 52005200500035005200 develop 11 52006000520035005200 develop 8 6000520035006000 Which matches with your Oracle results. So either they both got it wrong by one copying the other... sarcasm Of course we all know it couldn't be Oracle copying IBM, that would never happen... /sarcasm David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
Hitoshi Harada wrote: I tested on Oracle 10.2.0, and the results are: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; DEPNAMEEMPNO SALARY LEAD(SALARY,1)OVER(ORDERBYSALARY) LAG(SALARY,1)OVER(ORDERBYSALARY) FIRST_VALUE(SALARY)OVER(ORDERBYSALARY) LAST_VALUE(SALARY)OVER(ORDERBYSALARY) personnel 5 35003900 35003500 personnel 2 39004200350035003900 develop7 42004500390035004200 develop9 45004800420035004500 sales 4 48004800450035004800 sales 3 48005000480035004800 sales 1 50005200480035005000 develop10 52005200500035005200 develop11 52006000520035005200 develop8 6000 520035006000 which means the section 4.15 is true. Could anyone try DB2? DB2 9.5 results: Using the empsalary table from the regression test in the patch: select depname, empno, salary, lead(salary, 1) over (order by salary), lag(salary, 1) over (order by salary), first_value(salary) over (order by salary), last_value(salary) over (order by salary) from empsalary; personnel 5 3500390035003500 personnel 2 39004200350035003900 develop 7 42004500390035004200 develop 9 45004800420035004500 sales 4 48004800450035004800 sales 3 48005000480035004800 sales 1 50005200480035005000 develop 10 52005200500035005200 develop 11 52006000520035005200 develop 8 6000520035006000 Which matches with your Oracle results. So either they both got it wrong by one copying the other... sarcasm Of course we all know it couldn't be Oracle copying IBM, that would never happen... /sarcasm David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
David Rowley dgrow...@gmail.com writes: Hitoshi Harada wrote: I tested on Oracle 10.2.0, and the results are: ... which means the section 4.15 is true. Could anyone try DB2? DB2 9.5 results [ are the same ] OK, good, that means the reference to the frame in 6.10 rule 1b is just a copy-and-pasteo. (I wonder if it got fixed in the final spec?) I guess this makes sense because we also see that 6.10 syntax rule 6b forbids a framing spec on lead/lag, which makes sense if these functions ignore the frame and no sense otherwise. I also realized after more thought that the way to get frame = partition with a default frame spec is to omit any ORDER BY in the window spec. So my concern about LAST_VALUE being useless without framing ability is unfounded, and I withdraw the complaint that we need to put in some minimal framing features. But we'll have to be careful to document all this properly. Also, it does seem that it might be worthwhile to try to ensure that combinations of windows that have the same PARTITION list and empty vs nonempty ORDER BY get optimized well. Right now it's dependent on ordering of the WindowClause whether you pay an extra sort for that case or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Frames vs partitions: is SQL2008 completely insane?
According to SQL2008 section 7.11 window clause, general rule 5, the default definition of window framing in a window that has an ordering clause but no framing (RANGE/ROWS) clause is that the window frame for a given row R runs from the first row of its partition through the last peer of R. Section 6.10's general rules define the results of LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the window frame of the current window, not its partition. Meanwhile, section 6.10 window function syntax rule 6 says that LEAD/LAG must use a window that has an ordering clause and no framing clause. This means that without an explicit framing clause, none of these functions can look beyond the last peer of the current row; and what's worse, LEAD/LAG seem to be explicitly forbidden from looking further than that even if we had an implementation of framing clauses. This seems to be less than sane. I would certainly expect that LEAD(x) gives you the next value of x regardless of peer-row status, since LAG(x) gives you the prior value of x regardless of peer row status. It is also simply bizarre for FIRST_VALUE to give you the partition's first row when LAST_VALUE doesn't give you the partition's last row. Are there any errata for SQL2008 yet? Can anyone check the actual behavior of DB2 or other DBMS's that claim to implement these functions? I notice that the current patch code seems to implement first/last/nth_value using the frame, but lead/lag using the partition, which doesn't conform to spec AFAICS ... but lead/lag on the frame doesn't actually appear to be a useful definition so I'd rather go with that than with what the letter of the spec seems to say. Lastly, for a simple aggregate used with an OVER clause, the current patch seems to define the aggregate as being taken over the frame rather than the partition, but I cannot find anything in SQL2008 that lends any support to *either* definition. Comments? This all seems rather badly broken. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?
I wrote: Lastly, for a simple aggregate used with an OVER clause, the current patch seems to define the aggregate as being taken over the frame rather than the partition, but I cannot find anything in SQL2008 that lends any support to *either* definition. Never mind that --- I found it in 10.9 syntax rule 4.b. But what this seems to boil down to is that LEAD() and LAST_VALUE() are completely useless unless you're allowed to specify a nondefault framing clause ... and don't mind ignoring the clearly-insane restriction of 6.10 syntax rule 6.b. The minimum extra functionality needed to make these functions useful would seem to be to allow UNBOUNDED FOLLOWING ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers