Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2009-01-22 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Was this dealt with?

Yes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2009-01-21 Thread Bruce Momjian
Tom Lane wrote:
 According to SQL2008 section 7.11 window clause, general rule 5, the
 default definition of window framing in a window that has an ordering
 clause but no framing (RANGE/ROWS) clause is that the window frame for
 a given row R runs from the first row of its partition through the last
 peer of R.
 
 Section 6.10's general rules define the results of LEAD, LAG,
 FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the
 window frame of the current window, not its partition.
 
 Meanwhile, section 6.10 window function syntax rule 6 says that
 LEAD/LAG must use a window that has an ordering clause and no
 framing clause.
 
 This means that without an explicit framing clause, none of these
 functions can look beyond the last peer of the current row; and
 what's worse, LEAD/LAG seem to be explicitly forbidden from looking
 further than that even if we had an implementation of framing clauses.
 
 This seems to be less than sane.  I would certainly expect that LEAD(x)
 gives you the next value of x regardless of peer-row status, since
 LAG(x) gives you the prior value of x regardless of peer row status.
 It is also simply bizarre for FIRST_VALUE to give you the partition's
 first row when LAST_VALUE doesn't give you the partition's last row.
 
 Are there any errata for SQL2008 yet?  Can anyone check the actual
 behavior of DB2 or other DBMS's that claim to implement these functions?
 
 I notice that the current patch code seems to implement
 first/last/nth_value using the frame, but lead/lag using the partition,
 which doesn't conform to spec AFAICS ... but lead/lag on the frame
 doesn't actually appear to be a useful definition so I'd rather go
 with that than with what the letter of the spec seems to say.
 
 Lastly, for a simple aggregate used with an OVER clause, the current
 patch seems to define the aggregate as being taken over the frame
 rather than the partition, but I cannot find anything in SQL2008 that
 lends any support to *either* definition.
 
 Comments?  This all seems rather badly broken.

Was this dealt with?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2009-01-03 Thread Peter Eisentraut
On Saturday 27 December 2008 20:32:10 Ron Mayer wrote:
 ISTM ISO should hire you guys (or the postgres project as a whole)
 to proof-read their specs before they publish them.

The way it really works though, effectively, is that vendors hire ISO to 
publish their specs.

Having a few inconsistencies in 2000 pages of language specification with 20 
years of legacy around it isn't so bad IMO, considering that there are really 
only a handful of guys working on this with any intensity.

If we cared enough, we could submit these sorts of issues to the committee for 
clarification or correction.  If anyone is convinced enough about this 
particular case, I can try to relay it and see what happens.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread Hitoshi Harada
2008/12/27 Tom Lane t...@sss.pgh.pa.us:
 I notice that the current patch code seems to implement
 first/last/nth_value using the frame, but lead/lag using the partition,
 which doesn't conform to spec AFAICS ... but lead/lag on the frame
 doesn't actually appear to be a useful definition so I'd rather go
 with that than with what the letter of the spec seems to say.

In 4.15, it says:

The lead and lag functions each take three arguments, a value
expression VE, an exact numeric literal
OFFSET, and a value expression DEFAULT. For each row R within the
*window partition P of R* defined by
a window structure descriptor, the lag function returns the value of
VE evaluated on a row that is OFFSET
number of rows before R within P,

for lead/lag, and

returns the value of VE evaluated on the n-th row from the first (if
FROM FIRST is specified or implied) or the last (if FROM LAST is
specified) row of the *window frame* of R
defined by a window structure descriptor

for nth_value, added * by me.

I understand lead/lag can affect all rows in the partition whereas
first/last/nth_value does only rows in the frame.

I guess that's why 6.10 rule 6.b forbids frame caluse in lead/lag but
actually we can ignore frame though the frame is specified in the
window attached with lead/lag, and it is better if you call aggregate
with frame and lead/lag on the same window spec, which allows us to
optimize it by calling them on the same node. It violates the spec but
we'd better to extend the standard like the offset argument of
lead/lag.


Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread Tom Lane
Hitoshi Harada umi.tan...@gmail.com writes:
 2008/12/27 Tom Lane t...@sss.pgh.pa.us:
 I notice that the current patch code seems to implement
 first/last/nth_value using the frame, but lead/lag using the partition,
 which doesn't conform to spec AFAICS ...

 In 4.15, it says:

 The lead and lag functions each take three arguments, a value
 expression VE, an exact numeric literal
 OFFSET, and a value expression DEFAULT. For each row R within the
 *window partition P of R* defined by
 a window structure descriptor, the lag function returns the value of
 VE evaluated on a row that is OFFSET
 number of rows before R within P,

Well, that's interesting, but I think the controlling definition is in
6.10 general rule 1b, which very clearly states that the frame is to be
used for lead/lag (and the adjacent rules say the same for all the other
standard window functions).

The wording in 4.15 does seem like evidence that the spec authors may
have misspoke in 6.10, but we're never going to settle it from the text
of the spec.  Can anyone check what DB2 and Oracle do here?

In any case, both sections agree that last_value works on the frame,
which makes it effectively useless with the default frame definition.
So I'm still thinking that we need at least a subset of frame support.
I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED
options for each end of the frame.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread Hitoshi Harada
2008/12/28 Tom Lane t...@sss.pgh.pa.us:
 Hitoshi Harada umi.tan...@gmail.com writes:
 2008/12/27 Tom Lane t...@sss.pgh.pa.us:
 I notice that the current patch code seems to implement
 first/last/nth_value using the frame, but lead/lag using the partition,
 which doesn't conform to spec AFAICS ...

 In 4.15, it says:

 The lead and lag functions each take three arguments, a value
 expression VE, an exact numeric literal
 OFFSET, and a value expression DEFAULT. For each row R within the
 *window partition P of R* defined by
 a window structure descriptor, the lag function returns the value of
 VE evaluated on a row that is OFFSET
 number of rows before R within P,

 Well, that's interesting, but I think the controlling definition is in
 6.10 general rule 1b, which very clearly states that the frame is to be
 used for lead/lag (and the adjacent rules say the same for all the other
 standard window functions).

 The wording in 4.15 does seem like evidence that the spec authors may
 have misspoke in 6.10, but we're never going to settle it from the text
 of the spec.  Can anyone check what DB2 and Oracle do here?

I tested on Oracle 10.2.0, and the results are:

select depname, empno, salary,
lead(salary, 1) over (order by salary),
lag(salary, 1) over (order by salary),
first_value(salary) over (order by salary),
last_value(salary) over (order by salary)
from empsalary;

DEPNAME  EMPNO   SALARY  LEAD(SALARY,1)OVER(ORDERBYSALARY)  
LAG(SALARY,1)OVER(ORDERBYSALARY)
FIRST_VALUE(SALARY)OVER(ORDERBYSALARY)  
LAST_VALUE(SALARY)OVER(ORDERBYSALARY)
personnel5   35003900    35003500
personnel2   39004200350035003900
develop  7   42004500390035004200
develop  9   45004800420035004500
sales4   48004800450035004800
sales3   48005000480035004800
sales1   50005200480035005000
develop  10  52005200500035005200
develop  11  52006000520035005200
develop  8   6000    520035006000

which means the section 4.15 is true. Could anyone try DB2?

 In any case, both sections agree that last_value works on the frame,
 which makes it effectively useless with the default frame definition.
 So I'm still thinking that we need at least a subset of frame support.
 I'm tempted to propose that we just handle the CURRENT ROW and UNBOUNDED
 options for each end of the frame.

If we can afford it, I and many users are so glad with it. In my
opinion it is not so painful if we had done concrete fundamentals of
window functions, but isn't it up to our time for the release?


Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread Ron Mayer
Hitoshi Harada wrote:
 2008/12/28 Tom Lane t...@sss.pgh.pa.us:
 Hitoshi Harada umi.tan...@gmail.com writes:
 2008/12/27 Tom Lane t...@sss.pgh.pa.us:
 which doesn't conform to spec AFAICS ...
 4.15...says:
 interesting...6.10 general rule 1b, which very clearly states ...
 ... 4.15 does seem like evidence that the spec authors may
 have misspoke in 6.10
 Oracle... results are: ... which means the section 4.15 is true

ISTM ISO should hire you guys (or the postgres project as a whole)
to proof-read their specs before they publish them.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread David Rowley
Hitoshi Harada wrote:
 I tested on Oracle 10.2.0, and the results are:
 
 select depname, empno, salary,
 lead(salary, 1) over (order by salary),
 lag(salary, 1) over (order by salary),
 first_value(salary) over (order by salary),
 last_value(salary) over (order by salary)
 from empsalary;
 
 DEPNAMEEMPNO   SALARY  LEAD(SALARY,1)OVER(ORDERBYSALARY)
 LAG(SALARY,1)OVER(ORDERBYSALARY)
 FIRST_VALUE(SALARY)OVER(ORDERBYSALARY)
 LAST_VALUE(SALARY)OVER(ORDERBYSALARY)
 personnel  5   35003900    35003500
 personnel  2   39004200350035003900
 develop7   42004500390035004200
 develop9   45004800420035004500
 sales  4   48004800450035004800
 sales  3   48005000480035004800
 sales  1   50005200480035005000
 develop10  52005200500035005200
 develop11  52006000520035005200
 develop8   6000    520035006000
 
 which means the section 4.15 is true. Could anyone try DB2?

DB2 9.5 results:

Using the empsalary table from the regression test in the patch:

select depname, empno, salary,
lead(salary, 1) over (order by salary),
lag(salary, 1) over (order by salary),
first_value(salary) over (order by salary),
last_value(salary) over (order by salary)
from empsalary;

personnel   5   3500390035003500
personnel   2   39004200350035003900
develop 7   42004500390035004200
develop 9   45004800420035004500
sales   4   48004800450035004800
sales   3   48005000480035004800
sales   1   50005200480035005000
develop 10  52005200500035005200
develop 11  52006000520035005200
develop 8   6000520035006000


Which matches with your Oracle results. So either they both got it wrong by
one copying the other... sarcasm Of course we all know it couldn't be
Oracle copying IBM, that would never happen... /sarcasm

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread David Rowley
Hitoshi Harada wrote:
 I tested on Oracle 10.2.0, and the results are:
 
 select depname, empno, salary,
 lead(salary, 1) over (order by salary),
 lag(salary, 1) over (order by salary),
 first_value(salary) over (order by salary),
 last_value(salary) over (order by salary)
 from empsalary;
 
 DEPNAMEEMPNO   SALARY  LEAD(SALARY,1)OVER(ORDERBYSALARY)
 LAG(SALARY,1)OVER(ORDERBYSALARY)
 FIRST_VALUE(SALARY)OVER(ORDERBYSALARY)
 LAST_VALUE(SALARY)OVER(ORDERBYSALARY)
 personnel  5   35003900    35003500
 personnel  2   39004200350035003900
 develop7   42004500390035004200
 develop9   45004800420035004500
 sales  4   48004800450035004800
 sales  3   48005000480035004800
 sales  1   50005200480035005000
 develop10  52005200500035005200
 develop11  52006000520035005200
 develop8   6000    520035006000
 
 which means the section 4.15 is true. Could anyone try DB2?

DB2 9.5 results:

Using the empsalary table from the regression test in the patch:

select depname, empno, salary,
lead(salary, 1) over (order by salary),
lag(salary, 1) over (order by salary),
first_value(salary) over (order by salary),
last_value(salary) over (order by salary)
from empsalary;

personnel   5   3500390035003500
personnel   2   39004200350035003900
develop 7   42004500390035004200
develop 9   45004800420035004500
sales   4   48004800450035004800
sales   3   48005000480035004800
sales   1   50005200480035005000
develop 10  52005200500035005200
develop 11  52006000520035005200
develop 8   6000520035006000


Which matches with your Oracle results. So either they both got it wrong by
one copying the other... sarcasm Of course we all know it couldn't be
Oracle copying IBM, that would never happen... /sarcasm

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-27 Thread Tom Lane
David Rowley dgrow...@gmail.com writes:
 Hitoshi Harada wrote:
 I tested on Oracle 10.2.0, and the results are:
 ...
 which means the section 4.15 is true. Could anyone try DB2?

 DB2 9.5 results [ are the same ]

OK, good, that means the reference to the frame in 6.10 rule 1b is just
a copy-and-pasteo.  (I wonder if it got fixed in the final spec?)
I guess this makes sense because we also see that 6.10 syntax rule 6b
forbids a framing spec on lead/lag, which makes sense if these functions
ignore the frame and no sense otherwise.

I also realized after more thought that the way to get frame = partition
with a default frame spec is to omit any ORDER BY in the window spec.
So my concern about LAST_VALUE being useless without framing ability is
unfounded, and I withdraw the complaint that we need to put in some
minimal framing features.  But we'll have to be careful to document all
this properly.  Also, it does seem that it might be worthwhile to try to
ensure that combinations of windows that have the same PARTITION list
and empty vs nonempty ORDER BY get optimized well.  Right now it's
dependent on ordering of the WindowClause whether you pay an extra sort
for that case or not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-26 Thread Tom Lane
According to SQL2008 section 7.11 window clause, general rule 5, the
default definition of window framing in a window that has an ordering
clause but no framing (RANGE/ROWS) clause is that the window frame for
a given row R runs from the first row of its partition through the last
peer of R.

Section 6.10's general rules define the results of LEAD, LAG,
FIRST_VALUE, LAST_VALUE, NTH_VALUE in terms of the rows available in the
window frame of the current window, not its partition.

Meanwhile, section 6.10 window function syntax rule 6 says that
LEAD/LAG must use a window that has an ordering clause and no
framing clause.

This means that without an explicit framing clause, none of these
functions can look beyond the last peer of the current row; and
what's worse, LEAD/LAG seem to be explicitly forbidden from looking
further than that even if we had an implementation of framing clauses.

This seems to be less than sane.  I would certainly expect that LEAD(x)
gives you the next value of x regardless of peer-row status, since
LAG(x) gives you the prior value of x regardless of peer row status.
It is also simply bizarre for FIRST_VALUE to give you the partition's
first row when LAST_VALUE doesn't give you the partition's last row.

Are there any errata for SQL2008 yet?  Can anyone check the actual
behavior of DB2 or other DBMS's that claim to implement these functions?

I notice that the current patch code seems to implement
first/last/nth_value using the frame, but lead/lag using the partition,
which doesn't conform to spec AFAICS ... but lead/lag on the frame
doesn't actually appear to be a useful definition so I'd rather go
with that than with what the letter of the spec seems to say.

Lastly, for a simple aggregate used with an OVER clause, the current
patch seems to define the aggregate as being taken over the frame
rather than the partition, but I cannot find anything in SQL2008 that
lends any support to *either* definition.

Comments?  This all seems rather badly broken.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Frames vs partitions: is SQL2008 completely insane?

2008-12-26 Thread Tom Lane
I wrote:
 Lastly, for a simple aggregate used with an OVER clause, the current
 patch seems to define the aggregate as being taken over the frame
 rather than the partition, but I cannot find anything in SQL2008 that
 lends any support to *either* definition.

Never mind that --- I found it in 10.9 syntax rule 4.b.

But what this seems to boil down to is that LEAD() and LAST_VALUE()
are completely useless unless you're allowed to specify a nondefault
framing clause ... and don't mind ignoring the clearly-insane
restriction of 6.10 syntax rule 6.b.

The minimum extra functionality needed to make these functions useful
would seem to be to allow UNBOUNDED FOLLOWING ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers