Re: [GENERAL] select random order by random

2007-11-02 Thread John D. Burger


On Nov 1, 2007, at 18:57, Tom Lane wrote:


In the usual tradition of SQL99, the spec text is enormously less
readable than SQL92 was, but I *think* this says nearly the same thing
as what we do: a plain column reference in ORDER BY is first sought as
an output column name, and failing that sought as a column name of one
of the input tables.  They are more restrictive than we are but that's
OK.

For the particular issue at hand here, it seems to me that 18.f.i.2.B
dictates that a sort key matching an output column be treated as a
reference to the column, not as an independently evaluated expression.
Admittedly they are not talking about volatile functions per se, but
I think there's some defense here for the way our parser does it.


But the described behavior (or rather its obvious extension to  
Postgres) does not seem to match the OP's later example:


 select random() as xxx, random() from generate_series(1, 10) order  
by random();


xxx|   random
---+
0.117905601913997 |  0.587338728172397
0.167445760298262 |  0.183822357647038
0.212947336590359 |  0.726537112484936
0.215260865732683 |   0.57848364467662
0.503411483719671 |   0.51932220557673
0.783855747796528 |  0.366456756538924
0.803222402838628 | 0.0357640516179446
0.917076966221015 |  0.918215564414028
0.937211547017662 |  0.146829404470897
0.987405426328725 |  0.308503020232778

Clearly the sort key is matched to the first output column, despite  
its renaming.  Contrast this with


  ... order by random;  // plain column reference

This substantially breaks the principle of least surprise for me.

Caveat - this is on 7.4 (sigh), perhaps more modern versions have  
different behavior.


- John D. Burger
  MITRE



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] select random order by random

2007-11-02 Thread Ron Mayer
Chris Browne wrote:
 If I replicate your query, with extra columns, AND NAMES, I get the following:
 
 [EMAIL PROTECTED]:5433=# select random() as r1, random() as r2, random() as 
 r3 from generate_series(1,10) order by random();
  r1 | r2 |r3 
 ++---
  0.0288224648684263 |  0.904462072532624 |  0.27792159980163
   0.144174488261342 |  0.406729203648865 | 0.452183415647596
  ...
 It is indeed somewhat curious that the query parser chose to interpret
 that the order by random() was referring to column #1.

And even more curiously, IMHO, even specifying
column names isn't enough.  Note that this:

li=# select * from (select (random()*10)::int as a, (random()*10)::int as b 
from generate_series(1,10) order by a) as x order by b;
 a | b
---+
 0 |  8
 1 | 10
 3 |  4
 4 |  8
 5 |  1
 5 |  9
 6 |  4
 6 |  5
 8 |  4
 9 |  0
(10 rows)

is sorted by a even though the outermost order by
clause explicitly said to order by b.

Seems like it's a known odd behavior ...
http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php
http://archives.postgresql.org/pgsql-general/2006-11/msg01539.php
http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] select random order by random

2007-11-01 Thread piotr_sobolewski
Dear sirs, 

I was very surprised when I executed such SQL query (under PostgreSQL 8.2):
select random() from generate_series(1, 10) order by random();

I thought I would receive ten random numbers in random order. But I received 
ten random numbers sorted numerically:
  random
---
 0.102324520237744
  0.17704638838768
 0.533014383167028
  0.60182224214077
 0.644065519794822
 0.750732169486582
 0.821376844774932
  0.88221683120355
 0.889879426918924
 0.924697323236614
(10 rows)

I don't understand - why the result is like that? It seems like in each row 
both random()s were giving the same result. Why is it like that? What caused 
it?

-- 
Piotr Sobolewski
http://www.piotrsobolewski.w.pl


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] select random order by random

2007-11-01 Thread brian

piotr_sobolewski wrote:
Dear sirs, 


I was very surprised when I executed such SQL query (under PostgreSQL 8.2):
select random() from generate_series(1, 10) order by random();

I thought I would receive ten random numbers in random order. But I received 
ten random numbers sorted numerically:

  random
---
 0.102324520237744
  0.17704638838768
 0.533014383167028
  0.60182224214077
 0.644065519794822
 0.750732169486582
 0.821376844774932
  0.88221683120355
 0.889879426918924
 0.924697323236614
(10 rows)

I don't understand - why the result is like that? It seems like in each row 
both random()s were giving the same result. Why is it like that? What caused 
it?




Your query specifically requested that the result be ordered by the 
column random in the result set (the default ordering direction being 
ASC). Your query is semantically identical to:


SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo ASC;

I should think that you would get a better result if you dropped the 
ORDER BY clause.


brian

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] select random order by random

2007-11-01 Thread Lee Keel
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-general-
 [EMAIL PROTECTED] On Behalf Of piotr_sobolewski
 Sent: Thursday, November 01, 2007 9:25 AM
 To: pgsql-general@postgresql.org
 Subject: [GENERAL] select random order by random
 
 Dear sirs,
 
 I was very surprised when I executed such SQL query (under PostgreSQL
 8.2):
 select random() from generate_series(1, 10) order by random();
 
 I thought I would receive ten random numbers in random order. But I
 received
 ten random numbers sorted numerically:
   random
 ---
  0.102324520237744
   0.17704638838768
  0.533014383167028
   0.60182224214077
  0.644065519794822
  0.750732169486582
  0.821376844774932
   0.88221683120355
  0.889879426918924
  0.924697323236614
 (10 rows)
 
 I don't understand - why the result is like that? It seems like in each
 row
 both random()s were giving the same result. Why is it like that? What
 caused
 it?
 
 --
 Piotr Sobolewski
 http://www.piotrsobolewski.w.pl
 
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
[Lee Keel] 

Would this not have to do with the 'order by' you added to the end of the
statement?  If you remove the order by clause, then it works for me...

-LK


This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this email in error please notify the sender. This message 
contains confidential information and is intended only for the individual 
named. If you are not the named addressee you should not disseminate, 
distribute or copy this e-mail.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] select random order by random

2007-11-01 Thread Scott Marlowe
On 11/1/07, Lee Keel [EMAIL PROTECTED] wrote:
  Dear sirs,
 
  I was very surprised when I executed such SQL query (under PostgreSQL
  8.2):
  select random() from generate_series(1, 10) order by random();
 
  I thought I would receive ten random numbers in random order. But I
  received
  ten random numbers sorted numerically:
random
  ---
   0.102324520237744
0.17704638838768
   0.533014383167028
0.60182224214077
   0.644065519794822
   0.750732169486582
   0.821376844774932
0.88221683120355
   0.889879426918924
   0.924697323236614
  (10 rows)
 
  I don't understand - why the result is like that? It seems like in each
  row
  both random()s were giving the same result. Why is it like that? What
  caused
  it?

 Would this not have to do with the 'order by' you added to the end of the
 statement?  If you remove the order by clause, then it works for me...

I think that Piotr expected the random() to be evaluated in both
places separately.

My guess is that it was recognized by the planner as the same function
and evaluated once per row only.

If you try this:

select random() from generate_series(1, 10) order by random()*1;

then you'll get random ordering.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] select random order by random

2007-11-01 Thread Gregory Stark
Scott Marlowe [EMAIL PROTECTED] writes:

 I think that Piotr expected the random() to be evaluated in both
 places separately.

 My guess is that it was recognized by the planner as the same function
 and evaluated once per row only.

 If you try this:

 select random() from generate_series(1, 10) order by random()*1;

 then you'll get random ordering.

This does strike me as wrong. random() is marked volatile and the planner
ought not collapse multiple calls into one. Note that it affects other
volatile functions too:

postgres=#  select nextval('s') from generate_series(1, 10) order by 
nextval('s');
 nextval 
-
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10
(10 rows)

postgres=#  select nextval('s') from generate_series(1, 10) order by 
nextval('s');
 nextval 
-
  11
  12
  13
  14
  15
  16
  17
  18
  19
  20
(10 rows)

That's certainly not how I remembered it working but I'm not sure I ever
tested it before.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] select random order by random

2007-11-01 Thread Richard Huxton

Gregory Stark wrote:

Scott Marlowe [EMAIL PROTECTED] writes:


I think that Piotr expected the random() to be evaluated in both
places separately.

My guess is that it was recognized by the planner as the same function
and evaluated once per row only.

If you try this:

select random() from generate_series(1, 10) order by random()*1;

then you'll get random ordering.


This does strike me as wrong. random() is marked volatile and the planner
ought not collapse multiple calls into one. 


I think I agree with the earlier poster. Surely these two queries should 
be equivalent?


SELECT random()FROM generate_series(1, 10) ORDER BY random();
SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo;


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] select random order by random

2007-11-01 Thread Sam Mason
On Thu, Nov 01, 2007 at 04:49:16PM +, Richard Huxton wrote:
 Gregory Stark wrote:
 This does strike me as wrong. random() is marked volatile and the planner
 ought not collapse multiple calls into one. 
 
 I think I agree with the earlier poster. Surely these two queries should 
 be equivalent?
 
 SELECT random()FROM generate_series(1, 10) ORDER BY random();
 SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo;

If they were pure (in the functional programming sense) then this would
be a correct optimisation.  However, if they're marked as volatile then
they should be called independently---they're not pure anymore and
you're calling the code for its side-effects and optimising out the
either call changes the semantics.  Try playing around with monads in
Haskell or uniqueness types in Clean, they help to clarify what's going
on when you call a function in an impure language.


  Sam

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] select random order by random

2007-11-01 Thread Tom Lane
Richard Huxton [EMAIL PROTECTED] writes:
 Gregory Stark wrote:
 This does strike me as wrong. random() is marked volatile and the planner
 ought not collapse multiple calls into one. 

 I think I agree with the earlier poster. Surely these two queries should 
 be equivalent?

 SELECT random()FROM generate_series(1, 10) ORDER BY random();
 SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo;

Well, the latter case is why it acts that way, but Greg has a point that
when a volatile function is involved maybe they shouldn't be the same.
OTOH it's always been like that, and in the absence of a clear reason
to change it I'm inclined to leave it alone.

(BTW, this is not the planner's fault; the collapsing of the two
targetlist entries into one happens in the parser.)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] select random order by random

2007-11-01 Thread piotr_sobolewski
On Thursday 01 November 2007 17:08, brian wrote:

  I was very surprised when I executed such SQL query (under PostgreSQL
  8.2): select random() from generate_series(1, 10) order by random();
 
  I don't understand - why the result is like that? It seems like in each
  row both random()s were giving the same result. Why is it like that? What
  caused it?

 Your query specifically requested that the result be ordered by the
 column random in the result set (the default ordering direction being
 ASC). Your query is semantically identical to:
 SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo ASC;

I also had such theory. But if I do such query:
select x from generate_series(1, 10) as x order by random();
the answer is shuffled in random order.
So why in one case this random() is treaded as a column name and in second - 
as function name?

And when I do such query:
select random() as xxx, random() from generate_series(1, 10) order by 
random();
your theory would predict that the answer is ordered by the second column (as 
the first one is renamed to 'xxx'). However in reality the answer is in 
random order.

 I should think that you would get a better result if you dropped the
 ORDER BY clause.

Yes, I know. However, once I made such request just for fun and curiosity, and 
found that I don't know why does it work like that. And since then I think 
about it and try to understand it - if in this case Postgres behaves the way 
I don't understand, I probably don't understand it well at all.

-- 
Piotr Sobolewski
http://www.piotrsobolewski.w.pl

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] select random order by random

2007-11-01 Thread piotr_sobolewski
On Thursday 01 November 2007 17:16, Scott Marlowe wrote:

   I was very surprised when I executed such SQL query (under PostgreSQL
   8.2):
   select random() from generate_series(1, 10) order by random();
  
 (...)
 My guess is that it was recognized by the planner as the same function
 and evaluated once per row only.

I also had such hypothesis, but I think that query:
select random(), random() from generate_series(1, 10) order by random();
contradicts it.

-- 
Piotr Sobolewski
http://www.piotrsobolewski.w.pl

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] select random order by random

2007-11-01 Thread Martijn van Oosterhout
On Thu, Nov 01, 2007 at 02:22:58PM -0400, Tom Lane wrote:
  SELECT random()FROM generate_series(1, 10) ORDER BY random();
  SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo;
 
 (BTW, this is not the planner's fault; the collapsing of the two
 targetlist entries into one happens in the parser.)

Something twigged telling me that in fact the latter expression is not
in standard SQL but a (very common) extension. A sort key is clearly
indicated to be a value expression with no indication anywhere that
column aliases are allowed here (though that may be in the common rules
somewhere).

Then again, I may be remembering all wrong...

Have a nice day,
-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 Those who make peaceful revolution impossible will make violent revolution 
 inevitable.
  -- John F Kennedy


signature.asc
Description: Digital signature


Re: [GENERAL] select random order by random

2007-11-01 Thread Scott Marlowe
On 11/1/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote:
 On Thu, Nov 01, 2007 at 02:22:58PM -0400, Tom Lane wrote:
   SELECT random()FROM generate_series(1, 10) ORDER BY random();
   SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo;
 
  (BTW, this is not the planner's fault; the collapsing of the two
  targetlist entries into one happens in the parser.)

 Something twigged telling me that in fact the latter expression is not
 in standard SQL but a (very common) extension. A sort key is clearly
 indicated to be a value expression with no indication anywhere that
 column aliases are allowed here (though that may be in the common rules
 somewhere).

Well, the standard way I know if is to use column numbers.  i.e.:

select random() from generate_series(1,10) order by 1

That I'm pretty sure IS in the standard.  Don't see why column aliases
would be disallowed.  It's not like the where clause where the select
field doesn't exist when it fires.  The select field list does exist
when order by fires, so referring to it makes sense.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] select random order by random

2007-11-01 Thread Tom Lane
Martijn van Oosterhout [EMAIL PROTECTED] writes:
 Something twigged telling me that in fact the latter expression is not
 in standard SQL but a (very common) extension. A sort key is clearly
 indicated to be a value expression with no indication anywhere that
 column aliases are allowed here (though that may be in the common rules
 somewhere).

SQL92 says differently.  The committee basically redefined ORDER BY
entirely between SQL92 and SQL99.

What we actually try to support is both SQL92 and SQL99 interpretations,
which is a pretty unholy mess, but enough people (and programs) are used
to the SQL92 way that I don't foresee being able to drop it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] select random order by random

2007-11-01 Thread Gregory Stark
Scott Marlowe [EMAIL PROTECTED] writes:

 On 11/1/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote:
 On Thu, Nov 01, 2007 at 02:22:58PM -0400, Tom Lane wrote:
   SELECT random()FROM generate_series(1, 10) ORDER BY random();
   SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo;
 
  (BTW, this is not the planner's fault; the collapsing of the two
  targetlist entries into one happens in the parser.)

 Something twigged telling me that in fact the latter expression is not
 in standard SQL but a (very common) extension. A sort key is clearly
 indicated to be a value expression with no indication anywhere that
 column aliases are allowed here (though that may be in the common rules
 somewhere).

 Well, the standard way I know if is to use column numbers.  i.e.:

 select random() from generate_series(1,10) order by 1

 That I'm pretty sure IS in the standard.  Don't see why column aliases
 would be disallowed.  It's not like the where clause where the select
 field doesn't exist when it fires.  The select field list does exist
 when order by fires, so referring to it makes sense.

Well IIRC the standard requires the sort keys to be columns from the select
list. You can't put any old expression there, only copies of the expressions
used in the select list.

So in the spec random() can't really be considered a second call to
random(), it's just a retyped instance of the random() in the select list.
That is, it's just a longwinded way of saying order by 1 (meaning column 1).

So I guess having the parser do this substitution kind of makes sense if
you're thinking about things the way the spec does. It doesn't make much sense
if you're thinking the way Postgres does of having arbitrary expressions there
independent of what's in the select list.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] select random order by random

2007-11-01 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 So I guess having the parser do this substitution kind of makes sense
 if you're thinking about things the way the spec does. It doesn't make
 much sense if you're thinking the way Postgres does of having
 arbitrary expressions there independent of what's in the select list.

Again: this is not Postgres vs the spec, it is SQL92 vs SQL99.
I draw your attention to the relevant text...

SQL92:

 order by clause ::=
  ORDER BY sort specification list

 sort specification list ::=
  sort specification [ { comma sort specification }... ]

 sort specification ::=
  sort key [ collate clause  ] [ ordering specification ]

 sort key ::=
column name
  | unsigned integer

 ordering specification ::= ASC | DESC

...

 10)If ORDER BY is specified, then each sort specification in the
order by clause shall identify a column of T.

Case:

a) If a sort specification contains a column name, then T
  shall contain exactly one column with that column name and
  the sort specification identifies that column.

b) If a sort specification contains an unsigned integer,
  then the unsigned integer shall be greater than 0 and not
  greater than the degree of T. The sort specification iden-
  tifies the column of T with the ordinal position specified by
  the unsigned integer.

(T is the table emitted by the SELECT.)


SQL99:

 order by clause ::=
  ORDER BY sort specification list

 sort specification list ::=
  sort specification [ { comma sort specification }... ]

 sort specification ::=
  sort key [ collate clause ] [ ordering specification ]

 sort key ::=
  value expression

 ordering specification ::= ASC | DESC


18) If an order by clause is specified, then:

a) Let K(i) be the sort key contained in the i-th sort
  specification.

b) Let DT be the declared type of K(i).

c) If DT is a user-defined type, then the comparison form of DT
  shall be FULL.

d) K(i) shall not be a literal.

e) If QE is a query expression body that is a non-join query
  expression that is a non-join query term that is a non-
  join query primary that is a simple table that is a query
  specification, then the cursor specification is said to be
  a simple table query.

f) Case:

  i) If sort specification list contains any sort key K(i)
 that contains a column reference to a column that is not a
 column of T, then:

 1) The cursor specification shall be a simple table
   query.

 2) Case:

   A) If K(i) is not equivalent to a value expression
  immediately contained in any derived column in the
  select list SL of query specification QS contained
  in QE, then:

  I) T shall not be a grouped table.

 II) QS shall not specify the set quantifier DISTINCT
or directly contain one or more set function
specifications.

III) Let C(j) be a column that is not a column of T and
whose column reference is contained in some K(i).

 IV) Let SKL be the list of derived columns that are
column names of column references to every C(j).
The columns C(j) are said to be extended sort key
columns.

  V) Let TE be the table expression immediately
contained in QS.

 VI) Let ST be the result of evaluating the query
specification:

   SELECT SL, SKL FROM TE

   B) Otherwise:

  I) Let ST be T.

 II) For every derived column DC(e) of SL that is
equivalent to K(i), if DC(e) has a column name,
then let CN(e) be that column name; otherwise:

1) Let CN(e) be an implementation-defined column
   name that is not equal to any column name of
   any column of ST.

2) DC(e) is effectively replaced by DE(e) AS CN(e)
   in the select list of ST, where DE(e) is the
   derived element of DC(e).

III) K(i) is effectively replaced by CN(e).

 ii) Otherwise, let ST be T.

g) ST is said to be a sort table.