Re: [GENERAL] select random order by random
On Nov 1, 2007, at 18:57, Tom Lane wrote: In the usual tradition of SQL99, the spec text is enormously less readable than SQL92 was, but I *think* this says nearly the same thing as what we do: a plain column reference in ORDER BY is first sought as an output column name, and failing that sought as a column name of one of the input tables. They are more restrictive than we are but that's OK. For the particular issue at hand here, it seems to me that 18.f.i.2.B dictates that a sort key matching an output column be treated as a reference to the column, not as an independently evaluated expression. Admittedly they are not talking about volatile functions per se, but I think there's some defense here for the way our parser does it. But the described behavior (or rather its obvious extension to Postgres) does not seem to match the OP's later example: select random() as xxx, random() from generate_series(1, 10) order by random(); xxx| random ---+ 0.117905601913997 | 0.587338728172397 0.167445760298262 | 0.183822357647038 0.212947336590359 | 0.726537112484936 0.215260865732683 | 0.57848364467662 0.503411483719671 | 0.51932220557673 0.783855747796528 | 0.366456756538924 0.803222402838628 | 0.0357640516179446 0.917076966221015 | 0.918215564414028 0.937211547017662 | 0.146829404470897 0.987405426328725 | 0.308503020232778 Clearly the sort key is matched to the first output column, despite its renaming. Contrast this with ... order by random; // plain column reference This substantially breaks the principle of least surprise for me. Caveat - this is on 7.4 (sigh), perhaps more modern versions have different behavior. - John D. Burger MITRE ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] select random order by random
Chris Browne wrote: If I replicate your query, with extra columns, AND NAMES, I get the following: [EMAIL PROTECTED]:5433=# select random() as r1, random() as r2, random() as r3 from generate_series(1,10) order by random(); r1 | r2 |r3 ++--- 0.0288224648684263 | 0.904462072532624 | 0.27792159980163 0.144174488261342 | 0.406729203648865 | 0.452183415647596 ... It is indeed somewhat curious that the query parser chose to interpret that the order by random() was referring to column #1. And even more curiously, IMHO, even specifying column names isn't enough. Note that this: li=# select * from (select (random()*10)::int as a, (random()*10)::int as b from generate_series(1,10) order by a) as x order by b; a | b ---+ 0 | 8 1 | 10 3 | 4 4 | 8 5 | 1 5 | 9 6 | 4 6 | 5 8 | 4 9 | 0 (10 rows) is sorted by a even though the outermost order by clause explicitly said to order by b. Seems like it's a known odd behavior ... http://archives.postgresql.org/pgsql-general/2006-11/msg01523.php http://archives.postgresql.org/pgsql-general/2006-11/msg01539.php http://archives.postgresql.org/pgsql-general/2006-11/msg01544.php ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] select random order by random
Dear sirs, I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); I thought I would receive ten random numbers in random order. But I received ten random numbers sorted numerically: random --- 0.102324520237744 0.17704638838768 0.533014383167028 0.60182224214077 0.644065519794822 0.750732169486582 0.821376844774932 0.88221683120355 0.889879426918924 0.924697323236614 (10 rows) I don't understand - why the result is like that? It seems like in each row both random()s were giving the same result. Why is it like that? What caused it? -- Piotr Sobolewski http://www.piotrsobolewski.w.pl ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] select random order by random
piotr_sobolewski wrote: Dear sirs, I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); I thought I would receive ten random numbers in random order. But I received ten random numbers sorted numerically: random --- 0.102324520237744 0.17704638838768 0.533014383167028 0.60182224214077 0.644065519794822 0.750732169486582 0.821376844774932 0.88221683120355 0.889879426918924 0.924697323236614 (10 rows) I don't understand - why the result is like that? It seems like in each row both random()s were giving the same result. Why is it like that? What caused it? Your query specifically requested that the result be ordered by the column random in the result set (the default ordering direction being ASC). Your query is semantically identical to: SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo ASC; I should think that you would get a better result if you dropped the ORDER BY clause. brian ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] select random order by random
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-general- [EMAIL PROTECTED] On Behalf Of piotr_sobolewski Sent: Thursday, November 01, 2007 9:25 AM To: pgsql-general@postgresql.org Subject: [GENERAL] select random order by random Dear sirs, I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); I thought I would receive ten random numbers in random order. But I received ten random numbers sorted numerically: random --- 0.102324520237744 0.17704638838768 0.533014383167028 0.60182224214077 0.644065519794822 0.750732169486582 0.821376844774932 0.88221683120355 0.889879426918924 0.924697323236614 (10 rows) I don't understand - why the result is like that? It seems like in each row both random()s were giving the same result. Why is it like that? What caused it? -- Piotr Sobolewski http://www.piotrsobolewski.w.pl ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match [Lee Keel] Would this not have to do with the 'order by' you added to the end of the statement? If you remove the order by clause, then it works for me... -LK This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] select random order by random
On 11/1/07, Lee Keel [EMAIL PROTECTED] wrote: Dear sirs, I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); I thought I would receive ten random numbers in random order. But I received ten random numbers sorted numerically: random --- 0.102324520237744 0.17704638838768 0.533014383167028 0.60182224214077 0.644065519794822 0.750732169486582 0.821376844774932 0.88221683120355 0.889879426918924 0.924697323236614 (10 rows) I don't understand - why the result is like that? It seems like in each row both random()s were giving the same result. Why is it like that? What caused it? Would this not have to do with the 'order by' you added to the end of the statement? If you remove the order by clause, then it works for me... I think that Piotr expected the random() to be evaluated in both places separately. My guess is that it was recognized by the planner as the same function and evaluated once per row only. If you try this: select random() from generate_series(1, 10) order by random()*1; then you'll get random ordering. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] select random order by random
Scott Marlowe [EMAIL PROTECTED] writes: I think that Piotr expected the random() to be evaluated in both places separately. My guess is that it was recognized by the planner as the same function and evaluated once per row only. If you try this: select random() from generate_series(1, 10) order by random()*1; then you'll get random ordering. This does strike me as wrong. random() is marked volatile and the planner ought not collapse multiple calls into one. Note that it affects other volatile functions too: postgres=# select nextval('s') from generate_series(1, 10) order by nextval('s'); nextval - 1 2 3 4 5 6 7 8 9 10 (10 rows) postgres=# select nextval('s') from generate_series(1, 10) order by nextval('s'); nextval - 11 12 13 14 15 16 17 18 19 20 (10 rows) That's certainly not how I remembered it working but I'm not sure I ever tested it before. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] select random order by random
Gregory Stark wrote: Scott Marlowe [EMAIL PROTECTED] writes: I think that Piotr expected the random() to be evaluated in both places separately. My guess is that it was recognized by the planner as the same function and evaluated once per row only. If you try this: select random() from generate_series(1, 10) order by random()*1; then you'll get random ordering. This does strike me as wrong. random() is marked volatile and the planner ought not collapse multiple calls into one. I think I agree with the earlier poster. Surely these two queries should be equivalent? SELECT random()FROM generate_series(1, 10) ORDER BY random(); SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo; -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] select random order by random
On Thu, Nov 01, 2007 at 04:49:16PM +, Richard Huxton wrote: Gregory Stark wrote: This does strike me as wrong. random() is marked volatile and the planner ought not collapse multiple calls into one. I think I agree with the earlier poster. Surely these two queries should be equivalent? SELECT random()FROM generate_series(1, 10) ORDER BY random(); SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo; If they were pure (in the functional programming sense) then this would be a correct optimisation. However, if they're marked as volatile then they should be called independently---they're not pure anymore and you're calling the code for its side-effects and optimising out the either call changes the semantics. Try playing around with monads in Haskell or uniqueness types in Clean, they help to clarify what's going on when you call a function in an impure language. Sam ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] select random order by random
Richard Huxton [EMAIL PROTECTED] writes: Gregory Stark wrote: This does strike me as wrong. random() is marked volatile and the planner ought not collapse multiple calls into one. I think I agree with the earlier poster. Surely these two queries should be equivalent? SELECT random()FROM generate_series(1, 10) ORDER BY random(); SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo; Well, the latter case is why it acts that way, but Greg has a point that when a volatile function is involved maybe they shouldn't be the same. OTOH it's always been like that, and in the absence of a clear reason to change it I'm inclined to leave it alone. (BTW, this is not the planner's fault; the collapsing of the two targetlist entries into one happens in the parser.) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] select random order by random
On Thursday 01 November 2007 17:08, brian wrote: I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); I don't understand - why the result is like that? It seems like in each row both random()s were giving the same result. Why is it like that? What caused it? Your query specifically requested that the result be ordered by the column random in the result set (the default ordering direction being ASC). Your query is semantically identical to: SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo ASC; I also had such theory. But if I do such query: select x from generate_series(1, 10) as x order by random(); the answer is shuffled in random order. So why in one case this random() is treaded as a column name and in second - as function name? And when I do such query: select random() as xxx, random() from generate_series(1, 10) order by random(); your theory would predict that the answer is ordered by the second column (as the first one is renamed to 'xxx'). However in reality the answer is in random order. I should think that you would get a better result if you dropped the ORDER BY clause. Yes, I know. However, once I made such request just for fun and curiosity, and found that I don't know why does it work like that. And since then I think about it and try to understand it - if in this case Postgres behaves the way I don't understand, I probably don't understand it well at all. -- Piotr Sobolewski http://www.piotrsobolewski.w.pl ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] select random order by random
On Thursday 01 November 2007 17:16, Scott Marlowe wrote: I was very surprised when I executed such SQL query (under PostgreSQL 8.2): select random() from generate_series(1, 10) order by random(); (...) My guess is that it was recognized by the planner as the same function and evaluated once per row only. I also had such hypothesis, but I think that query: select random(), random() from generate_series(1, 10) order by random(); contradicts it. -- Piotr Sobolewski http://www.piotrsobolewski.w.pl ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] select random order by random
On Thu, Nov 01, 2007 at 02:22:58PM -0400, Tom Lane wrote: SELECT random()FROM generate_series(1, 10) ORDER BY random(); SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo; (BTW, this is not the planner's fault; the collapsing of the two targetlist entries into one happens in the parser.) Something twigged telling me that in fact the latter expression is not in standard SQL but a (very common) extension. A sort key is clearly indicated to be a value expression with no indication anywhere that column aliases are allowed here (though that may be in the common rules somewhere). Then again, I may be remembering all wrong... Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Those who make peaceful revolution impossible will make violent revolution inevitable. -- John F Kennedy signature.asc Description: Digital signature
Re: [GENERAL] select random order by random
On 11/1/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote: On Thu, Nov 01, 2007 at 02:22:58PM -0400, Tom Lane wrote: SELECT random()FROM generate_series(1, 10) ORDER BY random(); SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo; (BTW, this is not the planner's fault; the collapsing of the two targetlist entries into one happens in the parser.) Something twigged telling me that in fact the latter expression is not in standard SQL but a (very common) extension. A sort key is clearly indicated to be a value expression with no indication anywhere that column aliases are allowed here (though that may be in the common rules somewhere). Well, the standard way I know if is to use column numbers. i.e.: select random() from generate_series(1,10) order by 1 That I'm pretty sure IS in the standard. Don't see why column aliases would be disallowed. It's not like the where clause where the select field doesn't exist when it fires. The select field list does exist when order by fires, so referring to it makes sense. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] select random order by random
Martijn van Oosterhout [EMAIL PROTECTED] writes: Something twigged telling me that in fact the latter expression is not in standard SQL but a (very common) extension. A sort key is clearly indicated to be a value expression with no indication anywhere that column aliases are allowed here (though that may be in the common rules somewhere). SQL92 says differently. The committee basically redefined ORDER BY entirely between SQL92 and SQL99. What we actually try to support is both SQL92 and SQL99 interpretations, which is a pretty unholy mess, but enough people (and programs) are used to the SQL92 way that I don't foresee being able to drop it. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] select random order by random
Scott Marlowe [EMAIL PROTECTED] writes: On 11/1/07, Martijn van Oosterhout [EMAIL PROTECTED] wrote: On Thu, Nov 01, 2007 at 02:22:58PM -0400, Tom Lane wrote: SELECT random()FROM generate_series(1, 10) ORDER BY random(); SELECT random() AS foo FROM generate_series(1, 10) ORDER BY foo; (BTW, this is not the planner's fault; the collapsing of the two targetlist entries into one happens in the parser.) Something twigged telling me that in fact the latter expression is not in standard SQL but a (very common) extension. A sort key is clearly indicated to be a value expression with no indication anywhere that column aliases are allowed here (though that may be in the common rules somewhere). Well, the standard way I know if is to use column numbers. i.e.: select random() from generate_series(1,10) order by 1 That I'm pretty sure IS in the standard. Don't see why column aliases would be disallowed. It's not like the where clause where the select field doesn't exist when it fires. The select field list does exist when order by fires, so referring to it makes sense. Well IIRC the standard requires the sort keys to be columns from the select list. You can't put any old expression there, only copies of the expressions used in the select list. So in the spec random() can't really be considered a second call to random(), it's just a retyped instance of the random() in the select list. That is, it's just a longwinded way of saying order by 1 (meaning column 1). So I guess having the parser do this substitution kind of makes sense if you're thinking about things the way the spec does. It doesn't make much sense if you're thinking the way Postgres does of having arbitrary expressions there independent of what's in the select list. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] select random order by random
Gregory Stark [EMAIL PROTECTED] writes: So I guess having the parser do this substitution kind of makes sense if you're thinking about things the way the spec does. It doesn't make much sense if you're thinking the way Postgres does of having arbitrary expressions there independent of what's in the select list. Again: this is not Postgres vs the spec, it is SQL92 vs SQL99. I draw your attention to the relevant text... SQL92: order by clause ::= ORDER BY sort specification list sort specification list ::= sort specification [ { comma sort specification }... ] sort specification ::= sort key [ collate clause ] [ ordering specification ] sort key ::= column name | unsigned integer ordering specification ::= ASC | DESC ... 10)If ORDER BY is specified, then each sort specification in the order by clause shall identify a column of T. Case: a) If a sort specification contains a column name, then T shall contain exactly one column with that column name and the sort specification identifies that column. b) If a sort specification contains an unsigned integer, then the unsigned integer shall be greater than 0 and not greater than the degree of T. The sort specification iden- tifies the column of T with the ordinal position specified by the unsigned integer. (T is the table emitted by the SELECT.) SQL99: order by clause ::= ORDER BY sort specification list sort specification list ::= sort specification [ { comma sort specification }... ] sort specification ::= sort key [ collate clause ] [ ordering specification ] sort key ::= value expression ordering specification ::= ASC | DESC 18) If an order by clause is specified, then: a) Let K(i) be the sort key contained in the i-th sort specification. b) Let DT be the declared type of K(i). c) If DT is a user-defined type, then the comparison form of DT shall be FULL. d) K(i) shall not be a literal. e) If QE is a query expression body that is a non-join query expression that is a non-join query term that is a non- join query primary that is a simple table that is a query specification, then the cursor specification is said to be a simple table query. f) Case: i) If sort specification list contains any sort key K(i) that contains a column reference to a column that is not a column of T, then: 1) The cursor specification shall be a simple table query. 2) Case: A) If K(i) is not equivalent to a value expression immediately contained in any derived column in the select list SL of query specification QS contained in QE, then: I) T shall not be a grouped table. II) QS shall not specify the set quantifier DISTINCT or directly contain one or more set function specifications. III) Let C(j) be a column that is not a column of T and whose column reference is contained in some K(i). IV) Let SKL be the list of derived columns that are column names of column references to every C(j). The columns C(j) are said to be extended sort key columns. V) Let TE be the table expression immediately contained in QS. VI) Let ST be the result of evaluating the query specification: SELECT SL, SKL FROM TE B) Otherwise: I) Let ST be T. II) For every derived column DC(e) of SL that is equivalent to K(i), if DC(e) has a column name, then let CN(e) be that column name; otherwise: 1) Let CN(e) be an implementation-defined column name that is not equal to any column name of any column of ST. 2) DC(e) is effectively replaced by DE(e) AS CN(e) in the select list of ST, where DE(e) is the derived element of DC(e). III) K(i) is effectively replaced by CN(e). ii) Otherwise, let ST be T. g) ST is said to be a sort table.