Re: [SQL] SQL challenge--top 10 for each key value?
Troels Arvin wrote: See http://www.acm.org/sigmod/record/issues/0403/index.html#standards for an article which summarizes the news in SQL:2003. This is a very useful page; thank you for creating it and for noting it in this thread! -- (Posted from an account used as a SPAM dump. If you really want to get in touch with me, dump the 'jboes' and substitute 'mur'.) Jeffery Boes [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] SQL challenge--top 10 for each key value?
No, it will not work twice in the same query as is. If you want to code two counter buckets and pass in some way to distinguish between the two yada yada yada it is possible. It is also possible to code this to do multi-level counting/breaks/calculations, etc. But the SD dictionary is by connection. So any values stored in it need to be initialized at the appropriate time *outside* of the first use. elein On Sun, Apr 11, 2004 at 12:38:20AM -0400, Greg Stark wrote: elein [EMAIL PROTECTED] writes: create or replace function pycounter(integer) returns integer as ' if args[0] == 0: SD[nextno] = 1 return SD[nextno] try: SD[nextno] += 1 except: SD[nextno] = 1 return SD[nextno] ' language 'plpythonu'; And clearly it can be done faster as a little C function. Does this approach have a hope of working if it's used twice in the same query? -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] SQL challenge--top 10 for each key value?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 How would you go about getting the top N (say, the top 10) for each query? Assume you have a table ch and three sequences 'aa', 'bb', and 'cc'. (Only 'aa' and 'bb' need to be initially set) SELECT setval('aa',1,'f'); SELECT setval('bb',1,'f'); SELECT nextval('cc') AS rating,q2 AS query, s2 AS score FROM ( SELECT 0 AS q1, 0 AS s1, NULL AS cs, nextval('aa') AS v1 UNION ALL (SELECT *, nextval('aa') AS v1 FROM (SELECT query AS q1, MAX(score) AS s1, checksum AS cs FROM ch GROUP BY 1,3 ORDER BY 1 ASC, 2 DESC) AS foo) ) AS uno, ( (SELECT *, nextval('bb') AS v2 FROM (SELECT query AS q2, MAX(score) AS s2, checksum AS cs FROM ch GROUP BY 1,3 ORDER BY 1 ASC, 2 DESC) AS foo) UNION ALL SELECT NULL AS q2, 0 AS s2, NULL AS cs, nextval('bb') AS v2 ) AS dos WHERE v1 = v2 AND q2 IS NOT NULL AND ( (CASE WHEN q1 != q2 THEN setval('cc',1,'f') ELSE 0 END 0) OR (CASE WHEN currval('cc')10 THEN 1 ELSE 0 END 0) ); - -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8 200404101029 -BEGIN PGP SIGNATURE- iD8DBQFAeAZ1vJuQZxSWSsgRAqYuAJ9HaYLotPYkyi1U76I9xnvi8AhLTQCfUyJq +iVdbz5U7HKep89z0kp49U0= =6+OH -END PGP SIGNATURE- ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] SQL challenge--top 10 for each key value?
Rod, Greg It's not really like Oracles row num at all, though I suppose you can emulate rownum using it. The intention is that you will use it for aggregates like running totals, moving averages, counting, etc. Yes, that makes a certain amount of sense. I just take exception to the name Row Number becuase it confuses newbies about the actual nature of the data being returned and gets them back in the bad space of believing in fixed data ordering -- something which RDBMSes are supposed to avoid. I'm also disgusted at the vendor partiality that this shows. PostgreSQL and MySQL both have working implementations of a very similar concept using the non-confusing term Limit. However, since Oracle is on the committee, they had to use a more Oracle-friendly term, I guess. In the case of SQL was there ever any pretension otherwise? Was the SQL standard ever really useful as a real standard? I can write useful ANSI C89 code that will compile and work on any C compiler. Trying to write portable SQL92 code that does any useful work is about as productive as stapling bagels to your forehead. Well, there *was* a pretense otherwise, which ended about 1994, just as we were getting SQL on this project. Now the big vendors -- mostly IBM and Oracle since Informix and Sybase are dying -- run everything and adapt the standard to what features their products already have. So, yes, SQL92 needed development and expansion. But we didn't need SQL99. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] SQL challenge--top 10 for each key value?
On Fri, 09 Apr 2004 02:11:44 -0400, Tom Lane wrote: ROW_NUMBER() is a spec defined function. (6.10 of SQL200N) If the spec doesn't even have a year number yet, you can hardly expect real implementations to support it ;-) SQL:2003 is finished. Among its new (non-core) OLAP features are a set of windows functions (spec section 6.10), which include feature ID T611 (elementary OLAP): - ROW_NUMBER() OVER (...) - RANK() OVER (...) - DENSE_RANK() OVER (...) feature ID T612 (extended OLAP): - PERCENT_RANK() OVER (...) - CUME_DIST() OVER (...) See http://www.acm.org/sigmod/record/issues/0403/index.html#standards for an article which summarizes the news in SQL:2003. ROW_NUMBER() OVER may be used in queries where a PostgreSQL user which use LIMIT. RANK() OVER may be used in queries where a PostgreSQL user would have to come up with a somewhat strange query in order to get acceptable performance, see http://troels.arvin.dk/db/rdbms/#select-top-n-postgresql -- Greetings from Troels Arvin, Copenhagen, Denmark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] SQL challenge--top 10 for each key value?
On Sun, 11 Apr 2004 00:13:17 +0200, I wrote: Among its new (non-core) OLAP features are a set of windows functions Sorry - I meant window functions... (Microsoft don't seem to have had much influence in SQL:2003's OLAP-specifications; IBM seems to be the big influencer in those parts of the standard.) -- Greetings from Troels Arvin, Copenhagen, Denmark ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] SQL challenge--top 10 for each key value?
This solution will be in Monday's edition of PostgreSQL General Bits (http://www.varlena.com/GeneralBits). (In other words, if it doesn't do what you mean, let me know now!) CREATE TYPE topscores AS (id integer, query integer, checksum char(32), score integer); CREATE OR REPLACE FUNCTION topscores(integer) RETURNS SETOF topscores AS ' DECLARE t topscores%ROWTYPE; r RECORD; q RECORD; n alias for $1; BEGIN FOR q IN SELECT distinct query from table70 order by query LOOP FOR t IN SELECT id , query, checksum, score FROM table70 where query = q.query ORDER BY query, score DESC LIMIT n LOOP RETURN NEXT t; END LOOP; END LOOP; RETURN; END; ' language 'plpgsql'; select * from topscores(1) ; select * from topscores(2) ; select * from topscores(3) ; On Thu, Apr 08, 2004 at 07:55:33PM +, Jeff Boes wrote: Offered up for anyone with time on their hands. I fiddled around with this for half an afternoon, then gave up and did it programmatically in Perl. Given a table that looks something like this: id | INTEGER query| INTEGER checksum | char(32) score| INTEGER include | BOOLEAN The table is unique by id. Checksum may be repeated, but I only care if it is repeated within a given group by query. (query is non-null.) I can get the top scorer for each query row by something like this: SELECT * FROM ( SELECT DISTINCT ON (checksum) * FROM my_table ORDER BY checksum, score DESC) ORDER BY query; How would you go about getting the top N (say, the top 10) for each query? And then, if that's too easy for you--consider a further case where I want every row for a given query that has include TRUE, and enough non-include rows to make N. I might end up with more than N rows for a given value of query if there were more than N with include set. I headed off in the direction of groups of SELECTs and UNIONs, and quit when I got to something like four levels of SELECT ... AS FOO ... -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] SQL challenge--top 10 for each key value?
elein [EMAIL PROTECTED] writes: create or replace function pycounter(integer) returns integer as ' if args[0] == 0: SD[nextno] = 1 return SD[nextno] try: SD[nextno] += 1 except: SD[nextno] = 1 return SD[nextno] ' language 'plpythonu'; And clearly it can be done faster as a little C function. Does this approach have a hope of working if it's used twice in the same query? -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] SQL challenge--top 10 for each key value?
Rod Taylor [EMAIL PROTECTED] writes: ROW_NUMBER() is a spec defined function. (6.10 of SQL200N) If the spec doesn't even have a year number yet, you can hardly expect real implementations to support it ;-). There is no such thing in the extant specs SQL92 or SQL99. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [SQL] SQL challenge--top 10 for each key value?
Rod, Something along the lines of the below would accomplish what you want according to spec. ROW_NUMBER() is a spec defined function. (6.10 of SQL200N) Great leaping little gods! They added something called row number to the spec? Boy howdy, folks were right ... the ANSI committee really has completly blown off the relational model completely. First there was the addition of network-database functions so that IBM could make DB2 look more like a real database, now this When a standards committee becomes hostage to a handful of vendors, kiss real standards goodbye. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [SQL] SQL challenge--top 10 for each key value?
Josh Berkus [EMAIL PROTECTED] writes: Rod, Something along the lines of the below would accomplish what you want according to spec. ROW_NUMBER() is a spec defined function. (6.10 of SQL200N) Great leaping little gods! They added something called row number to the spec? Boy howdy, folks were right ... the ANSI committee really has completly blown off the relational model completely. If it's like Oracle's rownum then it's the row number of the *output*, not the position on disk. So it's not entirely blowing off the relational model any more than ORDER BY does. The weird thing is the number of cases where you want ORDER BY or rownum inside subselects. Which the solution to the original question needed. When a standards committee becomes hostage to a handful of vendors, kiss real standards goodbye. In the case of SQL was there ever any pretension otherwise? Was the SQL standard ever really useful as a real standard? I can write useful ANSI C89 code that will compile and work on any C compiler. Trying to write portable SQL92 code that does any useful work is about as productive as stapling bagels to your forehead. -- greg ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] SQL challenge--top 10 for each key value?
On Fri, 2004-04-09 at 18:43, Greg Stark wrote: Josh Berkus [EMAIL PROTECTED] writes: Rod, Something along the lines of the below would accomplish what you want according to spec. ROW_NUMBER() is a spec defined function. (6.10 of SQL200N) Great leaping little gods! They added something called row number to the spec? Boy howdy, folks were right ... the ANSI committee really has completly blown off the relational model completely. If it's like Oracle's rownum then it's the row number of the *output*, not the position on disk. So it's not entirely blowing off the relational model any more than ORDER BY does. The weird thing is the number of cases where you want ORDER BY or rownum inside subselects. Which the solution to the original question needed. It's not really like Oracles row num at all, though I suppose you can emulate rownum using it. The intention is that you will use it for aggregates like running totals, moving averages, counting, etc. http://www.devx.com/getHelpOn/10MinuteSolution/16573/1954?pf=true ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[SQL] SQL challenge--top 10 for each key value?
Offered up for anyone with time on their hands. I fiddled around with this for half an afternoon, then gave up and did it programmatically in Perl. Given a table that looks something like this: id | INTEGER query| INTEGER checksum | char(32) score| INTEGER include | BOOLEAN The table is unique by id. Checksum may be repeated, but I only care if it is repeated within a given group by query. (query is non-null.) I can get the top scorer for each query row by something like this: SELECT * FROM ( SELECT DISTINCT ON (checksum) * FROM my_table ORDER BY checksum, score DESC) ORDER BY query; How would you go about getting the top N (say, the top 10) for each query? And then, if that's too easy for you--consider a further case where I want every row for a given query that has include TRUE, and enough non-include rows to make N. I might end up with more than N rows for a given value of query if there were more than N with include set. I headed off in the direction of groups of SELECTs and UNIONs, and quit when I got to something like four levels of SELECT ... AS FOO ... -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [SQL] SQL challenge--top 10 for each key value?
Jeff Boes [EMAIL PROTECTED] writes: I headed off in the direction of groups of SELECTs and UNIONs, and quit when I got to something like four levels of SELECT ... AS FOO ... four? wimp, that's nothing! ok, seriously I think there's no way to do this directly with straight SQL. You would have to define a non-immutable function that has some temporary storage where it keeps track of how many it has seen. The generic function that would help here would be some kind of rank(value) that would give you the equivalent of rownum except with a level break every time value changes. I've been hoping to see something like this on the list for a long time but haven't yet. If the value of n is constant and small you could cheat with an aggregate function with an array of the top n values. db= create function first3_accum(integer[],integer) returns integer[] as 'select case when array_upper($1,1) = 3 then $1 else array_append($1,$2) end' language sql strict immutable; CREATE FUNCTION db= create aggregate first3 (basetype = integer, sfunc = first3_accum, stype = integer[], initcond = '{}'); CREATE AGGREGATE then something like: SELECT first3(id) FROM (SELECT id FROM my_table ORDER BY query, CASE WHEN include THEN 1 ELSE 2 END ASC, score DESC) GROUP BY query But then you'll have to go back to the table to refetch the original records that you've found. The best way I find to do that is with the int_array_enum() function from the int_aggregate contrib module. SELECT * FROM my_table WHERE id IN ( SELECT int_array_enum(f3) FROM ( SELECT first3(id) as f3 FROM (SELECT id FROM my_table ORDER BY query, CASE WHEN include THEN 1 ELSE 2 END ASC, score DESC) as x GROUP BY query ) as x ) This last step is kind of annoying since you've already seen all those records. And it requires writing a new aggregate function every time the value of n changes though, which kind of sucks. In theory if the new work in 7.5 handling structured datatypes is as cool as it sounds you could have an array of complete records and when UNNEST is eventually incorporated into the array code then you could expand those instead of using the int_array_enum function. Neither of those things are ready yet as far as I know though. -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [SQL] SQL challenge--top 10 for each key value?
On Thu, 2004-04-08 at 19:33, Greg Stark wrote: Jeff Boes [EMAIL PROTECTED] writes: I headed off in the direction of groups of SELECTs and UNIONs, and quit when I got to something like four levels of SELECT ... AS FOO ... four? wimp, that's nothing! ok, seriously I think there's no way to do this directly with straight SQL. You would have to define a non-immutable function that has some temporary storage where it keeps track of how many it has seen. I don't believe that is true, though it is certainly is in PostgreSQL. The spec has the ability to apply a progressive aggregate on the results of a query (window function). Meaning you can accomplish things like counting (ROW_NUMBER) or running totals. Something along the lines of the below would accomplish what you want according to spec. ROW_NUMBER() is a spec defined function. (6.10 of SQL200N) SELECT * FROM (SELECT ROW_NUMBER() OVER (DISTINCT query) AS counter rest of query ) WHERE counter 10; ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])