Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-19 Thread Jeff Boes
Troels Arvin wrote:
See http://www.acm.org/sigmod/record/issues/0403/index.html#standards for
an article which summarizes the news in SQL:2003.
This is a very useful page; thank you for creating it and for noting it 
in this thread!

--
(Posted from an account used as a SPAM dump. If you really want to get
in touch with me, dump the 'jboes' and substitute 'mur'.)

Jeffery Boes  [EMAIL PROTECTED]
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-11 Thread elein
No, it will not work twice in the same query as is.

If you want to code two counter buckets and pass in
some way to distinguish between the two yada yada yada
it is possible.  It is also possible to code this to
do multi-level counting/breaks/calculations, etc.

But the SD dictionary is by connection. So any values
stored in it need to be initialized at the appropriate
time *outside* of the first use.

elein

On Sun, Apr 11, 2004 at 12:38:20AM -0400, Greg Stark wrote:
 
 elein [EMAIL PROTECTED] writes:
 
  create or replace function pycounter(integer)
  returns integer as
  '
 if args[0] == 0:
SD[nextno] = 1
return SD[nextno]
 try:
SD[nextno] += 1
 except:
SD[nextno] = 1
 return SD[nextno]
  ' language 'plpythonu';
  
  And clearly it can be done faster as a little
  C function.
 
 Does this approach have a hope of working if it's used twice in the same
 query?
 
 
 -- 
 greg
 
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
 
 
 How would you go about getting the top N (say, the top 10) for each query?
 
Assume you have a table ch and three sequences 'aa', 'bb', and 'cc'.
(Only 'aa' and 'bb' need to be initially set)
 
SELECT setval('aa',1,'f');
SELECT setval('bb',1,'f');
 
SELECT nextval('cc') AS rating,q2 AS query, s2 AS score FROM
(
 SELECT 0 AS q1, 0 AS s1, NULL AS cs, nextval('aa') AS v1
 UNION ALL
 (SELECT *, nextval('aa') AS v1 FROM
  (SELECT query AS q1, MAX(score) AS s1, checksum AS cs FROM ch GROUP BY 1,3 ORDER BY 
1 ASC, 2 DESC) AS foo)
) AS uno,
(
 (SELECT *, nextval('bb') AS v2 FROM
  (SELECT query AS q2, MAX(score) AS s2, checksum AS cs FROM ch GROUP BY 1,3 ORDER BY 
1 ASC, 2 DESC) AS foo)
 UNION ALL
 SELECT NULL AS q2, 0 AS s2, NULL AS cs, nextval('bb') AS v2
) AS dos
WHERE v1 = v2 AND q2 IS NOT NULL
AND (
 (CASE WHEN q1 != q2 THEN setval('cc',1,'f') ELSE 0 END  0)
 OR
 (CASE WHEN currval('cc')10 THEN 1 ELSE 0 END 0)
);
 
 
 
- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200404101029
 
-BEGIN PGP SIGNATURE-
 
iD8DBQFAeAZ1vJuQZxSWSsgRAqYuAJ9HaYLotPYkyi1U76I9xnvi8AhLTQCfUyJq
+iVdbz5U7HKep89z0kp49U0=
=6+OH
-END PGP SIGNATURE-



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread Josh Berkus
Rod, Greg

 It's not really like Oracles row num at all, though I suppose you can
 emulate rownum using it. The intention is that you will use it for
 aggregates like running totals, moving averages, counting, etc.

Yes, that makes a certain amount of sense.  I just take exception to the name 
Row Number becuase it confuses newbies about the actual nature of the data 
being returned and gets them back in the bad space of believing in fixed data 
ordering -- something which RDBMSes are supposed to avoid.

I'm also disgusted at the vendor partiality that this shows.  PostgreSQL and 
MySQL both have working implementations of a very similar concept using the 
non-confusing term Limit.   However, since Oracle is on the committee, they 
had to use a more Oracle-friendly term, I guess.

 In the case of SQL was there ever any pretension otherwise? Was the SQL
 standard ever really useful as a real standard? I can write useful ANSI
 C89 code that will compile and work on any C compiler. Trying to write
 portable SQL92 code that does any useful work is about as productive as
 stapling bagels to your forehead.

Well, there *was* a pretense otherwise, which ended about 1994, just as we 
were getting SQL on this project.  Now the big vendors -- mostly IBM and 
Oracle since Informix and Sybase are dying -- run everything and adapt the 
standard to what features their products already have.

So, yes, SQL92 needed development and expansion.   But we didn't need SQL99.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread Troels Arvin
On Fri, 09 Apr 2004 02:11:44 -0400, Tom Lane wrote:

 ROW_NUMBER() is a spec defined function. (6.10 of SQL200N)
 
 If the spec doesn't even have a year number yet, you can hardly expect
 real implementations to support it ;-)

SQL:2003 is finished. Among its new (non-core) OLAP features are a set of
windows functions (spec section 6.10), which include

feature ID T611 (elementary OLAP):
 - ROW_NUMBER() OVER (...)
 - RANK() OVER (...)
 - DENSE_RANK() OVER (...)

feature ID T612 (extended OLAP):
 - PERCENT_RANK() OVER (...)
 - CUME_DIST() OVER (...)

See http://www.acm.org/sigmod/record/issues/0403/index.html#standards for
an article which summarizes the news in SQL:2003.

ROW_NUMBER() OVER may be used in queries where a PostgreSQL user which use
LIMIT.

RANK() OVER may be used in queries where a PostgreSQL user would have to
come up with a somewhat strange query in order to get acceptable
performance, see http://troels.arvin.dk/db/rdbms/#select-top-n-postgresql

-- 
Greetings from Troels Arvin, Copenhagen, Denmark



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread Troels Arvin
On Sun, 11 Apr 2004 00:13:17 +0200, I wrote:

 Among its new (non-core) OLAP features are a set of
 windows functions

Sorry - I meant window functions... (Microsoft don't seem to have had
much influence in SQL:2003's OLAP-specifications; IBM seems to be the big
influencer in those parts of the standard.)

-- 
Greetings from Troels Arvin, Copenhagen, Denmark



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread elein
This solution will be in Monday's edition of
PostgreSQL General Bits (http://www.varlena.com/GeneralBits).
(In other words, if it doesn't do what you mean, let me know now!)


CREATE TYPE topscores AS
   (id integer, query integer, checksum char(32), score integer);

CREATE OR REPLACE FUNCTION topscores(integer) RETURNS SETOF topscores AS
'
DECLARE
   t topscores%ROWTYPE;
   r RECORD;
   q RECORD;
   n alias for $1;
BEGIN
   FOR q IN SELECT distinct query from table70 order by query LOOP
  FOR t IN SELECT id , query, checksum, score
 FROM table70
 where query = q.query
 ORDER BY query, score DESC LIMIT n LOOP
RETURN NEXT t;
  END LOOP;
   END LOOP;
   RETURN;
END;
' language 'plpgsql';

select * from topscores(1) ;
select * from topscores(2) ;
select * from topscores(3) ;


On Thu, Apr 08, 2004 at 07:55:33PM +, Jeff Boes wrote:
 Offered up for anyone with time on their hands. I fiddled around with 
 this for half an afternoon, then gave up and did it programmatically in 
 Perl.
 
 Given a table that looks something like this:
 
 id   | INTEGER
 query| INTEGER
 checksum | char(32)
 score| INTEGER
 include  | BOOLEAN
 
 
 The table is unique by id. Checksum may be repeated, but I only care 
 if it is repeated within a given group by query. (query is non-null.)
 
 I can get the top scorer for each query row by something like this:
 
 SELECT * FROM (
   SELECT DISTINCT ON (checksum) *
   FROM my_table
   ORDER BY checksum, score DESC)
 ORDER BY query;
 
 How would you go about getting the top N (say, the top 10) for each query?
 
 And then, if that's too easy for you--consider a further case where I 
 want every row for a given query that has include TRUE, and enough 
 non-include rows to make N. I might end up with more than N rows for a 
 given value of query if there were more than N with include set.
 
 I headed off in the direction of groups of SELECTs and UNIONs, and quit 
 when I got to something like four levels of SELECT ... AS FOO ...
 
 -- 
 Jeff Boes  vox 269.226.9550 ext 24
 Database Engineer fax 269.349.9076
 Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise
 
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-10 Thread Greg Stark

elein [EMAIL PROTECTED] writes:

 create or replace function pycounter(integer)
 returns integer as
 '
if args[0] == 0:
   SD[nextno] = 1
   return SD[nextno]
try:
   SD[nextno] += 1
except:
   SD[nextno] = 1
return SD[nextno]
 ' language 'plpythonu';
 
 And clearly it can be done faster as a little
 C function.

Does this approach have a hope of working if it's used twice in the same
query?


-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-09 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 ROW_NUMBER() is a spec defined function. (6.10 of SQL200N)

If the spec doesn't even have a year number yet, you can hardly expect
real implementations to support it ;-).  There is no such thing in the
extant specs SQL92 or SQL99.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-09 Thread Josh Berkus
Rod,

 Something along the lines of the below would accomplish what you want
 according to spec. ROW_NUMBER() is a spec defined function. (6.10 of
 SQL200N)

Great leaping little gods!   They added something called row number to the 
spec? 

Boy howdy, folks were right ... the ANSI committee really has completly blown 
off the relational model completely.   First there was the addition of 
network-database functions so that IBM could make DB2 look more like a real 
database, now this 

When a standards committee becomes hostage to a handful of vendors, kiss real 
standards goodbye.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-09 Thread Greg Stark
Josh Berkus [EMAIL PROTECTED] writes:

 Rod,
 
  Something along the lines of the below would accomplish what you want
  according to spec. ROW_NUMBER() is a spec defined function. (6.10 of
  SQL200N)
 
 Great leaping little gods!   They added something called row number to the 
 spec? 
 
 Boy howdy, folks were right ... the ANSI committee really has completly blown 
 off the relational model completely.   

If it's like Oracle's rownum then it's the row number of the *output*, not the
position on disk. So it's not entirely blowing off the relational model any
more than ORDER BY does.

The weird thing is the number of cases where you want ORDER BY or rownum
inside subselects. Which the solution to the original question needed.

 When a standards committee becomes hostage to a handful of vendors, kiss
 real standards goodbye.

In the case of SQL was there ever any pretension otherwise? Was the SQL
standard ever really useful as a real standard? I can write useful ANSI C89
code that will compile and work on any C compiler. Trying to write portable
SQL92 code that does any useful work is about as productive as stapling bagels
to your forehead.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-09 Thread Rod Taylor
On Fri, 2004-04-09 at 18:43, Greg Stark wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 
  Rod,
  
   Something along the lines of the below would accomplish what you want
   according to spec. ROW_NUMBER() is a spec defined function. (6.10 of
   SQL200N)
  
  Great leaping little gods!   They added something called row number to the 
  spec? 
  
  Boy howdy, folks were right ... the ANSI committee really has completly blown 
  off the relational model completely.   
 
 If it's like Oracle's rownum then it's the row number of the *output*, not the
 position on disk. So it's not entirely blowing off the relational model any
 more than ORDER BY does.
 
 The weird thing is the number of cases where you want ORDER BY or rownum
 inside subselects. Which the solution to the original question needed.

It's not really like Oracles row num at all, though I suppose you can
emulate rownum using it. The intention is that you will use it for
aggregates like running totals, moving averages, counting, etc.

http://www.devx.com/getHelpOn/10MinuteSolution/16573/1954?pf=true



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[SQL] SQL challenge--top 10 for each key value?

2004-04-08 Thread Jeff Boes
Offered up for anyone with time on their hands. I fiddled around with 
this for half an afternoon, then gave up and did it programmatically in 
Perl.

Given a table that looks something like this:

id   | INTEGER
query| INTEGER
checksum | char(32)
score| INTEGER
include  | BOOLEAN
The table is unique by id. Checksum may be repeated, but I only care 
if it is repeated within a given group by query. (query is non-null.)

I can get the top scorer for each query row by something like this:

SELECT * FROM (
  SELECT DISTINCT ON (checksum) *
  FROM my_table
  ORDER BY checksum, score DESC)
ORDER BY query;
How would you go about getting the top N (say, the top 10) for each query?

And then, if that's too easy for you--consider a further case where I 
want every row for a given query that has include TRUE, and enough 
non-include rows to make N. I might end up with more than N rows for a 
given value of query if there were more than N with include set.

I headed off in the direction of groups of SELECTs and UNIONs, and quit 
when I got to something like four levels of SELECT ... AS FOO ...

--
Jeff Boes  vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
   ...Nexcerpt... Extend your Expertise
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-08 Thread Greg Stark

Jeff Boes [EMAIL PROTECTED] writes:

 I headed off in the direction of groups of SELECTs and UNIONs, and quit when I
 got to something like four levels of SELECT ... AS FOO ...

four? wimp, that's nothing!

ok, seriously I think there's no way to do this directly with straight SQL.
You would have to define a non-immutable function that has some temporary
storage where it keeps track of how many it has seen. 

The generic function that would help here would be some kind of rank(value)
that would give you the equivalent of rownum except with a level break every
time value changes. I've been hoping to see something like this on the list
for a long time but haven't yet.

If the value of n is constant and small you could cheat with an aggregate
function with an array of the top n values.

db= create function first3_accum(integer[],integer) returns integer[] as 'select case 
when array_upper($1,1) = 3 then $1 else array_append($1,$2) end' language sql strict 
immutable;
CREATE FUNCTION
db= create aggregate first3 (basetype = integer, sfunc = first3_accum, stype = 
integer[], initcond = '{}');
CREATE AGGREGATE

then something like:

SELECT first3(id)
  FROM (SELECT id 
 FROM my_table 
ORDER BY query, 
 CASE WHEN include THEN 1 ELSE 2 END ASC, 
 score DESC)
 GROUP BY query

But then you'll have to go back to the table to refetch the original records
that you've found. The best way I find to do that is with the int_array_enum()
function from the int_aggregate contrib module.

SELECT * 
  FROM my_table 
 WHERE id IN (
   SELECT int_array_enum(f3)
 FROM (
   SELECT first3(id) as f3
 FROM (SELECT id 
FROM my_table 
   ORDER BY query, 
CASE WHEN include THEN 1 ELSE 2 END ASC, 
score DESC) as x
GROUP BY query
 ) as x
   )


This last step is kind of annoying since you've already seen all those
records. And it requires writing a new aggregate function every time the value
of n changes though, which kind of sucks.

In theory if the new work in 7.5 handling structured datatypes is as cool as
it sounds you could have an array of complete records and when UNNEST is
eventually incorporated into the array code then you could expand those
instead of using the int_array_enum function. Neither of those things are
ready yet as far as I know though.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] SQL challenge--top 10 for each key value?

2004-04-08 Thread Rod Taylor
On Thu, 2004-04-08 at 19:33, Greg Stark wrote:
 Jeff Boes [EMAIL PROTECTED] writes:
 
  I headed off in the direction of groups of SELECTs and UNIONs, and quit when I
  got to something like four levels of SELECT ... AS FOO ...
 
 four? wimp, that's nothing!
 
 ok, seriously I think there's no way to do this directly with straight SQL.
 You would have to define a non-immutable function that has some temporary
 storage where it keeps track of how many it has seen. 

I don't believe that is true, though it is certainly is in PostgreSQL.

The spec has the ability to apply a progressive aggregate on the results
of a query (window function). Meaning you can accomplish things like
counting (ROW_NUMBER) or running totals.

Something along the lines of the below would accomplish what you want
according to spec. ROW_NUMBER() is a spec defined function. (6.10 of
SQL200N)

SELECT * 
  FROM (SELECT ROW_NUMBER() OVER (DISTINCT query) AS counter
  rest of query
   )
  WHERE counter  10;


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])