[HACKERS] Top-k optimizations?

2005-01-13 Thread David Fetter
Folks,

As this came up in a work situation, I was wondering a little bit
about the top-k issue.  Right now, top-k is implemented (most easily,
I think) via a SELECT with a LIMIT and no OFFSET.  3 questions arise
from this.

1.  Are there currently any optimizations specific to top-k in
PostgreSQL?  If so, what are they?

2.  What kinds of top-k optimizations *can't* be part of PostgreSQL
(things that would break MVCC, e.g.)?

3.  What kinds of top-k optimizations might (eventually) be included
in PostgreSQL?

Hoping this stimulates some friendly  informative discussion... :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Top-k optimizations?

2005-01-13 Thread Kris Jurka


On Thu, 13 Jan 2005, David Fetter wrote:

 3.  What kinds of top-k optimizations might (eventually) be included
 in PostgreSQL?
 

See the TODO item:

Allow ORDER BY ... LIMIT 1 to select high/low value without sort or index 
using a sequential scan for highest/lowest values

If only one value is needed, there is no need to sort the entire table. 
Instead a sequential scan could get the matching value.

There is some discussion of this on the -general list here:

http://groups-beta.google.com/group/comp.databases.postgresql.general/messages/08c615cc2cbdf143,fe626a7cc9021d12,4f1d0575be60c26f,5c44463d8ef0e1ef,ceff42f0dae09272,dc9da98adcb6142c,7f34133e99b38825,28b43c5e79924da6,98be76099ea6513f,5d3f19a69e3b5a93?thread_id=7d35d3eb00ffd0e8mode=threadnoheader=1q=oleg+limit+sort#doc_7f34133e99b38825


Kris Jurka


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Top-k optimizations?

2005-01-13 Thread Ron Mayer
David Fetter wrote:
Folks,
As this came up in a work situation, I was wondering a little bit
about the top-k issue.  Right now, top-k is implemented (most easily,
I think) via a SELECT with a LIMIT and no OFFSET.  3 questions arise
from this.
I think the simplest LIMIT query doesn't make it easy to show ties;
but if you don't want the equivalent of MSSQL's with ties clause
I think LIMIT works well.

1.  Are there currently any optimizations specific to top-k in
PostgreSQL?  If so, what are they?
Well, when I do queries like
 select * from customers order by dollarsspent desc limit 3
it happily uses an index on dollarsspent.

3.  What kinds of top-k optimizations might (eventually) be included
in PostgreSQL?
I think a slightly related topic is whether syntactically it'd be
nice if postgresql had the SQL 2003 optional olap features to
specify ways of doing top-k queries as described here:
http://troels.arvin.dk/db/rdbms/#select-top-n
 SELECT * FROM (
  SELECT
RANK() OVER (ORDER BY age ASC) AS ranking,
person_id,  person_name, age
  FROM person
) AS foo
WHERE ranking = 3
Seems IBM and Oracle support that syntax or something very similar.
Yeah, I know it's probably an orthogonal question to the
optimizations one, but might make porting nicer.
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]