It's slow because there's no way around reading through the entire
input. The optimization tmp is talking about wouldn't be relevant
because there is an ORDER BY clause, which is precisely why I said it
was a fairly narrow use case. Most people who use LIMIT want a
specific subset, even if that specific subset is random. Without the
ORDER BY the subset is entirely arbitrary but not usefully random.
Incidentally, "ORDER BY ... LIMIT" is amenable to an optimization
which avoids having to *sort* the whole input, even though it still
has to read the whole input. We implemented that in 8.3.
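The idea behind that optimization can be sketched in Python with a
bounded heap (illustrative only, not the actual executor code): keep
just the top N rows while scanning, so the cost is O(N log n) rather
than a full O(N log N) sort.

```python
import heapq
import random

def top_n(rows, n, key=lambda r: r):
    # Scan the whole input once, but keep only the n smallest rows
    # in a bounded heap -- equivalent to ORDER BY ... LIMIT n
    # without sorting everything.
    return heapq.nsmallest(n, rows, key=key)

rows = list(range(100000))
random.shuffle(rows)
print(top_n(rows, 5))  # the 5 smallest values, in order
```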
greg
On 6 Dec 2008, at 06:08 PM, Grzegorz Jaskiewicz <[EMAIL PROTECTED]>
wrote:
On 2008-12-06, at 11:29, David Lee Lambert wrote:
I use "ORDER BY random() LIMIT :some_small_number" frequently to get
a "feel" for data. That always builds the unrandomized relation and
then sorts it. I guess an alternate path for single-table queries
would be to randomly choose a block number and then a tuple number;
but that would be biased toward long rows (of which fewer can appear
in a block).
but that's going to be extremely slow, due to the speed of the
random() function.
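The bias David describes can be shown with a toy simulation (the
block layout here is hypothetical, not PostgreSQL's actual page
format): rows that pack fewer per block get picked far more often
than their share of the table.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical layout: 50 blocks of 10 short rows each (500 rows)
# and 50 blocks of 2 long rows each (100 rows).
blocks = [["short"] * 10 for _ in range(50)] + \
         [["long"] * 2 for _ in range(50)]

def naive_sample(k):
    # Pick a uniformly random block, then a uniformly random tuple
    # within it -- the scheme being discussed.
    return [random.choice(random.choice(blocks)) for _ in range(k)]

counts = Counter(naive_sample(20000))
# Long rows are only 100 of the 600 tuples (~17%), yet they occupy
# half the blocks, so they draw roughly half the samples.
print(counts)
```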
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers