[HACKERS] Optimizing DISTINCT with LIMIT

tmp Thu, 04 Dec 2008 05:43:45 -0800

As far as I understand, the following query
  SELECT DISTINCT foo
  FROM bar
  LIMIT baz

is done by first sorting the input and then traversing the sorted data, ensuring uniqueness of output and stopping when the LIMIT threshold is reached. Furthermore, part of the sort procedure is to traverse the input at least once.
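To make the cost of that plan concrete, here is a minimal Python sketch of the sort-then-unique strategy as described above. It is only an illustration of the idea, not PostgreSQL's executor code; the function name distinct_limit_sorted and the in-memory list input are assumptions made for the example.

```python
def distinct_limit_sorted(rows, limit):
    """Sort-based DISTINCT ... LIMIT (illustrative sketch only)."""
    result = []
    if limit <= 0:
        return result
    previous = object()  # sentinel that compares unequal to every row value
    # sorted() must consume the whole input before yielding its first value,
    # so the full input is read even when limit is tiny.
    for value in sorted(rows):
        if value != previous:  # duplicates are adjacent after sorting
            result.append(value)
            previous = value
            if len(result) >= limit:
                break  # LIMIT reached: stop scanning the sorted data
    return result
```

The early exit only saves work in the final pass over the sorted data; the sort itself has already paid for the entire input.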

Now, if the input is large but the LIMIT threshold is small, this sorting step may increase the query time unnecessarily, so here is a suggestion for optimization:
If the input is "sufficiently" large and the LIMIT threshold "sufficiently" small, maintain the DISTINCT output by hashing while traversing the input and stop when the LIMIT threshold is reached. No sorting required and *at* *most* one read of input.
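The suggestion can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a proposed patch: the names distinct_limit_hashed and counting are hypothetical, a Python set stands in for the hash table, and the input is any iterable of hashable values.

```python
def distinct_limit_hashed(rows, limit):
    """Hash-based DISTINCT ... LIMIT with early termination (sketch only)."""
    seen = set()   # stands in for the hash table of values already emitted
    result = []
    if limit <= 0:
        return result
    for value in rows:  # single pass; rows are pulled one at a time
        if value not in seen:
            seen.add(value)
            result.append(value)
            if len(result) >= limit:
                break  # LIMIT reached: the rest of the input is never read
    return result


def counting(rows, counter):
    """Hypothetical helper: count how many rows are actually pulled."""
    for value in rows:
        counter[0] += 1
        yield value


counter = [0]
sample = distinct_limit_hashed(counting([5, 5, 7, 5, 9, 1, 2], counter), 2)
# sample is [5, 7] and counter[0] is 3: only three of the seven rows were read.
```

The hash table never holds more than LIMIT entries, so memory use is bounded by the threshold rather than by the input size. The rows come back in input order rather than sorted order, which is acceptable here because the query has no ORDER BY and so promises no particular ordering.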


Use case: Websites that need to present small samples of huge queries fast.

