As far as I have understood the following query
  SELECT DISTINCT foo
  FROM bar
  LIMIT baz
is done by first sorting the input and then traversing the sorted data, ensuring uniqueness of output and stopping when the LIMIT threshold is reached. Furthermore, a part of the sort procedure is to traverse input at least one time.

Now, if the input is large but the LIMIT threshold is small, this sorting step may increase the query time unnecessarily so here is a suggestion for optimization: If the input is "sufficiently" large and the LIMIT threshold "sufficiently" small, maintain the DISTINCT output by hashning while traversing the input and stop when the LIMIT threshold is reached. No sorting required and *at* *most* one read of input.

Use case: Websites that needs to present small samples of huge queries fast.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to