[HACKERS] sequential scan result order vs performance

Andres Freund Sun, 30 Oct 2016 00:37:58 -0700

Hi,

while working on the executor, to process "batches" or "bubbles" of
tuples I hit some weird performance issues (as in things didn't improve
as much as I had hoped).  A fair amount of headscratching later I
figured out that the tuple order in sequential scans is a major
bottleneck.


When iterating over a page we return tuples in itemid order, which makes
them returned in *descending* order address-wise, as tuples are stored
starting from the end of the page.  But when actually accessing the
tuples, we access them increasing address order (header, and then column
by column). It appears that that, quite understandable confuses prefetch
units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item,
and iterate till the first, rather than the other way round. In
benchmarks with somewhat wide columns and aggregation, this yields
speedups of over 30%, before hitting other bottlenecks.

I do wonder however if it's acceptable to change the result order of
sequential scans. People don't tend to specify ORDER BY everwhere - as
evidenced by large swathes of our regression tests failing spuriously -
so they might not be happy to see a somewhat weird order (pages
sequentially increasing, but individual tuples inside a page in reverse
order).

We could change the order only in cases where the user doesn't actually
see the result, say below aggregation, sort, and whatnot nodes. On the
other hand the benefit is quite significant for heavily filtered
sequential scans as well, COPY out also benefits.

Comments?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] sequential scan result order vs performance

Reply via email to