Hi, while working on the executor, to process "batches" or "bubbles" of tuples I hit some weird performance issues (as in things didn't improve as much as I had hoped). A fair amount of headscratching later I figured out that the tuple order in sequential scans is a major bottleneck.
When iterating over a page we return tuples in itemid order, which makes them returned in *descending* order address-wise, as tuples are stored starting from the end of the page. But when actually accessing the tuples, we access them increasing address order (header, and then column by column). It appears that that, quite understandable confuses prefetch units, leading to drastically increased cache miss ratios. It's quite easy to change iteration so we start with the latest item, and iterate till the first, rather than the other way round. In benchmarks with somewhat wide columns and aggregation, this yields speedups of over 30%, before hitting other bottlenecks. I do wonder however if it's acceptable to change the result order of sequential scans. People don't tend to specify ORDER BY everwhere - as evidenced by large swathes of our regression tests failing spuriously - so they might not be happy to see a somewhat weird order (pages sequentially increasing, but individual tuples inside a page in reverse order). We could change the order only in cases where the user doesn't actually see the result, say below aggregation, sort, and whatnot nodes. On the other hand the benefit is quite significant for heavily filtered sequential scans as well, COPY out also benefits. Comments? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers