On Tue, Jul 22, 2025 at 6:53 PM Andres Freund <and...@anarazel.de> wrote:
> That may be true with local fast NVMe disks, but won't be true for networked
> storage like in common clouds. Latencies of 0.3 - 4ms leave a lot of CPU
> cycles for actual processing of the data.
I don't understand why it wouldn't be a problem for NVMe disks, too.
Take a range scan on pgbench_accounts_pkey, for example -- something
like your ORDER BY ... LIMIT N test case, but with pgbench data
instead of TPC-H data. There are 6 heap blocks per leaf page. As I
understand it, the simple patch will only be able to see up to 6 heap
blocks "into the future", at any given time. Why isn't that quite a
significant drawback, regardless of the underlying storage?

> Also, plenty indexes are on multiple columns and/or wider datatypes, making
> bubbles triggered due to "crossing-the-leaf-page" more common.

I actually don't think that that's a significant factor. Even with
fairly wide tuples, we'll still tend to be able to fit about 200 on
each leaf page. For a variety of reasons that doesn't compare too
badly to simple indexes (like pgbench_accounts_pkey), which will
store about 370 when the index is in a pristine state. It does
matter, but in the grand scheme of things it's unlikely to be
decisive.

--
Peter Geoghegan
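
[A rough way to sanity-check the tuples-per-page arithmetic above, as a
sketch only: on a freshly loaded standard pgbench database (exact counts
vary with scale factor and fillfactor), a query like the following shows
the average tuples per page for the heap and its primary key index.

    -- Approximate tuples per page for pgbench_accounts and its pkey.
    -- reltuples/relpages is only an estimate, and for the index it also
    -- counts internal pages, so treat the result as a ballpark figure.
    SELECT relname,
           relpages,
           reltuples,
           round(reltuples::numeric / relpages, 1) AS tuples_per_page
    FROM pg_class
    WHERE relname IN ('pgbench_accounts', 'pgbench_accounts_pkey');

pgbench_accounts comes out to roughly 61 heap tuples per block, while a
pristine pkey leaf page holds a few hundred index tuples, which is where
the "6 heap blocks per leaf page" figure comes from when the heap is
stored in key order.]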