On Wed, Jul 23, 2025 at 1:55 AM Tomas Vondra <to...@vondra.me> wrote:
> On 7/21/25 14:39, Thomas Munro wrote:
> > Here also are some alternative experimental patches for preserving
> > accumulated look-ahead distance better in cases like that.  Needs more
> > exploration... thoughts/ideas welcome...
>
> Thanks! I'll rerun the tests with these patches once the current round
> of tests (with the simple distance restore after a reset) completes.
Here's C, a tidier expression of the policy from the B patch. Also, I realised that the quickly-drafted A patch didn't actually implement what Andres suggested in the other thread, as I had intended: what he actually speculated about was distance * 2 + nblocks. But it doesn't seem to matter much: anything you come up with along those lines seems to suffer from the problem that you can easily produce a test that defeats it by inserting just one more hit in between the misses, where the numbers involved can be quite small.

The only policy I've come up with so far that doesn't give up until we definitely can't do better is the one that tracks a hypothetical window of the largest distance we possibly could have, and refuses to shrink the actual window until even the maximum wouldn't be enough, as expressed in the B and C patches. On the flip side, that degree of pessimism has a cost: of course it takes much longer to come back to distance = 1 and perhaps the fast path. Does it matter? I don't know.

(It's only a hunch at this point, but I think I can see a potentially better way to derive that sustain value from information available with another in-development patch that adds a new io_currency_target value. That patch uses IO subsystem feedback to compute the IO concurrency level that avoids I/O stalls but goes no higher, instead of going all the way to the GUC limits and making it the user's problem to set them sensibly. I'll have to look into that properly, but I think it might be able to produce an ideal sustain value...)
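To make the trade-off concrete, here is a rough toy model of the two policies. It is only a sketch and not PostgreSQL code: ToyStream, old_policy, sustain_policy, simulate and hits_until_distance_one are hypothetical names invented for illustration, and each "access" is treated as a single hit or miss, which ignores I/O combining and pin limits.

```c
#include <assert.h>

/* Hypothetical toy model of a read stream; not the real ReadStream. */
typedef struct ToyStream
{
    int distance;       /* current look-ahead distance */
    int sustain;        /* hits to absorb before decay may begin */
    int max_distance;   /* stands in for max_pinned_buffers */
} ToyStream;

typedef void (*Policy) (ToyStream *s, int miss);

/* Previous behaviour: double on a miss, decay by one on every hit. */
static void
old_policy(ToyStream *s, int miss)
{
    if (miss)
    {
        s->distance *= 2;
        if (s->distance > s->max_distance)
            s->distance = s->max_distance;
    }
    else if (s->distance > 1)
        s->distance--;
}

/* Sustain behaviour: a miss re-arms a window of max_distance hits. */
static void
sustain_policy(ToyStream *s, int miss)
{
    if (miss)
    {
        if (s->distance > s->max_distance - s->distance)
            s->distance = s->max_distance;
        else
            s->distance += s->distance;
        s->sustain = s->max_distance;
    }
    else if (s->sustain > 0)
        s->sustain--;
    else if (s->distance > 1)
        s->distance--;
}

/* Repeat "one miss, then hits_between hits"; return the final distance. */
static int
simulate(Policy policy, int hits_between, int rounds, int max_distance)
{
    ToyStream s = {1, 0, max_distance};

    assert(max_distance >= 1);
    for (int r = 0; r < rounds; r++)
    {
        policy(&s, 1);
        for (int h = 0; h < hits_between; h++)
            policy(&s, 0);
    }
    return s.distance;
}

/* After one miss at full distance, count hits needed to reach distance 1. */
static int
hits_until_distance_one(Policy policy, int max_distance)
{
    ToyStream s = {max_distance, 0, max_distance};
    int hits = 0;

    policy(&s, 1);
    while (s.distance > 1)
    {
        policy(&s, 0);
        hits++;
    }
    return hits;
}
```

In this model, with a maximum of 16 and three hits between misses, the old policy stays pinned at distance 1 while the sustain policy ramps to 16. The cost is the other direction: after a final miss at full distance, the sustain policy needs 31 consecutive hits to get back to distance 1, against 15 for the old policy.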
From 7e637be6685c4f88f4bc490c392211bb6efb8fb7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 21 Jul 2025 16:34:59 +1200
Subject: [PATCH 3/3] aio: Improve read_stream.c look-ahead heuristics C

Previously we would reduce the look-ahead distance by one every time we
got a cache hit, which sometimes performed poorly with mixed hit/miss
patterns, especially if it was trapped at one.

Instead, sustain the current distance until we've seen evidence that
there is no window big enough to span the gap between rare IOs.  In
other words, we now use information from a much larger window to
estimate the utility of looking far ahead.

XXX Highly experimental!
---
 src/backend/storage/aio/read_stream.c | 36 ++++++++++++++++++---------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index f242b373b22..81f752d0414 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -99,6 +99,7 @@ struct ReadStream
 	int16		forwarded_buffers;
 	int16		pinned_buffers;
 	int16		distance;
+	int16		distance_sustain;
 	int16		initialized_buffers;
 	int			read_buffers_flags;
 	bool		sync_mode;		/* using io_method=sync */
@@ -343,22 +344,36 @@ read_stream_start_pending_read(ReadStream *stream)
 	/* Remember whether we need to wait before returning this buffer. */
 	if (!need_wait)
 	{
-		/* Look-ahead distance decays, no I/O necessary. */
-		if (stream->distance > 1)
+		/*
+		 * Look-ahead distance decays if we haven't had any cache misses in a
+		 * hypothetical window of recent accesses.
+		 */
+		if (stream->distance_sustain > 0)
+			stream->distance_sustain--;
+		else if (stream->distance > 1)
 			stream->distance--;
 	}
 	else
 	{
-		/*
-		 * Remember to call WaitReadBuffers() before returning head buffer.
-		 * Look-ahead distance will be adjusted after waiting.
-		 */
+		/* Remember to call WaitReadBuffers() before returning head buffer. */
 		stream->ios[io_index].buffer_index = buffer_index;
 		if (++stream->next_io_index == stream->max_ios)
 			stream->next_io_index = 0;
 		Assert(stream->ios_in_progress < stream->max_ios);
 		stream->ios_in_progress++;
 		stream->seq_blocknum = stream->pending_read_blocknum + nblocks;
+
+		/* Look-ahead distance doubles. */
+		if (stream->distance > stream->max_pinned_buffers - stream->distance)
+			stream->distance = stream->max_pinned_buffers;
+		else
+			stream->distance += stream->distance;
+
+		/*
+		 * Don't let the distance begin to decay until we've seen no IOs over
+		 * a hypothetical window of the maximum possible size.
+		 */
+		stream->distance_sustain = stream->max_pinned_buffers;
 	}
 
 	/*
@@ -897,7 +912,6 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		stream->ios[stream->oldest_io_index].buffer_index == oldest_buffer_index)
 	{
 		int16		io_index = stream->oldest_io_index;
-		int32		distance;	/* wider temporary value, clamped below */
 
 		/* Sanity check that we still agree on the buffers. */
 		Assert(stream->ios[io_index].op.buffers ==
@@ -910,11 +924,6 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		if (++stream->oldest_io_index == stream->max_ios)
 			stream->oldest_io_index = 0;
 
-		/* Look-ahead distance ramps up rapidly after we do I/O. */
-		distance = stream->distance * 2;
-		distance = Min(distance, stream->max_pinned_buffers);
-		stream->distance = distance;
-
 		/*
 		 * If we've reached the first block of a sequential region we're
 		 * issuing advice for, cancel that until the next jump.  The kernel
@@ -1056,7 +1065,10 @@ read_stream_reset(ReadStream *stream, int flags)
 
 	/* Start off like a newly initialized stream, unless asked not to. */
 	if ((flags & READ_STREAM_RESET_CONTINUE) == 0)
+	{
+		stream->distance_sustain = 0;
 		stream->distance = 1;
+	}
 	stream->end_of_stream = false;
 }
 
-- 
2.39.5 (Apple Git-154)
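
The patch drops the wider int32 temporary and doubles the distance with a comparison against the remaining headroom instead. As a standalone sketch of that idiom (saturating_double is a hypothetical helper name, not something in read_stream.c, and int16_t stands in for PostgreSQL's int16), the clamping can be checked in isolation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Double "distance" without exceeding "max_distance".  Comparing against
 * the remaining headroom means the doubled value is only stored when it is
 * known to fit, so no wider temporary is needed.  Hypothetical helper for
 * illustration only.
 */
static int16_t
saturating_double(int16_t distance, int16_t max_distance)
{
    assert(distance >= 1 && distance <= max_distance);

    if (distance > max_distance - distance)
        return max_distance;
    return (int16_t) (distance + distance);
}
```

For example, with a maximum of 16 this maps 1 to 2 and both 8 and 9 to 16, and it stays within range even when the maximum is INT16_MAX.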