On Wed, Jul 23, 2025 at 1:55 AM Tomas Vondra <to...@vondra.me> wrote:
> On 7/21/25 14:39, Thomas Munro wrote:
> > Here also are some alternative experimental patches for preserving
> > accumulated look-ahead distance better in cases like that.  Needs more
> > exploration... thoughts/ideas welcome...
>
> Thanks! I'll rerun the tests with these patches once the current round
> of tests (with the simple distance restore after a reset) completes.

Here's C, a tidier expression of the policy from the B patch.

Also, I realised that the quickly-drafted A patch didn't actually
implement what Andres suggested in the other thread, as I had
intended: what he actually speculated about is distance * 2 + nblocks.

But it doesn't seem to matter much: anything you come up with along
those lines seems to suffer from the problem that you can easily
produce a test that defeats it by inserting just one more hit in
between the misses, where the numbers involved can be quite small.
The only policy I've come up with so far that doesn't give up until we
definitely can't do better is the one that tracks a hypothetical
window of the largest distance we possibly could have, and refuses to
shrink the actual window until even the maximum wouldn't be enough, as
expressed in the B and C patches.
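
To make the difference concrete, here is a minimal standalone sketch
(not the patch itself) that replays a repeating hit/miss pattern
through both policies.  The LookAhead struct, old_policy(),
sustain_policy() and simulate() are hypothetical names invented for
this illustration; only distance, distance_sustain and
max_pinned_buffers mirror fields in read_stream.c.  It also simplifies
the old behaviour by doubling when the miss is seen rather than after
waiting for the I/O.

```c
#include <assert.h>

/* Hypothetical miniature of the look-ahead state, not the real ReadStream. */
typedef struct
{
    int distance;
    int distance_sustain;
    int max_pinned_buffers;
} LookAhead;

/* Old policy: decay by one on every hit, double (clamped) on every miss. */
void
old_policy(LookAhead *la, int miss)
{
    if (!miss)
    {
        if (la->distance > 1)
            la->distance--;
    }
    else
    {
        la->distance *= 2;
        if (la->distance > la->max_pinned_buffers)
            la->distance = la->max_pinned_buffers;
    }
    assert(la->distance >= 1 && la->distance <= la->max_pinned_buffers);
}

/* Sustain policy, following the logic of the C patch. */
void
sustain_policy(LookAhead *la, int miss)
{
    if (!miss)
    {
        /* Hits first use up the sustain budget, then decay the distance. */
        if (la->distance_sustain > 0)
            la->distance_sustain--;
        else if (la->distance > 1)
            la->distance--;
    }
    else
    {
        /* Double without overflowing past the maximum. */
        if (la->distance > la->max_pinned_buffers - la->distance)
            la->distance = la->max_pinned_buffers;
        else
            la->distance += la->distance;
        /* Any miss re-arms a full maximum-sized window of sustain. */
        la->distance_sustain = la->max_pinned_buffers;
    }
    assert(la->distance >= 1 && la->distance <= la->max_pinned_buffers);
}

/*
 * Replay `rounds` repetitions of (`hits_between` hits, then one miss),
 * starting like a fresh stream, and return the final distance.
 */
int
simulate(void (*policy) (LookAhead *, int),
         int max_pinned, int hits_between, int rounds)
{
    LookAhead la = {1, 0, max_pinned};

    for (int r = 0; r < rounds; r++)
    {
        for (int h = 0; h < hits_between; h++)
            policy(&la, 0);
        policy(&la, 1);
    }
    return la.distance;
}
```

With a maximum of 16 and four hits between misses,
simulate(old_policy, 16, 4, 100) ends at distance 2, while
simulate(sustain_policy, 16, 4, 100) ends at 16.  Once the gap between
misses is wider than any possible window, for example 40 hits, the
sustain policy also falls back to 2.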

On the flip side, that degree of pessimism has a cost: of course it
takes much longer to come back to distance = 1 and perhaps the fast
path.  Does it matter?  I don't know.
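
As a rough way to put a number on that cost, the sketch below counts
how many consecutive hits are needed to decay back to distance = 1
from a given state.  hits_until_distance_one() is a hypothetical
helper written for this illustration, and it assumes the decay rule
from the C patch (sustain counts down first, then distance).

```c
#include <assert.h>

/*
 * Count the consecutive cache hits needed before the look-ahead distance
 * reaches 1, given the current distance and remaining sustain budget.
 * Passing distance_sustain = 0 models the old decay-on-every-hit policy.
 */
int
hits_until_distance_one(int distance, int distance_sustain)
{
    int hits = 0;

    assert(distance >= 1 && distance_sustain >= 0);

    while (distance > 1)
    {
        if (distance_sustain > 0)
            distance_sustain--;     /* hit absorbed by the sustain budget */
        else
            distance--;             /* budget exhausted, distance decays */
        hits++;
    }
    return hits;
}
```

Right after a miss at the maximum distance of 16, the sustain policy
needs hits_until_distance_one(16, 16) = 31 hits to get back to 1,
against 15 for the old policy, i.e. the extra delay is one full
maximum-sized window.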

(It's only a hunch at this point but I think I can see a potentially
better way to derive that sustain value from information available
with another in-development patch that adds a new io_currency_target
value, using IO subsystem feedback to compute an IO concurrency level
that is just high enough to avoid I/O stalls, instead of going all the
way to the GUC limits and making it the user's problem to set them
sensibly.
I'll have to look into that properly, but I think it might be able to
produce an ideal sustain value...)
From 7e637be6685c4f88f4bc490c392211bb6efb8fb7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 21 Jul 2025 16:34:59 +1200
Subject: [PATCH 3/3] aio: Improve read_stream.c look-ahead heuristics C

Previously we would reduce the look-ahead distance by one every time we
got a cache hit, which sometimes performed poorly with mixed hit/miss
patterns, especially if it was trapped at one.

Instead, sustain the current distance until we've seen evidence that
there is no window big enough to span the gap between rare IOs.  In
other words, we now use information from a much larger window to
estimate the utility of looking far ahead.

XXX Highly experimental!
---
 src/backend/storage/aio/read_stream.c | 36 ++++++++++++++++++---------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index f242b373b22..81f752d0414 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -99,6 +99,7 @@ struct ReadStream
        int16           forwarded_buffers;
        int16           pinned_buffers;
        int16           distance;
+       int16           distance_sustain;
        int16           initialized_buffers;
        int                     read_buffers_flags;
        bool            sync_mode;              /* using io_method=sync */
@@ -343,22 +344,36 @@ read_stream_start_pending_read(ReadStream *stream)
        /* Remember whether we need to wait before returning this buffer. */
        if (!need_wait)
        {
-               /* Look-ahead distance decays, no I/O necessary. */
-               if (stream->distance > 1)
+               /*
+                * Look-ahead distance decays if we haven't had any cache misses in a
+                * hypothetical window of recent accesses.
+                */
+               if (stream->distance_sustain > 0)
+                       stream->distance_sustain--;
+               else if (stream->distance > 1)
                        stream->distance--;
        }
        else
        {
-               /*
-                * Remember to call WaitReadBuffers() before returning head buffer.
-                * Look-ahead distance will be adjusted after waiting.
-                */
+               /* Remember to call WaitReadBuffers() before returning head buffer. */
                stream->ios[io_index].buffer_index = buffer_index;
                if (++stream->next_io_index == stream->max_ios)
                        stream->next_io_index = 0;
                Assert(stream->ios_in_progress < stream->max_ios);
                stream->ios_in_progress++;
                stream->seq_blocknum = stream->pending_read_blocknum + nblocks;
+
+               /* Look-ahead distance doubles. */
+               if (stream->distance > stream->max_pinned_buffers - stream->distance)
+                       stream->distance = stream->max_pinned_buffers;
+               else
+                       stream->distance += stream->distance;
+
+               /*
+                * Don't let the distance begin to decay until we've seen no IOs over
+                * a hypothetical window of the maximum possible size.
+                */
+               stream->distance_sustain = stream->max_pinned_buffers;
        }
 
        /*
@@ -897,7 +912,6 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
                stream->ios[stream->oldest_io_index].buffer_index == oldest_buffer_index)
        {
                int16           io_index = stream->oldest_io_index;
-               int32           distance;       /* wider temporary value, clamped below */
 
                /* Sanity check that we still agree on the buffers. */
                Assert(stream->ios[io_index].op.buffers ==
@@ -910,11 +924,6 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
                if (++stream->oldest_io_index == stream->max_ios)
                        stream->oldest_io_index = 0;
 
-               /* Look-ahead distance ramps up rapidly after we do I/O. */
-               distance = stream->distance * 2;
-               distance = Min(distance, stream->max_pinned_buffers);
-               stream->distance = distance;
-
                /*
                 * If we've reached the first block of a sequential region we're
                 * issuing advice for, cancel that until the next jump.  The kernel
@@ -1056,7 +1065,10 @@ read_stream_reset(ReadStream *stream, int flags)
 
        /* Start off like a newly initialized stream, unless asked not to. */
        if ((flags & READ_STREAM_RESET_CONTINUE) == 0)
+       {
+               stream->distance_sustain = 0;
                stream->distance = 1;
+       }
        stream->end_of_stream = false;
 }
 
-- 
2.39.5 (Apple Git-154)
