Re: Don't synchronously wait for already-in-progress IO in read stream

Melanie Plageman Fri, 27 Mar 2026 16:00:01 -0700

On Fri, Mar 27, 2026 at 1:29 PM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> I think I forgot to update the thread in my last message to note that I
> had committed some of the preliminary changes.
<--snip-->
> I've also done a bunch of cleanup in the commits. A few typos in commit
> messages and the actual code changes and a few larger changes in the test code
> & infrastructure. Mostly as part of allowing the aforementioned testing
> (read_buffers() now only waits at the end, to make some of the tests
> possible), but also just making the modified code a bit cleaner.


Review of 0004:

I noticed I use the word "backend" _a lot_ in this commit message.
Here is an attempt at using it less:

aio: Don't wait for already in-progress IO

When a backend attempts to start a read IO and finds the first buffer
already has I/O in progress, previously it waited for that I/O to
complete before initiating reads for any of the subsequent buffers.

Although it must wait for the I/O to finish when acquiring the buffer,
there's no reason to wait when setting up the read operation. Waiting at
this point prevents starting I/O on subsequent buffers and can
significantly reduce concurrency.

This matters in two workloads: when multiple backends scan the same
relation concurrently, and when a single backend requests the same block
multiple times within the readahead distance.

Waiting each time an in-progress read is encountered effectively
degenerates the access pattern into synchronous I/O.

To fix this, when encountering an already in-progress IO for the head
buffer, the wait reference is now recorded and waiting is deferred until
WaitReadBuffers(), when the buffer actually needs to be acquired.

In rare cases, it may still be necessary to wait synchronously at IO
start time: if another backend has set BM_IO_IN_PROGRESS on the buffer
but has not yet set the wait reference. Such windows should be brief and
uncommon.

Also, I'd say the tests (and the original prototype) qualify you for
co-authorship of the patch, but you do you.

Code and test stuff:

in query_wait_block()

-   $node->poll_query_until('postgres',
-       qq(SELECT wait_event FROM pg_stat_activity WHERE pid = $pid),
-       $waitfor);
+   my $waitquery;
+   if ($wait_current_session)
+   {
+       $waitquery =
+         qq(SELECT wait_event FROM pg_stat_activity WHERE pid = $pid);
+   }
+   else
+   {
+       $waitquery =
+         qq(SELECT wait_event FROM pg_stat_activity WHERE wait_event
= '$waitfor');
+   }
+
+   note "polling for completion with $waitquery";
+   $node->poll_query_until('postgres', $waitquery, $waitfor);

I guess you need WHERE wait_event = $waitfor to keep from getting more
than one row returned and then failing to parse properly. But I got
tripped up on this thinking won't poll_query_until() already give you
that?

I started thinking maybe wait_current_session should be a hash so we
can pass it with a name and it will make the query_wait_block() call
sites less inscrutable, but maybe that's over-engineering.

# Because no IO wref was assigned, block 2 should not report foreign IO
pump_until($psql_a->{run}, $psql_a->{timeout}, \$psql_a->{stdout},
        qr/0\|1\|t\|f\|2\n2\|3\|t\|f\|3/);

you mean block 3

# Because no IO wref was assigned, block 2 should not report foreign IO
pump_until($psql_a->{run}, $psql_a->{timeout}, \$psql_a->{stdout},

should say block 3

# Tests for StartReadBuffers() that dependent on injection point support
s/dependent/depend

You could change the first test
# Test if a read buffers encounters AIO in progress by another backend, it
# recognizes that other IO as a foreign IO.

To have 0 as the foreign IO (which is a slightly different code path
than non-head blocks being the foreign IO) and then you still
basically have coverage of a non-head block being a foreign IO in the
following test case that looks for multiple contiguous blocks being
foreign IO.

In test_read_buffers_inject:
# recognizes that other IO as a foreign IO. This time we encounter the
# foreign IO multiple times.

I find "foriegn IO multiple times" hard to parse. I prefer something
like "multiple
buffers undergoing foreign IO"

# B: Read block 2 and wait for the completion hook to be reached (which could
# be in B itself or in an IO worker)

should say blocks 2-3

I wonder if it is also worth testing a failed foreign IO (i.e.
operation->foreign_io && !(buf_state & BM_VALID) in
WaitReadBuffers()). I don't know that it is much different than the
other failed IO_IN_PROGRESS cases you are already testing.

Otherwise, I think we're ready to go!

- Melanie

Re: Don't synchronously wait for already-in-progress IO in read stream

Reply via email to