Hi,

On 2026-03-10 19:28:29 +0900, Michael Paquier wrote:
> On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> > Here’s v5 of the patchset. The wal_logging_large patch has been
> > removed, as no performance gains were observed in the benchmark runs.
>
> Looking at the numbers you are posting, it is harder to get excited
> about the hash, gin, bloom_vacuum and wal_logging.

It's perhaps worth emphasizing that, to allow real-world usage of direct IO,
we'll need streaming implementations for most of these. Also, on Windows the
OS-provided readahead is ... not aggressive, so you'll hit IO stalls much more
frequently than you would on Linux (and some of the BSDs).

It might be a good idea to run the benchmarks with debug_io_direct=data.
That'll make them very slow, since the write side doesn't yet use AIO and thus
will do a lot of synchronous writes, but it should still make it possible to
evaluate the gains from using read streams.
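
A minimal sketch of how such a run could be configured. The helper name
bench_conf is hypothetical, and appending to postgresql.conf is just one way
to apply the settings; both debug_io_direct and io_method only take effect at
server start, so a restart is needed after changing them.

```shell
# Print the postgresql.conf settings for one benchmark run.
# $1: io_method to test (e.g. worker or io_uring)
bench_conf() {
    printf "debug_io_direct = 'data'\n"
    printf "io_method = '%s'\n" "$1"
}

# Hypothetical usage (assumes PGDATA points at the test cluster):
#   bench_conf worker >> "$PGDATA/postgresql.conf" && pg_ctl restart -D "$PGDATA"
bench_conf worker
```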


The other thing that's kinda important when evaluating read streams is to test
on higher latency storage, even without direct IO.  Many workloads don't
benefit from AIO at all when run on a local NVMe SSD with < 10us latency, but
are severely IO bound when run on a cloud storage disk with 0.5ms - 4ms latency.
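
As a rough back-of-the-envelope illustration of why latency matters so much:
if reads are issued with a fixed number of IOs in flight, throughput is bounded
by (IOs in flight * block size) / latency. The sketch below is an idealized
model only. It assumes 8kB blocks, ignores device bandwidth limits and CPU
cost, and the function name max_read_kbps is made up for this example.

```shell
# Idealized upper bound on read throughput, in kB/s.
# $1: per-IO latency in microseconds
# $2: number of IOs kept in flight
# $3: block size in kB
max_read_kbps() {
    echo $(( $2 * $3 * 1000000 / $1 ))
}

max_read_kbps 10 1 8      # local NVMe, one IO at a time:    800000
max_read_kbps 1000 1 8    # 1ms storage, one IO at a time:   8000
max_read_kbps 1000 16 8   # 1ms storage, 16 IOs in flight:   128000
```

With one synchronous 8kB read at a time, 1ms of latency caps a scan at roughly
8MB/s in this model, which is why keeping multiple IOs in flight matters far
more there than on a local NVMe SSD.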


To be able to test such higher latencies locally, I've found it quite useful
to use dm_delay above a fast disk. See [1].


> The worker method seems more efficient, may show that we are out of noise
> level.

I think that's more likely to show that memory bandwidth, probably due to
checksum computations, is a factor. The memory copy (from the kernel page
cache, with buffered IO) and the checksum computations (when checksums are
enabled) are parallelized by worker, but not by io_uring.


Greetings,

Andres Freund


[1]

  https://docs.kernel.org/admin-guide/device-mapper/delay.html

  Assuming /dev/md0 is mounted to /srv, and a delay of 1ms should be
  introduced for it:

  umount /srv &&
    dmsetup create delayed --table "0 $(blockdev --getsz /dev/md0) delay /dev/md0 0 1" /dev/md0 &&
    mount /dev/mapper/delayed /srv/

  To update the amount of delay to 3ms the following can be used:

  dmsetup suspend delayed &&
    dmsetup reload delayed --table "0 $(blockdev --getsz /dev/md0) delay /dev/md0 0 3" /dev/md0 &&
    dmsetup resume delayed

  (I will often just update the delay to 0 for comparison runs, as that
  doesn't require remounting)
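
For repeated comparison runs, the commands above could be wrapped in small
helpers along these lines. This is only a sketch: the function names
delay_table and set_delay are hypothetical, the table is passed solely via
--table, and set_delay needs root and an existing mapping.

```shell
# Build a dm-delay table line.
# $1: device size in 512-byte sectors (as printed by blockdev --getsz)
# $2: underlying block device
# $3: delay in milliseconds
delay_table() {
    printf '0 %s delay %s 0 %s\n' "$1" "$2" "$3"
}

# Change the delay of an existing mapping without remounting.
# $1: mapping name, $2: underlying block device, $3: new delay in ms
set_delay() {
    dmsetup suspend "$1" &&
        dmsetup reload "$1" --table "$(delay_table "$(blockdev --getsz "$2")" "$2" "$3")" &&
        dmsetup resume "$1"
}
```

For example, "set_delay delayed /dev/md0 0" would switch the mapping from the
example above back to no added delay for a baseline run.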

