On Thu Aug 14, 2025 at 3:41 PM EDT, Andres Freund wrote:
> Hm, that is somewhat curious.
>
> I wonder if there's some wait time that's not being captured by "I/O
> Timings". A first thing to do would be to just run strace --summary-only while
> running the query, and see if there are syscall wait times that seem too long.
For the slow, sequential backwards scan query:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.271216 4 66808 io_uring_enter
0.00 0.000004 4 1 sendto
0.00 0.000001 0 2 1 recvfrom
0.00 0.000000 0 5 lseek
0.00 0.000000 0 1 epoll_wait
0.00 0.000000 0 4 openat
------ ----------- ----------- --------- --------- ----------------
100.00 0.271221 4 66821 1 total
For the fast, random backwards scan query:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.99 0.351518 4 77819 io_uring_enter
0.00 0.000007 2 3 1 epoll_wait
0.00 0.000006 6 1 sendto
0.00 0.000003 1 3 2 recvfrom
0.00 0.000002 2 1 read
0.00 0.000002 2 1 1 rt_sigreturn
0.00 0.000002 2 1 getpid
0.00 0.000002 1 2 kill
0.00 0.000000 0 3 lseek
------ ----------- ----------- --------- --------- ----------------
100.00 0.351542 4 77834 4 total
> What effective_io_concurrency and io_max_concurrency setting are you using? If
> there are no free IO handles that's currently not nicely reported (because
> it's unclear how exactly to do so, see comment above pgaio_io_acquire_nb()).
effective_io_concurrency is 100. io_max_concurrency is 64. Nothing out of
the ordinary there.
> iostat -xmy /path/to/block/device
>
> I'd like to see the difference in average IO size (rareq-sz), queue depth
> (aqu-sz) and completion time (r_await) between the fast and slow cases.
I'll show one second interval output.
Slow, sequential backwards scan query
-------------------------------------
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s
wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm
d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 24613.00 192.29 0.00 0.00 0.20 8.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 4.92 53.20
avg-cpu: %user %nice %system %iowait %steal %idle
0.22 0.00 0.44 0.85 0.00 98.50
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s
wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm
d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 25320.00 197.81 0.00 0.00 0.20 8.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 5.18 51.20
Fast, random backwards scan query
---------------------------------
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s
wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm
d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 27140.59 212.04 0.00 0.00 0.20 8.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 5.50 23.37
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.84 0.00 0.00 98.66
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s
wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm
d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 50401.00 393.76 0.00 0.00 0.20 8.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 10.06 41.60
--
Peter Geoghegan