On Fri, Feb 13, 2026 at 09:13:29AM -0600, JAEHOON KIM wrote:
> On 2/12/2026 12:53 PM, Stefan Hajnoczi wrote:
> > On Fri, Feb 06, 2026 at 12:50:38AM -0600, JAEHOON KIM wrote:
> > > On 2/3/2026 3:12 PM, Stefan Hajnoczi wrote:
> > > > On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> > > > > On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> > > > > > On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> > > > > > > We evaluated the patches on an s390x host with a single guest
> > > > > > > using 16 virtio block devices backed by FCP multipath devices
> > > > > > > in a separate-disk setup, with the I/O scheduler set to 'none'
> > > > > > > in both host and guest.
> > > > > > >
> > > > > > > The fio workload included sequential and random read/write with
> > > > > > > varying numbers of jobs (1, 4, 8, 16) and io_depth of 8. The
> > > > > > > tests were conducted with single and dual iothreads, using the
> > > > > > > newly introduced poll-weight parameter to measure their impact
> > > > > > > on CPU cost and throughput.
> > > > > > >
> > > > > > > Compared to the baseline, across four FIO workload patterns
> > > > > > > (sequential R/W, random R/W), and averaged over FIO job counts
> > > > > > > of 1, 4, 8, and 16, throughput decreased slightly (-3% to -8%
> > > > > > > for one iothread, -2% to -5% for two iothreads), while CPU
> > > > > > > usage on the s390x host dropped significantly (-10% to -25%
> > > > > > > and -7% to -12%, respectively).
> > > > > >
> > > > > > Hi Jaehoon,
> > > > > > I would like to run the same fio benchmarks on a local NVMe drive
> > > > > > (<10us request latency) to see how that type of hardware
> > > > > > configuration is affected. Are the scripts and fio job files
> > > > > > available somewhere?
> > > > > >
> > > > > > Thanks,
> > > > > > Stefan
> > > > >
> > > > > Thank you for your reply.
> > > > > The fio scripts are not available in a location you can access,
> > > > > but there is nothing particularly special in the settings.
> > > > > I'm sharing below the methodology and test setup used by our
> > > > > performance team.
> > > > >
> > > > > Guest Setup
> > > > > ----------------------
> > > > > - 12 vCPUs, 4 GiB memory
> > > > > - 16 virtio disks based on the FCP multipath devices in the host
> > > > >
> > > > > FIO test parameters
> > > > > -----------------------
> > > > > - FIO Version: fio-3.33
> > > > > - Filesize: 2G
> > > > > - Blocksize: 8K / 128K
> > > > > - Direct I/O: 1
> > > > > - FIO I/O Engine: libaio
> > > > > - NUMJOB List: 1, 4, 8, 16
> > > > > - IODEPTH: 8
> > > > > - Runtime (s): 150
> > > > >
> > > > > Two FIO samples for random read
> > > > > --------------------------------
> > > > > fio --direct=1 --name=test --numjobs=16 --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 --size=32G --time_based --runtime=4m --readwrite=randread --ioengine=libaio --iodepth=8 --bs=8k
> > > > >
> > > > > fio --direct=1 --name=test --numjobs=4 --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0 --size=8G --time_based --runtime=4m --readwrite=randread --ioengine=libaio --iodepth=8 --bs=8k
> > > > >
> > > > > Additional notes
> > > > > ----------------
> > > > > - Each file is placed on a separate disk device mounted under
> > > > >   subw<n> as specified in --filename=....
> > > > > - We execute one warmup run, then two measurement runs, and
> > > > >   calculate the average.
> > > >
> > > > Hi Jaehoon,
> > > > I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive
> > > > (<10 microsecond latency). This is with just 1 drive.
> > > >
> > > > The 8 KiB block size results show something similar to what you
> > > > reported: there are IOPS (or throughput) regressions and CPU
> > > > utilization improvements.
> > > >
> > > > Although the CPU improvements are welcome, I think the default
> > > > behavior should only be changed if the IOPS regressions can be
> > > > brought below 5%.
> > > >
> > > > The regressions seem to happen regardless of whether 1 or 2
> > > > IOThreads are configured. CPU utilization is different (98% vs 78%)
> > > > depending on the number of IOThreads, so the regressions happen
> > > > across a range of CPU utilizations.
> > > >
> > > > The 128 KiB block size results are not interesting because the drive
> > > > already saturates at numjobs=1. This is expected since the drive
> > > > cannot go much above ~2 GiB/s throughput.
> > > >
> > > > You can find the Ansible playbook, libvirt domain XML, fio
> > > > command-lines, and the fio/sar data here:
> > > >
> > > > https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency
> > > >
> > > > Please let me know if you'd like me to rerun the benchmark with new
> > > > patches or a configuration change.
> > > >
> > > > Do you want to have a video call to discuss your work and how to get
> > > > the patches merged?
> > > >
> > > > Host
> > > > ----
> > > > CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
> > > > RAM: 32 GiB
> > > >
> > > > Guest
> > > > -----
> > > > vCPUs: 8
> > > > RAM: 4 GiB
> > > > Disk: 1 virtio-blk aio=native cache=none
> > > >
> > > > IOPS
> > > > ----
> > > > rw        bs    numjobs iothreads iops    diff
> > > > randread  8k    1       1         163417  -7.8%
> > > > randread  8k    1       2         165041  -2.4%
> > > > randread  8k    4       1         221508  -0.64%
> > > > randread  8k    4       2         251298  0.008%
> > > > randread  8k    8       1         222128  -0.51%
> > > > randread  8k    8       2         249489  -2.6%
> > > > randread  8k    16      1         230535  -0.18%
> > > > randread  8k    16      2         246732  -0.22%
> > > > randread  128k  1       1         17616   -0.11%
> > > > randread  128k  1       2         17678   0.027%
> > > > randread  128k  4       1         17536   -0.27%
> > > > randread  128k  4       2         17610   -0.031%
> > > > randread  128k  8       1         17369   -0.42%
> > > > randread  128k  8       2         17433   -0.071%
> > > > randread  128k  16      1         17215   -0.61%
> > > > randread  128k  16      2         17269   -0.22%
> > > > randwrite 8k    1       1         156597  -3.1%
> > > > randwrite 8k    1       2         157720  -3.8%
> > > > randwrite 8k    4       1         218448  -0.5%
> > > > randwrite 8k    4       2         247075  -5.1%
> > > > randwrite 8k    8       1         220866  -0.75%
> > > > randwrite 8k    8       2         260935  -0.011%
> > > > randwrite 8k    16      1         230913  0.23%
> > > > randwrite 8k    16      2         261125  -0.01%
> > > > randwrite 128k  1       1         16009   0.094%
> > > > randwrite 128k  1       2         16070   0.035%
> > > > randwrite 128k  4       1         16073   -0.62%
> > > > randwrite 128k  4       2         16131   0.059%
> > > > randwrite 128k  8       1         16106   0.092%
> > > > randwrite 128k  8       2         16153   0.048%
> > > > randwrite 128k  16      1         16102   -0.0091%
> > > > randwrite 128k  16      2         16160   0.048%
> > > >
> > > > IOThread CPU usage
> > > > ------------------
> > > > iothreads before after
> > > > 1         98.7   95.81
> > > > 2         78.43  66.13
> > > >
> > > > Stefan
> > >
> > > Hello Stefan,
> > >
> > > Thank you very much for your effort in running these benchmarks.
> > > The results show a pattern very similar to what our performance team
> > > observed.
> > >
> > > I fully agree with the 5% threshold for the default behavior.
> > > However, we need an approach that balances the current
> > > performance-oriented polling scheme with CPU efficiency.
> > >
> > > I found that relying on the existing grow/shrink parameters was too
> > > limited to achieve these results. This is why I adjusted the process
> > > to use a weight-based grow/shrink approach that keeps the polling
> > > window robust against jitter. Specifically, it avoids abrupt resets
> > > to zero by shrinking the window gradually rather than resetting it
> > > immediately, even when device latency exceeds the threshold.
> > >
> > > As seen in both your results and our team's measurements, this may
> > > involve a small performance trade-off, but it provides a reasonable
> > > balance for CPU-sensitive environments.
> > >
> > > Thank you for suggesting the video call; I am looking forward to
> > > hearing your thoughts. I'm on US Central Time. Except for Tuesday, I
> > > can adjust my schedule to a time that works for you.
> > >
> > > Please let me know your preferred time.
> >
> > Is Monday, February 16th at 10:00am CST good for you? If not, please
> > feel free to pick any time on Monday.
> >
> > Meeting link: https://meet.jit.si/AioPollingOptimization
> >
> > Anyone else interested in this topic is welcome to join.
> >
> > Thanks,
> > Stefan
>
> Thank you for the invite, Stefan.
> Monday at 10:00 AM CST works well for me.
> I'll make sure to be there and look forward to the discussion. See you then!
Great, talk to you soon!

Stefan
