On Fri, Feb 13, 2026 at 09:13:29AM -0600, JAEHOON KIM wrote:
> On 2/12/2026 12:53 PM, Stefan Hajnoczi wrote:
> > On Fri, Feb 06, 2026 at 12:50:38AM -0600, JAEHOON KIM wrote:
> > > On 2/3/2026 3:12 PM, Stefan Hajnoczi wrote:
> > > > On Fri, Jan 23, 2026 at 01:15:04PM -0600, JAEHOON KIM wrote:
> > > > > On 1/19/2026 12:16 PM, Stefan Hajnoczi wrote:
> > > > > > On Tue, Jan 13, 2026 at 11:48:21AM -0600, Jaehoon Kim wrote:
> > > > > > > We evaluated the patches on an s390x host with a single guest 
> > > > > > > using 16
> > > > > > > virtio block devices backed by FCP multipath devices in a 
> > > > > > > separate-disk
> > > > > > > setup, with the I/O scheduler set to 'none' in both host and 
> > > > > > > guest.
> > > > > > > 
> > > > > > > The fio workload included sequential and random read/write with 
> > > > > > > varying
> > > > > > > numbers of jobs (1,4,8,16) and io_depth of 8. The tests were 
> > > > > > > conducted
> > > > > > > with single and dual iothreads, using the newly introduced 
> > > > > > > poll-weight
> > > > > > > parameter to measure their impact on CPU cost and throughput.
> > > > > > > 
> > > > > > > Compared to the baseline, across four FIO workload patterns 
> > > > > > > (sequential
> > > > > > > R/W, random R/W), and averaged over FIO job counts of 1, 4, 8, 
> > > > > > > and 16,
> > > > > > > throughput decreased slightly (-3% to -8% for one iothread, -2% 
> > > > > > > to -5%
> > > > > > > for two iothreads), while CPU usage on the s390x host dropped
> > > > > > > significantly (-10% to -25% and -7% to -12%, respectively).
> > > > > > Hi Jaehoon,
> > > > > > I would like to run the same fio benchmarks on a local NVMe drive 
> > > > > > (<10us
> > > > > > request latency) to see how that type of hardware configuration is
> > > > > > affected. Are the scripts and fio job files available somewhere?
> > > > > > 
> > > > > > Thanks,
> > > > > > Stefan
> > > > > Thank you for your reply.
> > > > > The fio scripts are not available in a location you can access, but 
> > > > > there is nothing particularly special in the settings.
> > > > > I’m sharing below the methodology and test setup used by our 
> > > > > performance team.
> > > > > 
> > > > > Guest Setup
> > > > > ----------------------
> > > > > - 12 vCPUs, 4 GiB memory
> > > > > - 16 virtio disks based on the FCP multipath devices in the host
> > > > > 
> > > > > FIO test parameters
> > > > > -----------------------
> > > > > - FIO Version: fio-3.33
> > > > > - Filesize: 2G
> > > > > - Blocksize: 8K / 128K
> > > > > - Direct I/O: 1
> > > > > - FIO I/O Engine: libaio
> > > > > - NUMJOB List: 1, 4, 8, 16
> > > > > - IODEPTH: 8
> > > > > - Runtime (s): 150
> > > > > 
> > > > > Two FIO samples for random read
> > > > > --------------------------------
> > > > > fio --direct=1 --name=test --numjobs=16 \
> > > > >     --filename=base.0.0:base.1.0:base.2.0:base.3.0:base.4.0:base.5.0:base.6.0:base.7.0:base.8.0:base.9.0:base.10.0:base.11.0:base.12.0:base.13.0:base.14.0:base.15.0 \
> > > > >     --size=32G --time_based --runtime=4m --readwrite=randread \
> > > > >     --ioengine=libaio --iodepth=8 --bs=8k
> > > > > fio --direct=1 --name=test --numjobs=4 \
> > > > >     --filename=subw1/base.0.0:subw4/base.3.0:subw8/base.7.0:subw12/base.11.0:subw16/base.15.0 \
> > > > >     --size=8G --time_based --runtime=4m --readwrite=randread \
> > > > >     --ioengine=libaio --iodepth=8 --bs=8k
> > > > > 
> > > > > 
> > > > > additional notes
> > > > > ----------------
> > > > > - Each file is placed on a separate disk device mounted under subw<n> 
> > > > > as specified in --filename=....
> > > > > - We execute one warmup run, then two measurement runs and calculate 
> > > > > the average
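The warmup/averaging protocol in the notes above can be sketched generically; `run_once` here is a stand-in for one complete fio invocation, and the wrapper's name and structure are illustrative, not the performance team's actual script:

```python
import statistics

def measure(run_once, warmup_runs=1, measured_runs=2):
    """Execute warmup run(s) whose results are discarded, then return
    the average of the measured runs, mirroring the protocol above."""
    for _ in range(warmup_runs):
        run_once()  # warmup: prime caches and I/O paths, ignore result
    return statistics.mean(run_once() for _ in range(measured_runs))
```

In practice `run_once` would launch fio (e.g. via `subprocess`) and parse the bandwidth or IOPS figure from its output.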
> > > > Hi Jaehoon,
> > > > I ran fio benchmarks on an Intel Optane SSD DC P4800X Series drive (<10
> > > > microsecond latency). This is with just 1 drive.
> > > > 
> > > > The 8 KiB block size results show something similar to what you
> > > > reported: there are IOPS (or throughput) regressions and CPU utilization
> > > > improvements.
> > > > 
> > > > Although the CPU improvements are welcome, I think the default behavior
> > > > should only be changed if the IOPS regressions can be brought below 5%.
> > > > 
> > > > The regressions seem to happen regardless of whether 1 or 2 IOThreads
> > > > are configured. CPU utilization is different (98% vs 78%) depending on
> > > > the number of IOThreads, so the regressions happen across a range of CPU
> > > > utilizations.
> > > > 
> > > > The 128 KiB block size results are not interesting because the drive
> > > > already saturates at numjobs=1. This is expected since the drive cannot
> > > > go much above ~2 GiB/s throughput.
> > > > 
> > > > You can find the Ansible playbook, libvirt domain XML, fio
> > > > command-lines, and the fio/sar data here:
> > > > 
> > > > https://gitlab.com/stefanha/virt-playbooks/-/tree/aio-polling-efficiency
> > > > 
> > > > Please let me know if you'd like me to rerun the benchmark with new
> > > > patches or a configuration change.
> > > > 
> > > > Do you want to have a video call to discuss your work and how to get the
> > > > patches merged?
> > > > 
> > > > Host
> > > > ----
> > > > CPU: Intel Xeon Silver 4214 CPU @ 2.20GHz
> > > > RAM: 32 GiB
> > > > 
> > > > Guest
> > > > -----
> > > > vCPUs: 8
> > > > RAM: 4 GiB
> > > > Disk: 1 virtio-blk aio=native cache=none
> > > > 
> > > > IOPS
> > > > ----
> > > > rw        bs   numjobs iothreads iops   diff
> > > > randread  8k   1       1         163417 -7.8%
> > > > randread  8k   1       2         165041 -2.4%
> > > > randread  8k   4       1         221508 -0.64%
> > > > randread  8k   4       2         251298 0.008%
> > > > randread  8k   8       1         222128 -0.51%
> > > > randread  8k   8       2         249489 -2.6%
> > > > randread  8k   16      1         230535 -0.18%
> > > > randread  8k   16      2         246732 -0.22%
> > > > randread  128k 1       1          17616 -0.11%
> > > > randread  128k 1       2          17678 0.027%
> > > > randread  128k 4       1          17536 -0.27%
> > > > randread  128k 4       2          17610 -0.031%
> > > > randread  128k 8       1          17369 -0.42%
> > > > randread  128k 8       2          17433 -0.071%
> > > > randread  128k 16      1          17215 -0.61%
> > > > randread  128k 16      2          17269 -0.22%
> > > > randwrite 8k   1       1         156597 -3.1%
> > > > randwrite 8k   1       2         157720 -3.8%
> > > > randwrite 8k   4       1         218448 -0.5%
> > > > randwrite 8k   4       2         247075 -5.1%
> > > > randwrite 8k   8       1         220866 -0.75%
> > > > randwrite 8k   8       2         260935 -0.011%
> > > > randwrite 8k   16      1         230913 0.23%
> > > > randwrite 8k   16      2         261125 -0.01%
> > > > randwrite 128k 1       1          16009 0.094%
> > > > randwrite 128k 1       2          16070 0.035%
> > > > randwrite 128k 4       1          16073 -0.62%
> > > > randwrite 128k 4       2          16131 0.059%
> > > > randwrite 128k 8       1          16106 0.092%
> > > > randwrite 128k 8       2          16153 0.048%
> > > > randwrite 128k 16      1          16102 -0.0091%
> > > > randwrite 128k 16      2          16160 0.048%
> > > > 
> > > > IOThread CPU usage
> > > > ------------------
> > > > iothreads before  after
> > > > 1         98.7    95.81
> > > > 2         78.43   66.13
> > > > 
> > > > Stefan
> > > Hello Stefan,
> > > 
> > > Thank you very much for your effort in running these benchmarks.
> > > The results show a pattern very similar to what our performance team
> > > observed.
> > > 
> > > I fully agree with the 5% threshold for the default behavior.
> > > However, we need an approach that balances the current
> > > performance-oriented polling scheme with CPU efficiency.
> > > 
> > > I found that tuning the existing grow/shrink parameters alone was too
> > > limited to achieve these results. This is why I adopted a weight-based
> > > grow/shrink approach that keeps the polling window robust against
> > > jitter. Specifically, it avoids abrupt resets to zero by shrinking the
> > > window gradually rather than resetting it immediately, even when
> > > device latency exceeds the threshold.
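A rough sketch of the gradual-shrink idea described above (function and variable names, the multiplicative weight, and the constants are illustrative, not the actual patch):

```python
def adjust_poll_ns(poll_ns, latency_ns, threshold_ns,
                   grow=2, shrink_weight=0.5, max_ns=32_000):
    """Weight-based grow/shrink of the polling window (hypothetical).

    Instead of resetting poll_ns to zero when a completion arrives
    after the polling window, a shrink weight below 1 narrows the
    window gradually, so a single latency spike does not discard
    the window that polling has built up.
    """
    if latency_ns <= threshold_ns:
        # Completion fell inside the window: grow it, bounded above.
        return min(max(poll_ns * grow, 1_000), max_ns)
    # Completion exceeded the window: shrink gradually, not to zero.
    return int(poll_ns * shrink_weight)
```

With `shrink_weight=0`, this degenerates to the abrupt reset-to-zero behavior; values closer to 1 trade more polling CPU time for resilience to jitter.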
> > > 
> > > As seen in both your results and our team's measurements, this may lead
> > > to a bit of a performance trade-off, but it provides a reasonable
> > > balance for CPU-sensitive environments.
> > > 
> > > Thank you for suggesting the video call and I am also looking forward to
> > > hearing your thoughts. I'm on US Central Time. Except for Tuesday, I can
> > > adjust my schedule to a time that works for you.
> > > 
> > > Please let me know your preferred time.
> > Is Monday, February 16th at 10:00am CST good for you? If not, please
> > feel free to pick any time on Monday.
> > 
> > Meeting link: https://meet.jit.si/AioPollingOptimization
> > 
> > Anyone else interested in this topic is welcome to join.
> > 
> > Thanks,
> > Stefan
> 
> Thank you for the invite, Stefan.
> Monday at 10:00 AM CST works well for me.
> I'll make sure to be there and look forward to the discussion. See you then!

Great, talk to you soon!

Stefan
