Hi Jaehoon,

Following the call earlier this week, I ran a single fio job to get a clearer picture of:
1. The QEMU 10.0.0 regression that prompted you to optimize AioContext polling.
2. How the poll-weight parameter affects IOPS.
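For reference, the diff column in the IOPS table below matches each run's IOPS relative to the baseline run, (iops - baseline) / baseline. Here is a minimal Python sketch that recomputes that column from the raw numbers and also gives a rough CPU busy figure, taken as 100 - %idle. Python and the helper names (pct_diff, iops_diff, busy) are only my choice for illustration. They are not part of the benchmark setup.

```python
# IOPS from the randread bs=8k iodepth=8 numjobs=1 runs with a single IOThread
iops = {
    "v9.2.0": 174944,
    "v10.0.0": 174285,
    "baseline": 168908,
    "w2": 163718,
    "w3": 165805,
    "w4": 167388,
}

# %idle column from the CPU utilization table
idle = {"baseline": 7.89, "w2": 9.21, "w3": 7.73, "w4": 8.69}


def pct_diff(value: float, reference: float) -> float:
    """Relative change of value versus reference, in percent."""
    return (value - reference) / reference * 100.0


def iops_diff(runs: dict[str, int], reference: str = "baseline") -> dict[str, float]:
    """The diff column: each run's IOPS relative to the reference run, rounded to 0.1%."""
    ref = runs[reference]
    return {name: round(pct_diff(value, ref), 1) for name, value in runs.items()}


def busy(idle_pct: dict[str, float]) -> dict[str, float]:
    """Rough CPU busy time as 100 - %idle.

    This is approximately the sum of the non-idle columns. It is not exact,
    because each column in the table is already rounded to two decimals.
    """
    return {name: round(100.0 - value, 2) for name, value in idle_pct.items()}


if __name__ == "__main__":
    for name, d in iops_diff(iops).items():
        print(f"{name:10} {iops[name]:>7} {d:+5.1f}%")
    for name, b in busy(idle).items():
        print(f"{name:10} busy {b:5.2f}%")
```

The rounded diff values it computes match the table. A lower busy figure means less CPU consumed for the same workload.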
run       rw        bs  numjobs  iothreads  iops     diff
v9.2.0    randread  8k  1        1          174944    3.6%
v10.0.0   randread  8k  1        1          174285    3.2%
baseline  randread  8k  1        1          168908    0.0%
w2        randread  8k  1        1          163718   -3.1%
w3        randread  8k  1        1          165805   -1.8%
w4        randread  8k  1        1          167388   -0.9%

(wN = your patches on top of the baseline with poll-weight=N.)

This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single IOThread.

Observations:
- There might be an IOPS regression between v10.0.0 and the baseline (9ad7f544c696) that your patches apply on top of: v10.0.0 measures 3.2% above the baseline. This is different from the CPU utilization regression that you found in v9.2.0 -> v10.0.0. I will bisect it.
- poll-weight=3 and 4 recover most of the IOPS lost with poll-weight=2 (-1.8% and -0.9% versus -3.1% relative to the baseline), which I consider acceptable.

CPU utilization looks like this:

run       %usr   %nice  %sys   %iowait  %steal  %irq   %soft  %guest  %gnice  %idle
baseline  49.37  0.00   31.10  0.00     0.00    11.61  0.04   0.00    0.00    7.89
w2        46.24  0.00   32.61  0.00     0.00    11.84  0.10   0.00    0.00    9.21
w3        48.04  0.00   32.17  0.00     0.00    11.98  0.08   0.00    0.00    7.73
w4        48.56  0.00   31.23  0.00     0.00    11.48  0.03   0.00    0.00    8.69

poll-weight=2 is the winner on CPU utilization (9.21% idle versus 7.89% for the baseline). I'm not sure whether poll-weight=3 gives you an acceptable CPU utilization improvement; in this run its idle time (7.73%) is slightly below the baseline's. Do you have data, or do you want to re-run to measure poll-weight=3?

Stefan
