Hi Jaehoon,
Following the call earlier this week, I ran a single fio job to get a
clearer picture of:
1. The QEMU 10.0.0 regression that prompted you to optimize AioContext
   polling.
2. How the poll-weight parameter affects IOPS.

run      rw        bs   numjobs iothreads iops   diff (vs. baseline)
v9.2.0   randread  8k   1       1         174944 3.6%
v10.0.0  randread  8k   1       1         174285 3.2%
baseline randread  8k   1       1         168908 0.0%
w2       randread  8k   1       1         163718 -3.1%
w3       randread  8k   1       1         165805 -1.8%
w4       randread  8k   1       1         167388 -0.9%
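
For reference, the diff column can be reproduced from the iops column
with a few lines of Python. This is only a sketch: the numbers are copied
from the table above, and the helper name diff_percent is mine, not
something from fio or QEMU.

```python
# IOPS figures copied from the table above.
iops = {
    "v9.2.0": 174944,
    "v10.0.0": 174285,
    "baseline": 168908,
    "w2": 163718,
    "w3": 165805,
    "w4": 167388,
}


def diff_percent(value, reference):
    """Relative change of value versus reference, in percent."""
    return (value - reference) / reference * 100.0


# Every run is compared against the baseline commit, rounded to one decimal.
diffs = {
    run: round(diff_percent(value, iops["baseline"]), 1)
    for run, value in iops.items()
}

for run, diff in diffs.items():
    print(f"{run:<8} {diff:+.1f}%")
```

The rounded values match the diff column, so the percentages are all
relative to the baseline row and not to v9.2.0 or v10.0.0.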

This time I only ran randread bs=8k iodepth=8 numjobs=1 with a single
IOThread.

Observations:

- There might be an IOPS regression of about 3% between v10.0.0
  (174285) and the baseline (9ad7f544c696, 168908) that your patches
  apply on top of. This is different from the CPU utilization
  regression that you found in v9.2.0 -> v10.0.0. I will bisect it.

- poll-weight=3 and 4 bring IOPS to within 2% of the baseline (-1.8%
  and -0.9%), which is acceptable. CPU utilization looks like this:

run           %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
baseline     49.37      0.00     31.10      0.00      0.00     11.61      0.04      0.00      0.00      7.89
w2           46.24      0.00     32.61      0.00      0.00     11.84      0.10      0.00      0.00      9.21
w3           48.04      0.00     32.17      0.00      0.00     11.98      0.08      0.00      0.00      7.73
w4           48.56      0.00     31.23      0.00      0.00     11.48      0.03      0.00      0.00      8.69

poll-weight=2 has the lowest CPU utilization (9.21% idle versus 7.89%
for the baseline). I'm not sure if poll-weight=3 will produce an
acceptable CPU utilization improvement for you. Do you have data, or do
you want to re-run to measure poll-weight=3?
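
To make the CPU comparison easier to read, here is a small Python sketch
that turns the %idle column into a busy percentage and compares each run
against the baseline. It assumes that "busy" is simply 100 - %idle, which
is my simplification and not something the measurement tool reports
directly.

```python
# %idle values copied from the CPU utilization table above.
idle = {"baseline": 7.89, "w2": 9.21, "w3": 7.73, "w4": 8.69}

# Assumption: everything that is not idle counts as busy.
busy = {run: round(100.0 - value, 2) for run, value in idle.items()}

# Change in busy CPU relative to the baseline, in percentage points.
delta = {run: round(busy[run] - busy["baseline"], 2) for run in busy}

for run in busy:
    print(f"{run:<8} busy={busy[run]:6.2f}%  delta={delta[run]:+.2f}")
```

By this measure w2 is 1.32 percentage points below the baseline and w4 is
0.80 below, while w3 is 0.16 above it, which is why I am unsure about
poll-weight=3.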

Stefan
