Hello all,

thank you for the responses. I ran 3 runs per commit, each with 5 iterations of fio-nbd, for:

f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94
f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94 + Stefan's commit
d7482ffe9756919531307330fd1c6dbec66e8c32

Using the regressed f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94 as a baseline, the relative percentage results (3 runs each) were:

f9f    |  0.0 | -2.8 |  0.6
stefan | -3.1 | -1.2 | -2.2
d74    |  7.2 |  9.1 |  8.2

I am not sure whether Stefan's commit was supposed to be applied on top of the f9fc8932b commit, but at least for fio-nbd 4k writes it slightly worsens the situation. Do you want me to try fio inside the guest as well, or is this fio-nbd check sufficient for now?

Let me also briefly share the details of the execution:

---
mkdir -p /var/lib/runperf/runperf-nbd/
truncate -s 256M /var/lib/runperf/runperf-nbd//disk.img
nohup qemu-nbd -t -k /var/lib/runperf/runperf-nbd//socket -f raw /var/lib/runperf/runperf-nbd//disk.img &> $(mktemp /var/lib/runperf/runperf-nbd//qemu_nbd_XXXX.log) &
echo $! >> /var/lib/runperf/runperf-nbd//kill_pids
for PID in $(cat /var/lib/runperf/runperf-nbd//kill_pids); do disown -h $PID; done
export TERM=xterm-256color
true
mkdir -p /var/lib/runperf/runperf-nbd/
cat > /var/lib/runperf/runperf-nbd/nbd.fio << \Gr1UaS
# To use fio to test nbdkit:
#
# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
#
# To use fio to test qemu-nbd:
#
# rm -f /tmp/disk.img /tmp/socket
# truncate -s 256M /tmp/disk.img
# export target=/tmp/socket
# qemu-nbd -t -k $target -f raw /tmp/disk.img &
# fio examples/nbd.fio
# killall qemu-nbd

[global]
bs = $@
runtime = 30
ioengine = nbd
iodepth = 32
direct = 1
sync = 0
time_based = 1
clocksource = gettimeofday
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 10000
# log_hist_coarseness = 4 # 76 bins
rw = $@
uri=nbd+unix:///?socket=/var/lib/runperf/runperf-nbd/socket
# Starting from nbdkit 1.14 the following will work:
#uri=${uri}

[job0]
offset=0
[job1]
offset=64m
[job2]
offset=128m
[job3]
offset=192m
Gr1UaS
benchmark_bin=/usr/local/bin/fio pbench-fio --block-sizes=4 --job-file=/var/lib/runperf/runperf-nbd/nbd.fio --numjobs=4 --runtime=60 --samples=5 --test-types=write --clients=$WORKER_IP
---

I am using pbench to drive the execution, but you can simply replace the "$@" variables in the produced "/var/lib/runperf/runperf-nbd/nbd.fio" and run it directly using fio, roughly as sketched below.
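For the 4k write case, something along these lines should be equivalent. This is only my rough reconstruction of what pbench-fio does with --block-sizes=4 and --test-types=write (the sed substitutions and the direct fio call are not the exact commands pbench runs), and it assumes the qemu-nbd server started above is still listening on the socket and that fio was built with the nbd ioengine:

# fill in the "$@" placeholders that pbench-fio normally substitutes,
# mirroring --block-sizes=4 and --test-types=write
sed -i -e 's/^bs = .*/bs = 4k/' -e 's/^rw = .*/rw = write/' \
    /var/lib/runperf/runperf-nbd/nbd.fio

# run the job file directly against the already-running qemu-nbd server
/usr/local/bin/fio /var/lib/runperf/runperf-nbd/nbd.fio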
Regards,
Lukáš

On 05. 05. 22 at 15:27, Paolo Bonzini wrote:
> On 5/5/22 14:44, Daniel P. Berrangé wrote:
>>> util/thread-pool.c uses qemu_sem_*() to notify worker threads when work
>>> becomes available. It makes sense that this operation is
>>> performance-critical and that's why the benchmark regressed.
>>
>> Doh, I questioned whether the change would have a performance impact,
>> and it wasn't thought to be used in perf critical places
>
> The expectation was that there would be no contention and thus no overhead
> because of the pool->lock that exists anyway, but that was optimistic.
>
> Lukáš, can you run a benchmark with this condvar implementation that was
> suggested by Stefan:
>
> https://lore.kernel.org/qemu-devel/20220505131346.823941-1-pbonz...@redhat.com/raw
>
> ?
>
> If it still regresses, we can either revert the patch or look at a different
> implementation (even getting rid of the global queue is an option).
>
> Thanks,
>
> Paolo