Hello all,

Thank you for the responses. I ran 3 runs per commit, each using 5 iterations of 
fio-nbd, on the following commits:

f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94
f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94 + Stefan's commit
d7482ffe9756919531307330fd1c6dbec66e8c32

Using the regressed f9fc8932b11f3bcf2a2626f567cb6fdd36a33a94 as a baseline, the 
relative percentage results were:

commit | run 1 | run 2 | run 3
f9f    |   0.0 |  -2.8 |   0.6
stefan |  -3.1 |  -1.2 |  -2.2
d74    |   7.2 |   9.1 |   8.2
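For reference, a minimal sketch of the arithmetic behind such relative figures, 
assuming each one is (measured - baseline) / baseline * 100 on a higher-is-better 
metric such as IOPS. The exact metric and which baseline run was used are not 
stated above, and rel_pct is a hypothetical helper name:

```shell
# Hypothetical helper: relative change (in %) of a measured value against a
# baseline value, rounded to one decimal place. Assumes a higher-is-better
# metric (e.g. IOPS or MB/s), so positive numbers mean "faster".
rel_pct() {
    # $1 = baseline value, $2 = measured value
    awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%.1f\n", (val - base) / base * 100 }'
}

# Example with made-up numbers (not taken from the runs above):
rel_pct 100000 107200   # prints 7.2
rel_pct 100000 96900    # prints -3.1
```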

I'm not sure whether Stefan's commit was supposed to be applied on top of the 
f9fc8932b commit, but at least for fio-nbd 4k writes it slightly worsens the 
situation.

Do you want me to also run fio inside the guest, or is this fio-nbd check 
sufficient for now?

Also, let me briefly share the details of the execution:

---

mkdir -p /var/lib/runperf/runperf-nbd/
truncate -s 256M /var/lib/runperf/runperf-nbd/disk.img
nohup qemu-nbd -t -k /var/lib/runperf/runperf-nbd/socket -f raw \
    /var/lib/runperf/runperf-nbd/disk.img \
    &> $(mktemp /var/lib/runperf/runperf-nbd/qemu_nbd_XXXX.log) &
echo $! >> /var/lib/runperf/runperf-nbd/kill_pids
for PID in $(cat /var/lib/runperf/runperf-nbd/kill_pids); do disown -h $PID; done
export TERM=xterm-256color
true
mkdir -p /var/lib/runperf/runperf-nbd/
cat > /var/lib/runperf/runperf-nbd/nbd.fio << \Gr1UaS
# To use fio to test nbdkit:
#
# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
#
# To use fio to test qemu-nbd:
#
# rm -f /tmp/disk.img /tmp/socket
# truncate -s 256M /tmp/disk.img
# export target=/tmp/socket
# qemu-nbd -t -k $target -f raw /tmp/disk.img &
# fio examples/nbd.fio
# killall qemu-nbd

[global]
bs = $@
runtime = 30
ioengine = nbd
iodepth = 32
direct = 1
sync = 0
time_based = 1
clocksource = gettimeofday
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 10000
# log_hist_coarseness = 4 # 76 bins

rw = $@
uri=nbd+unix:///?socket=/var/lib/runperf/runperf-nbd/socket
# Starting from nbdkit 1.14 the following will work:
#uri=${uri}

[job0]
offset=0

[job1]
offset=64m

[job2]
offset=128m

[job3]
offset=192m

Gr1UaS

benchmark_bin=/usr/local/bin/fio pbench-fio --block-sizes=4 \
    --job-file=/var/lib/runperf/runperf-nbd/nbd.fio --numjobs=4 --runtime=60 \
    --samples=5 --test-types=write --clients=$WORKER_IP

---
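The setup above appends the qemu-nbd PID to kill_pids; a possible teardown 
sketch, assuming the same directory layout (this is not part of the runperf 
output shown, and nbd_teardown is a hypothetical name):

```shell
# Hypothetical teardown for the qemu-nbd server started above.
# Assumes the PIDs were appended to $NBD_DIR/kill_pids, one per line.
NBD_DIR=/var/lib/runperf/runperf-nbd

nbd_teardown() {
    local dir="$1"
    if [ -f "$dir/kill_pids" ]; then
        # Send SIGTERM to every recorded process; ignore ones already gone.
        while read -r pid; do
            [ -n "$pid" ] && { kill "$pid" 2>/dev/null || true; }
        done < "$dir/kill_pids"
        rm -f "$dir/kill_pids"
    fi
    # Remove the UNIX socket and the scratch disk image.
    rm -f "$dir/socket" "$dir/disk.img"
}

# Usage: nbd_teardown "$NBD_DIR"
```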

I am using pbench to drive the execution, but you can simply replace the "$@" 
placeholders (bs and rw) in the generated "/var/lib/runperf/runperf-nbd/nbd.fio" 
and run it directly with fio.
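A minimal sketch of that manual route, assuming a 4 KiB block size and a write 
workload to mirror the --block-sizes=4 and --test-types=write options above. 
The helper make_fio_job and the output file name are hypothetical, and the 
other pbench-fio options (--numjobs, --runtime, --samples) are not carried over:

```shell
# Hypothetical helper: turn the pbench template into a standalone fio job file
# by replacing the two literal "$@" placeholders on the bs and rw lines.
make_fio_job() {
    # $1 = template, $2 = output file, $3 = block size, $4 = rw pattern
    sed -e 's/^bs = \$@/bs = '"$3"'/' \
        -e 's/^rw = \$@/rw = '"$4"'/' \
        "$1" > "$2"
}

# Usage (requires the qemu-nbd server from the setup to be running):
# make_fio_job /var/lib/runperf/runperf-nbd/nbd.fio \
#     /var/lib/runperf/runperf-nbd/nbd-4k-write.fio 4k write
# fio /var/lib/runperf/runperf-nbd/nbd-4k-write.fio
```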

Regards,
Lukáš


On 05. 05. 22 at 15:27, Paolo Bonzini wrote:
> On 5/5/22 14:44, Daniel P. Berrangé wrote:
>>> util/thread-pool.c uses qemu_sem_*() to notify worker threads when work
>>> becomes available. It makes sense that this operation is
>>> performance-critical and that's why the benchmark regressed.
>>
>> Doh, I questioned whether the change would have a performance impact,
>> and it wasn't thought to be used in perf critical places
> 
> The expectation was that there would be no contention and thus no overhead 
> because of the pool->lock that exists anyway, but that was optimistic.
> 
> Lukáš, can you run a benchmark with this condvar implementation that was 
> suggested by Stefan:
> 
> https://lore.kernel.org/qemu-devel/20220505131346.823941-1-pbonz...@redhat.com/raw
> 
> ?
> 
> If it still regresses, we can either revert the patch or look at a different 
> implementation (even getting rid of the global queue is an option).
> 
> Thanks,
> 
> Paolo
> 
