On Thu, Apr 06, 2023 at 12:55:38PM +0200, Lukáš Doktor wrote:
> Hello Florian, folks,
> 
> my CI caught a ~5% regression (in 60s runs; when using 240s it was about
> 10%) in qemu-nbd performance, bisected multiple times to
> bd2cd4a441ded163b62371790876f28a9b834317, when using 4k block reads in
> fio. Note that other scenarios (reads using 1024k blocks, writes using
> either 4k or 1024k blocks) were not affected. Is this expected?
Large operations (1024k blocks) are dominated by the transaction itself,
not by the network overhead.  Small operations (4k reads) used to benefit
from TCP batching (which introduces latency, but less network overhead),
but we intentionally started corking things (which decreases latency, but
leaves the network prone to sending smaller packets, which means more
network overhead).  So a slight decrease in performance for small-size
traffic only is not surprising.

I'm not sure anything can be done about it in the short term, because the
benefits in the other direction (an order-of-magnitude improvement for TLS
traffic) from being transactional instead of batching outweigh the network
overhead of small transactions, and most clients are going to do more than
just minimum-size reads.

However, commit bd2cd4a44 does mention a potential future optimization of
not uncorking if there is an easy way to detect that another reply in the
queue will be sent shortly.  Also, distinct actions for corking and
uncorking cost extra system calls; it may be possible to use MSG_MORE on
the existing data syscall paths instead of having to separately cork and
uncork, which could still mark message transaction boundaries with less
overhead.
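To make the syscall-count point concrete, here is a minimal, hypothetical
sketch (plain Linux sockets, invented function names, not qemu's actual
code) contrasting an explicit cork/uncork pair around each reply with
passing MSG_MORE on the data sends themselves:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Style 1: explicit cork/uncork around the reply.  Costs two extra
 * setsockopt() calls per reply in addition to the data syscall. */
static ssize_t send_reply_corked(int fd, const void *hdr, size_t hlen,
                                 const void *payload, size_t plen)
{
    int one = 1, zero = 0;
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr, .iov_len = hlen },
        { .iov_base = (void *)payload, .iov_len = plen },
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };
    ssize_t ret;

    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &one, sizeof(one));   /* hold frames */
    ret = sendmsg(fd, &msg, 0);
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &zero, sizeof(zero)); /* push now */
    return ret;
}

/* Style 2: the possible optimization mentioned above.  MSG_MORE on the
 * header send tells the kernel more data follows; the final send without
 * it marks the boundary, with no extra syscalls. */
static ssize_t send_reply_msg_more(int fd, const void *hdr, size_t hlen,
                                   const void *payload, size_t plen)
{
    ssize_t ret = send(fd, hdr, hlen, MSG_MORE);  /* more data coming */
    if (ret < 0) {
        return ret;
    }
    return send(fd, payload, plen, 0);            /* boundary reached */
}

Whether the MSG_MORE variant would actually recover the lost throughput
needs measurement; it merely avoids the two setsockopt() calls while
conveying the same boundary hint to the kernel.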
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org