On Thu, Apr 06, 2023 at 12:55:38PM +0200, Lukáš Doktor wrote:
> Hello Florian, folks,
> 
> my CI caught a ~5% regression (in 60s runs; when using 240s it was about
> 10%) in qemu-nbd performance, bisected multiple times to
> bd2cd4a441ded163b62371790876f28a9b834317, when using 4k block reads in
> fio. Note that other scenarios (reads using 1024k blocks, writes using
> either 4k or 1024k blocks) were not affected. Is this expected?
Large operations (1024k blocks) are dominated by the transaction itself,
not by the network overhead.  Small operations (4k reads) used to benefit
from TCP batching (which introduces latency, but less network overhead),
but we intentionally started corking things (which decreases latency, but
leaves the network prone to sending smaller packets, which means more
network overhead).  So a slight decrease in performance for small-size
traffic only is not surprising.

I'm not sure anything can be done about it in the short term, because the
benefits in the other direction (an order-of-magnitude improvement for TLS
traffic) from being transactional instead of batching outweigh the network
overhead of small transactions, and most clients are going to do more than
just minimum-size reads.

However, commit bd2cd4a44 does mention a potential future optimization of
not uncorking if there is an easy way to detect that another reply in the
queue will be sent shortly.  Also, distinct actions for corking and
uncorking cost extra system calls; it may be possible to use MSG_MORE on
the existing data syscall paths instead of having to separately cork and
uncork, which could still mark message transaction boundaries with less
overhead.
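To make the syscall-count point concrete, here is a minimal, hypothetical
sketch (plain Linux sockets, invented function names, not qemu's actual
code) contrasting an explicit cork/uncork pair around each reply with
passing MSG_MORE on the data sends themselves:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Style 1: explicit cork/uncork around the reply.  Costs two extra
 * setsockopt() calls per reply in addition to the data syscall. */
static ssize_t send_reply_corked(int fd, const void *hdr, size_t hlen,
                                 const void *payload, size_t plen)
{
    int one = 1, zero = 0;
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr, .iov_len = hlen },
        { .iov_base = (void *)payload, .iov_len = plen },
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };
    ssize_t ret;

    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &one, sizeof(one));   /* hold frames */
    ret = sendmsg(fd, &msg, 0);
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &zero, sizeof(zero)); /* push now */
    return ret;
}

/* Style 2: the possible optimization mentioned above.  MSG_MORE on the
 * header send tells the kernel more data follows; the final send without
 * it marks the boundary, with no extra syscalls. */
static ssize_t send_reply_msg_more(int fd, const void *hdr, size_t hlen,
                                   const void *payload, size_t plen)
{
    ssize_t ret = send(fd, hdr, hlen, MSG_MORE);  /* more data coming */
    if (ret < 0) {
        return ret;
    }
    return send(fd, payload, plen, 0);            /* boundary reached */
}

Whether the MSG_MORE variant would actually recover the lost throughput
needs measurement; it merely avoids the two setsockopt() calls while
conveying the same boundary hint to the kernel.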
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org