On 9/16/20 5:07 PM, Denis V. Lunev wrote:
> On 9/16/20 4:32 PM, Stefan Hajnoczi wrote:
>> On Thu, Aug 27, 2020 at 3:24 PM Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>> Hi Denis,
>>> A performance regression was found after the virtio-blk queue-size
>>> property was increased from 128 to 256 in QEMU 5.0 in commit
>>> c9b7d9ec21dfca716f0bb3b68dee75660d86629c ("virtio: increase virtqueue
>>> size for virtio-scsi and virtio-blk"). I wanted to let you know in case
>>> you have ideas or see something similar.
>> Ping, have you noticed performance regressions after switching to
>> virtio-blk queue-size 256?
> Oops, I missed the original letter.
>
> Denis Plotnikov has left the team at the moment.
>
>>> Throughput and IOPS of the following fio benchmarks dropped by 30-40%:
>>>
>>> # mkfs.xfs /dev/vdb
>>> # mount /dev/vdb /mnt
>>> # fio --rw=%s --bs=%s --iodepth=64 --runtime=1m --direct=1 \
>>>   --filename=/mnt/%s --name=job1 --ioengine=libaio --thread \
>>>   --group_reporting --numjobs=16 --size=512MB --time_based \
>>>   --output=/tmp/fio_result &> /dev/null
>>> - rw: read write
>>> - bs: 4k 64k
>>>
>>> Note that there are 16 threads submitting 64 requests each! The guest
>>> block device queue depth will be maxed out. The virtqueue should be full
>>> most of the time.
>>>
>>> Have you seen regressions after the virtio-blk queue-size was increased
>>> in QEMU 5.0?
>>>
>>> Here are the details of the host storage:
>>>
>>> # mkfs.xfs /dev/sdb   # 60GB SSD drive
>>> # mount /dev/sdb /mnt/test
>>> # qemu-img create -f qcow2 /mnt/test/storage2.qcow2 40G
>>>
>>> The guest command-line is:
>>>
>>> # MALLOC_PERTURB_=1 numactl -m 1 /usr/libexec/qemu-kvm \
>>>   -S \
>>>   -name 'avocado-vt-vm1' \
>>>   -sandbox on \
>>>   -machine q35 \
>>>   -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
>>>   -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0 \
>>>   -nodefaults \
>>>   -device VGA,bus=pcie.0,addr=0x2 \
>>>   -m 4096 \
>>>   -smp 2,maxcpus=2,cores=1,threads=1,dies=1,sockets=2 \
>>>   -cpu 'IvyBridge',+kvm_pv_unhalt \
>>>   -chardev socket,server,id=qmp_id_qmpmonitor1,nowait,path=/var/tmp/avocado_bapfdqao/monitor-qmpmonitor1-20200721-014154-5HJGMjxW \
>>>   -mon chardev=qmp_id_qmpmonitor1,mode=control \
>>>   -chardev socket,server,id=qmp_id_catch_monitor,nowait,path=/var/tmp/avocado_bapfdqao/monitor-catch_monitor-20200721-014154-5HJGMjxW \
>>>   -mon chardev=qmp_id_catch_monitor,mode=control \
>>>   -device pvpanic,ioport=0x505,id=id31BN83 \
>>>   -chardev socket,server,id=chardev_serial0,nowait,path=/var/tmp/avocado_bapfdqao/serial-serial0-20200721-014154-5HJGMjxW \
>>>   -device isa-serial,id=serial0,chardev=chardev_serial0 \
>>>   -chardev socket,id=seabioslog_id_20200721-014154-5HJGMjxW,path=/var/tmp/avocado_bapfdqao/seabios-20200721-014154-5HJGMjxW,server,nowait \
>>>   -device isa-debugcon,chardev=seabioslog_id_20200721-014154-5HJGMjxW,iobase=0x402 \
>>>   -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
>>>   -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
>>>   -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>>>   -blockdev node-name=file_image1,driver=file,aio=threads,filename=rootfs.qcow2,cache.direct=on,cache.no-flush=off \
>>>   -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
>>>   -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
>>>   -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pcie-root-port-2,addr=0x0 \
>>>   -blockdev node-name=file_disk1,driver=file,aio=threads,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
>>>   -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \
>>>   -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
>>>   -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,bus=pcie-root-port-3,addr=0x0 \
>>>   -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
>>>   -device virtio-net-pci,mac=9a:37:37:37:37:4e,id=idBMd7vy,netdev=idLb51aS,bus=pcie-root-port-4,addr=0x0 \
>>>   -netdev tap,id=idLb51aS,fd=14 \
>>>   -vnc :0 \
>>>   -rtc base=utc,clock=host,driftfix=slew \
>>>   -boot menu=off,order=cdn,once=c,strict=off \
>>>   -enable-kvm \
>>>   -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=6
> I will make a check today.
>
> Talking about our own performance measurements, we have not seen ANY
> performance degradation, and certainly not 30-40%. This looks quite
> strange to me.
>
> Though there is one quite important difference: we always use O_DIRECT
> and the 'native' AIO engine.
>
> Den
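For reference, the reproducer above uses aio=threads on its file nodes.
cache.direct=on is already set there, which the native engine requires,
so switching to native AIO is just a change of the aio= option. A sketch
against the data-disk -blockdev lines above, everything else unchanged:

  # same as the reproducer, with aio=threads replaced by aio=native
  -blockdev node-name=file_disk1,driver=file,aio=native,filename=/mnt/test/storage2.qcow2,cache.direct=on,cache.no-flush=off \
  -blockdev node-name=drive_disk1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_disk1 \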
I have put my hands on this, and it looks like you are right. There is a
difference. It is not as significant for me as in your case, but I observe
a stable difference of around 10% between queue sizes 128 and 256.

I have checked with:
- QEMU 5.1
- Fedora 31 in the guest
- qcow2 (64k and 1MB cluster sizes) and raw images on the host
- nocache, and both the threaded and native AIO modes

The test was run on a ThinkPad X1 Carbon gen 6 laptop. For reference, I
have seen up to 330k IOPS for reads with native AIO, which looks awesome,
and 220k IOPS with threads.

Den
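P.S. For an A/B comparison on the same QEMU binary, the virtqueue size can
be pinned explicitly with the queue-size property of virtio-blk-pci instead
of relying on the built-in default (128 before QEMU 5.0, 256 after). A
sketch against the device line from the reproducer, with the IDs and bus
placement taken from the command line above:

  # run once with queue-size=128 and once with queue-size=256
  -device virtio-blk-pci,id=disk1,drive=drive_disk1,bootindex=1,write-cache=on,queue-size=128,bus=pcie-root-port-3,addr=0x0 \

That isolates the virtqueue size from every other difference between the
QEMU versions.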