On Mon, Feb 14, 2022 at 02:52:29PM +0000, Stefan Hajnoczi wrote:
> On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> > This patchset is an RFC version for the ioregionfd implementation
> > in QEMU. The kernel patches are to be posted with some fixes as a v4.
> >
> > For this implementation, version 3 of the posted kernel patches was used:
> > https://lore.kernel.org/kvm/cover.1613828726.git.eafanas...@gmail.com/
> >
> > The future version will include support for vfio/libvfio-user.
> > Please refer to the design discussion here proposed by Stefan:
> > https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
> >
> > The vfio-user version needed some bug-fixing and it was decided to send
> > this for multiprocess first.
> >
> > The ioregionfd is currently configured through the command line and each
> > ioregionfd is represented by an object. This allows for easy parsing and
> > does not require device/remote object command line option modifications.
> >
> > The following command line can be used to specify an ioregionfd:
> > <snip>
> > '-object',
> > 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> > '-object',
> > 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> > '-object',
> > 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
>
Hi Stefan,

Thank you for taking a look!

> Explicit configuration of ioregionfd-object is okay for early
> prototyping, but what is the plan for integrating this? I guess
> x-remote-object would query the remote device to find out which
> ioregionfds need to be registered and the user wouldn't need to specify
> ioregionfds on the command-line?

Yes, this can be done. For some reason I thought that the user would be able
to choose the number/size of the regions to be configured as ioregionfds.

> > </snip>
> >
> > The proxy side of ioregionfd in this version uses only one file descriptor:
> > <snip>
> > '-device',
> > 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
> > </snip>
>
> This raises the question of the ioregionfd file descriptor lifecycle. In
> the end I think it shouldn't be specified on the command-line. Instead
> the remote device should create it and pass it to QEMU over the
> mpqemu/remote fd?

Yes, this will be the same as vfio-user does.

> > This was done for the RFC version and my thought was that the next version
> > would be for vfio-user, so I have not dedicated much effort to these
> > command line options.
> >
> > The multiprocess messaging protocol was extended to support inquiries
> > by the proxy about whether a device has any ioregionfds.
> > This RFC implements inquiries by the proxy about whether a BAR is an
> > ioregionfd or not and about its type (memory/io).
> >
> > Currently there are a few limitations in this version of ioregionfd:
> > - one ioregionfd per BAR, only the full BAR size is supported;
> > - one file descriptor per device for all of its ioregionfds;
> > - each remote device runs the fd handler for all its BARs in one IOThread;
> > - the proxy supports only one fd.
> >
> > Some of these limitations will be dropped in a future version.
> > This RFC is to acquire feedback/suggestions from the community
> > on the general approach.
> >
> > A quick performance test was done for the remote lsi device with
> > ioregionfd and without, for both mem BARs (1 and 2), with the help
> > of the fio tool:
> >
> > Random R/W:
> >
> >                 read IOPS   read BW     write IOPS   write BW
> > no ioregionfd   889         3559KiB/s   890          3561KiB/s
> > ioregionfd      938         3756KiB/s   939          3757KiB/s
>
> This is extremely slow, even for random I/O. How does this compare to
> QEMU running the LSI device without multi-process mode?

These tests had iodepth=256.
I have changed this to 1 and tested without multiprocess, with multiprocess,
and with multiprocess plus both mmio regions as ioregionfds:

                          read IOPS   read BW (KiB/s)   write IOPS   write BW (KiB/s)
no multiprocess           89          358               90           360
multiprocess              138         556               139          557
multiprocess ioregionfd   174         698               173          693

The fio config for randomrw:

[global]
bs=4K
iodepth=1
direct=0
ioengine=libaio
group_reporting
time_based
runtime=240
numjobs=1
name=raw-randreadwrite
rw=randrw
size=8G

[job1]
filename=/fio/randomrw

And the QEMU command line for non-multiprocess:

/usr/local/bin/qemu-system-x86_64 -name "OL7.4" \
    -machine q35,accel=kvm \
    -smp sockets=1,cores=2,threads=2 \
    -m 2048 \
    -hda /home/homedir/ol7u9boot.img \
    -boot d \
    -vnc :0 \
    -chardev stdio,id=seabios \
    -device isa-debugcon,iobase=0x402,chardev=seabios \
    -device lsi53c895a,id=lsi1 \
    -drive id=drive_image1,if=none,file=/home/homedir/10gb.qcow2 \
    -device scsi-hd,id=drive1,drive=drive_image1,bus=lsi1.0,scsi-id=0

The QEMU command lines for multiprocess:

remote_cmd = [ PROC_QEMU, \
    '-machine', 'x-remote', \
    '-device', 'lsi53c895a,id=lsi0', \
    '-drive', 'id=drive_image1,file=/home/homedir/10gb.qcow2', \
    '-device', 'scsi-hd,id=drive2,drive=drive_image1,bus=lsi0.0,scsi-id=0', \
    '-nographic', \
    '-monitor', 'unix:/home/homedir/rem-sock,server,nowait', \
    '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()), \
    '-object', 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1', \
    '-object', 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2', \
]

proxy_cmd = [ PROC_QEMU, \
    '-D', '/tmp/qemu-debug-log', \
    '-name', 'OL7.4', \
    '-machine', 'pc,accel=kvm', \
    '-smp', 'sockets=1,cores=2,threads=2', \
    '-m', '2048', \
    '-object', 'memory-backend-memfd,id=sysmem-file,size=2G', \
    '-numa', 'node,memdev=sysmem-file', \
    '-hda', '/home/homedir/ol7u9boot.img', \
    '-boot', 'd', \
    '-vnc', ':0', \
    '-device', 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()), \
    '-monitor', 'unix:/home/homedir/qemu-sock,server,nowait', \
    '-netdev', 'tap,id=mynet0,ifname=tap0,script=no,downscript=no', \
    '-device', 'e1000,netdev=mynet0,mac=52:55:00:d1:55:01', \
]

For the test without ioregionfds, the ioregionfd-object entries are commented
out. I am doing more testing as I see some inconsistent results.

> > Sequential Read and Sequential Write:
> >
> >                 Sequential read           Sequential write
> >                 read IOPS   read BW       write IOPS   write BW
> >
> > no ioregionfd   367k        1434MiB/s     76k          297MiB/s
> > ioregionfd      374k        1459MiB/s     77.3k        302MiB/s
>
> It's normal for read and write IOPS to differ, but the read IOPS are
> very high. I wonder if caching and read-ahead are hiding the LSI
> device's actual performance here.
>
> What are the fio and QEMU command-lines?
>
> In order to benchmark ioregionfd it's best to run a benchmark where the
> bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
> bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
> dispatch cost doesn't affect IOPS significantly.
>
> I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
> instead of a file or host block device. The fio block size should be 4k
> to minimize the amount of time spent on I/O buffer contents and
> iodepth=1 because batching multiple requests with iodepth > 1 hides the
> MMIO/PIO dispatch bottleneck.

The queue depth in the tests above was 256; I will try what you have
suggested. The block size is 4k.
I am also looking at some other system issue that can interfere with the
test, and will be running the test on a fresh install with the settings you
mentioned above.

Thank you!

> Stefan
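For reference, a rough sketch of the benchmark setup suggested above. The
exact -blockdev spelling, the scsi-hd attachment of the null block node to
the existing LSI controller, and the /dev/sdb guest device name are
assumptions to be verified, not details from the thread:

# Host side: back the LSI disk with a null-co block node instead of the
# qcow2 file; other options as in the command lines above.
/usr/local/bin/qemu-system-x86_64 ... \
    -device lsi53c895a,id=lsi1 \
    -blockdev null-co,node-name=null0,size=64G \
    -device scsi-hd,id=drive1,drive=null0,bus=lsi1.0,scsi-id=0

# Guest side: fio job with iodepth=1 and 4k blocks so that MMIO/PIO dispatch,
# not disk I/O, is the bottleneck. /dev/sdb is the assumed name of the
# null-co-backed disk inside the guest.
[global]
bs=4K
iodepth=1
direct=1
ioengine=libaio
rw=randrw
time_based
runtime=240
numjobs=1
name=nullco-randreadwrite

[job1]
filename=/dev/sdb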