On Mon, Feb 07, 2022 at 11:22:14PM -0800, Elena Ufimtseva wrote:
> This patchset is an RFC version for the ioregionfd implementation
> in QEMU. The kernel patches are to be posted with some fixes as a v4.
>
> For this implementation version 3 of the posted kernel patches was used:
> https://lore.kernel.org/kvm/cover.1613828726.git.eafanas...@gmail.com/
>
> The future version will include support for vfio/libvfio-user.
> Please refer to the design discussion here proposed by Stefan:
> https://lore.kernel.org/all/YXpb1f3KicZxj1oj@stefanha-x1.localdomain/T/
>
> The vfio-user version needed some bug-fixing and it was decided to send
> this for multiprocess first.
>
> The ioregionfd is currently configured through the command line and each
> ioregionfd is represented by an object. This allows for easy parsing and
> does not require device/remote object command-line option modifications.
>
> The following command line can be used to specify ioregionfd:
> <snip>
> '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\
> '-object',
> 'ioregionfd-object,id=ioreg2,devid=lsi0,iofd='+str(iord.fileno())+',bar=1',\
> '-object',
> 'ioregionfd-object,id=ioreg3,devid=lsi0,iofd='+str(iord.fileno())+',bar=2',\
> </snip>

Explicit configuration of ioregionfd-object is okay for early
prototyping, but what is the plan for integrating this? I guess
x-remote-object would query the remote device to find out which
ioregionfds need to be registered and the user wouldn't need to specify
ioregionfds on the command-line?
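As a sketch of what that could look like (hypothetical, since this
negotiation doesn't exist yet), the test script would then only need
the remote object, with the ioregionfds set up behind the scenes:

  # no ioregionfd-object entries; x-remote-object is assumed to discover
  # and register the remote device's ioregionfds over the mpqemu channel
  '-object', 'x-remote-object,id=robj1,devid=lsi0,fd='+str(remote.fileno()),\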
> Proxy side of ioregionfd in this version uses only one file descriptor:
> <snip>
> '-device',
> 'x-pci-proxy-dev,id=lsi0,fd='+str(proxy.fileno())+',ioregfd='+str(iowr.fileno()),\
> </snip>

This raises the question of the ioregionfd file descriptor lifecycle. In
the end I think it shouldn't be specified on the command-line. Instead
the remote device should create it and pass it to QEMU over the
mpqemu/remote fd?

> This is done for the RFC version and my thought was that the next
> version will be for vfio-user, so I have not dedicated much effort to
> these command-line options.
>
> The multiprocess messaging protocol was extended to support inquiries
> by the proxy if the device has any ioregionfds.
> This RFC implements inquiries by the proxy about the type of BAR
> (ioregionfd or not) and the type of it (memory/io).
>
> Currently there are a few limitations in this version of ioregionfd:
> - one ioregionfd per BAR, only full BAR size is supported;
> - one file descriptor per device for all of its ioregionfds;
> - each remote device runs the fd handler for all its BARs in one IOThread;
> - the proxy supports only one fd.
>
> Some of these limitations will be dropped in a future version.
> This RFC is to acquire feedback/suggestions from the community
> on the general approach.
>
> A quick performance test was done for the remote lsi device with
> ioregionfd and without, for both mem BARs (1 and 2), with the help
> of the fio tool:
>
> Random R/W:
>
>                 read IOPS   read BW      write IOPS   write BW
> no ioregionfd   889         3559KiB/s    890          3561KiB/s
> ioregionfd      938         3756KiB/s    939          3757KiB/s

This is extremely slow, even for random I/O. How does this compare to
QEMU running the LSI device without multi-process mode?

> Sequential Read and Sequential Write:
>
>                 Sequential read          Sequential write
>                 read IOPS   read BW      write IOPS   write BW
> no ioregionfd   367k        1434MiB/s    76k          297MiB/s
> ioregionfd      374k        1459MiB/s    77.3k        302MiB/s

It's normal for read and write IOPS to differ, but the read IOPS are
very high. I wonder if caching and read-ahead are hiding the LSI
device's actual performance here. What are the fio and QEMU
command-lines?

In order to benchmark ioregionfd it's best to run a benchmark where the
bottleneck is MMIO/PIO dispatch. Otherwise we're looking at some other
bottleneck (e.g. physical disk I/O performance) and the MMIO/PIO
dispatch cost doesn't affect IOPS significantly.

I suggest trying --blockdev null-co,size=64G,id=null0 as the disk
instead of a file or host block device. The fio block size should be 4k
to minimize the amount of time spent on I/O buffer contents and
iodepth=1 because batching multiple requests with iodepth > 1 hides the
MMIO/PIO dispatch bottleneck.
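Concretely, something along these lines (a sketch only; the SCSI bus
name and the guest device node are placeholders, and for mpqemu the
blockdev belongs to whichever process owns the LSI device):

  # host: back the LSI HBA with the null block driver so that MMIO/PIO
  # dispatch, not disk I/O, becomes the bottleneck
  qemu-system-x86_64 ... \
      --blockdev null-co,size=64G,node-name=null0 \
      --device scsi-hd,drive=null0,bus=lsi0.0

  # guest: 4k random reads, one request in flight at a time
  fio --name=ioregionfd-bench --filename=/dev/sdX --direct=1 \
      --ioengine=libaio --rw=randread --bs=4k --iodepth=1 \
      --runtime=60 --time_based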
Stefan