On Mon, Dec 11, 2023 at 1:30 PM Akihiko Odaki <akihiko.od...@daynix.com> wrote:
>
> On 2023/12/11 11:52, Jason Wang wrote:
> > On Sun, Dec 10, 2023 at 12:06 PM Akihiko Odaki <akihiko.od...@daynix.com>
> > wrote:
> >>
> >> Introduction
> >> ------------
> >>
> >> This series is based on the RFC series submitted by Yui Washizu[1].
> >> See also [2] for the context.
> >>
> >> This series enables SR-IOV emulation for virtio-net. It is useful
> >> to test SR-IOV support on the guest, or to expose several vDPA devices
> >> in a VM. vDPA devices can also provide L2 switching feature for
> >> offloading though it is out of scope to allow the guest to configure
> >> such a feature.
> >>
> >> The PF side code resides in virtio-pci. The VF side code resides in
> >> the PCI common infrastructure, but it is restricted to work only for
> >> virtio-net-pci because of lack of validation.
> >>
> >> User Interface
> >> --------------
> >>
> >> A user can configure a SR-IOV capable virtio-net device by adding
> >> virtio-net-pci functions to a bus. Below is a command line example:
> >> -netdev user,id=n -netdev user,id=o
> >> -netdev user,id=p -netdev user,id=q
> >> -device pcie-root-port,id=b
> >> -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f
> >> -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f
> >> -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f
> >> -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f
> >>
> >> The VFs specify the paired PF with "sriov-pf" property. The PF must be
> >> added after all VFs. It is user's responsibility to ensure that VFs have
> >> function numbers larger than one of the PF, and the function numbers
> >> have a consistent stride.
> >
> > This seems not user friendly. Any reason we can't just allow user to
> > specify the stride here?
>
> It should be possible to assign addr automatically without requiring
> user to specify the stride. I'll try that in the next version.
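
For illustration, the function-number constraint quoted above (VF function
numbers larger than the PF's, with a consistent stride) can be sketched as
follows. This is a minimal Python sketch, not QEMU code: the helper name
vf_layout and its return convention are hypothetical. It only assumes the
SR-IOV rule that the n-th VF sits at PF + First VF Offset + (n - 1) * VF
Stride.

```python
from typing import Optional, Sequence, Tuple


def vf_layout(pf_fn: int, vf_fns: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (first_vf_offset, vf_stride), or None if the layout is invalid.

    Hypothetical helper for illustration only; it is not part of QEMU.
    """
    vfs = sorted(vf_fns)
    # Reject an empty list and duplicated function numbers.
    if not vfs or len(set(vfs)) != len(vfs):
        return None
    # Every VF must have a function number larger than the PF's.
    if vfs[0] <= pf_fn:
        return None
    first_offset = vfs[0] - pf_fn
    if len(vfs) == 1:
        # With a single VF the stride is never used; report 0 by convention.
        return first_offset, 0
    stride = vfs[1] - vfs[0]
    # All neighbouring VFs must be separated by the same stride.
    for prev, cur in zip(vfs, vfs[1:]):
        if cur - prev != stride:
            return None
    return first_offset, stride


# The command line quoted above puts the PF at function 0 and the VFs at
# functions 3, 2 and 1 (the order in which they are given does not matter).
print(vf_layout(0, [3, 2, 1]))
```

A layout such as PF 0 with VFs 1, 2 and 4 would be rejected because the gaps
between the VFs differ, which is the kind of mistake automatic address
assignment would rule out.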
>
> >
> > Btw, I vaguely remember qemu allows the params to be accepted as a
> > list. If this is true, we can accept a list of netdev here?
>
> Yes, rocker does that. But the problem is not just about getting
> parameters needed for VFs, which I forgot to mention in the cover letter
> and will explain below.
>
> >
> >>
> >> Keeping VF instances
> >> --------------------
> >>
> >> A problem with SR-IOV emulation is that it needs to hotplug the VFs as
> >> the guest requests. Previously, this behavior was implemented by
> >> realizing and unrealizing VFs at runtime. However, this strategy does
> >> not work well for the proposed virtio-net emulation; in this proposal,
> >> device options passed in the command line must be maintained as VFs
> >> are hotplugged, but they are consumed when the machine starts and not
> >> available after that, which makes realizing VFs at runtime impossible.
> >
> > Could we store the device options in the PF?
>
> I wrote it's to store the device options, but the problem is actually
> more about realizing VFs at runtime instead of at the initialization time.
>
> Realizing VFs at runtime have two major problems. One is that it delays
> the validations of options; invalid options will be noticed when the
> guest requests to realize VFs.

If the PCI spec allows VF creation to fail, then this should not be a
problem.

> netdevs also warn that they are not used
> at initialization time, not knowing that they will be used by VFs later.

We could invent something to suppress this false positive.

> References to other QEMU objects in the option may also die before VFs
> are realized.

Is there anything other than netdev that we need to consider?

>
> The other problem is that QEMU cannot interact with the unrealized VFs.
> For example, if you type "device_add virtio-net-pci,id=vf,sriov-pf=pf"
> in HMP, you will expect "device_del vf" works, but it's hard to
> implement such behaviors with unrealized VFs.

I think hotplug could only be done at the PF level if we did that.

>
> I was first going to compromise and allow such quirky behaviors, but I
> realized such a compromise is unnecessary if we reuse the PCI power down
> logic so I wrote v2.

I haven't checked the code, but is there anything related to PM here?

Thanks

>
> Regards,
> Akihiko Odaki
>