On Tue, Jul 09, 2024 at 10:59:30AM GMT, Nicolin Chen wrote:
> On Tue, Jul 09, 2024 at 03:26:58PM +0200, Eric Auger wrote:
> > > +    /* Build PCI Expander Bridge + Root Port from the top of PCI_BUS_MAX 
> > > */
> > > +    if (vms->num_nested_smmus) {
> > > +        /* VIRT_NESTED_SMMU must hold all vSMMUs */
> > > +        g_assert(vms->num_nested_smmus <=
> > > +                 vms->memmap[VIRT_NESTED_SMMU].size / SMMU_IO_LEN);
> > > +
> > > +        vms->nested_smmu_phandle = g_new0(uint32_t, 
> > > vms->num_nested_smmus);
> > > +
> > > +        for (i = 0; i < vms->num_nested_smmus; i++) {
> > > +            DeviceState *smmu_dev;
> > > +            PCIBus *pxb_bus;
> > > +
> > > +            pxb_bus = create_pcie_expander_bridge(vms, i);
> > > +            g_assert(pxb_bus);
> > > +
> > > +            vms->nested_smmu_phandle[i] = 
> > > qemu_fdt_alloc_phandle(ms->fdt);
> > > +            smmu_dev = create_nested_smmu(vms, pxb_bus, i);
> > > +            g_assert(smmu_dev);
> > > +
> > > +            qemu_fdt_setprop_cells(ms->fdt, nodename, "iommu-map", 0x0,
> > > +                                   vms->nested_smmu_phandle[i], 0x0, 
> > > 0x10000);
>
> > I think libvirt is supposed to create the pcie bus topology instead and
> > it shall not be created by qemu behind the scene.
> > The pcie elements you create here are not visible to libvirt and I guess
> > they may collide with elements explicitly created by libvirt at a given
> > pci bdf.
>
> Yea, the bdf conflict is a concern. So I allocated the bdf list
> from the top of the bus number... one of the reasons of doing
> this is to ease users so they don't need to deal with the over-
> complicated topology. I will try libvirt and see how it goes.

libvirt developer here. Eric is absolutely right that creating PCI
controllers implicitly will not fly, as libvirt's PCI address
allocation algorithm would not know about them and would produce
incorrect output as a result.

> > I think it would make more sense to be able to attach an smmu instance
> > to a given pci root or pxb either by adding an iommu id to a given
> > pxb-pcie option
> >
> > -device
> > pxb-pcie,bus_nr=100,id=pci.12,numa_node=0,bus=pcie.0,addr=0x3,iommu=<id>
> > or
> > adding a list of pxb ids to the iommu option. It is unfortunate the
> > iommu option is a machine option.
>
> Yes. I had thought about that too, but the virt-machine code
> creates all the instance at this moment...
>
> > platform bus framework could be considered to dynamically allocate them
> > using the -device option. This has been used along with dt generation
> > but with ACPI this would need to be studied. However at the time the
> > smmu was integrated the machine option was prefered.
> >
> > Maybe using the 1st option would allow to infer that if there are
> > different iommu ids this implies that several IOMMU instances need to be
> > created.
>
> Yea, I like the idea of creating iommu instance with a "-device"
> string.

This sounds like the best option to me as well. Incidentally, it's
also how the intel-iommu device works. From libvirt's test suite:

  -device '{"driver":"intel-iommu","id":"iommu0"}'

Looks like a sensible model to follow. You could then associate each
specific IOMMU instance to a PCI controller using an attribute, as
Eric suggested above.

> One more question. Let's say we have 2 smmus/pxb-buses:
>   [ pxb0] <---> vSMMU0/pSMMU0 [ devA, devB, devC ]
>   [ pxb1] <---> vSMMU1/pSMMU1 [ devD, devE, devF ]
> How would a user know that devA/devB should be attached to pxb0
> without doing like devA->pxb0 and devB->pxb1? Should QEMU just
> error out until the user associate them correctly? Or they may
> rely on libvirt to figure that out, i.e. moving the iommu node
> matching from QEMU to libvirt?

IIUC having devices associated to the "correct" SMMU is only a
requirement for optimal performance, not pure functionality, right?
In other words, getting the association wrong will make things slower
but they will keep working.

Is a 1:1 match between pSMMU and vSMMU mandatory? On a system with 4
SMMUs, could I create a VMs with

  vSMMU0 <-> pSMMU0
  vSMMU1 <-> pSMMU1

and another one with

  vSMMU0 <-> pSMMU2
  vSMMU1 <-> pSMMU3

assuming of course that devices are assigned to the correct VM?

How is the association between vSMMU and pSMMU created anyway? Is
that something that the user can control, or is it done automatically
somehow?

-- 
Andrea Bolognani / Red Hat / Virtualization


Reply via email to