Peter Xu <pet...@redhat.com> writes:

> On Wed, Aug 25, 2021 at 02:28:55PM +0200, Markus Armbruster wrote:
>> Markus Armbruster <arm...@redhat.com> writes:
>> 
>> > Peter Xu <pet...@redhat.com> writes:
>> >
>> >> On Mon, Aug 23, 2021 at 05:56:23PM -0400, Eduardo Habkost wrote:
>> >>> I don't have any other example, but I assume address assignment
>> >>> based on ordering is a common pattern in device code.
>> >>> 
>> >>> I would take a very close and careful look at the devices with
>> >>> non-default vmsd priority.  If you can prove that the 13 device
>> >>> types with non-default priority are all order-insensitive, a
>> >>> custom sort function as you describe might be safe.
>> >>
>> >> Besides virtio-mem-pci, there'll also be a similar devfn issue with all
>> >> MIG_PRI_PCI_BUS devices, as they'll be allocated just like other pci
>> >> devices.  Say, the two cmdlines below will generate different pci
>> >> topologies too:
>> >>
>> >>   $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
>> >>                        -device pcie-root-port,chassis=1 \
>> >>                        -device virtio-net-pci
>> >>
>> >> And:
>> >>
>> >>   $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
>> >>                        -device virtio-net-pci \
>> >>                        -device pcie-root-port,chassis=1
>> >>
>> >> This cannot be solved by keeping priority==0 ordering.
>> >>
>> >> On second thought, I think I was initially wrong to see migration
>> >> priority and device realization as the same problem.
>> >>
>> >> For example, for live migration we require PCI_BUS to be migrated
>> >> earlier than MIG_PRI_IOMMU, because the IOMMU relies on the bus number
>> >> to find address spaces.  However, that's definitely not a requirement
>> >> for device realization: say, realizing the vIOMMU after the pci buses
>> >> is fine (bus assigned during bios).
>> >>
>> >> I've probably mixed up the two ideas (though they really look alike!).
>> >> Sorry about that.
>> >>
>> >> Since the only ordering constraint so far is IOMMU vs all the rest of
>> >> the devices, I'll introduce a new priority mechanism and only make sure
>> >> vIOMMUs are realized earlier.  That'll also avoid other implications
>> >> for pci devfn allocations.
>> >>
>> >> Will rework a new version tomorrow.  Thanks a lot for all the comments,
>> >
>> > Is it really a good idea to magically reorder device realization just to
>> > make a non-working command line work?  Why can't we just fail the
>> > non-working command line in a way that tells users how to get a working
>> > one?  We have way too much ordering magic already...
>> >
>> > If we decide we want more magic, then I'd argue for *dependencies*
>> > instead of priorities.  Dependencies are specific and local: $this needs
>> > to go after $that because $reasons.  Priorities are unspecific and
>> > global.
>> 
>> Having thought about this a bit more...
>> 
>> Constraints on realize order are nothing new.  For instance, when a
>> device plugs into a bus, it needs to be realized after the device
>> providing the bus.
>> 
>> We ensure this by having the device refer to the bus, e.g. bus=pci.0.
>> The reference may be implicit, but it's there.  It must resolve for
>> device creation to succeed, and if it resolves, the device providing the
>> bus will be realized in time.
>> 
>> I believe what's getting us into trouble with IOMMU is not having such a
>> reference.  Or in other words, keeping the dependence between the IOMMU
>> and the devices relying on it *implicit*, and thus hidden from the
>> existing realize-ordering machinery.
>> 
>> Instead of inventing another such machinery, let's try to use the one we
>> already have.
>
> Hmm... I just found that we don't have such machinery, do we?
>
> This does not really work:
>
> $ ./qemu-system-x86_64 -M q35 -device virtio-net-pci,bus=pcie.1 \
>                        -device pcie-root-port,id=pcie.1,bus=pcie.0
> qemu-system-x86_64: -device virtio-net-pci,bus=pcie.1: Bus 'pcie.1' not found
>
> While this will:
>
> $ ./qemu-system-x86_64 -M q35 -device pcie-root-port,id=pcie.1,bus=pcie.0 \
>                        -device virtio-net-pci,bus=pcie.1

This is exactly what I described.  bus=pcie.1 is the explicit reference.
It must resolve for device creation to succeed, and if it resolves, the
device providing the bus will be realized in time.  It resolves in the
second example, but not the first.

Look ma, no magic!  Instead, stupid & predictable.
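
To make the point concrete, here is a toy model of in-order device
creation with explicit bus references.  It is a sketch in Python, not
QEMU's actual code: create_devices(), BusNotFound and the
(driver, bus, provides) tuples are all made up for illustration.  The
only things taken from the examples above are the device names, the bus
ids, and the fact that q35 already has pcie.0.

```python
class BusNotFound(Exception):
    """Raised when a device refers to a bus that does not exist yet."""


def create_devices(options, builtin_buses=("pcie.0",)):
    """Create devices strictly in command-line order.

    Each option is a (driver, bus, provides) tuple: 'bus' is the bus the
    device plugs into, 'provides' is the bus id the device offers once it
    has been realized (None if it provides no bus).
    """
    buses = set(builtin_buses)
    realized = []
    for driver, bus, provides in options:
        # The reference must resolve at creation time; no reordering.
        if bus not in buses:
            raise BusNotFound(
                f"-device {driver},bus={bus}: Bus '{bus}' not found")
        realized.append(driver)
        if provides is not None:
            buses.add(provides)
    return realized


# Root port first, then the device plugging into it: works.
working = [("pcie-root-port", "pcie.0", "pcie.1"),
           ("virtio-net-pci", "pcie.1", None)]

# Same devices in the opposite order: the reference cannot resolve.
failing = list(reversed(working))
```

Feeding it 'working' realizes both devices in order.  Feeding it
'failing' raises BusNotFound on the first device, because nothing has
provided pcie.1 yet.  The ordering constraint is carried entirely by the
reference, so no priority table is needed.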