On Mon, Sep 30, 2013 at 11:55:47AM +0200, Markus Armbruster wrote:
> "Michael S. Tsirkin" <m...@redhat.com> writes:
>
> > On Fri, Sep 27, 2013 at 07:06:44PM +0200, Markus Armbruster wrote:
> >> Marcel Apfelbaum <marcel.apfelb...@gmail.com> writes:
> >>
> >> > On Wed, 2013-09-25 at 10:01 +0300, Michael S. Tsirkin wrote:
> >> >> On Tue, Sep 24, 2013 at 06:01:02AM -0400, Laine Stump wrote:
> >> >> > When I added support for the Q35-based machinetypes to libvirt, I
> >> >> > specifically prohibited attaching any PCI devices (with the
> >> >> > exception of graphics controllers) to the PCIe root complex,
> >> >>
> >> >> That's wrong I think. Anything attached to RC is an integrated
> >> >> endpoint, and these can be PCI devices.
> >> > I couldn't find in the PCIe spec any mention that a "Root Complex
> >> > Integrated Endpoint" must be PCIe. But, from spec 1.3.2.3:
> >> > - A Root Complex Integrated Endpoint must not require I/O resources
> >> >   claimed through BAR(s).
> >> > - A Root Complex Integrated Endpoint must not generate I/O Requests.
> >> > - A Root Complex Integrated Endpoint is required to support MSI or
> >> >   MSI-X or both if an interrupt resource is requested.
> >> > I suppose that this restriction can be removed for PCI devices that
> >> > 1. Actually work when plugged into an RC Integrated Endpoint
> >> > 2. Respect the above limitations
> >> >
> >> >>
> >> >> > and had planned to
> >> >> > prevent attaching them to PCIe root ports (ioh3420 device) and PCIe
> >> >> > downstream switch ports (xio-3130 device) as well. I did this
> >> >> > because, even though qemu currently allows attaching a normal PCI
> >> >> > device in any of these three places, the restriction exists for
> >> >> > real hardware and I didn't see any guarantee that qemu wouldn't add
> >> >> > the restriction in the future in order to more closely emulate real
> >> >> > hardware.
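[Editorial aside: the three spec constraints Marcel quotes (PCIe 1.3.2.3) are mechanical enough that a management layer could check them per device rather than hardcode a blanket ban. A minimal sketch in Python, assuming a hypothetical device description with BAR types and interrupt capabilities; `DeviceProfile` and its fields are invented for illustration, not a real QEMU or libvirt API:]

```python
# Hypothetical sketch: check whether a device's resource profile
# satisfies the Root Complex Integrated Endpoint rules quoted above.
# The DeviceProfile fields are illustrative, not a real QEMU/libvirt API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DeviceProfile:
    bars: List[str] = field(default_factory=list)  # each entry "mem" or "io"
    generates_io_requests: bool = False
    supports_msi: bool = False
    supports_msix: bool = False
    needs_interrupt: bool = True

def ok_as_integrated_endpoint(dev: DeviceProfile) -> bool:
    if "io" in dev.bars:                  # must not claim I/O via BAR(s)
        return False
    if dev.generates_io_requests:         # must not generate I/O Requests
        return False
    if dev.needs_interrupt and not (dev.supports_msi or dev.supports_msix):
        return False                      # must support MSI and/or MSI-X
    return True
```

Under this sketch a device with only memory BARs and MSI support passes, while a legacy device with an I/O BAR (or INTx-only interrupts) is rejected.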
> >> >> >
> >> >> > However, since I did that, I've learned that many of the qemu "pci"
> >> >> > devices really should be considered as "pci or pcie". Gerd Hoffman
> >> >> > lists some of these cases in a bug he filed against libvirt:
> >> >> >
> >> >> > https://bugzilla.redhat.com/show_bug.cgi?id=1003983
> >> >> >
> >> >> > I would like to loosen up the restrictions in libvirt, but want to
> >> >> > make sure that I don't allow something that could later be
> >> >> > forbidden by qemu (thus creating a compatibility problem during
> >> >> > upgrades). Beyond Gerd's specific requests to allow ehci, uhci, and
> >> >> > hda controllers to attach to PCIe ports, are there any other
> >> >> > devices that I specifically should or shouldn't allow? (I would
> >> >> > rather be conservative in what I allow - it's easy to allow more
> >> >> > things later, but nearly impossible to revoke permission once it's
> >> >> > been allowed.)
> >> > For the moment I would remove only the restrictions that somebody has
> >> > requested and verified.
> >> >
> >> >>
> >> >> IMO, we really need to grow an interface to query this kind of thing.
> >> > Basically libvirt needs to know:
> >> > 1. for (libvirt) controllers: what kind of devices can be plugged in
> >> > 2. for devices (a controller is also a device):
> >> >    - to which controllers can it be plugged in
> >> >    - does it support hot-plug?
> >> > 3. implicit controllers of the machine types (q35 - "pcie-root",
> >> >    i440fx - "pci-root")
> >> > All the above must be exported to libvirt.
> >> >
> >> > Implementation options:
> >> > 1. Add a compliance field on PCI/PCIe devices and controllers stating
> >> >    whether it supports PCI, PCIe, or both (and maybe hot-plug)
> >> >    - consider plug type + compliance to figure out whether a plug can
> >> >      go into a socket
> >> >
> >> > 2.
> >> >    Use Markus Armbruster's idea of introducing a concept of "plugs
> >> >    and sockets":
> >> >    - dividing the devices into adapters and plugs
> >> >    - adding sockets to bridges (buses?).
> >> >    In this way it would be clear which devices can connect to bridges.
> >>
> >> This isn't actually my idea. It's how things are designed to work in
> >> qdev, at least in my admittedly limited understanding of qdev.
> >>
> >> In traditional qdev, a device has exactly one plug (its "bus type",
> >> shown by -device help), and it may have one or more buses. Each bus
> >> has a type, and you can plug a device only into a bus of the matching
> >> type. This was too limiting, and is not how things work now.
> >>
> >> As far as I know, libvirt already understands that a device can only
> >> plug into a matching bus.
> >>
> >> In my understanding of where we're headed with qdev, things are / will
> >> become more general, yet stay really simple:
> >>
> >> * A device can have an arbitrary number of sockets and plugs.
> >>
> >> * Each socket / plug has a type.
> >>
> >> * A plug can go into a socket of the same type, and only there.
> >>
> >> Pretty straightforward generalization of traditional qdev. I wouldn't
> >> expect libvirt to have serious trouble coping with it, as long as we
> >> provide the necessary information on device models' plugs and sockets.
> >>
> >> In this framework, there's no such thing as a device model that can
> >> plug either into a PCI or a PCIe socket. Makes sense to me, because
> >> there's no such thing in the physical world, either. Instead, and just
> >> like in the physical world, you have one separate device variant per
> >> desired plug type.
> >>
> >
> > Two types of bus is not how things work in the real world though.
> > In the real world there are 3 types of express bus besides the
> > classical PCI bus, and limitations on which devices go where
> > that actually apply to devices qemu emulates.
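[Editorial aside: the typed plug/socket rule Markus describes reduces "what can plug into what" to simple type equality. A toy Python sketch; the class names, type strings, and device names are invented for illustration and are not actual qdev/QOM code:]

```python
# Toy model of the typed plug/socket rule: a plug fits a socket
# if and only if the types match exactly. Names are illustrative only.
class Socket:
    def __init__(self, sock_type):
        self.type = sock_type
        self.occupant = None

class Device:
    def __init__(self, name, plug_type, sockets=()):
        self.name = name
        self.plug_type = plug_type
        self.sockets = [Socket(t) for t in sockets]

def plug(device, socket):
    if socket.type != device.plug_type:
        raise TypeError(f"{device.name}: plug {device.plug_type!r} "
                        f"does not fit socket {socket.type!r}")
    if socket.occupant is not None:
        raise ValueError("socket already occupied")
    socket.occupant = device

# One separate device variant per plug type, as in the physical world:
root_port = Device("ioh3420", plug_type="pcie-root",
                   sockets=["pcie-downstream"])
nic_pcie = Device("virtio-net-pcie", plug_type="pcie-downstream")
nic_pci  = Device("virtio-net-pci",  plug_type="pci")

plug(nic_pcie, root_port.sockets[0])   # fits: the types match
```

With this model, a management layer never needs ad hoc compatibility rules: it only compares type strings, and plugging `nic_pci` into the root port's socket fails with a `TypeError`.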
> > For example, a downstream switch port can only go on the
> > internal virtual express bus of a switch.
> > Devices with multiple interfaces actually do exist
> > in the real world - e.g. esata/usb -
>
> I think the orthodox way to model a disk with both eSATA and USB
> connectors would be two separate plugs, where only one of them can be
> used at the same time. Not that I can see why anyone would want to
> model such a device when you can just as well have separate eSATA-only
> and USB-only devices, and use the one you want.
>
> > I never heard of a pci/pci express
> > one but it's not impossible I think.
>
> PCI on one side of the card, PCIe on the other, and a switchable
> backplate? Weird :)
>
> Again, I can't see why we'd want to model this, even if it existed.
>
> Nevertheless, point taken: devices with multiple interfaces of which
> only one can be used at the same time exist, and we can't exclude the
> possibility that we want to model such a device one day.
>
> >> To get that, you have to split the device into a common core and bus
> >> adapters. You compose the core with the PCI adapter to get the PCI
> >> device, with the PCIe adapter to get the PCIe device, and so forth.
> >> I'm not claiming that's the best way to do PCI + PCIe. It's a purely
> >> theoretical approach, concerned only with conceptual cleanliness, not
> >> practical coding difficulties.
> >
> > I don't mind if that's the internal implementation,
> > but I don't think we should expose this split
> > in the user interface.
>
> I'm not so sure.
>
> The current interface munges together all PCIish connectors, and the
> result is a mess: users can't see which device can be plugged into
> which socket. Libvirt needs to know, and it has grown a bunch of
> hardcoded ad hoc rules, which aren't quite right.
>
> With separate types for incompatible plugs and sockets, the "what can
> plug into what" question remains as trivial as it was by the initial
> design.
Yes but, same as in the initial design, it really makes it the user's
problem. So we'd have
	virtio-net-pci-conventional
	virtio-net-pci-express
	virtio-net-pci-integrated
All this while users really just want to say "virtio" (that's the
expert user; what most people want is for the guest to be faster).

> >> What we have now is entirely different: we've overloaded the existing
> >> PCI plug with all the other PCI-related plug types that came with the
> >> PCIe support, so it means pretty much nothing anymore. In particular,
> >> there's no way for libvirt to figure out programmatically whether some
> >> alleged "PCI" device can go into some alleged "PCI" bus. I call that
> >> a mess.
> >
> > There are lots of problems.
> >
> > First, bus type is not the only factor that can limit
> > which devices go where.
> > For example, specific slots might not support hotplug.
>
> We made "can hotplug" a property of the bus. Perhaps it should be a
> property of the slot.

Sure.

> > Another example, all devices in the same pci slot must have the
> > "multifunction" property set.
>
> PCI multifunction devices are simply done wrong in the current code.
>
> I think the orthodox way to model multifunction devices would involve
> composing the functions with a container device, resulting in a
> composite PCI device that can only be plugged as a whole.

It would also presumably involve a new bus which has no basis in
reality, and a new type of device for when it's a function within a
multifunction device.

> Again, this is a conceptually clean approach, unconcerned with
> practical coding difficulties.

And we'd need to grow new interfaces to specify these containers.

> > Second, there's no way to find out what is a valid
> > bus address. For example, with express you can
> > only use slot 0.
>
> Device introspection via QOM should let you enumerate available
> sockets.

Slots would need to become sockets for this to work :)

> > Hotplug is only supported ATM if no two devices
> > share a pci slot.
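[Editorial aside: "slots would need to become sockets" is the crux here. If each slot were a socket object carrying its own properties, both "what is a valid bus address" and "can this slot hotplug" become enumeration instead of hardcoded rules. A hypothetical Python sketch; the bus kinds, slot counts, and hotplug flags below are invented for illustration, not real QEMU behavior:]

```python
# Hypothetical sketch: model slots as sockets so that valid addresses
# and hotplug capability can be enumerated rather than hardcoded.
# Bus kinds, slot counts, and flags are invented for illustration.
class SlotSocket:
    def __init__(self, slot, hotpluggable):
        self.slot = slot
        self.hotpluggable = hotpluggable

def make_bus_sockets(bus_kind):
    if bus_kind == "pcie-root-port":
        # Express: only slot 0 exists behind a root/downstream port.
        return [SlotSocket(slot=0, hotpluggable=True)]
    if bus_kind == "pci":
        # Conventional PCI: many slots; assume (for illustration) that
        # slot 0 is occupied by the chipset and not hotpluggable.
        return [SlotSocket(slot=n, hotpluggable=(n != 0)) for n in range(32)]
    raise ValueError(f"unknown bus kind: {bus_kind}")

def valid_hotplug_slots(bus_kind):
    return [s.slot for s in make_bus_sockets(bus_kind) if s.hotpluggable]
```

A management layer would then ask the bus for its sockets instead of carrying per-bus-type address rules of its own.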
>
> Do you mean "hotplug works only with multifunction off"? If yes, see
> above. If no, please elaborate.

Well, it does kind of work with most guests. The way it works is a
hack, though.

> > Also, there's apparently no way to figure
> > out what kind of bus (or multiple buses) is behind each device.
>
> Again, introspection via QOM should let you enumerate available
> sockets.

Maybe it should, but it doesn't seem to let me.

> > Solution proposed above (separate each device into
> > two parts) only solves the pci versus express issue without
> > addressing any of the other issues.
> > So I'm not sure it's worth the effort.
>
> As I said, I'm not claiming I know the only sane solution to this
> problem. I've only described a solution that stays true to the qdev /
> QOM design as I understand it.

Right. And I don't argue from the implementation point of view.
What I am saying is, encoding everything in a single string isn't a
good user interface. We really should let users say "virtio", detect
that it's connected to PCI on one end and a network on the other, and
instantiate virtio-net-pci.

> True qdev / QOM experts, please help out with advice.

-- 
MST
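[Editorial aside: MST's closing point - let the user say "virtio" and derive the concrete model from what it is connected to - could look roughly like the sketch below. The alias table and the way backend/bus kinds are detected are invented for illustration; QEMU has no such interface:]

```python
# Illustrative sketch of resolving a generic device alias to a concrete
# model from the bus it sits on and the backend it is wired to.
# The table and kinds below are invented, not a real QEMU interface.
ALIAS_TABLE = {
    ("virtio", "net", "pci"): "virtio-net-pci",
    ("virtio", "net", "ccw"): "virtio-net-ccw",
    ("virtio", "blk", "pci"): "virtio-blk-pci",
}

def resolve_alias(alias, backend_kind, bus_kind):
    """Pick the concrete device model for a generic alias."""
    try:
        return ALIAS_TABLE[(alias, backend_kind, bus_kind)]
    except KeyError:
        raise LookupError(f"no {alias} device for a {backend_kind} "
                          f"backend on a {bus_kind} bus")

# The user says "virtio"; it's wired to a network backend on a PCI bus:
print(resolve_alias("virtio", "net", "pci"))   # prints "virtio-net-pci"
```

The point is exactly MST's: the resolution from "virtio" to "virtio-net-pci" happens inside the tool, so the user never has to spell out the bus flavor in a device-name string.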