Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-11 Thread Andrea Bolognani
On Mon, 2016-10-10 at 17:36 +0300, Marcel Apfelbaum wrote:
> > What's the advantage in using ARI to stuff more than eight
> > of anything that's not Endpoint Devices in a single slot?
> > 
> > I mean, if we just fill up all 32 slots in a PCIe Root Bus
> > with 8 PCIe Root Ports each we already end up having 256
> > hotpluggable slots[1]. Why would it be preferable to use
> > ARI, or even PCIe Switches, instead?
> 
> What if you need more devices (functions, actually)?
> 
> If some of the pcie.0 slots are occupied by other Integrated devices
> and you need more than 256 functions you can:
> (1) Add a PCIe Switch if you need hot-plug support - you are pretty limited
>     by the bus numbers, but it will give you a few more slots.
> (2) Use multi-function devices per root port if you are not interested in
>     hotplug. In this case ARI will give you up to 256 functions per Root Port.
> 
> Now the question is why ARI? Better utilization of the "problematic"
> resources like Bus numbers and IO space; all that if you need an insane
> number of devices, but we don't judge :).

My point is that AIUI ARI is something you only care about
for endpoint devices that want to have more than 8 functions.

When it comes to controllers, there's no advantage that I can
think of in having 1 slot with 256 functions as opposed to 32
slots with 8 functions each; if anything, I expect that at
least some guest OSs would be quite baffled to find, e.g., a
network adapter, a SCSI controller and a GPU as separate
functions of a single PCI slot.

> > [1] The last slot will have to be limited to 7 PCIe Root
> > Ports if we don't want to run out of bus numbers
> 
> I don't follow how this will 'save' us. If all the root ports
> are in use and you leave space for one more, what can you do with it?

Probably my math is off, but if we can only have 256 PCI
buses (0-255) and we plug a PCIe Root Port in each of the
8 functions (0-7) of the 32 slots (0-31) available on the
PCIe Root Bus, we end up with

  0:00.[0-7] -> [001-008]:0.[0-7]
  0:01.[0-7] -> [009-016]:0.[0-7]
  0:02.[0-7] -> [017-024]:0.[0-7]
  ...
  0:30.[0-7] -> [241-248]:0.[0-7]
  0:31.[0-7] -> [249-256]:0.[0-7]

but 256 is not a valid bus number, so we should skip that
last PCIe Root Port and stop at 255.
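
For illustration, this is roughly what one fully populated slot of pcie.0
looks like as options appended to a -M q35 command line; each ioh3420
function consumes one bus number (IDs, chassis and port values below are
made up for the example, not taken from the thread):

    -device ioh3420,id=rp00,bus=pcie.0,addr=0x1c.0x0,multifunction=on,chassis=1,port=1 \
    -device ioh3420,id=rp01,bus=pcie.0,addr=0x1c.0x1,chassis=2,port=2 \
    -device ioh3420,id=rp02,bus=pcie.0,addr=0x1c.0x2,chassis=3,port=3 \
    -device ioh3420,id=rp03,bus=pcie.0,addr=0x1c.0x3,chassis=4,port=4 \
    -device ioh3420,id=rp04,bus=pcie.0,addr=0x1c.0x4,chassis=5,port=5 \
    -device ioh3420,id=rp05,bus=pcie.0,addr=0x1c.0x5,chassis=6,port=6 \
    -device ioh3420,id=rp06,bus=pcie.0,addr=0x1c.0x6,chassis=7,port=7 \
    -device ioh3420,id=rp07,bus=pcie.0,addr=0x1c.0x7,chassis=8,port=8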

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-11 Thread Andrea Bolognani
On Mon, 2016-10-10 at 17:15 +0300, Marcel Apfelbaum wrote:
> > > > 2) can you really only plug a pcie-root-port (ioh3420)
> > > > into a pxb-pcie? Or will it accept anything that pcie.0
> > > > accepts?
> > > 
> > > It supports only PCI Express Root Ports. It does not
> > > support Integrated Devices.
> > 
> > So no PCI Express Switch Upstream Ports?
> 
> The switch upstream ports can only be plugged into PCIe Root Ports.
> There is an error in the RFC showing otherwise, it is already
> corrected in V1, not yet upstream.

I was pretty sure that was the case, but I wanted to
double-check just to be on the safe side ;)

> > What about DMI-to-PCI Bridges?
> 
> Yes, the dmi-to-pci bridge can be plugged into the pxb-pcie, I'll
> be sure to emphasize it.

Cool. I would have been very surprised if that had not been
the case, considering how we need to use multiple pxb-pcie
controllers to tie PCI devices to specific NUMA nodes.
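
For reference, a minimal command-line sketch of such a per-NUMA-node
topology, assuming a guest with a second NUMA node defined and the
32-slot pxb-pcie behaviour Marcel describes elsewhere in the thread
(IDs, addresses and numbers are illustrative, not from the thread):

    -device pxb-pcie,id=pxb1,bus=pcie.0,addr=0x3,bus_nr=0x80,numa_node=1 \
    -device ioh3420,id=rp-n1,bus=pxb1,addr=0x0,chassis=10,port=10 \
    -device i82801b11-bridge,id=dmi-n1,bus=pxb1,addr=0x1 \
    -device pci-bridge,id=pci-n1,bus=dmi-n1,addr=0x1,chassis_nr=20 \
    -device e1000,bus=pci-n1,addr=0x2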

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-10 Thread Marcel Apfelbaum

On 10/10/2016 03:02 PM, Andrea Bolognani wrote:

On Tue, 2016-10-04 at 12:52 -0600, Alex Williamson wrote:

It's all just idle number games, but what I was thinking of was the
difference between plugging a bunch of root-port+upstream+downstreamxN
combos directly into pcie-root (flat), vs. plugging the first into
pcie-root, and then subsequent ones into e.g. the last downstream port
of the previous set. Take the simplest case of needing 63 hotpluggable
slots. In the "flat" case, you have:

2 x pcie-root-port
 2 x pcie-switch-upstream-port
 63 x pcie-switch-downstream-port

In the "nested" or "chained" case you have:

 1 x pcie-root-port
 1 x pcie-switch-upstream-port
 32 x pcie-switch-downstream-port
 1 x pcie-switch-upstream-port
 32 x pcie-switch-downstream-port


You're not thinking in enough dimensions.  A single root port can host
multiple sub-hierarchies on its own.  We can have a multi-function
upstream switch, so you can have 8 upstream ports (00.{0-7}).  If we
implemented ARI on the upstream ports, we could have 256 upstream ports
attached to a single root port, but of course then we've run out of
bus numbers before we've even gotten to actual device buses.

Another option, look at the downstream ports, why do they each need to
be in separate slots?  We have the address space of an entire bus to
work with, so we can also create multi-function downstream ports, which
gives us 256 downstream ports per upstream port.  Oops, we just ran out
of bus numbers again, but at least actual devices can be attached.


What's the advantage in using ARI to stuff more than eight
of anything that's not Endpoint Devices in a single slot?

I mean, if we just fill up all 32 slots in a PCIe Root Bus
with 8 PCIe Root Ports each we already end up having 256
hotpluggable slots[1]. Why would it be preferable to use
ARI, or even PCIe Switches, instead?



What if you need more devices (functions, actually)?

If some of the pcie.0 slots are occupied by other Integrated devices
and you need more than 256 functions you can:
(1) Add a PCIe Switch if you need hot-plug support - you are pretty limited
by the bus numbers, but it will give you a few more slots.
(2) Use multi-function devices per root port if you are not interested in
hotplug. In this case ARI will give you up to 256 functions per Root Port.

Now the question is why ARI? Better utilization of the "problematic"
resources like Bus numbers and IO space; all that if you need an insane
number of devices, but we don't judge :).
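
As an illustration of option (2), the first few functions behind a single
Root Port could be declared as below; going past eight functions on one
slot would additionally require ARI-capable devices, which is not shown
here (IDs and addresses are made up):

    -device ioh3420,id=rp1,bus=pcie.0,addr=0x2,chassis=30,port=30 \
    -netdev user,id=net0 \
    -device virtio-net-pci,bus=rp1,addr=0x0.0x0,multifunction=on,netdev=net0 \
    -netdev user,id=net1 \
    -device virtio-net-pci,bus=rp1,addr=0x0.0x1,netdev=net1 \
    -netdev user,id=net2 \
    -device virtio-net-pci,bus=rp1,addr=0x0.0x2,netdev=net2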

Thanks,
Marcel



[1] The last slot will have to be limited to 7 PCIe Root
Ports if we don't want to run out of bus numbers


I don't follow how this will 'save' us. If all the root ports
are in use and you leave space for one more, what can you do with it?


--
Andrea Bolognani / Red Hat / Virtualization






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-10 Thread Marcel Apfelbaum

On 10/10/2016 02:09 PM, Andrea Bolognani wrote:

On Wed, 2016-10-05 at 12:17 +0300, Marcel Apfelbaum wrote:

2) can you really only plug a pcie-root-port (ioh3420)
into a pxb-pcie? Or will it accept anything that pcie.0
accepts?


It supports only PCI Express Root Ports. It does not
support Integrated Devices.


So no PCI Express Switch Upstream Ports?


The switch upstream ports can only be plugged into PCIe Root Ports.
There is an error in the RFC showing otherwise, it is already
corrected in V1, not yet upstream.


What about DMI-to-PCI Bridges?

Yes, the dmi-to-pci bridge can be plugged into the pxb-pcie, I'll
be sure to emphasize it.

Thanks,
Marcel



--
Andrea Bolognani / Red Hat / Virtualization






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-10 Thread Andrea Bolognani
On Tue, 2016-10-04 at 12:52 -0600, Alex Williamson wrote:
> > It's all just idle number games, but what I was thinking of was the 
> > difference between plugging a bunch of root-port+upstream+downstreamxN 
> > combos directly into pcie-root (flat), vs. plugging the first into 
> > pcie-root, and then subsequent ones into e.g. the last downstream port 
> > of the previous set. Take the simplest case of needing 63 hotpluggable 
> > slots. In the "flat" case, you have:
> > 
> > 2 x pcie-root-port
> > 2 x pcie-switch-upstream-port
> > 63 x pcie-switch-downstream-port
> > 
> > In the "nested" or "chained" case you have:
> > 
> > 1 x pcie-root-port
> > 1 x pcie-switch-upstream-port
> > 32 x pcie-switch-downstream-port
> > 1 x pcie-switch-upstream-port
> > 32 x pcie-switch-downstream-port
> 
> You're not thinking in enough dimensions.  A single root port can host
> multiple sub-hierarchies on its own.  We can have a multi-function
> upstream switch, so you can have 8 upstream ports (00.{0-7}).  If we
> implemented ARI on the upstream ports, we could have 256 upstream ports
> attached to a single root port, but of course then we've run out of
> bus numbers before we've even gotten to actual device buses.
> 
> Another option, look at the downstream ports, why do they each need to
> be in separate slots?  We have the address space of an entire bus to
> work with, so we can also create multi-function downstream ports, which
> gives us 256 downstream ports per upstream port.  Oops, we just ran out
> of bus numbers again, but at least actual devices can be attached.

What's the advantage in using ARI to stuff more than eight
of anything that's not Endpoint Devices in a single slot?

I mean, if we just fill up all 32 slots in a PCIe Root Bus
with 8 PCIe Root Ports each we already end up having 256
hotpluggable slots[1]. Why would it be preferable to use
ARI, or even PCIe Switches, instead?


[1] The last slot will have to be limited to 7 PCIe Root
Ports if we don't want to run out of bus numbers
-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-10 Thread Andrea Bolognani
On Wed, 2016-10-05 at 12:17 +0300, Marcel Apfelbaum wrote:
> > 2) can you really only plug a pcie-root-port (ioh3420)
> > into a pxb-pcie? Or will it accept anything that pcie.0
> > accepts?
> 
> It supports only PCI Express Root Ports. It does not
> support Integrated Devices.

So no PCI Express Switch Upstream Ports? What about
DMI-to-PCI Bridges?

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-05 Thread Marcel Apfelbaum

On 10/04/2016 07:25 PM, Laine Stump wrote:

On 10/04/2016 11:45 AM, Alex Williamson wrote:

On Tue, 4 Oct 2016 15:59:11 +0100
"Daniel P. Berrange"  wrote:


On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:

On 09/01/16 15:22, Marcel Apfelbaum wrote:

+2.3 PCI only hierarchy
+======================
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
+into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
+and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
+only into pcie.0 bus.
+
+   pcie.0 bus
+   ----------------------------------------------
+        |                               |
+   -----------                 ------------------
+   | PCI Dev |                 | DMI-PCI BRIDGE |
+   -----------                 ------------------
+                                  |                |
+                             -----------    ------------------
+                             | PCI Dev |    | PCI-PCI Bridge |
+                             -----------    ------------------
+                                                |             |
+                                           -----------   -----------
+                                           | PCI Dev |   | PCI Dev |
+                                           -----------   -----------


Works for me, but I would again elaborate a little bit on keeping the
hierarchy flat.

First, in order to preserve compatibility with libvirt's current
behavior, let's not plug a PCI device directly in to the DMI-PCI bridge,
even if that's possible otherwise. Let's just say

- there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
is required),


Why do you suggest this ? If the guest has multiple NUMA nodes
and you're creating a PXB for each NUMA node, then it looks valid
to want to have a DMI-PCI bridge attached to each PXB, so you can
have legacy PCI devices on each NUMA node, instead of putting them
all on the PCI bridge without NUMA affinity.


Seems like this is one of those "generic" vs "specific" device issues.
We use the DMI-to-PCI bridge as if it were a PCIe-to-PCI bridge, but
DMI is actually an Intel proprietary interface, the bridge just has the
same software interface as a PCI bridge.  So while you can use it as a
generic PCIe-to-PCI bridge, it's at least going to make me cringe every
time.



If using it this way makes kittens cry or something, then we'd be happy to use 
a generic pcie-to-pci bridge if somebody created one :-)





- only PCI-PCI bridges should be plugged into the DMI-PCI bridge,


What's the rationale for that, as opposed to plugging devices directly
into the DMI-PCI bridge, which seems to work?




Hi,


IIRC, something about hotplug, but from a PCI perspective it doesn't
make any sense to me either.




Indeed, the reason to plug the PCI bridge into the DMI-TO-PCI bridge
would be the hot-plug support.
The PCI bridges can support hotplug on Q35.
There is even an RFC on the list doing that:
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg05681.html

With the DMI-PCI bridge it is another story. From what I understand, the actual
device (i82801b11) does not support hotplug and the chances of making it work
are minimal.




At one point Marcel and Michael were discussing the possibility of making 
hotplug work on a dmi-to-pci-bridge. Currently it doesn't even work for 
pci-bridge so (as I think I said in another message
just now) it is kind of pointless, although when I asked about eliminating use of 
pci-bridge in favor of just using dmi-to-pci-bridge directly, I got lots of 
"no" votes.



Since we have an RFC showing it is possible to have hotplug for PCI devices
plugged into PCI bridges, it is better to continue using the PCI bridge until
one of the below happens:
 1 - pci-bridge ACPI hotplug will be possible
 2 - i82801b11 ACPI hotplug will be possible
 3 - a new pcie-pci bridge will be coded




 Same with the restriction from using slot
0 on PCI bridges, there's no basis for that except on the root bus.


I tried allowing devices to be plugged into slot 0 of a pci-bridge in libvirt - qemu 
barfed, so I moved the "minSlot" for pci-bridge back up to 1. Slot 0 is 
completely usable on a dmi-to-pci-bridge
though (and libvirt allows it). At this point, even if qemu enabled using slot 
0 of a pci-bridge, libvirt wouldn't be able to expose that to users (unless the 
min/max slot of each PCI controller was
made visible somewhere via QMP)



The reason for not being able to plug a device into slot 0 of a PCI Bridge is
the SHPC (Standard Hot-Plug Controller) device embedded in the PCI bridge by
default. The SHPC spec requires this.
If one disables it with shpc=false, they should be able to use slot 0.

Funny thing, the SHPC is not actually used by either i440fx or Q35 machines:
for i440fx we use ACPI based PCI hotplug and for Q35 we use PCIe native hotplug.

Should we default the shpc to off?
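
For illustration, the workaround described above would look roughly like
the fragment below; this is only a sketch, and whether QEMU really accepts
a device at slot 0 once shpc is off is not verified in this thread (IDs
and addresses are made up):

    -device i82801b11-bridge,id=dmi,bus=pcie.0,addr=0x1e \
    -device pci-bridge,id=pci.1,bus=dmi,addr=0x1,chassis_nr=1,shpc=off \
    -device e1000,bus=pci.1,addr=0x0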

Thanks,
Marcel






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-05 Thread Marcel Apfelbaum

On 10/04/2016 08:54 PM, Laine Stump wrote:

On 10/04/2016 12:10 PM, Laine Stump wrote:

On 10/04/2016 11:40 AM, Laszlo Ersek wrote:



Small correction to your wording though: you don't want to attach the
DMI-PCI bridge to the PXB device, but to the extra root bus provided by
the PXB.


This made me realize something - the root bus on a pxb-pcie controller
has a single slot and that slot can accept either a pcie-root-port
(ioh3420) or a dmi-to-pci-bridge. If you want to have both express and
legacy PCI devices on the same NUMA node, then you would either need to
create one pxb-pcie for the pcie-root-port and another for the
dmi-to-pci-bridge, or you would need to put the pcie-root-port and
dmi-to-pci-bridge onto different functions of the single slot. Should
the latter work properly?




Hi,


We were discussing pxb-pcie today while Dan was trying to get a particular 
configuration working, and there was some disagreement about two points that I 
stated above as fact (but which may just be
misunderstanding again):

1) Does pxb-pcie only provide a single slot (0)? Or does it provide 32 slots 
(0-31) just like the pcie root complex?



It provides 32 slots behaving like a PCI Express Root Complex.


2) can you really only plug a pcie-root-port (ioh3420) into a pxb-pcie? Or will 
it accept anything that pcie.0 accepts?


It supports only PCI Express Root Ports. It does not support Integrated Devices.

Thanks,
Marcel




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laszlo Ersek
On 10/04/16 20:08, Laine Stump wrote:
> On 10/04/2016 12:43 PM, Laszlo Ersek wrote:
>> On 10/04/16 18:10, Laine Stump wrote:
>>> On 10/04/2016 11:40 AM, Laszlo Ersek wrote:
 On 10/04/16 16:59, Daniel P. Berrange wrote:
> On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:
 All valid *high-level* topology goals should be permitted / covered one
 way or another by this document, but in as few ways as possible --
 hopefully only one way. For example, if you read the rest of the
 thread,
 flat hierarchies are preferred to deeply nested hierarchies, because
 flat ones save on bus numbers
>>>
>>> Do they?
>>
>> Yes. Nesting implies bridges, and bridges take up bus numbers. For
>> example, in a PCI Express switch, the upstream port of the switch
>> consumes a bus number, with no practical usefulness.
> 
> It's all just idle number games, but what I was thinking of was the
> difference between plugging a bunch of root-port+upstream+downstreamxN
> combos directly into pcie-root (flat), vs. plugging the first into
> pcie-root, and then subsequent ones into e.g. the last downstream port
> of the previous set. Take the simplest case of needing 63 hotpluggable
> slots. In the "flat" case, you have:
> 
>2 x pcie-root-port
>2 x pcie-switch-upstream-port
>63 x pcie-switch-downstream-port
> 
> In the "nested" or "chained" case you have:
> 
>1 x pcie-root-port
>1 x pcie-switch-upstream-port
>32 x pcie-switch-downstream-port
>1 x pcie-switch-upstream-port
>32 x pcie-switch-downstream-port
> 
> so you use the same number of PCI controllers.
> 
> Of course if you're talking about the difference between using
> upstream+downstream vs. just having a bunch of pcie-root-ports directly
> on pcie-root then you're correct, but only marginally - for 63
> hotpluggable ports, you would need 63 x pcie-root-port, so a savings of
> 4 controllers - about 6.5%.

We aim at 200+ ports.

Also, nesting causes recursion in any guest code that traverses the
hierarchy. I think it has some performance impact, plus, for me at
least, interpreting PCI enumeration logs with deep recursion is way
harder than the flat stuff. The bus number space is flat, and for me
it's easier to "map back" to the topology if the topology is also mostly
flat.

> (Of course this is all moot since you run
> out of ioport space after, what, 7 controllers needing it anyway? :-P)

No, it's not moot. The idea is that PCI Express devices must not require
IO space for correct operation -- I believe this is actually mandated by
the PCI Express spec --, so in the PCI Express hierarchy we wouldn't
reserve IO space at all. We discussed this earlier up-thread, please see:

http://lists.nongnu.org/archive/html/qemu-devel/2016-09/msg00672.html

* Finally, this is the spot where we should design and explain our
  resource reservation for hotplug: [...]

>> IIRC we collectively devised a flat pattern elsewhere in the thread
>> where you could exhaust the 0..255 bus number space such that almost
>> every bridge (= taking up a bus number) would also be capable of
>> accepting a hot-plugged or cold-plugged PCI Express device. That is,
>> practically no wasted bus numbers.
>>
>> Hm search this message for "population algorithm":
>>
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg394730.html
>>
>> and then Gerd's big improvement / simplification on it, with
>> multifunction:
>>
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg395437.html
>>
>> In Gerd's scheme, you'd only need one or two (I'm lazy to count
>> exactly :)) PCI Express switches, to exhaust all bus numbers. Minimal
>> waste due to upstream ports.
> 
> Yep. And in response to his message, that's what I'm implementing as the
> default strategy in libvirt :-)

Sounds great, thanks!
Laszlo




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Alex Williamson
On Tue, 4 Oct 2016 14:08:45 -0400
Laine Stump  wrote:

> On 10/04/2016 12:43 PM, Laszlo Ersek wrote:
> > On 10/04/16 18:10, Laine Stump wrote:  
> >> On 10/04/2016 11:40 AM, Laszlo Ersek wrote:  
> >>> On 10/04/16 16:59, Daniel P. Berrange wrote:  
>  On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:  
> >>> All valid *high-level* topology goals should be permitted / covered one
> >>> way or another by this document, but in as few ways as possible --
> >>> hopefully only one way. For example, if you read the rest of the thread,
> >>> flat hierarchies are preferred to deeply nested hierarchies, because
> >>> flat ones save on bus numbers  
> >>
> >> Do they?  
> >
> > Yes. Nesting implies bridges, and bridges take up bus numbers. For
> > example, in a PCI Express switch, the upstream port of the switch
> > consumes a bus number, with no practical usefulness.  
> 
> It's all just idle number games, but what I was thinking of was the 
> difference between plugging a bunch of root-port+upstream+downstreamxN 
> combos directly into pcie-root (flat), vs. plugging the first into 
> pcie-root, and then subsequent ones into e.g. the last downstream port 
> of the previous set. Take the simplest case of needing 63 hotpluggable 
> slots. In the "flat" case, you have:
> 
> 2 x pcie-root-port
> 2 x pcie-switch-upstream-port
> 63 x pcie-switch-downstream-port
> 
> In the "nested" or "chained" case you have:
> 
> 1 x pcie-root-port
> 1 x pcie-switch-upstream-port
> 32 x pcie-switch-downstream-port
> 1 x pcie-switch-upstream-port
> 32 x pcie-switch-downstream-port

You're not thinking in enough dimensions.  A single root port can host
> multiple sub-hierarchies on its own.  We can have a multi-function
upstream switch, so you can have 8 upstream ports (00.{0-7}).  If we
implemented ARI on the upstream ports, we could have 256 upstream ports
attached to a single root port, but of course then we've run out of
> bus numbers before we've even gotten to actual device buses.

Another option, look at the downstream ports, why do they each need to
be in separate slots?  We have the address space of an entire bus to
work with, so we can also create multi-function downstream ports, which
gives us 256 downstream ports per upstream port.  Oops, we just ran out
of bus numbers again, but at least actual devices can be attached.
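
To make the multi-function idea concrete, here is a sketch of two upstream
ports sharing slot 0 of a single root port, each with a couple of
downstream ports; it only illustrates the addressing scheme Alex describes
and is not a tested configuration (IDs, chassis and slot numbers are made
up):

    -device ioh3420,id=rp0,bus=pcie.0,addr=0x2,chassis=40,port=40 \
    -device x3130-upstream,id=up0,bus=rp0,addr=0x0.0x0,multifunction=on \
    -device x3130-upstream,id=up1,bus=rp0,addr=0x0.0x1 \
    -device xio3130-downstream,id=dp00,bus=up0,addr=0x0,chassis=41,slot=1 \
    -device xio3130-downstream,id=dp01,bus=up0,addr=0x1,chassis=42,slot=2 \
    -device xio3130-downstream,id=dp10,bus=up1,addr=0x0,chassis=43,slot=3 \
    -device xio3130-downstream,id=dp11,bus=up1,addr=0x1,chassis=44,slot=4
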
Thanks,

Alex


> so you use the same number of PCI controllers.
> 
> Of course if you're talking about the difference between using 
> upstream+downstream vs. just having a bunch of pcie-root-ports directly 
> on pcie-root then you're correct, but only marginally - for 63 
> hotpluggable ports, you would need 63 x pcie-root-port, so a savings of 
> 4 controllers - about 6.5%. (Of course this is all moot since you run 
> out of ioport space after, what, 7 controllers needing it anyway? :-P)
> 
> >
> > IIRC we collectively devised a flat pattern elsewhere in the thread
> > where you could exhaust the 0..255 bus number space such that almost
> > every bridge (= taking up a bus number) would also be capable of
> > accepting a hot-plugged or cold-plugged PCI Express device. That is,
> > practically no wasted bus numbers.
> >
> > Hm search this message for "population algorithm":
> >
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg394730.html
> >
> > and then Gerd's big improvement / simplification on it, with multifunction:
> >
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg395437.html
> >
> > In Gerd's scheme, you'd only need one or two (I'm lazy to count
> > exactly :)) PCI Express switches, to exhaust all bus numbers. Minimal
> > waste due to upstream ports.  
> 
> Yep. And in response to his message, that's what I'm implementing as the 
> default strategy in libvirt :-)
> 
> 




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laine Stump

On 10/04/2016 12:43 PM, Laszlo Ersek wrote:

On 10/04/16 18:10, Laine Stump wrote:

On 10/04/2016 11:40 AM, Laszlo Ersek wrote:

On 10/04/16 16:59, Daniel P. Berrange wrote:

On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:

All valid *high-level* topology goals should be permitted / covered one
way or another by this document, but in as few ways as possible --
hopefully only one way. For example, if you read the rest of the thread,
flat hierarchies are preferred to deeply nested hierarchies, because
flat ones save on bus numbers


Do they?


Yes. Nesting implies bridges, and bridges take up bus numbers. For
example, in a PCI Express switch, the upstream port of the switch
consumes a bus number, with no practical usefulness.


It's all just idle number games, but what I was thinking of was the 
difference between plugging a bunch of root-port+upstream+downstreamxN 
combos directly into pcie-root (flat), vs. plugging the first into 
pcie-root, and then subsequent ones into e.g. the last downstream port 
of the previous set. Take the simplest case of needing 63 hotpluggable 
slots. In the "flat" case, you have:


   2 x pcie-root-port
   2 x pcie-switch-upstream-port
   63 x pcie-switch-downstream-port

In the "nested" or "chained" case you have:

   1 x pcie-root-port
   1 x pcie-switch-upstream-port
   32 x pcie-switch-downstream-port
   1 x pcie-switch-upstream-port
   32 x pcie-switch-downstream-port

so you use the same number of PCI controllers.

Of course if you're talking about the difference between using 
upstream+downstream vs. just having a bunch of pcie-root-ports directly 
on pcie-root then you're correct, but only marginally - for 63 
hotpluggable ports, you would need 63 x pcie-root-port, so a savings of 
4 controllers - about 6.5%. (Of course this is all moot since you run 
out of ioport space after, what, 7 controllers needing it anyway? :-P)




IIRC we collectively devised a flat pattern elsewhere in the thread
where you could exhaust the 0..255 bus number space such that almost
every bridge (= taking up a bus number) would also be capable of
accepting a hot-plugged or cold-plugged PCI Express device. That is,
practically no wasted bus numbers.

Hm search this message for "population algorithm":

https://www.mail-archive.com/qemu-devel@nongnu.org/msg394730.html

and then Gerd's big improvement / simplification on it, with multifunction:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg395437.html

In Gerd's scheme, you'd only need one or two (I'm lazy to count
exactly :)) PCI Express switches, to exhaust all bus numbers. Minimal
waste due to upstream ports.


Yep. And in response to his message, that's what I'm implementing as the 
default strategy in libvirt :-)






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laine Stump

On 10/04/2016 12:10 PM, Laine Stump wrote:

On 10/04/2016 11:40 AM, Laszlo Ersek wrote:



Small correction to your wording though: you don't want to attach the
DMI-PCI bridge to the PXB device, but to the extra root bus provided by
the PXB.


This made me realize something - the root bus on a pxb-pcie controller
has a single slot and that slot can accept either a pcie-root-port
(ioh3420) or a dmi-to-pci-bridge. If you want to have both express and
legacy PCI devices on the same NUMA node, then you would either need to
create one pxb-pcie for the pcie-root-port and another for the
dmi-to-pci-bridge, or you would need to put the pcie-root-port and
dmi-to-pci-bridge onto different functions of the single slot. Should
the latter work properly?


We were discussing pxb-pcie today while Dan was trying to get a 
particular configuration working, and there was some disagreement about 
two points that I stated above as fact (but which may just be 
misunderstanding again):


1) Does pxb-pcie only provide a single slot (0)? Or does it provide 32 
slots (0-31) just like the pcie root complex?


2) can you really only plug a pcie-root-port (ioh3420) into a pxb-pcie? 
Or will it accept anything that pcie.0 accepts?




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laszlo Ersek
On 10/04/16 18:10, Laine Stump wrote:
> On 10/04/2016 11:40 AM, Laszlo Ersek wrote:
>> On 10/04/16 16:59, Daniel P. Berrange wrote:
>>> On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:
 On 09/01/16 15:22, Marcel Apfelbaum wrote:
> +2.3 PCI only hierarchy
> +======================
> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
> +into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
> +and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
> +only into pcie.0 bus.
> +
> +   pcie.0 bus
> +   ----------------------------------------------
> +        |                               |
> +   -----------                 ------------------
> +   | PCI Dev |                 | DMI-PCI BRIDGE |
> +   -----------                 ------------------
> +                                  |                |
> +                             -----------    ------------------
> +                             | PCI Dev |    | PCI-PCI Bridge |
> +                             -----------    ------------------
> +                                                |             |
> +                                           -----------   -----------
> +                                           | PCI Dev |   | PCI Dev |
> +                                           -----------   -----------

 Works for me, but I would again elaborate a little bit on keeping the
 hierarchy flat.

 First, in order to preserve compatibility with libvirt's current
 behavior, let's not plug a PCI device directly in to the DMI-PCI
 bridge,
 even if that's possible otherwise. Let's just say

 - there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
 is required),
>>>
>>> Why do you suggest this ? If the guest has multiple NUMA nodes
>>> and you're creating a PXB for each NUMA node, then it looks valid
>>> to want to have a DMI-PCI bridge attached to each PXB, so you can
>>> have legacy PCI devices on each NUMA node, instead of putting them
>>> all on the PCI bridge without NUMA affinity.
>>
>> You are right. I meant the above within one PCI Express root bus.
>>
>> Small correction to your wording though: you don't want to attach the
>> DMI-PCI bridge to the PXB device, but to the extra root bus provided by
>> the PXB.
> 
> This made me realize something - the root bus on a pxb-pcie controller
> has a single slot and that slot can accept either a pcie-root-port
> (ioh3420) or a dmi-to-pci-bridge. If you want to have both express and
> legacy PCI devices on the same NUMA node, then you would either need to
> create one pxb-pcie for the pcie-root-port and another for the
> dmi-to-pci-bridge, or you would need to put the pcie-root-port and
> dmi-to-pci-bridge onto different functions of the single slot. Should
> the latter work properly?

Yes, I expect so. (Famous last words? :))

> 
> 
>>
>>>
 - only PCI-PCI bridges should be plugged into the DMI-PCI bridge,
>>>
>>> What's the rationale for that, as opposed to plugging devices directly
>>> into the DMI-PCI bridge, which seems to work?
>>
>> The rationale is that libvirt used to do it like this.
> 
> 
> Nah, that's just the *result* of the rationale that we wanted the
> devices to be hotpluggable. At some later date we learned the hotplug on
> a pci-bridge device doesn't work on a Q35 machine anyway, so it was kind
> of pointless (but we still do it because we hold out hope that hotplug
> of legacy PCI devices into a pci-bridge on Q35 machines will work one day)
> 
> 
>> And the rationale
>> for *that* is that DMI-PCI bridges cannot accept hotplugged devices,
>> while PCI-PCI bridges can.
>>
>> Technically nothing forbids (AFAICT) cold-plugging PCI devices into
>> DMI-PCI bridges, but this document is expressly not just about technical
>> constraints -- it's a policy document. We want to simplify / trim the
>> supported PCI and PCI Express hierarchies as much as possible.
>>
>> All valid *high-level* topology goals should be permitted / covered one
>> way or another by this document, but in as few ways as possible --
>> hopefully only one way. For example, if you read the rest of the thread,
>> flat hierarchies are preferred to deeply nested hierarchies, because
>> flat ones save on bus numbers
> 
> Do they?

Yes. Nesting implies bridges, and bridges take up bus numbers. For
example, in a PCI Express switch, the upstream port of the switch
consumes a bus number, with no practical usefulness.

IIRC we collectively devised a flat pattern elsewhere in the thread
where you could exhaust the 0..255 bus number space such that almost
every bridge (= taking up a bus number) would also be capable of
accepting a hot-plugged or cold-plugged PCI Express device. That is,
practically no wasted bus numbers.

Hm search this message for "population algorithm":

https://www.mail-archive.com/qemu-devel@nongnu.org/msg394730.html

and then Gerd's big improvement / simplification on it, with multifunction:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg395437.html

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laine Stump

On 10/04/2016 11:45 AM, Alex Williamson wrote:

On Tue, 4 Oct 2016 15:59:11 +0100
"Daniel P. Berrange"  wrote:


On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:

On 09/01/16 15:22, Marcel Apfelbaum wrote:

+2.3 PCI only hierarchy
+======================
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
+into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
+and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
+only into pcie.0 bus.
+
+   pcie.0 bus
+   ----------------------------------------------
+        |                               |
+   -----------                 ------------------
+   | PCI Dev |                 | DMI-PCI BRIDGE |
+   -----------                 ------------------
+                                  |                |
+                             -----------    ------------------
+                             | PCI Dev |    | PCI-PCI Bridge |
+                             -----------    ------------------
+                                                |             |
+                                           -----------   -----------
+                                           | PCI Dev |   | PCI Dev |
+                                           -----------   -----------


Works for me, but I would again elaborate a little bit on keeping the
hierarchy flat.

First, in order to preserve compatibility with libvirt's current
behavior, let's not plug a PCI device directly in to the DMI-PCI bridge,
even if that's possible otherwise. Let's just say

- there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
is required),


Why do you suggest this ? If the guest has multiple NUMA nodes
and you're creating a PXB for each NUMA node, then it looks valid
to want to have a DMI-PCI bridge attached to each PXB, so you can
have legacy PCI devices on each NUMA node, instead of putting them
all on the PCI bridge without NUMA affinity.


Seems like this is one of those "generic" vs "specific" device issues.
We use the DMI-to-PCI bridge as if it were a PCIe-to-PCI bridge, but
DMI is actually an Intel proprietary interface, the bridge just has the
same software interface as a PCI bridge.  So while you can use it as a
generic PCIe-to-PCI bridge, it's at least going to make me cringe every
time.



If using it this way makes kittens cry or something, then we'd be happy 
to use a generic pcie-to-pci bridge if somebody created one :-)






- only PCI-PCI bridges should be plugged into the DMI-PCI bridge,


What's the rationale for that, as opposed to plugging devices directly
into the DMI-PCI bridge, which seems to work?


IIRC, something about hotplug, but from a PCI perspective it doesn't
make any sense to me either.



At one point Marcel and Michael were discussing the possibility of 
making hotplug work on a dmi-to-pci-bridge. Currently it doesn't even 
work for pci-bridge so (as I think I said in another message just now) 
it is kind of pointless, although when I asked about eliminating use of 
pci-bridge in favor of just using dmi-to-pci-bridge directly, I got lots 
of "no" votes.




 Same with the restriction from using slot
0 on PCI bridges, there's no basis for that except on the root bus.


I tried allowing devices to be plugged into slot 0 of a pci-bridge in 
libvirt - qemu barfed, so I moved the "minSlot" for pci-bridge back up 
to 1. Slot 0 is completely usable on a dmi-to-pci-bridge though (and 
libvirt allows it). At this point, even if qemu enabled using slot 0 of 
a pci-bridge, libvirt wouldn't be able to expose that to users (unless 
the min/max slot of each PCI controller was made visible somewhere via QMP)





Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laine Stump

On 10/04/2016 11:40 AM, Laszlo Ersek wrote:

On 10/04/16 16:59, Daniel P. Berrange wrote:

On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:

On 09/01/16 15:22, Marcel Apfelbaum wrote:

+2.3 PCI only hierarchy
+======================
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
+into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
+and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
+only into pcie.0 bus.
+
+   pcie.0 bus
+   ----------------------------------------------
+        |                               |
+   -----------                 ------------------
+   | PCI Dev |                 | DMI-PCI BRIDGE |
+   -----------                 ------------------
+                                  |                |
+                             -----------    ------------------
+                             | PCI Dev |    | PCI-PCI Bridge |
+                             -----------    ------------------
+                                                |             |
+                                           -----------   -----------
+                                           | PCI Dev |   | PCI Dev |
+                                           -----------   -----------


Works for me, but I would again elaborate a little bit on keeping the
hierarchy flat.

First, in order to preserve compatibility with libvirt's current
behavior, let's not plug a PCI device directly in to the DMI-PCI bridge,
even if that's possible otherwise. Let's just say

- there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
is required),


Why do you suggest this ? If the guest has multiple NUMA nodes
and you're creating a PXB for each NUMA node, then it looks valid
to want to have a DMI-PCI bridge attached to each PXB, so you can
have legacy PCI devices on each NUMA node, instead of putting them
all on the PCI bridge without NUMA affinity.


You are right. I meant the above within one PCI Express root bus.

Small correction to your wording though: you don't want to attach the
DMI-PCI bridge to the PXB device, but to the extra root bus provided by
the PXB.


This made me realize something - the root bus on a pxb-pcie controller 
has a single slot and that slot can accept either a pcie-root-port 
(ioh3420) or a dmi-to-pci-bridge. If you want to have both express and 
legacy PCI devices on the same NUMA node, then you would either need to 
create one pxb-pcie for the pcie-root-port and another for the 
dmi-to-pci-bridge, or you would need to put the pcie-root-port and 
dmi-to-pci-bridge onto different functions of the single slot. Should 
the latter work properly?








- only PCI-PCI bridges should be plugged into the DMI-PCI bridge,


What's the rationale for that, as opposed to plugging devices directly
into the DMI-PCI bridge, which seems to work?


The rationale is that libvirt used to do it like this.



Nah, that's just the *result* of the rationale that we wanted the 
devices to be hotpluggable. At some later date we learned the hotplug on 
a pci-bridge device doesn't work on a Q35 machine anyway, so it was kind 
of pointless (but we still do it because we hold out hope that hotplug 
of legacy PCI devices into a pci-bridge on Q35 machines will work one day)




And the rationale
for *that* is that DMI-PCI bridges cannot accept hotplugged devices,
while PCI-PCI bridges can.

Technically nothing forbids (AFAICT) cold-plugging PCI devices into
DMI-PCI bridges, but this document is expressly not just about technical
constraints -- it's a policy document. We want to simplify / trim the
supported PCI and PCI Express hierarchies as much as possible.

All valid *high-level* topology goals should be permitted / covered one
way or another by this document, but in as few ways as possible --
hopefully only one way. For example, if you read the rest of the thread,
flat hierarchies are preferred to deeply nested hierarchies, because
flat ones save on bus numbers


Do they?


, are easier to setup and understand,
probably perform better, and don't lose any generality for cold- or hotplug.

Thanks
Laszlo






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Alex Williamson
On Tue, 4 Oct 2016 15:59:11 +0100
"Daniel P. Berrange"  wrote:

> On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:
> > On 09/01/16 15:22, Marcel Apfelbaum wrote:  
> > > +2.3 PCI only hierarchy
> > > +======================
> > > +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
> > > +into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
> > > +and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
> > > +only into pcie.0 bus.
> > > +
> > > +   pcie.0 bus
> > > +   ----------------------------------------------
> > > +        |                               |
> > > +   -----------                 ------------------
> > > +   | PCI Dev |                 | DMI-PCI BRIDGE |
> > > +   -----------                 ------------------
> > > +                                  |                |
> > > +                             -----------    ------------------
> > > +                             | PCI Dev |    | PCI-PCI Bridge |
> > > +                             -----------    ------------------
> > > +                                                |             |
> > > +                                           -----------   -----------
> > > +                                           | PCI Dev |   | PCI Dev |
> > > +                                           -----------   -----------
> > 
> > Works for me, but I would again elaborate a little bit on keeping the
> > hierarchy flat.
> > 
> > First, in order to preserve compatibility with libvirt's current
> > behavior, let's not plug a PCI device directly in to the DMI-PCI bridge,
> > even if that's possible otherwise. Let's just say
> > 
> > - there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
> > is required),  
> 
> Why do you suggest this ? If the guest has multiple NUMA nodes
> and you're creating a PXB for each NUMA node, then it looks valid
> to want to have a DMI-PCI bridge attached to each PXB, so you can
> have legacy PCI devices on each NUMA node, instead of putting them
> all on the PCI bridge without NUMA affinity.

Seems like this is one of those "generic" vs "specific" device issues.
We use the DMI-to-PCI bridge as if it were a PCIe-to-PCI bridge, but
DMI is actually an Intel proprietary interface, the bridge just has the
same software interface as a PCI bridge.  So while you can use it as a
generic PCIe-to-PCI bridge, it's at least going to make me cringe every
time.
 
> > - only PCI-PCI bridges should be plugged into the DMI-PCI bridge,  
> 
> What's the rationale for that, as opposed to plugging devices directly
> into the DMI-PCI bridge, which seems to work?

IIRC, something about hotplug, but from a PCI perspective it doesn't
make any sense to me either.  Same with the restriction from using slot
0 on PCI bridges, there's no basis for that except on the root bus.
Thanks,

Alex



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Laszlo Ersek
On 10/04/16 16:59, Daniel P. Berrange wrote:
> On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:
>> On 09/01/16 15:22, Marcel Apfelbaum wrote:
>>> +2.3 PCI only hierarchy
>>> +======================
>>> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
>>> +into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
>>> +and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
>>> +only into pcie.0 bus.
>>> +
>>> +   pcie.0 bus
>>> +   ----------------------------------------------
>>> +        |                               |
>>> +   -----------                 ------------------
>>> +   | PCI Dev |                 | DMI-PCI BRIDGE |
>>> +   -----------                 ------------------
>>> +                                  |                |
>>> +                             -----------    ------------------
>>> +                             | PCI Dev |    | PCI-PCI Bridge |
>>> +                             -----------    ------------------
>>> +                                                |             |
>>> +                                           -----------   -----------
>>> +                                           | PCI Dev |   | PCI Dev |
>>> +                                           -----------   -----------
>>
>> Works for me, but I would again elaborate a little bit on keeping the
>> hierarchy flat.
>>
>> First, in order to preserve compatibility with libvirt's current
>> behavior, let's not plug a PCI device directly in to the DMI-PCI bridge,
>> even if that's possible otherwise. Let's just say
>>
>> - there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
>> is required),
> 
> Why do you suggest this ? If the guest has multiple NUMA nodes
> and you're creating a PXB for each NUMA node, then it looks valid
> to want to have a DMI-PCI bridge attached to each PXB, so you can
> have legacy PCI devices on each NUMA node, instead of putting them
> all on the PCI bridge without NUMA affinity.

You are right. I meant the above within one PCI Express root bus.

Small correction to your wording though: you don't want to attach the
DMI-PCI bridge to the PXB device, but to the extra root bus provided by
the PXB.

> 
>> - only PCI-PCI bridges should be plugged into the DMI-PCI bridge,
> 
> What's the rationale for that, as opposed to plugging devices directly
> into the DMI-PCI bridge, which seems to work?

The rationale is that libvirt used to do it like this. And the rationale
for *that* is that DMI-PCI bridges cannot accept hotplugged devices,
while PCI-PCI bridges can.

Technically nothing forbids (AFAICT) cold-plugging PCI devices into
DMI-PCI bridges, but this document is expressly not just about technical
constraints -- it's a policy document. We want to simplify / trim the
supported PCI and PCI Express hierarchies as much as possible.

All valid *high-level* topology goals should be permitted / covered one
way or another by this document, but in as few ways as possible --
hopefully only one way. For example, if you read the rest of the thread,
flat hierarchies are preferred to deeply nested hierarchies, because
flat ones save on bus numbers, are easier to setup and understand,
probably perform better, and don't lose any generality for cold- or hotplug.

Thanks
Laszlo



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-10-04 Thread Daniel P. Berrange
On Mon, Sep 05, 2016 at 06:24:48PM +0200, Laszlo Ersek wrote:
> On 09/01/16 15:22, Marcel Apfelbaum wrote:
> > +2.3 PCI only hierarchy
> > +======================
> > +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
> > +into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
> > +and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
> > +only into pcie.0 bus.
> > +
> > +   pcie.0 bus
> > +   ----------------------------------------------
> > +        |                               |
> > +   -----------                 ------------------
> > +   | PCI Dev |                 | DMI-PCI BRIDGE |
> > +   -----------                 ------------------
> > +                                  |                |
> > +                             -----------    ------------------
> > +                             | PCI Dev |    | PCI-PCI Bridge |
> > +                             -----------    ------------------
> > +                                                |             |
> > +                                           -----------   -----------
> > +                                           | PCI Dev |   | PCI Dev |
> > +                                           -----------   -----------
> 
> Works for me, but I would again elaborate a little bit on keeping the
> hierarchy flat.
> 
> First, in order to preserve compatibility with libvirt's current
> behavior, let's not plug a PCI device directly in to the DMI-PCI bridge,
> even if that's possible otherwise. Let's just say
> 
> - there should be at most one DMI-PCI bridge (if a legacy PCI hierarchy
> is required),

Why do you suggest this ? If the guest has multiple NUMA nodes
and you're creating a PXB for each NUMA node, then it looks valid
to want to have a DMI-PCI bridge attached to each PXB, so you can
have legacy PCI devices on each NUMA node, instead of putting them
all on the PCI bridge without NUMA affinity.

> - only PCI-PCI bridges should be plugged into the DMI-PCI bridge,

What's the rationale for that, as opposed to plugging devices directly
into the DMI-PCI bridge, which seems to work?

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-16 Thread Andrea Bolognani
On Thu, 2016-09-15 at 17:20 +0300, Marcel Apfelbaum wrote:
> > Just catching up on mail after vacation and read this thread. Thanks
> > Marcel for writing this document (I guess a v1 is coming soon).
> 
> Yes, I am sorry but I got caught up with other stuff and I am
> going to be in PTO for a week, so V1 will take a little more time
> than I planned.

I finally caught up as well, and while I don't have much
value to contribute to the conversation, let me say this:
everything about this thread is absolutely awesome!

The amount of information one can absorb from the discussion
alone is amazing, but the guidelines contained in the
document we're crafting will certainly prove to be invaluable
to users and people working higher up in the stack alike.

Thanks Marcel and everyone involved. You guys rock! :)

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-15 Thread Marcel Apfelbaum

On 09/15/2016 11:38 AM, Andrew Jones wrote:

On Wed, Sep 07, 2016 at 10:39:28PM +0300, Marcel Apfelbaum wrote:

On 09/07/2016 08:55 PM, Laine Stump wrote:

On 09/07/2016 04:06 AM, Marcel Apfelbaum wrote:

[snip]

Good point, maybe libvirt can avoid adding switches unless the user
explicitly
asked for them. I checked and it actually works fine in QEMU.


I'm just now writing the code that auto-adds *-ports as they are needed, and 
doing it this way simplifies it *immensely*.

When I had to think about the possibility of needing upstream/downstream 
switches, as an endpoint device was added, I would need to check if a 
(root|downstream)-port was available and if not I might
be able to just add a root-port, or I might have to add a downstream-port; if 
the only option was a downstream port, then *that* might require adding a new 
*upstream* port.

If I can limit libvirt to only auto-adding root-ports (and if there is no 
downside to putting multiple root ports on a single root bus port), then I just 
need to find an empty function of an empty
slot on the root bus, add a root-port, and I'm done (and since 224 is *a lot*, 
I think at least for now it's okay to punt once they get past that point).

So, *is* there any downside to doing this?



No downside I can think of.
Just be sure to emphasize the auto-add mechanism stops at 'x' devices. If the 
user needs more,
he should manually add switches and manually assign the devices to the 
Downstream Ports.



Just catching up on mail after vacation and read this thread. Thanks
Marcel for writing this document (I guess a v1 is coming soon).


Yes, I am sorry but I got caught up with other stuff and I am
going to be in PTO for a week, so V1 will take a little more time
than I planned.

This
will be very useful for determining the best default configuration of
a virtio-pci mach-virt.



It would be very good if this doc matched both x86 and mach-virt PCIe
machines.
Your review would be appreciated.


FWIW, here is the proposal that I started formulating when I experimented
with this several months ago;

 - PCIe-only (disable-modern=off, disable-legacy=on)


If the virtio devices are plugged into PCI Express Root Ports
or Downstream Ports, this is already the default configuration;
you don't need to add the disable-* properties anymore.
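
In other words, something like the sketch below should already come up as
a modern-only virtio device, with the commented-out line only spelling out
the default explicitly (IDs and addresses are made up):

    -device ioh3420,id=rp0,bus=pcie.0,addr=0x2,chassis=1,port=1 \
    -netdev user,id=net0 \
    -device virtio-net-pci,bus=rp0,netdev=net0
    # equivalent to: -device virtio-net-pci,bus=rp0,netdev=net0,disable-legacy=on,disable-modern=off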


 - No legacy PCI support, i.e. no bridges (yup, I'm a PCIe purist,
   but don't have a leg to stand on if push came to shove)


Yes... We'll say that legacy PCI support is optional.


 - use one or more ports for virtio-scsi controllers for disks, one is
   probably enough
 - use one or more ports with multifunction, allowing up to 8 functions,
   for virtio-net, one port is probably enough


As Alex Williamson mentioned, PCI Express Root Ports are actually functions,
not devices, so you can have up to 8 Ports per slot. This is better than
making the virtio-* devices multi-function, because then you would need
to hot-plug/hot-unplug all of them.
If hot-plug is not an issue, plugging multi-function devices into
multi-function Ports will save a lot of bus numbers which are, like Laszlo
mentioned, a scarce resource.


 - Add N extra ports for hotplug, N defaulting to 2
   - hotplug devices to first N-1 ports, reserving last for a switch
   - if switch is needed, hotplug it with M downstream ports
 (M defaulting to 2*(N-1)+1)


We would prefer multi-function ports to switches, since you'll run out
of bus numbers before you use all the PCI Express Root Ports anyway (see
previous mails).

However the switches will be supported for cases when you have a
lot of Integrated Devices and the Root Ports are not enough, or
to enable some testing scenarios.


 - Encourage somebody to develop generic versions of ports and switches,
   hi Marcel :-), and exclusively use those in the configuration



My goal is to try to come up with them for 2.8, but since I haven't started
to work on them yet, I can't commit :)

Thanks,
Marcel


Thanks,
drew






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-15 Thread Andrew Jones
On Wed, Sep 07, 2016 at 10:39:28PM +0300, Marcel Apfelbaum wrote:
> On 09/07/2016 08:55 PM, Laine Stump wrote:
> > On 09/07/2016 04:06 AM, Marcel Apfelbaum wrote:
[snip]
> > > Good point, maybe libvirt can avoid adding switches unless the user
> > > explicitly
> > > asked for them. I checked and it actually works fine in QEMU.
> > 
> > I'm just now writing the code that auto-adds *-ports as they are needed, 
> > and doing it this way simplifies it *immensely*.
> > 
> > When I had to think about the possibility of needing upstream/downstream 
> > switches, as an endpoint device was added, I would need to check if a 
> > (root|downstream)-port was available and if not I might
> > be able to just add a root-port, or I might have to add a downstream-port; 
> > if the only option was a downstream port, then *that* might require adding 
> > a new *upstream* port.
> > 
> > If I can limit libvirt to only auto-adding root-ports (and if there is no 
> > downside to putting multiple root ports on a single root bus port), then I 
> > just need to find an empty function of an empty
> > slot on the root bus, add a root-port, and I'm done (and since 224 is *a 
> > lot*, I think at least for now it's okay to punt once they get past that 
> > point).
> > 
> > So, *is* there any downside to doing this?
> > 
> 
> No downside I can think of.
> Just be sure to emphasize the auto-add mechanism stops at 'x' devices. If the 
> user needs more,
> he should manually add switches and manually assign the devices to the 
> Downstream Ports.
> 

Just catching up on mail after vacation and read this thread. Thanks
Marcel for writing this document (I guess a v1 is coming soon). This
will be very useful for determining the best default configuration of
a virtio-pci mach-virt.

FWIW, here is the proposal that I started formulating when I experimented
with this several months ago;

 - PCIe-only (disable-modern=off, disable-legacy=on)
 - No legacy PCI support, i.e. no bridges (yup, I'm a PCIe purist,
   but don't have a leg to stand on if push came to shove)
 - use one or more ports for virtio-scsi controllers for disks, one is
   probably enough
 - use one or more ports with multifunction, allowing up to 8 functions,
   for virtio-net, one port is probably enough
 - Add N extra ports for hotplug, N defaulting to 2
   - hotplug devices to first N-1 ports, reserving last for a switch
   - if switch is needed, hotplug it with M downstream ports
 (M defaulting to 2*(N-1)+1)
 - Encourage somebody to develop generic versions of ports and switches,
   hi Marcel :-), and exclusively use those in the configuration
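
Assembled as a command-line fragment, the first few bullets of this
proposal might look roughly like the sketch below (machine type left out,
N=2, all IDs and numbers illustrative; the on-demand switch step is not
shown):

    -device ioh3420,id=rp-scsi,bus=pcie.0,addr=0x2,chassis=1,port=1 \
    -device virtio-scsi-pci,id=scsi0,bus=rp-scsi \
    -device ioh3420,id=rp-net,bus=pcie.0,addr=0x3,chassis=2,port=2 \
    -netdev user,id=net0 \
    -device virtio-net-pci,bus=rp-net,netdev=net0 \
    -device ioh3420,id=rp-spare0,bus=pcie.0,addr=0x4,chassis=3,port=3 \
    -device ioh3420,id=rp-spare1,bus=pcie.0,addr=0x5,chassis=4,port=4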

Thanks,
drew



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-08 Thread Gerd Hoffmann
  Hi,

> > Good point, maybe libvirt can avoid adding switches unless the user
> > explicitly
> > asked for them. I checked and it actually works fine in QEMU.

> So, *is* there any downside to doing this?

I don't think so.

The only issue I can think of when it comes to multifunction is hotplug,
because hotplug works at slot level in pci so you can't hotplug single
functions.

But as you can't hotplug the root ports in the first place this is
nothing we have to worry about in this specific case.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-08 Thread Gerd Hoffmann
  Hi,

> I had understood that the xhci could be a legacy PCI device or a PCI 
> Express device depending on the socket it was plugged into (or was that 
> possibly just someone doing some hand-waving over the fact that 
> obscuring the PCI Express capabilities effectively turns it into a 
> legacy PCI device?).

That is correct, it'll work both ways.

> If that's the case, why do you prefer the default 
> USB controller to be added in a root-port rather than as an integrated 
> device (which is what we do with the group of USB2 controllers, as well 
> as the primary video device)

Trying to mimic real hardware as closely as possible.  The ich9 uhci/ehci
controllers are actually integrated chipset devices.  The nec xhci is an
express device in physical hardware.

That is more a personal preference though, there are no strong technical
reasons to do it that way.
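
For reference, both placements expressed as command-line sketches (IDs and
addresses are made up):

    # xhci behind a root port, i.e. as a PCI Express device:
    -device ioh3420,id=rp-usb,bus=pcie.0,addr=0x6,chassis=5,port=5 \
    -device nec-usb-xhci,id=xhci0,bus=rp-usb

    # xhci as an Integrated Device directly on pcie.0:
    # -device nec-usb-xhci,id=xhci0,bus=pcie.0,addr=0x7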

cheers,
  Gerd




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Laine Stump

On 09/07/2016 03:39 PM, Marcel Apfelbaum wrote:

On 09/07/2016 08:55 PM, Laine Stump wrote:

On 09/07/2016 04:06 AM, Marcel Apfelbaum wrote:

On 09/07/2016 09:21 AM, Gerd Hoffmann wrote:

  Hi,


ports, if that's allowed). For example:

-  1-32 ports needed: use root ports only

- 33-64 ports needed: use 31 root ports, and one switch with 2-32
downstream ports


I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw,
typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.
With
8 functions each you can have up to 224 root ports without any
switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...



Good point, maybe libvirt can avoid adding switches unless the user
explicitly
asked for them. I checked and it actually works fine in QEMU.


I'm just now writing the code that auto-adds *-ports as they are
needed, and doing it this way simplifies it *immensely*.

When I had to think about the possibility of needing
upstream/downstream switches, as an endpoint device was added, I would
need to check if a (root|downstream)-port was available and if not I
might
be able to just add a root-port, or I might have to add a
downstream-port; if the only option was a downstream port, then *that*
might require adding a new *upstream* port.

If I can limit libvirt to only auto-adding root-ports (and if there is
no downside to putting multiple root ports on a single root bus port),
then I just need to find an empty function of an empty
slot on the root bus, add a root-port, and I'm done (and since 224 is
*a lot*, I think at least for now it's okay to punt once they get past
that point).

So, *is* there any downside to doing this?



No downside I can think of.
Just be sure to emphasize the auto-add mechanism stops at 'x' devices.
If the user needs more,
he should manually add switches and manually assign the devices to the
Downstream Ports.


Actually, just the former - once the downstream ports are added, they'll 
automatically be used for endpoint devices (and even new upstream ports) 
as needed.





Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Marcel Apfelbaum

On 09/07/2016 08:55 PM, Laine Stump wrote:

On 09/07/2016 04:06 AM, Marcel Apfelbaum wrote:

On 09/07/2016 09:21 AM, Gerd Hoffmann wrote:

  Hi,


ports, if that's allowed). For example:

-  1-32 ports needed: use root ports only

- 33-64 ports needed: use 31 root ports, and one switch with 2-32
downstream ports


I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...



Good point, maybe libvirt can avoid adding switches unless the user
explicitly
asked for them. I checked and it actually works fine in QEMU.


I'm just now writing the code that auto-adds *-ports as they are needed, and 
doing it this way simplifies it *immensely*.

When I had to think about the possibility of needing upstream/downstream 
switches, as an endpoint device was added, I would need to check if a 
(root|downstream)-port was available and if not I might
be able to just add a root-port, or I might have to add a downstream-port; if 
the only option was a downstream port, then *that* might require adding a new 
*upstream* port.

If I can limit libvirt to only auto-adding root-ports (and if there is no 
downside to putting multiple root ports on a single root bus port), then I just 
need to find an empty function of an empty
slot on the root bus, add a root-port, and I'm done (and since 224 is *a lot*, 
I think at least for now it's okay to punt once they get past that point).

So, *is* there any downside to doing this?



No downside I can think of.
Just be sure to emphasize the auto-add mechanism stops at 'x' devices. If the 
user needs more,
he should manually add switches and manually assign the devices to the 
Downstream Ports.

Thanks,
Marcel









Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Marcel Apfelbaum

On 09/07/2016 07:08 PM, Alex Williamson wrote:

On Wed, 7 Sep 2016 11:06:45 +0300
Marcel Apfelbaum  wrote:


On 09/07/2016 09:21 AM, Gerd Hoffmann wrote:

  Hi,


ports, if that's allowed). For example:

-  1-32 ports needed: use root ports only

- 33-64 ports needed: use 31 root ports, and one switch with 2-32
downstream ports


I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...



Good point, maybe libvirt can avoid adding switches unless the user explicitly
asked for them. I checked and it actually works fine in QEMU.

BTW, when we implement ARI we can have up to 256 Root Ports on a single 
slot...


"Root Ports on a single slot"... The entire idea of ARI is that there
is no slot, the slot/function address space is combined into one big
8-bit free-for-all.  Besides, can you do ARI on the root complex?


No, we can't :(
Indeed, for the Root Complex bus we need the (bus:)dev:fn tuple, because we can
have multiple devices plugged in. Thanks for the correction.

Thanks,
Marcel


Typically you need to look at whether the port upstream of a given
device supports ARI to enable, there's no upstream PCI device on the
root complex.  This is the suggestion I gave you for switches, if the
upstream switch port supports ARI then we can have 256 downstream
switch ports (assuming ARI isn't only specific to downstream ports).
Thanks,

Alex






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Laine Stump

On 09/07/2016 03:04 AM, Gerd Hoffmann wrote:

  Hi,


Side note for usb: In practice you don't want to use the tons of
uhci/ehci controllers present in the original q35 but plug xhci into one
of the pcie root ports instead (unless your guest doesn't support xhci).


I've wondered about that recently. For i440fx machinetypes if you don't
specify a USB controller in libvirt's domain config, you will
automatically get the PIIX3 USB controller added. In order to maintain
consistency on the topic of "auto-adding USB when not specified", if the
machinetype is Q35 we will autoadd a set of USB2 (uhci/ehci) controllers
(I think I added that based on your comments at the time :-). But
recently I've mostly been hearing that people should use xhci instead.
So should libvirt add a single xhci (rather than the uhci/ehci set) at
the same port when no USB is specified?


Big advantage of xhci is that the hardware design is much more
virtualization friendly, i.e. it needs a lot fewer CPU cycles to emulate
than uhci/ohci/ehci.  Also xhci can handle all usb speeds, so you don't
need the complicated uhci/ehci companion setup with uhci for usb1 and
ehci for usb2 devices.

The problem with xhci is guest support.  Which becomes less and less of
a problem over time of course.  All our firmware (seabios/edk2/slof) has
xhci support meanwhile.  ppc64 switched from ohci to xhci by default in
rhel-7.3.  Finding linux guests without xhci support is pretty hard
meanwhile.  Maybe RHEL-5 qualifies.  Windows 8 + newer ships with xhci
drivers.

So, yea, maybe it's time to switch the default for q35 to xhci,
especially if we keep uhci as default for i440fx and suggest to use that
machine type for oldish guests.  But I'd suggest to place xhci in a pcie
root port then, so maybe wait with that until libvirt can auto-add pcie
root ports as needed ...


I'm doing that right now  (giving libvirt the ability to auto-add root 
ports) :-)


I had understood that the xhci could be a legacy PCI device or a PCI 
Express device depending on the socket it was plugged into (or was that 
possibly just someone doing some hand-waving over the fact that 
obscuring the PCI Express capabilities effectively turns it into a 
legacy PCI device?). If that's the case, why do you prefer the default 
USB controller to be added in a root-port rather than as an integrated 
device (which is what we do with the group of USB2 controllers, as well 
as the primary video device)





Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Laine Stump

On 09/07/2016 04:06 AM, Marcel Apfelbaum wrote:

On 09/07/2016 09:21 AM, Gerd Hoffmann wrote:

  Hi,


ports, if that's allowed). For example:

-  1-32 ports needed: use root ports only

- 33-64 ports needed: use 31 root ports, and one switch with 2-32
downstream ports


I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...



Good point, maybe libvirt can avoid adding switches unless the user
explicitly
asked for them. I checked and it actually works fine in QEMU.


I'm just now writing the code that auto-adds *-ports as they are needed, 
and doing it this way simplifies it *immensely*.


When I had to think about the possibility of needing upstream/downstream 
switches, as an endpoint device was added, I would need to check if a 
(root|downstream)-port was available and if not I might be able to just 
add a root-port, or I might have to add a downstream-port; if the only 
option was a downstream port, then *that* might require adding a new 
*upstream* port.


If I can limit libvirt to only auto-adding root-ports (and if there is 
no downside to putting multiple root ports on a single root bus port), 
then I just need to find an empty function of an empty slot on the root 
bus, add a root-port, and I'm done (and since 224 is *a lot*, I think at 
least for now it's okay to punt once they get past that point).


So, *is* there any downside to doing this?





Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Alex Williamson
On Wed, 7 Sep 2016 11:06:45 +0300
Marcel Apfelbaum  wrote:

> On 09/07/2016 09:21 AM, Gerd Hoffmann wrote:
> >   Hi,
> >  
>  ports, if that's allowed). For example:
> 
>  -  1-32 ports needed: use root ports only
> 
>  - 33-64 ports needed: use 31 root ports, and one switch with 2-32
>  downstream ports  
> >
> > I expect you rarely need any switches.  You can go multifunction with
> > the pcie root ports.  Which is how physical q35 works too btw, typically
> > the root ports are on slot 1c for intel chipsets:
> >
> > nilsson root ~# lspci -s1c
> > 00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
> > Family PCI Express Root Port 1 (rev c4)
> > 00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
> > Family PCI Express Root Port 2 (rev c4)
> > 00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
> > Family PCI Express Root Port 3 (rev c4)
> >
> > Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
> > 1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
> > 8 functions each you can have up to 224 root ports without any switches,
> > and you have not many pci bus numbers left until you hit the 256 busses
> > limit ...
> >  
> 
> Good point, maybe libvirt can avoid adding switches unless the user explicitly
> asked for them. I checked and it actually works fine in QEMU.
> 
> BTW, when we implement ARI we can have up to 256 Root Ports on a single 
> slot...

"Root Ports on a single slot"... The entire idea of ARI is that there
is no slot, the slot/function address space is combined into one big
8-bit free-for-all.  Besides, can you do ARI on the root complex?
Typically you need to look at whether the port upstream of a given
device supports ARI to enable, there's no upstream PCI device on the
root complex.  This is the suggestion I gave you for switches, if the
upstream switch port supports ARI then we can have 256 downstream
switch ports (assuming ARI isn't only specific to downstream ports).
Thanks,

Alex



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Marcel Apfelbaum

On 09/07/2016 11:06 AM, Laszlo Ersek wrote:

On 09/07/16 08:21, Gerd Hoffmann wrote:

  Hi,


ports, if that's allowed). For example:

-  1-32 ports needed: use root ports only

- 33-64 ports needed: use 31 root ports, and one switch with 2-32
downstream ports


I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...


This is an absolutely great idea. I wonder if it allows us to rip out
all the language about switches, upstream ports and downstream ports. It
would be awesome if we didn't have to mention and draw those things *at
all* (better: if we could summarily discourage their use).

Marcel, what do you think?


While I do think using multi-function Root Ports is definitely the preferred
way to go, keeping the switches around is not so bad, if only to have
all PCI Express controllers available for testing scenarios.
We can (and will) of course state that we prefer multi-function Root Ports over
switches and ask libvirt/other management software not to add switches
unless they are specifically requested by users.

Thanks,
Marcel



Thanks
Laszlo






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Laszlo Ersek
On 09/07/16 08:21, Gerd Hoffmann wrote:
>   Hi,
> 
 ports, if that's allowed). For example:

 -  1-32 ports needed: use root ports only

 - 33-64 ports needed: use 31 root ports, and one switch with 2-32
 downstream ports
> 
> I expect you rarely need any switches.  You can go multifunction with
> the pcie root ports.  Which is how physical q35 works too btw, typically
> the root ports are on slot 1c for intel chipsets:
> 
> nilsson root ~# lspci -s1c
> 00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
> Family PCI Express Root Port 1 (rev c4)
> 00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
> Family PCI Express Root Port 2 (rev c4)
> 00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
> Family PCI Express Root Port 3 (rev c4)
> 
> Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
> 1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
> 8 functions each you can have up to 224 root ports without any switches,
> and you have not many pci bus numbers left until you hit the 256 busses
> limit ...

This is an absolutely great idea. I wonder if it allows us to rip out
all the language about switches, upstream ports and downstream ports. It
would be awesome if we didn't have to mention and draw those things *at
all* (better: if we could summarily discourage their use).

Marcel, what do you think?

Thanks
Laszlo



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Marcel Apfelbaum

On 09/07/2016 09:21 AM, Gerd Hoffmann wrote:

  Hi,


ports, if that's allowed). For example:

-  1-32 ports needed: use root ports only

- 33-64 ports needed: use 31 root ports, and one switch with 2-32
downstream ports


I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...



Good point, maybe libvirt can avoid adding switches unless the user explicitly
asked for them. I checked and it actually works fine in QEMU.

BTW, when we implement ARI we can have up to 256 Root Ports on a single 
slot...

Thanks,
Marcel


cheers,
  Gerd






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Marcel Apfelbaum

On 09/07/2016 10:53 AM, Laszlo Ersek wrote:

On 09/06/16 13:35, Gerd Hoffmann wrote:

  Hi,



[...]



Side note: the linux kernel allocates io space nevertheless, so
checking /proc/ioports after boot doesn't tell you what the firmware
did.


Yeah, we've got to convince Linux to stop doing that. Earlier Alex
mentioned the "hpiosize" and "hpmemsize" PCI subsystem options for the
kernel:

  hpiosize=nn[KMG]    The fixed amount of bus space which is
                      reserved for hotplug bridge's IO window.
                      Default size is 256 bytes.
  hpmemsize=nn[KMG]   The fixed amount of bus space which is
                      reserved for hotplug bridge's memory window.
                      Default size is 2 megabytes.

This document (once complete) would be the basis for tweaking that stuff
in the kernel too. Primarily, "hpiosize" should default to zero, because
its current nonzero default (which gets rounded up to 4KB somewhere) is
what exhausts the IO space, if we have more than a handful of PCI
Express downstream / root ports.

Maybe we can add a PCI quirk for this to the kernel, for QEMU's PCI
Express ports (all of them -- root, upstream, downstream).



Yes, once we have our "own" controllers and not the Intel emulations we use today.

Thanks,
Marcel


Thanks
Laszlo






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Laszlo Ersek
On 09/06/16 13:35, Gerd Hoffmann wrote:
>   Hi,
> 
>>> +Plug only legacy PCI devices as Root Complex Integrated Devices
>>> +even if the PCIe spec does not forbid PCIe devices.
>>
>> I suggest "even though the PCI Express spec does not forbid PCI Express
>> devices as Integrated Devices". (Detail is good!)
> 
> While talking about integrated devices:  There is docs/q35-chipset.cfg,
> which documents how to mimic q35 with integrated devices as close and
> complete as possible.
> 
> Usage:
>   qemu-system-x86_64 -M q35 -readconfig docs/q35-chipset.cfg $args
> 
> Side note for usb: In practice you don't want to use the tons of
> uhci/ehci controllers present in the original q35 but plug xhci into one
> of the pcie root ports instead (unless your guest doesn't support xhci).
> 
>>> +as required by PCI spec will reserve a 4K IO range for each.
>>> +The firmware used by QEMU (SeaBIOS/OVMF) will further optimize
>>> +it by allocation the IO space only if there is at least a device
>>> +with IO BARs plugged into the bridge.
>>
>> This used to be true, but is no longer true, for OVMF. And I think it's
>> actually correct: we *should* keep the 4K IO reservation per PCI-PCI bridge.
>>
>> (But, certainly no IO reservation for PCI Express root port, upstream
>> port, or downstream port! And i'll need your help for telling these
>> apart in OVMF.)
> 
> IIRC the same is true for seabios, it looks for the pcie capability and
> skips io space allocation on pcie ports only.
> 
> Side note: the linux kernel allocates io space nevertheless, so
> checking /proc/ioports after boot doesn't tell you what the firmware
> did.

Yeah, we've got to convince Linux to stop doing that. Earlier Alex
mentioned the "hpiosize" and "hpmemsize" PCI subsystem options for the
kernel:

  hpiosize=nn[KMG]    The fixed amount of bus space which is
                      reserved for hotplug bridge's IO window.
                      Default size is 256 bytes.
  hpmemsize=nn[KMG]   The fixed amount of bus space which is
                      reserved for hotplug bridge's memory window.
                      Default size is 2 megabytes.

This document (once complete) would be the basis for tweaking that stuff
in the kernel too. Primarily, "hpiosize" should default to zero, because
its current nonzero default (which gets rounded up to 4KB somewhere) is
what exhausts the IO space, if we have more than a handful of PCI
Express downstream / root ports.
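
Until those defaults change, they can at least be overridden per guest by
appending something like the following to the guest kernel command line
(the sizes shown are only an example):

  pci=hpiosize=0,hpmemsize=2M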

Maybe we can add a PCI quirk for this to the kernel, for QEMU's PCI
Express ports (all of them -- root, upstream, downstream).

Thanks
Laszlo



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Laszlo Ersek
On 09/06/16 20:32, Alex Williamson wrote:
> On Tue, 6 Sep 2016 21:14:11 +0300
> Marcel Apfelbaum  wrote:
> 
>> On 09/06/2016 06:38 PM, Alex Williamson wrote:
>>> On Thu,  1 Sep 2016 16:22:07 +0300
>>> Marcel Apfelbaum  wrote:

 +5. Device assignment
 +
 +Host devices are mostly PCIe and should be plugged only into PCIe ports.
 +PCI-PCI bridge slots can be used for legacy PCI host devices.  
>>>
>>> I don't think we have any evidence to suggest this as a best practice.
>>> We have a lot of experience placing PCIe host devices into a
>>> conventional PCI topology on 440FX.  We don't have nearly as much
>>> experience placing them into downstream PCIe ports.  This seems like
>>> how we would like for things to behave to look like real hardware
>>> platforms, but it's just navel gazing whether it's actually the right
>>> thing to do.  Thanks,
>>>  
>>
>> I had to look up the "navel gazing"...
>> While I do agree with your statements, I prefer a cleaner PCI Express machine
>> with as little legacy PCI as possible. I use this document as an opportunity
>> to start gaining experience with device assignment into PCI Express Root 
>> Ports
>> and Downstream Ports and solve the issues along the way.
> 
> That's exactly what I mean, there's an ulterior, personal motivation in
> this suggestion that's not really backed by facts.  You'd like to make
> the recommendation to place PCIe assigned devices into PCIe slots, but
> that's not necessarily the configuration with the best track record
> right now.  In fact there's really no advantage to a user to do this
> unless they have a device that needs PCIe (radeon and tg3
> potentially come to mind here).  So while I agree with you from an
> ideological standpoint, I don't think that's sufficient to make the
> recommendation you're proposing here.  Thanks,

To reinforce what Marcel already replied, this document is all about
ideology / policy, and not a status report. We should be looking
forward, not backward. Permitting an exception for plugging a PCI
Express device into a legacy PCI slot just because the PCI Express
device is an assigned, physical one, dilutes the message, and will lead
to all kinds of mess elsewhere.

I'm acutely aware that conforming to the "PCI Express into PCI Express"
recommendation might not *work* in practice, but that doesn't matter
right now. This document should translate to a task list for QEMU and
firmware developers alike. At least I need this document to exist
primarily so I know what to do in OVMF, and what topologies in QE's BZs
to reject out of hand. If the "PCI Express into PCI Express" guideline
will require some VFIO work, and causes Q35 (not i440fx) users some
pain, so be it, IMO.

I'm saying this knowing that you know about ten billion times more about
PCI / PCI Express than I do.

Thanks
Laszlo




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Gerd Hoffmann
  Hi,

> > Side note for usb: In practice you don't want to use the tons of
> > uhci/ehci controllers present in the original q35 but plug xhci into one
> > of the pcie root ports instead (unless your guest doesn't support xhci).
> 
> I've wondered about that recently. For i440fx machinetypes if you don't 
> specify a USB controller in libvirt's domain config, you will 
> automatically get the PIIX3 USB controller added. In order to maintain 
> consistency on the topic of "auto-adding USB when not specified", if the 
> machinetype is Q35 we will autoadd a set of USB2 (uhci/ehci) controllers 
> (I think I added that based on your comments at the time :-). But 
> recently I've mostly been hearing that people should use xhci instead. 
> So should libvirt add a single xhci (rather than the uhci/ehci set) at 
> the same port when no USB is specified?

Big advantage of xhci is that the hardware design is much more
virtualization friendly, i.e. it needs a lot fewer CPU cycles to emulate
than uhci/ohci/ehci.  Also xhci can handle all usb speeds, so you don't
need the complicated uhci/ehci companion setup with uhci for usb1 and
ehci for usb2 devices.

The problem with xhci is guest support.  Which becomes less and less of
a problem over time of course.  All our firmware (seabios/edk2/slof) has
xhci support meanwhile.  ppc64 switched from ohci to xhci by default in
rhel-7.3.  Finding linux guests without xhci support is pretty hard
meanwhile.  Maybe RHEL-5 qualifies.  Windows 8 + newer ships with xhci
drivers.

So, yea, maybe it's time to switch the default for q35 to xhci,
especially if we keep uhci as default for i440fx and suggest to use that
machine type for oldish guests.  But I'd suggest to place xhci in a pcie
root port then, so maybe wait with that until libvirt can auto-add pcie
root ports as needed ...
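
Such a setup could look roughly like this (an untested sketch; the 1d slot,
the IDs and the choice of the nec-usb-xhci model are just placeholders):

  -device ioh3420,id=usb-rp,bus=pcie.0,addr=0x1d.0x0,port=8,chassis=8 \
  -device nec-usb-xhci,id=xhci0,bus=usb-rp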

cheers,
  Gerd




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-07 Thread Gerd Hoffmann
  Hi,

> >> ports, if that's allowed). For example:
> >>
> >> -  1-32 ports needed: use root ports only
> >>
> >> - 33-64 ports needed: use 31 root ports, and one switch with 2-32
> >> downstream ports

I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...
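
As a rough, untested sketch, one fully populated slot (1c here, but any free
slot would do; port/chassis numbers are arbitrary) would look like:

  -device ioh3420,id=rp0,bus=pcie.0,addr=0x1c.0x0,multifunction=on,port=1,chassis=1 \
  -device ioh3420,id=rp1,bus=pcie.0,addr=0x1c.0x1,port=2,chassis=2 \
  -device ioh3420,id=rp2,bus=pcie.0,addr=0x1c.0x2,port=3,chassis=3 \
  [... and so on, up to addr=0x1c.0x7 for eight root ports in the slot ...]

Each root port still consumes one bus number for its secondary bus, which is
where the 256-bus limit eventually bites.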

cheers,
  Gerd



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Marcel Apfelbaum

On 09/06/2016 06:38 PM, Alex Williamson wrote:

On Thu,  1 Sep 2016 16:22:07 +0300
Marcel Apfelbaum  wrote:


Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.

Thanks,
Marcel

 docs/pcie.txt | 145 ++
 1 file changed, 145 insertions(+)
 create mode 100644 docs/pcie.txt

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 000..52a8830
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,145 @@
+PCI EXPRESS GUIDELINES
+==
+
+1. Introduction
+
+The doc proposes best practices on how to use PCIe/PCI device
+in PCIe based machines and explains the reasoning behind them.
+
+
+2. Device placement strategy
+
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
+Plugging a PCI device into a PCIe device might not always work and
+is weird anyway since it cannot be done for "bare metal".
+Plugging a PCIe device into a PCI slot will hide the Extended
+Configuration Space thus is also not recommended.
+
+The recommendation is to separate the PCIe and PCI hierarchies.
+PCIe devices should be plugged only into PCIe Root Ports and
+PCIe Downstream ports (let's call them PCIe ports).
+
+2.1 Root Bus (pcie.0)
+=
+Plug only legacy PCI devices as Root Complex Integrated Devices
+even if the PCIe spec does not forbid PCIe devices. The existing




Hi Alex,
Thanks for the review.



Surely we can have PCIe device on the root complex??



Yes, we can, it is not forbidden. Even so, my understanding is
the main use for Integrated Devices is for legacy devices
like sound cards or nics that come with the motherboard.
Because of that my concern is we might be missing some support
for that in QEMU or even in the Linux kernel.

One example I got from Jason about an issue with Integrated Endpoints in the kernel:

commit d14053b3c714178525f22660e6aaf41263d00056
Author: David Woodhouse 
Date:   Thu Oct 15 09:28:06 2015 +0100

iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints

The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."
  

We can say it is a bug and it is solved, so what's the problem?
But my point is, why do it in the first place?
We are the hardware "vendors" and we can decide not to add PCIe
devices as Integrated Devices.




+hardware uses mostly PCI devices as Integrated Endpoints. In this
+way we may avoid some strange Guest OS-es behaviour.
+Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
+or DMI-PCI bridges to start legacy PCI hierarchies.
+
+
+   pcie.0 bus
+   ----------------------------------------------------------------------------
+        |                |                      |                     |
+   -----------   ------------------   ------------------   ------------------
+   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |   | DMI-PCI bridge |
+   -----------   ------------------   ------------------   ------------------


Do you have a spec reference for plugging an upstream port directly
into the root complex?  IMHO this is invalid, an upstream port can only
be attached behind a downstream port, ie. a root port or downstream
switch port.



Yes, it is a bug; both Laszlo and I spotted it, and the 2.2 figure shows it right.
Thanks for finding it.


+
+2.2 PCIe only hierarchy
+===
+Always use PCIe Root ports to start a PCIe hierarchy. Use PCIe switches 
(Upstream
+Ports + several Downstream Ports) if out of PCIe Root Ports slots. PCIe 
switches
+can be nested until a depth of 6-7. Plug only PCIe devices into PCIe Ports.


This seems to contradict 2.1,


Yes, please forgive the bug, it will not appear in v2

but I agree more with this statement to
only start a PCIe sub-hierarchy with a root port, not an upstream port
connected to the root complex.  The 2nd sentence is confusing, I don't
know if you're referring to fan-out via PCIe switch downstream of a
root port or again suggesting to use upstream switch ports directly on
the root complex.



The PCIe hierarchy always starts with PCI Express Root Ports; the switch
is to be plugged into the PCI Express ports. I will try to re-phrase to make it
clearer.



+
+
+   pcie.0 bus
+   ------------------------------------------------
+        |               |                 |
+   -------------   -------------     -------------
+   | Root Port |   | Root Port |     | Root Port |
+   -------------   -------------     -------------
+        |                                 |
+   ------------
+   |

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Alex Williamson
On Tue, 6 Sep 2016 21:14:11 +0300
Marcel Apfelbaum  wrote:

> On 09/06/2016 06:38 PM, Alex Williamson wrote:
> > On Thu,  1 Sep 2016 16:22:07 +0300
> > Marcel Apfelbaum  wrote:
> >  
> >> Proposes best practices on how to use PCIe/PCI device
> >> in PCIe based machines and explain the reasoning behind them.
> >>
> >> Signed-off-by: Marcel Apfelbaum 
> >> ---
> >>
> >> Hi,
> >>
> >> Please add your comments on what to add/remove/edit to make this doc 
> >> usable.
> >>
> >> Thanks,
> >> Marcel
> >>
> >>  docs/pcie.txt | 145 
> >> ++
> >>  1 file changed, 145 insertions(+)
> >>  create mode 100644 docs/pcie.txt
> >>
> >> diff --git a/docs/pcie.txt b/docs/pcie.txt
> >> new file mode 100644
> >> index 000..52a8830
> >> --- /dev/null
> >> +++ b/docs/pcie.txt
> >> @@ -0,0 +1,145 @@
> >> +PCI EXPRESS GUIDELINES
> >> +==
> >> +
> >> +1. Introduction
> >> +
> >> +The doc proposes best practices on how to use PCIe/PCI device
> >> +in PCIe based machines and explains the reasoning behind them.
> >> +
> >> +
> >> +2. Device placement strategy
> >> +
> >> +QEMU does not have a clear socket-device matching mechanism
> >> +and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
> >> +Plugging a PCI device into a PCIe device might not always work and
> >> +is weird anyway since it cannot be done for "bare metal".
> >> +Plugging a PCIe device into a PCI slot will hide the Extended
> >> +Configuration Space thus is also not recommended.
> >> +
> >> +The recommendation is to separate the PCIe and PCI hierarchies.
> >> +PCIe devices should be plugged only into PCIe Root Ports and
> >> +PCIe Downstream ports (let's call them PCIe ports).
> >> +
> >> +2.1 Root Bus (pcie.0)
> >> +=
> >> +Plug only legacy PCI devices as Root Complex Integrated Devices
> >> +even if the PCIe spec does not forbid PCIe devices. The existing  
> >  
> 
> Hi Alex,
> Thanks for the review.
> 
> 
> > Surely we can have PCIe device on the root complex??
> >  
> 
> Yes, we can, it is not forbidden. Even so, my understanding is
> the main use for Integrated Devices is for legacy devices
> like sound cards or nics that come with the motherboard.
> Because of that my concern is we might be missing some support
> for that in QEMU or even in the Linux kernel.
> 
> One example I got from Jason about an issue with Integrated Endpoints in the kernel:
> 
> commit d14053b3c714178525f22660e6aaf41263d00056
> Author: David Woodhouse 
> Date:   Thu Oct 15 09:28:06 2015 +0100
> 
>  iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints
> 
>  The VT-d specification says that "Software must enable ATS on endpoint
>  devices behind a Root Port only if the Root Port is reported as
>  supporting ATS transactions."
>
> 
> We can say it is a bug and it is solved, so what's the problem?
> But my point is, why do it in the first place?
> We are the hardware "vendors" and we can decide not to add PCIe
> devices as Integrated Devices.
> 
> 
> 
> >> +hardware uses mostly PCI devices as Integrated Endpoints. In this
> >> +way we may avoid some strange Guest OS-es behaviour.
> >> +Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
> >> +or DMI-PCI bridges to start legacy PCI hierarchies.
> >> +
> >> +
> >> +   pcie.0 bus
> >> +   ----------------------------------------------------------------------------
> >> +        |                |                      |                     |
> >> +   -----------   ------------------   ------------------   ------------------
> >> +   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |   | DMI-PCI bridge |
> >> +   -----------   ------------------   ------------------   ------------------
> >
> > Do you have a spec reference for plugging an upstream port directly
> > into the root complex?  IMHO this is invalid, an upstream port can only
> > be attached behind a downstream port, ie. a root port or downstream
> > switch port.
> >  
> 
> Yes, it is a bug; both Laszlo and I spotted it, and the 2.2 figure shows it 
> right.
> Thanks for finding it.
> 
> >> +
> >> +2.2 PCIe only hierarchy
> >> +===
> >> +Always use PCIe Root ports to start a PCIe hierarchy. Use PCIe switches 
> >> (Upstream
> >> +Ports + several Downstream Ports) if out of PCIe Root Ports slots. PCIe 
> >> switches
> >> +can be nested until a depth of 6-7. Plug only PCIe devices into PCIe 
> >> Ports.  
> >
> > This seems to contradict 2.1,  
> 
> Yes, please forgive the bug, it will not appear in v2
> 
>   but I agree more with this statement to
> > only start a PCIe sub-hierarchy with a root port, not an upstream port
> > connected to the root complex.  The 2nd sentence is confusing, I don't
> > know if you're referring to fan-out via PCIe switch downstream of a
> > root 

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Marcel Apfelbaum

On 09/06/2016 09:32 PM, Alex Williamson wrote:

On Tue, 6 Sep 2016 21:14:11 +0300
Marcel Apfelbaum  wrote:


On 09/06/2016 06:38 PM, Alex Williamson wrote:

On Thu,  1 Sep 2016 16:22:07 +0300
Marcel Apfelbaum  wrote:


Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.

Thanks,
Marcel



[...]




+The PCI hotplug is ACPI based and can work side by side with the PCIe
+native hotplug.
+
+PCIe devices can be natively hot-plugged/hot-unplugged into/from
+PCIe Ports (Root Ports/Downstream Ports). Switches are hot-pluggable.


Why?  This seems like a QEMU bug.  Clearly we need the downstream ports
in place when the upstream switch is hot-added, but this should be
feasible.



I don't understand the question. I do think switches can be hot-plugged,
but I am not sure if QEMU allows it. If not, this is something we should solve.


Sorry, I read too quickly and inserted a "not" in there, I thought the
statement was that switches are not hot-pluggable.  I think the issue
will be the ordering of hot-adding the downstream switch ports prior to
the upstream switch port, or we're going to need to invent a
switch with hot-pluggable downstream ports.  I expect the guest is only
going to scan for downstream ports once after the upstream port is
discovered.



The problem I see is that I need to specify a bus to
plug the Downstream Port into, but that is the id of the Upstream Port
I haven't added yet. I need to think a little bit more about how to do it,
unless I am missing something.
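
For reference, the cold-plugged ordering is straightforward, something along
the lines of (untested; IDs, chassis and slot values are placeholders):

  -device ioh3420,id=rp1,bus=pcie.0,addr=0x1c.0x0,port=1,chassis=1 \
  -device x3130-upstream,id=up1,bus=rp1 \
  -device xio3130-downstream,id=dp1,bus=up1,chassis=10,slot=0 \
  -device xio3130-downstream,id=dp2,bus=up1,chassis=11,slot=1

For hot-plug one would presumably device_add the upstream port first, so that
its id exists, and only then the downstream ports, e.g.
device_add x3130-upstream,id=up1,bus=rp1 followed by
device_add xio3130-downstream,id=dp1,bus=up1,chassis=10,slot=0; whether the
guest picks up downstream ports added at that point is exactly the open
question above.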

[...]


+5. Device assignment
+
+Host devices are mostly PCIe and should be plugged only into PCIe ports.
+PCI-PCI bridge slots can be used for legacy PCI host devices.


I don't think we have any evidence to suggest this as a best practice.
We have a lot of experience placing PCIe host devices into a
conventional PCI topology on 440FX.  We don't have nearly as much
experience placing them into downstream PCIe ports.  This seems like
how we would like for things to behave to look like real hardware
platforms, but it's just navel gazing whether it's actually the right
thing to do.  Thanks,



I had to look up the "navel gazing"...
While I do agree with your statements, I prefer a cleaner PCI Express machine
with as little legacy PCI as possible. I use this document as an opportunity
to start gaining experience with device assignment into PCI Express Root Ports
and Downstream Ports and solve the issues along the way.


That's exactly what I mean, there's an ulterior, personal motivation in
this suggestion that's not really backed by facts.


Ulterior yes, personal no. Several developers of both ARM and x86 PCI Express
machines see the new machines as an opportunity to get rid of legacy and keep
them as modern as possible. Funny thing, I *personally* prefer to see Q35
as a replacement for PC machines, no need to keep and support them both.


You'd like to make the recommendation to place PCIe assigned devices
into PCIe slots, but that's not necessarily the configuration with the
best track record right now.


Since we haven't used Q35 at all until now (a speculation, but probably true)
the track record is kind of clean...

In fact there's really no advantage to a user to do this
unless they have a device that needs PCIe (radeon and tg3
potentially come to mind here).


The advantage is to avoid forcing the PCI Express "purists" (where are you
now??) to start a legacy PCI hierarchy just to plug a modern device into
a modern PCI Express machine.

Another advantage is to avoid tainting the ACPI tables with ACPI hotplug
support for the PCI-bridge devices and stuff like that.

I agree it is safer to plug assigned devices into PCI slots from
an "enterprise" point of view, but this is upstream, right? :)
We look to the future... (and we don't have known issues yet anyway)

So while I agree with you from an
ideological standpoint, I don't think that's sufficient to make the
recommendation you're proposing here.  Thanks,



I'll find a way to rephrase it, maybe:

Host devices are mostly PCIe and they can be plugged into PCI Express Root 
Ports/Downstream Ports,
however we have no experience doing that. As a fall-back the PCI hierarchy can 
be used to plug
an assigned device into a PCI slot.
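
Plugging an assigned device into a PCI Express Root Port would then be
something like (untested sketch; the host address and the IDs are placeholders):

  -device ioh3420,id=rp1,bus=pcie.0,addr=0x1c.0x0,port=1,chassis=1 \
  -device vfio-pci,host=02:00.0,bus=rp1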

Thanks,
Marcel


 +PCI-PCI bridge slots can be used for legacy PCI host devices.


Alex






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Alex Williamson
On Thu,  1 Sep 2016 16:22:07 +0300
Marcel Apfelbaum  wrote:

> Proposes best practices on how to use PCIe/PCI device
> in PCIe based machines and explain the reasoning behind them.
> 
> Signed-off-by: Marcel Apfelbaum 
> ---
> 
> Hi,
> 
> Please add your comments on what to add/remove/edit to make this doc usable.
> 
> Thanks,
> Marcel
> 
>  docs/pcie.txt | 145 
> ++
>  1 file changed, 145 insertions(+)
>  create mode 100644 docs/pcie.txt
> 
> diff --git a/docs/pcie.txt b/docs/pcie.txt
> new file mode 100644
> index 000..52a8830
> --- /dev/null
> +++ b/docs/pcie.txt
> @@ -0,0 +1,145 @@
> +PCI EXPRESS GUIDELINES
> +==
> +
> +1. Introduction
> +
> +The doc proposes best practices on how to use PCIe/PCI device
> +in PCIe based machines and explains the reasoning behind them.
> +
> +
> +2. Device placement strategy
> +
> +QEMU does not have a clear socket-device matching mechanism
> +and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
> +Plugging a PCI device into a PCIe device might not always work and
> +is weird anyway since it cannot be done for "bare metal".
> +Plugging a PCIe device into a PCI slot will hide the Extended
> +Configuration Space thus is also not recommended.
> +
> +The recommendation is to separate the PCIe and PCI hierarchies.
> +PCIe devices should be plugged only into PCIe Root Ports and
> +PCIe Downstream ports (let's call them PCIe ports).
> +
> +2.1 Root Bus (pcie.0)
> +=
> +Plug only legacy PCI devices as Root Complex Integrated Devices
> +even if the PCIe spec does not forbid PCIe devices. The existing

Surely we can have PCIe device on the root complex??

> +hardware uses mostly PCI devices as Integrated Endpoints. In this
> +way we may avoid some strange Guest OS-es behaviour.
> +Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
> +or DMI-PCI bridges to start legacy PCI hierarchies.
> +
> +
> +   pcie.0 bus
> +   ----------------------------------------------------------------------------
> +        |                |                      |                     |
> +   -----------   ------------------   ------------------   ------------------
> +   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |   | DMI-PCI bridge |
> +   -----------   ------------------   ------------------   ------------------

Do you have a spec reference for plugging an upstream port directly
into the root complex?  IMHO this is invalid, an upstream port can only
be attached behind a downstream port, ie. a root port or downstream
switch port.

> +
> +2.2 PCIe only hierarchy
> +===
> +Always use PCIe Root ports to start a PCIe hierarchy. Use PCIe switches 
> (Upstream
> +Ports + several Downstream Ports) if out of PCIe Root Ports slots. PCIe 
> switches
> +can be nested until a depth of 6-7. Plug only PCIe devices into PCIe Ports.

This seems to contradict 2.1, but I agree more with this statement to
only start a PCIe sub-hierarchy with a root port, not an upstream port
connected to the root complex.  The 2nd sentence is confusing, I don't
know if you're referring to fan-out via PCIe switch downstream of a
root port or again suggesting to use upstream switch ports directly on
the root complex.

> +
> +
> +   pcie.0 bus
> +   ------------------------------------------------
> +        |               |                 |
> +   -------------   -------------     -------------
> +   | Root Port |   | Root Port |     | Root Port |
> +   -------------   -------------     -------------
> +        |                                 |
> +   ------------                   -----------------
> +   | PCIe Dev |                   | Upstream Port |
> +   ------------                   -----------------
> +                                    |           |
> +                      -------------------   -------------------
> +                      | Downstream Port |   | Downstream Port |
> +                      -------------------   -------------------
> +                              |
> +                         ------------
> +                         | PCIe Dev |
> +                         ------------
> +
> +2.3 PCI only hierarchy
> +==
> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
> +into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
> +and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
> +only into pcie.0 bus.
> +
> +   pcie.0 bus
> +   -----------------------------------
> +        |               |
> +   -----------   ------------------
> +   | PCI Dev |   | DMI-PCI BRIDGE |
> +   -----------   ------------------
> +                   |            |
> +

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Marcel Apfelbaum

On 09/06/2016 02:35 PM, Gerd Hoffmann wrote:

  Hi,


+Plug only legacy PCI devices as Root Complex Integrated Devices
+even if the PCIe spec does not forbid PCIe devices.


I suggest "even though the PCI Express spec does not forbid PCI Express
devices as Integrated Devices". (Detail is good!)


While talking about integrated devices:  There is docs/q35-chipset.cfg,
which documents how to mimic q35 with integrated devices as close and
complete as possible.

Usage:
  qemu-system-x86_64 -M q35 -readconfig docs/q35-chipset.cfg $args

Side note for usb: In practice you don't want to use the tons of
uhci/ehci controllers present in the original q35 but plug xhci into one
of the pcie root ports instead (unless your guest doesn't support xhci).



Hi Gerd,

Thanks for the comments, I'll be sure to refer them in the doc.
Marcel


[...]



Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Marcel Apfelbaum

On 09/06/2016 04:31 PM, Laszlo Ersek wrote:

On 09/05/16 22:02, Marcel Apfelbaum wrote:

On 09/05/2016 07:24 PM, Laszlo Ersek wrote:

On 09/01/16 15:22, Marcel Apfelbaum wrote:

Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc
usable.






[...]




(But, certainly no IO reservation for PCI Express root port, upstream
port, or downstream port! And i'll need your help for telling these
apart in OVMF.)



Just let me know how I can help.


Well, in the EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding()
implementation, I'll have to look at the PCI config space of the
"bridge-like" PCI device that the generic PCI Bus driver of edk2 passes
back to me, asking me about resource reservation.

Based on the config space, I should be able to tell apart "PCI-PCI
bridge" from "PCI Express downstream or root port". So what I'd need
here is a semi-formal natural language description of these conditions.


You can use PCI Express Spec: 7.8.2. PCI Express Capabilities Register (Offset 
02h)

Bit 7:4 Register Description:
Device/Port Type – Indicates the specific type of this PCI
Express Function. Note that different Functions in a multi-
Function device can generally be of different types.
Defined encodings are:
0000b PCI Express Endpoint
0001b Legacy PCI Express Endpoint
0100b Root Port of PCI Express Root Complex*
0101b Upstream Port of PCI Express Switch*
0110b Downstream Port of PCI Express Switch*
0111b PCI Express to PCI/PCI-X Bridge*
1000b PCI/PCI-X to PCI Express Bridge*
1001b Root Complex Integrated Endpoint





Hmm, actually I think I've already written code, for another patch, that
identifies the latter category. So everything where that check doesn't
fire can be deemed "PCI-PCI bridge". (This hook gets called only for
bridges.)

Yet another alternative: if we go for the special PCI capability, for
exposing reservation sizes from QEMU to the firmware, then I can simply
search the capability list for just that capability. I think that could
be the easiest for me.



That would be a "later" step.
BTW, following an offline chat with Michael S. Tsirkin
regarding virtio 1.0 requiring 8M MMIO by default, we arrived at the conclusion
that it is not really needed, and we came up with an alternative that will
require less than 2M of MMIO space.
I put this here because the above solution will give us some time to deal with
the MMIO ranges reservation.

[...]


+
+
+4. Hot Plug
+
+The root bus pcie.0 does not support hot-plug, so Integrated Devices,


s/root bus/root complex/? Also, any root complexes added with pxb-pcie
don't support hotplug.



Actually pxb-pcie should support PCI Express Native Hotplug.


Huh, interesting.


If they don't, it is a bug and I'll take care of it.


Hmm, a bit lower down you mention that PCI Express native hot plug is
based on SHPCs. So, when you say that pxb-pcie should support PCI
Express Native Hotplug, you mean that it should occur through SHPC, right?



Yes, but I was talking about the Integrated SHPCs of the PCI Express
Root Ports and PCI Express Downstream Ports. (devices plugged into them)



However, for pxb-pci*, we had to disable SHPC: see QEMU commit
d10dda2d60c8 ("hw/pci-bridge: disable SHPC in PXB"), in June 2015.



This is only for the pxb device (not pxb-pcie) and only for the internal 
pci-bridge that comes with it.
And... we don't use SHPC based hot-plug for PCI, only for PCI Express.
For PCI we are using only the ACPI hotplug. So disabling it is not so bad.

The pxb-pcie does not have the internal PCI bridge. You don't need it because:
1. You can't have Integrated Devices for pxb-pcie
2. The PCI Express Upstream Port is a type of PCI-Bridge anyway.
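
As a hypothetical, untested sketch, a pxb-pcie with a Root Port to (hot-)plug
devices into would look something like this (bus_nr and the IDs are arbitrary):

  -device pxb-pcie,id=pxb1,bus=pcie.0,bus_nr=64 \
  -device ioh3420,id=pxb1-rp1,bus=pxb1,addr=0x0,port=1,chassis=20

Native hotplug would then target pxb1-rp1 rather than the pxb-pcie root bus itself.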



For background, the series around it was

-- I think v7 was the last version.

... Actually, now I wonder if d10dda2d60c8 should be possible to revert
at this point! Namely, in OVMF I may have unwittingly fixed this issue
-- obviously much later than the QEMU commit: in March 2016. See

https://github.com/tianocore/edk2/commit/8f35eb92c419

If you look at the commit message of the QEMU patch, it says

[...]

Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is
clear in the root bus's command register [...]

which I think should no longer be true, thanks to edk2 commit 8f35eb92c419.

So maybe we should re-evaluate QEMU commit d10dda2d60c8. If pxb-pci and
pxb-pcie work with current OVMF, due to edk2 commit 8f35eb92c419, then
maybe we should revert QEMU commit d10dda2d60c8.

Not urgent for me :), obviously, I'm just explaining so you can make a
note for later, if you wish to (if hot-plugging directly into pxb-pcie
should be necessary -- I think it's very low priority).



As stated above, since we don't use it anyway it doesn't matter.

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Laine Stump

On 09/06/2016 07:35 AM, Gerd Hoffmann wrote:

While talking about integrated devices:  There is docs/q35-chipset.cfg,
which documents how to mimic q35 with integrated devices as close and
complete as possible.

Usage:
  qemu-system-x86_64 -M q35 -readconfig docs/q35-chipset.cfg $args

Side note for usb: In practice you don't want to use the tons of
uhci/ehci controllers present in the original q35 but plug xhci into one
of the pcie root ports instead (unless your guest doesn't support xhci).


I've wondered about that recently. For i440fx machinetypes if you don't 
specify a USB controller in libvirt's domain config, you will 
automatically get the PIIX3 USB controller added. In order to maintain 
consistency on the topic of "auto-adding USB when not specified", if the 
machinetype is Q35 we will autoadd a set of USB2 (uhci/ehci) controllers 
(I think I added that based on your comments at the time :-). But 
recently I've mostly been hearing that people should use xhci instead. 
So should libvirt add a single xhci (rather than the uhci/ehci set) at 
the same port when no USB is specified?





Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Laszlo Ersek
On 09/05/16 22:02, Marcel Apfelbaum wrote:
> On 09/05/2016 07:24 PM, Laszlo Ersek wrote:
>> On 09/01/16 15:22, Marcel Apfelbaum wrote:
>>> Proposes best practices on how to use PCIe/PCI device
>>> in PCIe based machines and explain the reasoning behind them.
>>>
>>> Signed-off-by: Marcel Apfelbaum 
>>> ---
>>>
>>> Hi,
>>>
>>> Please add your comments on what to add/remove/edit to make this doc
>>> usable.
>>
> 
> Hi Laszlo,
> 
>> I'll give you a brain dump below -- most of it might easily be
>> incorrect, but I'll just speak my mind :)
>>
> 
> Thanks for taking the time to go over it, I'll do my best to respond
> to all the questions.
> 
>>>
>>> Thanks,
>>> Marcel
>>>
>>>  docs/pcie.txt | 145
>>> ++
>>>  1 file changed, 145 insertions(+)
>>>  create mode 100644 docs/pcie.txt
>>>
>>> diff --git a/docs/pcie.txt b/docs/pcie.txt
>>> new file mode 100644
>>> index 000..52a8830
>>> --- /dev/null
>>> +++ b/docs/pcie.txt
>>> @@ -0,0 +1,145 @@
>>> +PCI EXPRESS GUIDELINES
>>> +==
>>> +
>>> +1. Introduction
>>> +
>>> +The doc proposes best practices on how to use PCIe/PCI device
>>> +in PCIe based machines and explains the reasoning behind them.
>>
>> General request: please replace all occurrences of "PCIe" with "PCI
>> Express" in the text (not command lines, of course). The reason is that
>> the "e" letter is a minimal difference, and I've misread PCIe as PC
>> several times, while interpreting this document. Obviously the resultant
>> confusion is terrible, as you are explaining the difference between PCI
>> and PCI Express in the entire document :)
>>
> 
> Sure
> 
>>> +
>>> +
>>> +2. Device placement strategy
>>> +
>>> +QEMU does not have a clear socket-device matching mechanism
>>> +and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
>>> +Plugging a PCI device into a PCIe device might not always work and
>>
>> s/PCIe device/PCI Express slot/
>>
> 
> Thanks!
> 
>>> +is weird anyway since it cannot be done for "bare metal".
>>> +Plugging a PCIe device into a PCI slot will hide the Extended
>>> +Configuration Space thus is also not recommended.
>>> +
>>> +The recommendation is to separate the PCIe and PCI hierarchies.
>>> +PCIe devices should be plugged only into PCIe Root Ports and
>>> +PCIe Downstream ports (let's call them PCIe ports).
>>
>> Please do not use the shorthand; we should always spell out downstream
>> ports and root ports. Assume people reading this document are dumber
>> than I am wrt. PCI / PCI Express -- I'm already pretty dumb, and I
>> appreciate the detail! :) If they are smart, they won't mind the detail;
>> if they lack expertise, they'll appreciate the detail, won't they. :)
>>
> 
> Sure
> 
>>> +
>>> +2.1 Root Bus (pcie.0)
>>
>> Can we call this Root Complex instead?
>>
> 
> Sorry, but we can't. The Root Complex is a type of Host-Bridge
> (and can actually "have" multiple Host-Bridges), not a bus.
> It stands between the CPU/Memory controller/APIC and the PCI/PCI Express
> fabric.
> (as you can see, I am not using PCIe even for the comments :))
> 
> The Root Complex *includes* an internal bus (pcie.0) but also
> can include some Integrated Devices, its own Configuration Space Registers
> (e.g Root Complex Register Block), ...
> 
> One of the main functions of the Root Complex is to
> generate PCI Express Transactions on behalf of the CPU(s) and
> to "translate" the corresponding PCI Express Transactions into DMA
> accesses.
> 
> I can change it to "PCI Express Root Bus", it will help?

Yes, it will, thank you.

All my other "root complex" mentions below were incorrect, in light of
your clarification, so please consider those accordingly.

> 
>>> +=
>>> +Plug only legacy PCI devices as Root Complex Integrated Devices
>>> +even if the PCIe spec does not forbid PCIe devices.
>>
>> I suggest "even though the PCI Express spec does not forbid PCI Express
>> devices as Integrated Devices". (Detail is good!)
>>
> Thanks
> 
>> Also, as Peter suggested, this (but not just this) would be a good place
>> to provide command line fragments.
>>
> 
> I've already added some examples, I'll appreciate if you can have a look
> on v2
> that I will post really soon.
> 
>>> The existing
>>> +hardware uses mostly PCI devices as Integrated Endpoints. In this
>>> +way we may avoid some strange Guest OS-es behaviour.
>>> +Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
>>> +or DMI-PCI bridges to start legacy PCI hierarchies.
>>
>> Hmm, I had to re-read this paragraph (while looking at the diagram)
>> five times until I mostly understood it :) What about the following
>> wording:
>>
>> 
>> Place only the following kinds of devices directly on the Root Complex:
>>
>> (1) For devices with dedicated, specific functionality (network card,
>> graphics card, IDE controller, etc), place only legacy PCI 

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Gerd Hoffmann
  Hi,

> > +Plug only legacy PCI devices as Root Complex Integrated Devices
> > +even if the PCIe spec does not forbid PCIe devices.
> 
> I suggest "even though the PCI Express spec does not forbid PCI Express
> devices as Integrated Devices". (Detail is good!)

While talking about integrated devices:  There is docs/q35-chipset.cfg,
which documents how to mimic q35 with integrated devices as closely and
completely as possible.

Usage:
  qemu-system-x86_64 -M q35 -readconfig docs/q35-chipset.cfg $args

Side note for usb: In practice you don't want to use the tons of
uhci/ehci controllers present in the original q35 but plug xhci into one
of the pcie root ports instead (unless your guest doesn't support xhci).
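
For example, something along these lines (the ids are arbitrary, and
nec-usb-xhci is just one possible xhci implementation):

  qemu-system-x86_64 -M q35 \
    -device ioh3420,id=root_port1,bus=pcie.0,chassis=1,slot=1 \
    -device nec-usb-xhci,bus=root_port1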

> > +as required by PCI spec will reserve a 4K IO range for each.
> > +The firmware used by QEMU (SeaBIOS/OVMF) will further optimize
> > +it by allocation the IO space only if there is at least a device
> > +with IO BARs plugged into the bridge.
> 
> This used to be true, but is no longer true, for OVMF. And I think it's
> actually correct: we *should* keep the 4K IO reservation per PCI-PCI bridge.
> 
> (But, certainly no IO reservation for PCI Express root port, upstream
> port, or downstream port! And I'll need your help for telling these
> apart in OVMF.)

IIRC the same is true for seabios, it looks for the pcie capability and
skips io space allocation on pcie ports only.

Side note: the linux kernel allocates io space nevertheless, so
checking /proc/ioports after boot doesn't tell you what the firmware
did.
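
A rough guest-side way to compare the two views (illustrative only; both
commands show the state after the kernel has finished its own assignments):

  lspci -vv | grep -E "PCI bridge|I/O behind bridge"
  cat /proc/ioports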

cheers,
  Gerd




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-05 Thread Marcel Apfelbaum

On 09/05/2016 07:24 PM, Laszlo Ersek wrote:

On 09/01/16 15:22, Marcel Apfelbaum wrote:

Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.




Hi Laszlo,


I'll give you a brain dump below -- most of it might easily be
incorrect, but I'll just speak my mind :)



Thanks for taking the time to go over it, I'll do my best to respond
to all the questions.



Thanks,
Marcel

 docs/pcie.txt | 145 ++
 1 file changed, 145 insertions(+)
 create mode 100644 docs/pcie.txt

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 000..52a8830
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,145 @@
+PCI EXPRESS GUIDELINES
+==
+
+1. Introduction
+
+The doc proposes best practices on how to use PCIe/PCI device
+in PCIe based machines and explains the reasoning behind them.


General request: please replace all occurrences of "PCIe" with "PCI
Express" in the text (not command lines, of course). The reason is that
the "e" letter is a minimal difference, and I've misread PCIe as PC
several times, while interpreting this document. Obviously the resultant
confusion is terrible, as you are explaining the difference between PCI
and PCI Express in the entire document :)



Sure


+
+
+2. Device placement strategy
+
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
+Plugging a PCI device into a PCIe device might not always work and


s/PCIe device/PCI Express slot/



Thanks!


+is weird anyway since it cannot be done for "bare metal".
+Plugging a PCIe device into a PCI slot will hide the Extended
+Configuration Space thus is also not recommended.
+
+The recommendation is to separate the PCIe and PCI hierarchies.
+PCIe devices should be plugged only into PCIe Root Ports and
+PCIe Downstream ports (let's call them PCIe ports).


Please do not use the shorthand; we should always spell out downstream
ports and root ports. Assume people reading this document are dumber
than I am wrt. PCI / PCI Express -- I'm already pretty dumb, and I
appreciate the detail! :) If they are smart, they won't mind the detail;
if they lack expertise, they'll appreciate the detail, won't they. :)



Sure


+
+2.1 Root Bus (pcie.0)


Can we call this Root Complex instead?



Sorry, but we can't. The Root Complex is a type of Host-Bridge
(and can actually "have" multiple Host-Bridges), not a bus.
It stands between the CPU/Memory controller/APIC and the PCI/PCI Express fabric.
(as you can see, I am not using PCIe even for the comments :))

The Root Complex *includes* an internal bus (pcie.0) but also
can include some Integrated Devices, its own Configuration Space Registers
(e.g Root Complex Register Block), ...

One of the main functions of the Root Complex is to
generate PCI Express Transactions on behalf of the CPU(s) and
to "translate" the corresponding PCI Express Transactions into DMA accesses.

I can change it to "PCI Express Root Bus", will that help?


+=
+Plug only legacy PCI devices as Root Complex Integrated Devices
+even if the PCIe spec does not forbid PCIe devices.


I suggest "even though the PCI Express spec does not forbid PCI Express
devices as Integrated Devices". (Detail is good!)


Thanks


Also, as Peter suggested, this (but not just this) would be a good place
to provide command line fragments.



I've already added some examples; I'd appreciate it if you could have a look at v2,
which I will post really soon.


The existing
+hardware uses mostly PCI devices as Integrated Endpoints. In this
+way we may avoid some strange Guest OS-es behaviour.
+Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
+or DMI-PCI bridges to start legacy PCI hierarchies.


Hmm, I had to re-read this paragraph (while looking at the diagram)
five times until I mostly understood it :) What about the following wording:


Place only the following kinds of devices directly on the Root Complex:

(1) For devices with dedicated, specific functionality (network card,
graphics card, IDE controller, etc), place only legacy PCI devices on
the Root Complex. These will be considered Integrated Endpoints.
Although the PCI Express spec does not forbid PCI Express devices as
Integrated Endpoints, existing hardware mostly integrates legacy PCI
devices with the Root Complex. Guest OSes are suspected to behave
strangely when PCI Express devices are integrated with the Root Complex.

(2) PCI Express Root Ports, for starting exclusively PCI Express
hierarchies.

(3) PCI Express Switches (connected with their Upstream Ports to the
Root Complex), also for starting exclusively PCI Express hierarchies.

(4) For starting legacy 

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-05 Thread Laszlo Ersek
On 09/01/16 15:22, Marcel Apfelbaum wrote:
> Proposes best practices on how to use PCIe/PCI device
> in PCIe based machines and explain the reasoning behind them.
> 
> Signed-off-by: Marcel Apfelbaum 
> ---
> 
> Hi,
> 
> Please add your comments on what to add/remove/edit to make this doc usable.

I'll give you a brain dump below -- most of it might easily be
incorrect, but I'll just speak my mind :)

> 
> Thanks,
> Marcel
> 
>  docs/pcie.txt | 145 ++
>  1 file changed, 145 insertions(+)
>  create mode 100644 docs/pcie.txt
> 
> diff --git a/docs/pcie.txt b/docs/pcie.txt
> new file mode 100644
> index 000..52a8830
> --- /dev/null
> +++ b/docs/pcie.txt
> @@ -0,0 +1,145 @@
> +PCI EXPRESS GUIDELINES
> +==
> +
> +1. Introduction
> +
> +The doc proposes best practices on how to use PCIe/PCI device
> +in PCIe based machines and explains the reasoning behind them.

General request: please replace all occurrences of "PCIe" with "PCI
Express" in the text (not command lines, of course). The reason is that
the "e" letter is a minimal difference, and I've misread PCIe as PC
several times, while interpreting this document. Obviously the resultant
confusion is terrible, as you are explaining the difference between PCI
and PCI Express in the entire document :)

> +
> +
> +2. Device placement strategy
> +
> +QEMU does not have a clear socket-device matching mechanism
> +and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
> +Plugging a PCI device into a PCIe device might not always work and

s/PCIe device/PCI Express slot/

> +is weird anyway since it cannot be done for "bare metal".
> +Plugging a PCIe device into a PCI slot will hide the Extended
> +Configuration Space thus is also not recommended.
> +
> +The recommendation is to separate the PCIe and PCI hierarchies.
> +PCIe devices should be plugged only into PCIe Root Ports and
> +PCIe Downstream ports (let's call them PCIe ports).

Please do not use the shorthand; we should always spell out downstream
ports and root ports. Assume people reading this document are dumber
than I am wrt. PCI / PCI Express -- I'm already pretty dumb, and I
appreciate the detail! :) If they are smart, they won't mind the detail;
if they lack expertise, they'll appreciate the detail, won't they. :)

> +
> +2.1 Root Bus (pcie.0)

Can we call this Root Complex instead?

> +=
> +Plug only legacy PCI devices as Root Complex Integrated Devices
> +even if the PCIe spec does not forbid PCIe devices.

I suggest "even though the PCI Express spec does not forbid PCI Express
devices as Integrated Devices". (Detail is good!)

Also, as Peter suggested, this (but not just this) would be a good place
to provide command line fragments.

> The existing
> +hardware uses mostly PCI devices as Integrated Endpoints. In this
> +way we may avoid some strange Guest OS-es behaviour.
> +Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
> +or DMI-PCI bridges to start legacy PCI hierarchies.

Hmm, I had to re-read this paragraph (while looking at the diagram)
five times until I mostly understood it :) What about the following wording:


Place only the following kinds of devices directly on the Root Complex:

(1) For devices with dedicated, specific functionality (network card,
graphics card, IDE controller, etc), place only legacy PCI devices on
the Root Complex. These will be considered Integrated Endpoints.
Although the PCI Express spec does not forbid PCI Express devices as
Integrated Endpoints, existing hardware mostly integrates legacy PCI
devices with the Root Complex. Guest OSes are suspected to behave
strangely when PCI Express devices are integrated with the Root Complex.

(2) PCI Express Root Ports, for starting exclusively PCI Express
hierarchies.

(3) PCI Express Switches (connected with their Upstream Ports to the
Root Complex), also for starting exclusively PCI Express hierarchies.

(4) For starting legacy PCI hierarchies: DMI-PCI bridges.
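
Something like the following command line fragment could illustrate cases
(1), (2) and (4); the device names, ids and slot numbers are only examples:

  -device e1000,bus=pcie.0,addr=0x5 \
  -device ioh3420,id=root_port1,bus=pcie.0,chassis=1,slot=1 \
  -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.0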

> +
> +
> +   pcie.0 bus

"bus" is correct in QEMU lingo, but I'd still call it complex here.

> +   -------------------------------------------------------------------------
> +        |                |                    |                    |
> +   -----------   ------------------   ------------------   ------------------
> +   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |   | DMI-PCI bridge |
> +   -----------   ------------------   ------------------   ------------------
> +

Please insert a separate (brief) section here about pxb-pcie devices --
just mention that they are documented in a separate spec txt in more
detail, and that they create new root complexes in practice.

In fact, maybe option (5) would be better for pxb-pcie devices, under
section 2.1, than a dedicated section!
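
A rough sketch of such a fragment, just to show the idea (the bus_nr and
numa_node values, as well as the ids, are arbitrary):

  -device pxb-pcie,id=pcie.1,bus_nr=4,numa_node=0,bus=pcie.0 \
  -device ioh3420,id=root_port2,bus=pcie.1,chassis=2,slot=0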

> +2.2 PCIe only hierarchy
> +===
> +Always 

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Laszlo Ersek
On 09/01/16 15:51, Marcel Apfelbaum wrote:
> On 09/01/2016 04:27 PM, Peter Maydell wrote:
>> On 1 September 2016 at 14:22, Marcel Apfelbaum  wrote:
>>> Proposes best practices on how to use PCIe/PCI device
>>> in PCIe based machines and explain the reasoning behind them.
>>>
>>> Signed-off-by: Marcel Apfelbaum 
>>> ---
>>>
>>> Hi,
>>>
>>> Please add your comments on what to add/remove/edit to make this doc
>>> usable.
>>
> 
> Hi Peter,
> 
>> As somebody who doesn't really understand the problem space, my
>> thoughts:
>>
>> (1) is this intended as advice for developers writing machine
>> models and adding pci controllers to them, or is it intended as
>> advice for users (and libvirt-style management layers) about
>> how to configure QEMU?
>>
> 
> It is intended for management layers, as they have no way to
> understand how to "consume" the Q35 machine,
> but also for firmware developers (OVMF/SeaBIOS) to help them
> understand the usage model so they can optimize IO/MEM
> resources allocation for both boot time and hot-plug.
> 
> QEMU users/developers can also benefit from it as the PCIe arch
> is more complex, supporting both PCI/PCIe devices and
> several PCI/PCIe controllers with no clear rules on what goes where.
> 
>> (2) it seems to be a bit short on concrete advice (either
>> "you should do this" instructions to machine model developers,
>> or "use command lines like this" instructions to end-users.
>>
> 
> Thanks for the point. I'll be sure to add detailed command line examples
> to the next version.

I think that would be a huge benefit!

(I'll try to read the document later, and come back with remarks.)

Thanks!
Laszlo




Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Marcel Apfelbaum

On 09/01/2016 04:27 PM, Peter Maydell wrote:

On 1 September 2016 at 14:22, Marcel Apfelbaum  wrote:

Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.




Hi Peter,


As somebody who doesn't really understand the problem space, my
thoughts:

(1) is this intended as advice for developers writing machine
models and adding pci controllers to them, or is it intended as
advice for users (and libvirt-style management layers) about
how to configure QEMU?



It is intended for management layers, as they have no way to
understand how to "consume" the Q35 machine,
but also for firmware developers (OVMF/SeaBIOS) to help them
understand the usage model so they can optimize IO/MEM
resources allocation for both boot time and hot-plug.

QEMU users/developers can also benefit from it as the PCIe arch
is more complex, supporting both PCI/PCIe devices and
several PCI/PCIe controllers with no clear rules on what goes where.


(2) it seems to be a bit short on concrete advice (either
"you should do this" instructions to machine model developers,
or "use command lines like this" instructions to end-users.



Thanks for the point. I'll be sure to add detailed command line examples
to the next version.
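
Roughly, fragments along these lines (all ids, chassis and slot numbers
below are arbitrary) -- a PCI Express hierarchy with a switch:

  -device ioh3420,id=root_port1,bus=pcie.0,chassis=1,slot=1 \
  -device x3130-upstream,id=upstream1,bus=root_port1 \
  -device xio3130-downstream,id=downstream1,bus=upstream1,chassis=2,slot=0 \
  -device virtio-net-pci,bus=downstream1

and a legacy PCI hierarchy behind a DMI-PCI bridge:

  -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.0 \
  -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1,chassis_nr=3 \
  -device e1000,bus=pci_bridge1,addr=0x3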

Thanks,
Marcel


thanks
-- PMM






Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Peter Maydell
On 1 September 2016 at 14:22, Marcel Apfelbaum  wrote:
> Proposes best practices on how to use PCIe/PCI device
> in PCIe based machines and explain the reasoning behind them.
>
> Signed-off-by: Marcel Apfelbaum 
> ---
>
> Hi,
>
> Please add your comments on what to add/remove/edit to make this doc usable.

As somebody who doesn't really understand the problem space, my
thoughts:

(1) is this intended as advice for developers writing machine
models and adding pci controllers to them, or is it intended as
advice for users (and libvirt-style management layers) about
how to configure QEMU?

(2) it seems to be a bit short on concrete advice (either
"you should do this" instructions to machine model developers,
or "use command lines like this" instructions to end-users.

thanks
-- PMM



[Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-01 Thread Marcel Apfelbaum
Proposes best practices on how to use PCIe/PCI device
in PCIe based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum 
---

Hi,

Please add your comments on what to add/remove/edit to make this doc usable.

Thanks,
Marcel

 docs/pcie.txt | 145 ++
 1 file changed, 145 insertions(+)
 create mode 100644 docs/pcie.txt

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 000..52a8830
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,145 @@
+PCI EXPRESS GUIDELINES
+==
+
+1. Introduction
+
+The doc proposes best practices on how to use PCIe/PCI device
+in PCIe based machines and explains the reasoning behind them.
+
+
+2. Device placement strategy
+
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
+Plugging a PCI device into a PCIe device might not always work and
+is weird anyway since it cannot be done for "bare metal".
+Plugging a PCIe device into a PCI slot will hide the Extended
+Configuration Space thus is also not recommended.
+
+The recommendation is to separate the PCIe and PCI hierarchies.
+PCIe devices should be plugged only into PCIe Root Ports and
+PCIe Downstream ports (let's call them PCIe ports).
+
+2.1 Root Bus (pcie.0)
+=
+Plug only legacy PCI devices as Root Complex Integrated Devices
+even if the PCIe spec does not forbid PCIe devices. The existing
+hardware uses mostly PCI devices as Integrated Endpoints. In this
+way we may avoid some strange Guest OS-es behaviour.
+Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
+or DMI-PCI bridges to start legacy PCI hierarchies.
+
+
+   pcie.0 bus
+   --------------------------------------------------------------------------
+        |                |                    |                    |
+   -----------   ------------------   ------------------   ------------------
+   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |   | DMI-PCI bridge |
+   -----------   ------------------   ------------------   ------------------
+
+2.2 PCIe only hierarchy
+===
+Always use PCIe Root ports to start a PCIe hierarchy. Use PCIe switches (Upstream
+Ports + several Downstream Ports) if out of PCIe Root Ports slots. PCIe switches
+can be nested until a depth of 6-7. Plug only PCIe devices into PCIe Ports.
+
+
+   pcie.0 bus
+   ---------------------------------------------
+         |               |               |
+   -------------   -------------   -------------
+   | Root Port |   | Root Port |   | Root Port |
+   -------------   -------------   -------------
+         |               |
+   ------------   -----------------
+   | PCIe Dev |   | Upstream Port |
+   ------------   -----------------
+                     |           |
+       -------------------   -------------------
+       | Downstream Port |   | Downstream Port |
+       -------------------   -------------------
+                |
+          ------------
+          | PCIe Dev |
+          ------------
+
+2.3 PCI only hierarchy
+==
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
+into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
+and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
+only into pcie.0 bus.
+
+   pcie.0 bus
+   --------------------------------------
+        |                  |
+   -----------   ------------------
+   | PCI Dev |   | DMI-PCI BRIDGE |
+   -----------   ------------------
+                   |            |
+           -----------   ------------------
+           | PCI Dev |   | PCI-PCI Bridge |
+           -----------   ------------------
+                            |           |
+                    -----------   -----------
+                    | PCI Dev |   | PCI Dev |
+                    -----------   -----------
+
+
+
+3. IO space issues
+===
+PCIe Ports are seen by Firmware/Guest OS as PCI bridges and
+as required by PCI spec will reserve a 4K IO range for each.
+The firmware used by QEMU (SeaBIOS/OVMF) will further optimize
+it by allocation the IO space only if there is at least a device
+with IO BARs plugged into the bridge.
+Behind a PCIe PORT only one device may be plugged, resulting in
+the allocation of a whole 4K range for each device.
+The IO space is limited resulting in ~10 PCIe ports per system
+if devices with IO BARs are plugged into IO ports.
+
+Using the proposed device