On 02/22/2016 11:44 AM, Laszlo Ersek wrote:
On 02/22/16 18:21, Brian J. Johnson wrote:

Here's another example of a bare metal machine with multiple PCI roots,
although they do not share resources (SGI UV1000, edited for brevity):

[snip an incredible amount of devices]

Does this supercomputer fit in a van? :)


It's just a little 6-socket system we use as a build server. You should see the list on a 256-socket, 4-rack, 64TB machine. :)

Most of those "devices" are built into the CPU sockets. Intel likes to expose their configuration registers, including useful things like power and temperature monitors, via PCI config space. Xeons produce rather long lspci output.

...
I think (with my admittedly limited PCI "expertise", quotes justified)
that the above driver and library class are a really good abstraction.
(I can praise it; it's not my design. :))

For OVMF we need a few tweaks in the driver code / assumptions (about
non-overlapping MMIO and ioport apertures), but otherwise it looks very
promising to us.

Thanks
Laszlo

Thanks for the summary, I appreciate it.


Thanks,
Brian



  >
  > Regards,
  > Ray
  >
  >
  >> -----Original Message-----
  >> From: Marcel Apfelbaum [mailto:marcel.apfelb...@gmail.com]
  >> Sent: Monday, February 8, 2016 6:56 PM
  >> To: Ni, Ruiyu <ruiyu...@intel.com>; Laszlo Ersek <ler...@redhat.com>
  >> Cc: Justen, Jordan L <jordan.l.jus...@intel.com>; edk2-de...@ml01.01.org;
  >> Tian, Feng <feng.t...@intel.com>; Fan, Jeff <jeff....@intel.com>
  >> Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
  >> PciHostBridgeDxe driver.
  >>
  >> Hi,
  >>
  >> I am sorry for the noise, I am re-sending this mail from an e-mail
  >> address subscribed to the list.
  >>
  >> Thanks,
  >> Marcel
  >>
  >> On 02/08/2016 12:41 PM, Marcel Apfelbaum wrote:
  >>> On 02/06/2016 09:09 AM, Ni, Ruiyu wrote:
  >>>> Marcel,
  >>>> Please see my reply embedded below.
  >>>>
  >>>> On 2016-02-02 19:07, Laszlo Ersek wrote:
  >>>>> On 02/01/16 16:07, Marcel Apfelbaum wrote:
  >>>>>> On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
  >>>>>>> Laszlo,
  >>>>>>> I now understand your problem.
  >>>>>>> Can you tell me why OVMF needs multiple root bridges support?
  >>>>>>> My understanding of OVMF is that it's a firmware which can be used
  >>>>>>> in a guest VM environment to boot an OS.
  >>>>>>> The multiple root bridges requirement currently mainly comes from
  >>>>>>> high-end servers.
  >>>>>>> Do you mean that the VM guest needs to be like a high-end server?
  >>>>>>> This may help me to think about the possible solution to your problem.
  >>>>>> Hi Ray,
  >>>>>>
  >>>>>> Laszlo's explanation is very good, this is not exactly about
  >>>>>> high-end VMs, we need the extra root bridges to match assigned
  >>>>>> devices to their corresponding NUMA node.
  >>>>>>
  >>>>>> Regarding the OVMF issue, the main problem is that the extra root
  >>>>>> bridges are created dynamically for the VMs (command line parameter)
  >>>>>> and their resources are computed on the fly.
  >>>>>>
  >>>>>> Not directly related to the above, the optimal way to allocate
  >>>>>> resources for PCI root bridges sharing the same PCI domain is to
  >>>>>> sort the devices' MEM/IO ranges from biggest to smallest and use
  >>>>>> this order during allocation.
  >>>>>>
  >>>>>> After the resource allocation is finished we can build the CRS for
  >>>>>> each PCI root bridge and pass it back to firmware/OS.
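(For illustration only: a minimal sketch of the biggest-to-smallest allocation described above. The type and function names are made up for this example, not part of the edk2 code under discussion, and it assumes power-of-two BAR sizes, as PCI requires for memory BARs.)

#include <stdint.h>
#include <stdlib.h>

typedef struct {
  uint64_t Size;       /* BAR size; memory BARs are powers of two and
                          naturally aligned to their own size          */
  uint64_t Base;       /* assigned address, filled in by the allocator */
  int      RootBridge; /* owning root bridge, for building _CRS later  */
} BarReq;

static int
BySizeDescending (const void *A, const void *B)
{
  const BarReq *L = A;
  const BarReq *R = B;
  return (L->Size < R->Size) - (L->Size > R->Size);
}

/* Assign addresses from the shared aperture [Base, Limit), largest BAR
   first.  Returns 0 on success, -1 if the aperture is exhausted.      */
static int
AllocateShared (BarReq *Reqs, size_t Count, uint64_t Base, uint64_t Limit)
{
  qsort (Reqs, Count, sizeof *Reqs, BySizeDescending);
  for (size_t I = 0; I < Count; I++) {
    uint64_t Aligned = (Base + Reqs[I].Size - 1) & ~(Reqs[I].Size - 1);
    if (Aligned + Reqs[I].Size > Limit) {
      return -1;
    }
    Reqs[I].Base = Aligned;
    Base         = Aligned + Reqs[I].Size;
  }
  return 0;
}

Because each BAR is naturally aligned to its own size, placing the largest ones first means every later, smaller BAR is already aligned at the current cursor (assuming the window base is aligned at least to the largest BAR), so no padding gaps are introduced.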
  >>>>>>
  >>>>>> While for "real" machines we can hard-code the root bridge resources
  >>>>>> in some ROM and have it extracted early in the boot process, for the
  >>>>>> VM world this would not be possible. Also, any effort to divide the
  >>>>>> resource range before the resource allocation would be odd and far
  >>>>>> from optimal.
  >>
  >> Hi Ray,
  >> Thank you for your response,
  >>
  >>>> A real machine uses hard-coded resources for root bridges. But when the
  >>>> resources cannot meet certain root bridges' requirements, firmware can
  >>>> save the real resource requirement per root bridge to NV storage and
  >>>> divide the resources among the root bridges on the next boot according
  >>>> to the NV settings.
  >>>> The MMIO/IO routing in the real machine I mentioned above needs to be
  >>>> fixed in a very early phase, before the PciHostBridgeDxe driver runs.
  >>>> That's to say, if [2G, 2.8G) is configured to route to root bridge #1,
  >>>> only [2G, 2.8G) is allowed to be assigned to root bridge #1. And the
  >>>> routing cannot be changed unless a platform reset is performed.
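(Again just an illustrative sketch of the divide-on-next-boot scheme described above, with made-up struct and function names; it is not the actual platform code.)

#include <stdint.h>

typedef struct {
  uint64_t SavedMmioNeed; /* requirement recorded to NV storage last boot */
  uint64_t MmioBase;      /* fixed routing programmed for this boot       */
  uint64_t MmioLimit;     /* exclusive                                    */
} RootBridgeWindow;

/* Carve the window [Base, Limit) into consecutive fixed per-bridge
   ranges, in the order the saved NV settings list the bridges.
   Returns 0 on success, -1 if the saved needs do not fit.           */
static int
DivideWindow (RootBridgeWindow *Rb, unsigned Count,
              uint64_t Base, uint64_t Limit)
{
  for (unsigned I = 0; I < Count; I++) {
    if (Rb[I].SavedMmioNeed > Limit - Base) {
      return -1;
    }
    Rb[I].MmioBase  = Base;
    Rb[I].MmioLimit = Base + Rb[I].SavedMmioNeed;
    Base            = Rb[I].MmioLimit;
    /* The chipset MMIO routing would be programmed here; as noted
       above, it then stays fixed until the next platform reset.     */
  }
  return 0;
}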
  >>
  >> I understand.
  >>
  >>>>
  >>>> Based on your description, it sounds like all the root bridges in OVMF
  >>>> share the same range of resources and any MMIO/IO in the range can be
  >>>> routed to any root bridge. For example, every root bridge can use
  >>>> [2G, 3G) MMIO.
  >>>
  >>> Exactly. This is true for "snooping" host bridges which do not have
  >>> their own configuration registers (or MMConfig region). They sniff
  >>> host bridge 0 for configuration cycles and, if they are meant for a
  >>> device on a bus number owned by them, they forward the transaction to
  >>> their primary root bus.
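(Illustrative only, with hypothetical field names: the forwarding decision boils down to a bus-number range check against the buses the snooping bridge owns.)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the bus numbers one snooping host bridge owns. */
typedef struct {
  uint8_t BusBase;
  uint8_t BusLimit; /* inclusive */
} SnoopingBridge;

/* Claim a configuration cycle seen on host bridge 0 only if the target
   bus number is one this bridge owns; otherwise leave it alone.        */
static bool
ClaimsConfigCycle (const SnoopingBridge *Hb, uint8_t Bus)
{
  return Bus >= Hb->BusBase && Bus <= Hb->BusLimit;
}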
  >>>
  >>>> Until in allocation phase, root bridge #1 is assigned to [2G, 2.8G),
  >>>> #2 is assigned to [2.8G, 2.9G), #3 is assigned to [2.9G, 3G).
  >>
  >> Correct, but the regions do not have to be disjoint in the above scenario.
  >> Root bridge #1 can have [2G, 2.4G) and [2.8G, 3G) while root bridge #2
  >> can have [2.4G, 2.8G).
  >>
  >> This is so the firmware can distribute the resources in an optimal way.
  >> An example can be:
  >>     - root bridge #1 has a PCI device A with a huge BAR and a PCI
  >>       device B with a little BAR.
  >>     - root bridge #2 has a PCI device C with a medium BAR.
  >> The best way to distribute resources over [2G, 3G) is A's BAR, C's BAR,
  >> and only then B's BAR.
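(To put some made-up numbers on that example: say A is 512 MiB, C is 256 MiB and B is 4 KiB, each naturally aligned to its size, in a shared [2G, 3G) window. Allocating largest-first packs them back to back: A at 2G, C at 2.5G, B at 2.75G, leaving roughly 256 MiB of the window free for anything else. Allocating B first instead forces C up to the next 256 MiB boundary and A up to the next 512 MiB boundary, so nearly 256 MiB is lost to alignment padding even though the same three devices are placed.)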
  >>
  >>>> So it seems that we need a way to tell the PciHostBridgeDxe driver,
  >>>> from the PciHostBridgeLib, that all resources are sharable among all
  >>>> root bridges.
  >>
  >> This is exactly what we need, indeed.
  >>
  >>>>
  >>>> The real platform case is allocation per root bridge and the OVMF case
  >>>> is allocation per PCI domain.
  >>
  >> Indeed, bare metal servers use a different PCI domain per host bridge,
  >> but I've actually seen real servers that have multiple root bridges
  >> sharing the same PCI domain, 0.
  >>
  >>
  >>>> Is my understanding correct?
  >>
  >> It is, and thank you for taking the time to understand the issue,
  >> Marcel
  >>
  >>>>
  >>> [...]





--

                                                Brian

--------------------------------------------------------------------

   Q.  How many software engineers does it take to change a light bulb?
   A.  That's a hardware problem.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
