On 02/01/16 16:07, Marcel Apfelbaum wrote:
> On 01/26/2016 07:17 AM, Ni, Ruiyu wrote:
>> Laszlo,
>> I now understand your problem.
>> Can you tell me why OVMF needs multiple root bridges support?
>> My understanding of OVMF is that it's firmware which can be used in a
>> guest VM environment to boot an OS.
>> The multiple root bridges requirement currently comes mainly from
>> high-end servers.
>> Do you mean that the VM guest needs to be like a high-end server?
>> This may help me to think about a possible solution to your problem.
>
> Hi Ray,
>
> Laszlo's explanation is very good; this is not exactly about high-end
> VMs. We need the extra root bridges to match assigned devices to their
> corresponding NUMA node.
>
> Regarding the OVMF issue, the main problem is that the extra root
> bridges are created dynamically for the VMs (command line parameter)
> and their resources are computed on the fly.
>
> Not directly related to the above, the optimal way to allocate
> resources for PCI root bridges sharing the same PCI domain is to sort
> the devices' MEM/IO ranges from biggest to smallest and use this order
> during allocation.
>
> After the resource allocation is finished, we can build the CRS for
> each PCI root bridge and pass it back to the firmware/OS.
>
> While for "real" machines we can hard-code the root bridge resources
> in some ROM and have them extracted early in the boot process, for the
> VM world this would not be possible. Also, any effort to divide the
> resource range before the resource allocation would be odd and far
> from optimal.
>
> Regarding a possible solution, I first need to understand why the
> resource allocation is done per PCI root bridge and not per PCI
> domain. The CRS allows a PCI root bridge to have several MEM/IO
> ranges, so there is really no need to impose a per-root-bridge logic.
>
> I am new to the edk2 project so I might get things wrong, but I think
> we need a way to specify whether the PCI root bridges will supply
> their own resources or whether an external allocator will do the job.
> Laszlo's proposed solution looks like a way to implement such a
> policy; I am personally OK with it.
>
> I really think the generic PciHostBridgeDxe driver is the right way to
> go for OVMF as well; we just need a way to deal with this issue.
For the sake of discussion, I pushed my WIP patches to:

https://github.com/lersek/edk2/commits/pci_host_wip

It is a faithful conversion of OvmfPkg/PciHostBridgeDxe/ to
MdeModulePkg's PciHostBridgeDxe, plus an OVMF-specific PciHostBridgeLib
instance. The problem is of course the patch

  MdeModulePkg: PciHostBridgeDxe: allow PciHostBridgeLib to prime the GCD map

which we've been discussing in this thread.

Thanks
Laszlo

>> -----Original Message-----
>> From: Laszlo Ersek [mailto:ler...@redhat.com]
>> Sent: Tuesday, January 26, 2016 11:10 AM
>> To: Ni, Ruiyu <ruiyu...@intel.com>
>> Cc: edk2-de...@ml01.01.org; Tian, Feng <feng.t...@intel.com>; Fan,
>> Jeff <jeff....@intel.com>; Justen, Jordan L <jordan.l.jus...@intel.com>;
>> Marcel Apfelbaum <mar...@redhat.com>
>> Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
>> PciHostBridgeDxe driver.
>>
>> On 01/26/16 03:42, Ni, Ruiyu wrote:
>>> Laszlo,
>>> Thanks for the detailed explanation, and I agree with your idea that
>>> a generic driver should consider the case where part of the IO/MMIO
>>> resources may already have been added by the platform, whether in
>>> PEI or in DXE.
>>>
>>> So I think we may not need the flag you proposed here. To achieve
>>> maximum flexibility, we can use gDS.GetMemorySpaceMap/GetIoSpaceMap
>>> to retrieve the current MMIO/IO space maps and only add those ranges
>>> that are not yet added.
>>>
>>> And to be more robust, we can check that the range specified for the
>>> root bridge is not allocated by anyone (it CAN be added by someone).
>>>
>>> What's your opinion?
>>
>> Very good, but painful questions. :)
>>
>> Until now, OVMF has relied upon PlatformPei producing the HOBs in
>> question (where the base of the 32-bit PCI MMIO window varies in the
>> HOB, based on guest RAM size), *and* OVMF's PCI host bridge driver has
>> never even tried to match the root bridges' 32-bit MMIO apertures to
>> anything at all. All of those apertures are set to [2GB, 4GB).
>>
>> The idea being, the gDS->AllocateMemorySpace() calls can be satisfied
>> from whatever room is left in the range that the HOB identified.
>> Especially in the case of multiple root bridges, I wouldn't know *how*
>> to divide up the MMIO address space between them. (Dividing up the
>> bus range was hard enough.)
>>
>> Same for IO ports -- all root bridges share the same [0xC000, 0xFFFF]
>> port range.
>>
>> Therefore your idea above would immediately break OVMF, functionally :)
>>
>> (I suspect that OVMF is kinda broken already in this aspect, but it
>> happens to work.)
>>
>> I know precious little about PCI resource allocation. This is one of
>> the areas we intend to look into seriously in the future, with Marcel
>> (CC'd). Thus, if porting OVMF to the generic PciHostBridgeDxe driver
>> requires incompatible changes at once, then I think we should postpone
>> the port until Marcel returns from his vacation.
>>
>> To reiterate, my current problem is ultimately the lack of an
>> algorithm for dynamically dividing up the IO and MMIO space between
>> root bridges. Do you have an idea for that? (The algorithm, if there
>> is one, could very well be specific to QEMU -- which too is where
>> we'll need Marcel's input.)
>>
>> Thanks!
>> Laszlo
>>
>>>
>>>
>>> Regards,
>>> Ray
>>>
>>>
>>> -----Original Message-----
>>> From: Laszlo Ersek [mailto:ler...@redhat.com]
>>> Sent: Tuesday, January 26, 2016 10:20 AM
>>> To: Ni, Ruiyu <ruiyu...@intel.com>
>>> Cc: edk2-de...@ml01.01.org; Tian, Feng <feng.t...@intel.com>; Fan,
>>> Jeff <jeff....@intel.com>; Justen, Jordan L <jordan.l.jus...@intel.com>
>>> Subject: Re: [edk2] [Patch V4 4/4] MdeModulePkg: Add generic
>>> PciHostBridgeDxe driver.
>>>
>>> Hi Ray,
>>>
>>> I have a work-in-progress series to convert OvmfPkg to use this driver.
>>> While testing it, I ran into a failed assertion:
>>>
>>> On 01/13/16 09:00, Ruiyu Ni wrote:
>>>> This driver links to PciHostBridgeLib provided by platform/silicon to
>>>> produce PciRootBridgeIo and PciHostBridgeResourceAllocation protocol.
>>>>
>>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>>> Signed-off-by: Ruiyu Ni <ruiyu...@intel.com>
>>>> Cc: Jeff Fan <jeff....@intel.com>
>>>> Cc: Feng Tian <feng.t...@intel.com>
>>>> ---
>>>>  .../Bus/Pci/PciHostBridgeDxe/PciHostBridge.c      | 1134 ++++++++++++++
>>>>  .../Bus/Pci/PciHostBridgeDxe/PciHostBridge.h      |  252 ++++
>>>>  .../Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe.inf |   55 +
>>>>  .../Bus/Pci/PciHostBridgeDxe/PciHostResource.h    |   47 +
>>>>  .../Bus/Pci/PciHostBridgeDxe/PciRootBridge.h      |  568 +++++++
>>>>  .../Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c    | 1561 ++++++++++++++++++++
>>>>  MdeModulePkg/MdeModulePkg.dsc                     |    2 +
>>>>  7 files changed, 3619 insertions(+)
>>>>  create mode 100644 MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
>>>>  create mode 100644 MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.h
>>>>  create mode 100644 MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridgeDxe.inf
>>>>  create mode 100644 MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostResource.h
>>>>  create mode 100644 MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridge.h
>>>>  create mode 100644 MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c
>>>>
>>>> diff --git a/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c b/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
>>>> new file mode 100644
>>>> index 0000000..08285d8
>>>> --- /dev/null
>>>> +++ b/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciHostBridge.c
>>>> @@ -0,0 +1,1134 @@
>>>> +/** @file
>>>> +
>>>> +  Provides the basic interfaces to abstract a PCI Host Bridge Resource Allocation.
>>>> +
>>>> +Copyright (c) 1999 - 2016, Intel Corporation.
>>>> +All rights reserved.<BR>
>>>> +This program and the accompanying materials
>>>> +are licensed and made available under the terms and conditions of the BSD License
>>>> +which accompanies this distribution.  The full text of the license may be found at
>>>> +http://opensource.org/licenses/bsd-license.php
>>>> +
>>>> +THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
>>>> +WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>>>> +
>>>> +**/
>>>> +
>>>> +#include "PciHostBridge.h"
>>>> +#include "PciRootBridge.h"
>>>> +#include "PciHostResource.h"
>>>> +
>>>> +
>>>> +EFI_METRONOME_ARCH_PROTOCOL *mMetronome;
>>>> +EFI_CPU_IO2_PROTOCOL        *mCpuIo;
>>>> +
>>>> +GLOBAL_REMOVE_IF_UNREFERENCED CHAR16 *mAcpiAddressSpaceTypeStr[] = {
>>>> +  L"Mem", L"I/O", L"Bus"
>>>> +};
>>>> +GLOBAL_REMOVE_IF_UNREFERENCED CHAR16 *mPciResourceTypeStr[] = {
>>>> +  L"I/O", L"Mem", L"PMem", L"Mem64", L"PMem64", L"Bus"
>>>> +};
>>>> +
>>>> +/**
>>>> +
>>>> +  Entry point of this driver.
>>>> +
>>>> +  @param ImageHandle  Image handle of this driver.
>>>> +  @param SystemTable  Pointer to standard EFI system table.
>>>> +
>>>> +  @retval EFI_SUCCESS       Succeed.
>>>> +  @retval EFI_DEVICE_ERROR  Fail to install PCI_ROOT_BRIDGE_IO protocol.
>>>> +
>>>> +**/
>>>> +EFI_STATUS
>>>> +EFIAPI
>>>> +InitializePciHostBridge (
>>>> +  IN EFI_HANDLE        ImageHandle,
>>>> +  IN EFI_SYSTEM_TABLE  *SystemTable
>>>> +  )
>>>> +{
>>>> +  EFI_STATUS                Status;
>>>> +  PCI_HOST_BRIDGE_INSTANCE  *HostBridge;
>>>> +  PCI_ROOT_BRIDGE_INSTANCE  *RootBridge;
>>>> +  PCI_ROOT_BRIDGE           *RootBridges;
>>>> +  UINTN                     RootBridgeCount;
>>>> +  UINTN                     Index;
>>>> +  PCI_ROOT_BRIDGE_APERTURE  *MemApertures[4];
>>>> +  UINTN                     MemApertureIndex;
>>>> +
>>>> +  RootBridges = PciHostBridgeGetRootBridges (&RootBridgeCount);
>>>> +  if ((RootBridges == NULL) || (RootBridgeCount == 0)) {
>>>> +    return EFI_UNSUPPORTED;
>>>> +  }
>>>> +
>>>> +  Status = gBS->LocateProtocol (&gEfiMetronomeArchProtocolGuid, NULL, (VOID **) &mMetronome);
>>>> +  ASSERT_EFI_ERROR (Status);
>>>> +  Status = gBS->LocateProtocol (&gEfiCpuIo2ProtocolGuid, NULL, (VOID **) &mCpuIo);
>>>> +  ASSERT_EFI_ERROR (Status);
>>>> +
>>>> +  //
>>>> +  // Most systems in the world including complex servers have only one Host Bridge.
>>>> +  //
>>>> +  HostBridge = AllocateZeroPool (sizeof (PCI_HOST_BRIDGE_INSTANCE));
>>>> +  ASSERT (HostBridge != NULL);
>>>> +
>>>> +  HostBridge->Signature = PCI_HOST_BRIDGE_SIGNATURE;
>>>> +  HostBridge->CanRestarted = TRUE;
>>>> +  InitializeListHead (&HostBridge->RootBridges);
>>>> +
>>>> +  HostBridge->ResAlloc.NotifyPhase          = NotifyPhase;
>>>> +  HostBridge->ResAlloc.GetNextRootBridge    = GetNextRootBridge;
>>>> +  HostBridge->ResAlloc.GetAllocAttributes   = GetAttributes;
>>>> +  HostBridge->ResAlloc.StartBusEnumeration  = StartBusEnumeration;
>>>> +  HostBridge->ResAlloc.SetBusNumbers        = SetBusNumbers;
>>>> +  HostBridge->ResAlloc.SubmitResources      = SubmitResources;
>>>> +  HostBridge->ResAlloc.GetProposedResources = GetProposedResources;
>>>> +  HostBridge->ResAlloc.PreprocessController = PreprocessController;
>>>> +
>>>> +  Status = gBS->InstallMultipleProtocolInterfaces (
>>>> +                  &HostBridge->Handle,
>>>> +                  &gEfiPciHostBridgeResourceAllocationProtocolGuid, &HostBridge->ResAlloc,
>>>> +                  NULL
>>>> +                  );
>>>> +  if (EFI_ERROR (Status)) {
>>>> +    FreePool (HostBridge);
>>>> +    PciHostBridgeFreeRootBridges (RootBridges, RootBridgeCount);
>>>> +    return Status;
>>>> +  }
>>>> +
>>>> +  //
>>>> +  // Create Root Bridge Device Handle in this Host Bridge
>>>> +  //
>>>> +  for (Index = 0; Index < RootBridgeCount; Index++) {
>>>> +    //
>>>> +    // Create Root Bridge Handle Instance
>>>> +    //
>>>> +    RootBridge = CreateRootBridge (&RootBridges[Index], HostBridge->Handle);
>>>> +    ASSERT (RootBridge != NULL);
>>>> +    if (RootBridge == NULL) {
>>>> +      continue;
>>>> +    }
>>>> +
>>>> +    if (RootBridges[Index].Io.Limit > RootBridges[Index].Io.Base) {
>>>> +      Status = gDS->AddIoSpace (
>>>> +                      EfiGcdIoTypeIo,
>>>> +                      RootBridges[Index].Io.Base,
>>>> +                      RootBridges[Index].Io.Limit - RootBridges[Index].Io.Base + 1
>>>> +                      );
>>>> +      ASSERT_EFI_ERROR (Status);
>>>> +    }
>>>
>>> This assertion fails here with "Access Denied".
>>>
>>> The reason is that we still add the IO and MMIO space in PEI, as
>>> resource descriptor HOBs, and then the DXE core primes the GCD
>>> memory space map from them. Adding the same IO space here with the
>>> appropriate DXE service justifiably runs into a conflict, and is
>>> rejected.
>>>
>>> My question is whether we could make the GCD memory space map
>>> manipulation optional in this driver:
>>>
>>> The PCI_ROOT_BRIDGE structure could be extended with a UINT32 flag
>>> field. For each of Io, Mem, MemAbove4G, PMem and PMemAbove4G, the
>>> PciHostBridgeLib instance producing the PCI_ROOT_BRIDGE structure
>>> could set a separate bit, indicating whether PciHostBridgeDxe should
>>> add (and set the attributes of) the given aperture in the GCD memory
>>> space map.
>>>
>>> Since this is a generic driver, I think it should accommodate the
>>> case when the relevant apertures are described during PEI. (Not just
>>> because that's what OVMF does, but because said practice is valid,
>>> as far as I understand / recall the PI specs.)
>>>
>>> In my (limited) experience it varies between PCI host bridge drivers
>>> whether they install the apertures themselves, or rely on them
>>> already being present.
>>>
>>> In ArmVirtPkg/PciHostBridgeDxe/, the driver installs the (emulated)
>>> IO and (real) MMIO space itself, while in OvmfPkg (and the original
>>> PcAtChipsetPkg driver) this does not happen.
>>>
>>> If you agree with the idea, I can try writing a patch for it.
>>>
>>> Alternatively, you could require of the PciHostBridgeGetRootBridges()
>>> API that on return, all the necessary GCD memory space map entries be
>>> in place. At the moment, both approaches would work for me.
>>>
>>> ... Hm, I was too curious, and wrote the patch just now. OVMF (ported
>>> to PciHostBridgeDxe) works well with it; I retested the multiple root
>>> bridges case too, and compared the OVMF debug log & the Linux guest
>>> dmesg before<->after. The resource assignments are the same.
>>> I also regression tested a Windows 8.1 guest, with 6GB RAM.
>>>
>>> So, what do you think of the attached patch? If you are okay with
>>> it, I can submit it as part of my upcoming OVMF series.
>>>
>>> Thanks!
>>> Laszlo
>>>
>>> [...]
>>>
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
>>
>
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel