On Mon, Jan 16, 2017 at 11:35:04PM +0100, Laszlo Ersek wrote: > On 01/16/17 22:23, Ard Biesheuvel wrote: > > On 16 January 2017 at 21:13, Laszlo Ersek <ler...@redhat.com> wrote: > >> On 01/16/17 20:31, Ard Biesheuvel wrote: > >>> On 16 January 2017 at 18:20, Peter Maydell <peter.mayd...@linaro.org> > >>> wrote: > >>>> On 16 January 2017 at 17:30, Ard Biesheuvel <ard.biesheu...@linaro.org> > >>>> wrote: > >>>>> On 16 January 2017 at 17:25, Peter Maydell <peter.mayd...@linaro.org> > >>>>> wrote: > >>>>>> On 13 January 2017 at 17:32, Ard Biesheuvel > >>>>>> <ard.biesheu...@linaro.org> wrote: > >>>>>>> Linux for arm64 v4.10 and later will complain if the ECAM config > >>>>>>> space is > >>>>>>> not reserved in the ACPI namespace: > >>>>>>> > >>>>>>> acpi PNP0A08:00: [Firmware Bug]: ECAM area [mem > >>>>>>> 0x3f000000-0x3fffffff] not reserved in ACPI namespace > >>>>>>> > >>>>>>> The rationale is that OSes that don't consume the MCFG table should > >>>>>>> still > >>>>>>> be able to infer that the PCI config space MMIO region is occupied. > >>>>>>> > >>>>>>> So update the ACPI table generation routine to add this reservation. > >>>>>>> > >>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > >>>>>>> --- > >>>>>>> hw/arm/virt-acpi-build.c | 7 +++++++ > >>>>>>> 1 file changed, 7 insertions(+) > >>>>>>> > >>>>>>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c > >>>>>>> index 085a61117378..50d52f685f68 100644 > >>>>>>> --- a/hw/arm/virt-acpi-build.c > >>>>>>> +++ b/hw/arm/virt-acpi-build.c > >>>>>>> @@ -310,6 +310,13 @@ static void acpi_dsdt_add_pci(Aml *scope, const > >>>>>>> MemMapEntry *memmap, > >>>>>>> Aml *dev_rp0 = aml_device("%s", "RP0"); > >>>>>>> aml_append(dev_rp0, aml_name_decl("_ADR", aml_int(0))); > >>>>>>> aml_append(dev, dev_rp0); > >>>>>>> + > >>>>>>> + Aml *dev_res0 = aml_device("%s", "RES0"); > >>>>>>> + aml_append(dev_res0, aml_name_decl("_HID", > >>>>>>> aml_string("PNP0C02"))); > >>>>>>> + crs = aml_resource_template(); > >>>>>>> + aml_append(crs, aml_memory32_fixed(base_ecam, size_ecam, > >>>>>>> AML_READ_WRITE)); > >>>>>>> + aml_append(dev_res0, aml_name_decl("_CRS", crs)); > >>>>>>> + aml_append(dev, dev_res0); > >>>>>>> aml_append(scope, dev); > >>>>>>> } > >>>>>> > >>>>>> This needs to be controlled via the machine class back-compat > >>>>>> machinery in hw/arm/virt.c so that it only happens for virt-2.9 > >>>>>> and later. > >>>>>> > >>>>> > >>>>> Why exactly? > >>>> > >>>> Because the "virt-2.8" machine has to present to the guest > >>>> exactly what "virt" did as of the QEMU 2.8 release, including > >>>> any bugs or missing things we happened to have in our ACPI > >>>> tables. This allows cross-version compatibility (including > >>>> VM migration). Drew will have a more detailed explanation > >>>> if you need it. > >>>> > >>> > >>> I suspected as much. > >>> > >>> But in this case, I am not sure if it is worth the trouble: the > >>> generated data is only consumed at boot time by the firmware, and I > >>> suppose migration involves freezing a VM, including whatever resident > >>> firmware image was used to boot the OS, and so this is unlikely to > >>> affect migration. > >>> > >>> But I will let Drew explain ... > >> > >> The PCI Firmware Specification (rev 3.1) says in 4.1.2. "MCFG Table > >> Description": "The resources can *optionally* be returned in [...] > >> EFIGetMemoryMap as reserved memory [...]". (Emphasis mine.) Linux seems > >> to *insist* on this kind of reservation however. > >> > > > > No, not at the UEFI level but at the ACPI level. Reservations in the > > UEFI memory map describe memory not MMIO space > > > >> PNP0C02 is "General ID for reserving resources required by PnP > >> motherboard registers. (Not device specific.)", according to > >> <http://www.plasma-online.de/english/identify/serial/pnp_id_pnp.html>. > >> So what this patch does is reserve a memory area through ACPI, > >> practically as an unspecified "platform resource". > >> > > > > This has been discussed at great length on the linux mailing lists > > > > https://patchwork.kernel.org/patch/9453149/ > > > >> There's an alternative that's contained entirely in the firmware. You > >> can cover the MMCONFIG area in ArmVirtQemu with an EfiReservedMemoryType > >> memory map entry (by producing an appropriate memalloc HOB in PEI, or by > >> calling the appropriate gDS memory space map functions in DXE). OVMF > >> does the former (memalloc HOB). > >> > >> In ArmVirtQemu, we grab the MMCONFIG range from "pci-host-ecam-generic", > >> from QEMU's DTB. If you don't dislike the idea, we could cover the range > >> as well, right in "ArmVirtPkg/Library/FdtPciPcdProducerLib". That lib > >> instance already sets the base address PCD, and makes sure that the > >> relevant code is executed only once (in whatever driver module the > >> library instance was built into). You could call the gDS functions > >> mentioned above from that spot. (The library instance is already > >> restricted to DXE_DRIVER and UEFI_DRIVER modules.) > >> > > > > In general, I think describing MMIO in the UEFI memory map is not very > > useful, and counter to the spec, which mentions that the memory map > > describes memory ("however it is used"), not memory *space* (unless > > UEFI itself needs to access it to implement runtime services) > > > > The UEFI memory map will reflect allocations from the GCD memory space, > for the Reserved and MMIO types. See "Figure 2. GCD Memory State > Transitions" in "7.2.2 GCD Memory Resources", Vol2 of the PI spec. > > See also "9.7.1 UEFI Boot Services Dependencies" in the same, > > 9.7.1.8 GetMemoryMap() > > The GetMemoryMap() implementation must include into the UEFI memory > map all GCD map entries of types EfiGcdMemoryTypeReserved and > EfiPersistentMemory, and all GCD map entries of type > EfiGcdMemoryTypeMemoryMappedIo that have EFI_MEMORY_RUNTIME attribute > set. > > (Note that I wrote Reserved earlier, not MMIO.) > > However, you are right that *just* the UEFI memmap entry is not > sufficient, according to the PCI firmware spec. (Regardless of the fact > that in practice, just the memmap entry does keep Linux happy. Or is it > about to change?) > > Namely, looking again at the spot I quoted above (and it's also quoted > in the kernel docs patch that you linked above, under ref [6]), we find > > If the operating system does not natively comprehend reserving the > MMCFG region, the MMCFG region must be reserved by firmware. The > address range reported in the MCFG table or by _CBA method (see > Section 4.1.3) must be reserved by declaring a motherboard resource. > For most systems, the motherboard resource would appear at the root > of the ACPI namespace (under \_SB) in a node with a _HID of EISAID > (PNP0C02), and the resources in this case should not be claimed in > the root PCI bus’s _CRS. The resources can optionally be returned in > Int15 E820 or EFIGetMemoryMap as reserved memory but must always be > reported through ACPI as a motherboard resource. > > Therefore I agree that reserving the MMCONFIG area via a PNP0C02 object > in QEMU's ACPI payload improves spec conformance. > > (Actually, the argument can be made for x86/Q35 as well. Adding Marcel > and MST.)
I agree, thanks for pointing this out. Patch, anyone? > ... Beyond the machine-type dependency raised by Peter (which I gather > is still being discussed), I suggest that the commit message of this > patch quote the relevant passage from the PCI fw spec in full (see > above, or in the kernel docs patch). > > Thanks! > Laszlo