Hi Marcel, this email should actually be an RFC patch. But RFC patches tend to turn into real PATCHes (if the submitter is lucky, that is), and I can't really promise sending multiple versions of a PATCH at this time. So please consider this a "maybe bug report".
In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI hole", 2017-11-16) we added logic so that QEMU expand the 64-bit PCI hole (for hotplug purposes), if (a) the firmware doesn't "configure" one (via programming individual BARs with 64-bit addresses), or (b) the firmware's programming results in an aperture smaller than we'd like (32GB on Q35). We made sure that the aperture required by the firmware's programming would never be shrunken or otherwise truncated by QEMU, so that's fine. However, the expansion doesn't work as "widely" in all cases as it should. Consider the following three functions, at current master (= commit 19b599f7664b): [hw/i386/pc.c] > /* > * The 64bit pci hole starts after "above 4G RAM" and > * potentially the space reserved for memory hotplug. > */ > uint64_t pc_pci_hole64_start(void) > { > PCMachineState *pcms = PC_MACHINE(qdev_get_machine()); > PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > MachineState *ms = MACHINE(pcms); > uint64_t hole64_start = 0; > > if (pcmc->has_reserved_memory && ms->device_memory->base) { > hole64_start = ms->device_memory->base; > if (!pcmc->broken_reserved_end) { > hole64_start += memory_region_size(&ms->device_memory->mr); > } > } else { > hole64_start = 0x100000000ULL + pcms->above_4g_mem_size; > } > > return ROUND_UP(hole64_start, 1 * GiB); > } [hw/pci-host/q35.c] > /* > * The 64bit PCI hole start is set by the Guest firmware > * as the address of the first 64bit PCI MEM resource. > * If no PCI device has resources on the 64bit area, > * the 64bit PCI hole will start after "over 4G RAM" and the > * reserved space for memory hotplug if any. > */ > static void q35_host_get_pci_hole64_start(Object *obj, Visitor *v, > const char *name, void *opaque, > Error **errp) > { > PCIHostState *h = PCI_HOST_BRIDGE(obj); > Q35PCIHost *s = Q35_HOST_DEVICE(obj); > Range w64; > uint64_t value; > > pci_bus_get_w64_range(h->bus, &w64); > value = range_is_empty(&w64) ? 0 : range_lob(&w64); > if (!value && s->pci_hole64_fix) { > value = pc_pci_hole64_start(); > } > visit_type_uint64(v, name, &value, errp); > } > > /* > * The 64bit PCI hole end is set by the Guest firmware > * as the address of the last 64bit PCI MEM resource. > * Then it is expanded to the PCI_HOST_PROP_PCI_HOLE64_SIZE > * that can be configured by the user. > */ > static void q35_host_get_pci_hole64_end(Object *obj, Visitor *v, > const char *name, void *opaque, > Error **errp) > { > PCIHostState *h = PCI_HOST_BRIDGE(obj); > Q35PCIHost *s = Q35_HOST_DEVICE(obj); > uint64_t hole64_start = pc_pci_hole64_start(); > Range w64; > uint64_t value, hole64_end; > > pci_bus_get_w64_range(h->bus, &w64); > value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1; > hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL << 30); > if (s->pci_hole64_fix && value < hole64_end) { > value = hole64_end; > } > visit_type_uint64(v, name, &value, errp); > } > Now consider the following scenario: - the firmware programs some BARs with 64-bit addresses such that the aperture that we deduce starts at 32GB, - the guest has 4GB of RAM, and no DIMM hotplug range. Consequences: - Because the "32-bit RAM split" for Q35 is at 2GB, the pc_pci_hole64_start() function will return 6GB. - The q35_host_get_pci_hole64_start() function will return 32GB. (It will not fall back to pc_pci_hole64_start() -- correctly -- because the firmware has programmed some BARs with 64-bit addresses.) - The q35_host_get_pci_hole64_end() function *intends* to return 64GB, because -- let's say -- the guest assigned BARs covering the 32GB..34GB range, which is 2GB in size, and we *intend* to round that size up to 32GB, so that 30GB be left for hotplug purposes. (This is the original intent of commit 9fa99d2519cb.) - However, because we initialize "hole64_start" from pc_pci_hole64_start(), and not from q35_host_get_pci_hole64_start(), we add "mch.pci_hole64_size" (32GB by default) to 6GB (the end of RAM), and not to 32GB (the aperture base deduced from the firmware's programming). As a result, we'll extend the aperture end address only to 38GB, and not to 64GB. My suggestion is simply to initialize "hole64_start" from q35_host_get_pci_hole64_start(), in the q35_host_get_pci_hole64_end() function. If the firmware doesn't program 64-bit addresses, then this change is a no-op -- q35_host_get_pci_hole64_start() will fall back to pc_pci_hole64_start() in that case. Otherwise, the behavior will be fixed. Now, there's another complication, obviously -- machine type compat. In commit 9fa99d2519cb, we added the "pci_hole64_fix" compat property. I assume the additional fix I'm proposing requires another compat property? Something like: > diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c > index 02f95765880a..c02b128189cd 100644 > --- a/hw/pci-host/q35.c > +++ b/hw/pci-host/q35.c > @@ -138,12 +138,14 @@ static void q35_host_get_pci_hole64_end(Object *obj, > Visitor *v, > { > PCIHostState *h = PCI_HOST_BRIDGE(obj); > Q35PCIHost *s = Q35_HOST_DEVICE(obj); > - uint64_t hole64_start = pc_pci_hole64_start(); > + uint64_t hole64_start; > Range w64; > uint64_t value, hole64_end; > > pci_bus_get_w64_range(h->bus, &w64); > value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1; > + hole64_start = s->pci_hole64_fix2 ? q35_host_get_pci_hole64_start() : > + pc_pci_hole64_start(); > hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL << 30); > if (s->pci_hole64_fix && value < hole64_end) { > value = hole64_end; The same would apply to i440fx too. What do you think? Thanks Laszlo