On Fri, Sep 21, 2018 at 06:01:30PM +0300, Marcel Apfelbaum wrote: > > > On 09/20/2018 05:49 PM, Laszlo Ersek wrote: > > Hi Marcel, > > Hi Laszlo, > I had to read this mail a few times... > > > > > this email should actually be an RFC patch. But RFC patches tend to turn > > into real PATCHes (if the submitter is lucky, that is), and I can't > > really promise sending multiple versions of a PATCH at this time. So > > please consider this a "maybe bug report". > > > > In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI > > hole", 2017-11-16) we added logic so that QEMU expand the 64-bit PCI > > hole (for hotplug purposes), if (a) the firmware doesn't "configure" one > > (via programming individual BARs with 64-bit addresses), or (b) the > > firmware's programming results in an aperture smaller than we'd like > > (32GB on Q35). > > Right > > > > > We made sure that the aperture required by the firmware's programming > > would never be shrunken or otherwise truncated by QEMU, so that's fine. > > However, the expansion doesn't work as "widely" in all cases as it > > should. > > > > Consider the following three functions, at current master (= commit > > 19b599f7664b): > > > > [hw/i386/pc.c] > > > > > /* > > > * The 64bit pci hole starts after "above 4G RAM" and > > > * potentially the space reserved for memory hotplug. > > > */ > > > uint64_t pc_pci_hole64_start(void) > > > { > > > PCMachineState *pcms = PC_MACHINE(qdev_get_machine()); > > > PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > > > MachineState *ms = MACHINE(pcms); > > > uint64_t hole64_start = 0; > > > > > > if (pcmc->has_reserved_memory && ms->device_memory->base) { > > > hole64_start = ms->device_memory->base; > > > if (!pcmc->broken_reserved_end) { > > > hole64_start += memory_region_size(&ms->device_memory->mr); > > > } > > > } else { > > > hole64_start = 0x100000000ULL + pcms->above_4g_mem_size; > > > } > > > > > > return ROUND_UP(hole64_start, 1 * GiB); > > > } > > [hw/pci-host/q35.c] > > > > > /* > > > * The 64bit PCI hole start is set by the Guest firmware > > > * as the address of the first 64bit PCI MEM resource. > > > * If no PCI device has resources on the 64bit area, > > > * the 64bit PCI hole will start after "over 4G RAM" and the > > > * reserved space for memory hotplug if any. > > > */ > > > static void q35_host_get_pci_hole64_start(Object *obj, Visitor *v, > > > const char *name, void *opaque, > > > Error **errp) > > > { > > > PCIHostState *h = PCI_HOST_BRIDGE(obj); > > > Q35PCIHost *s = Q35_HOST_DEVICE(obj); > > > Range w64; > > > uint64_t value; > > > > > > pci_bus_get_w64_range(h->bus, &w64); > > > value = range_is_empty(&w64) ? 0 : range_lob(&w64); > > > if (!value && s->pci_hole64_fix) { > > > value = pc_pci_hole64_start(); > > > } > > > visit_type_uint64(v, name, &value, errp); > > > } > > > > > > /* > > > * The 64bit PCI hole end is set by the Guest firmware > > > * as the address of the last 64bit PCI MEM resource. > > > * Then it is expanded to the PCI_HOST_PROP_PCI_HOLE64_SIZE > > > * that can be configured by the user. > > > */ > > > static void q35_host_get_pci_hole64_end(Object *obj, Visitor *v, > > > const char *name, void *opaque, > > > Error **errp) > > > { > > > PCIHostState *h = PCI_HOST_BRIDGE(obj); > > > Q35PCIHost *s = Q35_HOST_DEVICE(obj); > > > uint64_t hole64_start = pc_pci_hole64_start(); > > > Range w64; > > > uint64_t value, hole64_end; > > > > > > pci_bus_get_w64_range(h->bus, &w64); > > > value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1; > > > hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL << > > > 30); > > > if (s->pci_hole64_fix && value < hole64_end) { > > > value = hole64_end; > > > } > > > visit_type_uint64(v, name, &value, errp); > > > } > > > > > Now consider the following scenario: > > > > - the firmware programs some BARs with 64-bit addresses such that the > > aperture that we deduce starts at 32GB, > > > > - the guest has 4GB of RAM, and no DIMM hotplug range. > > > > Consequences: > > > > - Because the "32-bit RAM split" for Q35 is at 2GB, the > > pc_pci_hole64_start() function will return 6GB. > > > > - The q35_host_get_pci_hole64_start() function will return 32GB. (It > > will not fall back to pc_pci_hole64_start() -- correctly -- because > > the firmware has programmed some BARs with 64-bit addresses.) > > > > - The q35_host_get_pci_hole64_end() function *intends* to return 64GB, > > because -- let's say -- the guest assigned BARs covering the > > 32GB..34GB range, which is 2GB in size, and we *intend* to round that > > size up to 32GB, so that 30GB be left for hotplug purposes. (This is > > the original intent of commit 9fa99d2519cb.) > > - However, because we initialize "hole64_start" from > > pc_pci_hole64_start(), and not from q35_host_get_pci_hole64_start(), > > we add "mch.pci_hole64_size" (32GB by default) to 6GB (the end of > > RAM), and not to 32GB (the aperture base deduced from the firmware's > > programming). As a result, we'll extend the aperture end address only > > to 38GB, and not to 64GB. > > Right, there is no reason to use pc_pci_hole64_start, it looks > like a plain bug. We diverged from pc and the fact that > q35_host_get_pci_hole64_star uses it is only an implementation > detail. > > > My suggestion is simply to initialize "hole64_start" from > > q35_host_get_pci_hole64_start(), in the q35_host_get_pci_hole64_end() > > function. If the firmware doesn't program 64-bit addresses, then this > > change is a no-op -- q35_host_get_pci_hole64_start() will fall back to > > pc_pci_hole64_start() in that case. Otherwise, the behavior will be > > fixed. > > Looks OK to me. > > > Now, there's another complication, obviously -- machine type compat. In > > commit 9fa99d2519cb, we added the "pci_hole64_fix" compat property. I > > assume the additional fix I'm proposing requires another compat > > property? > > We have to, is a guest visible change. I really don't like these compat > properties, but I don't see a way around it.
Well does it only affect ACPI? Or other stuff? ACPI changes are mostly safe without need for compat things. > > Something like: > > > > > diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c > > > index 02f95765880a..c02b128189cd 100644 > > > --- a/hw/pci-host/q35.c > > > +++ b/hw/pci-host/q35.c > > > @@ -138,12 +138,14 @@ static void q35_host_get_pci_hole64_end(Object > > > *obj, Visitor *v, > > > { > > > PCIHostState *h = PCI_HOST_BRIDGE(obj); > > > Q35PCIHost *s = Q35_HOST_DEVICE(obj); > > > - uint64_t hole64_start = pc_pci_hole64_start(); > > > + uint64_t hole64_start; > > > Range w64; > > > uint64_t value, hole64_end; > > > > > > pci_bus_get_w64_range(h->bus, &w64); > > > value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1; > > > + hole64_start = s->pci_hole64_fix2 ? q35_host_get_pci_hole64_start() : > > > + pc_pci_hole64_start(); > > > hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 1ULL > > > << 30); > > > if (s->pci_hole64_fix && value < hole64_end) { > > > value = hole64_end; > > The same would apply to i440fx too. > > I am lost here. The q35 PCI 64bit hole computation issue starts from the > miss-use > of the PC conterpart functions. What is the problem with the PC? > > > > > What do you think? > > I think is a clear bug, even we can say we promised "32G" hole and we do > provide it. > But we clearly intended to have 32G-64G space in this case. > > Michael, what do you think? > > Thanks, > Marcel > I agree here. > > > > Thanks > > Laszlo