On Thu, 27 Sep 2018 00:31:07 +0200 Laszlo Ersek <ler...@redhat.com> wrote:
> On 09/26/18 18:26, Alex Williamson wrote: > > On Wed, 26 Sep 2018 13:12:47 +0200 > > Laszlo Ersek <ler...@redhat.com> wrote: > > > >> On 09/25/18 22:36, Alex Williamson wrote: > >>> On Tue, 25 Sep 2018 00:13:46 +0200 > >>> Laszlo Ersek <ler...@redhat.com> wrote: > >>> > >>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI > >>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in > >>>> the ACPI DSDT that would be at least as large as the new > >>>> "pci-hole64-size" > >>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough" > >>>> 64-bit MMIO aperture to the guest OS for hotplug purposes. > >>>> > >>>> In that commit, we added or modified five functions: > >>>> > >>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a > >>>> default > >>>> 64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips > >>>> the DIMM hotplug area too (if any). > >>>> > >>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start(): > >>>> board-specific 64-bit base property getters called abstractly by the > >>>> ACPI generator. Both of these fall back to pc_pci_hole64_start() if the > >>>> firmware didn't program any 64-bit hole (i.e. if the firmware didn't > >>>> assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they > >>>> honor the firmware's BAR assignments (i.e., they treat the lowest > >>>> 64-bit > >>>> GPA programmed by the firmware as the base address for the aperture). > >>>> > >>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end(): > >>>> these intended to extend the aperture to our size recommendation, > >>>> calculated relative to the base of the aperture. > >>>> > >>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and > >>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative > >>>> to the default base (pc_pci_hole64_start()), ignoring any programming > >>>> done > >>>> by the firmware. This means that our size recommendation may not be met. > >>>> Fix it by honoring the firmware's address assignments. > >>>> > >>>> The strange extension sizes were spotted by Alex, in the log of a guest > >>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs). > >>>> > >>>> This change only affects DSDT generation, therefore no new compat > >>>> property > >>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to > >>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests. > >>> > >>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but > >>> only if it cannot satisfy all the BARs from 32-bit MMMIO, see > >>> src/fw/pciinit.c:pci_bios_map_devices. Create a VM with several > >>> assigned GPUs and you'll eventually cross that threshold and all 64-bit > >>> BARs will be moved above 4G. I'm sure a few sufficiently sized ivshmem > >>> devices could do the same. Thanks, > >> > >> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35, > >> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device. > >> > >> However, using SeaBIOS+i440fx, I can't show the difference. I've been > >> experimenting with various ivshmem devices (even multiple at the same > >> time, with different sizes). The "all or nothing" nature of SeaBIOS's > >> high allocation of the 64-bit BARs, combined with hugepage alignment > >> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU > >> for i440fx, seem to make it surprisingly difficult to trigger the issue. > >> > >> I figure I should: > >> > >> (1) remove the sentence "the patch makes no difference to SeaBIOS > >> guests" from the commit message, > >> > >> (2) include the DSDT diff on SeaBIOS/q35 in the commit message, > >> > >> (3) remain silent on SeaBIOS/i440fx, in the commit message, > >> > >> (4) append a new patch, for "bios-tables-test", so that the ACPI gen > >> change is validated as part of the test suite, on SeaBIOS/q35. > >> > >> Regarding (4): > >> > >> - is it OK if I add the test only for Q35? > >> > >> - what guest RAM size am I allowed to use in the test suite? In my own > >> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's > >> acceptable for the test suite. > > > > Seems like you've done due diligence, the plan looks ok to me. > > Regarding the test memory allocation, is it possible and reasonable to > > perhaps create a 256MB shared memory area and re-use it for multiple > > ivshmem devices? ie. rather than 1, 4GB ivshmem device, use 16, 256MB > > devices, all with the same backing. Thanks, > > In order to show that the patch makes a difference, on SeaBIOS+Q35, I > must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM > / end-of-DIMM-hotplug-area (i.e., strictly past the return value of > pc_pci_hole64_start()). That's not easy, assuming test suite constraints: > > - With a small guest RAM size (and no DIMM hotplug area), > pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned > for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR > too. Therefore the patch makes no difference. > > - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by > adding a DIMM hotplug area, it's not helpful. Because the end of the > DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment > automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB > alignment automatically. Thus, the BARs remain aligned. > > - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by > using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on > Q35, so RAM would end at 4GB+1MB), then it's the same as the previous > bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment, > and the BARs remain aligned. > > - If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small > guest RAM) for the ivshmem BAR size by changing the BAR size, then I > must pick a 8GB BAR size at least. And then I can't use the 256MB > "fragmentation". > > - I guess I could ascertain the mis-alignment by using small guest RAM > (128MB), a single DIMM hotplug slot so that reserved-memory-end is > rounded up to 5GB (through the 1GB alignment), and a single ivshmem > device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB). > Is this the best compromise? > > (The last option is basically an optimization of my previous reproducer, > namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM > would end at an *odd* GB number -- we can optimize that by using an > empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be > optimized to the smallest *even* (positive) number of GBs, namely 2GB. that probably would do the job > But we can't do anything about the pervasive GB-alignment.) I haven't read the code to be sure but at first glance, thanks to brokenness before, it might be possible to mis-align pci_hole64 start if one would use a combination of broken_reserved_end=true && enforce_aligned_dimm=false That's assuming that an old machine type that has this combo is still suitable for testcase otherwise. > > Thanks > Laszlo