On Tue, Jun 09, 2015 at 10:34:32PM +0200, Laszlo Ersek wrote:
> On 06/07/15 11:23, Michael S. Tsirkin wrote:
> > On Sat, Jun 06, 2015 at 01:46:29AM +0200, Laszlo Ersek wrote:
> >> OVMF downloads the ACPI linker/loader script from QEMU when the edk2 PCI
> >> Bus driver globally signals the firmware that PCI enumeration and resource
> >> allocation have completed. At this point QEMU regenerates the ACPI payload
> >> in an fw_cfg read callback, and this is when the PXB's _CRS gets
> >> populated.
> >>
> >> Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is clear in
> >> the root bus's command register, *unlike* under SeaBIOS. The consequences
> >> unfold as follows:
> >>
> >> - When build_crs() fetches dev->io_regions[i].addr, it is all-bits-one,
> >>   because pci_update_mappings() --> pci_bar_address() calculated it as
> >>   PCI_BAR_UNMAPPED, due to the PCI_COMMAND_MEMORY bit being clear.
> >>
> >> - Consequently, the SHPC MMIO BAR (bar 0) of the bridge is not added to
> >>   the _CRS, *despite* having been programmed in PCI config space.
> >>
> >> - Similarly, the SHPC MMIO BAR of the PXB is not removed from the main
> >>   root bus's DWordMemory descriptor.
> >>
> >> - Guest OSes (Linux and Windows alike) notice the pre-programmed SHPC BAR
> >>   within the PXB's config space, and notice that it conflicts with the
> >>   main root bus's memory resource descriptors. Linux reports
> >>
> >>   pci 0000:04:00.0: BAR 0: can't assign mem (size 0x100)
> >>   pci 0000:04:00.0: BAR 0: trying firmware assignment [mem
> >>                            0x88200000-0x882000ff 64bit]
> >>   pci 0000:04:00.0: BAR 0: [mem 0x88200000-0x882000ff 64bit] conflicts
> >>                            with PCI Bus 0000:00 [mem
> >>                            0x88200000-0xfebfffff]
> >>
> >>   While Windows Server 2012 R2 reports
> >>
> >>     https://technet.microsoft.com/en-us/library/cc732199%28v=ws.10%29.aspx
> >>
> >>     This device cannot find enough free resources that it can use. If you
> >>     want to use this device, you will need to disable one of the other
> >>     devices on this system. (Code 12)
> >>
> >> (This issue was apparently encountered earlier, see:
> >>
> >>   https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> >>
> >> and the current hole-punching logic in build_crs() and build_ssdt() is
> >> probably supposed to remedy exactly that problem -- however, for OVMF they
> >> don't work, because at the end of the PCI enumeration and resource
> >> allocation, which cues the ACPI linker/loader client, the command register
> >> is clear.)
> >>
> >> The solution is to fetch the BAR addresses from PCI config space directly,
> >> for the purposes of build_crs(), regardless of the PCI command register
> >> and any MemoryRegion state.
> >>
> >> Example MMIO maps for a 2GB guest:
> >>
> >> SeaBIOS:
> >>
> >>   main: 0x80000000..0xFC000000 / 0x7C000000
> >>   pxb:  0xFC000000..0xFC200000 / 0x00200000
> >>   main: 0xFC200000..0xFC213000 / 0x00013000
> >>   pxb:  0xFC213000..0xFC213100 / 0x00000100 <- SHPC BAR
> >>   main: 0xFC213100..0xFEA00000 / 0x027ECF00
> >>   pxb:  0xFEA00000..0xFEC00000 / 0x00200000
> >>
> >> OVMF, without the fix:
> >>
> >>   main: 0x80000000..0x88100000 / 0x08100000
> >>   pxb:  0x88100000..0x88200000 / 0x00100000
> >>   main: 0x88200000..0xFEC00000 / 0x76A00000
> >>
> >> OVMF, with the fix:
> >>
> >>   main: 0x80000000..0x88100000 / 0x08100000
> >>   pxb:  0x88100000..0x88200000 / 0x00100000
> >>   pxb:  0x88200000..0x88200100 / 0x00000100 <- SHPC BAR
> >>   main: 0x88200100..0xFEC00000 / 0x769FFF00
> >>
> >> (Note that under OVMF the PCI enumerator includes requests for
> >> prefetchable memory in the nonprefetchable memory pool -- separate windows
> >> for nonprefetchable and prefetchable memory are not supported (yet) --
> >> that's why there's one fewer pxb range in the corrected OVMF case than in
> >> the SeaBIOS case.)
> >>
> >> Cc: Marcel Apfelbaum <mar...@redhat.com>
> >> Cc: Michael S. Tsirkin <m...@redhat.com>
> >> Signed-off-by: Laszlo Ersek <ler...@redhat.com>
> > 
> > This is problematic - disabled BAR values have no meaning according to
> > the PCI spec.
> > 
> > It might be best to add a property to just disable shpc in the bridge so
> > no devices reside directly behind the pxb?
> 
> I started looking into this.
> 
> The property itself doesn't seem terribly hard (there's already an "msi"
> property which I can take as an example). Making stuff conditional on
> this new "shpc" property looked feasible in the beginning, however I
> qucikly ran into two issues:
> 
> - Migration.
> 
>   One idea would be to keep the SHPC setup around at all times, and
>   just not expose the PCI bar to the guest (same as in Marcel's hack
>   [1]). I didn't like this, although it would keep the migration stream
>   intact.

I don't get it. You can't migrate device with SHPC to device without,
the difference is guest visible.
Why worry about migration stream being compatible?

>   The other idea is to omit even the shpc_init() call when SHPC is
>   disabled. I like this, but it would require breaking migration
>   compatibility. Both "minimum_version_id" and "version_id" would have
>   to be set to 1 (from the current zero), and the fixed SHPC_VMSTATE()
>   field should be replaced with a subsection (dependent on the new
>   "shpc" flag).
> 
>   Seems sweaty but doable.
> 
> - Hotplug handling.
> 
>   This is the deal breaker. The new "shpc" flag can affect *instances*
>   of the bridge, but SHPC is a class-level trait.
>   pci_bridge_dev_class_init() sets hc->plug and hc->unplug_request to
>   SHPC-related callbacks, plus "pci_bridge_dev_info" advertises itself
>   as TYPE_HOTPLUG_HANDLER.
> 
>   This implies that I'd have to split the TYPE_PCI_BRIDGE_DEV class
>   into a base class and a derived class. Only the derived class would
>   support SHPC / TYPE_HOTPLUG_HANDLER. And, PXB would have to be
>   diverted to the new base class (without SHPC), in pxb_dev_initfn(),
>   from "pci-bridge". (The derived class would preserve the name
>   "pci-bridge".)
> 
>   Consequences for migration are unclear to me. Maybe the new derived
>   class (functionally equivalent to the current TYPE_PCI_BRIDGE_DEV)
>   would be migration-compatible with the current class.
> 
>   If not, I could create the "basic" bridge class as a standalone one,
>   without interrupt / MSI / SHPC / hotplug support, and PXB would use
>   that. The file "hw/pci-bridge/pci_bridge_dev.c" is really small, so
>   this would be quite easy; it woduln't duplicate a lot of code, and
>   would not affect preexistent migration at all. On the other hand,
>   people might not like that the base class functionality were
>   duplicated, instead of inherited.
> 
>   I've managed such a base/descendant class split once before
>   (splitting fw_cfg into the IO and MMIO mapped variants) -- with help
>   of course -- so perhaps I could try it again, if that's the
>   preference.
> 
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02983.html
> 
> Thoughts?
> 
> Thanks,
> Laszlo
> 
> 
> >> ---
> >>  hw/i386/acpi-build.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index b71e942..60d4f75 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -784,8 +784,8 @@ static Aml *build_crs(PCIHostState *host,
> >>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> >>              PCIIORegion *r = &dev->io_regions[i];
> >>  
> >> -            range_base = r->addr;
> >> -            range_limit = r->addr + r->size - 1;
> >> +            range_base = pci_bar_address(dev, i, r->type, r->size, false);
> >> +            range_limit = range_base + r->size - 1;
> >>  
> >>              /*
> >>               * Work-around for old bioses
> >> -- 
> >> 1.8.3.1

Reply via email to