On Thu, Feb 25, 2016 at 03:05:08PM +0100, Laszlo Ersek wrote:
> On 02/25/16 14:30, Michael S. Tsirkin wrote:
> > On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
> >> On 02/25/16 13:44, Laszlo Ersek wrote:
> >>> Hi,
> >>>
> >>> On 02/25/16 12:57, Michael S. Tsirkin wrote:
> >>>> ----- Forwarded message from Igor Mammedov <imamm...@redhat.com> -----
> >>>>
> >>>> Date: Thu, 11 Feb 2016 16:16:05 +0100
> >>>> From: Igor Mammedov <imamm...@redhat.com>
> >>>> To: "Michael S. Tsirkin" <m...@redhat.com>
> >>>> To: ler...@redhat.com
> >>>> Subject: on pci rebalancing
> >>>> Message-ID: <20160211161605.0022e...@nial.brq.redhat.com>
> >>>> In-Reply-To: <20160209131656-mutt-send-email-...@redhat.com>
> >>>>
> >>>>>>>> For PCI rebalance to work on Windows, one has to provide working
> >>>>>>>> PCI driver otherwise OS will ignore it when rebalancing happens and
> >>>>>>>> might map something else over ignored BAR.
> >>>>>>>
> >>>>>>> Does it disable the BAR then? Or just move it elsewhere?
> >>>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
> >>>>>> another device with driver over it.
> >>>>>
> >>>>> Interesting. On classical PCI this is a forbidden configuration.
> >>>>> Maybe we do something that confuses windows?
> >>>>> Could you tell me how to reproduce this behaviour?
> >>>> #cat > t << EOF
> >>>> pci_update_mappings_del
> >>>> pci_update_mappings_add
> >>>> EOF
> >>>>
> >>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >>>> -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >>>> -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >>>> -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>>>
> >>>> wait till OS boots, note BARs programmed for ivshmem
> >>>> in my case it was
> >>>> 01:01.0 0,0xfe800000+0x100
> >>>> then execute script and watch pci_update_mappings* trace events
> >>>>
> >>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>>>
> >>>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> >>>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> >>>> and then programs new BARs, where:
> >>>> pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >>>> creates overlapping BAR with ivshmem
> >>>
> >>> Michael informed me of this on IRC (and forwarded this email to me). I
> >>> hope to start a new thread with my response. (I also reedited the subject
> >>> fully.)
> >>>
> >>> So, to summarize what I said on IRC first. The situation where firmware
> >>> recognizes and enables a PCI device, hands control to the OS, and then
> >>> the OS lacks a driver for the PCI device, is completely normal and
> >>> expected. For UEFI specifically, I can name a general argument and a
> >>> specific argument.
> >>>
> >>> The general argument is that actions that need to be taken in
> >>> ExitBootServices() callbacks do not include clearing IO or MMIO decode
> >>> bits in PCI device command registers. Command register manipulation
> >>> happens when a PCI device driver (that conforms to the UEFI driver model)
> >>> *binds* or *unbinds* a device.
> >>> And unbinding a device is not possible in
> >>> the ExitBootServices() callback, minimally because such callbacks are
> >>> forbidden from modifying the memory map -- but unbinding would release
> >>> allocated memory.
> >>>
> >>> So what we use such callbacks for is aborting in-flight, outstanding
> >>> DMA-like transfers. Re-setting virtio devices is also an example (think
> >>> outstanding receive requests for virtio-net).
> >>>
> >>> Now let's move on to the specific argument I mentioned above. The
> >>> Graphics Output Protocol (GOP) is a UEFI abstraction that was
> >>> specifically designed with the case in mind when the operating system
> >>> doesn't have a display driver -- yet installed --, but the user obviously
> >>> has to use the display somehow. The GOP is most frequently provided on
> >>> top of an EFI_PCI_IO_PROTOCOL instance; meaning simply that the "GOP
> >>> driver" is a UEFI driver that drives a PCI device. In short, the driver
> >>> provides the GOP on top of a PCI device.
> >>>
> >>> Now, the GOP is supposed to communicate the pixel format and the frame
> >>> buffer base address for the currently active graphics mode to the
> >>> software that consumes the GOP. This includes UEFI applications of course
> >>> (think a boot loader putting up a splash screen or an animation), but
> >>> importantly, the runtime OS is *also* supposed to inherit these
> >>> characteristics from boot services time. The OS can then use simple
> >>> unaccelerated MMIO writes to display things on the screen, until the
> >>> user installs an accelerated driver.
> >>>
> >>> (Concrete example: this is why you can see *anything at all* on the
> >>> screen, when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL
> >>> display, before installing the QXL WDDM driver in the guest.)
> >>>
> >>> Clearly, the frame buffer base address communicated through the GOP
> >>> points into one of the MMIO BARs of the PCI device.
> >>> If, at
> >>> ExitBootServices(), MMIO decoding were disabled for the PCI device that
> >>> underlies the GOP, that would *completely* defeat the GOP design. The
> >>> OS's attempt to poke at those MMIO addresses would be futile -- and in
> >>> fact the OS has no idea what PCI device (if any) the framebuffer is
> >>> supposed to be related to. This is the jurisdiction of the OS-level
> >>> display driver -- if one exists and is installed.
> >>>
> >>> So, this is a Windows bug in my opinion. Just because there is no OS-level
> >>> driver, a PCI device is fully expected to be decoding resources, if the
> >>> firmware brought it up.
> >>>
> >>> --*--
> >>>
> >>> Okay, so Michael asked me to try to reproduce the above with OVMF, and
> >>> see what happens. Unfortunately I'm not really knowledgeable about
> >>> ivshmem, hotplug, et cetera. Let me instead tell Igor about using OVMF.
> >>>
> >>> (1) Please follow the instructions on Gerd's page
> >>> <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64"
> >>> package.
> >>>
> >>> (2) Create a separate directory for testing. In this directory, run the
> >>> following command:
> >>>
> >>> cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd
> >>>
> >>> Also create a disk image for your new guest, etc.
> >>>
> >>> (3) Use the following command line snippet to work with OVMF:
> >>>
> >>> qemu-system-x86_64 \
> >>> -machine accel=kvm \
> >>> -smp cpus=2 \
> >>> -m 2048 \
> >>> \
> >>> -debugcon file:ovmf.debug.log \
> >>> -global isa-debugcon.iobase=0x402 \
> >>> \
> >>> -device qxl-vga \
> >>> \
> >>> -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
> >>> -drive if=pflash,format=raw,unit=1,file=myvars.fd \
> >>> \
> >>> [your options here]
> >>>
> >>> You can of course customize the # of VCPUs, memory size, disks, CD-ROMs,
> >>> network, and so on.
> >>>
> >>> Recommended: when you use the -device option to add the disk and the
> >>> CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the
> >>> "bootindex" property. OVMF will adhere to the boot order. It is
> >>> recommended to set bootindex=0 for your main disk, bootindex=1 for your
> >>> OS installer CD-ROM, and *no* bootindex for your virtio-win driver disk.
> >>> This way at first boot (with no OS installed) OVMF will boot the
> >>> installer CD-ROM. Further boots (with the same command line) will boot
> >>> the installed OS.
> >>>
> >>> Caveat: I never used the -snapshot option with OVMF virtual machines; it
> >>> might or might not work.
> >>>
> >>> Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows
> >>> running on OVMF many months ago, but I can't tell off-hand if it will
> >>> work right now.
> >>
> >> I should also mention that you might not be able to reproduce the same
> >> situation with the "ivshmem" device. Namely, if there is no UEFI driver
> >> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
> >> and IO decoding bits will *never* be set. As I said, command register
> >> massaging is the jurisdiction of the individual UEFI driver that
> >> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
> >>
> >> Therefore you should probably try to reproduce the issue with another
> >> PCI device type that OVMF has a driver for, but Windows has none
> >> (installed at least). I'm quite hard pressed to name such a device type,
> >> unfortunately. :(
> >
> > virtio?
>
> ... was my first thought as well, but OVMF at the moment supports only
> legacy (0.9.5) virtio-pci devices
Oh. We'll have to fix that too :(

> (and virtio-mmio only on AARCH64) --
> those don't have MMIO BARs, only IO BARs.

Well that's not exactly true - there is an MSI-X BAR.
Maybe OVMF does not enable that, though.

> Theoretically the Windows overlap issue should be triggerable with IO
> BARs just the same (resource - resource, right?), but I doubt it will be
> reproducible in practice.
>
> Laszlo
>
> >> Perhaps one of the more obscure emulated NICs could work in place of
> >> ivshmem. (The IPXE oproms provide UEFI drivers for those.)
> >>
> >> Thanks
> >> Laszlo
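
For anyone re-running Igor's reproducer, the overlap he reports can be checked mechanically rather than by eye. Below is a minimal Python sketch, not part of QEMU or any existing tool. It assumes the trace tail always has the "bus:dev.fn bar,base+size" shape seen in the two fragments quoted above; the helper names parse_mapping and overlaps are hypothetical and exist only for this illustration.

```python
import re

# Matches the "<bus>:<dev>.<fn> <bar>,<base>+<size>" tail of a mapping line.
# ASSUMPTION: the layout is inferred only from the two fragments quoted in
# this thread; other QEMU versions may print the event differently.
MAPPING_RE = re.compile(
    r"([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(\d+),(0x[0-9a-f]+)\+(0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_mapping(line):
    """Return (device, bar_index, base, size) parsed from one mapping line."""
    match = MAPPING_RE.search(line)
    if match is None:
        raise ValueError("no BAR mapping found in: %r" % line)
    device, bar, base, size = match.groups()
    return device, int(bar), int(base, 16), int(size, 16)


def overlaps(a, b):
    """True if the half-open ranges [base, base + size) of a and b intersect."""
    _, _, a_base, a_size = a
    _, _, b_base, b_size = b
    return a_base < b_base + b_size and b_base < a_base + a_size


# The two mappings exactly as they appear in Igor's mail.
ivshmem = parse_mapping("01:01.0 0,0xfe800000+0x100")
e1000 = parse_mapping(
    "pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000"
)

print(overlaps(ivshmem, e1000))  # True: both ranges start at 0xfe800000
```

Half-open ranges are used so that two BARs placed back to back (one ending exactly where the next begins) are not reported as a conflict.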