On 02/25/16 14:30, Michael S. Tsirkin wrote:
> On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
>> On 02/25/16 13:44, Laszlo Ersek wrote:
>>> Hi,
>>>
>>> On 02/25/16 12:57, Michael S. Tsirkin wrote:
>>>> ----- Forwarded message from Igor Mammedov <imamm...@redhat.com> -----
>>>>
>>>> Date: Thu, 11 Feb 2016 16:16:05 +0100
>>>> From: Igor Mammedov <imamm...@redhat.com>
>>>> To: "Michael S. Tsirkin" <m...@redhat.com>
>>>> To: ler...@redhat.com
>>>> Subject: on pci rebalancing
>>>> Message-ID: <20160211161605.0022e...@nial.brq.redhat.com>
>>>> In-Reply-To: <20160209131656-mutt-send-email-...@redhat.com>
>>>>
>>>>>>>> For PCI rebalance to work on Windows, one has to provide working PCI
>>>>>>>> driver
>>>>>>>> otherwise OS will ignore it when rebalancing happens and
>>>>>>>> might map something else over ignored BAR.
>>>>>>>
>>>>>>> Does it disable the BAR then? Or just move it elsewhere?
>>>>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
>>>>>> another device with driver over it.
>>>>>
>>>>> Interesting. On classical PCI this is a forbidden configuration.
>>>>> Maybe we do something that confuses windows?
>>>>> Could you tell me how to reproduce this behaviour?
>>>> #cat > t << EOF
>>>> pci_update_mappings_del
>>>> pci_update_mappings_add
>>>> EOF
>>>>
>>>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
>>>>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>>>>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>>>>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>>>>
>>>> wait till OS boots, note BARs programmed for ivshmem
>>>>  in my case it was
>>>>    01:01.0 0,0xfe800000+0x100
>>>> then execute script and watch pci_update_mappings* trace events
>>>>
>>>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>>>>
>>>> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
>>>> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
>>>> and then programs new BARs, where:
>>>>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
>>>> creates overlapping BAR with ivshmem
>>>
>>> Michael informed me of this on IRC (and forwarded this email to me). I hope
>>> to start a new thread with my response. (I also reedited the subject fully.)
>>>
>>> So, to summarize what I said on IRC first. The situation where firmware
>>> recognizes and enables a PCI device, hands control to the OS, and then the
>>> OS lacks a driver for the PCI device, is completely normal and expected.
>>> For UEFI specifically, I can name a general argument and a specific
>>> argument.
>>>
>>> The general argument is that actions that need to be taken in
>>> ExitBootServices() callbacks do not include clearing IO or MMIO decode bits
>>> in PCI device command registers. Command register manipulation happens when
>>> a PCI device driver (that conforms to the UEFI driver model) *binds* or
>>> *unbinds* a device.
>>> And unbinding a device is not possible in the
>>> ExitBootServices() callback, minimally because such callbacks are forbidden
>>> from modifying the memory map -- but unbinding would release allocated
>>> memory.
>>>
>>> So what we use such callbacks for is aborting in-flight, outstanding
>>> DMA-like transfers. Re-setting virtio devices is also an example (think
>>> outstanding receive requests for virtio-net).
>>>
>>> Now let's move on to the specific argument I mentioned above. The Graphics
>>> Output Protocol (GOP) is a UEFI abstraction that was specifically designed
>>> with the case in mind when the operating system doesn't have a display
>>> driver -- yet installed --, but the user obviously has to use the display
>>> somehow. The GOP is most frequently provided on top of an
>>> EFI_PCI_IO_PROTOCOL instance; meaning simply that the "GOP driver" is a
>>> UEFI driver that drives a PCI device. In short, the driver provides the GOP
>>> on top of a PCI device.
>>>
>>> Now, the GOP is supposed to communicate the pixel format and the frame
>>> buffer base address for the currently active graphics mode to the software
>>> that consumes the GOP. This includes UEFI applications of course (think a
>>> boot loader putting up a splash screen or an animation), but importantly,
>>> the runtime OS is *also* supposed to inherit these characteristics from
>>> boot services time. The OS can then use simple unaccelerated MMIO writes to
>>> display things on the screen, until the user installs an accelerated
>>> driver.
>>>
>>> (Concrete example: this is why you can see *anything at all* on the screen,
>>> when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL display,
>>> before installing the QXL WDDM driver in the guest.)
>>>
>>> Clearly, the frame buffer base address communicated through the GOP points
>>> into one of the MMIO BARs of the PCI device.
>>> If, at ExitBootServices(),
>>> MMIO decoding were disabled for the PCI device that underlies the GOP, that
>>> would *completely* defeat the GOP design. The OS's attempt to poke at those
>>> MMIO addresses would be futile -- and in fact the OS has no idea what PCI
>>> device (if any) the framebuffer is supposed to be related to. This is the
>>> jurisdiction of the OS-level display driver -- if one exists and is
>>> installed.
>>>
>>> So, this is a Windows bug in my opinion. Just because there is no OS-level
>>> driver, a PCI device is fully expected to be decoding resources, if the
>>> firmware brought it up.
>>>
>>> --*--
>>>
>>> Okay, so Michael asked me to try to reproduce the above with OVMF, and see
>>> what happens. Unfortunately I'm not really knowledgeable about ivshmem,
>>> hotplug, et cetera. Let me instead tell Igor about using OVMF.
>>>
>>> (1) Please follow the instructions on Gerd's page
>>> <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64"
>>> package.
>>>
>>> (2) Create a separate directory for testing. In this directory, run the
>>> following command:
>>>
>>>   cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd
>>>
>>> Also create a disk image for your new guest, etc.
>>>
>>> (3) Use the following command line snippet to work with OVMF:
>>>
>>>   qemu-system-x86_64 \
>>>     -machine accel=kvm \
>>>     -smp cpus=2 \
>>>     -m 2048 \
>>>     \
>>>     -debugcon file:ovmf.debug.log \
>>>     -global isa-debugcon.iobase=0x402 \
>>>     \
>>>     -device qxl-vga \
>>>     \
>>>     -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
>>>     -drive if=pflash,format=raw,unit=1,file=myvars.fd \
>>>     \
>>>     [your options here]
>>>
>>> You can of course customize the # of VCPUs, memory size, disks, CD-ROMs,
>>> network, and so on.
>>>
>>> Recommended: when you use the -device option to add the disk and the
>>> CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the
>>> "bootindex" property.
>>> OVMF will adhere to the boot order. It is recommended
>>> to set bootindex=0 for your main disk, bootindex=1 for your OS installer
>>> CD-ROM, and *no* bootindex for your virtio-win driver disk. This way at
>>> first boot (with no OS installed) OVMF will boot the installer CD-ROM.
>>> Further boots (with the same command line) will boot the installed OS.
>>>
>>> Caveat: I never used the -snapshot option with OVMF virtual machines; it
>>> might or might not work.
>>>
>>> Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows
>>> running on OVMF many months ago, but I can't tell off-hand if it will work
>>> right now.
>>
>> I should also mention that you might not be able to reproduce the same
>> situation with the "ivshmem" device. Namely, if there is no UEFI driver
>> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
>> and IO decoding bits will *never* be set. As I said, command register
>> massaging is the jurisdiction of the individual UEFI driver that
>> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
>>
>> Therefore you should probably try to reproduce the issue with another
>> PCI device type that OVMF has a driver for, but Windows has none
>> (installed at least). I'm quite hard pressed to name such a device type,
>> unfortunately. :(
>
> virtio?

... was my first thought as well, but OVMF at the moment supports only
legacy (0.9.5) virtio-pci devices (and virtio-mmio only on AARCH64) --
those don't have MMIO BARs, only IO BARs.

Theoretically the Windows overlap issue should be triggerable with IO
BARs just the same (resource - resource, right?), but I doubt it will be
reproducible in practice.

Laszlo

>> Perhaps one of the more obscure emulated NICs could work in place of
>> ivshmem. (The IPXE oproms provide UEFI drivers for those.)
>>
>> Thanks
>> Laszlo
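
For anyone re-running the reproduction, the overlap reported in the quoted
trace (ivshmem BAR 0 at 0xfe800000+0x100, then an e1000 BAR 0 at
0xfe800000+0x20000) can be double-checked mechanically. Below is a minimal
sketch in Python, assuming the trace lines keep the
"<bus:dev.fn> <bar>,<base>+<size>" layout seen in the quoted output. The
helper names parse_mapping() and ranges_overlap() are made up for
illustration; they are not part of QEMU or its trace tooling.

```python
import re

# Assumed layout, taken from the quoted trace output:
#   "<bus:dev.fn> <bar index>,<base>+<size>"
MAPPING_RE = re.compile(
    r"(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<bar>\d+),(?P<base>0x[0-9a-f]+)\+(?P<size>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_mapping(line):
    """Return (bdf, bar index, start, end) with 'end' exclusive."""
    m = MAPPING_RE.search(line)
    if m is None:
        raise ValueError("unrecognized mapping line: %r" % line)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return m.group("bdf"), int(m.group("bar")), base, base + size


def ranges_overlap(a, b):
    """True if the half-open address ranges of two parsed mappings intersect."""
    return a[2] < b[3] and b[2] < a[3]


# The two mappings quoted in the reproduction steps.
ivshmem = parse_mapping("01:01.0 0,0xfe800000+0x100")
e1000 = parse_mapping(
    "pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000"
)

print(ranges_overlap(ivshmem, e1000))  # True
```

The same half-open interval test applies to IO BARs; only the address space
differs.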