On Thu, Feb 25, 2016 at 02:00:09PM +0100, Laszlo Ersek wrote:
> On 02/25/16 13:44, Laszlo Ersek wrote:
> > Hi,
> >
> > On 02/25/16 12:57, Michael S. Tsirkin wrote:
> >> ----- Forwarded message from Igor Mammedov <imamm...@redhat.com> -----
> >>
> >> Date: Thu, 11 Feb 2016 16:16:05 +0100
> >> From: Igor Mammedov <imamm...@redhat.com>
> >> To: "Michael S. Tsirkin" <m...@redhat.com>
> >> To: ler...@redhat.com
> >> Subject: on pci rebalancing
> >> Message-ID: <20160211161605.0022e...@nial.brq.redhat.com>
> >> In-Reply-To: <20160209131656-mutt-send-email-...@redhat.com>
> >>
> >>>>>> For PCI rebalance to work on Windows, one has to provide working PCI
> >>>>>> driver otherwise OS will ignore it when rebalancing happens and
> >>>>>> might map something else over ignored BAR.
> >>>>>
> >>>>> Does it disable the BAR then? Or just move it elsewhere?
> >>>> it doesn't, it just blindly ignores BARs existence and maps BAR of
> >>>> another device with driver over it.
> >>>
> >>> Interesting. On classical PCI this is a forbidden configuration.
> >>> Maybe we do something that confuses windows?
> >>> Could you tell me how to reproduce this behaviour?
> >> #cat > t << EOF
> >> pci_update_mappings_del
> >> pci_update_mappings_add
> >> EOF
> >>
> >> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm -snapshot \
> >> -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
> >> -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
> >> -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
> >>
> >> wait till OS boots, note BARs programmed for ivshmem
> >> in my case it was
> >> 01:01.0 0,0xfe800000+0x100
> >> then execute script and watch pci_update_mappings* trace events
> >>
> >> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
> >>
> >> hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing where
> >> Windows unmaps all BARs of nics on bridge but doesn't touch ivshmem
> >> and then programs new BARs, where:
> >> pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
> >> creates overlapping BAR with ivshmem
> >
> > Michael informed me of this on IRC (and forwarded this email to me). I hope
> > to start a new thread with my response. (I also reedited the subject fully.)
> >
> > So, to summarize what I said on IRC first. The situation where firmware
> > recognizes and enables a PCI device, hands control to the OS, and then the
> > OS lacks a driver for the PCI device, is completely normal and expected.
> > For UEFI specifically, I can name a general argument and a specific
> > argument.
> >
> > The general argument is that actions that need to be taken in
> > ExitBootServices() callbacks do not include clearing IO or MMIO decode bits
> > in PCI device command registers. Command register manipulation happens when
> > a PCI device driver (that conforms to the UEFI driver model) *binds* or
> > *unbinds* a device.
> > And unbinding a device is not possible in the
> > ExitBootServices() callback, minimally because such callbacks are forbidden
> > from modifying the memory map -- but unbinding would release allocated
> > memory.
> >
> > So what we use such callbacks for is aborting in-flight, outstanding
> > DMA-like transfers. Re-setting virtio devices is also an example (think
> > outstanding receive requests for virtio-net).
> >
> > Now let's move on to the specific argument I mentioned above. The Graphics
> > Output Protocol (GOP) is a UEFI abstraction that was specifically designed
> > with the case in mind when the operating system doesn't have a display
> > driver -- yet installed --, but the user obviously has to use the display
> > somehow. The GOP is most frequently provided on top of an
> > EFI_PCI_IO_PROTOCOL instance; meaning simply that the "GOP driver" is a
> > UEFI driver that drives a PCI device. In short, the driver provides the GOP
> > on top of a PCI device.
> >
> > Now, the GOP is supposed to communicate the pixel format and the frame
> > buffer base address for the currently active graphics mode to the software
> > that consumes the GOP. This includes UEFI applications of course (think a
> > boot loader putting up a splash screen or an animation), but importantly,
> > the runtime OS is *also* supposed to inherit these characteristics from
> > boot services time. The OS can then use simple unaccelerated MMIO writes to
> > display things on the screen, until the user installs an accelerated
> > driver.
> >
> > (Concrete example: this is why you can see *anything at all* on the screen,
> > when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL display,
> > before installing the QXL WDDM driver in the guest.)
> >
> > Clearly, the frame buffer base address communicated through the GOP points
> > into one of the MMIO BARs of the PCI device.
> > If, at ExitBootServices(),
> > MMIO decoding were disabled for the PCI device that underlies the GOP, that
> > would *completely* defeat the GOP design. The OS's attempt to poke at those
> > MMIO addresses would be futile -- and in fact the OS has no idea what PCI
> > device (if any) the framebuffer is supposed to be related to. This is the
> > jurisdiction of the OS-level display driver -- if one exists and is
> > installed.
> >
> > So, this is a Windows bug in my opinion. Just because there is no OS-level
> > driver, a PCI device is fully expected to be decoding resources, if the
> > firmware brought it up.
> >
> > --*--
> >
> > Okay, so Michael asked me to try to reproduce the above with OVMF, and see
> > what happens. Unfortunately I'm not really knowledgeable about ivshmem,
> > hotplug, et cetera. Let me instead tell Igor about using OVMF.
> >
> > (1) Please follow the instructions on Gerd's page
> > <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64"
> > package.
> >
> > (2) Create a separate directory for testing. In this directory, run the
> > following command:
> >
> >   cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd
> >
> > Also create a disk image for your new guest, etc.
> >
> > (3) Use the following command line snippet to work with OVMF:
> >
> >   qemu-system-x86_64 \
> >     -machine accel=kvm \
> >     -smp cpus=2 \
> >     -m 2048 \
> >     \
> >     -debugcon file:ovmf.debug.log \
> >     -global isa-debugcon.iobase=0x402 \
> >     \
> >     -device qxl-vga \
> >     \
> >     -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
> >     -drive if=pflash,format=raw,unit=1,file=myvars.fd \
> >     \
> >     [your options here]
> >
> > You can of course customize the # of VCPUs, memory size, disks, CD-ROMs,
> > network, and so on.
> >
> > Recommended: when you use the -device option to add the disk and the
> > CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the
> > "bootindex" property.
> > OVMF will adhere to the boot order. It is recommended
> > to set bootindex=0 for your main disk, bootindex=1 for your OS installer
> > CD-ROM, and *no* bootindex for your virtio-win driver disk. This way at
> > first boot (with no OS installed) OVMF will boot the installer CD-ROM.
> > Further boots (with the same command line) will boot the installed OS.
> >
> > Caveat: I never used the -snapshot option with OVMF virtual machines; it
> > might or might not work.
> >
> > Caveat #2: I had tested simple PCI hotplug and hot-unplug with Windows
> > running on OVMF many months ago, but I can't tell off-hand if it will work
> > right now.
>
> I should also mention that you might not be able to reproduce the same
> situation with the "ivshmem" device. Namely, if there is no UEFI driver
> for that PCI device (and OVMF certainly doesn't have one), then its MMIO
> and IO decoding bits will *never* be set. As I said, command register
> massaging is the jurisdiction of the individual UEFI driver that
> ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.
>
> Therefore you should probably try to reproduce the issue with another
> PCI device type that OVMF has a driver for, but Windows has none
> (installed at least). I'm quite hard pressed to name such a device type,
> unfortunately. :(
virtio?

> Perhaps one of the more obscure emulated NICs could work in place of
> ivshmem. (The IPXE oproms provide UEFI drivers for those.)
>
> Thanks
> Laszlo
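
For anyone repeating Igor's experiment, the overlap he reports can be checked mechanically from the base+size pairs in the pci_update_mappings_add trace lines. The following is only a minimal sketch: bars_overlap is a hypothetical helper name, not part of QEMU or any existing tool, and it assumes each BAR is written as base and size, as in the trace output quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): succeeds if two BAR ranges,
# each given as "base size", share at least one address.
bars_overlap() {
    a_base=$(( $1 )); a_end=$(( $1 + $2 ))
    b_base=$(( $3 )); b_end=$(( $3 + $4 ))
    # Half-open ranges [base, end) intersect iff each one starts
    # before the other one ends.
    [ $(( a_base < b_end && b_base < a_end )) -eq 1 ]
}

# ivshmem BAR 0 as noted after boot, versus the e1000 BAR 0 that
# Windows programmed after rebalancing (both values from the trace).
if bars_overlap 0xfe800000 0x100 0xfe800000 0x20000; then
    echo "overlap"
else
    echo "no overlap"
fi
```

With the two values from Igor's trace, both ranges start at 0xfe800000, so the check reports an overlap. The same function could be fed every pair of BARs collected from a trace log to spot other collisions.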