On 02/25/16 13:44, Laszlo Ersek wrote:
> Hi,
> 
> On 02/25/16 12:57, Michael S. Tsirkin wrote:
>> ----- Forwarded message from Igor Mammedov <imamm...@redhat.com> -----
>>
>> Date: Thu, 11 Feb 2016 16:16:05 +0100
>> From: Igor Mammedov <imamm...@redhat.com>
>> To: "Michael S. Tsirkin" <m...@redhat.com>
>> To: ler...@redhat.com
>> Subject: on pci rebalancing
>> Message-ID: <20160211161605.0022e...@nial.brq.redhat.com>
>> In-Reply-To: <20160209131656-mutt-send-email-...@redhat.com>
>>
>>>>>> For PCI rebalancing to work on Windows, one has to provide a working
>>>>>> PCI driver; otherwise the OS will ignore the device when rebalancing
>>>>>> happens and might map something else over the ignored BAR.
>>>>>
>>>>> Does it disable the BAR then? Or just move it elsewhere?  
>>>> It doesn't; it just blindly ignores the BAR's existence and maps a BAR
>>>> of another device (one that has a driver) over it.
>>>
>>> Interesting. On classical PCI this is a forbidden configuration.
>>> Maybe we do something that confuses windows?
>>> Could you tell me how to reproduce this behaviour?
>> #cat > t << EOF
>> pci_update_mappings_del
>> pci_update_mappings_add
>> EOF
>>
>> #./x86_64-softmmu/qemu-system-x86_64 -snapshot -enable-kvm \
>>  -monitor unix:/tmp/m,server,nowait -device pci-bridge,chassis_nr=1 \
>>  -boot menu=on -m 4G -trace events=t ws2012r2x64dc.img \
>>  -device ivshmem,id=foo,size=2M,shm,bus=pci.1,addr=01
>>
>> Wait until the OS boots, and note the BARs programmed for ivshmem;
>> in my case it was
>>    01:01.0 0,0xfe800000+0x100
>> Then execute the script and watch the pci_update_mappings* trace events:
>>
>> # for i in $(seq 3 18); do printf -- "device_add e1000,bus=pci.1,addr=%x\n" $i | nc -U /tmp/m; sleep 5; done;
>>
>> Hotplugging e1000,bus=pci.1,addr=12 triggers rebalancing: Windows unmaps
>> all BARs of the NICs on the bridge, but doesn't touch ivshmem, and then
>> programs new BARs, where
>>   pci_update_mappings_add d=0x7fa02ff0cf90 01:11.0 0,0xfe800000+0x20000
>> creates a BAR that overlaps ivshmem's BAR0 (01:01.0 0,0xfe800000+0x100).
> 
> Michael informed me of this on IRC (and forwarded this email to me). I hope 
> this response starts a new thread. (I also rewrote the subject entirely.)
> 
> So, to summarize what I said on IRC first. The situation where firmware 
> recognizes and enables a PCI device, hands control to the OS, and then the OS 
> lacks a driver for the PCI device, is completely normal and expected. For 
> UEFI specifically, I can name a general argument and a specific argument.
> 
> The general argument is that actions that need to be taken in 
> ExitBootServices() callbacks do not include clearing IO or MMIO decode bits 
> in PCI device command registers. Command register manipulation happens when a 
> PCI device driver (that conforms to the UEFI driver model) *binds* or 
> *unbinds* a device. And unbinding a device is not possible in an 
> ExitBootServices() callback, if only because such callbacks are forbidden 
> from modifying the memory map, whereas unbinding would release allocated memory.
> 
> So what we use such callbacks for is aborting in-flight, outstanding DMA-like 
> transfers. Resetting virtio devices is also an example (think outstanding 
> receive requests for virtio-net).
> 
> Now let's move on to the specific argument I mentioned above. The Graphics 
> Output Protocol (GOP) is a UEFI abstraction that was specifically designed 
> for the case when the operating system doesn't (yet) have a display driver 
> installed, but the user obviously has to use the display somehow. 
> The GOP is most frequently provided on top of an EFI_PCI_IO_PROTOCOL 
> instance; meaning simply that the "GOP driver" is a UEFI driver that drives a 
> PCI device. In short, the driver provides the GOP on top of a PCI device.
> 
> Now, the GOP is supposed to communicate the pixel format and the frame buffer 
> base address for the currently active graphics mode to the software that 
> consumes the GOP. This includes UEFI applications of course (think a boot 
> loader putting up a splash screen or an animation), but importantly, the 
> runtime OS is *also* supposed to inherit these characteristics from boot 
> services time. The OS can then use simple unaccelerated MMIO writes to 
> display things on the screen, until the user installs an accelerated driver.
> 
> (Concrete example: this is why you can see *anything at all* on the screen, 
> when you run e.g. Windows Server 2012 R2 on top of OVMF and a QXL display, 
> before installing the QXL WDDM driver in the guest.)
> 
> Clearly, the frame buffer base address communicated through the GOP points 
> into one of the MMIO BARs of the PCI device. If, at ExitBootServices(), MMIO 
> decoding were disabled for the PCI device that underlies the GOP, that would 
> *completely* defeat the GOP design. The OS's attempt to poke at those MMIO 
> addresses would be futile -- and in fact the OS has no idea what PCI device 
> (if any) the framebuffer is supposed to be related to. This is the 
> jurisdiction of the OS-level display driver -- if one exists and is installed.
> 
> So, this is a Windows bug in my opinion. Even if there is no OS-level 
> driver, a PCI device is fully expected to be decoding resources, if the 
> firmware brought it up.
> 
> --*--
> 
> Okay, so Michael asked me to try to reproduce the above with OVMF, and see 
> what happens. Unfortunately I'm not really knowledgeable about ivshmem, 
> hotplug, et cetera. Let me instead tell Igor about using OVMF.
> 
> (1) Please follow the instructions on Gerd's page 
> <https://www.kraxel.org/repos/>, and install the "edk2.git-ovmf-x64" package.
> 
> (2) Create a separate directory for testing. In this directory, run the 
> following command:
> 
>   cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd myvars.fd
> 
> Also create a disk image for your new guest, etc.
> 
> (3) Use the following command line snippet to work with OVMF:
> 
>      qemu-system-x86_64 \
>        -machine accel=kvm \
>        -smp cpus=2 \
>        -m 2048 \
>        \
>        -debugcon file:ovmf.debug.log \
>        -global isa-debugcon.iobase=0x402 \
>        \
>        -device qxl-vga \
>        \
>        -drive if=pflash,format=raw,unit=0,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
>        -drive if=pflash,format=raw,unit=1,file=myvars.fd \
>        \
>        [your options here]
> 
> You can of course customize the # of VCPUs, memory size, disks, CD-ROMs, 
> network, and so on.
> 
> Recommended: when you use the -device option to add the disk and the 
> CD-ROM(s) to install the OS (and driver(s)) from, be sure to use the 
> "bootindex" property; OVMF will adhere to the resulting boot order. Set 
> bootindex=0 for your main disk, bootindex=1 for your OS installer 
> CD-ROM, and *no* bootindex for your virtio-win driver disk. This way at first 
> boot (with no OS installed) OVMF will boot the installer CD-ROM. Further 
> boots (with the same command line) will boot the installed OS.
> 
> Caveat: I never used the -snapshot option with OVMF virtual machines; it 
> might or might not work.
> 
> Caveat #2: I tested simple PCI hotplug and hot-unplug with Windows 
> running on OVMF many months ago, but I can't tell offhand whether it works 
> right now.

I should also mention that you might not be able to reproduce the same
situation with the "ivshmem" device. Namely, if there is no UEFI driver
for that PCI device (and OVMF certainly doesn't have one), then its MMIO
and IO decoding bits will *never* be set. As I said, command register
massaging is the jurisdiction of the individual UEFI driver that
ultimately binds the device -- and OVMF has no UEFI driver for ivshmem.

Therefore you should probably try to reproduce the issue with another
PCI device type that OVMF has a driver for, but Windows has none
(installed at least). I'm quite hard pressed to name such a device type,
unfortunately. :(

Perhaps one of the more obscure emulated NICs could work in place of
ivshmem. (The iPXE option ROMs provide UEFI drivers for those.)
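With a dozen-plus hotplugged NICs, scanning the pci_update_mappings* trace
for overlaps by eye gets tedious. Here is a minimal bash (4+) sketch that
does it mechanically. The helper name check_bar_overlaps is hypothetical,
and it assumes each trace line ends in the "d=<ptr> BB:SS.F BAR,0xBASE+0xSIZE"
shape quoted above, possibly after some prefix. It keeps a table of
currently mapped BARs (an add event inserts an entry, a del event removes
it) and reports each newly added BAR whose range intersects one that is
still mapped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_bar_overlaps
# Reads QEMU trace lines on stdin and prints one "overlap: ..." line for
# every pci_update_mappings_add whose range intersects a still-mapped BAR.
check_bar_overlaps() {
    # key "BB:SS.F/BARn" -> base / size of a currently mapped BAR (decimal)
    local -A base=() size=()
    local line event ptr bdf barspec bar range b s key other ob os
    while IFS= read -r line; do
        case $line in
            *pci_update_mappings_add*) event=add ;;
            *pci_update_mappings_del*) event=del ;;
            *) continue ;;            # ignore unrelated trace events
        esac
        # Drop everything up to the event name (e.g. a "pid@timestamp:"
        # prefix), then split " d=<ptr> BB:SS.F BAR,0xBASE+0xSIZE".
        line=${line#*pci_update_mappings_"$event"}
        read -r ptr bdf barspec <<< "$line"
        bar=${barspec%%,*}
        range=${barspec#*,}
        b=$(( ${range%%+*} ))         # hex base -> decimal
        s=$(( ${range#*+} ))          # hex size -> decimal
        key="$bdf/BAR$bar"
        if [[ $event == del ]]; then
            unset "base[$key]" "size[$key]"
            continue
        fi
        for other in "${!base[@]}"; do
            [[ $other == "$key" ]] && continue
            ob=${base[$other]}
            os=${size[$other]}
            # Half-open ranges [b, b+s) and [ob, ob+os) intersect.
            if (( b < ob + os && ob < b + s )); then
                printf 'overlap: %s [0x%x+0x%x] with %s [0x%x+0x%x]\n' \
                    "$key" "$b" "$s" "$other" "$ob" "$os"
            fi
        done
        base[$key]=$b
        size[$key]=$s
    done
}
```

Usage would be something like "check_bar_overlaps < trace.log", assuming
the trace events were captured into trace.log (e.g. by redirecting QEMU's
stderr, if your trace backend logs there). The sketch does not distinguish
I/O BARs from memory BARs, so an I/O-vs-memory "overlap" would be noise.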

Thanks
Laszlo
