Some time ago I reported an issue about a guest OS hang when a 64-bit BAR is present:

http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03189.html
http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg00413.html
Some more investigation has been done since then, so in this post I'll try to
explain why it happens and offer possible solutions.

*When the issue happens*

The issue occurs on Linux guests with kernel versions < 2.6.36. The guest OS
hangs on boot when a 64-bit PCI BAR is present in the system (if we use the
ivshmem device, for example) and its range falls within the first 4 GB.

*How to reproduce*

I used the following qemu command line to reproduce the case:

/usr/local/bin/qemu-system-x86_64 -M pc-1.3 -enable-kvm -m 2000 -smp 1,sockets=1,cores=1,threads=1 -name Rh5332 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rh5332.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc -boot cd -drive file=/home/akorolev/rh5332.img,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev file,id=charserial0,path=/home/akorolev/serial.log -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device ivshmem,shm,size=32M -device virtio-balloon-pci,id=balloon0

I tried different guests (CentOS 5.8 64-bit, RHEL 5.3 32-bit, FC 12 64-bit);
the hang occurs in 100% of cases on all of them.

*Why it happens*

The issue basically comes from the Linux PCI enumeration code. When a device
is enabled, the OS sizes a 64-bit BAR using the following procedure:

1. Write all FF's to the lower half of the 64-bit BAR
2. Write the address back to the lower half of the 64-bit BAR
3. Write all FF's to the upper half of the 64-bit BAR
4. Write the address back to the upper half of the 64-bit BAR

For qemu this means that pci_default_write_config() receives all FF's for the
lower half of the 64-bit BAR. It then applies the mask and converts the value
to "all FF's - size + 1" (0xFE000000 for a 32 MB BAR). So for a short period
of time the range [0xFE000000 - 0xFFFFFFFF] is occupied by the ivshmem
resource.
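To make the arithmetic concrete, here is a small stand-alone program (not
qemu code; the variable names are mine) showing the transient window that the
lower half of a 32 MB BAR produces during steps 1-2 above:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        uint32_t bar_size  = 32u << 20;          /* 32 MB ivshmem BAR (bar2) */
        uint32_t size_mask = ~(bar_size - 1);    /* 0xFE000000               */

        /* Step 1 writes all FF's to the lower half; the device masks the
         * written value with its size mask, so the half transiently reads
         * back as "all FF's - size + 1".                                   */
        uint32_t transient_base = 0xFFFFFFFFu & size_mask;   /* 0xFE000000  */
        uint64_t transient_end  = (uint64_t)transient_base + bar_size - 1;

        /* Prints: transient BAR window: [0xfe000000 - 0xffffffff]
         * Note that this window covers 0xfee00000, i.e. the hardcoded
         * kvm-apic-msi range discussed below.                              */
        printf("transient BAR window: [0x%08" PRIx32 " - 0x%08" PRIx64 "]\n",
               transient_base, transient_end);
        return 0;
    }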
For some reason this transient overlap is lethal for the further boot
process. We have found that the boot process breaks completely if the
kvm-apic-msi range is overlapped, even for a short period of time. (We still
don't know why this happens; we hope the qemu maintainers can answer?) The
kvm-apic-msi memory region is a non-overlappable memory region with the
hardcoded address range [0xFEE00000 - 0xFEF00000].

Here is a log we collected from render_memory_regions:

system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
++ pc.ram [0xca000 - 0xcd000] is added to view
....................
smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
pci overlap 0 pri 0 [0xa0000 - 0xc0000]
cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]
pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000]
ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
++ivshmem.bar2 [0xfec01000 - 0x100000000] is added to view
ivshmem-mmio overlap 1 pri 1 [0xfebf1000 - 0xfebf1100]
e1000-mmio overlap 1 pri 1 [0xfeba0000 - 0xfebc0000]
cirrus-mmio overlap 1 pri 1 [0xfebf0000 - 0xfebf1000]
cirrus-pci-bar0 overlap 1 pri 1 [0xfa000000 - 0xfc000000]
vga.vram overlap 1 pri 1 [0xfa000000 - 0xfa800000]
++vga.vram [0xfa000000 - 0xfa800000] is added to view
cirrus-bitblt-mmio overlap 0 pri 0 [0xfb000000 - 0xfb400000]
++cirrus-bitblt-mmio [0xfb000000 - 0xfb400000] is added to view
cirrus-linear-io overlap 0 pri 0 [0xfa000000 - 0xfa800000]
pc.bios overlap 0 pri 0 [0xfffe0000 - 0x100000000]
ram-below-4g overlap 0 pri 0 [0x0 - 0x7d000000]
pc.ram overlap 0 pri 0 [0x0 - 0x7d000000]
++pc.ram [0x0 - 0xa0000] is added to view
++pc.ram [0x100000 - 0x7d000000] is added to view
kvm-apic-msi overlap 0 pri 0 [0xfee00000 - 0xfef00000]

As you can see from the log, kvm-apic-msi is enumerated last, when the range
[0xfee00000 - 0xfef00000] is already occupied by ivshmem.bar2
[0xfec01000 - 0x100000000].

*Possible solutions*

Solution 1. Probably the best one would be to add a rule that regions which
must not be overlapped are added to the view first (in other words, regions
which must not be overlapped have the highest priority). Please find a patch
in the following message.

Solution 2. Raise the priority of the kvm-apic-msi region. This is a somewhat
misleading solution, as priority is only applicable to overlappable regions,
while this region must not be overlapped at all.

Solution 3. Fix the issue at the PCI level: track whether a resource is a
64-bit BAR and only apply changes once both halves of the BAR have been
programmed. (It appears that real PCI bus controllers on a PC are smart
enough to track 64-bit BAR writes, so qemu could do the same? The drawbacks
are that tracking PCI writes is a bit cumbersome, and such tracking may look
like a hack to some.) A rough sketch of this idea is appended below as a P.S.

Alexey
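P.S. In case it helps the discussion, below is a rough, untested sketch of how
Solution 3 could look. The structures and helper names are simplified
stand-ins of my own, not the real qemu PCI code; the point is only to
illustrate how writes to the two halves of a 64-bit BAR could be paired up
before the mapping is updated.

    #include <stdbool.h>
    #include <stdint.h>

    #define BAR_IS_64BIT (1u << 2)  /* stand-in for PCI_BASE_ADDRESS_MEM_TYPE_64 */

    typedef struct BarState {
        uint32_t flags;             /* has BAR_IS_64BIT set for 64-bit BARs      */
        bool     low_half_pending;  /* low half written, high half not yet       */
    } BarState;

    /* Hypothetical helper standing in for the real remapping code. */
    static void update_bar_mapping(BarState *bar)
    {
        (void)bar;                  /* real code would (re)map the BAR here      */
    }

    /*
     * Called from the config-space write path for BAR registers;
     * is_high_half is true when the write hits the upper 32 bits of a
     * 64-bit BAR.
     */
    static void bar_config_write(BarState *bar, bool is_high_half)
    {
        if (bar->flags & BAR_IS_64BIT) {
            if (!is_high_half) {
                /* Lower half programmed: remember it but do not remap yet,
                 * so the transient [0xFE000000 - 0xFFFFFFFF] window from the
                 * sizing procedure never becomes visible.                   */
                bar->low_half_pending = true;
                return;
            }
            /* Upper half programmed: both halves are consistent again. */
            bar->low_half_pending = false;
        }
        update_bar_mapping(bar);
    }

    int main(void)
    {
        BarState bar = { .flags = BAR_IS_64BIT };

        /* The 4-step sizing sequence from "Why it happens" above: only the
         * writes to the upper half (steps 3 and 4) trigger a remap.         */
        bar_config_write(&bar, false); /* step 1: FF's to lower half  - deferred */
        bar_config_write(&bar, false); /* step 2: address, lower half - deferred */
        bar_config_write(&bar, true);  /* step 3: FF's to upper half  - remap    */
        bar_config_write(&bar, true);  /* step 4: address, upper half - remap    */
        return 0;
    }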