I tested the following combinations of QEMU and kernel:

+------------------------+-----------------+-------------+
| kernel                 | QEMU            | migration   |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 | qemu-1.6.0      | GOOD        |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 | qemu-1.6.0*     | BAD         |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 | qemu-1.5.1      | BAD         |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6*| qemu-1.5.1      | GOOD        |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 | qemu-1.5.1*     | GOOD        |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 | qemu-1.5.2      | BAD         |
+------------------------+-----------------+-------------+
| kvm-3.11-2             | qemu-1.5.1      | BAD         |
+------------------------+-----------------+-------------+

NOTE:
1. kvm-3.11-2: the full kernel at that tag, downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2+kvm-kmod-3.6: our release kernel, i.e. SLES11SP2 (kernel 3.0.13-0.27) with the distribution's default kvm-kmod replaced by kvm-kmod-3.6
3. qemu-1.6.0*: qemu-1.6.0 with commit 211ea74022f51164a7729030b28eec90b6c99a08 reverted
4. kvm-kmod-3.6*: kvm-kmod-3.6 with EPT disabled
5. qemu-1.5.1*: qemu-1.5.1 with the patch below applied, deleting the qemu_madvise() statement in ram_load()
--- qemu-1.5.1/arch_init.c	2013-06-27 05:47:29.000000000 +0800
+++ qemu-1.5.1_fix3/arch_init.c	2013-08-28 19:43:42.000000000 +0800
@@ -842,7 +842,6 @@ static int ram_load(QEMUFile *f, void *o
             if (ch == 0 &&
                 (!kvm_enabled() || kvm_has_sync_mmu()) &&
                 getpagesize() <= TARGET_PAGE_SIZE) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
         } else if (flags & RAM_SAVE_FLAG_PAGE) {

If I apply the above patch to qemu-1.5.1, deleting the qemu_madvise() statement, the SLES11SP2+kvm-kmod-3.6 / qemu-1.5.1 combination tests good.

Why do we perform qemu_madvise(QEMU_MADV_DONTNEED) on those zero pages? Does qemu_madvise() have a sustained effect on the advised virtual address range, i.e. a sustained effect on VM performance? If the guest later reads and writes that range frequently after it has been advised DONTNEED, can performance degradation result? (A minimal standalone sketch of the MADV_DONTNEED semantics follows the quoted log below.)

The SLES11SP2+kvm-kmod-3.6 / qemu-1.6.0 combination is good only because of commit 211ea74022f51164a7729030b28eec90b6c99a08: if I revert that commit on qemu-1.6.0, the same combination tests bad, and the performance degradation appears there as well.

Thanks,
Zhang Haoyu

>> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log):
>> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none
>> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32 -enable-kvm
>> >>> -m 12288 -smp 4,sockets=4,cores=1,threads=1 -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5
>> >>> -no-user-config -nodefaults
>> >>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait
>> >>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown
>> >>> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>> >>> -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
>> >>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> >>> -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21
>> >>> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2
>> >>> -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23
>> >>> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4
>> >>> -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25
>> >>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5
>> >>> -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27
>> >>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6
>> >>> -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29
>> >>> -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7
>> >>> -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31
>> >>> -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9
>> >>> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
>> >>> -vnc *:0 -k en-us -vga cirrus
>> >>> -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff
>> >>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>> >>>
>> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
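Regarding the qemu_madvise() question above: on Linux, qemu_madvise(QEMU_MADV_DONTNEED) maps to madvise(MADV_DONTNEED), whose user-visible effect on anonymous memory can be checked with a small standalone program. This is only an illustrative sketch, not QEMU code:

/*
 * Sketch: madvise(MADV_DONTNEED) on an anonymous page. The advice frees
 * the page immediately; it is not "sticky". The next access takes one
 * zero-fill minor fault, after which the range behaves normally again.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    memset(p, 0xab, len);               /* populate the page */
    madvise(p, len, MADV_DONTNEED);     /* drop it, as ram_load() does for zero pages */

    printf("after DONTNEED: %#x\n", (unsigned char)p[0]);  /* 0: page was discarded */
    p[0] = 1;                           /* minor fault: a fresh zero page is mapped */
    printf("after rewrite:  %#x\n", (unsigned char)p[0]);  /* 0x1: normal access again */

    munmap(p, len);
    return 0;
}

So the advice itself has no sustained effect on the VMA; the cost when the guest later touches an advised range is the burst of refaults (and, under KVM, the corresponding secondary-MMU faults) needed to repopulate it, not a permanent penalty.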
>> >>
>> >
>> >This QEMU version is 1.0.0, but I also tested QEMU 1.5.2, and the same problem
>> >exists, including the performance degradation and the read-only GFN flooding.
>> >I also tried with e1000 NICs instead of virtio (QEMU 1.5.2); the performance
>> >degradation and the read-only GFN flooding still occur.
>> >With either e1000 or virtio NICs, the GFN flooding starts at the post-restore
>> >stage (i.e. the running stage): as soon as the restore completes, the flooding begins.
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >
>> >>--
>> >>	Gleb.
>>
>> Should we focus on the first bad commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4)
>> and the surprising GFN flooding?
>>
>Not really. There is no point in debugging a very old version compiled with kvm-kmod;
>there are too many variables in the environment. I cannot reproduce the GFN flooding on
>upstream, so the problem may be gone, may be the result of a kvm-kmod problem, or may
>come from something different in how I invoke qemu. So the best way to proceed is for
>you to reproduce it with the upstream version; then at least I will be sure that we are
>using the same code.
>
>> I applied the patch below to __direct_map():
>>
>> @@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
>>  	int pt_write = 0;
>>  	gfn_t pseudo_gfn;
>>
>> +	map_writable = true;
>> +
>>  	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>  		if (iterator.level == level) {
>>  			unsigned pte_access = ACC_ALL;
>>
>> then rebuilt the kvm-kmod and re-insmod'ed it.
>> After I started a VM, the host seemed abnormal: many programs could not be started
>> successfully, and segmentation faults were reported.
>> In my opinion, with the above patch applied, commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4
>> should have no effect, but the test result proved me wrong.
>> Does the way the map_writable value is obtained in hva_to_pfn() affect the result?
>>
>If hva_to_pfn() returns map_writable == false it means that the page is mapped read-only
>in the primary MMU, so it should not be mapped writable in the secondary MMU either. This
>should not happen usually.
>
>--
>	Gleb.
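To illustrate the last point: if the HVA range backing a guest page is mapped read-only in the primary MMU (the QEMU process page tables), even the kernel cannot store into it on the process's behalf. A minimal standalone sketch, not KVM code (reading from /dev/zero into the buffer is just a convenient way to probe writability without taking a SIGSEGV):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Stand-in for a guest RAM range whose primary-MMU mapping is read-only. */
    char *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    int fd = open("/dev/zero", O_RDONLY);
    ssize_t n = read(fd, p, 1);         /* kernel-side store into the read-only page */
    printf("store into read-only mapping: %zd (%s)\n",
           n, n < 0 ? strerror(errno) : "ok");   /* expected: -1 (Bad address) */

    close(fd);
    munmap(p, 4096);
    return 0;
}

If map_writable is forced to true for such a page, the guest gets a writable secondary-MMU mapping to memory the host still considers read-only (for example a shared copy-on-write page), which may explain why unrelated host programs started segfaulting after the __direct_map() experiment.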