Re: [kvm-devel] FW: KVM Test result, kernel 4a7f582.., userspace bc6db37..
On Monday 03 March 2008 15:42:09 Yang, Sheng wrote:
On Friday 29 February 2008 20:53:41 Zhao, Yunfeng wrote:
Zhao, Yunfeng wrote:
Hi, all,
This is today's KVM test result against kvm.git 4a7f582a07e14763ee4714b681e98b3b134d1d46 and kvm-userspace.git bc6db37817ce749dcc88fbc761a36bb8df5cf60a.
The LTP and kernel-build tests on the PAE Linux guest failed, because those cases boot guests with an SMP 2.6.9 kernel; this is related to today's new issue. In manual testing, save/restore had no problem the first time, and the save/restore test cases passed. Because the command has changed, they failed in automated testing. We will update the test cases.
One new issue:
1. Cannot boot guests with a 2.6.9 SMP PAE kernel
https://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
We suspect this issue is caused by this commit:
kvm: bios: mark extra cpus as present
kvm-userspace: 538c90271b9431f8c7f2ebfdffdab07749b97d86
It looks like a bug in the RHEL4 kernel (2.6.9) in supporting vcpus >= 16. I reduced the VCPU number to 15, and it works fine. I found the related description on Oracle's website:
http://oss.oracle.com/pipermail/el-errata/2007-May/000154.html
[2.6.9-52] - fix boot panic on >=16 cpu on Intel microcore platforms (Brian Maly) [221479]
(I tried Oracle's bugzilla for details, but failed...)
I suggest reducing the VCPU number to 15; I remember we did a similar thing before in the ACPI table, for booting Windows 2000.
It seems Glauber Costa has a better fix on SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
The patch works well on my side. :)
--
Thanks
Yang, Sheng
-
This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
[kvm-devel] ncurses support
Hi, ncurses support has been added recently to the QEMU CVS. Would it be possible to update KVM from the latest QEMU CVS to add ncurses support to KVM?
Thanks,
Aurelien
--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73 : :' : Debian developer | Electrical Engineer `. `' [EMAIL PROTECTED] | [EMAIL PROTECTED] `- people.debian.org/~aurel32 | www.aurel32.net
[kvm-devel] [ kvm-Bugs-1906189 ] All SMP guests often halt
Bugs item #1906189, was opened at 2008-03-03 13:33
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906189&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: All SMP guests often halt

Initial Comment:
All SMP configurations are very unstable - both on Intel and AMD. KVM-62.

Symptoms: guests often soft-lockup or, more precisely, slow down to unacceptable speeds. Guests may hard-lockup completely, or even BSOD in some cases.

I have tried:
Windows 2000
Windows XP
Windows Server 2003
Windows Server 2008

KVM keeps logging what looks like a loop:
=
[EMAIL PROTECTED] win2000-Pro]$ dmesg | tail -n40
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x10
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x10
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x10
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
=
(gdb) bt
#0  0x003a016c9aa7 in ioctl () from /lib64/libc.so.6
#1  0x0051bb29 in kvm_run (kvm=0x2a9b040, vcpu=0) at libkvm.c:850
#2  0x004fda86 in kvm_cpu_exec (env=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:127
#3  0x004fe5d5 in kvm_main_loop_cpu (env=0x2b56490) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:307
#4  0x004110fd in main (argc=44675488, argv=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/vl.c:7862
=
kvm statistics
efer_reload          103701       0
exits             512480997   20642
fpu_reload         24781662     799
halt_exits          1824249     170
halt_wakeup          828699      68
host_state_reload 495932451617
hypercalls                0       0
insn_emulation    389188282   14239
insn_emulation_fail    1110       0
invlpg                    0       0
io_exits           28855411     928
irq_exits      191313613248
irq_window                0       0
largepages                0       0
mmio_exits         16078802       0
mmu_cache_miss      4219404     415
mmu_flooded         4110773     410
mmu_pde_zapped       499335       6
mmu_pte_updated 103816391327
mmu_pte_write   145679441737
mmu_recycled          17419       0
mmu_shadow_zapped   4372079     410
=
-Alexey, 03.03.2008.

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906189&group_id=180599
[kvm-devel] [ kvm-Bugs-1906204 ] AMD NPT causes performance degradation
Bugs item #1906204, was opened at 2008-03-03 13:45
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906204&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: AMD NPT causes performance degradation

Initial Comment:
Platform: F7/x64, AMD Barcelona K10, KVM-61.
Guest: Windows XP SP2.

By default, the new Nested Page Tables (NPT) support is enabled, which should accelerate guests. While it *does* accelerate guests in some areas - particularly guest OS setup time, which dropped by 20%, which is great - in other areas I see performance degradation. For example, Passmark PerformanceTest v6.1 shows:
2D Graphics Marks: 78.6 (without NPT)
2D Graphics Marks: 18.9 (with NPT)

NPT was disabled using:
# rmmod kvm-amd
# modprobe kvm-amd npt=0
# dmesg | tail

All graphics feel more sluggish and much slower. I have used SDL rendering.

-Alexey, 03.03.2008.

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906204&group_id=180599
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. If we can agree on the API, then I don't see any reason why it can't go into 2.6.25, unless someone wants more time to review it (but the 2.6.25 release should still be quite far away, so there should be quite a bit of time). Cool! ;)
[kvm-devel] [PATCH] Virtio network device migration support
Virtio network device migration support

It is composed of state saving and dirty-bit tracking. Added dirty-bit tracking for the rx packets. There is no need to set dirty bits for the outgoing packets since we do not write over guest memory. As for the descriptor ring (guest memory), I'd rather copy the entire ring (3 pages) when saving state than touch the dirty bits every time in the fast path. Besides that, the virtio device, PCI bus state and the network device state are saved.

Signed-off-by: Dor Laor [EMAIL PROTECTED]

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index eb2a441..612cf6b 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -128,6 +128,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     hdr = (void *)elem.in_sg[0].iov_base;
     hdr->flags = 0;
     hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
+    cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[0].iov_base - (ram_addr_t)phys_ram_base);

     /* copy in packet. ugh */
     offset = 0;
@@ -136,6 +137,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
         int len = MIN(elem.in_sg[i].iov_len, size - offset);
         memcpy(elem.in_sg[i].iov_base, buf + offset, len);
         offset += len;
+        cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[i].iov_base - (ram_addr_t)phys_ram_base);
         i++;
     }
@@ -210,6 +212,8 @@ again:
         else
             fprintf(stderr, "reading network error %d", len);
     }
+    cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[1].iov_base - (ram_addr_t)phys_ram_base);
+    cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[0].iov_base - (ram_addr_t)phys_ram_base);
     virtqueue_push(vnet->rx_vq, &elem, sizeof(*hdr) + len);
     vnet->do_notify = 1;
 }
@@ -281,11 +285,52 @@ static void virtio_net_tx_timer(void *opaque)
     virtio_net_flush_tx(n, n->tx_vq);
 }

+static void virtio_net_save(QEMUFile *f, void *opaque)
+{
+    VirtIONet *n = opaque;
+
+    pci_device_save(&n->vdev.pci_dev, f);
+    qemu_put_buffer(f, n->mac, sizeof n->mac);
+    qemu_put_be32s(f, &n->can_receive);
+
+    virtio_dev_save(f, &n->vdev);
+}
+
+static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
+{
+    VirtIONet *n = opaque;
+    int ret;
+
+    if (version_id < 1) {
+        fprintf(stderr, "%s: not supporting version < 1\n", __FUNCTION__);
+        return -1;
+    }
+
+    if ((ret = pci_device_load(&n->vdev.pci_dev, f)) < 0)
+        return ret;
+
+    qemu_get_buffer(f, n->mac, sizeof n->mac);
+    qemu_get_be32s(f, &n->can_receive);
+
+    if ((ret = virtio_dev_load(f, &n->vdev, version_id)) < 0)
+        return ret;
+
+    /* Make sure we kick the tx */
+    qemu_mod_timer(n->tx_timer,
+                   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
+    n->tx_timer_active = 1;
+
+    return 0;
+}
+
 void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
 {
     VirtIONet *n;
+    const char *info_str = "virtio-net";

-    n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", 6900, 0x1000,
+    n = (VirtIONet *)virtio_init_pci(bus, info_str, 6900, 0x1000,
                                      0, VIRTIO_ID_NET,
                                      0x02, 0x00, 0x00,
                                      6, sizeof(VirtIONet));
@@ -308,5 +353,11 @@ void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);
     n->tx_timer_active = 0;

+    snprintf(n->vc->info_str, sizeof(n->vc->info_str),
+             "%s macaddr=%02x:%02x:%02x:%02x:%02x:%02x", info_str,
+             nd->macaddr[0], nd->macaddr[1], nd->macaddr[2],
+             nd->macaddr[3], nd->macaddr[4], nd->macaddr[5]);
+    register_savevm(info_str, 1, 1, virtio_net_save, virtio_net_load, n);
+
     return &n->vdev;
 }
diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
index 634f869..69fe810 100644
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -180,6 +180,59 @@ void virtio_reset(void *opaque)
     }
 }

+void virtio_dev_save(QEMUFile *f, VirtIODevice *vdev)
+{
+    int i;
+
+    qemu_put_be32s(f, &vdev->features);
+    qemu_put_be16s(f, &vdev->queue_sel);
+    qemu_put_8s(f, &vdev->status);
+    qemu_put_8s(f, &vdev->isr);
+
+    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+        if (!vdev->vq[i].vring.num)
+            continue;
+        qemu_put_be32s(f, &vdev->vq[i].pfn);
+        qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+        qemu_put_be32s(f, &vdev->vq[i].index);
+
+        /* Save the descriptor ring instead of constantly marking it dirty */
+        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.desc, vdev->vq[i].vring.num * sizeof(VRingDesc));
+        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.avail, TARGET_PAGE_SIZE);
+        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.used, TARGET_PAGE_SIZE);
+    }
+}
+
+int virtio_dev_load(QEMUFile *f, VirtIODevice *vdev, int version_id)
+{
+    int i;
+
+    if (version_id < 1)
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to be more like a TLB model)?
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Yes, it's just a GRU problem; it tries to optimize performance by calling follow_page only in the fast path, and falls back to get_user_pages()/put_page() in the slow path. xpmem could also send the message in _begin and wait for the message in _end, to reduce the wait time. But if you force GRU to call get_user_pages only (like KVM does), then _begin can be removed. In theory we could also optimize KVM to use follow_page only if the pte is already established. I'm not sure how worthwhile that optimization would be, though. However, note that Quadrics also had one callback before and one after, so they may be using the callback before for similar optimizations. But functionality-wise, _end is the only required bit if everyone takes refcounts like KVM and XPMEM do.
[kvm-devel] [ kvm-Bugs-1906272 ] Debian 4 fails to boot on KVM-AMD
Bugs item #1906272, was opened at 2008-03-03 15:53
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906272&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Debian 4 fails to boot on KVM-AMD

Initial Comment:
Host: AMD Barcelona K10, F7/x64, KVM-62.
Guest: Debian 4 (32-bit).

Problem: When installing Debian 4 (32-bit) on a KVM-AMD host, it installs the k7 kernel by default, and the resulting image is not bootable. It can be booted only with -no-kvm. This problem can be avoided by installing on a KVM-Intel host, where Debian guests get the i686 kernel. This is because Debian's installer checks CPUID and installs the kernel that best matches the current CPU. That kernel (i686) can be booted on kvm-intel or kvm-amd without problems.

Perhaps KVM-AMD doesn't emulate something K7-specific (3DNow!?). I don't know what the best solution to this problem is, but I think using a custom CPUID when installing Debian guests might do it. Any ideas?

===
(gdb) bt
#0  0x003dd02c9117 in ioctl () from /lib64/libc.so.6
#1  0x0051bb29 in kvm_run (kvm=0x2a9b040, vcpu=0) at libkvm.c:850
#2  0x004fda86 in kvm_cpu_exec (env=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:127
#3  0x004fe5d5 in kvm_main_loop_cpu (env=0x2b82bb0) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:307
#4  0x004110fd in main (argc=44675488, argv=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/vl.c:7862
===
kvm statistics
efer_reload               0       0
exits              11387804  324872
fpu_reload          1340894     296
halt_exits                0       0
halt_wakeup               0       0
host_state_reload   1340931     295
hypercalls                0       0
insn_emulation     10053389  323534
insn_emulation_fail 10009814 323534
invlpg                    0       0
io_exits        13347571004
irq_exits                 0       0
irq_window                0       0
largepages                0       0
mmio_exits            27231       0
mmu_cache_miss           20       0
mmu_flooded               0       0
mmu_pde_zapped            0       0
mmu_pte_updated           0       0
mmu_pte_write             0       0
mmu_recycled              0       0

-Alexey, 3.3.2008.

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906272&group_id=180599
Re: [kvm-devel] ncurses support
Aurelien Jarno wrote: Hi, ncurses support has been added recently to the QEMU CVS. Would it be possible to update KVM from the latest QEMU CVS to add ncurses support to KVM? I've merged qemu-cvs, will push once it passes regression tests.
--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best).
[kvm-devel] KVM architecture docs
Hello, I'm interested in learning the technical details of KVM, ideally up to date with the latest versions (KVM changes so fast!). I'm not really geared toward development (yet); rather, I would like to study its architecture from a security point of view. I've searched everywhere and all I could find was some basic/marketing stuff, or simple white papers explaining HW virtualization. The wiki currently has some details, but none of them are satisfying enough for my needs :-) If you're familiar with Xen, I'm looking for the KVM equivalents of the following docs:
http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf
http://wiki.xensource.com/xenwiki/XenArchitecture?action=AttachFile&do=get&target=Xen+Architecture_Q1+2008.pdf
Does anything like that exist, or should I go the long way and study the QEMU and KVM sources?
Thanks,
- Alessandro Sardo
[kvm-devel] use smp_cpus as lapic id
apic is not acpi, although both are acronyms. Due to a confusion of mine, those things were mixed up, leading to the bug reported at
https://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
This patch fixes it, by assigning smp_cpus instead of MAX_CPUS to the I/O APIC id in the MP and ACPI APIC tables.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 77e71ac..af18390 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -983,7 +983,7 @@
 #endif
     putstr(&q, "ISA   ");

     /* ioapic */
-    ioapic_id = MAX_CPUS;
+    ioapic_id = smp_cpus;
     putb(&q, 2); /* entry type = I/O APIC */
     putb(&q, ioapic_id); /* apic ID */
     putb(&q, 0x11); /* I/O APIC version number */
@@ -1427,7 +1427,7 @@
 #endif
     io_apic = (void *)apic;
     io_apic->type = APIC_IO;
     io_apic->length = sizeof(*io_apic);
-    io_apic->io_apic_id = MAX_CPUS;
+    io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec00000);
     io_apic->interrupt = cpu_to_le32(0);
diff --git a/qemu/pc-bios/bios.bin b/qemu/pc-bios/bios.bin
index 64b7abb..3a75ff5 100644
Binary files a/qemu/pc-bios/bios.bin and b/qemu/pc-bios/bios.bin differ
Re: [kvm-devel] [PATCH 4/8] KVM: MMU: hypercall based pte updates and TLB flushes
Hi Avi,

Looks nice.

On Sun, Mar 02, 2008 at 06:31:17PM +0200, Avi Kivity wrote:
> +int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
> +                  gpa_t addr, unsigned long *ret)
> +{
> +    int r;
> +    struct kvm_pv_mmu_op_buffer buffer;

Perhaps this structure is a little large to be on the stack.

> +    down_read(&current->mm->mmap_sem);
> +    down_read(&vcpu->kvm->slots_lock);

The order should be slots_lock, then mmap_sem. This needs a comment in the code.
Re: [kvm-devel] [PATCH] Virtio network device migration support
Hi Dor,

Dor Laor wrote:
> void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
> {
>     VirtIONet *n;
> +    const char *info_str = "virtio-net";
>
> -    n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", 6900, 0x1000,
> +    n = (VirtIONet *)virtio_init_pci(bus, info_str, 6900, 0x1000,
>                                      0, VIRTIO_ID_NET,
>                                      0x02, 0x00, 0x00,
>                                      6, sizeof(VirtIONet));
> @@ -308,5 +353,11 @@ void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
>     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);
>     n->tx_timer_active = 0;
>
> +    snprintf(n->vc->info_str, sizeof(n->vc->info_str),
> +             "%s macaddr=%02x:%02x:%02x:%02x:%02x:%02x", info_str,
> +             nd->macaddr[0], nd->macaddr[1], nd->macaddr[2],
> +             nd->macaddr[3], nd->macaddr[4], nd->macaddr[5]);
> +    register_savevm(info_str, 1, 1, virtio_net_save, virtio_net_load, n);

I think we need to maintain an instance id and increment it here, like we do for the rest of the network cards.

>     return &n->vdev;
> }
> diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
> index 634f869..69fe810 100644
> --- a/qemu/hw/virtio.c
> +++ b/qemu/hw/virtio.c
> @@ -180,6 +180,59 @@ void virtio_reset(void *opaque)
>     }
> }
>
> +void virtio_dev_save(QEMUFile *f, VirtIODevice *vdev)
> +{
> +    int i;
> +
> +    qemu_put_be32s(f, &vdev->features);
> +    qemu_put_be16s(f, &vdev->queue_sel);
> +    qemu_put_8s(f, &vdev->status);
> +    qemu_put_8s(f, &vdev->isr);
> +
> +    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> +        if (!vdev->vq[i].vring.num)
> +            continue;
> +        qemu_put_be32s(f, &vdev->vq[i].pfn);
> +        qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
> +        qemu_put_be32s(f, &vdev->vq[i].index);
> +
> +        /* Save the descriptor ring instead of constantly marking it dirty */
> +        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.desc, vdev->vq[i].vring.num * sizeof(VRingDesc));
> +        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.avail, TARGET_PAGE_SIZE);
> +        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.used, TARGET_PAGE_SIZE);

I think these two need to be sizeof(VRingAvail) * vring.num and sizeof(VRingUsed) * vring.num.

Regards,

Anthony Liguori
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 09:18:59AM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). 
Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). It is going to be really easy to add more weird and wonderful notifiers later that deviate from our standard TLB model. It would be much harder to remove them. So I really want to see everyone conform to this model first. Numbers and comparisons can be brought out afterwards if people want to attempt to make such changes.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 09:18:59AM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. The range invalidates have a performance advantage for the GRU. TLB invalidates on the GRU are relatively slow (usec) and interfere somewhat with the performance of other active GRU instructions. Invalidating a large chunk of addresses with a single GRU TLBINVAL operation is much faster than issuing a stream of single-page TLBINVALs. I expect this performance advantage will also apply to other users of mmuops. Is this just a GRU problem?
Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). Functionally, the GRU is very close to what I would consider to be the standard TLB model. Dropins and flushes map closely to processor dropins and flushes for cpus. The internal structure of the GRU TLB is identical to the TLB of existing cpus. Requiring the GRU driver to track dropins with long-term page references seems to me a deviation from having the basic mmuops support a standard TLB model. AFAIK, no other processor requires this. Tracking TLB dropins (and long-term page references) could be done, but it adds significant complexity and scaling issues. The size of the tables to track many TB (to PB) of memory can get large. If the memory is being referenced by highly threaded applications, then the problem becomes even more complex. Either tables must be replicated per-thread (and require even more memory), or the table structure becomes even more complex to deal with node locality, cacheline bouncing, etc. Try to avoid a requirement to track dropins with long-term page references. It is going to be really easy to add more weird and wonderful notifiers later that deviate from our standard TLB model. It would be much harder to remove them. 
So I really want to see everyone conform to this model first. Agree.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
Jack Steiner wrote: The range invalidates have a performance advantage for the GRU. TLB invalidates on the GRU are relatively slow (usec) and interfere somewhat with the performance of other active GRU instructions. Invalidating a large chunk of addresses with a single GRU TLBINVAL operation is much faster than issuing a stream of single-page TLBINVALs. I expect this performance advantage will also apply to other users of mmuops. In theory this would apply to kvm as well (coalesce tlb flush IPIs, lookup shadow page table once), but is it really a fast path? What triggers range operations for your use cases? -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 08:09:49PM +0200, Avi Kivity wrote: Jack Steiner wrote: The range invalidates have a performance advantage for the GRU. TLB invalidates on the GRU are relatively slow (usec) and interfere somewhat with the performance of other active GRU instructions. Invalidating a large chunk of addresses with a single GRU TLBINVAL operation is much faster than issuing a stream of single-page TLBINVALs. I expect this performance advantage will also apply to other users of mmuops. In theory this would apply to kvm as well (coalesce tlb flush IPIs, lookup shadow page table once), but is it really a fast path? What triggers range operations for your use cases? Although not frequent, an unmap of a multi-TB object could be quite painful if each page were invalidated individually instead of with one invalidate for the entire range. This is even worse if the application is threaded and the object has been referenced by many GRUs (there are 16 GRU ports per node - each potentially has to be invalidated). Forks (again, not frequent) would be another case.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 12:06:05PM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote: Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). Functionally, the GRU is very close to what I would consider to be the standard TLB model. Dropins and flushes map closely to processor dropins and flushes for cpus. The internal structure of the GRU TLB is identical to the TLB of existing cpus. Requiring the GRU driver to track dropins with long-term page references seems to me a deviation from having the basic mmuops support a standard TLB model. AFAIK, no other processor requires this. That is because the CPU TLBs have the mmu_gather batching APIs which avoid the problem. It would be possible to do something similar for GRU, which would involve taking a reference for each page-to-be-invalidated in invalidate_page, and releasing them when you invalidate_range. Or else do some other scheme which makes mmu notifiers work similarly to the mmu gather API. But not just go and invent something completely different in the form of this invalidate_begin, clear linux pte, invalidate_end API. Tracking TLB dropins (and long-term page references) could be done, but it adds significant complexity and scaling issues. The size of the tables to track many TB (to PB) of memory can get large. 
If the memory is being referenced by highly threaded applications, then the problem becomes even more complex. Either tables must be replicated per-thread (and require even more memory), or the table structure becomes even more complex to deal with node locality, cacheline bouncing, etc. I don't think it would be that significant in terms of complexity or scaling. For a quick solution, you could stick a radix tree in each of your mmu notifiers registered (ie. one per mm), which is indexed on virtual address >> PAGE_SHIFT, and returns the struct page *. Size is no different than page tables, and locking is pretty scalable. After that, I would really like to see whether the numbers justify larger changes.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, 3 Mar 2008, Nick Piggin wrote: I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. Isn't this more a job for paravirt ops if it is so tightly bound to page tables? Are we not adding another similar API? If we can agree on the API, then I don't see any reason why it can't go into 2.6.25, unless someone wants more time to review it (but 2.6.25 release should be quite far away still so there should be quite a bit of time). The API still has rcu issues, and the example given for making things sleepable only works for the aging callback. The most important callbacks are for try_to_unmap and page_mkclean. This means the API is still not generic enough and likely not extendable as needed in its present form.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, 3 Mar 2008, Nick Piggin wrote: It is going to be really easy to add more weird and wonderful notifiers later that deviate from our standard TLB model. It would be much harder to remove them. So I really want to see everyone conform to this model first. Numbers and comparisons can be brought out afterwards if people want to attempt to make such changes. Still do not see how that could be done. The model here is tightly bound to ptes. AFAICT this could be implemented in arch code like the paravirt ops.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, 3 Mar 2008, Nick Piggin wrote: Move definition of struct mmu_notifier and struct mmu_notifier_ops under CONFIG_MMU_NOTIFIER to ensure they don't get dereferenced when they don't make sense. The callbacks take a mmu_notifier parameter. So how does this compile for !MMU_NOTIFIER?
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Mon, 3 Mar 2008, Nick Piggin wrote: Your skeleton is just registering notifiers and saying /* you fill the hard part in */ If somebody needs a skeleton in order just to register the notifiers, then almost by definition they are unqualified to write the hard part ;) It's also providing a locking scheme. OK, there are ways to solve it or hack around it. But this is exactly why I think the implementations should be kept separate. Andrea's notifiers are coherent, work on all types of mappings, and will hopefully match closely the regular TLB invalidation sequence in the Linux VM (at the moment it is quite close, but I hope to make it a bit closer) so that it requires almost no changes to the mm. Then put it into the arch code for TLB invalidation. Paravirt ops gives good examples on how to do that. What about a completely different approach... XPmem runs over NUMAlink, right? Why not provide some non-sleeping way to basically IPI remote nodes over the NUMAlink where they can process the invalidation? If your intra-node cache coherency has to run over this link anyway, then presumably it is capable. There is another Linux instance at the remote end that first has to remove its own ptes. Also would not work for Infiniband and other solutions. All the approaches that require evictions in an atomic context limit the approach and do not allow the generic functionality that we want in order to not add alternate APIs for this. Or another idea, why don't you LD_PRELOAD in the MPT library to also intercept munmap, mprotect, mremap etc as well as just fork()? That would give you similarly good-enough coherency as the mmu notifier patches, except that you can't swap (which Robin said was not a big problem). The good-enough solution right now is to pin pages by elevating refcounts.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 07:45:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 12:06:05PM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote: Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). Functionally, the GRU is very close to what I would consider to be the standard TLB model. Dropins and flushes map closely to processor dropins and flushes for cpus. The internal structure of the GRU TLB is identical to the TLB of existing cpus. Requiring the GRU driver to track dropins with long-term page references seems to me a deviation from having the basic mmuops support a standard TLB model. AFAIK, no other processor requires this. That is because the CPU TLBs have the mmu_gather batching APIs which avoid the problem. It would be possible to do something similar for GRU, which would involve taking a reference for each page-to-be-invalidated in invalidate_page, and releasing them when you invalidate_range. Or else do some other scheme which makes mmu notifiers work similarly to the mmu gather API. But not just go and invent something completely different in the form of this invalidate_begin, clear linux pte, invalidate_end API. Correct. 
If the mmu_gather were passed on the mmuops callout and the callout were done at the same point as the tlb_finish_mmu(), the GRU could efficiently work w/o the range invalidates. A range invalidate might still be slightly more efficient, but not measurably so. The net difference is not worth the extra complexity of range callouts. Tracking TLB dropins (and long-term page references) could be done, but it adds significant complexity and scaling issues. The size of the tables to track many TB (to PB) of memory can get large. If the memory is being referenced by highly threaded applications, then the problem becomes even more complex. Either tables must be replicated per-thread (and require even more memory), or the table structure becomes even more complex to deal with node locality, cacheline bouncing, etc. I don't think it would be that significant in terms of complexity or scaling. For a quick solution, you could stick a radix tree in each of your mmu notifiers registered (ie. one per mm), which is indexed on virtual address >> PAGE_SHIFT, and returns the struct page *. Size is no different than page tables, and locking is pretty scalable. After that, I would really like to see whether the numbers justify larger changes. I'm still concerned about performance. Each dropin would first have to access an additional data structure that would most likely be non-node-local and non-cache-resident. The net effect would be measurable but not a killer. I haven't thought about locking requirements for the radix tree. Most accesses would be read-only; updates would be infrequent. Any chance of an RCU-based radix implementation? Otherwise, don't we add the potential for hot locks/cachelines for threaded applications?
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 11:01:22AM -0800, Christoph Lameter wrote: The API still has rcu issues, and the example given for making things sleepable only works for the aging callback. The most important callbacks are for try_to_unmap and page_mkclean. This means the API is still not generic enough and likely not extendable as needed in its present form. I converted only one of those to _notify as an example of how it should be done, because I assumed you'd volunteer to convert the other ones yourself during .26. It's useless to convert all of them right now, because the i_mmap_lock and anon_vma locks are still going to be spinlocks in .25.
[kvm-devel] [PATCH] mmu notifiers #v9
The only differences are Nick's changes (thanks Nick, nice work!) plus a fix to make it compile. About the removal of _begin: I'm not strongly opposed to it, but I personally think it's unnecessary, given that _begin avoids building new data structures with a fixed ram (and cpu) cost per _page_, and at the same time deferring _end until after the whole tlb_gather page freeing reduces the number of invalidates. .26 will allow all the methods to sleep by following the roadmap described in the #v8 patch. KVM so far is swapping fine on top of this.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -228,6 +228,9 @@ struct mm_struct {
 #ifdef CONFIG_CGROUP_MEM_CONT
 	struct mem_cgroup *mem_cgroup;
 #endif
+#ifdef CONFIG_MMU_NOTIFIER
+	struct hlist_head mmu_notifier_list;
+#endif
 };

 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,194 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/mm_types.h>
+
+struct mmu_notifier;
+struct mmu_notifier_ops;
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+struct mmu_notifier_ops {
+	/*
+	 * Called when nobody can register any more notifiers in the mm
+	 * and after the mn notifier has been disarmed already.
+	 */
+	void (*release)(struct mmu_notifier *mn,
+			struct mm_struct *mm);
+
+	/*
+	 * clear_flush_young is called after the VM is
+	 * test-and-clearing the young/accessed bitflag in the
+	 * pte. This way the VM will provide proper aging to the
+	 * accesses to the page through the secondary MMUs and not
+	 * only to the ones through the Linux pte.
+	 */
+	int (*clear_flush_young)(struct mmu_notifier *mn,
+				 struct mm_struct *mm,
+				 unsigned long address);
+
+	/*
+	 * Before this is invoked any secondary MMU is still ok to
+	 * read/write to the page previously pointed by the Linux pte
+	 * because the old page hasn't been freed yet. If required
+	 * set_page_dirty has to be called internally to this method.
+	 */
+	void (*invalidate_page)(struct mmu_notifier *mn,
+				struct mm_struct *mm,
+				unsigned long address);
+
+	/*
+	 * invalidate_range_begin() and invalidate_range_end() must be
+	 * paired. Multiple invalidate_range_begin/ends may be nested
+	 * or called concurrently.
+	 */
+	void (*invalidate_range_begin)(struct mmu_notifier *mn,
+				       struct mm_struct *mm,
+				       unsigned long start, unsigned long end);
+	void (*invalidate_range_end)(struct mmu_notifier *mn,
+				     struct mm_struct *mm,
+				     unsigned long start, unsigned long end);
+};
+
+struct mmu_notifier {
+	struct hlist_node hlist;
+	const struct mmu_notifier_ops *ops;
+};
+
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+	return unlikely(!hlist_empty(&mm->mmu_notifier_list));
+}
+
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the notifier is guaranteed to be visible to all threads.
+ */
+extern void mmu_notifier_register(struct mmu_notifier *mn,
+				  struct mm_struct *mm);
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the struct mmu_notifier can be freed. Alternatively it
+ * can be synchronously freed inside ->release when the list can't
+ * change anymore and nobody could possibly walk it.
+ */
+extern void mmu_notifier_unregister(struct mmu_notifier *mn,
+				    struct mm_struct *mm);
+
+extern void __mmu_notifier_release(struct mm_struct *mm);
+extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
+					    unsigned long address);
+extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
+					   unsigned long address);
+extern void __mmu_notifier_invalidate_range_begin(struct mm_struct *mm,
+						  unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+						unsigned long start, unsigned long end);
+
+static inline void mmu_notifier_release(struct mm_struct *mm)
+{
+	if (mm_has_notifiers(mm))
+
[kvm-devel] [PATCH] KVM swapping with mmu notifiers #v9
Notably, the registration now requires the mmap_sem in write mode.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 41962e7..e1287ab 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM && EXPERIMENTAL
 	select PREEMPT_NOTIFIERS
+	select MMU_NOTIFIER
 	select ANON_INODES
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4583329..4067b0f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -642,6 +642,110 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
 	account_shadowed(kvm, gfn);
 }

+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+	struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+	get_page(page);
+	rmap_remove(kvm, spte);
+	set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+	kvm_flush_remote_tlbs(kvm);
+	__free_page(page);
+}
+
+static void kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte, *curr_spte;
+
+	spte = rmap_next(kvm, rmapp, NULL);
+	while (spte) {
+		BUG_ON(!(*spte & PT_PRESENT_MASK));
+		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
+		curr_spte = spte;
+		spte = rmap_next(kvm, rmapp, spte);
+		kvm_unmap_spte(kvm, curr_spte);
+	}
+}
+
+void kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	int i;
+
+	/*
+	 * If mmap_sem isn't taken, we can look at the memslots with only
+	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
+	 */
+	spin_lock(&kvm->mmu_lock);
+	for (i = 0; i < kvm->nmemslots; i++) {
+		struct kvm_memory_slot *memslot = &kvm->memslots[i];
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		/* mmu_lock protects userspace_addr */
+		if (!start)
+			continue;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			kvm_unmap_rmapp(kvm, &memslot->rmap[gfn_offset]);
+		}
+	}
+	spin_unlock(&kvm->mmu_lock);
+}
+
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte;
+	int young = 0;
+
+	spte = rmap_next(kvm, rmapp, NULL);
+	while (spte) {
+		int _young;
+		u64 _spte = *spte;
+		BUG_ON(!(_spte & PT_PRESENT_MASK));
+		_young = _spte & PT_ACCESSED_MASK;
+		if (_young) {
+			young = !!_young;
+			set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK);
+		}
+		spte = rmap_next(kvm, rmapp, spte);
+	}
+	return young;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	int i;
+	int young = 0;
+
+	/*
+	 * If mmap_sem isn't taken, we can look at the memslots with only
+	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
+	 */
+	spin_lock(&kvm->mmu_lock);
+	for (i = 0; i < kvm->nmemslots; i++) {
+		struct kvm_memory_slot *memslot = &kvm->memslots[i];
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		/* mmu_lock protects userspace_addr */
+		if (!start)
+			continue;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]);
+		}
+	}
+	spin_unlock(&kvm->mmu_lock);
+
+	if (young)
+		kvm_flush_remote_tlbs(kvm);
+
+	return young;
+}
+
 #ifdef MMU_DEBUG
 static int is_empty_shadow_page(u64 *spt)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 17f9d16..b014b19 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -380,6 +380,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	int r;
 	struct page *page;
 	int largepage = 0;
+	unsigned mmu_seq;

 	pgprintk("%s: addr %lx err %x\n", __FUNCTION__, addr, error_code);
 	kvm_mmu_audit(vcpu, "pre page fault");
@@ -415,6 +416,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 			largepage = 1;
 		}
 	}
+	mmu_seq = read_seqbegin(&vcpu->kvm->arch.mmu_notifier_invalidate_lock);
 	page = gfn_to_page(vcpu->kvm, walker.gfn);
Re: [kvm-devel] [PATCH] Use spin_lock_irqsave/restore for virtio-pci
On Monday 03 March 2008 09:37:48 Anthony Liguori wrote: virtio-pci acquires its spin lock in an interrupt context so it's necessary to use spin_lock_irqsave/restore variants. This patch fixes guest SMP when using virtio devices in KVM. Signed-off-by: Anthony Liguori [EMAIL PROTECTED] Thanks, applied. Rusty.
Re: [kvm-devel] [PATCH] KVM swapping with mmu notifiers #v9
Quoting Andrea Arcangeli: Notably, the registration now requires the mmap_sem in write mode. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] [...]

+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+	struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+	get_page(page);
+	rmap_remove(kvm, spte);
+	set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+	kvm_flush_remote_tlbs(kvm);
+	__free_page(page);

I wrote to you about this before (I didn't get an answer, so I'm writing again): with large pages support, I think we need to use put_page here.

+}
[...]
Re: [kvm-devel] KVM architecture docs
http://ols.108.redhat.com/2007/Reprints/kivity-Reprint.pdf Hi Avi, I have a question about KVM architecture after reading your paper. It reads: .. At the kernel level, the kernel causes the hardware to enter guest mode. If the processor exits guest mode due to an event such as an external interrupt or a shadow page table fault, the kernel performs the necessary handling and resumes guest execution. If the exit reason is due to an I/O instruction or a signal queued to the process, then the kernel exits to userspace. .. After reading your paper, my understanding of the KVM architecture is that for a particular VM the user mode (QEMU), kernel mode, and guest mode share the same process context from the host linux kernel's point of view, right? If this is the case, see the below example: 1) a physical NIC interrupt is received on physical CPU 0, and the host kernel determines that this is a network packet targeted to the emulated NIC for a VM; 2) at the same time this VM is running in guest mode on physical CPU 1. My question is: at this time, can the host kernel *actively* interrupt the VM and make it run in user mode to handle the incoming network packet in QEMU? Or does the host kernel have to wait for the VM (because of an external interrupt, shadow page table fault, or I/O instruction) to quit guest mode, and then wait for the VM to voluntarily detect that an incoming network packet is pending and switch to user space? A further question: how does a VM detect the pending incoming network packet? In kernel space or in user space? Thanks, Forrest
[kvm-devel] job opportunity available
Hi, Since the start of 2008 our organization starts recruiting home workers willing to take part in well-paying research studies conducted by leading online businesses. Your opinion as a consumer is important for the success and profitability of many business ventures. That is why they are ready to pay generously for what you think. You can earn from $300 to $600 a week for participating in on-line surveys, focus group discussions, and product/service evaluations. Become part of our team and earn: - from $5 to $50 for participating in on-line surveys - from $20 to $120 for participating in product/service evaluations - from $30 to $155 for participating in virtual focus group discussions What's best, all you need to work with us is a computer, an Internet connection, a valid e-mail address, and good English skills. You decide when to finish your tasks and how much time to devote to each survey assignment. You can also keep this job for the rest of the year! If you want to become one of our highly valued survey takers, please write back to [EMAIL PROTECTED] and I will send you more information. Sincerely, April Johnson Assistant HR
[kvm-devel] KVM Test result, kernel daf4de3.., userspace 724f8a9.. One new issue
Hi, all, This is today's KVM test result against kvm.git daf4de30ec718b16798aba07e9f25fd9e6ba9e53 and kvm-userspace.git 724f8a940ec0e78e607c051e6e82ca2f5055b1e1. In today's testing, save/restore crashed the host once on pae/ia32e hosts.

One new issue has been found:
1. blue screen when booting 64bits windows guests
https://sourceforge.net/tracker/index.php?func=detail&aid=1906751&group_id=180599&atid=893831

Old issues:
2. Can not boot guests with 2.6.9 smp pae kernel
3. qcow based smp linux guests likely hang
https://sourceforge.net/tracker/index.php?func=detail&aid=1901980&group_id=180599&atid=893831
4. Fails to save/restore guests
https://sourceforge.net/tracker/index.php?func=detail&aid=1824525&group_id=180599&atid=893831
5. smp windows installer crashes while rebooting
https://sourceforge.net/tracker/index.php?func=detail&aid=1877875&group_id=180599&atid=893831
6. Timer of guest is inaccurate
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1826080&group_id=180599
7. Installer of 64bit vista guest will pause for ten minutes after reboot
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1836905&group_id=180599

Test environment:
Platform: Woodcrest
CPU: 4
Memory size: 8G

Details

IA32-pae:
1. boot guest with 256M memory  PASS
2. boot two windows xp guest  PASS
3. boot 4 same guest in parallel  PASS
4. boot linux and windows guest in parallel  PASS
5. boot guest with 1500M memory  PASS
6. boot windows 2003 with ACPI enabled  PASS
7. boot Windows xp with ACPI enabled  PASS
8. boot Windows 2000 without ACPI  PASS
9. kernel build on SMP linux guest  FAIL
10. LTP on SMP linux guest  FAIL
11. boot base kernel linux  PASS
12. save/restore 32-bit HVM guests  PASS
13. live migration 32-bit HVM guests  PASS
14. boot SMP Windows xp with ACPI enabled  PASS
15. boot SMP Windows 2003 with ACPI enabled  PASS
16. boot SMP Windows 2000 with ACPI enabled  PASS

IA32e:
1. boot four 32-bit guest in parallel  PASS
2. boot four 64-bit guest in parallel  PASS
3. boot 4G 64-bit guest  FAIL
4. boot 4G pae guest  PASS
5. boot 32-bit linux and 32 bit windows guest in parallel  PASS
6. boot 32-bit guest with 1500M memory  PASS
7. boot 64-bit guest with 1500M memory  FAIL
8. boot 32-bit guest with 256M memory  PASS
9. boot 64-bit guest with 256M memory  FAIL
10. boot two 32-bit windows xp in parallel  PASS
11. boot four 32-bit different guest in para  PASS
12. save/restore 64-bit linux guests  PASS
13. save/restore 32-bit linux guests  PASS
14. boot 32-bit SMP windows 2003 with ACPI enabled  PASS
15. boot 32-bit SMP Windows 2000 with ACPI enabled  PASS
16. boot 32-bit SMP Windows xp with ACPI enabled  PASS
17. boot 32-bit Windows 2000 without ACPI  PASS
18. boot 64-bit Windows xp with ACPI enabled  FAIL
19. boot 32-bit Windows xp without ACPI  PASS
20. boot 64-bit UP vista  PASS
21. boot 64-bit SMP vista  FAIL
22. kernel build in 32-bit linux guest OS  FAIL
23. kernel build in 64-bit linux guest OS  PASS
24. LTP on SMP 32-bit linux guest OS  FAIL
25. LTP on SMP 64-bit linux guest OS  PASS
26. boot 64-bit guests with ACPI enabled  PASS
27. boot 32-bit x-server  PASS
28. boot 64-bit SMP windows XP with ACPI enabled  FAIL
29. boot 64-bit SMP windows 2003 with ACPI enabled  FAIL
30. live migration 64bit linux guests  PASS
31. live migration 32bit linux guests  PASS

Report Summary on IA32-pae Summary Test Report of Last Session
Re: [kvm-devel] I/O bandwidth control on KVM
> Hi, If you are using virtio drivers in the guest (which I presume you are,
> given the reference to /dev/vda), try using the following -drive syntax:
>
> -drive file=/dev/mapper/ioband1,if=virtio,boot=on,cache=off
>
> This will force the use of O_DIRECT. By default, QEMU does not open with
> O_DIRECT, so you'll see page cache effects.

Thank you for your suggestion. I was using virtio drivers as you wrote. I just thought that kvm would use the O_DIRECT option when applications on the guest opened files with O_DIRECT. I'll try the way you mentioned and report back. Thanks, Ryo Tsuruta
[kvm-devel] [ kvm-Bugs-1906751 ] blue screen when booting 64bits windows guests
Bugs item #1906751, was opened at 2008-03-04 13:24 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906751&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: blue screen when booting 64bits windows guests

Initial Comment:
Environment:
Host: ia32e rhel5
Guest OS: ia32e windows
Change Set: kernel daf4de30ec718b16798aba07e9f25fd9e6ba9e53, userspace 724f8a940ec0e78e607c051e6e82ca2f5055b1e1
Hardware: Platform Woodcrest, CPU 4, Memory size 8G

Bug detailed description:
--
Can't boot 64-bit windows 2k3/xp guests on a 64-bit host. A blue screen appears while booting 64-bit windows. A 64-bit SMP vista guest can't boot, but UP vista can boot.

Reproduce steps:
1. create qcow image
qemu-img create -b /share/xvs/img/Windows/ia32e_vistaRTM.img -f qcow2 /share/xvs/var/tmp-img
2. boot the guest
qemu-img create -b /share/xvs/img/Windows/kvm/kvm_win2000_up_noacpi_ia32.img -f qcow2 /share/xvs/var/tmp-img_gbp25_1204570139_1
--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906751&group_id=180599
[kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)
Stripped things down and did what Andrea and I talked about last Friday. No invalidate_page callbacks. No ops anymore. Simple linked list for the notifier. No RCU. Added the code to rmap.h and rmap.c (after all, it is concerned with handling mappings).

This patch implements a simple callback for device drivers that establish their own references to pages (KVM, GRU, XPmem, RDMA/Infiniband, DMA engines, etc.). These references are unknown to the VM (therefore external). With these callbacks it is possible for the device driver to release external references when the VM requests it. This enables swapping and page migration, and allows support of remapping, permission changes, etc. for externally mapped memory. With this functionality it becomes possible to avoid pinning or mlocking pages (commonly done to stop the VM from unmapping pages).

A device driver must subscribe to a process using emm_register_notifier. The VM will then perform callbacks for operations that unmap or change permissions of pages in that address space. When the process terminates, the callback function is called with emm_release.

Callbacks are performed before and after the unmapping action of the VM:

emm_invalidate_start	before
emm_invalidate_end	after

Callbacks are mostly performed in a non-atomic context. However, in various places spinlocks are held to traverse rmaps, so this patch is only useful for devices that can remove mappings in an atomic context (e.g. KVM/GRU). If the rmap traversal spinlocks are converted to semaphores, then all callbacks will be performed in a non-atomic context; the callouts can stay where they are.
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
---
 include/linux/mm_types.h |    3 +
 include/linux/rmap.h     |   51 +
 kernel/fork.c            |    3 +
 mm/Kconfig               |    5 +++
 mm/filemap_xip.c         |    5 +++
 mm/fremap.c              |    2 +
 mm/hugetlb.c             |    4 ++
 mm/memory.c              |   32 ++--
 mm/mmap.c                |    3 +
 mm/mprotect.c            |    3 +
 mm/mremap.c              |    5 +++
 mm/rmap.c                |   72 ++-
 12 files changed, 183 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h	2008-03-03 22:54:11.961264684 -0800
+++ linux-2.6/include/linux/mm_types.h	2008-03-03 22:55:13.333569600 -0800
@@ -225,6 +225,9 @@ struct mm_struct {
 	/* aio bits */
 	rwlock_t		ioctx_list_lock;
 	struct kioctx		*ioctx_list;
+#ifdef CONFIG_EMM_NOTIFIER
+	struct emm_notifier	*emm_notifier;
+#endif
 #ifdef CONFIG_CGROUP_MEM_CONT
 	struct mem_cgroup	*mem_cgroup;
 #endif
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2008-03-03 22:54:11.993264520 -0800
+++ linux-2.6/mm/Kconfig	2008-03-03 22:55:13.337569625 -0800
@@ -193,3 +193,8 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config EMM_NOTIFIER
+	def_bool n
+	bool "External Mapped Memory Notifier for drivers directly mapping memory"
+
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c	2008-03-03 22:54:12.053265354 -0800
+++ linux-2.6/mm/mmap.c	2008-03-03 22:59:25.522848812 -0800
@@ -1747,11 +1747,13 @@ static void unmap_region(struct mm_struc
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
+	emm_notify(mm, emm_invalidate_start, start, end);
 	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
				 next ? next->vm_start : 0);
 	tlb_finish_mmu(tlb, start, end);
+	emm_notify(mm, emm_invalidate_end, start, end);
 }
 
 /*
@@ -2038,6 +2040,7 @@ void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	arch_exit_mmap(mm);
 
+	emm_notify(mm, emm_release, 0, TASK_SIZE);
 	lru_add_drain();
 	flush_cache_mm(mm);
Index: linux-2.6/mm/mprotect.c
===================================================================
--- linux-2.6.orig/mm/mprotect.c	2008-03-03 22:54:12.069264942 -0800
+++ linux-2.6/mm/mprotect.c	2008-03-03 22:55:13.337569625 -0800
@@ -21,6 +21,7 @@
 #include <linux/syscalls.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/rmap.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
@@ -198,10 +199,12 @@ success:
 		dirty_accountable = 1;
 	}
 
+	emm_notify(mm, emm_invalidate_start, start, end);
	if
[kvm-devel] [Early draft] Conversion of i_mmap_lock to semaphore
Not there yet, but the system boots and is usable. It complains about atomic contexts because the tlb functions use get_cpu() and thus disable preemption. Not sure yet what to do about the cond_resched_lock stuff etc.

Convert i_mmap_lock to i_mmap_sem

The conversion to a rw semaphore allows callbacks during rmap traversal for files in a non-atomic context. A rw-style lock also allows concurrent walking of the reverse map.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
---
 arch/x86/mm/hugetlbpage.c |    4 ++--
 fs/hugetlbfs/inode.c      |    4 ++--
 fs/inode.c                |    2 +-
 include/linux/fs.h        |    2 +-
 include/linux/mm.h        |    2 +-
 kernel/fork.c             |    4 ++--
 mm/filemap.c              |    8
 mm/filemap_xip.c          |    4 ++--
 mm/fremap.c               |    4 ++--
 mm/hugetlb.c              |   11 +--
 mm/memory.c               |   28
 mm/migrate.c              |    4 ++--
 mm/mmap.c                 |   16
 mm/mremap.c               |    4 ++--
 mm/rmap.c                 |   20 +---
 15 files changed, 51 insertions(+), 66 deletions(-)

Index: linux-2.6/arch/x86/mm/hugetlbpage.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/hugetlbpage.c	2008-03-03 22:59:25.386848427 -0800
+++ linux-2.6/arch/x86/mm/hugetlbpage.c	2008-03-03 22:59:31.174878038 -0800
@@ -69,7 +69,7 @@ static void huge_pmd_share(struct mm_str
 	if (!vma_shareable(vma, addr))
 		return;
 
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -94,7 +94,7 @@ static void huge_pmd_share(struct mm_str
 	put_page(virt_to_page(spte));
 	spin_unlock(&mm->page_table_lock);
 out:
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
 }
 
 /*
Index: linux-2.6/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hugetlbfs/inode.c	2008-03-03 22:59:25.410848010 -0800
+++ linux-2.6/fs/hugetlbfs/inode.c	2008-03-03 22:59:31.174878038 -0800
@@ -454,10 +454,10 @@ static int hugetlb_vmtruncate(struct ino
 	pgoff = offset >> PAGE_SHIFT;
 
 	i_size_write(inode, offset);
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	if (!prio_tree_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
 	truncate_hugepages(inode, offset);
 	return 0;
 }
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c	2008-03-03 22:59:25.418848099 -0800
+++ linux-2.6/fs/inode.c	2008-03-03 22:59:31.202878206 -0800
@@ -210,7 +210,7 @@ void inode_init_once(struct inode *inode
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
 	rwlock_init(&inode->i_data.tree_lock);
-	spin_lock_init(&inode->i_data.i_mmap_lock);
+	init_rwsem(&inode->i_data.i_mmap_sem);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
 	spin_lock_init(&inode->i_data.private_lock);
 	INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2008-03-03 22:59:25.430848089 -0800
+++ linux-2.6/include/linux/fs.h	2008-03-03 22:59:31.202878206 -0800
@@ -503,7 +503,7 @@ struct address_space {
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct prio_tree_root	i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
-	spinlock_t		i_mmap_lock;	/* protect tree, count, list */
+	struct rw_semaphore	i_mmap_sem;	/* protect tree, count, list */
 	unsigned int		truncate_count;	/* Cover race condition with truncate */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2008-03-03 22:59:25.442848167 -0800
+++ linux-2.6/include/linux/mm.h	2008-03-03 22:59:31.202878206 -0800
@@ -709,7 +709,7 @@ struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t last_index;			/* Highest page->index to unmap */
-	spinlock_t *i_mmap_lock;		/* For