Re: [kvm-devel] FW: KVM Test result, kernel 4a7f582.., userspace bc6db37..
On Monday 03 March 2008 15:42:09 Yang, Sheng wrote:
On Friday 29 February 2008 20:53:41 Zhao, Yunfeng wrote:
Zhao, Yunfeng wrote:
Hi, all,
This is today's KVM test result against kvm.git 4a7f582a07e14763ee4714b681e98b3b134d1d46 and kvm-userspace.git bc6db37817ce749dcc88fbc761a36bb8df5cf60a.
The LTP and kernel-build tests on the PAE Linux guest failed, because those cases boot guests with an SMP 2.6.9 kernel; this is related to today's new issue. In manual testing, save/restore had no problem the first time, and the save/restore test cases passed. Because the command has changed, they failed in automated testing. We will update the test cases.
One new issue:
1. Cannot boot guests with a 2.6.9 SMP PAE kernel
https://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
We suspect this issue is caused by this commit:
kvm: bios: mark extra cpus as present
kvm-userspace: 538c90271b9431f8c7f2ebfdffdab07749b97d86
It looks like a bug in the RHEL4 kernel (2.6.9) in supporting vcpus >= 16. I reduced the VCPU number to 15, and it works fine. I found the related description on Oracle's website:
http://oss.oracle.com/pipermail/el-errata/2007-May/000154.html
[2.6.9-52] - fix boot panic on >=16 cpu on Intel microcore platforms (Brian Maly) [221479]
(I tried Oracle's bugzilla for details, but failed...)
I suggest reducing the VCPU number to 15; I remember we did a similar thing before in the ACPI table, for booting Windows 2000.
It seems Glauber Costa has a better fix on SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
The patch works well on my side. :)
--
Thanks
Yang, Sheng
-
This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
[kvm-devel] ncurses support
Hi, ncurses support has been added recently to the QEMU CVS. Would it be possible to update KVM from the latest QEMU CVS to add ncurses support to KVM?
Thanks,
Aurelien
--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73 : :' : Debian developer | Electrical Engineer `. `' [EMAIL PROTECTED] | [EMAIL PROTECTED] `- people.debian.org/~aurel32 | www.aurel32.net
[kvm-devel] [ kvm-Bugs-1906189 ] All SMP guests often halt
Bugs item #1906189, was opened at 2008-03-03 13:33
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906189&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: All SMP guests often halt

Initial Comment:
All SMP configurations are very unstable - both on Intel and AMD. KVM-62.

Symptoms: guests often soft-lockup or, more precisely, slow down to unacceptable speeds. Guests may hard-lockup completely, or even BSOD in some cases.

I have tried:
Windows 2000
Windows XP
Windows Server 2003
Windows Server 2008

KVM keeps logging what looks like a loop:
=
[EMAIL PROTECTED] win2000-Pro]$ dmesg | tail -n40
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x10
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x10
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x10
apic write: bad size=1 fee00030
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
SIPI to vcpu 1 vector 0x21
Ignoring de-assert INIT to vcpu 0
Ignoring de-assert INIT to vcpu 1
=
(gdb) bt
#0  0x003a016c9aa7 in ioctl () from /lib64/libc.so.6
#1  0x0051bb29 in kvm_run (kvm=0x2a9b040, vcpu=0) at libkvm.c:850
#2  0x004fda86 in kvm_cpu_exec (env=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:127
#3  0x004fe5d5 in kvm_main_loop_cpu (env=0x2b56490) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:307
#4  0x004110fd in main (argc=44675488, argv=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/vl.c:7862
=
kvm statistics
efer_reload          103701       0
exits             512480997   20642
fpu_reload         24781662     799
halt_exits          1824249     170
halt_wakeup          828699      68
host_state_reload 495932451617
hypercalls                0       0
insn_emulation    389188282   14239
insn_emulation_fail    1110       0
invlpg                    0       0
io_exits           28855411     928
irq_exits      191313613248
irq_window                0       0
largepages                0       0
mmio_exits         16078802       0
mmu_cache_miss      4219404     415
mmu_flooded         4110773     410
mmu_pde_zapped       499335       6
mmu_pte_updated 103816391327
mmu_pte_write   145679441737
mmu_recycled          17419       0
mmu_shadow_zapped   4372079     410
=
-Alexey, 03.03.2008.

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906189&group_id=180599
[kvm-devel] [ kvm-Bugs-1906204 ] AMD NPT causes performance degradation
Bugs item #1906204, was opened at 2008-03-03 13:45
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906204&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: AMD NPT causes performance degradation

Initial Comment:
Platform: F7/x64, AMD Barcelona K10, KVM-61.
Guest: Windows XP SP2.

By default, the new Nested Page Tables (NPT) support is enabled, which should accelerate guests. While it *does* accelerate guests in some areas - particularly guest OS setup time, which dropped by 20%, which is great - in other areas I see performance degradation. For example, Passmark PerformanceTest v6.1 shows:
2D Graphics Marks: 78.6 (without NPT)
2D Graphics Marks: 18.9 (with NPT)

NPT was disabled using:
# rmmod kvm-amd
# modprobe kvm-amd npt=0
# dmesg | tail

All graphics feel more sluggish and much slower. I have used SDL rendering.

-Alexey, 03.03.2008.

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906204&group_id=180599
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. If we can agree on the API, then I don't see any reason why it can't go into 2.6.25, unless someone wants more time to review it (but the 2.6.25 release should still be quite far away, so there should be quite a bit of time). Cool! ;)
[kvm-devel] [PATCH] Virtio network device migration support
Virtio network device migration support

It is composed of state saving and dirty-bit tracking. Added dirty-bit tracking for the rx packets. There is no need to set dirty bits for the outgoing packets since we do not write over guest memory. As for the descriptor ring (guest memory), I'd rather copy the entire ring (3 pages) when saving state than touch the dirty bits every time in the fast path. Besides that, the virtio device, PCI bus state and the network device state are saved.

Signed-off-by: Dor Laor [EMAIL PROTECTED]

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index eb2a441..612cf6b 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -128,6 +128,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
     hdr = (void *)elem.in_sg[0].iov_base;
     hdr->flags = 0;
     hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
+    cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[0].iov_base - (ram_addr_t)phys_ram_base);

     /* copy in packet. ugh */
     offset = 0;
@@ -136,6 +137,7 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
         int len = MIN(elem.in_sg[i].iov_len, size - offset);
         memcpy(elem.in_sg[i].iov_base, buf + offset, len);
         offset += len;
+        cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[i].iov_base - (ram_addr_t)phys_ram_base);
         i++;
     }
@@ -210,6 +212,8 @@ again:
         else
             fprintf(stderr, "reading network error %d", len);
     }
+    cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[1].iov_base - (ram_addr_t)phys_ram_base);
+    cpu_physical_memory_set_dirty((ram_addr_t)elem.in_sg[0].iov_base - (ram_addr_t)phys_ram_base);
     virtqueue_push(vnet->rx_vq, &elem, sizeof(*hdr) + len);
     vnet->do_notify = 1;
 }
@@ -281,11 +285,52 @@ static void virtio_net_tx_timer(void *opaque)
     virtio_net_flush_tx(n, n->tx_vq);
 }

+static void virtio_net_save(QEMUFile *f, void *opaque)
+{
+    VirtIONet *n = opaque;
+
+    pci_device_save(&n->vdev.pci_dev, f);
+    qemu_put_buffer(f, n->mac, sizeof n->mac);
+    qemu_put_be32s(f, &n->can_receive);
+
+    virtio_dev_save(f, &n->vdev);
+}
+
+static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
+{
+    VirtIONet *n = opaque;
+    int ret;
+
+    if (version_id < 1) {
+        fprintf(stderr, "%s: not supporting version < 1\n", __FUNCTION__);
+        return -1;
+    }
+
+    if ((ret = pci_device_load(&n->vdev.pci_dev, f)) < 0)
+        return ret;
+
+    qemu_get_buffer(f, n->mac, sizeof n->mac);
+    qemu_get_be32s(f, &n->can_receive);
+
+    if ((ret = virtio_dev_load(f, &n->vdev, version_id)) < 0)
+        return ret;
+
+    /* Make sure we kick the tx */
+    qemu_mod_timer(n->tx_timer,
+                   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
+    n->tx_timer_active = 1;
+
+    return 0;
+}
+
 void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
 {
     VirtIONet *n;
+    const char *info_str = "virtio-net";

-    n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", 6900, 0x1000,
+    n = (VirtIONet *)virtio_init_pci(bus, info_str, 6900, 0x1000,
                                      0, VIRTIO_ID_NET,
                                      0x02, 0x00, 0x00,
                                      6, sizeof(VirtIONet));
@@ -308,5 +353,11 @@ void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);
     n->tx_timer_active = 0;

+    snprintf(n->vc->info_str, sizeof(n->vc->info_str),
+             "%s macaddr=%02x:%02x:%02x:%02x:%02x:%02x", info_str,
+             nd->macaddr[0], nd->macaddr[1], nd->macaddr[2],
+             nd->macaddr[3], nd->macaddr[4], nd->macaddr[5]);
+    register_savevm(info_str, 1, 1, virtio_net_save, virtio_net_load, n);
+
     return &n->vdev;
 }
diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
index 634f869..69fe810 100644
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -180,6 +180,59 @@ void virtio_reset(void *opaque)
     }
 }

+void virtio_dev_save(QEMUFile *f, VirtIODevice *vdev)
+{
+    int i;
+
+    qemu_put_be32s(f, &vdev->features);
+    qemu_put_be16s(f, &vdev->queue_sel);
+    qemu_put_8s(f, &vdev->status);
+    qemu_put_8s(f, &vdev->isr);
+
+    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
+        if (!vdev->vq[i].vring.num)
+            continue;
+        qemu_put_be32s(f, &vdev->vq[i].pfn);
+        qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+        qemu_put_be32s(f, &vdev->vq[i].index);
+
+        /* Save the descriptor ring instead of constantly marking it dirty */
+        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.desc, vdev->vq[i].vring.num * sizeof(VRingDesc));
+        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.avail, TARGET_PAGE_SIZE);
+        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.used, TARGET_PAGE_SIZE);
+    }
+}
+
+int virtio_dev_load(QEMUFile *f, VirtIODevice *vdev, int version_id)
+{
+    int i;
+
+    if (version_id < 1)
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to be more like a TLB model)?
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Yes, it's just a GRU problem; it tries to optimize performance by calling follow_page only in the fast path, and falls back to get_user_pages()/put_page() in the slow path. xpmem could also send the message in _begin and wait for the message in _end, to reduce the wait time. But if you force GRU to call get_user_pages only (like KVM does), then _begin can be removed. In theory we could also optimize KVM to use follow_page only if the pte is already established. I'm not sure how worthwhile that optimization would be, though. However, note that Quadrics also had one callback before and one after, so they may be using the callback before for similar optimizations. But functionality-wise, _end is the only required bit if everyone takes refcounts like KVM and XPMEM do.
[kvm-devel] [ kvm-Bugs-1906272 ] Debian 4 fails to boot on KVM-AMD
Bugs item #1906272, was opened at 2008-03-03 15:53
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906272&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Debian 4 fails to boot on KVM-AMD

Initial Comment:
Host: AMD Barcelona K10, F7/x64, KVM-62.
Guest: Debian 4 (32-bit).

Problem: When installing Debian 4 (32-bit) on a KVM-AMD host, it installs the k7 kernel by default, and the resulting image is not bootable. It can be booted only with -no-kvm. This problem can be avoided by installing on a KVM-Intel host, where Debian guests get the i686 kernel. This is because Debian's installer checks CPUID and installs the kernel that best matches the current CPU. That kernel (i686) can be booted on kvm-intel or kvm-amd without problems.

Perhaps KVM-AMD doesn't emulate something K7-specific (3DNow!?). I don't know what the best solution to this problem is, but I think using a custom CPUID when installing Debian guests might do it. Any ideas?

===
(gdb) bt
#0  0x003dd02c9117 in ioctl () from /lib64/libc.so.6
#1  0x0051bb29 in kvm_run (kvm=0x2a9b040, vcpu=0) at libkvm.c:850
#2  0x004fda86 in kvm_cpu_exec (env=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:127
#3  0x004fe5d5 in kvm_main_loop_cpu (env=0x2b82bb0) at /root/Linstall/kvm-62rc2/qemu/qemu-kvm.c:307
#4  0x004110fd in main (argc=44675488, argv=<value optimized out>) at /root/Linstall/kvm-62rc2/qemu/vl.c:7862
===
kvm statistics
efer_reload               0       0
exits              11387804  324872
fpu_reload          1340894     296
halt_exits                0       0
halt_wakeup               0       0
host_state_reload   1340931     295
hypercalls                0       0
insn_emulation     10053389  323534
insn_emulation_fail 10009814 323534
invlpg                    0       0
io_exits        13347571004
irq_exits                 0       0
irq_window                0       0
largepages                0       0
mmio_exits            27231       0
mmu_cache_miss           20       0
mmu_flooded               0       0
mmu_pde_zapped            0       0
mmu_pte_updated           0       0
mmu_pte_write             0       0
mmu_recycled              0       0

-Alexey, 3.3.2008.

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906272&group_id=180599
Re: [kvm-devel] ncurses support
Aurelien Jarno wrote: Hi, ncurses support has been added recently to the QEMU CVS. Would it be possible to update KVM from the latest QEMU CVS to add ncurses support to KVM? I've merged qemu-cvs, will push once it passes regression tests.
--
error compiling committee.c: too many arguments to function
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best).
[kvm-devel] KVM architecture docs
Hello, I'm interested in learning the technical details of KVM, ideally up to date with the latest versions (KVM changes so fast!). I'm not really geared toward development (yet); rather, I would like to study its architecture from a security point of view. I've searched everywhere and all I could find was some basic/marketing stuff, or simple white papers explaining HW virtualization. The wiki currently has some details, but none of them are satisfying enough for my needs :-) If you're familiar with Xen, I'm looking for the KVM equivalents of the following docs:
http://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf
http://wiki.xensource.com/xenwiki/XenArchitecture?action=AttachFile&do=get&target=Xen+Architecture_Q1+2008.pdf
Does anything like that exist, or should I go the long way and study the QEMU and KVM sources?
Thanks,
- Alessandro Sardo
[kvm-devel] use smp_cpus as lapic id
apic is not acpi, although both are acronyms. Due to a confusion of mine, those things were mixed up, leading to the bug reported at
https://sourceforge.net/tracker/index.php?func=detail&aid=1903732&group_id=180599&atid=893831
This patch fixes it, by assigning smp_cpus instead of MAX_CPUS to the I/O APIC id in the MP and ACPI APIC tables.

Signed-off-by: Glauber Costa [EMAIL PROTECTED]

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 77e71ac..af18390 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -983,7 +983,7 @@
 #endif
     putstr(&q, "ISA   ");

     /* ioapic */
-    ioapic_id = MAX_CPUS;
+    ioapic_id = smp_cpus;
     putb(&q, 2); /* entry type = I/O APIC */
     putb(&q, ioapic_id); /* apic ID */
     putb(&q, 0x11); /* I/O APIC version number */
@@ -1427,7 +1427,7 @@
 #endif
     io_apic = (void *)apic;
     io_apic->type = APIC_IO;
     io_apic->length = sizeof(*io_apic);
-    io_apic->io_apic_id = MAX_CPUS;
+    io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec00000);
     io_apic->interrupt = cpu_to_le32(0);
diff --git a/qemu/pc-bios/bios.bin b/qemu/pc-bios/bios.bin
index 64b7abb..3a75ff5 100644
Binary files a/qemu/pc-bios/bios.bin and b/qemu/pc-bios/bios.bin differ
Re: [kvm-devel] [PATCH 4/8] KVM: MMU: hypercall based pte updates and TLB flushes
Hi Avi,

Looks nice.

On Sun, Mar 02, 2008 at 06:31:17PM +0200, Avi Kivity wrote:
> +int kvm_pv_mmu_op(struct kvm_vcpu *vcpu, unsigned long bytes,
> +                  gpa_t addr, unsigned long *ret)
> +{
> +    int r;
> +    struct kvm_pv_mmu_op_buffer buffer;

Perhaps this structure is a little large to be on the stack.

> +    down_read(&current->mm->mmap_sem);
> +    down_read(&vcpu->kvm->slots_lock);

The order should be slots_lock, then mmap_sem. This needs a comment in the code.
Re: [kvm-devel] [PATCH] Virtio network device migration support
Hi Dor,

Dor Laor wrote:
> void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
> {
>     VirtIONet *n;
> +    const char *info_str = "virtio-net";
>
> -    n = (VirtIONet *)virtio_init_pci(bus, "virtio-net", 6900, 0x1000,
> +    n = (VirtIONet *)virtio_init_pci(bus, info_str, 6900, 0x1000,
>                                      0, VIRTIO_ID_NET,
>                                      0x02, 0x00, 0x00,
>                                      6, sizeof(VirtIONet));
> @@ -308,5 +353,11 @@ void *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
>     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);
>     n->tx_timer_active = 0;
>
> +    snprintf(n->vc->info_str, sizeof(n->vc->info_str),
> +             "%s macaddr=%02x:%02x:%02x:%02x:%02x:%02x", info_str,
> +             nd->macaddr[0], nd->macaddr[1], nd->macaddr[2],
> +             nd->macaddr[3], nd->macaddr[4], nd->macaddr[5]);
> +    register_savevm(info_str, 1, 1, virtio_net_save, virtio_net_load, n);

I think we need to maintain an instance id and increment it here, like we do for the rest of the network cards.

>     return &n->vdev;
> }
> diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
> index 634f869..69fe810 100644
> --- a/qemu/hw/virtio.c
> +++ b/qemu/hw/virtio.c
> @@ -180,6 +180,59 @@ void virtio_reset(void *opaque)
>     }
> }
>
> +void virtio_dev_save(QEMUFile *f, VirtIODevice *vdev)
> +{
> +    int i;
> +
> +    qemu_put_be32s(f, &vdev->features);
> +    qemu_put_be16s(f, &vdev->queue_sel);
> +    qemu_put_8s(f, &vdev->status);
> +    qemu_put_8s(f, &vdev->isr);
> +
> +    for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> +        if (!vdev->vq[i].vring.num)
> +            continue;
> +        qemu_put_be32s(f, &vdev->vq[i].pfn);
> +        qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
> +        qemu_put_be32s(f, &vdev->vq[i].index);
> +
> +        /* Save the descriptor ring instead of constantly marking it dirty */
> +        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.desc, vdev->vq[i].vring.num * sizeof(VRingDesc));
> +        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.avail, TARGET_PAGE_SIZE);
> +        qemu_put_buffer(f, (uint8_t *)vdev->vq[i].vring.used, TARGET_PAGE_SIZE);

I think these two need to be sizeof(VRingAvail) * vring.num and sizeof(VRingUsed) * vring.num.

Regards,

Anthony Liguori
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 09:18:59AM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. Is this just a GRU problem? Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). 
Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). It is going to be really easy to add more weird and wonderful notifiers later that deviate from our standard TLB model. It would be much harder to remove them. So I really want to see everyone conform to this model first. Numbers and comparisons can be brought out afterwards if people want to attempt to make such changes.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 09:18:59AM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 02:10:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 01:51:53PM +0100, Andrea Arcangeli wrote: On Mon, Mar 03, 2008 at 04:29:34AM +0100, Nick Piggin wrote: to something I prefer. Others may not, but I'll post them for debate anyway. Sure, thanks! I didn't drop invalidate_page, because invalidate_range_begin/end would be slower for usages like KVM/GRU (we don't need a begin/end there because where invalidate_page is called, the VM holds a reference on the page). do_wp_page should also use invalidate_page since it can free the page after dropping the PT lock without losing any performance (that's not true for the places where invalidate_range is called). I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. _begin exists because by the time _end is called, the VM already dropped the reference on the page. This way we can do a single invalidate no matter how large the range is. I don't see ways to remove _begin while still invoking _end a single time for the whole range. The range invalidates have a performance advantage for the GRU. TLB invalidates on the GRU are relatively slow (usec) and interfere somewhat with the performance of other active GRU instructions. Invalidating a large chunk of addresses with a single GRU TLBINVAL operation is much faster than issuing a stream of single-page TLBINVALs. I expect this performance advantage will also apply to other users of mmuops. Is this just a GRU problem?
Can't we just require them to take a ref on the page (IIRC Jack said GRU could be changed to more like a TLB model). Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). Functionally, the GRU is very close to what I would consider to be the standard TLB model. Dropins and flushes map closely to processor dropins and flushes for cpus. The internal structure of the GRU TLB is identical to the TLB of existing cpus. Requiring the GRU driver to track dropins with long-term page references seems to me a deviation from having the basic mmuops support a standard TLB model. AFAIK, no other processor requires this. Tracking TLB dropins (and long-term page references) could be done, but it adds significant complexity and scaling issues. The size of the tables to track many TB (to PB) of memory can get large. If the memory is being referenced by highly threaded applications, then the problem becomes even more complex. Either tables must be replicated per-thread (and require even more memory), or the table structure becomes even more complex to deal with node locality, cacheline bouncing, etc. Try to avoid a requirement to track dropins with long-term page references. It is going to be really easy to add more weird and wonderful notifiers later that deviate from our standard TLB model. It would be much harder to remove them. 
So I really want to see everyone conform to this model first. Agree.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
Jack Steiner wrote: The range invalidates have a performance advantage for the GRU. TLB invalidates on the GRU are relatively slow (usec) and interfere somewhat with the performance of other active GRU instructions. Invalidating a large chunk of addresses with a single GRU TLBINVAL operation is much faster than issuing a stream of single-page TLBINVALs. I expect this performance advantage will also apply to other users of mmuops. In theory this would apply to kvm as well (coalesce tlb flush IPIs, lookup shadow page table once), but is it really a fast path? What triggers range operations for your use cases? -- error compiling committee.c: too many arguments to function
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 08:09:49PM +0200, Avi Kivity wrote: Jack Steiner wrote: The range invalidates have a performance advantage for the GRU. TLB invalidates on the GRU are relatively slow (usec) and interfere somewhat with the performance of other active GRU instructions. Invalidating a large chunk of addresses with a single GRU TLBINVAL operation is much faster than issuing a stream of single-page TLBINVALs. I expect this performance advantage will also apply to other users of mmuops. In theory this would apply to kvm as well (coalesce tlb flush IPIs, lookup shadow page table once), but is it really a fast path? What triggers range operations for your use cases? Although not frequent, an unmap of a multi-TB object could be quite painful if each page were invalidated individually instead of with one invalidate for the entire range. This is even worse if the application is threaded and the object has been referenced by many GRUs (there are 16 GRU ports per node - each potentially has to be invalidated). Forks (again, not frequent) would be another case.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 12:06:05PM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote: Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). Functionally, the GRU is very close to what I would consider to be the standard TLB model. Dropins and flushes map closely to processor dropins and flushes for cpus. The internal structure of the GRU TLB is identical to the TLB of existing cpus. Requiring the GRU driver to track dropins with long-term page references seems to me a deviation from having the basic mmuops support a standard TLB model. AFAIK, no other processor requires this. That is because the CPU TLBs have the mmu_gather batching APIs which avoid the problem. It would be possible to do something similar for GRU, which would involve taking a reference for each page-to-be-invalidated in invalidate_page, and releasing them when you invalidate_range. Or else do some other scheme which makes mmu notifiers work similarly to the mmu gather API. But not just go and invent something completely different in the form of this invalidate_begin, clear linux pte, invalidate_end API. Tracking TLB dropins (and long-term page references) could be done, but it adds significant complexity and scaling issues. The size of the tables to track many TB (to PB) of memory can get large. 
If the memory is being referenced by highly threaded applications, then the problem becomes even more complex. Either tables must be replicated per-thread (and require even more memory), or the table structure becomes even more complex to deal with node locality, cacheline bouncing, etc. I don't think it would be that significant in terms of complexity or scaling. For a quick solution, you could stick a radix tree in each of your mmu notifiers registered (ie. one per mm), which is indexed on virtual address >> PAGE_SHIFT, and returns the struct page *. Size is no different than page tables, and locking is pretty scalable. After that, I would really like to see whether the numbers justify larger changes.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, 3 Mar 2008, Nick Piggin wrote: I'm still not completely happy with this. I had a very quick look at the GRU driver, but I don't see why it can't be implemented more like the regular TLB model, and have TLB insertions depend on the linux pte, and do invalidates _after_ restricting permissions to the pte. Ie. I'd still like to get rid of invalidate_range_begin, and get rid of invalidate calls from places where permissions are relaxed. Isn't this more a job for paravirt ops if it is so tightly bound to page tables? Are we not adding another similar API? If we can agree on the API, then I don't see any reason why it can't go into 2.6.25, unless someone wants more time to review it (but 2.6.25 release should be quite far away still so there should be quite a bit of time). The API still has rcu issues, and the example given for making things sleepable only works for the aging callback. The most important callbacks are for try_to_unmap and page_mkclean. This means the API is still not generic enough and likely not extendable as needed in its present form.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, 3 Mar 2008, Nick Piggin wrote: It is going to be really easy to add more weird and wonderful notifiers later that deviate from our standard TLB model. It would be much harder to remove them. So I really want to see everyone conform to this model first. Numbers and comparisons can be brought out afterwards if people want to attempt to make such changes. Still do not see how that could be done. The model here is tightly bound to ptes. AFAICT this could be implemented in arch code like the paravirt ops.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, 3 Mar 2008, Nick Piggin wrote: Move definition of struct mmu_notifier and struct mmu_notifier_ops under CONFIG_MMU_NOTIFIER to ensure they don't get dereferenced when they don't make sense. The callbacks take a mmu_notifier parameter. So how does this compile for !MMU_NOTIFIER?
Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges
On Mon, 3 Mar 2008, Nick Piggin wrote: Your skeleton is just registering notifiers and saying /* you fill the hard part in */ If somebody needs a skeleton in order just to register the notifiers, then almost by definition they are unqualified to write the hard part ;) It's also providing a locking scheme. OK, there are ways to solve it or hack around it. But this is exactly why I think the implementations should be kept separate. Andrea's notifiers are coherent, work on all types of mappings, and will hopefully match closely the regular TLB invalidation sequence in the Linux VM (at the moment it is quite close, but I hope to make it a bit closer) so that it requires almost no changes to the mm. Then put it into the arch code for TLB invalidation. Paravirt ops gives good examples on how to do that. What about a completely different approach... XPmem runs over NUMAlink, right? Why not provide some non-sleeping way to basically IPI remote nodes over the NUMAlink where they can process the invalidation? If your intra-node cache coherency has to run over this link anyway, then presumably it is capable. There is another Linux instance at the remote end that first has to remove its own ptes. Also would not work for Infiniband and other solutions. All the approaches that require evictions in an atomic context limit the approach and do not allow the generic functionality that we want in order to not add alternate APIs for this. Or another idea, why don't you LD_PRELOAD in the MPT library to also intercept munmap, mprotect, mremap etc as well as just fork()? That would give you similarly good-enough coherency as the mmu notifier patches, except that you can't swap (which Robin said was not a big problem). The good-enough solution right now is to pin pages by elevating refcounts.
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 07:45:17PM +0100, Nick Piggin wrote: On Mon, Mar 03, 2008 at 12:06:05PM -0600, Jack Steiner wrote: On Mon, Mar 03, 2008 at 05:59:10PM +0100, Nick Piggin wrote: Maintaining a long-term reference on a page is a problem. The GRU does not currently maintain tables to track the pages for which dropins have been done. The GRU has a large internal TLB and is designed to reference up to 8PB of memory. The size of the tables to track this many referenced pages would be a problem (at best). Is it any worse a problem than the pagetables of the processes which have their virtual memory exported to GRU? AFAIKS, no; it is on the same magnitude of difficulty. So you could do it without introducing any fundamental problem (memory usage might be increased by some constant factor, but I think we can cope with that in order to make the core patch really nice and simple). Functionally, the GRU is very close to what I would consider to be the standard TLB model. Dropins and flushes map closely to processor dropins and flushes for cpus. The internal structure of the GRU TLB is identical to the TLB of existing cpus. Requiring the GRU driver to track dropins with long-term page references seems to me a deviation from having the basic mmuops support a standard TLB model. AFAIK, no other processor requires this. That is because the CPU TLBs have the mmu_gather batching APIs which avoid the problem. It would be possible to do something similar for GRU, which would involve taking a reference for each page-to-be-invalidated in invalidate_page, and releasing them when you invalidate_range. Or else do some other scheme which makes mmu notifiers work similarly to the mmu gather API. But not just go and invent something completely different in the form of this invalidate_begin, clear linux pte, invalidate_end API. Correct. 
If the mmu_gather were passed on the mmuops callout and the callout were done at the same point as the tlb_finish_mmu(), the GRU could efficiently work w/o the range invalidates. A range invalidate might still be slightly more efficient, but not measurably so. The net difference is not worth the extra complexity of range callouts. Tracking TLB dropins (and long-term page references) could be done, but it adds significant complexity and scaling issues. The size of the tables to track many TB (to PB) of memory can get large. If the memory is being referenced by highly threaded applications, then the problem becomes even more complex. Either tables must be replicated per-thread (and require even more memory), or the table structure becomes even more complex to deal with node locality, cacheline bouncing, etc. I don't think it would be that significant in terms of complexity or scaling. For a quick solution, you could stick a radix tree in each of your mmu notifiers registered (ie. one per mm), which is indexed on virtual address >> PAGE_SHIFT, and returns the struct page *. Size is no different than page tables, and locking is pretty scalable. After that, I would really like to see whether the numbers justify larger changes. I'm still concerned about performance. Each dropin would first have to access an additional data structure that would most likely be non-node-local and non-cache-resident. The net effect would be measurable but not a killer. I haven't thought about locking requirements for the radix tree. Most accesses would be read-only; updates would be infrequent. Any chance of an RCU-based radix implementation? Otherwise, don't we add the potential for hot locks/cachelines for threaded applications?
Re: [kvm-devel] [PATCH] mmu notifiers #v8
On Mon, Mar 03, 2008 at 11:01:22AM -0800, Christoph Lameter wrote: The API still has rcu issues, and the example given for making things sleepable only works for the aging callback. The most important callbacks are for try_to_unmap and page_mkclean. This means the API is still not generic enough and likely not extendable as needed in its present form. I converted only one of those to _notify as an example of how it should be done, because I assumed you'd volunteer to convert the other ones yourself during .26. It's useless to convert all of them right now, because the i_mmap_lock and anon_vma locks are still going to be spinlocks in .25.
[kvm-devel] [PATCH] mmu notifiers #v9
The only differences are Nick's changes (thanks Nick, nice work!) plus a fix to make it compile. About the removal of _begin: I'm not strongly opposed to it, but I personally think it's unnecessary, given that _begin avoids building new data structures with a fixed ram (and cpu) cost per _page_, and at the same time deferring _end until after the whole tlb_gather page freeing reduces the number of invalidates. .26 will allow all the methods to sleep by following the roadmap described in the #v8 patch. KVM so far is swapping fine on top of this.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Nick Piggin [EMAIL PROTECTED]

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -228,6 +228,9 @@ struct mm_struct {
 #ifdef CONFIG_CGROUP_MEM_CONT
 	struct mem_cgroup *mem_cgroup;
 #endif
+#ifdef CONFIG_MMU_NOTIFIER
+	struct hlist_head mmu_notifier_list;
+#endif
 };

 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,194 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/mm_types.h>
+
+struct mmu_notifier;
+struct mmu_notifier_ops;
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+struct mmu_notifier_ops {
+	/*
+	 * Called when nobody can register any more notifiers in the mm
+	 * and after the mn notifier has been disarmed already.
+	 */
+	void (*release)(struct mmu_notifier *mn,
+			struct mm_struct *mm);
+
+	/*
+	 * clear_flush_young is called after the VM is
+	 * test-and-clearing the young/accessed bitflag in the
+	 * pte. This way the VM will provide proper aging to the
+	 * accesses to the page through the secondary MMUs and not
+	 * only to the ones through the Linux pte.
+	 */
+	int (*clear_flush_young)(struct mmu_notifier *mn,
+				 struct mm_struct *mm,
+				 unsigned long address);
+
+	/*
+	 * Before this is invoked any secondary MMU is still ok to
+	 * read/write to the page previously pointed by the Linux pte
+	 * because the old page hasn't been freed yet. If required
+	 * set_page_dirty has to be called internally to this method.
+	 */
+	void (*invalidate_page)(struct mmu_notifier *mn,
+				struct mm_struct *mm,
+				unsigned long address);
+
+	/*
+	 * invalidate_range_begin() and invalidate_range_end() must be
+	 * paired. Multiple invalidate_range_begin/ends may be nested
+	 * or called concurrently.
+	 */
+	void (*invalidate_range_begin)(struct mmu_notifier *mn,
+				       struct mm_struct *mm,
+				       unsigned long start, unsigned long end);
+	void (*invalidate_range_end)(struct mmu_notifier *mn,
+				     struct mm_struct *mm,
+				     unsigned long start, unsigned long end);
+};
+
+struct mmu_notifier {
+	struct hlist_node hlist;
+	const struct mmu_notifier_ops *ops;
+};
+
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+	return unlikely(!hlist_empty(&mm->mmu_notifier_list));
+}
+
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the notifier is guaranteed to be visible to all threads.
+ */
+extern void mmu_notifier_register(struct mmu_notifier *mn,
+				  struct mm_struct *mm);
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the struct mmu_notifier can be freed. Alternatively it
+ * can be synchronously freed inside ->release when the list can't
+ * change anymore and nobody could possibly walk it.
+ */
+extern void mmu_notifier_unregister(struct mmu_notifier *mn,
+				    struct mm_struct *mm);
+
+extern void __mmu_notifier_release(struct mm_struct *mm);
+extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
+					    unsigned long address);
+extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
+					   unsigned long address);
+extern void __mmu_notifier_invalidate_range_begin(struct mm_struct *mm,
+						  unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+						unsigned long start, unsigned long end);
+
+static inline void mmu_notifier_release(struct mm_struct *mm)
+{
+	if (mm_has_notifiers(mm))
+
[kvm-devel] [PATCH] KVM swapping with mmu notifiers #v9
Notably, the registration now requires the mmap_sem in write mode.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 41962e7..e1287ab 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM && EXPERIMENTAL
 	select PREEMPT_NOTIFIERS
+	select MMU_NOTIFIER
 	select ANON_INODES
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4583329..4067b0f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -642,6 +642,110 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
 	account_shadowed(kvm, gfn);
 }

+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+	struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+	get_page(page);
+	rmap_remove(kvm, spte);
+	set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+	kvm_flush_remote_tlbs(kvm);
+	__free_page(page);
+}
+
+static void kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte, *curr_spte;
+
+	spte = rmap_next(kvm, rmapp, NULL);
+	while (spte) {
+		BUG_ON(!(*spte & PT_PRESENT_MASK));
+		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
+		curr_spte = spte;
+		spte = rmap_next(kvm, rmapp, spte);
+		kvm_unmap_spte(kvm, curr_spte);
+	}
+}
+
+void kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	int i;
+
+	/*
+	 * If mmap_sem isn't taken, we can look at the memslots with only
+	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
+	 */
+	spin_lock(&kvm->mmu_lock);
+	for (i = 0; i < kvm->nmemslots; i++) {
+		struct kvm_memory_slot *memslot = &kvm->memslots[i];
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		/* mmu_lock protects userspace_addr */
+		if (!start)
+			continue;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			kvm_unmap_rmapp(kvm, &memslot->rmap[gfn_offset]);
+		}
+	}
+	spin_unlock(&kvm->mmu_lock);
+}
+
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte;
+	int young = 0;
+
+	spte = rmap_next(kvm, rmapp, NULL);
+	while (spte) {
+		int _young;
+		u64 _spte = *spte;
+		BUG_ON(!(_spte & PT_PRESENT_MASK));
+		_young = _spte & PT_ACCESSED_MASK;
+		if (_young) {
+			young = !!_young;
+			set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK);
+		}
+		spte = rmap_next(kvm, rmapp, spte);
+	}
+	return young;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	int i;
+	int young = 0;
+
+	/*
+	 * If mmap_sem isn't taken, we can look at the memslots with only
+	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
+	 */
+	spin_lock(&kvm->mmu_lock);
+	for (i = 0; i < kvm->nmemslots; i++) {
+		struct kvm_memory_slot *memslot = &kvm->memslots[i];
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		/* mmu_lock protects userspace_addr */
+		if (!start)
+			continue;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]);
+		}
+	}
+	spin_unlock(&kvm->mmu_lock);
+
+	if (young)
+		kvm_flush_remote_tlbs(kvm);
+
+	return young;
+}
+
 #ifdef MMU_DEBUG
 static int is_empty_shadow_page(u64 *spt)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 17f9d16..b014b19 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -380,6 +380,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 	int r;
 	struct page *page;
 	int largepage = 0;
+	unsigned mmu_seq;

 	pgprintk("%s: addr %lx err %x\n", __FUNCTION__, addr, error_code);
 	kvm_mmu_audit(vcpu, "pre page fault");
@@ -415,6 +416,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
 			largepage = 1;
 		}
 	}
+	mmu_seq = read_seqbegin(&vcpu->kvm->arch.mmu_notifier_invalidate_lock);
 	page = gfn_to_page(vcpu->kvm, walker.gfn);
Re: [kvm-devel] [PATCH] Use spin_lock_irqsave/restore for virtio-pci
On Monday 03 March 2008 09:37:48 Anthony Liguori wrote: virtio-pci acquires its spin lock in an interrupt context so it's necessary to use spin_lock_irqsave/restore variants. This patch fixes guest SMP when using virtio devices in KVM. Signed-off-by: Anthony Liguori [EMAIL PROTECTED] Thanks, applied. Rusty.
Re: [kvm-devel] [PATCH] KVM swapping with mmu notifiers #v9
Quoting Andrea Arcangeli: Notably, the registration now requires the mmap_sem in write mode. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] [...]

+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+	struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+	get_page(page);
+	rmap_remove(kvm, spte);
+	set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+	kvm_flush_remote_tlbs(kvm);
+	__free_page(page);

I wrote to you about this before (I didn't get an answer, so I'm writing again): with large pages support, I think we need to use put_page here.

+}
[...]
Re: [kvm-devel] KVM architecture docs
http://ols.108.redhat.com/2007/Reprints/kivity-Reprint.pdf Hi Avi, I have a question about KVM architecture after reading your paper. It reads: .. At the kernel level, the kernel causes the hardware to enter guest mode. If the processor exits guest mode due to an event such as an external interrupt or a shadow page table fault, the kernel performs the necessary handling and resumes guest execution. If the exit reason is due to an I/O instruction or a signal queued to the process, then the kernel exits to userspace. .. After reading your paper, my understanding of the KVM architecture is that for a particular VM the user mode (QEMU), kernel mode, and guest mode share the same process context from the host linux kernel's point of view, right? If this is the case, see the below example: 1) a physical NIC interrupt is received on physical CPU 0, and the host kernel determines that this is a network packet targeted to the emulated NIC for a VM; 2) at the same time this VM is running in guest mode on physical CPU 1. My question is: at this time, can the host kernel *actively* interrupt the VM and make it run in user mode to handle the incoming network packet in QEMU? Or does the host kernel have to wait for the VM (because of an external interrupt, shadow page table fault, or I/O instruction) to quit guest mode, and then wait for the VM to voluntarily detect that an incoming network packet is pending and switch to user space? A further question: how does a VM detect the pending incoming network packet? In kernel space or in user space? Thanks, Forrest
[kvm-devel] job opportunity available
Hi, Since the start of 2008 our organization starts recruiting home workers willing to take part in well-paying research studies conducted by leading online businesses. Your opinion as a consumer is important for the success and profitability of many business ventures. That is why they are ready to pay generously for what you think. You can earn from $300 to $600 a week for participating in on-line surveys, focus group discussions, and product/service evaluations. Become part of our team and earn: - from $5 to $50 for participating in on-line surveys - from $20 to $120 for participating in product/service evaluations - from $30 to $155 for participating in virtual focus group discussions What's best, all you need to work with us is a computer, an Internet connection, a valid e-mail address, and good English skills. You decide when to finish your tasks and how much time to devote to each survey assignment. You can also keep this job for the rest of the year! If you want to become one of our highly valued survey takers, please write back to [EMAIL PROTECTED] and I will send you more information. Sincerely, April Johnson Assistant HR
[kvm-devel] KVM Test result, kernel daf4de3.., userspace 724f8a9.. One new issue
Hi, all, This is today's KVM test result against kvm.git daf4de30ec718b16798aba07e9f25fd9e6ba9e53 and kvm-userspace.git 724f8a940ec0e78e607c051e6e82ca2f5055b1e1. In today's testing, save/restore crashed the host once on pae/ia32e hosts.

One new issue has been found:
1. blue screen when booting 64bits windows guests
https://sourceforge.net/tracker/index.php?func=detail&aid=1906751&group_id=180599&atid=893831

Old issues:
2. Can not boot guests with 2.6.9 smp pae kernel
3. qcow based smp linux guests likely hang
https://sourceforge.net/tracker/index.php?func=detail&aid=1901980&group_id=180599&atid=893831
4. Fails to save/restore guests
https://sourceforge.net/tracker/index.php?func=detail&aid=1824525&group_id=180599&atid=893831
5. smp windows installer crashes while rebooting
https://sourceforge.net/tracker/index.php?func=detail&aid=1877875&group_id=180599&atid=893831
6. Timer of guest is inaccurate
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1826080&group_id=180599
7. Installer of 64bit vista guest will pause for ten minutes after reboot
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1836905&group_id=180599

Test environment:
Platform: Woodcrest
CPU: 4
Memory size: 8G

Details

IA32-pae:
1. boot guest with 256M memory  PASS
2. boot two windows xp guest  PASS
3. boot 4 same guest in parallel  PASS
4. boot linux and windows guest in parallel  PASS
5. boot guest with 1500M memory  PASS
6. boot windows 2003 with ACPI enabled  PASS
7. boot Windows xp with ACPI enabled  PASS
8. boot Windows 2000 without ACPI  PASS
9. kernel build on SMP linux guest  FAIL
10. LTP on SMP linux guest  FAIL
11. boot base kernel linux  PASS
12. save/restore 32-bit HVM guests  PASS
13. live migration 32-bit HVM guests  PASS
14. boot SMP Windows xp with ACPI enabled  PASS
15. boot SMP Windows 2003 with ACPI enabled  PASS
16. boot SMP Windows 2000 with ACPI enabled  PASS

IA32e:
1. boot four 32-bit guest in parallel  PASS
2. boot four 64-bit guest in parallel  PASS
3. boot 4G 64-bit guest  FAIL
4. boot 4G pae guest  PASS
5. boot 32-bit linux and 32 bit windows guest in parallel  PASS
6. boot 32-bit guest with 1500M memory  PASS
7. boot 64-bit guest with 1500M memory  FAIL
8. boot 32-bit guest with 256M memory  PASS
9. boot 64-bit guest with 256M memory  FAIL
10. boot two 32-bit windows xp in parallel  PASS
11. boot four 32-bit different guest in para  PASS
12. save/restore 64-bit linux guests  PASS
13. save/restore 32-bit linux guests  PASS
14. boot 32-bit SMP windows 2003 with ACPI enabled  PASS
15. boot 32-bit SMP Windows 2000 with ACPI enabled  PASS
16. boot 32-bit SMP Windows xp with ACPI enabled  PASS
17. boot 32-bit Windows 2000 without ACPI  PASS
18. boot 64-bit Windows xp with ACPI enabled  FAIL
19. boot 32-bit Windows xp without ACPI  PASS
20. boot 64-bit UP vista  PASS
21. boot 64-bit SMP vista  FAIL
22. kernel build in 32-bit linux guest OS  FAIL
23. kernel build in 64-bit linux guest OS  PASS
24. LTP on SMP 32-bit linux guest OS  FAIL
25. LTP on SMP 64-bit linux guest OS  PASS
26. boot 64-bit guests with ACPI enabled  PASS
27. boot 32-bit x-server  PASS
28. boot 64-bit SMP windows XP with ACPI enabled  FAIL
29. boot 64-bit SMP windows 2003 with ACPI enabled  FAIL
30. live migration 64bit linux guests  PASS
31. live migration 32bit linux guests  PASS

Report Summary on IA32-pae Summary Test Report of Last Session
Re: [kvm-devel] I/O bandwidth control on KVM
> Hi, If you are using virtio drivers in the guest (which I presume you are,
> given the reference to /dev/vda), try using the following -drive syntax:
>
> -drive file=/dev/mapper/ioband1,if=virtio,boot=on,cache=off
>
> This will force the use of O_DIRECT. By default, QEMU does not open with
> O_DIRECT, so you'll see page cache effects.

Thank you for your suggestion. I was using virtio drivers as you wrote. I just thought that kvm would use the O_DIRECT option when applications on the guest opened files with O_DIRECT. I'll try the way you mentioned and report back. Thanks, Ryo Tsuruta
[kvm-devel] [ kvm-Bugs-1906751 ] blue screen when booting 64bits windows guests
Bugs item #1906751, was opened at 2008-03-04 13:24 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906751&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: yunfeng (yunfeng)
Assigned to: Nobody/Anonymous (nobody)
Summary: blue screen when booting 64bits windows guests

Initial Comment:
Environment:
Host: ia32e rhel5
Guest OS: ia32e windows
Change Set: kernel daf4de30ec718b16798aba07e9f25fd9e6ba9e53, userspace 724f8a940ec0e78e607c051e6e82ca2f5055b1e1
Hardware: Platform Woodcrest, CPU 4, Memory size 8G

Bug detailed description:
--
Can't boot 64-bit windows 2k3/xp guests on a 64-bit host. A blue screen appears while booting 64-bit windows. A 64-bit SMP vista guest can't boot, but UP vista can boot.

Reproduce steps:
1. create qcow image
qemu-img create -b /share/xvs/img/Windows/ia32e_vistaRTM.img -f qcow2 /share/xvs/var/tmp-img
2. boot the guest
qemu-img create -b /share/xvs/img/Windows/kvm/kvm_win2000_up_noacpi_ia32.img -f qcow2 /share/xvs/var/tmp-img_gbp25_1204570139_1
--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1906751&group_id=180599
[kvm-devel] [RFC] Notifier for Externally Mapped Memory (EMM)
Stripped things down and did what Andrea and I talked about last Friday. No invalidate_page callbacks. No ops anymore. Simple linked list for the notifier. No RCU. Added the code to rmap.h and rmap.c (after all, it is concerned with handling mappings).

This patch implements a simple callback for device drivers that establish their own references to pages (KVM, GRU, XPmem, RDMA/Infiniband, DMA engines, etc.). These references are unknown to the VM (therefore external). With these callbacks it is possible for the device driver to release external references when the VM requests it. This enables swapping and page migration, and allows support of remapping, permission changes, etc. for externally mapped memory. With this functionality it becomes possible to avoid pinning or mlocking pages (commonly done to stop the VM from unmapping pages).

A device driver must subscribe to a process using emm_register_notifier. The VM will then perform callbacks for operations that unmap or change permissions of pages in that address space. When the process terminates, the callback function is called with emm_release.

Callbacks are performed before and after the unmapping action of the VM:

emm_invalidate_start	before
emm_invalidate_end	after

Callbacks are mostly performed in a non-atomic context. However, in various places spinlocks are held to traverse rmaps, so this patch is only useful for devices that can remove mappings in an atomic context (e.g. KVM/GRU). If the rmap traversal spinlocks are converted to semaphores, then all callbacks will be performed in a non-atomic context; the callouts can stay where they are.
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
---
 include/linux/mm_types.h |    3 +
 include/linux/rmap.h     |   51 +
 kernel/fork.c            |    3 +
 mm/Kconfig               |    5 +++
 mm/filemap_xip.c         |    5 +++
 mm/fremap.c              |    2 +
 mm/hugetlb.c             |    4 ++
 mm/memory.c              |   32 ++--
 mm/mmap.c                |    3 +
 mm/mprotect.c            |    3 +
 mm/mremap.c              |    5 +++
 mm/rmap.c                |   72 ++-
 12 files changed, 183 insertions(+), 5 deletions(-)

Index: linux-2.6/include/linux/mm_types.h
===================================================================
--- linux-2.6.orig/include/linux/mm_types.h	2008-03-03 22:54:11.961264684 -0800
+++ linux-2.6/include/linux/mm_types.h	2008-03-03 22:55:13.333569600 -0800
@@ -225,6 +225,9 @@ struct mm_struct {
 	/* aio bits */
 	rwlock_t		ioctx_list_lock;
 	struct kioctx		*ioctx_list;
+#ifdef CONFIG_EMM_NOTIFIER
+	struct emm_notifier	*emm_notifier;
+#endif
 #ifdef CONFIG_CGROUP_MEM_CONT
 	struct mem_cgroup	*mem_cgroup;
 #endif
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2008-03-03 22:54:11.993264520 -0800
+++ linux-2.6/mm/Kconfig	2008-03-03 22:55:13.337569625 -0800
@@ -193,3 +193,8 @@ config NR_QUICK
 config VIRT_TO_BUS
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS
+
+config EMM_NOTIFIER
+	def_bool n
+	bool "External Mapped Memory Notifier for drivers directly mapping memory"
+
Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c	2008-03-03 22:54:12.053265354 -0800
+++ linux-2.6/mm/mmap.c	2008-03-03 22:59:25.522848812 -0800
@@ -1747,11 +1747,13 @@ static void unmap_region(struct mm_struc
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
+	emm_notify(mm, emm_invalidate_start, start, end);
 	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
				 next ? next->vm_start : 0);
 	tlb_finish_mmu(tlb, start, end);
+	emm_notify(mm, emm_invalidate_end, start, end);
 }
 
 /*
@@ -2038,6 +2040,7 @@ void exit_mmap(struct mm_struct *mm)
 	/* mm's last user has gone, and its about to be pulled down */
 	arch_exit_mmap(mm);
 
+	emm_notify(mm, emm_release, 0, TASK_SIZE);
 	lru_add_drain();
 	flush_cache_mm(mm);
Index: linux-2.6/mm/mprotect.c
===================================================================
--- linux-2.6.orig/mm/mprotect.c	2008-03-03 22:54:12.069264942 -0800
+++ linux-2.6/mm/mprotect.c	2008-03-03 22:55:13.337569625 -0800
@@ -21,6 +21,7 @@
 #include <linux/syscalls.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/rmap.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/cacheflush.h>
@@ -198,10 +199,12 @@ success:
 		dirty_accountable = 1;
 	}
 
+	emm_notify(mm, emm_invalidate_start, start, end);
	if
[kvm-devel] [Early draft] Conversion of i_mmap_lock to semaphore
Not there yet, but the system boots and is usable. It complains about atomic contexts because the tlb functions use get_cpu() and thus disable preemption. Not sure yet what to do about the cond_resched_lock stuff etc.

Convert i_mmap_lock to i_mmap_sem

The conversion to a rw semaphore allows callbacks during rmap traversal for files in a non-atomic context. A rw-style lock also allows concurrent walking of the reverse map.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
---
 arch/x86/mm/hugetlbpage.c |    4 ++--
 fs/hugetlbfs/inode.c      |    4 ++--
 fs/inode.c                |    2 +-
 include/linux/fs.h        |    2 +-
 include/linux/mm.h        |    2 +-
 kernel/fork.c             |    4 ++--
 mm/filemap.c              |    8
 mm/filemap_xip.c          |    4 ++--
 mm/fremap.c               |    4 ++--
 mm/hugetlb.c              |   11 +--
 mm/memory.c               |   28
 mm/migrate.c              |    4 ++--
 mm/mmap.c                 |   16
 mm/mremap.c               |    4 ++--
 mm/rmap.c                 |   20 +---
 15 files changed, 51 insertions(+), 66 deletions(-)

Index: linux-2.6/arch/x86/mm/hugetlbpage.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/hugetlbpage.c	2008-03-03 22:59:25.386848427 -0800
+++ linux-2.6/arch/x86/mm/hugetlbpage.c	2008-03-03 22:59:31.174878038 -0800
@@ -69,7 +69,7 @@ static void huge_pmd_share(struct mm_str
 	if (!vma_shareable(vma, addr))
 		return;
 
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -94,7 +94,7 @@ static void huge_pmd_share(struct mm_str
 	put_page(virt_to_page(spte));
 	spin_unlock(&mm->page_table_lock);
 out:
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
 }
 
 /*
Index: linux-2.6/fs/hugetlbfs/inode.c
===================================================================
--- linux-2.6.orig/fs/hugetlbfs/inode.c	2008-03-03 22:59:25.410848010 -0800
+++ linux-2.6/fs/hugetlbfs/inode.c	2008-03-03 22:59:31.174878038 -0800
@@ -454,10 +454,10 @@ static int hugetlb_vmtruncate(struct ino
 	pgoff = offset >> PAGE_SHIFT;
 
 	i_size_write(inode, offset);
-	spin_lock(&mapping->i_mmap_lock);
+	down_read(&mapping->i_mmap_sem);
 	if (!prio_tree_empty(&mapping->i_mmap))
 		hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff);
-	spin_unlock(&mapping->i_mmap_lock);
+	up_read(&mapping->i_mmap_sem);
 	truncate_hugepages(inode, offset);
 	return 0;
 }
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c	2008-03-03 22:59:25.418848099 -0800
+++ linux-2.6/fs/inode.c	2008-03-03 22:59:31.202878206 -0800
@@ -210,7 +210,7 @@ void inode_init_once(struct inode *inode
 	INIT_LIST_HEAD(&inode->i_devices);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
 	rwlock_init(&inode->i_data.tree_lock);
-	spin_lock_init(&inode->i_data.i_mmap_lock);
+	init_rwsem(&inode->i_data.i_mmap_sem);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
 	spin_lock_init(&inode->i_data.private_lock);
 	INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2008-03-03 22:59:25.430848089 -0800
+++ linux-2.6/include/linux/fs.h	2008-03-03 22:59:31.202878206 -0800
@@ -503,7 +503,7 @@ struct address_space {
 	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
 	struct prio_tree_root	i_mmap;		/* tree of private and shared mappings */
 	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
-	spinlock_t		i_mmap_lock;	/* protect tree, count, list */
+	struct rw_semaphore	i_mmap_sem;	/* protect tree, count, list */
 	unsigned int		truncate_count;	/* Cover race condition with truncate */
 	unsigned long		nrpages;	/* number of total pages */
 	pgoff_t			writeback_index;/* writeback starts here */
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h	2008-03-03 22:59:25.442848167 -0800
+++ linux-2.6/include/linux/mm.h	2008-03-03 22:59:31.202878206 -0800
@@ -709,7 +709,7 @@ struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
 	pgoff_t	first_index;			/* Lowest page->index to unmap */
 	pgoff_t last_index;			/* Highest page->index to unmap */
-	spinlock_t *i_mmap_lock;		/* For