[PATCH] kvm-390: fix wait_queue handling

2009-07-16 Thread Christian Bornträger
From: Christian Borntraeger 

There are two waitqueues in kvm for wait handling:
vcpu->wq for virt/kvm/kvm_main.c and
vcpu->arch.local_int.wq for the s390 specific wait code.

The wait handling in kvm_s390_handle_wait was broken by using different
wait queues for add_wait_queue and remove_wait_queue.

There are two options to fix the problem: 
o  move all the s390 specific code to vcpu->wq and remove
   vcpu->arch.local_int.wq
o  move all the s390 specific code to vcpu->arch.local_int.wq

This patch chooses the 2nd variant for two reasons:
o  s390 does not use kvm_vcpu_block but implements its own enabled wait
   handling.
   Having a separate wait_queue makes it clear that our wait mechanism is
   different
o  the patch is much smaller

Reported-by: Julia Lawall 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/interrupt.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: kvm/arch/s390/kvm/interrupt.c
===
--- kvm.orig/arch/s390/kvm/interrupt.c
+++ kvm/arch/s390/kvm/interrupt.c
@@ -386,7 +386,7 @@ no_timer:
 	}
 	__unset_cpu_idle(vcpu);
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&vcpu->wq, &wait);
+	remove_wait_queue(&vcpu->arch.local_int.wq, &wait);
 	spin_unlock_bh(&vcpu->arch.local_int.lock);
 	spin_unlock(&vcpu->arch.local_int.float_int->lock);
 	hrtimer_try_to_cancel(&vcpu->arch.ckc_timer);
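
The pairing rule behind this fix can be modelled in userspace. The following is
only a minimal sketch, not kernel code: struct wq_head, struct wq_entry, wq_add
and wq_remove are hypothetical stand-ins for wait_queue_head_t, the wait entry,
add_wait_queue and remove_wait_queue. In the kernel each queue head carries its
own lock, so unlinking an entry through a different head modifies the list
under the wrong lock; here a per-head counter makes the mismatch visible.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel wait queue types (hypothetical names). */
struct wq_entry {
	struct wq_entry *prev, *next;
};

struct wq_head {
	struct wq_entry list;	/* sentinel of a circular list */
	int nr_waiters;		/* bookkeeping that belongs to this head only */
};

static void wq_init(struct wq_head *h)
{
	h->list.prev = h->list.next = &h->list;
	h->nr_waiters = 0;
}

/* Link the entry into h; the kernel would hold h's own lock here. */
static void wq_add(struct wq_head *h, struct wq_entry *e)
{
	e->next = h->list.next;
	e->prev = &h->list;
	h->list.next->prev = e;
	h->list.next = e;
	h->nr_waiters++;
}

/* Unlink the entry; only correct if h is the head that e was added to. */
static void wq_remove(struct wq_head *h, struct wq_entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = e;
	h->nr_waiters--;
}

static int wq_is_empty(const struct wq_head *h)
{
	return h->list.next == &h->list;
}
```

Adding to one head and removing through another would leave the first head
with a stale count and the second with a negative one, which is the userspace
analogue of taking the wrong lock in the kernel.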

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] virtio_test: A module for testing virtio via userspace

2009-06-24 Thread Christian Bornträger
On Wednesday, 24 June 2009 05:40:34, Rusty Russell wrote:
> > o  the general idea of a virtio_test module
> > o  the user interface ioctls
> > o  further ideas and comments
>
> Not mugging real drivers would be a requirement, I think.

OK, I will try to find a proper way to keep virtio_test from binding to devices
that have real drivers available. That patch to virtio_dev_match should be
relatively easy.
The open question I have: should virtio_test bind to a device if no other
driver is (yet) available?

> > +config VIRTIO_TEST
> > +   tristate "Virtio test driver (EXPERIMENTAL)"
> > +   select VIRTIO
> > +   select VIRTIO_RING
>
> Perhaps these should be depends?  Plus, depends on EXPERIMENTAL.
>
> > +If unsure, say M.
>
> That's "N" I think.

Yes. 

>
> > +   case VIOTEST_IOCGETBUF:
> > +   ret = do_get_buf(vtest, (struct viotest_getbuf __user *) arg);
> > +   break;
> > +   case VIOTEST_IOCGETCBS:
> > +   ret = get_callbacks(vtest, (struct viotest_cbinfo __user *) arg);
> > +   break;
>
> Generally the point of callbacks is to tell you you have new buffers; in
> fact you're insulated from callbacks which don't show new buffers.  So I'm
> not sure these two need to be separate?
> In which case, a read/write interface starts to make sense (write for
> addbuf and kick, read for get_buf).  That fits nicely with O_NONBLOCK and
> poll().

Hmm, that makes sense. I will try to propose a second version of the interface.
The interface must handle multiple virtqueues per device, should allow
non-blocking mode, etc. Let's see what ideas come to mind.

Thanks for the comments

Christian


Re: [patch 0/3] fixes for kvm on s390

2009-06-24 Thread Christian Bornträger
On Wednesday, 24 June 2009 11:18:32, Avi Kivity wrote:
> On 06/24/2009 11:18 AM, Christian Bornträger wrote:
> > Yes, the stfle issue is present on linus git and should go to Linus.
> > Fixes 1 and 3 are only for your kvm.git-tree. They should go to Linus
> > as soon as you push the referenced commits to Linus.
>
> That'll be 2.6.32.  I generally fold commits that fix other commits in
> the same pull.

That would be fine with me.



Re: [patch 0/3] fixes for kvm on s390

2009-06-24 Thread Christian Bornträger
On Wednesday, 24 June 2009 10:09:18, you wrote:
> On 06/23/2009 06:24 PM, Christian Borntraeger wrote:
> > Hello Avi,
> >
> > here are three patches against kvm.git that fix several issues in our
> > kvm port. Please review and consider all patches for 2.6.31.
>
> Applied all, and queued the stfle patch for 2.6.31.  The commits
> referenced in patches 1 and 3 don't exist in 2.6.31.  Please correct me
> if I misread things.

Yes, the stfle issue is present on linus git and should go to Linus.

Fixes 1 and 3 are only for your kvm.git-tree. They should go to Linus as soon
as you push the referenced commits to Linus.

Christian


Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Christian Bornträger
On Tuesday, 23 June 2009 16:16:13, Paul Brook wrote:
> > I did some work on virtio-console, since kvm on s390 does not provide any
> > other. I don't think we should mix two different types of devices into one
> > driver. The only thing that these drivers have in common is the fact
> > that there are two virtqueues, piping data (single bytes or larger
> > chunks). So you could make the same argument with the first virtio_net
> > driver (the one before GSO) - which is obviously wrong. The common part
> > of the transport is already factored out to virtio_ring and the
> > transports.
>
> virtio-net is packet based, not stream based.

You can argue that virtio-console is also packet based. The input buffer can
accept up to 4K in one buffer and the console code can also submit larger
chunks to virtio_console.

> > In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and
> > a guest/host ABI for this console. (and virtio was not meant to be a
> > KVM-only interface, that we can change all the time). David A. Wheeler's
> > 'SLOCCount' gives me 141 lines of code for virtio_console.c. I am quite
> > confident that the saving we could achieve by merging these two drivers
> > is not worth the hassle.
>
> AFAICS the functionality provided is exactly the same. The host API is
> identical, and the guest userspace API only has trivial differences (which
> could be eliminated with a simple udev rule). By my reading virtio-serial
> makes virtio-console entirely redundant.

How can you know that the userspace API only has trivial differences if the
question below is not answered?

> > Discussion about merging the console code into this distracts from the
> > main problem: To get the interface and functionality right before it
> > becomes an ABI (is it /dev/ttyS, network like or is it something
> > completely different?).
>
> Ah, now that's a different question. I don't know what the requirements are
> for the higher level vmchannel interface. However I also don't care.

You should care, because it might have an impact on whether two serial lines
are really the right solution for the vmchannel.

One thing that I forgot:
You should be warned that hvc_console can sometimes be a real PITA. A while ago
I tried to change virtio_console to support more than one console and hotplug,
and failed to find a proper solution that can handle all the subtle console/tty
register/unregister combinations. You don't want to adapt new code to fit
hvc_console - leave it in virtio_console...




Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Christian Bornträger
On Tuesday, 23 June 2009 14:55:52, Paul Brook wrote:
> > Here are two patches. One implements a virtio-serial device in qemu
> > and the other is the driver for a guest kernel.
>
> So I'll ask again. Why is this separate from virtio-console?

I did some work on virtio-console, since kvm on s390 does not provide any other.
I don't think we should mix two different types of devices into one driver. The
only thing that these drivers have in common is the fact that there are two
virtqueues, piping data (single bytes or larger chunks). So you could make the
same argument with the first virtio_net driver (the one before GSO) - which is
obviously wrong. The common part of the transport is already factored out to
virtio_ring and the transports.

In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a
guest/host ABI for this console (and virtio was not meant to be a KVM-only
interface that we can change all the time). David A. Wheeler's 'SLOCCount'
gives me 141 lines of code for virtio_console.c. I am quite confident that the
saving we could achieve by merging these two drivers is not worth the hassle.

Discussion about merging the console code into this distracts from the main 
problem: To get the interface and functionality right before it becomes an ABI 
(is it /dev/ttyS, network-like, or is it something completely different?).

Christian





[PATCH/RFC] virtio_test: A module for testing virtio via userspace

2009-06-19 Thread Christian Bornträger
Hello Rusty,

This is the result of a two-month internship on virtio testing.

From: Adrian Schneider 
From: Tim Hofmann 
From: Christian Ehrhardt 
From: Christian Borntraeger 

This patch introduces a prototype for a virtio_test module. This module can
be bound to any virtio device via the sysfs bind/unbind feature, e.g.:
$ echo virtio1 > /sys/bus/virtio/drivers/virtio_rng/unbind
$ modprobe virtio_test

On probe this module registers to all virtqueues and creates a character
device for every virtio device (/dev/viotest*).
The character device offers ioctls to allow a userspace application to submit
virtio operations like addbuf, kick and getbuf. It also offers ioctls to get
information about the device and to query the number of callbacks that have
occurred (or to wait synchronously on callbacks).

The driver currently lacks the following planned features:
o  userspace tooling for fuzzing (a prototype exists)
o  feature bit support
o  support arbitrary pointer mode in add_buf (e.g. test how qemu deals with
   iovecs pointing beyond the guest memory size)
o  priority binding with other virtio drivers (e.g. if virtio_blk and 
   virtio_test are compiled into the kernel, virtio_blk should get all block
   devices by default on hotplug)

I would like to get feedback on 

o  the general idea of a virtio_test module
o  the user interface ioctls
o  further ideas and comments 

Signed-off-by: Christian Borntraeger 
---
 drivers/virtio/Kconfig   |   12 
 drivers/virtio/Makefile  |2 
 drivers/virtio/virtio_test.c |  710 +++
 include/linux/Kbuild |1 
 include/linux/virtio_test.h  |  146 
 5 files changed, 871 insertions(+)

Index: linux-2.6/drivers/virtio/Kconfig
===
--- linux-2.6.orig/drivers/virtio/Kconfig
+++ linux-2.6/drivers/virtio/Kconfig
@@ -33,3 +33,15 @@ config VIRTIO_BALLOON
 
 If unsure, say M.
 
+config VIRTIO_TEST
+   tristate "Virtio test driver (EXPERIMENTAL)"
+   select VIRTIO
+   select VIRTIO_RING
+   ---help---
+This driver supports testing arbitrary virtio devices. The driver
+offers IOCTLs to run add_buf/get_buf etc. from userspace. You can
+bind/unbind any unused virtio device to this driver via sysfs. Each
+bound device will get a /dev/viotest* device node.
+
+If unsure, say M.
+
Index: linux-2.6/drivers/virtio/Makefile
===
--- linux-2.6.orig/drivers/virtio/Makefile
+++ linux-2.6/drivers/virtio/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_TEST) += virtio_test.o
+
Index: linux-2.6/drivers/virtio/virtio_test.c
===
--- /dev/null
+++ linux-2.6/drivers/virtio/virtio_test.c
@@ -0,0 +1,710 @@
+/*
+ *  Test driver for the virtio bus
+ *
+ *Copyright IBM Corp. 2009
+ *Author(s): Adrian Schneider 
+ *   Tim Hofmann 
+ *   Christian Ehrhardt 
+ *   Christian Borntraeger 
+ */
+
+
+#define KMSG_COMPONENT "virtio_test"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static u32 viotest_major = VIOTEST_MAJOR;
+static struct class *viotest_class;
+static LIST_HEAD(viotest_list);
+static spinlock_t viotest_list_lock = SPIN_LOCK_UNLOCKED;
+
+static void free_kvec(struct kvec *kiov, u32 index)
+{
+   u32 i;
+
+   for (i = 0; i < index; i++)
+   kfree(kiov[i].iov_base);
+
+   kfree(kiov);
+}
+
+/*
+ * This function copies a userspace iovec * array into a kernel kvec * array
+ */
+static int copy_iovec_from_user(struct kvec **kiov, struct iovec __user *uiov,
+   u32 uiov_num)
+{
+   u32 i;
+   u64 kiov_sz;
+   struct iovec uservec;
+
+   kiov_sz = sizeof(struct kvec) * uiov_num;
+   *kiov = kmalloc(kiov_sz, GFP_KERNEL);
+   if (!(*kiov))
+   return -ENOMEM;
+
+   for (i = 0; i < uiov_num; i++) {
+   if (copy_from_user(&uservec, &uiov[i], sizeof(struct iovec))) {
+   free_kvec(*kiov, i);
+   return -EFAULT;
+   }
+   (*kiov)[i].iov_base = kmalloc(uservec.iov_len, GFP_KERNEL);
+   if (!(*kiov)[i].iov_base) {
+   free_kvec(*kiov, i);
+   return -ENOMEM;
+   }
+
+   if (copy_from_user((*kiov)[i].iov_base, uservec.iov_base, uservec.iov_len)) {
+   free_kvec(*kiov, i);
+   return -EFAULT;
+   }
+   (*kiov)[i].iov_len = uservec.iov_len;
+   }
+
+   return 0;
+}
+
+static int copy_kvec_to_user(struct iovec 
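
A note on the error handling in copy_iovec_from_user as posted: as far as the
excerpt shows, when the second copy_from_user() fails for element i, the
buffer just allocated for element i is not released, because
free_kvec(*kiov, i) only frees elements 0..i-1. The following userspace model
sketches one way to handle that path. It is not the kernel code: malloc/free
replace kmalloc/kfree, and the copy_fn callback is a hypothetical stand-in for
copy_from_user() so that a failing copy can be exercised.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Stand-in for copy_from_user(): returns nonzero on failure. */
typedef int (*copy_fn)(void *dst, const void *src, size_t len);

static int plain_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/* Always fails; only useful for exercising the error path. */
static int always_fault(void *dst, const void *src, size_t len)
{
	(void)dst; (void)src; (void)len;
	return 1;
}

/* Free the first 'filled' buffers and the vector itself. */
static void free_iov(struct iovec *iov, unsigned int filled)
{
	unsigned int i;

	for (i = 0; i < filled; i++)
		free(iov[i].iov_base);
	free(iov);
}

static int dup_iovec(struct iovec **out, const struct iovec *in,
		     unsigned int num, copy_fn copy)
{
	struct iovec *iov;
	unsigned int i;

	/* calloc() rejects a num * sizeof(*iov) that would overflow. */
	iov = calloc(num ? num : 1, sizeof(*iov));
	if (!iov)
		return -ENOMEM;

	for (i = 0; i < num; i++) {
		void *base = malloc(in[i].iov_len ? in[i].iov_len : 1);

		if (!base) {
			free_iov(iov, i);
			return -ENOMEM;
		}
		if (copy(base, in[i].iov_base, in[i].iov_len)) {
			free(base);	/* not yet owned by iov[] */
			free_iov(iov, i);
			return -EFAULT;
		}
		iov[i].iov_base = base;
		iov[i].iov_len = in[i].iov_len;
	}

	*out = iov;
	return 0;
}
```

A kernel version would additionally want an upper bound on the element count
and on each iov_len, since both come from userspace.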

Re: [PATCH] KVM: add localversion to avoid confusion and conflicts

2009-05-29 Thread Christian Bornträger
On Friday, 29 May 2009 10:43:46, Jaswinder Singh Rajput wrote:
> > > Adding localversion avoids confusion in kernel images :
> > NAK from my side. If you need a distinction, there is always
> Here is NAK for your NAK from my side.
> This patch is only for KVM tree and not for linus tree.

I know that this is for the kvm tree.
I personally don't like to have a forced localversion in my kernel trees - it
might break my tooling.
Anyway, if it really makes your life better I can live with these localversion
files and adapt my tooling.




Re: [PATCH] KVM: add localversion to avoid confusion and conflicts

2009-05-29 Thread Christian Bornträger
On Friday, 29 May 2009 09:18:14, Jaswinder Singh Rajput wrote:
> Adding localversion avoids confusion in kernel images :
>
> like Linux version 2.6.30-rc7 does not tell whether it is linus or kvm
> kernel.
>
> By adding localversion it tells :
>
> Linux version 2.6.30-rc7-kvm , any doubt ;-)
> I am inspired by Ingo's -tip, I am sure Ingo will tell more advantages,
> if these are not enough :-)
[...]
> diff --git a/localversion-kvm b/localversion-kvm
> new file mode 100644
> index 000..d969ff0
> --- /dev/null
> +++ b/localversion-kvm
> @@ -0,0 +1 @@
> +-kvm

NAK from my side. If you need a distinction, there is always
CONFIG_LOCALVERSION_AUTO. If you need this kind of suffix, there is always
CONFIG_LOCALVERSION.

Christian



Re: [PATCH 3/3] kvm-s390: streamline memslot handling

2009-05-26 Thread Christian Bornträger
On Tuesday, 26 May 2009 09:57:58, Avi Kivity wrote:
> > I could add that behaviour, but that could make our normal interrupt
> > handling much slower. Therefore I don't want to call that function,
> > but on the other hand I like the "skip if the request is already set"
> > functionality and think about adding that in my loop.
>
> I don't understand why it would affect your interrupt handling.  We need

As far as I understand x86, every host interrupt causes a guest exit.

On s390 the SIE instruction is interruptible. On a host interrupt (like an
IPI) the host interrupt handler runs and finally jumps back into the SIE
instruction. The hardware will continue with guest execution. This has the
advantage that we don't have to load/save guest and host registers on host
interrupts (the low-level interrupt handler saves the registers of the
interrupted context).

In our low-level interrupt handler we check for signal_pending,
machine_check_pending and need_resched to leave the SIE instruction. For
anything else, the host sees a CPU-bound guest as always being in the SIE
instruction.
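
As a rough model of that check (not the actual low-level handler, which is
written in assembler): struct host_pending and must_leave_sie are names made
up for this sketch, and the three flags simply mirror the conditions listed
above.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the conditions named above. */
struct host_pending {
	bool signal_pending;
	bool machine_check_pending;
	bool need_resched;
};

/*
 * True if the host interrupt handler should leave SIE instead of
 * resuming the guest. Any other host event resumes guest execution.
 */
static bool must_leave_sie(const struct host_pending *p)
{
	return p->signal_pending || p->machine_check_pending ||
	       p->need_resched;
}
```

A pending vcpu request is deliberately absent from this list, which is why an
extra kick is needed for a CPU-bound guest to notice it.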

Christian




Re: [patch 0/4] move irq protection role to separate lock v2

2009-05-20 Thread Christian Bornträger
On Thursday, 21 May 2009 06:50:15, Marcelo Tosatti wrote:
> But I fail to see the case where vcpu creation is a fast path (unless
> you're benchmarking cpu hotplug/hotunplug).

[...]

> @@ -2053,6 +2054,9 @@ static long kvm_vm_ioctl(struct file *fi
>
>   if (kvm->mm != current->mm)
>   return -EIO;
> +
> + mutex_lock(&kvm->vm_ioctl_lock);
> +
>   switch (ioctl) {
>   case KVM_CREATE_VCPU:
>   r = kvm_vm_ioctl_create_vcpu(kvm, arg);
> @@ -2228,6 +2232,7 @@ static long kvm_vm_ioctl(struct file *fi
>   r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>   }
>  out:
> + mutex_unlock(&kvm->vm_ioctl_lock);
>   return r;
>  }

The thing that looks worrisome is that the s390 version of kvm_arch_vm_ioctl
has KVM_S390_INTERRUPT. This allows userspace to inject interrupts - which
would be serialized. The thing is that external interrupts and I/O interrupts
are floating - which means they can arrive on all cpus. This is somewhat of a
fast path.
On the other hand, kvm_s390_inject_vm already takes the kvm->lock to protect
against hotplug. With this patch we might be able to remove the kvm->lock in
kvm_s390_inject_vm - that would reduce the impact.

This needs more thinking on our side.

Christian


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Wednesday, 20 May 2009 11:11:57, Avi Kivity wrote:
> > Yes, KSM is easier and it even finds duplicate data pages.
> > On the other hand it only provides memory savings. It does not speed up
> > application startup like execute-in-place (major page faults become minor
> > page faults for text pages if the page is already backed by the host) I
> > am not claiming that KSM is useless. Depending on the scenario you might
> > want the one or the other or even both. For typical desktop use, KSM is
> > very likely the better approach
>
> If ksm shares pagecache, then doesn't it become effectively XIP?

Not exactly, only for long-running guests with a stable working set. When the
guest boots up, its page cache is basically empty, but the shared segment is
populated. It is at startup where XIP wins. The same is true for guests with
quickly changing working sets.

> We could also hook virtio dma to preemptively share pages somehow.

Yes, that is something to think about. One idea that is used on z/VM by a lot
of customers is to have a shared read-only disk for /usr that is cached by the
host.


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Wednesday, 20 May 2009 10:45:50, Avi Kivity wrote:
> Christian Bornträger wrote:
> > o shared guest kernels: The CMS operating system is built as a bootable
> > DCSS (called named-saved-segments NSS). All guests have the same host
> > pages for the read-only parts of the CMS kernel. The local data is stored
> > in exclusive-write parts of the same NSS. Linux on System z is also
> > capable of using this feature (CONFIG_SHARED_KERNEL). The kernel linkage
> > is changed in a way to separate the read-only text segment from the other
> > parts with segment size alignment
>
> How does patching (smp, kprobes/jprobes, markers/ftrace) work with this?
It does not. :-)
Because of that, and since most distro kernels are fully modular and kernel
updates are another problem, this feature is not used very often for Linux. It
is used heavily in CMS, though.
Actually, we could do COW in the host, but then it is really not worth the
effort.

> > o execute-in-place: This is a Linux feature to exploit the DCSS
> > technology. The goal is to share identical guest pages without the
> > additional overhead of KSM etc. We have a block device driver for DCSS.
> > This block device driver supports the direct_access function and
> > therefore allows using the xip option of ext2. The idea is to put
> > binaries into a read-only ext2 filesystem. Whenever an mmap is made on
> > this file system, the page is not mapped into the page cache. The ptes
> > point into the DCSS memory instead. Since the DCSS is demand-paged by the
> > host no memory is wasted for unused parts of the binaries. In case of COW
> > the page is copied as usual. It turned out that installations with many
> > similar guests (let's say 400 guests) will profit in terms of memory
> > saving and quicker application startups (not the first guest of course).
> > There is a downside: this requires a skilled administrator to set up.
>
> ksm might be easier to admin, at the cost of some cpu time.

Yes, KSM is easier and it even finds duplicate data pages.
On the other hand it only provides memory savings. It does not speed up
application startup like execute-in-place does (major page faults become minor
page faults for text pages if the page is already backed by the host).
I am not claiming that KSM is useless. Depending on the scenario you might
want one or the other or even both. For typical desktop use, KSM is very
likely the better approach.



Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Tuesday, 19 May 2009 20:39:24, Anthony Liguori wrote:
> Perhaps something that maps closer to the current add_buf/get_buf API.
> Something like:
>
> struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num,
> unsigned int *in_num);
> void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov, unsigned int
> out_num, unsigned int in_num);
>
> There's symmetry here which is good.  The one bad thing about it is
> forces certain memory to be read-only and other memory to be
> read-write.  I don't see that as a bad thing though.
>
> I think we'll need an interface like this so support driver domains too
> since "backend".  To put it another way, in QEMU, map_buf ==
> virtqueue_pop and unmap_buf == virtqueue_push.


You are proposing that the guest should define some guest memory to be used as
shared memory (some kind of replacement), right? This is fine, as long as we
can _also_ map host memory somewhere else (e.g. after guest memory, above 1TB
etc.). I definitely want to be able to have a 64MB guest map a 2GB shared
memory zone. (See my other mail about the execute-in-place via DCSS use case.)


I think we should start to write down some requirements. This will help to get
a better understanding of the necessary interface.
Here are my first ideas:

o allow mapping host shared memory to any place that can be addressed via a PFN
o allow mapping beyond guest storage
o allow replacing guest memory
o read-only and read/write modes
o driver interface should not depend on hardware-specific stuff (e.g. prefer
  generic virtio over PCI)

More ideas are welcome.


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-20 Thread Christian Bornträger
On Wednesday, 20 May 2009 04:58:38, Rusty Russell wrote:
> On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:
> > Avi Kivity wrote:
> > > Christian Bornträger wrote:
> > >>> To summarize, Anthony thinks it should use virtio, while I believe
> > >>> virtio is useful for exporting guest memory, not for importing host
> > >>> memory.
>
> Yes, precisely.
>
> But what's it *for*, this shared memory?  Implementing shared memory is
> trivial.  Using it is harder.  For example, inter-guest networking: you'd
> have to copy packets in and out, making it slow as well as losing
> abstraction.
>
> The only interesting idea I can think of is exposing it to userspace, and
> having that run some protocol across it for fast app <-> app comms.  But if
> that's your plan, you still have a lot of code to write!
>
> So I guess I'm missing the big picture here?

I can give some insights about shared memory usage in z/VM. z/VM uses so-called
discontiguous saved segments (DCSS) to share memory between guests.
(naming side note:
o discontiguous because these segments can have holes and different access
  rights, e.g. you can build a DCSS that goes from 800M-801M read-only and
  900M-910M exclusive-write.
o segments because the 2nd level of our page tables is called the segment
  table.
)

z/VM uses these segments for several purposes:
o The monitoring subsystem uses a DCSS to get data from several components
o shared guest kernels: The CMS operating system is built as a bootable DCSS
  (called named-saved-segments NSS). All guests have the same host pages for
  the read-only parts of the CMS kernel. The local data is stored in
  exclusive-write parts of the same NSS. Linux on System z is also capable of
  using this feature (CONFIG_SHARED_KERNEL). The kernel linkage is changed in
  a way to separate the read-only text segment from the other parts with
  segment size alignment
o execute-in-place: This is a Linux feature to exploit the DCSS technology.
  The goal is to share identical guest pages without the additional overhead
  of KSM etc. We have a block device driver for DCSS. This block device driver
  supports the direct_access function and therefore allows using the xip
  option of ext2. The idea is to put binaries into a read-only ext2
  filesystem. Whenever an mmap is made on this file system, the page is not
  mapped into the page cache. The ptes point into the DCSS memory instead.
  Since the DCSS is demand-paged by the host, no memory is wasted for unused
  parts of the binaries. In case of COW the page is copied as usual. It turned
  out that installations with many similar guests (let's say 400 guests) will
  profit in terms of memory saving and quicker application startups (not the
  first guest of course). There is a downside: this requires a skilled
  administrator to set up.

We have also experimented with network, POSIX shared memory, and shared caches
via DCSS. Most of these ideas turned out to be not very useful or hard to
implement properly.


Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.

2009-05-19 Thread Christian Bornträger
On Monday, 18 May 2009 16:26:15, Avi Kivity wrote:
> Christian Borntraeger wrote:
> > Sorry for the late question, but I missed your first version. Is there a
> > way to change that code to use virtio instead of PCI? That would allow us
> > to use this driver on s390 and maybe other virtio transports.
>
> Opinion differs.  See the discussion in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.
>
> To summarize, Anthony thinks it should use virtio, while I believe
> virtio is useful for exporting guest memory, not for importing host memory.

I think the current virtio interface is not ideal for importing host memory,
but we can change that. If you look at the dcssblk driver for s390, it allows
a guest to map shared memory segments via a diagnose (hypercall). The driver in
this patch uses PCI regions to map memory.

My point is that the method used to map memory is completely irrelevant; we
just need something like mmap/shmget between the guest and the host. We could
define an interface in virtio that can be used by any transport. In the case of
PCI this could be a simple PCI map operation.

What do you think about something like: (CCed Rusty)
---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**
+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer should be mapped (hint only)
+ * length: The length of the mapping
+ * identifier: the token that identifies the host buffer
+ *  Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ * vdev: the struct virtio_device we're talking about.
+ * addr: The address where the buffer is mapped
+ *  Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+	void * (*map_region)(struct virtio_device *vdev,
+			     void *addr,
+			     size_t length,
+			     int identifier);
+	int (*unmap_region)(struct virtio_device *vdev, void *addr);
+	/* we might need query region and other stuff */
+};
+
+
+/**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
  * @dev: underlying device.
@@ -85,6 +110,7 @@ struct virtio_device
struct device dev;
struct virtio_device_id id;
struct virtio_config_ops *config;
+   struct virtio_device_ops *ops;
/* Note that this is a Linux set_bit-style bitmap. */
unsigned long features[1];
void *priv;
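
To illustrate how a transport could plug into such an ops table, here is a
minimal userspace mock. Everything beyond the two function signatures from the
proposal above is an assumption made for this sketch: struct virtio_device is
reduced to a stub, the nr_mapped counter and the mock_* names are invented,
the "mapping" is just zeroed heap memory, and errors are reported as NULL
rather than as an error pointer.

```c
#include <stddef.h>
#include <stdlib.h>

struct virtio_device;

struct virtio_device_ops {
	void *(*map_region)(struct virtio_device *vdev, void *addr,
			    size_t length, int identifier);
	int (*unmap_region)(struct virtio_device *vdev, void *addr);
};

/* Stub device: only what the mock needs. */
struct virtio_device {
	const struct virtio_device_ops *ops;
	int nr_mapped;	/* mock bookkeeping only */
};

/* Mock transport: "maps" by allocating zeroed memory, ignores the hint. */
static void *mock_map_region(struct virtio_device *vdev, void *addr,
			     size_t length, int identifier)
{
	void *p;

	(void)addr;
	(void)identifier;
	p = calloc(1, length);
	if (p)
		vdev->nr_mapped++;
	return p;	/* NULL on failure in this mock */
}

static int mock_unmap_region(struct virtio_device *vdev, void *addr)
{
	if (!addr || vdev->nr_mapped == 0)
		return -1;
	free(addr);
	vdev->nr_mapped--;
	return 0;
}

static const struct virtio_device_ops mock_ops = {
	.map_region = mock_map_region,
	.unmap_region = mock_unmap_region,
};
```

A driver would only ever call vdev->ops->map_region() and
vdev->ops->unmap_region(); whether the transport implements them with a PCI
region, a diagnose call or something else stays hidden behind the table.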


