[PATCH] kvm-390: fix wait_queue handling
From: Christian Borntraeger

There are two waitqueues in kvm for wait handling: vcpu->wq for
virt/kvm/kvm_main.c and vcpu->arch.local_int.wq for the s390 specific wait
code.

The wait handling in kvm_s390_handle_wait was broken by using different
wait_queues for add_wait_queue and remove_wait_queue.

There are two options to fix the problem:
 o move all the s390 specific code to vcpu->wq and remove
   vcpu->arch.local_int.wq
 o move all the s390 specific code to vcpu->arch.local_int.wq

This patch chooses the 2nd variant for two reasons:
 o s390 does not use kvm_vcpu_block but implements its own enabled wait
   handling. Having a separate wait_queue makes it clear that our wait
   mechanism is different
 o the patch is much smaller

Report-by: Julia Lawall
Signed-off-by: Christian Borntraeger
---
 arch/s390/kvm/interrupt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: kvm/arch/s390/kvm/interrupt.c
===================================================================
--- kvm.orig/arch/s390/kvm/interrupt.c
+++ kvm/arch/s390/kvm/interrupt.c
@@ -386,7 +386,7 @@ no_timer:
 	}
 	__unset_cpu_idle(vcpu);
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&vcpu->wq, &wait);
+	remove_wait_queue(&vcpu->arch.local_int.wq, &wait);
 	spin_unlock_bh(&vcpu->arch.local_int.lock);
 	spin_unlock(&vcpu->arch.local_int.float_int->lock);
 	hrtimer_try_to_cancel(&vcpu->arch.ckc_timer);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
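The pairing rule behind this fix can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `toy_*` names are hypothetical and are not the kernel wait queue API. The model assumes, as I understand the kernel code, that remove_wait_queue() locks the queue head it is given and then unlinks the entry, so passing a different head than the one used for add_wait_queue() unlinks the entry under the wrong lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical, NOT the kernel API). */
struct toy_wq {
	int nr_entries;			/* entries currently queued */
};

struct toy_wait {
	struct toy_wq *on;		/* queue this entry is linked on */
};

static void toy_add_wait_queue(struct toy_wq *q, struct toy_wait *w)
{
	w->on = q;
	q->nr_entries++;
}

/*
 * Models "take q's lock, then unlink w from whatever list it is on".
 * Returns 0 if q is the queue that really holds w (the lock protects
 * the list being modified), -1 if the unlink ran under the wrong lock.
 */
static int toy_remove_wait_queue(struct toy_wq *q, struct toy_wait *w)
{
	int ret = (w->on == q) ? 0 : -1;

	if (w->on) {
		w->on->nr_entries--;
		w->on = NULL;
	}
	return ret;
}

/* One wait cycle; 'fixed' selects the pairing used after the patch. */
static int toy_wait_cycle(int fixed)
{
	struct toy_wq generic_wq = { 0 };	/* plays vcpu->wq */
	struct toy_wq local_wq = { 0 };		/* plays vcpu->arch.local_int.wq */
	struct toy_wait wait = { NULL };

	toy_add_wait_queue(&local_wq, &wait);
	return toy_remove_wait_queue(fixed ? &local_wq : &generic_wq, &wait);
}
```

In this model toy_wait_cycle(0) reports the mismatch and toy_wait_cycle(1) does not, which mirrors the one-line change in the patch.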
Re: [PATCH/RFC] virtio_test: A module for testing virtio via userspace
On Wednesday 24 June 2009 05:40:34, Rusty Russell wrote:
> > o the general idea of a virtio_test module
> > o the user interface ioctls
> > o further ideas and comments
>
> Not mugging real drivers would be a requirement, I think.

OK, I will try to find a proper way to keep virtio_test from binding to
devices that have real drivers available. That patch to virtio_dev_match
should be relatively easy. The open question I have: should virtio_test bind
to a device if no other driver is (yet) available?

> > +config VIRTIO_TEST
> > +	tristate "Virtio test driver (EXPERIMENTAL)"
> > +	select VIRTIO
> > +	select VIRTIO_RING
>
> Perhaps these should be depends? Plus, depends on EXPERIMENTAL.
>
> > +If unsure, say M.
>
> That's "N" I think.

Yes.

> > +	case VIOTEST_IOCGETBUF:
> > +		ret = do_get_buf(vtest, (struct viotest_getbuf __user *) arg);
> > +		break;
> > +	case VIOTEST_IOCGETCBS:
> > +		ret = get_callbacks(vtest, (struct viotest_cbinfo __user *) arg);
> > +		break;
>
> Generally the point of callbacks is to tell you you have new buffers; in
> fact you're insulated from callbacks which don't show new buffers. So I'm
> not sure these two need to be separate?
> In which case, a read/write interface starts to make sense (write for
> addbuf and kick, read for get_buf). That fits nicely with O_NONBLOCK and
> poll().

Hmm - makes sense. I will try to propose a 2nd version of the interface. The
interface must handle multiple virtqueues per device, should allow
non-blocking mode etc. Let's see what ideas come to my mind.

Thanks for the comments

Christian
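For reference, the read/write plus poll() usage pattern suggested above might look roughly like this from userspace. This is a sketch only: no such read/write interface exists in the posted driver, `viotest_try_get_buf` and `viotest_demo` are made-up names, and a pipe stands in for the device node so that the example is self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a completed buffer on fd and read it.
 * Returns the number of bytes read, 0 if nothing became available,
 * -1 on error. (A real interface would have to distinguish EOF.)
 */
static ssize_t viotest_try_get_buf(int fd, void *buf, size_t len,
				   int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready = poll(&pfd, 1, timeout_ms);
	ssize_t n;

	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;			/* timed out, nothing to get */

	n = read(fd, buf, len);
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;			/* raced with another reader */
	return n;
}

/* Demo with a pipe as a stand-in for the hypothetical device node. */
static ssize_t viotest_demo(void)
{
	int p[2];
	char buf[16];
	ssize_t n = -1;
	int flags;

	if (pipe(p) < 0)
		return -1;

	flags = fcntl(p[0], F_GETFL);
	if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0)
		goto out;

	/* Nothing submitted yet: the non-blocking get must not hang. */
	if (viotest_try_get_buf(p[0], buf, sizeof(buf), 0) != 0)
		goto out;

	/* write() plays the role of "addbuf and kick". */
	if (write(p[1], "hello", 5) != 5)
		goto out;

	/* read() plays the role of "get_buf". */
	n = viotest_try_get_buf(p[0], buf, sizeof(buf), 100);
out:
	close(p[0]);
	close(p[1]);
	return n;
}
```

With one file descriptor per virtqueue, the same helper would also cover the "multiple virtqueues per device" requirement by polling several descriptors at once; that mapping is an assumption, not something decided in this thread.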
Re: [patch 0/3] fixes for kvm on s390
On Wednesday 24 June 2009 11:18:32, Avi Kivity wrote:
> On 06/24/2009 11:18 AM, Christian Bornträger wrote:
> > Yes, the stfle issue is present on linus git and should go to Linus.
> > Fixes 1 and 3 are only for your kvm.git-tree. They should go to Linus in
> > as soon as you push the referenced commits to Linus.
>
> That'll be 2.6.32. I generally fold commits that fix other commits in
> the same pull.

That would be fine with me.
Re: [patch 0/3] fixes for kvm on s390
On Wednesday 24 June 2009 10:09:18, you wrote:
> On 06/23/2009 06:24 PM, Christian Borntraeger wrote:
> > Hello Avi,
> >
> > here are three patches against kvm.git that fix several issues in our
> > kvm port. Please review and consider all patches for 2.6.31.
>
> Applied all, and queued the stfle patch for 2.6.31. The commits
> referenced in patch 1 and 3 doesn't exist in 2.6.31. Please correct me
> if I misread things.

Yes, the stfle issue is present in Linus' git tree and should go to Linus.
Fixes 1 and 3 are only for your kvm.git tree. They should go to Linus as
soon as you push the referenced commits to Linus.

Christian
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On Tuesday 23 June 2009 16:16:13, Paul Brook wrote:
> > I did some work on virtio-console, since kvm on s390 does not provide any
> > other. I dont think we should mix two different types of devices into one
> > driver. The only thing that these drivers have in common, is the fact
> > that there are two virtqueues, piping data (single bytes or larger
> > chunks). So you could make the same argument with the first virtio_net
> > driver (the one before GSO) - which is obviously wrong. The common part
> > of the transport is already factored out to virtio_ring and the
> > transports.
>
> virtio-net is packet based, not stream based.

You can argue that virtio-console is also packet based. The input buffer can
accept up to 4K in one buffer and the console code can also submit larger
chunks to virtio_console.

> > In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and
> > a guest/host ABI for this console. (and virtio was not meant to be a
> > KVM-only interface, that we can change all the time). David A. Wheeler's
> > 'SLOCCount' gives me 141 lines of code for virtio_console.c. I am quite
> > confident that the saving we could achieve by merging these two drivers
> > is not worth the hazzle.
>
> AFAICS the functionality provided is exactly the same. The host API is
> identical, and the guest userspace API only has trivial differences (which
> could be eliminated with a simple udev rule). By my reading virtio-serial
> makes virtio-console entirely redundant.

How can you know that the userspace API only has trivial differences if the
question below is not answered?

> > Discussion about merging the console code into this distracts from the
> > main problem: To get the interface and functionality right before it
> > becomes an ABI (is it /dev/ttyS, network like or is it something
> > completely different?).
>
> Ah, now that's a different question. I don't know what the requirements are
> for the higher level vmchannel interface.
> However I also don't care.

You should care, because it might have an impact on whether two serial lines
are really the right solution for the vmchannel.

One thing that I forgot: you should be warned that hvc_console can sometimes
be a real PITA. A while ago I tried to change virtio_console to support more
than one console and hotplug, and failed to find a proper solution that can
handle all the subtle console/tty register/unregister combinations. You don't
want to adapt new code to fit hvc_console - leave it in virtio_console...
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On Tuesday 23 June 2009 14:55:52, Paul Brook wrote:
> > Here are two patches. One implements a virtio-serial device in qemu
> > and the other is the driver for a guest kernel.
>
> So I'll ask again. Why is this separate from virtio-console?

I did some work on virtio-console, since kvm on s390 does not provide any
other console. I don't think we should mix two different types of devices
into one driver. The only thing that these drivers have in common is the fact
that there are two virtqueues, piping data (single bytes or larger chunks).
So you could make the same argument with the first virtio_net driver (the one
before GSO) - which is obviously wrong. The common part of the transport is
already factored out to virtio_ring and the transports.

In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a
guest/host ABI for this console (and virtio was not meant to be a KVM-only
interface that we can change all the time). David A. Wheeler's 'SLOCCount'
gives me 141 lines of code for virtio_console.c. I am quite confident that
the saving we could achieve by merging these two drivers is not worth the
hassle.

Discussion about merging the console code into this distracts from the main
problem: to get the interface and functionality right before it becomes an
ABI (is it /dev/ttyS, network-like, or is it something completely
different?).

Christian
[PATCH/RFC] virtio_test: A module for testing virtio via userspace
Hello Rusty,

this is a result of a two month internship about virtio testing.

From: Adrian Schneider
From: Tim Hofmann
From: Christian Ehrhardt
From: Christian Borntraeger

This patch introduces a prototype for a virtio_test module. This module can
be bound to any virtio device via sysfs bind/unbind feature, e.g:

$ echo virtio1 > /sys/bus/virtio/drivers/virtio_rng/unbind
$ modprobe virtio_test

On probe this module registers to all virtqueues and creates a character
device for every virtio device. (/dev/viotest).
The character device offers ioctls to allow a userspace application to submit
virtio operations like addbuf, kick and getbuf. It also offers ioctls to get
information about the device and to query the amount of occurred callbacks
(or wait synchronously on callbacks).

The driver currently lacks the following planned features:
 o userspace tooling for fuzzing (a prototype exists)
 o feature bit support
 o support arbitrary pointer mode in add_buf (e.g. test how qemu deals with
   iovecs pointing beyond the guest memory size)
 o priority binding with other virtio drivers (e.g. if virtio_blk and
   virtio_test are compiled into the kernel, virtio_blk should get all block
   devices by default on hotplug)

I would like to get feedback on
 o the general idea of a virtio_test module
 o the user interface ioctls
 o further ideas and comments

Signed-off-by: Christian Borntraeger
---
 drivers/virtio/Kconfig       |   12
 drivers/virtio/Makefile      |    2
 drivers/virtio/virtio_test.c |  710 +++
 include/linux/Kbuild         |    1
 include/linux/virtio_test.h  |  146
 5 files changed, 871 insertions(+)

Index: linux-2.6/drivers/virtio/Kconfig
===================================================================
--- linux-2.6.orig/drivers/virtio/Kconfig
+++ linux-2.6/drivers/virtio/Kconfig
@@ -33,3 +33,15 @@ config VIRTIO_BALLOON
 
 	 If unsure, say M.
 
+config VIRTIO_TEST
+	tristate "Virtio test driver (EXPERIMENTAL)"
+	select VIRTIO
+	select VIRTIO_RING
+	---help---
+	 This driver supports testing arbitrary virtio devices. The driver
+	 offers IOCTLs to run add_buf/get_buf etc. from userspace. You can
+	 bind/unbind any unused virtio device to this driver via sysfs. Each
+	 bound device will get a /dev/viotest* device node.
+
+	 If unsure, say M.
+
Index: linux-2.6/drivers/virtio/Makefile
===================================================================
--- linux-2.6.orig/drivers/virtio/Makefile
+++ linux-2.6/drivers/virtio/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_TEST) += virtio_test.o
+
Index: linux-2.6/drivers/virtio/virtio_test.c
===================================================================
--- /dev/null
+++ linux-2.6/drivers/virtio/virtio_test.c
@@ -0,0 +1,710 @@
+/*
+ * Test driver for the virtio bus
+ *
+ *    Copyright IBM Corp. 2009
+ *    Author(s): Adrian Schneider
+ *               Tim Hofmann
+ *               Christian Ehrhardt
+ *               Christian Borntraeger
+ */
+
+
+#define KMSG_COMPONENT "virtio_test"
+#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static u32 viotest_major = VIOTEST_MAJOR;
+static struct class *viotest_class;
+static LIST_HEAD(viotest_list);
+static spinlock_t viotest_list_lock = SPIN_LOCK_UNLOCKED;
+
+static void free_kvec(struct kvec *kiov, u32 index)
+{
+	u32 i;
+
+	for (i = 0; i < index; i++)
+		kfree(kiov[i].iov_base);
+
+	kfree(kiov);
+}
+
+/*
+ * This function copies a userspace iovec * array into a kernel kvec * array
+ */
+static int copy_iovec_from_user(struct kvec **kiov, struct iovec __user *uiov,
+				u32 uiov_num)
+{
+	u32 i;
+	u64 kiov_sz;
+	struct iovec uservec;
+
+	kiov_sz = sizeof(struct kvec) * uiov_num;
+	*kiov = kmalloc(kiov_sz, GFP_KERNEL);
+	if (!(*kiov))
+		return -ENOMEM;
+
+	for (i = 0; i < uiov_num; i++) {
+		if (copy_from_user(&uservec, &uiov[i], sizeof(struct iovec))) {
+			free_kvec(*kiov, i);
+			return -EFAULT;
+		}
+		(*kiov)[i].iov_base = kmalloc(uservec.iov_len, GFP_KERNEL);
+		if (!(*kiov)[i].iov_base) {
+			free_kvec(*kiov, i);
+			return -ENOMEM;
+		}
+
+		if (copy_from_user((*kiov)[i].iov_base, uservec.iov_base,
+				   uservec.iov_len)) {
+			free_kvec(*kiov, i);
+			return -EFAULT;
+		}
+		(*kiov)[i].iov_len = uservec.iov_len;
+	}
+
+	return 0;
+}
+
+static int copy_kvec_to_user(struct iovec
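The deep-copy pattern used by copy_iovec_from_user above (duplicate the descriptor array, then every buffer, and unwind on partial failure) can be tried out in plain userspace C. The following is only an analogue under stated assumptions: `dup_iovec`, `free_iov_copy` and `dup_iovec_demo` are hypothetical names, memcpy() stands in for copy_from_user(), and malloc()/calloc() stand in for kmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Free the first n buffers of a copied array, then the array itself. */
static void free_iov_copy(struct iovec *iov, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		free(iov[i].iov_base);
	free(iov);
}

/*
 * Deep-copy n iovec elements from src into a newly allocated array.
 * Returns 0 on success, -1 on allocation failure (nothing is leaked).
 */
static int dup_iovec(struct iovec **out, const struct iovec *src, size_t n)
{
	/* calloc checks the n * size multiplication for overflow. */
	struct iovec *copy = calloc(n ? n : 1, sizeof(*copy));
	size_t i;

	if (!copy)
		return -1;

	for (i = 0; i < n; i++) {
		copy[i].iov_base = malloc(src[i].iov_len ? src[i].iov_len : 1);
		if (!copy[i].iov_base) {
			/* Buffers 0..i-1 exist; buffer i was never allocated. */
			free_iov_copy(copy, i);
			return -1;
		}
		memcpy(copy[i].iov_base, src[i].iov_base, src[i].iov_len);
		copy[i].iov_len = src[i].iov_len;
	}

	*out = copy;
	return 0;
}

/* Copies a two-element vector and checks that the copy is independent. */
static int dup_iovec_demo(void)
{
	char a[] = "add", b[] = "kick";
	struct iovec src[2] = {
		{ .iov_base = a, .iov_len = sizeof(a) },
		{ .iov_base = b, .iov_len = sizeof(b) },
	};
	struct iovec *copy;
	int ok;

	if (dup_iovec(&copy, src, 2))
		return -1;

	a[0] = 'X';	/* changing the source must not affect the copy */
	ok = copy[0].iov_len == sizeof(a) &&
	     strcmp(copy[0].iov_base, "add") == 0 &&
	     strcmp(copy[1].iov_base, "kick") == 0;
	free_iov_copy(copy, 2);
	return ok ? 0 : 1;
}
```

One point worth keeping in mind when a copy step can fail after the buffer for element i has already been allocated (as with copy_from_user in the kernel): the cleanup count then has to include element i as well.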
Re: [PATCH] KVM: add localversion to avoid confusion and conflicts
On Friday 29 May 2009 10:43:46, Jaswinder Singh Rajput wrote:
> > > Adding localversion avoids confusion in kernel images :
> >
> > NAK from my side. If you need a distinction, there is always
>
> Here is NAK for your NAK from my side.
> This patch is only for KVM tree and not for linus tree.

I know that this is for the kvm tree. I personally don't like to have a
forced localversion in my kernel trees - it might break my tooling.
Anyway, if it really makes your life better, I can live with these
localversion files and adapt my tooling.
Re: [PATCH] KVM: add localversion to avoid confusion and conflicts
On Friday 29 May 2009 09:18:14, Jaswinder Singh Rajput wrote:
> Adding localversion avoids confusion in kernel images :
>
> like Linux version 2.6.30-rc7 does not tell whether it is linus or kvm
> kernel.
>
> By adding localversion it tells :
>
> Linux version 2.6.30-rc7-kvm , any doubt ;-)
> I am inspired by Ingo's -tip, I am sure Ingo will tell more advantages,
> if these are not enough :-)
[...]
> diff --git a/localversion-kvm b/localversion-kvm
> new file mode 100644
> index 000..d969ff0
> --- /dev/null
> +++ b/localversion-kvm
> @@ -0,0 +1 @@
> +-kvm

NAK from my side. If you need a distinction, there is always
CONFIG_LOCALVERSION_AUTO. If you need this kind of prefix, there is always
CONFIG_LOCALVERSION.

Christian
Re: [PATCH 3/3] kvm-s390: streamline memslot handling
On Tuesday 26 May 2009 09:57:58, Avi Kivity wrote:
> > I could add that behaviour, but that could make our normal interrupt
> > handling much slower. Therefore I don't want to call that function,
> > but on the other hand I like the "skip if the request is already set"
> > functionality and think about adding that in my loop.
>
> I don't understand why it would affect your interrupt handling. We need

As far as I understand x86, every host interrupt causes a guest exit.

On s390 the SIE instruction is interruptible. On a host interrupt (like an
IPI) the host interrupt handler runs and finally jumps back into the SIE
instruction. The hardware will continue with guest execution. This has the
advantage that we don't have to load/save guest and host registers on host
interrupts. (The low-level interrupt handler saves the registers of the
interrupted context.)

In our low-level interrupt handler we do check for signal_pending,
machine_check_pending and need_resched to leave the SIE instruction. For
anything else, the host sees a CPU-bound guest as always being in the SIE
instruction.

Christian
Re: [patch 0/4] move irq protection role to separate lock v2
On Thursday 21 May 2009 06:50:15, Marcelo Tosatti wrote:
> But I fail to see the case where vcpu creation is a fast path (unless
> you're benchmarking cpu hotplug/hotunplug).
[...]
> @@ -2053,6 +2054,9 @@ static long kvm_vm_ioctl(struct file *fi
>
>  	if (kvm->mm != current->mm)
>  		return -EIO;
> +
> +	mutex_lock(&kvm->vm_ioctl_lock);
> +
>  	switch (ioctl) {
>  	case KVM_CREATE_VCPU:
>  		r = kvm_vm_ioctl_create_vcpu(kvm, arg);
> @@ -2228,6 +2232,7 @@ static long kvm_vm_ioctl(struct file *fi
>  		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>  	}
>  out:
> +	mutex_unlock(&kvm->vm_ioctl_lock);
>  	return r;
>  }

The thing that looks worrisome is that the s390 version of kvm_arch_vm_ioctl
has KVM_S390_INTERRUPT. This allows userspace to inject interrupts - which
would be serialized. The thing is that external interrupts and I/O interrupts
are floating - which means they can arrive on all cpus. This is somewhat of a
fast path.

On the other hand, kvm_s390_inject_vm already takes the kvm->lock to protect
against hotplug. With this patch we might be able to remove the kvm->lock in
kvm_s390_inject_vm - that would reduce the impact.

This needs more thinking on our side.

Christian
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Wednesday 20 May 2009 11:11:57, Avi Kivity wrote:
> > Yes, KSM is easier and it even finds duplicate data pages.
> > On the other hand it does only provide memory saving. It does not speedup
> > application startup like execute-in-place (major page faults become minor
> > page faults for text pages if the page is already backed by the host) I
> > am not claiming that KSM is useless. Depending on the scenario you might
> > want the one or the other or even both. For typical desktop use, KSM is
> > very likely the better approach
>
> If ksm shares pagecache, then doesn't it become effectively XIP?

Not exactly, only for long-running guests with a stable working set. When a
guest boots up, its page cache is basically empty, but the shared segment is
already populated. It is the startup where xip wins. The same is true for
guests with quickly changing working sets.

> We could also hook virtio dma to preemptively share pages somehow.

Yes, that is something to think about. One idea that is used on z/VM by a lot
of customers is to have a shared read-only disk for /usr that is cached by
the host.
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Wednesday 20 May 2009 10:45:50, Avi Kivity wrote:
> Christian Bornträger wrote:
> > o shared guest kernels: The CMS operating system is build as a bootable
> > DCSS (called named-saved-segments NSS). All guests have the same host
> > pages for the read-only parts of the CMS kernel. The local data is stored
> > in exclusive-write parts of the same NSS. Linux on System z is also
> > capable of using this feature (CONFIG_SHARED_KERNEL). The kernel linkage
> > is changed in a way to separate the read-only text segment from the other
> > parts with segment size alignment
>
> How does patching (smp, kprobes/jprobes, markers/ftrace) work with this?

It does not. :-)
Because of that, and since most distro kernels are fully modular and kernel
updates are another problem, this feature is not used very often for Linux.
It is used heavily in CMS, though.
Actually, we could do COW in the host, but then it is really not worth the
effort.

> > o execute-in-place: This is a Linux feature to exploit the DCSS
> > technology. The goal is to shared identical guest pages without the
> > additional overhead of KSM etc. We have a block device driver for DCSS.
> > This block device driver supports the direct_access function and
> > therefore allows to use the xip option of ext2. The idea is to put
> > binaries into an read-only ext2 filesystem. Whenever an mmap is made on
> > this file system, the page is not mapped into the page cache. The ptes
> > point into the DCSS memory instead. Since the DCSS is demand-paged by the
> > host no memory is wasted for unused parts of the binaries. In case of COW
> > the page is copied as usual. It turned out that installations with many
> > similar guests (lets say 400 guests) will profit in terms of memory
> > saving and quicker application startups (not the first guest of course).
> > There is a downside: this requires a skilled administrator to setup.
>
> ksm might be easier to admin, at the cost of some cpu time.

Yes, KSM is easier and it even finds duplicate data pages.
On the other hand, it only provides memory savings. It does not speed up
application startup like execute-in-place does (major page faults become
minor page faults for text pages if the page is already backed by the host).
I am not claiming that KSM is useless. Depending on the scenario you might
want one or the other or even both. For typical desktop use, KSM is very
likely the better approach.
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Tuesday 19 May 2009 20:39:24, Anthony Liguori wrote:
> Perhaps something that maps closer to the current add_buf/get_buf API.
> Something like:
>
> struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num,
> unsigned int *in_num);
> void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov, unsigned int
> out_num, unsigned int in_num);
>
> There's symmetry here which is good. The one bad thing about it is
> forces certain memory to be read-only and other memory to be
> read-write. I don't see that as a bad thing though.
>
> I think we'll need an interface like this so support driver domains too
> since "backend". To put it another way, in QEMU, map_buf ==
> virtqueue_pop and unmap_buf == virtqueue_push.

You are proposing that the guest should define some guest memory to be used
as shared memory (some kind of replacement), right? This is fine, as long as
we can _also_ map host memory somewhere else (e.g. after guest memory, above
1TB etc.). I definitely want to be able to have a 64MB guest map a 2GB shared
memory zone. (See my other mail about the execute-in-place via DCSS use
case.)

I think we should start to write down some requirements. This will help to
get a better understanding of the necessary interface. Here are my first
ideas:

 o allow mapping host shared memory to any place that can be addressed via a
   PFN
 o allow mapping beyond guest storage
 o allow replacing guest memory
 o read-only and read/write modes
 o the driver interface should not depend on hardware specific stuff (e.g.
   prefer generic virtio over PCI)

More ideas are welcome.
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Wednesday 20 May 2009 04:58:38, Rusty Russell wrote:
> On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:
> > Avi Kivity wrote:
> > > Christian Bornträger wrote:
> > >>> To summarize, Anthony thinks it should use virtio, while I believe
> > >>> virtio is useful for exporting guest memory, not for importing host
> > >>> memory.
>
> Yes, precisely.
>
> But what's it *for*, this shared memory? Implementing shared memory is
> trivial. Using it is harder. For example, inter-guest networking: you'd
> have to copy packets in and out, making it slow as well as losing
> abstraction.
>
> The only interesting idea I can think of is exposing it to userspace, and
> having that run some protocol across it for fast app <-> app comms. But if
> that's your plan, you still have a lot of code the write!
>
> So I guess I'm missing the big picture here?

I can give some insights about shared memory usage in z/VM. z/VM uses
so-called discontiguous saved segments (DCSS) to share memory between guests.
(naming side note:
 o discontiguous because these segments can have holes and different access
   rights, e.g. you can build a DCSS that goes from 800M-801M read only and
   900M-910M exclusive-write.
 o segments because the 2nd level of our page tables is called segment
   table.
)

z/VM uses these segments for several purposes:

 o The monitoring subsystem uses a DCSS to get data from several components

 o shared guest kernels: The CMS operating system is built as a bootable
   DCSS (called named-saved-segments, NSS). All guests have the same host
   pages for the read-only parts of the CMS kernel. The local data is stored
   in exclusive-write parts of the same NSS. Linux on System z is also
   capable of using this feature (CONFIG_SHARED_KERNEL). The kernel linkage
   is changed in a way that separates the read-only text segment from the
   other parts with segment size alignment

 o execute-in-place: This is a Linux feature to exploit the DCSS technology.
   The goal is to share identical guest pages without the additional
   overhead of KSM etc. We have a block device driver for DCSS. This block
   device driver supports the direct_access function and therefore allows
   using the xip option of ext2. The idea is to put binaries into a
   read-only ext2 filesystem. Whenever an mmap is made on this file system,
   the page is not mapped into the page cache. The ptes point into the DCSS
   memory instead. Since the DCSS is demand-paged by the host, no memory is
   wasted for unused parts of the binaries. In case of COW the page is
   copied as usual. It turned out that installations with many similar
   guests (let's say 400 guests) will profit in terms of memory savings and
   quicker application startups (not the first guest, of course). There is a
   downside: this requires a skilled administrator to set up.

We have also experimented with network, POSIX shared memory, and shared
caches via DCSS. Most of these ideas turned out to be not very useful or hard
to implement properly.
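The "in case of COW the page is copied as usual" behaviour can be observed from userspace with an ordinary private file mapping. This is only an analogy, not DCSS or xip code: a temporary file plays the role of the shared read-only backing store, and `cow_demo` is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a file privately, write to the mapping, and check that the
 * backing file still has its original content.
 * Returns 0 if the write went to a private copy, 1 if the file
 * changed, -1 on setup errors.
 */
static int cow_demo(void)
{
	char path[] = "/tmp/cow_demo_XXXXXX";
	char check[4] = { 0 };
	char *map;
	int fd, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file disappears once fd is closed */

	if (write(fd, "text", 4) != 4)
		goto out;

	map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	map[0] = 'T';			/* first write triggers the page copy */

	if (pread(fd, check, 4, 0) == 4)
		ret = (memcmp(check, "text", 4) == 0 && map[0] == 'T') ? 0 : 1;

	munmap(map, 4);
out:
	close(fd);
	return ret;
}
```

Pages that are never written stay shared with the backing store, which is the property the execute-in-place setup relies on for binaries.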
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Monday 18 May 2009 16:26:15, Avi Kivity wrote:
> Christian Borntraeger wrote:
> > Sorry for the late question, but I missed your first version. Is there a
> > way to change that code to use virtio instead of PCI? That would allow us
> > to use this driver on s390 and maybe other virtio transports.
>
> Opinion differs. See the discussion in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.
>
> To summarize, Anthony thinks it should use virtio, while I believe
> virtio is useful for exporting guest memory, not for importing host memory.

I think the current virtio interface is not ideal for importing host memory,
but we can change that. If you look at the dcssblk driver for s390, it allows
a guest to map shared memory segments via a diagnose (hypercall). This driver
uses PCI regions to map memory.

My point is that the method used to map memory is completely irrelevant; we
just need something like mmap/shmget between the guest and the host. We could
define an interface in virtio that can be used by any transport. In the case
of PCI this could be a simple PCI map operation.

What do you think about something like: (CCed Rusty)
---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===================================================================
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**
+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ *	vdev: the struct virtio_device we're talking about.
+ *	addr: The address where the buffer should be mapped (hint only)
+ *	length: The length of the mapping
+ *	identifier: the token that identifies the host buffer
+ *      Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ *	vdev: the struct virtio_device we're talking about.
+ *	addr: The address where the buffer is mapped
+ *      Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+	void * (*map_region)(struct virtio_device *vdev,
+			     void *addr,
+			     size_t length,
+			     int identifier);
+	int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};
+
+
+/**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
  * @dev: underlying device.
@@ -85,6 +110,7 @@ struct virtio_device
 	struct device dev;
 	struct virtio_device_id id;
 	struct virtio_config_ops *config;
+	struct virtio_device_ops *ops;
 	/* Note that this is a Linux set_bit-style bitmap. */
 	unsigned long features[1];
 	void *priv;
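To make the calling convention of the proposed ops a bit more concrete, here is a userspace mock of how a transport and a driver might use them. Everything here is an illustrative assumption: the `toy_*` types and functions are hypothetical, a static array plays the host buffer, token 1 is an arbitrary identifier, and NULL replaces the error pointer that the kernel version would return.

```c
#include <assert.h>
#include <stddef.h>

struct toy_virtio_device;

/* Mirrors the shape of the proposed virtio_device_ops. */
struct toy_virtio_device_ops {
	void *(*map_region)(struct toy_virtio_device *vdev, void *addr,
			    size_t length, int identifier);
	int (*unmap_region)(struct toy_virtio_device *vdev, void *addr);
};

struct toy_virtio_device {
	const struct toy_virtio_device_ops *ops;
	void *mapped;		/* currently mapped region, or NULL */
};

/* Stand-in for a host buffer identified by token 1. */
static char toy_host_buffer[4096];

/* "Transport" side: hand out the host buffer for a known token. */
static void *toy_map_region(struct toy_virtio_device *vdev, void *addr,
			    size_t length, int identifier)
{
	(void)addr;		/* the address is a hint only; ignored here */

	if (identifier != 1 || length > sizeof(toy_host_buffer) ||
	    vdev->mapped)
		return NULL;	/* kernel code would return an error pointer */

	vdev->mapped = toy_host_buffer;
	return vdev->mapped;
}

static int toy_unmap_region(struct toy_virtio_device *vdev, void *addr)
{
	if (!vdev->mapped || vdev->mapped != addr)
		return -1;
	vdev->mapped = NULL;
	return 0;
}

static const struct toy_virtio_device_ops toy_ops = {
	.map_region = toy_map_region,
	.unmap_region = toy_unmap_region,
};

/* "Driver" side: map the region, touch it, unmap it again. */
static int toy_driver_probe(struct toy_virtio_device *vdev)
{
	char *shm = vdev->ops->map_region(vdev, NULL, 64, 1);

	if (!shm)
		return -1;
	shm[0] = 'x';		/* the driver now sees host memory directly */
	return vdev->ops->unmap_region(vdev, shm);
}
```

The driver only goes through `ops`, so a PCI transport could back map_region with a BAR mapping and an s390 transport with a diagnose call without the driver noticing; that split is the point of the proposal, while the details above are guesses.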