tun/tap and Vlans (was: Re: Network I/O performance)
>> Hi all,
>>
>> On a sidenote: I have also realized that when using the tun/tap configuration with a bridge, packets are replicated on all tap devices when QEMU writes packets to the tun interface. I guess this is a limitation of tun/tap as it does not know to which tap device the packet has to go to. The tap device then eventually drops packets when the destination MAC is not its own, but it still receives the packet which causes more overhead in the system overall.
>
> Right, I guess you'd see this with a real switch as well? Maybe have your guest send a packet out once in a while so the bridge can learn its MAC address (we do this after migration, for example).

Does this mean that it is not possible to have each tun device in a separate bridge that serves a separate VLAN? We have experienced a strange problem that we couldn't yet explain. Given this setup:

  Guest            Host
  kvm1 --- eth0 -+- bridge0 --- vlan1 \
                 |                     +-- eth0
  kvm2 -+- eth0 -/                    /
        \- eth1 --- bridge1 --- vlan2 +

When sending packets through kvm2/eth0, they appear on both bridges and on both VLANs; the same happens when sending packets through kvm2/eth1. When the guest has only one interface, the packets appear on only one bridge and one VLAN, as they are supposed to.

Can this be worked around?

-- Lukas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] kvm: Use a bitmap for tracking used GSIs
Alex Williamson wrote:
>> Perhaps we should update the bitmap on entry points that everyone uses so we don't have to worry about preallocating. We could set the bitmap in kvm_add_routing_entry() and clear it in kvm_del_routing_entry(). This would mean that kvm_del_routing_entry() implicitly gives up a GSI obtained via kvm_get_irq_route_gsi(), which seems to be the assumption already.
>
> Much better. That would eliminate any need for proliferating KVM_CAP_IRQ_ROUTING ifdefs or doing anything based on KVM_IOAPIC_NUM_PINS, but should I keep the KVM_CAP_IRQ_ROUTING around the new code for documentation purposes?

Only around code which directly uses the routing facilities (i.e. only in the libkvm wrappers). Code in qemu should only do runtime detection.

I really should write Documentation/kvm/extensions.txt. And ioctls.txt, and intro.txt...

--
error compiling committee.c: too many arguments to function
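A minimal sketch of the bookkeeping discussed in this message, assuming a fixed upper bound on GSIs. The names `SKETCH_MAX_GSI`, `sketch_add_routing_entry()`, `sketch_del_routing_entry()` and `sketch_get_free_gsi()` are hypothetical stand-ins and not the libkvm functions themselves; the sketch also ignores the possibility of several routes sharing one GSI.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_MAX_GSI 1024 /* hypothetical limit, not taken from the patch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((SKETCH_MAX_GSI + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per GSI; a set bit means "a routing entry uses this GSI". */
static unsigned long used_gsi_bitmap[BITMAP_WORDS];

static int gsi_in_use(unsigned int gsi)
{
	assert(gsi < SKETCH_MAX_GSI);
	return (used_gsi_bitmap[gsi / BITS_PER_WORD] >> (gsi % BITS_PER_WORD)) & 1UL;
}

/* Stand-in for the add path: adding a route marks its GSI as used. */
static int sketch_add_routing_entry(unsigned int gsi)
{
	if (gsi >= SKETCH_MAX_GSI)
		return -1;
	used_gsi_bitmap[gsi / BITS_PER_WORD] |= 1UL << (gsi % BITS_PER_WORD);
	return 0;
}

/* Stand-in for the delete path: removing a route implicitly frees its GSI. */
static int sketch_del_routing_entry(unsigned int gsi)
{
	if (gsi >= SKETCH_MAX_GSI)
		return -1;
	used_gsi_bitmap[gsi / BITS_PER_WORD] &= ~(1UL << (gsi % BITS_PER_WORD));
	return 0;
}

/* Stand-in for the allocator: lowest GSI with no route, or -1 if none. */
static int sketch_get_free_gsi(void)
{
	for (unsigned int gsi = 0; gsi < SKETCH_MAX_GSI; gsi++)
		if (!gsi_in_use(gsi))
			return (int)gsi;
	return -1;
}
```

Because the bitmap is only touched in the add and delete paths, the allocator does not need to reserve anything up front, which is the point of the suggestion quoted above.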
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Monday, 18 May 2009 16:26:15, Avi Kivity wrote:
> Christian Borntraeger wrote:
>> Sorry for the late question, but I missed your first version. Is there a way to change that code to use virtio instead of PCI? That would allow us to use this driver on s390 and maybe other virtio transports.
>
> Opinion differs. See the discussion in http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119.
>
> To summarize, Anthony thinks it should use virtio, while I believe virtio is useful for exporting guest memory, not for importing host memory.

I think the current virtio interface is not ideal for importing host memory, but we can change that. If you look at the dcssblk driver for s390, it allows a guest to map shared memory segments via a diagnose (hypercall). This driver uses PCI regions to map memory.

My point is that the method used to map memory is completely irrelevant; we just need something like mmap/shmget between the guest and the host. We could define an interface in virtio that can be used by any transport. In the case of PCI this could be a simple PCI map operation.

What do you think about something like: (CCed Rusty)

---
 include/linux/virtio.h |   26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===================================================================
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };
 
 /**
+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ *	vdev: the struct virtio_device we're talking about.
+ *	addr: The address where the buffer should be mapped (hint only)
+ *	length: The length of the mapping
+ *	identifier: the token that identifies the host buffer
+ *      Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ *	vdev: the struct virtio_device we're talking about.
+ *	addr: The address where the buffer is mapped
+ *      Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+	void * (*map_region)(struct virtio_device *vdev,
+			     void *addr,
+			     size_t length,
+			     int identifier);
+	int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};
+
+
+/**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
  * @dev: underlying device.
@@ -85,6 +110,7 @@ struct virtio_device
 	struct device dev;
 	struct virtio_device_id id;
 	struct virtio_config_ops *config;
+	struct virtio_device_ops *ops;
 	/* Note that this is a Linux set_bit-style bitmap. */
 	unsigned long features[1];
 	void *priv;
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
Christian Bornträger wrote:
>> To summarize, Anthony thinks it should use virtio, while I believe virtio is useful for exporting guest memory, not for importing host memory.
>
> I think the current virtio interface is not ideal for importing host memory, but we can change that. If you look at the dcssblk driver for s390, it allows a guest to map shared memory segments via a diagnose (hypercall). This driver uses PCI regions to map memory.
>
> My point is, that the method to map memory is completely irrelevant, we just need something like mmap/shmget between the guest and the host. We could define an interface in virtio, that can be used by any transport. In case of pci this could be a simple pci map operation.
>
> What do you think about something like: (CCed Rusty)

Exactly.

--
error compiling committee.c: too many arguments to function
Re: [KVM PATCH v9] kvm: add support for irqfd
Gregory Haskins wrote:
> More slop. I shouldn't send patches out first thing Monday morning, I guess.
>
> Here is my current delta queued for v10. I will wait for some feedback on v9 before cutting it:

With this, v10 looks good to go.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] If interrupt injection is not possible do not scan IRR.
Gleb Natapov wrote:
> Forgot to remove debug output before submitting. Resending.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
>
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index 1ccb50c..d32ceac 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -218,6 +218,11 @@ int kvm_pic_read_irq(struct kvm *kvm)
>  	struct kvm_pic *s = pic_irqchip(kvm);
>  
>  	pic_lock(s);
> +	if (!s->output) {
> +		pic_unlock(s);
> +		return -1;
> +	}
> +	s->output = 0;
>  	irq = pic_get_irq(&s->pics[0]);
>  	if (irq >= 0) {
>  		pic_intack(&s->pics[0], irq);
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 96dfbb6..e93405a 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -78,7 +78,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>  	if (vector == -1) {
>  		if (kvm_apic_accept_pic_intr(v)) {
>  			s = pic_irqchip(v->kvm);
> -			s->output = 0;		/* PIC */
>  			vector = kvm_pic_read_irq(v->kvm);
>  		}
>  	}

Please split into a different patch. Even though it is a lot simpler, it contains non-local changes and is therefore relatively dangerous.

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 44e87a5..854e8c9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3174,10 +3174,10 @@ static void inject_pending_irq(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  			vcpu->arch.nmi_injected = true;
>  			kvm_x86_ops->set_nmi(vcpu);
>  		}
> -	} else if (kvm_cpu_has_interrupt(vcpu)) {
> -		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> -			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> -					    false);
> +	} else if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> +		int vec = kvm_cpu_get_interrupt(vcpu);
> +		if (vec != -1) {
> +			kvm_queue_interrupt(vcpu, vec, false);
>  			kvm_x86_ops->set_irq(vcpu);
>  		}
>  	}

Again, I don't think this is a win. Usually ->interrupts_allowed() == true so we'll execute the rest anyway. Perhaps we could move the call to has_interrupt into get_interrupt.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Drop interrupt shadow when single stepping should be done only on VMX.
Gleb Natapov wrote:
> The problem exists only on VMX. Also currently we skip this step if there is pending exception. The patch fixes this too.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] [APIC] Optimize searching for highest IRR
Gleb Natapov wrote:
> Most of the time IRR is empty, so instead of scanning the whole IRR on each VM entry keep a variable that tells us if IRR is not empty. IRR will have to be scanned twice on each IRQ delivery, but this is much more rare than VM entry.
>
>  static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>  {
>  	int result;
>  
> -	result = find_highest_vector(apic->regs + APIC_IRR);
> +	if (!apic->irr_pending)
> +		return -1;

smp_mb__before_clear_bit(), to prevent the cpu speculating the IRR.

> +
> +	result = apic_search_irr(apic);
>  	ASSERT(result == -1 || result >= 16);
>  
>  	return result;
>  }
>  
> +static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
> +{
> +	apic->irr_pending = false;
> +	apic_clear_vector(vec, apic->regs + APIC_IRR);

smp_rmb()

> +	if (apic_search_irr(apic) != -1)
> +		apic->irr_pending = true;

--
error compiling committee.c: too many arguments to function
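The idea in the quoted patch can be illustrated with a single-threaded model. This is only a sketch under stated assumptions: `struct lapic_sketch`, `set_irr()`, `find_highest_irr()` and `clear_irr()` are hypothetical names, the IRR is modelled as a plain 256-bit array, and the memory barriers discussed in the review are deliberately left out because nothing here runs concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRR_WORDS 8 /* 256 vectors, 32 per word */

struct lapic_sketch {
	uint32_t irr[IRR_WORDS];
	bool irr_pending; /* hint: false means the IRR is known to be empty */
};

/* Full scan: highest set vector, or -1 if the IRR is empty. */
static int search_irr(const struct lapic_sketch *apic)
{
	for (int w = IRR_WORDS - 1; w >= 0; w--) {
		if (!apic->irr[w])
			continue;
		for (int b = 31; b >= 0; b--)
			if (apic->irr[w] & (UINT32_C(1) << b))
				return w * 32 + b;
	}
	return -1;
}

static void set_irr(struct lapic_sketch *apic, int vec)
{
	assert(vec >= 0 && vec < IRR_WORDS * 32);
	apic->irr_pending = true;
	apic->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Fast path: skip the scan entirely when nothing is pending. */
static int find_highest_irr(const struct lapic_sketch *apic)
{
	if (!apic->irr_pending)
		return -1;
	return search_irr(apic);
}

/* Clearing a vector rescans once to decide whether the hint stays set. */
static void clear_irr(struct lapic_sketch *apic, int vec)
{
	assert(vec >= 0 && vec < IRR_WORDS * 32);
	apic->irr_pending = false;
	apic->irr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
	if (search_irr(apic) != -1)
		apic->irr_pending = true;
}
```

The trade-off is the one stated in the patch description: the common lookup becomes a single flag test, at the price of an extra scan whenever a vector is cleared.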
Re: virtio net regression
Avi Kivity wrote:
> Antoine Martin wrote:
>> Hi,
>>
>> Here is another one, any ideas? These oopses do look quite deep. Is it normal to end up in tcp_send_ack from pdflush??
>
> I think it can happen anywhere, part of the net softirq.

Hah, gotcha.

>> Cheers
>> Antoine
>>
>> [929492.154634] pdflush: page allocation failure. order:0, mode:0x20
>
> You're out of memory.

That's quite odd, the guest wasn't even hitting the swap at the time.

> How much memory did you allocate to the guest? did you balloon it?

512MB, no ballooning.

>> [929492.154637] Pid: 291, comm: pdflush Not tainted 2.6.29.2 #5
>> [929492.154639] Call Trace:
>> [929492.154641] <IRQ> [8027e8bc] __alloc_pages_internal+0x3e1/0x401
>> [929492.154649] [8055b5ea] try_fill_recv+0xa1/0x182
>> [929492.154652] [8055c1fc] virtnet_poll+0x533/0x5ab
>> [929492.154655] [80632bba] net_rx_action+0x70/0x143
>> [929492.154658] [8023f18c] __do_softirq+0x83/0x123
>> [929492.154661] [8020d35c] call_softirq+0x1c/0x28
>> [929492.154664] [8020e2c0] do_softirq+0x3c/0x85
>> [929492.154666] [8023eea3] irq_exit+0x3f/0x7a
>> [929492.154668] [8020e59c] do_IRQ+0x12b/0x14f
>> [929492.154670] [8020cad3] ret_from_intr+0x0/0x29
>> [929492.154672] <EOI> [802c22b1] __set_page_dirty_buffers+0x0/0x8f
>> [929492.154677] [8031702b] bget_one+0x0/0xb
>> [929492.154680] [80316fa2] walk_page_buffers+0x2/0x8b
>> [929492.154682] [803185bc] ext3_ordered_writepage+0xae/0x134
>> [929492.154685] [8027ea46] __writepage+0xa/0x25
>> [929492.154687] [8027f19f] write_cache_pages+0x206/0x322
>> [929492.154689] [8027ea3c] __writepage+0x0/0x25
>> [929492.154691] [8027f2fe] do_writepages+0x27/0x2d
>> [929492.154694] [802bd3f6] __writeback_single_inode+0x1a7/0x3b5
>> [929492.154696] [8020a68c] __switch_to+0xb4/0x38c
>> [929492.154698] [802bda76] generic_sync_sb_inodes+0x2a7/0x458
>> [929492.154701] [802bde00] writeback_inodes+0x8d/0xe6
>> [929492.154704] [807296e2] _spin_lock+0x5/0x7
>> [929492.155056] [8027f432] wb_kupdate+0x9f/0x116
>> [929492.155058] [80280095] pdflush+0x14b/0x202
>> [929492.155061] [8027f393] wb_kupdate+0x0/0x116
>> [929492.155063] [8027ff4a] pdflush+0x0/0x202
>> [929492.155065] [8027ff4a] pdflush+0x0/0x202
>> [929492.155068] [8024c127] kthread+0x47/0x73
>> [929492.155070] [8020d25a] child_rip+0xa/0x20
>> [929492.155072] [8024c0e0] kthread+0x0/0x73
>> [929492.183142] [8020d250] child_rip+0x0/0x20
>> [929492.183145] Mem-Info:
>> [929492.183147] DMA per-cpu:
>> [929492.183149] CPU0: hi:0, btch: 1 usd: 0
>> [929492.183151] DMA32 per-cpu:
>> [929492.183154] CPU0: hi: 186, btch: 31 usd: 184
>> [929492.183158] Active_anon:2755 active_file:39849 inactive_anon:2972
>> [929492.183159] inactive_file:70353 unevictable:0 dirty:4172 writeback:1580 unstable:0
>> [929492.183161] free:734 slab:5619 mapped:15047 pagetables:927 bounce:0
>> [929492.183166] DMA free:1968kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:40kB active_file:2116kB inactive_file:1880kB unevictable:0kB present:5448kB pages_scanned:0 all_unreclaimable? no
>> [929492.183169] lowmem_reserve[]: 0 489 489 489
>> [929492.183176] DMA32 free:968kB min:2812kB low:3512kB high:4216kB active_anon:11020kB inactive_anon:11848kB active_file:157280kB inactive_file:279532kB unevictable:0kB present:500896kB pages_scanned:0 all_unreclaimable? no
>> [929492.183180] lowmem_reserve[]: 0 0 0 0
>> [929492.183183] DMA: 6*4kB 2*8kB 3*16kB 1*32kB 1*64kB 2*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1976kB
>> [929492.183235] DMA32: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 3*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 968kB
>> [929492.183244] 110992 total pagecache pages
>> [929492.183246] 739 pages in swap cache
>> [929492.183248] Swap cache stats: add 8996, delete 8257, find 92604/93191
>> [929492.183250] Free swap = 1040016kB
>> [929492.183252] Total swap = 1048568kB
>> [929492.186003] 131056 pages RAM
>> [929492.186006] 4799 pages reserved
>> [929492.186007] 44697 pages shared
>> [929492.186008] 90516 pages non-shared
>> [930274.380075] eth0: no IPv6 routers present
>
> Strange, seems to be a bit of free memory here.

There should be lots, all this host is doing is apache+sftp...

Assuming I can make it re-occur (stress testing it?), how would I dig further to find the cause of this memory exhaustion? /proc/meminfo and friends?

Cheers
Antoine
Re: virtio net regression
Antoine Martin wrote:
>> You're out of memory.
>
> That's quite odd, the guest wasn't even hitting the swap at the time.

But you do have swap enabled?

>> Strange, seems to be a bit of free memory here.
>
> There should be lots, all this host is doing is apache+sftp...
>
> Assuming I can make it re-occur (stress testing it?), how would I dig further to find the cause of this memory exhaustion? /proc/meminfo and friends?

Yes please. Maybe virtio is leaking memory.

--
error compiling committee.c: too many arguments to function
[PATCH v2 0/2] Intel-IOMMU: source-id checking for interrupt remapping
Support source-id checking for interrupt remapping, which isolates interrupts for guests/VMs with assigned devices.

v1 -> v2 change log:
  - Access PCI config space directly (read_pci_config_byte) to parse the IOAPIC scope, instead of using PCI-related discovery, because the PCI subsystem is not initialized at that time.

Weidong Han (2):
  Intel-IOMMU, intr-remap: set the whole 128bits of irte when modify/free it
  Intel-IOMMU, intr-remap: source-id checking

 arch/x86/kernel/apic/io_apic.c |    6 ++
 drivers/pci/intr_remapping.c   |  100 +--
 drivers/pci/intr_remapping.h   |    2 +
 include/linux/dmar.h           |   11
 4 files changed, 113 insertions(+), 6 deletions(-)
[PATCH v2 1/2] Intel-IOMMU, intr-remap: set the whole 128bits of irte when modify/free it
An interrupt remapping table entry (IRTE) is 128 bits wide. Currently, modify_irte and free_irte only set the low 64 bits of the entry. The high 64 bits are never written, which means the source-id setting is ignored. This patch sets the whole 128 bits of the IRTE when modifying or freeing it. The following source-id checking patch depends on this.

Signed-off-by: Weidong Han <weidong@intel.com>
---
 drivers/pci/intr_remapping.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
index f5e0ea7..946e170 100644
--- a/drivers/pci/intr_remapping.c
+++ b/drivers/pci/intr_remapping.c
@@ -309,7 +309,8 @@ int modify_irte(int irq, struct irte *irte_modified)
 	index = irq_iommu->irte_index + irq_iommu->sub_handle;
 	irte = &iommu->ir_table->base[index];
 
-	set_64bit((unsigned long *)irte, irte_modified->low);
+	set_64bit((unsigned long *)&irte->low, irte_modified->low);
+	set_64bit((unsigned long *)&irte->high, irte_modified->high);
 	__iommu_flush_cache(iommu, irte, sizeof(*irte));
 
 	rc = qi_flush_iec(iommu, index, 0);
@@ -386,8 +387,11 @@ int free_irte(int irq)
 	irte = &iommu->ir_table->base[index];
 
 	if (!irq_iommu->sub_handle) {
-		for (i = 0; i < (1 << irq_iommu->irte_mask); i++)
-			set_64bit((unsigned long *)(irte + i), 0);
+		for (i = 0; i < (1 << irq_iommu->irte_mask); i++) {
+			set_64bit((unsigned long *)&irte->low, 0);
+			set_64bit((unsigned long *)&irte->high, 0);
+			irte++;
+		}
 		rc = qi_flush_iec(iommu, index, irq_iommu->irte_mask);
 	}
 
-- 
1.6.0.4
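To make the difference concrete, here is a small userspace model of the two update strategies. It is a sketch only: `struct irte_sketch`, `set_64bit_sketch()`, `modify_low_only()`, `modify_whole()` and `free_entries()` are hypothetical names, and the plain store stands in for the kernel's atomic `set_64bit()`.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 128-bit entry as two 64-bit halves. */
struct irte_sketch {
	uint64_t low;
	uint64_t high; /* in the real entry this half carries the source-id fields */
};

/* Stand-in for set_64bit(); a plain store is enough for this model. */
static void set_64bit_sketch(uint64_t *p, uint64_t v)
{
	assert(p != 0);
	*p = v;
}

/* Old behaviour: only the low half is written, the high half keeps stale data. */
static void modify_low_only(struct irte_sketch *e, const struct irte_sketch *m)
{
	set_64bit_sketch(&e->low, m->low);
}

/* Patched behaviour: both halves are written. */
static void modify_whole(struct irte_sketch *e, const struct irte_sketch *m)
{
	set_64bit_sketch(&e->low, m->low);
	set_64bit_sketch(&e->high, m->high);
}

/* Freeing clears 2^mask consecutive entries, both halves each. */
static void free_entries(struct irte_sketch *e, unsigned int mask)
{
	for (unsigned int i = 0; i < (1u << mask); i++, e++) {
		set_64bit_sketch(&e->low, 0);
		set_64bit_sketch(&e->high, 0);
	}
}
```

With `modify_low_only()` a caller that fills in the high half never sees it reach the table, which is why the source-id patch depends on this fix.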
RE: [PATCH 2/2] Intel-IOMMU, intr-remap: source-id checking
Ingo Molnar wrote:
> * Han, Weidong <weidong@intel.com> wrote:
>> Ingo Molnar wrote:
>>> * Han, Weidong <weidong@intel.com> wrote:
>>>> Siddha, Suresh B wrote:
>>>>> On Wed, 2009-05-06 at 23:16 -0700, Han, Weidong wrote:
>>>>>> @@ -634,6 +694,44 @@ static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
>>>>>>  0x%Lx\n", scope->enumeration_id, drhd->address);
>>>>>>
>>>>>> +	bus = pci_find_bus(drhd->segment, scope->bus);
>>>>>> +	path = (struct acpi_dmar_pci_path *)(scope + 1);
>>>>>> +	count = (scope->length -
>>>>>> +		 sizeof(struct acpi_dmar_device_scope))
>>>>>> +		/ sizeof(struct acpi_dmar_pci_path);
>>>>>> +
>>>>>> +	while (count) {
>>>>>> +		if (pdev)
>>>>>> +			pci_dev_put(pdev);
>>>>>> +
>>>>>> +		if (!bus)
>>>>>> +			break;
>>>>>> +
>>>>>> +		pdev = pci_get_slot(bus,
>>>>>> +				PCI_DEVFN(path->dev, path->fn));
>>>>>> +		if (!pdev)
>>>>>> +			break;
>>>>>
>>>>> ir_parse_ioapic_scope() happens very early in the boot. So, I don't think we can do the pci related discovery here.
>>>>
>>>> Thanks for your pointing it out. It should enable the source-id checking for io-apic's after the pci subsystem is up. I will change it.
>>>
>>> Note, there's ways to do early PCI quirks too, check arch/x86/kernel/early-quirks.c. It's done by reading the PCI configuration space directly via a careful early-capable subset of the PCI config space APIs. But it's a method of last resort.
>>
>> Thanks for your reminder. It can use direct PCI access here as follows. It's easy and clean. I think it's better than adding the source-id checking for io-apic's after the pci subsystem is up. I will send out updated patches after some tests.
>>
>> @@ -634,6 +695,24 @@ static int ir_parse_ioapic_scope(struct acpi_dmar_header *header,
>>  0x%Lx\n", scope->enumeration_id, drhd->address);
>>
>> +	bus = scope->bus;
>> +	path = (struct acpi_dmar_pci_path *)(scope + 1);
>> +	count = (scope->length -
>> +		 sizeof(struct acpi_dmar_device_scope))
>> +		/ sizeof(struct acpi_dmar_pci_path);
>> +
>> +	while (--count > 0) {
>> +		/* Access PCI directly due to the PCI
>> +		 * subsystem isn't initialized yet.
>> +		 */
>> +		bus = read_pci_config_byte(bus, path->dev,
>> +			path->fn, PCI_SECONDARY_BUS);
>> +		path++;
>> +	}
>> +
>> +	ir_ioapic[ir_ioapic_num].bus = bus;
>> +	ir_ioapic[ir_ioapic_num].devfn =
>> +		PCI_DEVFN(path->dev, path->fn);
>
> looks good IMO, beyond the obligatory comment-style nitpick [*] :-)
>
> Also, the function above seems to be way too large - please split it into a couple of natural helper functions.
>
> Thanks,
>
> 	Ingo
>
> [*] Please use the customary comment style:
>
>   /*
>    * Comment .
>    * .. goes here:
>    */
>
> specified in Documentation/CodingStyle.

I have sent out the updated patches. Thanks!

Regards,
Weidong
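The path walk in the second fragment can be modelled outside the kernel. The following is a sketch under assumptions: `resolve_source_id()`, `struct pci_path_sketch` and the callback type are hypothetical, and `toy_secondary_bus()` is an invented one-bridge topology standing in for `read_pci_config_byte(..., PCI_SECONDARY_BUS)`. Packing the result as bus in the upper byte and devfn in the lower byte follows the usual PCI requester-ID layout.

```c
#include <assert.h>
#include <stdint.h>

/* One hop of a device scope path: device and function number on the current bus. */
struct pci_path_sketch {
	uint8_t dev;
	uint8_t fn;
};

#define DEVFN(dev, fn) ((uint8_t)((((dev) & 0x1f) << 3) | ((fn) & 0x07)))

/* Callback standing in for a direct config-space read of the secondary bus number. */
typedef uint8_t (*secondary_bus_fn)(uint8_t bus, uint8_t dev, uint8_t fn);

/*
 * Walk all but the last path entry through bridges, then combine the final
 * bus number with the last entry's devfn into a 16-bit source-id.
 */
static uint16_t resolve_source_id(uint8_t bus, const struct pci_path_sketch *path,
				  int count, secondary_bus_fn read_secondary)
{
	assert(count >= 1);
	while (--count > 0) {
		bus = read_secondary(bus, path->dev, path->fn);
		path++;
	}
	return (uint16_t)((bus << 8) | DEVFN(path->dev, path->fn));
}

/* Invented topology: a single bridge at 00:1c.0 whose secondary bus is 5. */
static uint8_t toy_secondary_bus(uint8_t bus, uint8_t dev, uint8_t fn)
{
	if (bus == 0 && dev == 0x1c && fn == 0)
		return 5;
	return 0;
}
```

A path with a single entry never touches the callback, which matches the `--count > 0` loop condition in the quoted fragment.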
Re: virtio net regression
Avi Kivity wrote:
> Antoine Martin wrote:
>>> You're out of memory.
>>
>> That's quite odd, the guest wasn't even hitting the swap at the time.
>
> But you do have swap enabled?

Yes. I always do this on the guests as it seems fairer to let the guests use swap when they need the extra memory rather than over-committing too much memory on the host. Although it would probably be more efficient overall to let the host manage all swapping.
It consumes more I/O bandwidth, but most guests' memory stays warm no matter what other guests are doing.
Does that sound reasonable?

>>> Strange, seems to be a bit of free memory here.
>>
>> There should be lots, all this host is doing is apache+sftp...
>>
>> Assuming I can make it re-occur (stress testing it?), how would I dig further to find the cause of this memory exhaustion? /proc/meminfo and friends?
>
> Yes please. Maybe virtio is leaking memory.

Will report if I find anything.
[PATCH 3/4] Nested SVM: Implement INVLPGA v2
SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it!

For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though.

v2 makes invlpga do the same as invlpg, not flush the whole mmu

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/x86/kvm/svm.c |   15 ++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4b4eadd..fa2a710 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1785,6 +1785,19 @@ static int clgi_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	nsvm_printk("INVLPGA\n");
+
+	/* Let's treat INVLPGA the same as INVLPG */
+	kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
+
+	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
+	skip_emulated_instruction(&svm->vcpu);
+	return 1;
+}
+
 static int invalid_op_interception(struct vcpu_svm *svm,
 				   struct kvm_run *kvm_run)
 {
@@ -2130,7 +2143,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_INVD]				= emulate_on_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
-	[SVM_EXIT_INVLPGA]			= invalid_op_interception,
+	[SVM_EXIT_INVLPGA]			= invlpga_interception,
 	[SVM_EXIT_IOIO]				= io_interception,
 	[SVM_EXIT_MSR]				= msr_interception,
 	[SVM_EXIT_TASK_SWITCH]			= task_switch_interception,
-- 
1.6.0.2
[PATCH 0/4] Add rudimentary Hyper-V guest support v2
Now that we have nested SVM in place, let's make use of it and virtualize something non-kvm. The first interesting target that came to my mind here was Hyper-V.

This patchset makes Windows Server 2008 boot with Hyper-V, which runs the dom0 in virtualized mode already. It hangs somewhere in IDE code when booted, so I haven't been able to run a second VM within it yet.

Please keep in mind that Hyper-V won't work unless you apply the userspace patches too and the PAT bit patch.

v2 changes:

  - remove reserved PAT check patch (Avi will do this)
  - remove #PF inject on emulated_read
  - take comments from v1 into account (listed individually)

Alexander Graf (4):
  Add definition for IGNNE MSR
  Implement Hyper-V MSRs v2
  Nested SVM: Implement INVLPGA v2
  Nested SVM: Improve interrupt injection v2

 arch/x86/include/asm/msr-index.h |    1 +
 arch/x86/kvm/svm.c               |   59 +++--
 2 files changed, 44 insertions(+), 16 deletions(-)
[PATCH 4/4] Nested SVM: Improve interrupt injection v2
While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct.

This patch makes nested SVM's interrupt injection behave more like on a real machine.

v2 calls BUG_ON when svm_set_irq is called with GIF=0

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/x86/kvm/svm.c |   39 ---
 1 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index fa2a710..5b14c9d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
 	/* Kill any pending exceptions */
 	if (svm->vcpu.arch.exception.pending == true)
 		nsvm_printk("WARNING: Pending Exception\n");
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
 
 	/* Restore selected save entries */
 	svm->vmcb->save.es = hsave->save.es;
@@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
 	svm->nested_vmcb = svm->vmcb->save.rax;
 
 	/* Clear internal status */
-	svm->vcpu.arch.exception.pending = false;
+	kvm_clear_exception_queue(&svm->vcpu);
+	kvm_clear_interrupt_queue(&svm->vcpu);
 
 	/* Save the old vmcb, so we don't need to pick what we save, but
 	   can restore everything when a VMEXIT occurs */
@@ -2277,21 +2279,14 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 		((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
 }
 
-static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-
-	svm->vmcb->control.event_inj = nr |
-		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
-}
-
 static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	nested_svm_intr(svm);
+	BUG_ON(!(svm->vcpu.arch.hflags & HF_GIF_MASK));
 
-	svm_queue_irq(vcpu, irq);
+	svm->vmcb->control.event_inj = irq |
+		SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
 }
 
 static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -2319,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 	struct vmcb *vmcb = svm->vmcb;
 	return (vmcb->save.rflags & X86_EFLAGS_IF) &&
 		!(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
-		(svm->vcpu.arch.hflags & HF_GIF_MASK);
+		(svm->vcpu.arch.hflags & HF_GIF_MASK) &&
+		!is_nested(svm);
 }
 
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
-	svm_set_vintr(to_svm(vcpu));
-	svm_inject_irq(to_svm(vcpu), 0x0);
+	struct vcpu_svm *svm = to_svm(vcpu);
+	nsvm_printk("Trying to open IRQ window\n");
+
+	nested_svm_intr(svm);
+
+	/* In case GIF=0 we can't rely on the CPU to tell us when
+	 * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
+	 * The next time we get that intercept, this function will be
+	 * called again though and we'll get the vintr intercept. */
+	if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
+		svm_set_vintr(svm);
+		svm_inject_irq(svm, 0x0);
+	}
 }
 
 static void enable_nmi_window(struct kvm_vcpu *vcpu)
@@ -2393,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
 	case SVM_EXITINTINFO_TYPE_EXEPT:
 		/* In case of software exception do not reinject an exception
 		   vector, but re-execute and instruction instead */
+		if (is_nested(svm))
+			break;
 		if (vector == BP_VECTOR || vector == OF_VECTOR)
 			break;
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
-- 
1.6.0.2
[PATCH 1/4] Add definition for IGNNE MSR
Hyper-V tried to access MSR_IGNNE, so let's at least have a definition for it in our headers.

Signed-off-by: Alexander Graf <ag...@suse.de>
---
 arch/x86/include/asm/msr-index.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ec41fc1..e273549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -372,6 +372,7 @@
 /* AMD-V MSRs */
 
 #define MSR_VM_CR                       0xc0010114
+#define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
1.6.0.2
RE: [PATCH v2] Shared memory device with interrupt support
Cam, is it somehow possible to generate a local APIC interrupt from one VM to another? I guess it shouldn't be as the LAPIC interrupts generated in one VM will go to the VCPUs of the same VM... Regards, Bhaskar. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Kumar, Venkat Sent: Tuesday, May 19, 2009 9:22 AM To: Cam Macdonell Cc: kvm@vger.kernel.org list Subject: RE: [PATCH v2] Shared memory device with interrupt support I had tried all syntaxes other than this :). Interrupts work now. Thx, Venkat -Original Message- From: Cam Macdonell [mailto:c...@cs.ualberta.ca] Sent: Monday, May 18, 2009 9:51 PM To: Kumar, Venkat Cc: kvm@vger.kernel.org list Subject: Re: [PATCH v2] Shared memory device with interrupt support Kumar, Venkat wrote: Cam - I got your patch to work but without notifications. I could share memory using the patch but notifications aren't working. I bring up two VM's with option -ivshmem shrmem,1024,/dev/shm/shrmem,server and -ivshmem shrmem,1024,/dev/shm/shrmem respectively. Ok, I guess I need to do more error checking of arguments :) You need to specify unix: on the path. So your options should look like this -ivshmem shrmem,1024,unix:/dev/shm/shrmem,server -ivshmem shrmem,1024,unix:/dev/shm/shrmem That should help. Cam When I make an ioctl from one of the VM's to inject an interrupt to the other VM, I get an error in qemu_chr_write and return value is -1. write call in send_all is failing with return value -1. Am I missing something here? Thx, Venkat -Original Message- From: Cam Macdonell [mailto:c...@cs.ualberta.ca] Sent: Saturday, May 16, 2009 9:01 AM To: Kumar, Venkat Cc: kvm@vger.kernel.org list Subject: Re: [PATCH v2] Shared memory device with interrupt support On 15-May-09, at 8:54 PM, Kumar, Venkat wrote: Cam, A questions on interrupts as well. What is unix:path that needs to be passed in the argument list? Can it be any string? It has to be a valid path on the host. 
It will create a unix domain socket on that path. If my understanding is correct both the VM's who wants to communicate would gives this path in the command line with one of them specifying as server. Exactly, the one with the server in the parameter list will wait for a connection before booting. Cam Thx, Venkat Support an inter-vm shared memory device that maps a shared- memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. This device now creates a qemu character device and sends 1-bytes messages to trigger interrupts. Writes are trigger by writing to the Doorbell register on the shared memory PCI device. The lower 8-bits of the value written to this register are sent as the 1-byte message so different meanings of interrupts can be supported. Interrupts are only supported between 2 VMs currently. One VM must act as the server by adding server to the command-line argument. Shared memory devices are created with the following command-line: -ivhshmem shm object,size in MB,[unix:path][,server] Interrupts can also be used between host and guest as well by implementing a listener on the host. Cam --- Makefile.target |3 + hw/ivshmem.c| 421 ++ + hw/pc.c |6 + hw/pc.h |3 + qemu-options.hx | 14 ++ sysemu.h|8 + vl.c| 14 ++ 7 files changed, 469 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index b68a689..3190bba 100644 --- a/Makefile.target +++ b/Makefile.target @@ -643,6 +643,9 @@ OBJS += pcnet.o OBJS += rtl8139.o OBJS += e1000.o +# Inter-VM PCI shared memory +OBJS += ivshmem.o + # Generic watchdog support and some watchdog devices OBJS += watchdog.o OBJS += wdt_ib700.o wdt_i6300esb.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 000..95e2268 --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,421 @@ +/* + * Inter-VM Shared Memory PCI device. 
>> + *
>> + * Author:
>> + *      Cam Macdonell <c...@cs.ualberta.ca>
>> + *
>> + * Based On: cirrus_vga.c and rtl8139.c
>> + *
>> + * This code is licensed under the GNU GPL v2.
>> + */
>> +
>> +#include "hw.h"
>> +#include "console.h"
>> +#include "pc.h"
>> +#include "pci.h"
>> +#include "sysemu.h"
>> +
>> +#include "qemu-common.h"
>> +#include <sys/mman.h>
>> +
>> +#define PCI_COMMAND_IOACCESS    0x0001
>> +#define PCI_COMMAND_MEMACCESS   0x0002
>> +#define PCI_COMMAND_BUSMASTER   0x0004
>> +
>> +//#define DEBUG_IVSHMEM
>> +
>> +#ifdef DEBUG_IVSHMEM
>> +#define IVSHMEM_DPRINTF(fmt, args...)        \
>> +    do {printf("IVSHMEM: " fmt, ##args); } while (0)
>> +#else
>> +#define IVSHMEM_DPRINTF(fmt, args...)
>> +#endif
>> +
>> +typedef struct IVShmemState {
>> +    uint16_t intrmask;
>> +    uint16_t
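The doorbell mechanism described in the patch text above (the low 8 bits of a register write travel as a 1-byte message over a unix domain socket) could be sketched roughly as follows. This is only an illustration under stated assumptions: `send_doorbell` and `recv_doorbell` are hypothetical names that do not appear in the patch, and the sketch uses plain `read`/`write` on a socket descriptor rather than qemu's character-device layer.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): send the low 8 bits of a
 * doorbell value as a single byte. Returns 0 on success, -1 on error. */
int send_doorbell(int fd, uint32_t value)
{
    uint8_t msg = (uint8_t)(value & 0xff);
    ssize_t n = write(fd, &msg, 1);

    return n == 1 ? 0 : -1;
}

/* Hypothetical helper (not from the patch): block until one doorbell
 * byte arrives. Returns the byte (0..255), or -1 on error or EOF. */
int recv_doorbell(int fd)
{
    uint8_t msg;
    ssize_t n = read(fd, &msg, 1);

    return n == 1 ? (int)msg : -1;
}
```

A host-side listener, as mentioned at the end of the patch description, would presumably call something like `recv_doorbell()` in a loop on the connected socket and interpret each byte as an interrupt "meaning".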
Re: [PATCH v2] Shared memory device with interrupt support
Jayaraman, Bhaskar wrote:
> Cam, is it somehow possible to generate a local APIC interrupt from one VM to another? I guess it shouldn't be as the LAPIC interrupts generated in one VM will go to the VCPUs of the same VM...
>
> Regards,
> Bhaskar.

The closest thing to this is the irqfd+iosignalfd thing I mentioned the other day. With this model, a PIO/MMIO write in the src guest will directly inject an interrupt into the dst guest's LAPIC.

However, as Avi points out, this is just an optimization. You can also do it by first taking a hop through each guest's userspace as well.

HTH
-Greg
Re: [PATCH v2 2/2] Intel-IOMMU, intr-remap: source-id checking
* Weidong Han <weidong@intel.com> wrote:

> To support domain-isolation usages, the platform hardware must be capable of uniquely identifying the requestor (source-id) for each interrupt message. Without source-id checking for interrupt remapping, a rogue guest/VM with assigned devices can launch interrupt attacks to bring down another guest/VM or the VMM itself.
>
> This patch adds source-id checking for interrupt remapping, and then really isolates interrupts for guests/VMs with assigned devices.
>
> Because PCI subsystem is not initialized yet when set up IOAPIC entries, use read_pci_config_byte to access PCI config space directly.
>
> Signed-off-by: Weidong Han <weidong@intel.com>
> ---
>  arch/x86/kernel/apic/io_apic.c |    6 +++
>  drivers/pci/intr_remapping.c   |   90 ++-
>  drivers/pci/intr_remapping.h   |    2 +
>  include/linux/dmar.h           |   11 +
>  4 files changed, 106 insertions(+), 3 deletions(-)

Code structure looks nice now. (and i suspect you have tested this on real and relevant hardware?) I've Cc:-ed Eric too ... does this direction look good to you too Eric?
Have a few minor nits only:

> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 30da617..3d10c68 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1559,6 +1559,9 @@ int setup_ioapic_entry(int apic_id, int irq,
>                  irte.vector = vector;
>                  irte.dest_id = IRTE_DEST(destination);
>
> +                /* Set source-id of interrupt request */
> +                set_ioapic_sid(&irte, apic_id);
> +
>                  modify_irte(irq, &irte);
>
>                  ir_entry->index2 = (index >> 15) & 0x1;
> @@ -3329,6 +3332,9 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_ms
>                  irte.vector = cfg->vector;
>                  irte.dest_id = IRTE_DEST(dest);
>
> +                /* Set source-id of interrupt request */
> +                set_msi_sid(&irte, pdev);
> +
>                  modify_irte(irq, &irte);
>
>                  msg->address_hi = MSI_ADDR_BASE_HI;
> diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
> index 946e170..9ef7b0d 100644
> --- a/drivers/pci/intr_remapping.c
> +++ b/drivers/pci/intr_remapping.c
> @@ -10,6 +10,8 @@
>  #include <linux/intel-iommu.h>
>  #include "intr_remapping.h"
>  #include <acpi/acpi.h>
> +#include <asm/pci-direct.h>
> +#include "pci.h"
>
>  static struct ioapic_scope ir_ioapic[MAX_IO_APICS];
>  static int ir_ioapic_num;
> @@ -405,6 +407,61 @@ int free_irte(int irq)
>          return rc;
>  }
>
> +int set_ioapic_sid(struct irte *irte, int apic)
> +{
> +        int i;
> +        u16 sid = 0;
> +
> +        if (!irte)
> +                return -1;
> +
> +        for (i = 0; i < MAX_IO_APICS; i++)
> +                if (ir_ioapic[i].id == apic) {
> +                        sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
> +                        break;
> +                }

Please generally put extra curly braces around each multi-line loop body. One reason beyond readability is robustness: the above structure can be easily extended in a buggy way via [mockup patch hunk]:

                         sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
                         break;
                 }
 +               if (!sid)
 +                       break;

And note that if this slips in by accident how unobvious this bug is during patch review - the loop head context is not present in the 3-line default context and the code looks correct at a glance.
With extra braces, such typos or mismerges:

                         }
                 }
 +               if (!sid)
 +                       break;

stick out during review like a sore thumb :-)

> +        if (sid == 0) {
> +                printk(KERN_WARNING "Failed to set source-id of "
> +                        "I/O APIC (%d), because it is not under "
> +                        "any DRHD\n", apic);
> +                return -1;
> +        }

please try to keep kernel messages on a single line - even if checkpatch complains. Also, it's a good idea to use pr_warning(), it's shorter by 8 characters.

> +
> +        irte->svt = 1; /* requestor ID verification SID/SQ */
> +        irte->sq = 0; /* comparing all 16-bit of SID */
> +        irte->sid = sid;

this is a borderline suggestion: Note how you already lined up the _comments_ vertically, so you did notice that it makes sense to align vertically. The same visual arguments can be made for the initialization itself too:

 +
 +        irte->svt = 1;    /* requestor ID verification SID/SQ */
 +        irte->sq  = 0;    /* comparing all 16-bit of SID */
 +        irte->sid = sid;
 +
 +        return 0;

But ... it might make even more sense to introduce a set_irte() helper method, so it can all be written in a single line as:

        set_irte(irte, 1, 0, sid);

and explain common values in the set_irte() function's description - that way those comments above (and below) dont have to be made at the usage sites.

> +}
> +
> +int set_msi_sid(struct irte
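To make the two suggestions above concrete (braces around the multi-line loop body, and a set_irte() helper whose description documents the common values), here is a rough userspace sketch. It is an illustration only: `struct irte_sketch`, `struct ioapic_scope_sketch`, `make_sid` and the table-passing signature are assumptions for the example, not the kernel's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; the real kernel structures differ. */
struct ioapic_scope_sketch {
    int id;
    uint8_t bus;
    uint8_t devfn;
};

struct irte_sketch {
    unsigned int svt;   /* source validation type */
    unsigned int sq;    /* source-id qualifier */
    uint16_t sid;       /* source-id: bus in high byte, devfn in low byte */
};

/* Compose a 16-bit source-id from bus and devfn. */
uint16_t make_sid(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/*
 * set_irte - fill the source-id fields in one place.
 * Common values, as described in the quoted patch comments:
 *   svt = 1: requestor ID verification using SID/SQ
 *   sq  = 0: compare all 16 bits of the SID
 */
void set_irte(struct irte_sketch *irte, unsigned int svt,
              unsigned int sq, uint16_t sid)
{
    irte->svt = svt;
    irte->sq  = sq;
    irte->sid = sid;
}

/* Look up the I/O APIC and set its source-id; -1 if not found. */
int set_ioapic_sid(struct irte_sketch *irte,
                   const struct ioapic_scope_sketch *table, size_t n,
                   int apic)
{
    uint16_t sid = 0;
    size_t i;

    if (!irte)
        return -1;

    /* Braces around the multi-line body, as the review asks for. */
    for (i = 0; i < n; i++) {
        if (table[i].id == apic) {
            sid = make_sid(table[i].bus, table[i].devfn);
            break;
        }
    }

    if (sid == 0)
        return -1;

    set_irte(irte, 1, 0, sid);
    return 0;
}
```

With the helper in place, the usage site shrinks to a single call and the per-field comments live in one description instead of being repeated.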
Re: [PATCH v2 1/2] Intel-IOMMU, intr-remap: set the whole 128bits of irte when modify/free it
* Weidong Han <weidong@intel.com> wrote:

> Interrupt remapping table entry is 128bits. Currently, it only sets low 64bits of irte in modify_irte and free_irte. This ignores high 64bits setting of irte, that means source-id setting will be ignored. This patch sets the whole 128bits of irte when modify/free it. Following source-id checking patch depends on this.
>
> Signed-off-by: Weidong Han <weidong@intel.com>
> ---
>  drivers/pci/intr_remapping.c |   10 +++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/intr_remapping.c b/drivers/pci/intr_remapping.c
> index f5e0ea7..946e170 100644
> --- a/drivers/pci/intr_remapping.c
> +++ b/drivers/pci/intr_remapping.c
> @@ -309,7 +309,8 @@ int modify_irte(int irq, struct irte *irte_modified)
>          index = irq_iommu->irte_index + irq_iommu->sub_handle;
>          irte = &iommu->ir_table->base[index];
>
> -        set_64bit((unsigned long *)irte, irte_modified->low);
> +        set_64bit((unsigned long *)&irte->low, irte_modified->low);
> +        set_64bit((unsigned long *)&irte->high, irte_modified->high);
>          __iommu_flush_cache(iommu, irte, sizeof(*irte));
>
>          rc = qi_flush_iec(iommu, index, 0);
> @@ -386,8 +387,11 @@ int free_irte(int irq)
>          irte = &iommu->ir_table->base[index];
>
>          if (!irq_iommu->sub_handle) {
> -                for (i = 0; i < (1 << irq_iommu->irte_mask); i++)
> -                        set_64bit((unsigned long *)(irte + i), 0);
> +                for (i = 0; i < (1 << irq_iommu->irte_mask); i++) {
> +                        set_64bit((unsigned long *)&irte->low, 0);
> +                        set_64bit((unsigned long *)&irte->high, 0);
> +                        irte++;
> +                }

The loop is a bit unclean. It has a side-effect on 'irte' - and other patterns in the driver usually treat 'irte' as a generally available variable. So the above code, while correct, opens up the possibility of later code added to this function relying on 'irte', thinking that it's set to &iommu->ir_table->base[index], and then breaking because 'irte' has been iterated to the end of it in certain circumstances.
It's better to factor out the whole loop into a helper function, which does something like:

int flush_entries(struct irq_2_iommu *irq_iommu)
{
        struct irte *start, *entry, *end;
        struct intel_iommu *iommu;
        int index;

        if (irq_iommu->sub_handle)
                return 0;

        iommu = irq_iommu->iommu;
        index = irq_iommu->irte_index + irq_iommu->sub_handle;

        start = iommu->ir_table->base + index;
        end = start + (1 << irq_iommu->irte_mask);

        for (entry = start; entry < end; entry++) {
                set_64bit((unsigned long *)&entry->low, 0);
                set_64bit((unsigned long *)&entry->high, 0);
        }

        return qi_flush_iec(iommu, index, irq_iommu->irte_mask);
}

Note how clearer this is - the new method has one purpose and 'entry' is a clear iterator.

( And note how much clearer the flow of 'rc' has become as well as a side-effect: it is clear now that it's set to 0 when irq_iommu->sub_handle is still present. )

Thanks,

        Ingo
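The point about using a dedicated iterator instead of advancing a shared pointer can be tried out in userspace. The following is a minimal sketch, assuming a simplified 128-bit entry type; `struct irte_sketch` and `clear_entries` are made-up names, and the cache flush and `qi_flush_iec()` steps of the real helper are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 128-bit table entry; not the kernel's struct irte. */
struct irte_sketch {
    uint64_t low;
    uint64_t high;
};

/*
 * Clear both 64-bit halves of 2^mask entries starting at base[index].
 * 'entry' is a local iterator, so the caller's pointers are never
 * advanced as a side-effect. Returns the number of entries cleared.
 */
size_t clear_entries(struct irte_sketch *base, size_t index,
                     unsigned int mask)
{
    struct irte_sketch *start = base + index;
    struct irte_sketch *end = start + ((size_t)1 << mask);
    struct irte_sketch *entry;

    for (entry = start; entry < end; entry++) {
        entry->low = 0;
        entry->high = 0;
    }

    return (size_t)(end - start);
}
```

Because the caller passes `base` and `index` rather than a running pointer, later code in the caller can keep relying on its own variables pointing where they did before the call.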
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
Alexander Graf wrote:
> SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it!
>
> For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though.
>
> v2 makes invlpga do the same as invlpg, not flush the whole mmu
>
> +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
> +{
> +        struct kvm_vcpu *vcpu = &svm->vcpu;
> +        nsvm_printk("INVLPGA\n");
> +
> +        /* Let's treat INVLPGA the same as INVLPG */
> +        kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
> +
> +        svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
> +        skip_emulated_instruction(&svm->vcpu);
> +        return 1;
> +}

I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.

For ASID==0 you can do the invlpg thing.

Marcelo?

--
error compiling committee.c: too many arguments to function
Re: [PATCH 2/4] Implement Hyper-V MSRs v2
Alexander Graf wrote:
> Hyper-V uses some MSRs, some of which are actually reserved for BIOS usage.
>
> But let's be nice today and have it its way, because otherwise it fails terribly.
>
> v2 changes:
>   - remove the 0x4081 MSR definition
>   - add pr_unimpl() on unimplemented writes
>
> Signed-off-by: Alexander Graf <ag...@suse.de>
> ---
>  arch/x86/kvm/svm.c |    5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ef43a18..4b4eadd 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2034,6 +2034,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
>          case MSR_VM_HSAVE_PA:
>                  svm->hsave_msr = data;
>                  break;
> +        case MSR_VM_CR:
> +        case MSR_VM_IGNNE:
> +        case MSR_K8_HWCR:
> +                pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
> +                break;

We can be nicer, if the write doesn't set bits which we don't implement, we can let it proceed silently. See for example MSR_IA32_DEBUGCTLMSR. Most likely the values written are already correctly implemented (by doing nothing).

--
error compiling committee.c: too many arguments to function
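The "be nicer" rule suggested above (accept a write silently when it only touches bits that are already handled, and complain otherwise) boils down to a mask test. A minimal sketch, assuming a hypothetical per-MSR mask of implemented bits; `check_msr_write` and the result names are invented for illustration and are not KVM functions:

```c
#include <stdint.h>

enum msr_write_result {
    MSR_WRITE_SILENT = 0,   /* only implemented bits set: proceed quietly */
    MSR_WRITE_UNIMPL = 1    /* some unimplemented bit set: worth a warning */
};

/*
 * Hypothetical check: 'implemented_bits' is the set of bits whose
 * semantics are already covered (possibly by doing nothing).
 * A write of 0 is always silent.
 */
enum msr_write_result check_msr_write(uint64_t data,
                                      uint64_t implemented_bits)
{
    if (data & ~implemented_bits)
        return MSR_WRITE_UNIMPL;

    return MSR_WRITE_SILENT;
}
```

In a handler, the warning path (pr_unimpl() in the quoted patch) would then be taken only for the `MSR_WRITE_UNIMPL` result.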
Re: virtio net regression
Antoine Martin wrote: But you do have swap enabled? Yes. I always do this on the guests as it seems fairer to let the guests use swap when they need the extra memory rather than over-committing too much memory on the host. Although it would probably be more efficient overall to let the host manage all swapping. It consumes more I/O bandwidth, but most guest's memory stay warm no matter what other guests are doing. Does that sound reasonable? Yes, it also provides better isolation.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
On 19.05.2009, at 14:58, Avi Kivity wrote:

> Alexander Graf wrote:
>> SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it!
>>
>> For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though.
>>
>> v2 makes invlpga do the same as invlpg, not flush the whole mmu
>>
>> +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
>> +{
>> +        struct kvm_vcpu *vcpu = &svm->vcpu;
>> +        nsvm_printk("INVLPGA\n");
>> +
>> +        /* Let's treat INVLPGA the same as INVLPG */
>> +        kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
>> +
>> +        svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +        skip_emulated_instruction(&svm->vcpu);
>> +        return 1;
>> +}
>
> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.

Right, the only situation I can imagine this isn't fulfilled is when INVLPGA isn't trapped in the 1st level guest, but issued in the 2nd level one. That should be rather rare though ;-).

Alex
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
Alexander Graf wrote:
>> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.
>
> Right, the only situation I can imagine this isn't fulfilled is when INVLPGA isn't trapped in the 1st level guest, but issued in the 2nd level one. That should be rather rare though ;-).

Good catch. Would be better to get it right; changing the test to asid != current_asid should suffice.

--
error compiling committee.c: too many arguments to function
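The test proposed above can be written down as a tiny predicate. This is a sketch of the suggestion as discussed in the thread, not code from any posted patch; `invlpga_needs_invlpg` and the notion of a `current_asid` parameter are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: when the ASID named by INVLPGA differs from
 * the ASID of the context that is currently running, the next world
 * switch is expected to flush and resync anyway, so nothing needs to
 * be done. Only an INVLPGA aimed at the current ASID needs the
 * invlpg-style invalidation right away.
 */
bool invlpga_needs_invlpg(uint32_t asid, uint32_t current_asid)
{
    return asid == current_asid;
}
```

Under this rule the rare case Alex describes (INVLPGA issued by the 2nd level guest for its own ASID) still gets an immediate invalidation, while the common case is skipped.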
Re: [PATCH 4/4] Nested SVM: Improve interrupt injection v2
On Tue, May 19, 2009 at 12:54:03PM +0200, Alexander Graf wrote:
> While trying to get Hyper-V running, I realized that the interrupt injection mechanisms that are in place right now are not 100% correct.
>
> This patch makes nested SVM's interrupt injection behave more like on a real machine.
>
> v2 calls BUG_ON when svm_set_irq is called with GIF=0
>
> Signed-off-by: Alexander Graf <ag...@suse.de>
> ---
>  arch/x86/kvm/svm.c |   39 ---
>  1 files changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index fa2a710..5b14c9d 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1517,7 +1517,8 @@ static int nested_svm_vmexit_real(struct vcpu_svm *svm, void *arg1,
>          /* Kill any pending exceptions */
>          if (svm->vcpu.arch.exception.pending == true)
>                  nsvm_printk("WARNING: Pending Exception\n");
> -        svm->vcpu.arch.exception.pending = false;
> +        kvm_clear_exception_queue(&svm->vcpu);
> +        kvm_clear_interrupt_queue(&svm->vcpu);

What about pending NMI here?

>          /* Restore selected save entries */
>          svm->vmcb->save.es = hsave->save.es;
> @@ -1585,7 +1586,8 @@ static int nested_svm_vmrun(struct vcpu_svm *svm, void *arg1,
>          svm->nested_vmcb = svm->vmcb->save.rax;
>
>          /* Clear internal status */
> -        svm->vcpu.arch.exception.pending = false;
> +        kvm_clear_exception_queue(&svm->vcpu);
> +        kvm_clear_interrupt_queue(&svm->vcpu);

And here.
>          /* Save the old vmcb, so we don't need to pick what we save, but
>             can restore everything when a VMEXIT occurs */
> @@ -2277,21 +2279,14 @@ static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
>                  ((/*control->int_vector >> 4*/ 0xf) << V_INTR_PRIO_SHIFT);
>  }
>
> -static void svm_queue_irq(struct kvm_vcpu *vcpu, unsigned nr)
> -{
> -        struct vcpu_svm *svm = to_svm(vcpu);
> -
> -        svm->vmcb->control.event_inj = nr |
> -                SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
> -}
> -
>  static void svm_set_irq(struct kvm_vcpu *vcpu, int irq)
>  {
>          struct vcpu_svm *svm = to_svm(vcpu);
>
> -        nested_svm_intr(svm);
> +        BUG_ON(!(svm->vcpu.arch.hflags & HF_GIF_MASK));
>
> -        svm_queue_irq(vcpu, irq);
> +        svm->vmcb->control.event_inj = irq |
> +                SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
>  }
>
>  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> @@ -2319,13 +2314,25 @@ static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
>          struct vmcb *vmcb = svm->vmcb;
>          return (vmcb->save.rflags & X86_EFLAGS_IF) &&
>                  !(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
> -                (svm->vcpu.arch.hflags & HF_GIF_MASK);
> +                (svm->vcpu.arch.hflags & HF_GIF_MASK) &&
> +                !is_nested(svm);
>  }
>
>  static void enable_irq_window(struct kvm_vcpu *vcpu)
>  {
> -        svm_set_vintr(to_svm(vcpu));
> -        svm_inject_irq(to_svm(vcpu), 0x0);
> +        struct vcpu_svm *svm = to_svm(vcpu);
> +        nsvm_printk("Trying to open IRQ window\n");
> +
> +        nested_svm_intr(svm);
> +
> +        /* In case GIF=0 we can't rely on the CPU to tell us when
> +         * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
> +         * The next time we get that intercept, this function will be
> +         * called again though and we'll get the vintr intercept.
> +         */
> +        if (svm->vcpu.arch.hflags & HF_GIF_MASK) {
> +                svm_set_vintr(svm);
> +                svm_inject_irq(svm, 0x0);
> +        }
>  }
>
>  static void enable_nmi_window(struct kvm_vcpu *vcpu)
> @@ -2393,6 +2400,8 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
>          case SVM_EXITINTINFO_TYPE_EXEPT:
>                  /* In case of software exception do not reinject an exception
>                     vector, but re-execute and instruction instead */
> +                if (is_nested(svm))
> +                        break;
>                  if (vector == BP_VECTOR || vector == OF_VECTOR)
>                          break;
>                  if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
> --
> 1.6.0.2

--
        Gleb.
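The conditions the quoted patch combines (IF set, no interrupt shadow, GIF set, not running a nested guest) and the GIF guard on opening the IRQ window can be modelled in a few lines of plain C. This is only a sketch: `struct vcpu_sketch` and the `SKETCH_*` masks are stand-ins invented here, with bit positions chosen for illustration apart from RFLAGS.IF, which is bit 9 on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EFLAGS_IF    (1u << 9)   /* x86 RFLAGS.IF */
#define SKETCH_INT_SHADOW   (1u << 0)   /* illustrative bit position */
#define SKETCH_HF_GIF       (1u << 0)   /* illustrative bit position */

/* Made-up flattened view of the state the checks look at. */
struct vcpu_sketch {
    uint64_t rflags;
    uint32_t int_state;
    uint32_t hflags;
    bool nested;        /* true while a nested guest is running */
};

/* Mirrors the shape of the patched svm_interrupt_allowed() check. */
bool interrupt_allowed(const struct vcpu_sketch *v)
{
    return (v->rflags & SKETCH_EFLAGS_IF) &&
           !(v->int_state & SKETCH_INT_SHADOW) &&
           (v->hflags & SKETCH_HF_GIF) &&
           !v->nested;
}

/* With GIF=0 the window request is skipped and retried on a later
 * intercept, as the quoted comment explains. */
bool can_request_irq_window(const struct vcpu_sketch *v)
{
    return (v->hflags & SKETCH_HF_GIF) != 0;
}
```

Gleb's question about pending NMIs is not covered by this model; it only restates the maskable-interrupt conditions visible in the hunk.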
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
On Tue, May 19, 2009 at 03:58:52PM +0300, Avi Kivity wrote:
> Alexander Graf wrote:
>> SVM adds another way to do INVLPG by ASID which Hyper-V makes use of, so let's implement it!
>>
>> For now we just do the same thing invlpg does, as asid switching means we flush the mmu anyways. That might change one day though.
>>
>> v2 makes invlpga do the same as invlpg, not flush the whole mmu
>>
>> +static int invlpga_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
>> +{
>> +        struct kvm_vcpu *vcpu = &svm->vcpu;
>> +        nsvm_printk("INVLPGA\n");
>> +
>> +        /* Let's treat INVLPGA the same as INVLPG */
>> +        kvm_mmu_invlpg(vcpu, vcpu->arch.regs[VCPU_REGS_RAX]);
>> +
>> +        svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
>> +        skip_emulated_instruction(&svm->vcpu);
>> +        return 1;
>> +}
>
> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.
>
> For ASID==0 you can do the invlpg thing.
>
> Marcelo?

kvm_mmu_invlpg is cheap, better just invalidate the entry.

If hyper-v uses invlpga to invalidate TLB entries which it has updated pte's in memory for, and you skip the invalidation now and somehow later use an unsync spte, you're toast.
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
Marcelo Tosatti wrote:
>> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.
>>
>> For ASID==0 you can do the invlpg thing.
>>
>> Marcelo?
>
> kvm_mmu_invlpg is cheap, better just invalidate the entry.
>
> If hyper-v uses invlpga to invalidate TLB entries which it has updated pte's in memory for, and you skip the invalidation now and somehow later use an unsync spte, you're toast.

But won't the guest entry cause a resync?

Doing nothing is even cheaper.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
On Tue, May 19, 2009 at 04:56:48PM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.
>>>
>>> For ASID==0 you can do the invlpg thing.
>>>
>>> Marcelo?
>>
>> kvm_mmu_invlpg is cheap, better just invalidate the entry.
>>
>> If hyper-v uses invlpga to invalidate TLB entries which it has updated pte's in memory for, and you skip the invalidation now and somehow later use an unsync spte, you're toast.
>
> But won't the guest entry cause a resync?

If it's a cr3/cr4 exit, yes.

> Doing nothing is even cheaper.

My brain is nested.
Re: virtio-net zero-copy
On Mon, May 18, 2009 at 10:00 PM, Avi Kivity <a...@redhat.com> wrote:
> Raju Srivastava wrote:
>> Greetings,
>>
>> Could someone let me know if current virtio-net supports zero-copy? I see some discussion here: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/28061/ (copyless virtio net thoughts) and it looks like the copyless virtio-net is not supported by KVM yet.
>
> That is correct.

Thank you for letting me know this.

>> If this is true, then is there any plan to add the zero copy to the virtio-net?
>
> Yes, but it will be a difficult journey.

That's great. I'm looking forward to it. It's said Xen NetChannel 2 has some new features including the zero-copy. Though it would be a difficult journey, it's really worth it, right?

>> Thanks
>> Regards,
>> Raju
qemu-kvm.git regression in configure
Latest qemu-kvm.git fails with ./configure, and reverting 22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.

Beth Kon
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
On 19.05.2009, at 15:58, Marcelo Tosatti wrote:

> On Tue, May 19, 2009 at 04:56:48PM +0300, Avi Kivity wrote:
>> Marcelo Tosatti wrote:
>>>> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.
>>>>
>>>> For ASID==0 you can do the invlpg thing.
>>>>
>>>> Marcelo?
>>>
>>> kvm_mmu_invlpg is cheap, better just invalidate the entry.
>>>
>>> If hyper-v uses invlpga to invalidate TLB entries which it has updated pte's in memory for, and you skip the invalidation now and somehow later use an unsync spte, you're toast.
>>
>> But won't the guest entry cause a resync?
>
> If it's a cr3/cr4 exit, yes.

Well it has to be. Either we're switching from one NPT to the other (todo) or do a normal cr3+cr4 switch.

So I guess we can optimize here. Is it worth it?

Alex
Re: qemu-kvm.git regression in configure
Beth Kon wrote:
> Latest qemu-kvm.git fails with ./configure, and reverting 22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.

Works for me. What error do you get?

--
error compiling committee.c: too many arguments to function
Re: qemu-kvm.git regression in configure
Avi Kivity wrote:
> Beth Kon wrote:
>> Latest qemu-kvm.git fails with ./configure, and reverting 22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.
>
> Works for me. What error do you get?

./configure: 1364: Syntax error: "(" unexpected (expecting "fi")
Re: qemu-kvm.git regression in configure
Beth Kon wrote:
> Avi Kivity wrote:
>> Beth Kon wrote:
>>> Latest qemu-kvm.git fails with ./configure, and reverting 22d239bcee126742df46938ee8ddc7c6b9209e23 corrects it.
>>
>> Works for me. What error do you get?
>
> ./configure: 1364: Syntax error: "(" unexpected (expecting "fi")

Ah, a non-bash shell, no arrays. I'll sort it out.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
On Tue, May 19, 2009 at 05:18:07PM +0200, Alexander Graf wrote:
> On 19.05.2009, at 15:58, Marcelo Tosatti wrote:
>> On Tue, May 19, 2009 at 04:56:48PM +0300, Avi Kivity wrote:
>>> Marcelo Tosatti wrote:
>>>>> I think that for ASID!=0 you can actually do nothing. The guest entry is a cr3 switch, so we'll both get a tlb flush and a resync on any modified ptes.
>>>>>
>>>>> For ASID==0 you can do the invlpg thing.
>>>>>
>>>>> Marcelo?
>>>>
>>>> kvm_mmu_invlpg is cheap, better just invalidate the entry.
>>>>
>>>> If hyper-v uses invlpga to invalidate TLB entries which it has updated pte's in memory for, and you skip the invalidation now and somehow later use an unsync spte, you're toast.
>>>
>>> But won't the guest entry cause a resync?
>>
>> If it's a cr3/cr4 exit, yes.
>
> Well it has to be. Either we're switching from one NPT to the other (todo) or do a normal cr3+cr4 switch.
>
> So I guess we can optimize here. Is it worth it?

IMHO better leave it the way it is, perhaps add a comment that the optimization is possible, and do it later if worthwhile.
Re: [PATCH 3/4] Nested SVM: Implement INVLPGA v2
Alexander Graf wrote:
>>>> kvm_mmu_invlpg is cheap, better just invalidate the entry.
>>>>
>>>> If hyper-v uses invlpga to invalidate TLB entries which it has updated pte's in memory for, and you skip the invalidation now and somehow later use an unsync spte, you're toast.
>>>
>>> But won't the guest entry cause a resync?
>>
>> If it's a cr3/cr4 exit, yes.
>
> Well it has to be. Either we're switching from one NPT to the other (todo) or do a normal cr3+cr4 switch.
>
> So I guess we can optimize here. Is it worth it?

I think so. We also need to make sure the entry causes a resync, even if cr3 doesn't change.

Oh, exit needs to force a resync as well, in case the guest foolishly let its guest touch its page tables and issue invlpga asid=0.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
Avi Kivity wrote: Christian Bornträger wrote: To summarize, Anthony thinks it should use virtio, while I believe virtio is useful for exporting guest memory, not for importing host memory. I think the current virtio interface is not ideal for importing host memory, but we can change that. If you look at the dcssblk driver for s390, it allows a guest to map shared memory segments via a diagnose (hypercall). This driver uses PCI regions to map memory. My point is, that the method to map memory is completely irrelevant, we just need something like mmap/shmget between the guest and the host. We could define an interface in virtio, that can be used by any transport. In case of pci this could be a simple pci map operation. What do you think about something like: (CCed Rusty) Exactly. Agreed.
Re: XP smp using a lot of CPU
Hi Avi,

here is the cpuinfo - what do you mean with workload? The CPU usage is around 33%.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
stepping        : 2
cpu MHz         : 1833.554
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3667.98
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz
stepping        : 2
cpu MHz         : 1833.554
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 3666.43
clflush size    : 64

Best regards,

Erik

Avi Kivity wrote:
> Erik Rull wrote:
>> Hi all,
>>
>> very very interesting. I have a similar problem but the other way round. If my XP runs up to 100% CPU usage top on the linux host reports only 33% cpu usage. I would expect around 50% because I only provide one core for the guest. I already increased the process priority of qemu and the io priority, nothing helped. The rest of the CPU is nearly idle, no excessive disk access this time :-)
>>
>> Any Idea what this could be?
>
> What workload is the guest running?
>
> What is your host cpu type (/proc/cpuinfo)?
Re: [PATCH 11/17] net: drop packet from tap device if all NICs are down
On Sun, 2009-05-17 at 10:43 -0500, Anthony Liguori wrote:
> From: Mark McLoughlin <mar...@redhat.com>
>
> If you do e.g. "set_link virtio.0 down" and there are packets pending on the tap interface, we currently buffer a packet and constantly try and send it until the link is up again.
>
> We actually just want to drop the packet if the NIC is down. Upstream qemu already does this, we just differ because we buffer packets from the tap interface.
>
> [aliguori: rebased this patch on stable. Mark, please review and Ack]
>
> Reported-by: Yan Vugenfirer <yvuge...@redhat.com>
> Signed-off-by: Mark McLoughlin <mar...@redhat.com>
> Signed-off-by: Avi Kivity <a...@redhat.com>
> Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>

Looks good to me.

Cheers,
Mark.
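The behaviour described in the commit message (drop a pending tap packet when every NIC is down, instead of buffering it and retrying forever) can be illustrated with a small decision function. This is a sketch under assumptions, not qemu code: `struct nic_sketch`, `enum tap_action` and `tap_handle_packet` are hypothetical names, and "can receive" stands in for whatever readiness check the real NIC models use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-NIC state for the illustration. */
struct nic_sketch {
    bool link_down;     /* e.g. after "set_link ... down" */
    bool can_receive;   /* NIC has room for a packet right now */
};

enum tap_action {
    TAP_DELIVER,    /* some NIC with its link up can take the packet */
    TAP_BUFFER,     /* a link is up but the NIC is busy: retry later */
    TAP_DROP        /* every NIC is down: drop instead of retrying */
};

enum tap_action tap_handle_packet(const struct nic_sketch *nics, size_t n)
{
    bool any_up = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (nics[i].link_down)
            continue;
        any_up = true;
        if (nics[i].can_receive)
            return TAP_DELIVER;
    }

    return any_up ? TAP_BUFFER : TAP_DROP;
}
```

The key difference from always buffering is the final line: with no link up there is nothing to wait for, so the packet is discarded rather than retried.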
[ kvm-Bugs-2793994 ] kvm (command) doesn't work when vboxdrv module is loaded
Bugs item #2793994, was opened at 2009-05-19 19:52
Message generated for change (Tracker Item Submitted) made by benb
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793994&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Interface (example)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ben Bucksch (benb)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm (command) doesn't work when vboxdrv module is loaded

Initial Comment:
Reproduction:
1. Load kvm_amd kernel module
2. Start kvm VM with kvm command
3. Stop kvm VM
4. Load VirtualBox vboxdrv kernel module
5. Start VirtualBox GUI, start VM, exit VM, close VirtualBox GUI
6. Start kvm VM with kvm -vnc ... command

Actual result:
All steps up to step 5 work. In Step 6, kvm starts and keeps running, I can connect to VNC, but I only get a black screen. No error message.

Expected result:
In step 6, kvm command immediately exits, with an error message: "Another virtual machine manager like VirtualBox, Xen or VMWare is running at the moment. Check 'lsmod' that no virtual machine manager modules other than 'kvm*' are loaded."
[ kvm-Bugs-2793994 ] kvm doesn't work when VirtualBox' vboxdrv module is loaded
Bugs item #2793994, was opened at 2009-05-19 19:52
Message generated for change (Settings changed) made by benb
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2793994&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Interface (example)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Ben Bucksch (benb)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm doesn't work when VirtualBox' vboxdrv module is loaded

Initial Comment:
Reproduction:
1. Load kvm_amd kernel module
2. Start kvm VM with kvm command
3. Stop kvm VM
4. Load VirtualBox vboxdrv kernel module
5. Start VirtualBox GUI, start VM, exit VM, close VirtualBox GUI
6. Start kvm VM with kvm -vnc ... command

Actual result:
All steps up to step 5 work. In Step 6, kvm starts and keeps running, I can connect to VNC, but I only get a black screen. No error message.

Expected result:
In step 6, kvm command immediately exits, with an error message:
Another virtual machine manager like VirtualBox, Xen or VMWare is running at the moment. Check 'lsmod' that no virtual machine manager modules other than 'kvm*' are loaded.
Re: [PATCH v2] Shared memory device with interrupt support
Avi Kivity wrote:

Anthony Liguori wrote:

I'd strongly recommend working these patches on qemu-devel and lkml. I suspect Avi may disagree with me, but in order for this to be eventually merged in either place, you're going to have additional requirements put on you.

I don't disagree with the fact that there will be additional requirements, but I might disagree with some of those additional requirements themselves.

It actually works out better than I think you expect it to...

We can't use mmap() directly.

With the new RAM allocation scheme, I think it's pretty reasonable to now allow portions of RAM to come from files that get mmap()ed (sort of like -mem-path). This RAM area could be set up as a BAR.

In particular I think your proposal was unimplementable; I would like to see how you can address my concerns.

I don't remember what my proposal was to be perfectly honest :-) I think I suggested registering a guest allocated portion of memory as a sharable region via virtio? Why is that unimplementable?

I don't think bulk memory sharing and the current transactional virtio mechanisms are a good fit for each other; but if we were to add a BAR-like capability to virtio that would address the compatibility requirement (though it might be difficult to implement on s390 with its requirement on contiguous host virtual address space).

It doesn't necessarily have to be virtio if that's not what makes sense.

The QEMU bits and the device model bits are actually relatively simple. The part that I think needs more deep thought is the guest-visible interface.

A char device is probably not the best interface. I think you want something like tmpfs/hugetlbfs. Another question is whether you want a guest to be able to share a portion of its memory with another guest or have everything set up by the host.

If everything is set up by the host, hot plug is important.

Regards,
Anthony Liguori
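To make the idea of file-backed RAM a little more concrete, here is a minimal user-space sketch in C. It is not QEMU code: the helper names, the region size, and the file path passed in are all assumptions made for illustration. It only shows the property such a scheme would rely on, namely that two MAP_SHARED mappings of the same file observe each other's writes.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION_SIZE 4096  /* hypothetical size of the shared region */

/* Map `path` with MAP_SHARED, creating and sizing the file if needed.
 * Returns the mapping, or NULL on failure. Illustrative helper only. */
static void *map_shared_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Write `msg` through one mapping and read it back through a second one.
 * Returns 0 if the second mapping sees the data, 1 if not, -1 on error. */
static int shared_roundtrip(const char *path, const char *msg)
{
    char *a = map_shared_file(path, REGION_SIZE);
    char *b = map_shared_file(path, REGION_SIZE);
    int rc = -1;

    if (a && b) {
        strncpy(a, msg, REGION_SIZE - 1);
        a[REGION_SIZE - 1] = '\0';
        rc = strcmp(b, msg) == 0 ? 0 : 1;
    }
    if (a)
        munmap(a, REGION_SIZE);
    if (b)
        munmap(b, REGION_SIZE);
    unlink(path);
    return rc;
}
```

In a real setup the two mappings would live in different processes (for example two QEMU instances), and the guest-visible side would still need the interface discussed above.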
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
Christian Bornträger wrote:

On Monday, 18 May 2009, 16:26:15, Avi Kivity wrote:

Christian Borntraeger wrote:

Sorry for the late question, but I missed your first version. Is there a way to change that code to use virtio instead of PCI? That would allow us to use this driver on s390 and maybe other virtio transports.

Opinion differs. See the discussion in http://article.gmane.org/gmane.comp.emulators.kvm.devel/30119. To summarize, Anthony thinks it should use virtio, while I believe virtio is useful for exporting guest memory, not for importing host memory.

I think the current virtio interface is not ideal for importing host memory, but we can change that. If you look at the dcssblk driver for s390, it allows a guest to map shared memory segments via a diagnose (hypercall). This driver uses PCI regions to map memory. My point is that the method to map memory is completely irrelevant; we just need something like mmap/shmget between the guest and the host. We could define an interface in virtio that can be used by any transport. In case of PCI this could be a simple PCI map operation.

What do you think about something like: (CCed Rusty)

---
 include/linux/virtio.h | 26 ++
 1 file changed, 26 insertions(+)

Index: linux-2.6/include/linux/virtio.h
===
--- linux-2.6.orig/include/linux/virtio.h
+++ linux-2.6/include/linux/virtio.h
@@ -71,6 +71,31 @@ struct virtqueue_ops {
 };

 /**
+ * virtio_device_ops - operations for virtio devices
+ * @map_region: map host buffer at a given address
+ *      vdev: the struct virtio_device we're talking about.
+ *      addr: The address where the buffer should be mapped (hint only)
+ *      length: The length of the mapping
+ *      identifier: the token that identifies the host buffer
+ *      Returns the mapping address or an error pointer.
+ * @unmap_region: unmap host buffer from the address
+ *      vdev: the struct virtio_device we're talking about.
+ *      addr: The address where the buffer is mapped
+ *      Returns 0 on success or an error
+ *
+ * TBD, we might need query etc.
+ */
+struct virtio_device_ops {
+        void * (*map_region)(struct virtio_device *vdev,
+                             void *addr,
+                             size_t length,
+                             int identifier);
+        int (*unmap_region)(struct virtio_device *vdev, void *addr);
+/* we might need query region and other stuff */
+};

Perhaps something that maps closer to the current add_buf/get_buf API. Something like:

struct iovec *(*map_buf)(struct virtqueue *vq, unsigned int *out_num, unsigned int *in_num);
void (*unmap_buf)(struct virtqueue *vq, struct iovec *iov, unsigned int out_num, unsigned int in_num);

There's symmetry here, which is good. The one bad thing about it is that it forces certain memory to be read-only and other memory to be read-write. I don't see that as a bad thing though.

I think we'll need an interface like this to support driver domains too since backend.

To put it another way, in QEMU, map_buf == virtqueue_pop and unmap_buf == virtqueue_push.

Regards,
Anthony Liguori
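The map_region/unmap_region pair proposed in the quoted patch can be tried out in user space with a toy transport. The sketch below is only an illustration under stated assumptions: struct virtio_device is a stand-in rather than the kernel structure, the toy_* functions and the calloc-backed "host buffer" are hypothetical, and failure is reported with NULL instead of the error pointer the patch comment mentions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct virtio_device (assumption). */
struct virtio_device {
    void *mapped;     /* currently mapped region, or NULL */
    int identifier;   /* token of the mapped host buffer */
};

/* Ops table shaped like the one in the proposed patch. */
struct virtio_device_ops {
    void *(*map_region)(struct virtio_device *vdev, void *addr,
                        size_t length, int identifier);
    int (*unmap_region)(struct virtio_device *vdev, void *addr);
};

/* Toy transport: "maps" a host buffer by allocating zeroed memory.
 * The addr argument is only a hint in the proposal, so it is ignored. */
static void *toy_map_region(struct virtio_device *vdev, void *addr,
                            size_t length, int identifier)
{
    (void)addr;
    if (vdev->mapped || length == 0)
        return NULL;  /* this toy supports one region at a time */
    vdev->mapped = calloc(1, length);
    vdev->identifier = identifier;
    return vdev->mapped;
}

/* Returns 0 on success, -1 if addr is not the mapped region. */
static int toy_unmap_region(struct virtio_device *vdev, void *addr)
{
    if (!vdev->mapped || vdev->mapped != addr)
        return -1;
    free(vdev->mapped);
    vdev->mapped = NULL;
    return 0;
}

static const struct virtio_device_ops toy_ops = {
    .map_region = toy_map_region,
    .unmap_region = toy_unmap_region,
};
```

A PCI transport would presumably implement the same two hooks by mapping a BAR, and an s390 transport by issuing the diagnose call, which is the transport independence argued for above.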
[PATCH][KVM][retry 3] Add support for Pause Filtering to AMD SVM
From 67f831e825b64be5dedae9936ff8a60b884959f2 Mon Sep 17 00:00:00 2001
From: mark.langsd...@amd.com
Date: Tue, 19 May 2009 07:46:11 -0500
Subject: [PATCH]

This feature creates a new field in the VMCB called Pause Filter Count. If Pause Filter Count is greater than 0 and intercepting PAUSEs is enabled, the processor will increment an internal counter when a PAUSE instruction occurs instead of intercepting. When the internal counter reaches the Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks, especially when the lock holding VCPU is not scheduled. Rescheduling another VCPU prevents the VCPU seeking the lock from wasting its quantum by spinning idly. Perform the reschedule by increasing the credited time on the VCPU.

Experimental results show that most spinlocks are held for less than 1000 PAUSE cycles or more than a few thousand. Default the Pause Filter Counter to 5000 to detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID bit.

On a 24 core system running 4 guests each with 16 VCPUs, this patch improved overall performance of each guest's 32 job kernbench by approximately 1%. Further performance improvement may be possible with a more sophisticated yield algorithm.
-Mark Langsdorf
Operating System Research Center
AMD

Signed-off-by: Mark Langsdorf mark.langsd...@amd.com
---
 arch/x86/include/asm/svm.h |    3 ++-
 arch/x86/kvm/svm.c         |   13 +
 include/linux/sched.h      |    7 +++
 kernel/sched.c             |    5 +
 4 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 85574b7..1fecb7e 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -57,7 +57,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
         u16 intercept_dr_write;
         u32 intercept_exceptions;
         u64 intercept;
-        u8 reserved_1[44];
+        u8 reserved_1[42];
+        u16 pause_filter_count;
         u64 iopm_base_pa;
         u64 msrpm_base_pa;
         u64 tsc_offset;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ef43a18..86df191 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -45,6 +45,7 @@ MODULE_LICENSE("GPL");
 #define SVM_FEATURE_NPT  (1 << 0)
 #define SVM_FEATURE_LBRV (1 << 1)
 #define SVM_FEATURE_SVML (1 << 2)
+#define SVM_FEATURE_PAUSE_FILTER (1 << 10)

 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))

@@ -575,6 +576,11 @@ static void init_vmcb(struct vcpu_svm *svm)

         svm->nested_vmcb = 0;
         svm->vcpu.arch.hflags = HF_GIF_MASK;
+
+        if (svm_has(SVM_FEATURE_PAUSE_FILTER)) {
+                control->pause_filter_count = 3000;
+                control->intercept |= (1ULL << INTERCEPT_PAUSE);
+        }
 }

 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
@@ -2087,6 +2093,12 @@ static int interrupt_window_interception(struct vcpu_svm *svm,
         return 1;
 }

+static int pause_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+        set_task_delay(current, 100);
+        return 1;
+}
+
 static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
                                   struct kvm_run *kvm_run) = {
         [SVM_EXIT_READ_CR0]     = emulate_on_interception,
@@ -2123,6 +2135,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
         [SVM_EXIT_CPUID]        = cpuid_interception,
         [SVM_EXIT_IRET]         = iret_interception,
         [SVM_EXIT_INVD]         = emulate_on_interception,
+        [SVM_EXIT_PAUSE]        = pause_interception,
         [SVM_EXIT_HLT]          = halt_interception,
         [SVM_EXIT_INVLPG]       = invlpg_interception,
         [SVM_EXIT_INVLPGA]      = invalid_op_interception,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b4c38bc..683bc65 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2283,6 +2283,9 @@ static inline unsigned int task_cpu(const struct task_struct *p)
         return task_thread_info(p)->cpu;
 }

+extern void set_task_delay(struct task_struct *p, unsigned int delay);
+
+
 extern void set_task_cpu(struct task_struct *p, unsigned int cpu);

 #else
@@ -2292,6 +2295,10 @@ static inline unsigned int task_cpu(const struct task_struct *p)
         return 0;
 }

+void set_task_delay(struct task_struct *p, unsigned int delay)
+{
+}
+
 static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 {
 }
diff --git a/kernel/sched.c b/kernel/sched.c
index b902e58..3174620 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1947,6 +1947,11 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
         return delta < (s64)sysctl_sched_migration_cost;
 }

+void set_task_delay(struct task_struct *p, unsigned int delay)
+{
+        p->se.vruntime += delay;
+}
+EXPORT_SYMBOL(set_task_delay);
 void
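As a rough illustration of the counting behaviour the commit message describes (and not of the real hardware or of KVM's exit handler), the following user-space C sketch models a filter that swallows PAUSE instructions until a threshold is reached. The struct and function names are hypothetical, and resetting the counter after an intercept is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the filter described in the commit message. */
struct pause_filter {
    uint16_t filter_count;   /* threshold, like the new 16-bit VMCB field */
    uint16_t counter;        /* internal counter */
    bool intercept_enabled;  /* PAUSE intercept bit */
};

/* Returns true when this PAUSE would cause an intercept. */
static bool pause_filter_on_pause(struct pause_filter *pf)
{
    if (!pf->intercept_enabled)
        return false;
    if (pf->filter_count == 0)
        return true;         /* no filtering: every PAUSE intercepts */
    if (++pf->counter >= pf->filter_count) {
        pf->counter = 0;     /* assumption: start over after an intercept */
        return true;
    }
    return false;
}

/* Count the intercepts produced by a spin loop of n PAUSE instructions. */
static unsigned int count_intercepts(struct pause_filter *pf, unsigned int n)
{
    unsigned int hits = 0;

    for (unsigned int i = 0; i < n; i++)
        hits += pause_filter_on_pause(pf) ? 1u : 0u;
    return hits;
}
```

With a threshold of 3000 (the value the patch writes into the VMCB), a lock that is released within 1000 PAUSE iterations never triggers an exit in this model, while a VCPU spinning for much longer is intercepted and can be descheduled.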
kvm guest debug using gdb on x86
Hi,

With the latest qemu-kvm and the 2.6.30-rc6 kernel I am not able to get guest debugging with gdb working. I get the following error:

$ gdb ./vmlinux
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) b do_fork
Breakpoint 1 at 0xc106cfc8: file kernel/fork.c, line 1347.
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
[New Thread 1]
Remote 'g' packet reply is too long: 7fa557e209c10400c8b3d0c1c03fd1c1a83fd1c1912d03c10202600068007b007b00d8f60b8015407f03
(gdb)

Are there any patches that I can try?

-aneesh
Re: kvm guest debug using gdb on x86
On Wed, May 20, 2009 at 12:23:12AM +0530, Aneesh Kumar K.V wrote:

Hi,

With the latest qemu-kvm and 2.6.30-rc6 kernel i am not able to get the guest debugging with gdb. I get the following error.

$ gdb ./vmlinux
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) b do_fork
Breakpoint 1 at 0xc106cfc8: file kernel/fork.c, line 1347.
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
[New Thread 1]
Remote 'g' packet reply is too long: 7fa557e209c10400c8b3d0c1c03fd1c1a83fd1c1912d03c10202600068007b007b00d8f60b8015407f03
(gdb)

any patches that i can try ?

It works better with the four patches found at http://git.kiszka.org/?p=kvm-userspace.git;a=shortlog;h=refs/heads/queues/gdb

But after a next or continue, gdb does not give the prompt back. The guest does stop executing.

-aneesh
Re: [PATCH v2 2/2] Intel-IOMMU, intr-remap: source-id checking
Ingo Molnar mi...@elte.hu writes:

* Weidong Han weidong@intel.com wrote:

To support domain-isolation usages, the platform hardware must be capable of uniquely identifying the requestor (source-id) for each interrupt message. Without source-id checking for interrupt remapping, a rogue guest/VM with assigned devices can launch interrupt attacks to bring down another guest/VM or the VMM itself.

This patch adds source-id checking for interrupt remapping, and then really isolates interrupts for guests/VMs with assigned devices.

Because the PCI subsystem is not initialized yet when setting up IOAPIC entries, use read_pci_config_byte to access PCI config space directly.

Signed-off-by: Weidong Han weidong@intel.com
---
 arch/x86/kernel/apic/io_apic.c |    6 +++
 drivers/pci/intr_remapping.c   |   90 ++-
 drivers/pci/intr_remapping.h   |    2 +
 include/linux/dmar.h           |   11 +
 4 files changed, 106 insertions(+), 3 deletions(-)

Code structure looks nice now. (and I suspect you have tested this on real and relevant hardware?) I've Cc:-ed Eric too ... does this direction look good to you too Eric?

Being a major nitpick, I have to point out that the code is not structured to support other iommus, and I think AMD has one that can do this as well.

The early pci reading of the bus is just wrong. What happens if the pci layer decided to renumber things? It looks like we have a real dependency on pci there and are avoiding sorting it out with this. Hmm. But that is what we use in setup_ioapic_sid.

I expect the right solution is to delay enabling ioapic entries until drivers enable them. That could also reduce screaming irqs during bootup in the kdump case.

set_msi_sid looks wrong. The comments are unhelpful. irte->svt should get an enum value or a define (removing the repeated explanations of the magic value) and then we could have room to explain why we are doing what we are doing.

Not finding an upstream pcie_bridge and then concluding we are a pcie device seems bogus.

Why, if we do have an upstream pcie bridge, do we only want to do a bus range verification instead of checking just for the bus:devfn?

The legacy PCI case seems even stranger.

The table of apic information by apic_id also seems wrong. Don't we have chip_data or something that should point to it that we can get from the irq?

Eric
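For readers following the bus:devfn versus bus-range question, a small C sketch may help. It relies on one fact, that a PCI requester ID packs the bus number into bits 15:8, the device into bits 7:3 and the function into bits 2:0. Everything else is an assumption for illustration: the enum, the rule structure and sid_allowed() are hypothetical and do not reflect the actual IRTE field encoding or the code in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI requester ID: bus in bits 15:8, device in 7:3, function in 2:0. */
static uint16_t make_source_id(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Hypothetical verification modes for this sketch. */
enum sid_check {
    SID_CHECK_NONE,       /* accept any requester */
    SID_CHECK_EXACT,      /* requester must match bus:devfn exactly */
    SID_CHECK_BUS_RANGE   /* requester bus must fall inside a range */
};

struct sid_rule {
    enum sid_check mode;
    uint16_t sid;         /* expected requester ID for SID_CHECK_EXACT */
    uint8_t start_bus;    /* inclusive range for SID_CHECK_BUS_RANGE */
    uint8_t end_bus;
};

/* Decide whether an interrupt from `requester` passes the rule. */
static bool sid_allowed(const struct sid_rule *rule, uint16_t requester)
{
    uint8_t bus = (uint8_t)(requester >> 8);

    switch (rule->mode) {
    case SID_CHECK_EXACT:
        return requester == rule->sid;
    case SID_CHECK_BUS_RANGE:
        return bus >= rule->start_bus && bus <= rule->end_bus;
    case SID_CHECK_NONE:
    default:
        return true;
    }
}
```

The exact check is the stricter of the two: a bus-range rule accepts every device behind a bridge, which is the trade-off being questioned in the review.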
Re: tun/tap and Vlans
On Tuesday, 19.05.2009, at 10:45 +0300, Avi Kivity wrote:

Hi,

Guest             Host
kvm1 --- eth0 -+- bridge0 --- vlan1 \
               |                     +-- eth0
kvm2 -+- eth0 -/                    /
      \- eth1 --- bridge1 --- vlan2 +

When sending packets through kvm2/eth0, they appear on both bridges and also vlans, also when sending packets through kvm2/eth1. When the guest has only one interface, the packets only appear on one bridge and one vlan as it's supposed to be. Can this be worked around?

This is strange. Can you post the command line you used to start kvm2?

Please bear with me - this was a few weeks ago and we didn't investigate further as we had other problems to solve. I'll set up a testbed next week and hope to report back with more details.

-- Lukas
[PATCH v7] kvm: Use a bitmap for tracking used GSIs
We're currently using a counter to track the most recent GSI we've handed out. This quickly hits KVM_MAX_IRQ_ROUTES when using device assignment with a driver that regularly toggles the MSI enable bit (such as Linux kernels 2.6.21-26). This can mean only a few minutes of usable run time. Instead, track used GSIs in a bitmap.

Signed-off-by: Alex Williamson alex.william...@hp.com
---

 v2: Added mutex to protect gsi bitmap
 v3: Updated for comments from Michael Tsirkin
     No longer depends on "[PATCH] kvm: device-assignment: Catch GSI overflow"
 v4: Fix gsi_bytes calculation noted by Sheng Yang
 v5: Remove mutex per Avi
     Fix negative gsi_count path per Michael
     Remove KVM_CAP_IRQ_ROUTING per Michael, ppc should still be
     protected by the KVM_IOAPIC_NUM_PINS check
 v6: Make use of ALIGN macro, per Michael
     Define KVM_IOAPIC_NUM_PINS if not already, per Michael
     Fix comment indent, per Michael
     Remove unused BITMAP_SIZE macro
 v7: Don't define KVM_IOAPIC_NUM_PINS, mark bitmap in common paths
     so we can stay blissfully ignorant of ioapics

 kvm/libkvm/kvm-common.h |    3 +-
 kvm/libkvm/libkvm.c     |   91 +--
 2 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/kvm/libkvm/kvm-common.h b/kvm/libkvm/kvm-common.h
index 591fb53..c95c591 100644
--- a/kvm/libkvm/kvm-common.h
+++ b/kvm/libkvm/kvm-common.h
@@ -67,7 +67,8 @@ struct kvm_context {
         struct kvm_irq_routing *irq_routes;
         int nr_allocated_irq_routes;
 #endif
-        int max_used_gsi;
+        void *used_gsi_bitmap;
+        int max_gsi;
 };

 int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
diff --git a/kvm/libkvm/libkvm.c b/kvm/libkvm/libkvm.c
index ba0a5d1..c5d6a7f 100644
--- a/kvm/libkvm/libkvm.c
+++ b/kvm/libkvm/libkvm.c
@@ -61,10 +61,32 @@
 #define DPRINTF(fmt, args...) do {} while (0)
 #endif

+#define MIN(x,y) ((x) < (y) ? (x) : (y))
+#define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))

 int kvm_abi = EXPECTED_KVM_API_VERSION;
 int kvm_page_size;

+static inline void set_gsi(kvm_context_t kvm, unsigned int gsi)
+{
+        uint32_t *bitmap = kvm->used_gsi_bitmap;
+
+        if (gsi < kvm->max_gsi)
+                bitmap[gsi / 32] |= 1U << (gsi % 32);
+        else
+                DPRINTF("Invalid GSI %d\n");
+}
+
+static inline void clear_gsi(kvm_context_t kvm, unsigned int gsi)
+{
+        uint32_t *bitmap = kvm->used_gsi_bitmap;
+
+        if (gsi < kvm->max_gsi)
+                bitmap[gsi / 32] &= ~(1U << (gsi % 32));
+        else
+                DPRINTF("Invalid GSI %d\n");
+}
+
 struct slot_info {
         unsigned long phys_addr;
         unsigned long len;
@@ -285,7 +307,7 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 {
         int fd;
         kvm_context_t kvm;
-        int r;
+        int r, gsi_count;

         fd = open("/dev/kvm", O_RDWR);
         if (fd == -1) {
@@ -323,6 +345,23 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
         kvm->no_irqchip_creation = 0;
         kvm->no_pit_creation = 0;

+        gsi_count = kvm_get_gsi_count(kvm);
+        if (gsi_count > 0) {
+                int gsi_bits, i;
+
+                /* Round up so we can search ints using ffs */
+                gsi_bits = ALIGN(gsi_count, 32);
+                kvm->used_gsi_bitmap = malloc(gsi_bits / 8);
+                if (!kvm->used_gsi_bitmap)
+                        goto out_close;
+                memset(kvm->used_gsi_bitmap, 0, gsi_bits / 8);
+                kvm->max_gsi = gsi_bits;
+
+                /* Mark any over-allocated bits as already in use */
+                for (i = gsi_count; i < gsi_bits; i++)
+                        set_gsi(kvm, i);
+        }
+
         return kvm;
  out_close:
         close(fd);
@@ -626,9 +665,6 @@ int kvm_get_dirty_pages(kvm_context_t kvm, unsigned long phys_addr, void *buf)
         return kvm_get_map(kvm, KVM_GET_DIRTY_LOG, slot, buf);
 }

-#define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
-#define BITMAP_SIZE(m) (ALIGN(((m)/PAGE_SIZE), sizeof(long) * 8) / 8)
-
 int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr,
                               unsigned long len, void *buf, void *opaque,
                               int (*cb)(unsigned long start, unsigned long len,
@@ -1298,8 +1334,8 @@ int kvm_add_routing_entry(kvm_context_t kvm,
         new->flags = entry->flags;
         new->u = entry->u;

-        if (entry->gsi > kvm->max_used_gsi)
-                kvm->max_used_gsi = entry->gsi;
+        set_gsi(kvm, entry->gsi);
+
         return 0;
 #else
         return -ENOSYS;
@@ -1327,12 +1363,14 @@ int kvm_del_routing_entry(kvm_context_t kvm,
 {
 #ifdef KVM_CAP_IRQ_ROUTING
         struct kvm_irq_routing_entry *e, *p;
-        int i, found = 0;
+        int i, gsi, found = 0;
+
+        gsi = entry->gsi;

         for (i = 0; i < kvm->irq_routes->nr; ++i) {
                 e = &kvm->irq_routes->entries[i];
                 if (e->type == entry->type
-                    && e->gsi == entry->gsi) {
+                    && e->gsi == gsi) {
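The bitmap scheme from the patch description can be tried in isolation with a small stand-alone C sketch. This is not the libkvm code: struct gsi_map and the gsi_map_* functions are hypothetical names, and the allocator loop is one plausible way to use ffs() on 32-bit words, following the "round up so we can search ints using ffs" comment in the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>   /* ffs() */

#define ALIGN_UP(x, y) (((x) + (y) - 1) & ~((y) - 1))

/* Hypothetical stand-alone tracker; not the libkvm structure. */
struct gsi_map {
    uint32_t *bits;
    unsigned int max_gsi;   /* number of bits, always a multiple of 32 */
};

static int gsi_map_init(struct gsi_map *m, unsigned int gsi_count)
{
    unsigned int nbits = ALIGN_UP(gsi_count, 32u);

    if (gsi_count == 0)
        return -1;
    m->bits = calloc(nbits / 32, sizeof(uint32_t));
    if (!m->bits)
        return -1;
    m->max_gsi = nbits;
    /* Mark the padding bits as used so they are never handed out. */
    for (unsigned int i = gsi_count; i < nbits; i++)
        m->bits[i / 32] |= 1U << (i % 32);
    return 0;
}

/* Release a GSI so it can be handed out again. */
static void gsi_map_clear(struct gsi_map *m, unsigned int gsi)
{
    if (gsi < m->max_gsi)
        m->bits[gsi / 32] &= ~(1U << (gsi % 32));
}

/* Find the lowest free GSI, mark it used and return it; -1 if full. */
static int gsi_map_get(struct gsi_map *m)
{
    for (unsigned int w = 0; w < m->max_gsi / 32; w++) {
        /* ffs() is 1-based; inverting the word finds the first zero bit. */
        int bit = ffs((int)~m->bits[w]);

        if (bit) {
            m->bits[w] |= 1U << (bit - 1);
            return (int)(w * 32 + (unsigned int)bit - 1);
        }
    }
    return -1;
}
```

Unlike the old max_used_gsi counter, a freed GSI becomes available again immediately, so a driver that toggles MSI repeatedly keeps reusing the same few entries instead of running into the route limit.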
Does KVM suffer from ACK-compression as you increase the number of VMs?
I recently read the following paper from 2004 that discusses ACK-compression in a VMware GSX 2.5.1 environment.

http://www.cs.clemson.edu/~jmarty/papers/ccn2004.pdf

I was wondering if anyone had checked to see if KVM also suffers from ACK-compression as you increase the number of VMs on each host (increasing virtualization overhead)? If it does suffer delays, what solutions exist for remedying this?

In addition to that, I was also curious what the maximum number of VMs people have been able to fit on a host, and what bottlenecks they encountered as they reached a maximum level of VMs before things fell apart.

thanks,
andrew
kvm building issue
Hi Avi,

I am trying to build kvm with the changed repositories. I was trying to follow the instructions from here: http://www.linux-kvm.org/page/Code

In particular I followed the section "building an external module with older kernels" on that page. I find that the kernel directory is missing in the qemu-kvm.git repository, so make sync is not working anymore.

With the new repository setup, how do I build the latest qemu-kvm with the latest kvm modules for the Fedora 10 kernel?

Thanks,
Nitin
[PATCH] qemu-kvm: Handle -no-shutdown
Plain QEMU has the parameter -no-shutdown. It keeps the qemu process from terminating when the VM is shut down (e.g. to keep using the QEMU monitor with a stopped VM). Today this parameter has no effect on qemu-kvm.

This patch introduces the same handling of -no-shutdown in qemu-kvm as in qemu:

 * termination of the qemu-kvm process on a VM shutdown is avoided only once
 * a second shutdown of the VM terminates qemu-kvm (like in qemu)

Signed-off-by: Daniel Gollub gol...@b1-systems.de
---
 qemu-kvm.c |    9 ++---
 sysemu.h   |    1 +
 vl.c       |    7 +++
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 5e4002b..b9926eb 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -597,9 +597,12 @@ int kvm_main_loop(void)
     while (1) {
         main_loop_wait(1000);
-        if (qemu_shutdown_requested())
-            break;
-        else if (qemu_powerdown_requested())
+        if (qemu_shutdown_requested()) {
+            if (qemu_no_shutdown()) {
+                vm_stop(0);
+            } else
+                break;
+        } else if (qemu_powerdown_requested())
             qemu_system_powerdown();
         else if (qemu_reset_requested())
             qemu_kvm_system_reset();
diff --git a/sysemu.h b/sysemu.h
index 1f45fd6..0dd184d 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -35,6 +35,7 @@ void cpu_disable_ticks(void);
 void qemu_system_reset_request(void);
 void qemu_system_shutdown_request(void);
 void qemu_system_powerdown_request(void);
+int qemu_no_shutdown(void);
 int qemu_shutdown_requested(void);
 int qemu_reset_requested(void);
 int qemu_powerdown_requested(void);
diff --git a/vl.c b/vl.c
index d9f0607..9b2a420 100644
--- a/vl.c
+++ b/vl.c
@@ -3644,6 +3644,13 @@ static int powerdown_requested;
 static int debug_requested;
 static int vmstop_requested;

+int qemu_no_shutdown(void)
+{
+    int r = no_shutdown;
+    no_shutdown = 0;
+    return r;
+}
+
 int qemu_shutdown_requested(void)
 {
     int r = shutdown_requested;

-- 
Daniel Gollub
FOSS Developer
B1 Systems GmbH
Mobile: +49-(0)-160 47 73 970
Email: gol...@b1-systems.de
http://www.b1-systems.de
Address: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg

Managing Director: Ralph Dehner
Registered office: Vohburg
District court: Ingolstadt
Commercial register: HRB 3537

http://pgpkeys.pca.dfn.de/pks/lookup?op=get&search=0xED14B95C2F8CA78D
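The "avoided only once" behaviour follows from the read-and-clear pattern in qemu_no_shutdown(). The stand-alone C sketch below mimics that logic outside of QEMU. The flag variables, take_* helpers and handle_shutdown() are hypothetical stand-ins, and setting *vm_stopped merely represents the vm_stop(0) call in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the globals in vl.c. */
static int no_shutdown_flag;         /* set when -no-shutdown is given */
static int shutdown_requested_flag;  /* set when the guest shuts down */

/* Read-and-clear, like qemu_no_shutdown() in the patch. */
static int take_no_shutdown(void)
{
    int r = no_shutdown_flag;

    no_shutdown_flag = 0;
    return r;
}

/* Read-and-clear, like qemu_shutdown_requested(). */
static int take_shutdown_request(void)
{
    int r = shutdown_requested_flag;

    shutdown_requested_flag = 0;
    return r;
}

/* One main-loop iteration: returns true if the process should exit. */
static bool handle_shutdown(int *vm_stopped)
{
    if (!take_shutdown_request())
        return false;
    if (take_no_shutdown()) {
        *vm_stopped = 1;   /* stands in for vm_stop(0) */
        return false;      /* keep the process (and monitor) alive */
    }
    return true;           /* second shutdown: leave the main loop */
}
```

Because the flag is cleared on first use, the first guest shutdown only stops the VM, and a later shutdown request ends the loop, matching the two bullet points in the patch description.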
Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:

Avi Kivity wrote:

Christian Bornträger wrote:

To summarize, Anthony thinks it should use virtio, while I believe virtio is useful for exporting guest memory, not for importing host memory.

Yes, precisely.

But what's it *for*, this shared memory? Implementing shared memory is trivial. Using it is harder. For example, inter-guest networking: you'd have to copy packets in and out, making it slow as well as losing abstraction.

The only interesting idea I can think of is exposing it to userspace, and having that run some protocol across it for fast app-to-app comms. But if that's your plan, you still have a lot of code to write!

So I guess I'm missing the big picture here?

Thanks,
Rusty.