[PATCH] KVM: MMU: initialize sptes early
Otherwise, the following kvm_sync_pages() will see invalid sptes in a new shadow page.

Signed-off-by: Zhao Jin <crono...@gmail.com>
---
 arch/x86/kvm/mmu.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8e8da79..d7e1694 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1692,6 +1692,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		return sp;
 	sp->gfn = gfn;
 	sp->role = role;
+	init_shadow_page_table(sp);
 	hlist_add_head(&sp->hash_link,
 		       &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
 	if (!direct) {
@@ -1702,7 +1703,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		account_shadowed(vcpu->kvm, gfn);
 	}
-	init_shadow_page_table(sp);
 	trace_kvm_mmu_get_page(sp, true);
 	return sp;
 }
--
1.7.5.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: MMU: fix the condition of syncing a new shadow page
Should be "or", since a new shadow page is synced if either it is not leaf or there already exists another unsync shadow page with the same gfn.

Signed-off-by: Zhao Jin <crono...@gmail.com>
---
 arch/x86/kvm/mmu.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d7e1694..f36de41 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1698,7 +1698,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	if (!direct) {
 		if (rmap_write_protect(vcpu->kvm, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
-		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
+		if (level > PT_PAGE_TABLE_LEVEL || need_sync)
 			kvm_sync_pages(vcpu, gfn);

 		account_shadowed(vcpu->kvm, gfn);
--
1.7.5.4
[PATCH] KVM: VMX: fix incorrect operand
Should test save->ar for access rights.

Signed-off-by: Zhao Jin <crono...@gmail.com>
---
 arch/x86/kvm/vmx.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e65a158..62086da 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2544,7 +2544,7 @@ static void fix_pmode_dataseg(int seg, struct kvm_save_segment *save)
 {
 	struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];

-	if (vmcs_readl(sf->base) == save->base && (save->base & AR_S_MASK)) {
+	if (vmcs_readl(sf->base) == save->base && (save->ar & AR_S_MASK)) {
 		vmcs_write16(sf->selector, save->selector);
 		vmcs_writel(sf->base, save->base);
 		vmcs_write32(sf->limit, save->limit);
--
1.7.5.4
Re: qcow2 eating up space when formattng Centos
Hello Anonymous,

On Monday 24 October 2011 03:22:15 day knight wrote:
> I am not sure if this is the right behaviour, but the qcow2 image seems to grow while CentOS is only formatting it. It goes up to 30 GiB once everything is formatted and installed, and it is a minimal CentOS install with no GUI or apps, just baseline.
> OS = CentOS 5
> Virtualization: KVM
> Total qcow2 image created = 1 TB
> Once CentOS is installed, the qcow2 image shows as around 30 GiB. I have done several installs and they were all less than 4 GiB or even less, so this does not seem to make sense. Can someone please explain what is going on?

You didn't tell which file system you're using. ext3 needs to initialize its metadata (super blocks, inode tables), which is scattered all over the image. Depending on your cluster size for the qcow2 file, each (small) write takes the space of a full cluster. Add to that the metadata needed by qcow2 itself (a two-level tree), and 30 GiB seems to be okay.

You might want to try ext4 with delayed allocation (IMHO enabled by default), which doesn't write all over the range, since initialization of the superblock and inode tables is delayed until they are actually needed.

Sincerely
Philipp
--
Philipp Hahn           Open Source Software Engineer      h...@univention.de
Univention GmbH        Linux for Your Business            fon: +49 421 22 232-0
Mary-Somerville-Str. 1 D-28359 Bremen                     fax: +49 421 22 232-99
http://www.univention.de/
Re: [PATCH] KVM: MMU: initialize sptes early
On 2011/10/24 15:21, Zhao Jin wrote:
> Otherwise, the following kvm_sync_pages() will see invalid sptes in a new shadow page.

No, kvm_sync_pages() just handles unsync pages, but the new sp is a sync page.
Re: [PATCH] KVM: MMU: fix the condition of syncing a new shadow page
On 2011/10/24 15:21, Zhao Jin wrote:
> Should be "or", since a new shadow page is synced if either it is not leaf or there already exists another unsync shadow page with the same gfn.

It is obviously wrong; we need to sync pages only if there is an unsync page *and* the new shadow page breaks the unsync rule (only a level-1 sp can become unsync).
[PATCH v2] kvm tools: Simplify msi message handling
This patch simplifies passing around MSI messages by using 'struct kvm_irq_routing_msi' for storing MSI messages instead of passing all MSI parameters around.

Signed-off-by: Sasha Levin <levinsasha...@gmail.com>
---
 tools/kvm/hw/pci-shmem.c    |  5 +
 tools/kvm/include/kvm/irq.h |  4 +++-
 tools/kvm/include/kvm/pci.h |  7 +++
 tools/kvm/irq.c             |  8
 tools/kvm/virtio/pci.c      | 10 ++
 5 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 2907a66..780a377 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -124,10 +124,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
 		return fd;

 	if (pci_shmem_pci_device.msix.ctrl & PCI_MSIX_FLAGS_ENABLE) {
-		gsi = irq__add_msix_route(kvm,
-					  msix_table[0].low,
-					  msix_table[0].high,
-					  msix_table[0].data);
+		gsi = irq__add_msix_route(kvm, &msix_table[0].msg);
 	} else {
 		gsi = pci_shmem_pci_device.irq_line;
 	}
diff --git a/tools/kvm/include/kvm/irq.h b/tools/kvm/include/kvm/irq.h
index 401bee9..61f593d 100644
--- a/tools/kvm/include/kvm/irq.h
+++ b/tools/kvm/include/kvm/irq.h
@@ -4,6 +4,8 @@
 #include <linux/types.h>
 #include <linux/rbtree.h>
 #include <linux/list.h>
+#include <linux/kvm.h>
+#include <linux/msi.h>

 struct kvm;

@@ -24,6 +26,6 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line);
 struct rb_node *irq__get_pci_tree(void);

 void irq__init(struct kvm *kvm);
-int irq__add_msix_route(struct kvm *kvm, u32 low, u32 high, u32 data);
+int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg);

 #endif
diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index 5ee8005..f71af0b 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -2,8 +2,9 @@
 #define KVM__PCI_H

 #include <linux/types.h>
-
+#include <linux/kvm.h>
 #include <linux/pci_regs.h>
+#include <linux/msi.h>

 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
@@ -26,9 +27,7 @@ struct pci_config_address {
 };

 struct msix_table {
-	u32 low;
-	u32 high;
-	u32 data;
+	struct msi_msg msg;
 	u32 ctrl;
 };

diff --git a/tools/kvm/irq.c b/tools/kvm/irq.c
index e35bf18..dc2247e 100644
--- a/tools/kvm/irq.c
+++ b/tools/kvm/irq.c
@@ -167,7 +167,7 @@ void irq__init(struct kvm *kvm)
 		die("Failed setting GSI routes");
 }

-int irq__add_msix_route(struct kvm *kvm, u32 low, u32 high, u32 data)
+int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
 {
 	int r;

@@ -175,9 +175,9 @@ int irq__add_msix_route(struct kvm *kvm, u32 low, u32 high, u32 data)
 		(struct kvm_irq_routing_entry) {
 			.gsi = gsi,
 			.type = KVM_IRQ_ROUTING_MSI,
-			.u.msi.address_lo = low,
-			.u.msi.address_hi = high,
-			.u.msi.data = data,
+			.u.msi.address_hi = msg->address_hi,
+			.u.msi.address_lo = msg->address_lo,
+			.u.msi.data = msg->data,
 		};

 	r = ioctl(kvm->vm_fd, KVM_SET_GSI_ROUTING, irq_routing);
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index f01851b..73d55a9 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -126,20 +126,14 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_pci *vpci
 	case VIRTIO_MSI_CONFIG_VECTOR:
 		vec = vpci->config_vector = ioport__read16(data);

-		gsi = irq__add_msix_route(kvm,
-					  vpci->msix_table[vec].low,
-					  vpci->msix_table[vec].high,
-					  vpci->msix_table[vec].data);
+		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);

 		vpci->config_gsi = gsi;
 		break;
 	case VIRTIO_MSI_QUEUE_VECTOR: {
 		vec = vpci->vq_vector[vpci->queue_selector] = ioport__read16(data);

-		gsi = irq__add_msix_route(kvm,
-					  vpci->msix_table[vec].low,
-					  vpci->msix_table[vec].high,
-					  vpci->msix_table[vec].data);
+		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);

 		vpci->gsis[vpci->queue_selector] = gsi;
 		break;
 	}
--
1.7.7
Re: [PATCH] KVM: MMU: initialize sptes early
2011/10/24 Xiao Guangrong <xiao.guangr...@qq.com>:
> On 2011/10/24 15:21, Zhao Jin wrote:
>> Otherwise, the following kvm_sync_pages() will see invalid sptes in a new shadow page.
>
> No, kvm_sync_pages just handles the unsync pages, but the new sp is a sync page.

Sorry, I didn't notice that the sp itself is zeroed when allocated and hence considered synced. Please ignore this patch. Thanks for the reminder.
Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests
On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:
> Added configuration support to enable debug information for KVM Guests in debugfs
>
> Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suz...@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1f03f82..ed34269 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -562,6 +562,15 @@ config KVM_GUEST
>  	  This option enables various optimizations for running under the KVM hypervisor.
>
> +config KVM_DEBUG_FS
> +	bool "Enable debug information for KVM Guests in debugfs"
> +	depends on KVM_GUEST

Shouldn't it depend on DEBUG_FS as well?

> +	default n
> +	---help---
> +	  This option enables collection of various statistics for KVM guest.
> +	  Statistics are displayed in debugfs filesystem. Enabling this option
> +	  may incur significant overhead.
> +
>  source "arch/x86/lguest/Kconfig"
>
>  config PARAVIRT

--
Sasha.
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
On Mon, 2011-10-24 at 00:35 +0530, Raghavendra K T wrote:
> Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
>
> KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it is woken up because of an event like interrupt. KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
>
> The presence of these hypercalls is indicated to guest via KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding patch to pass up the presence of this feature to guest via cpuid. Patch to qemu will be sent separately. There is no Xen/KVM hypercall interface to await kick from.
>
> Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suz...@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 734c376..2874c19 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,12 +16,14 @@
>  #define KVM_FEATURE_CLOCKSOURCE		0
>  #define KVM_FEATURE_NOP_IO_DELAY	1
>  #define KVM_FEATURE_MMU_OP		2
> +
>  /* This indicates that the new set of kvmclock msrs
>   * are available. The use of 0x11 and 0x12 is deprecated
>   */
>  #define KVM_FEATURE_CLOCKSOURCE2	3
>  #define KVM_FEATURE_ASYNC_PF		4
>  #define KVM_FEATURE_STEAL_TIME		5
> +#define KVM_FEATURE_WAIT_FOR_KICK	6
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 84a28ea..b43fd18 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_XSAVE:
>  	case KVM_CAP_ASYNC_PF:
>  	case KVM_CAP_GET_TSC_KHZ:
> +	case KVM_CAP_WAIT_FOR_KICK:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -2548,7 +2549,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
> -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> +			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +			     (1 << KVM_FEATURE_WAIT_FOR_KICK);
>
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> @@ -5231,6 +5233,61 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>
> +/*
> + * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
> + * hypercall or an event like interrupt.
> + *
> + * @vcpu : vcpu which is blocking.
> + */
> +static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
> +{
> +	DEFINE_WAIT(wait);
> +
> +	/*
> +	 * Blocking on vcpu->wq allows us to wake up sooner if required to
> +	 * service pending events (like interrupts).
> +	 *
> +	 * Also set state to TASK_INTERRUPTIBLE before checking vcpu->kicked to
> +	 * avoid racing with kvm_pv_kick_cpu_op().
> +	 */
> +	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> +
> +	/*
> +	 * Somebody has already tried kicking us. Acknowledge that
> +	 * and terminate the wait.
> +	 */
> +	if (vcpu->kicked) {
> +		vcpu->kicked = 0;
> +		goto end_wait;
> +	}
> +
> +	/* Let's wait for either KVM_HC_KICK_CPU or some other event
> +	 * to wake us up.
> +	 */
> +	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> +	schedule();
> +	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> +
> +end_wait:
> +	finish_wait(&vcpu->wq, &wait);
> +}
> +
> +/*
> + * kvm_pv_kick_cpu_op: Kick a vcpu.
> + *
> + * @cpu - vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
> +{
> +	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
> +
> +	if (vcpu) {
> +		vcpu->kicked = 1;

I'm not sure about it, but maybe we want a memory barrier over here?

> +		wake_up_interruptible(&vcpu->wq);
> +	}
> +}

--
Sasha.
Re: [PATCH RFC V2 5/5] kvm guest : pv-ticketlocks support for linux guests running on KVM hypervisor
On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:
> This patch extends Linux guests running on KVM hypervisor to support pv-ticketlocks. Very early during bootup, the paravirtualized KVM guest detects if the hypervisor has the required feature (KVM_FEATURE_WAIT_FOR_KICK) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops.
>
> Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suz...@in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 2874c19..c7f34b7 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -195,10 +195,18 @@ void kvm_async_pf_task_wait(u32 token);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_pf_reason(void);
>  extern void kvm_disable_steal_time(void);
> -#else
> +
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +void __init kvm_guest_early_init(void);
> +#else /* CONFIG_PARAVIRT_SPINLOCKS */
> +#define kvm_guest_early_init() do { } while (0)

This should be defined as an empty function.

--
Sasha.
Re: [PATCH] KVM: MMU: fix the condition of syncing a new shadow page
2011/10/24 Xiao Guangrong <xiao.guangr...@qq.com>:
> On 2011/10/24 15:21, Zhao Jin wrote:
>> Should be "or", since a new shadow page is synced if either it is not leaf or there already exists another unsync shadow page with the same gfn.
>
> It is obviously wrong; we need to sync pages only if there is an unsync page *and* the new shadow page breaks the unsync rule (only a level-1 sp can become unsync).

Please ignore this patch, as I had made an incorrect assumption. Thanks very much for the correction.
Re: [Qemu-devel] [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed
On Mon, 2011-10-24 at 07:25 +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
>> On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang <jasow...@redhat.com> wrote:
>>> This lets the virtio-net driver send a gratuitous packet based on a new config bit - VIRTIO_NET_S_ANNOUNCE - in each config update interrupt. When this bit is set by the backend, the driver schedules a workqueue to send a gratuitous packet through NETDEV_NOTIFY_PEERS. This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
>>>
>>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>>
>> This seems like a huge layering violation. Imagine this in real hardware, for example.
>
> Commits 06c4648d46d1b757d6b9591a86810be79818b60c and 99606477a5888b0ead0284fecb13417b1da8e3af document the need for this:
>
> "NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a different physical link."
> and
> "In real hardware such notifications are only generated when the device comes up or the address changes."
>
> So the hypervisor could get the same behaviour by sending link up/down events; this is just an optimization so the guest won't do unnecessary stuff like trying to reconfigure an IP address. Maybe LOCATION_CHANGE would be a better name?
[...]

We also use this in bonding failover, where the system location doesn't change but a different link is used. However, I do recognise that the name ought to indicate what kind of change happened and not what the expected action is.

Ben.

--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32
On 10/24/2011 03:49 AM, Greg KH wrote:
> On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote:
>> Renaming of xen functions and change unsigned to u32.
>
> Why not just rename when you move the functions? Why the extra step?

The intention was only clarity. Yes, if this patch is an overhead, I'll combine both patches.

> greg k-h
Re: [PATCH RFC V2 1/5] debugfs: Add support to print u32 array in debugfs
On 10/24/2011 03:50 AM, Greg KH wrote:
> On Mon, Oct 24, 2011 at 12:34:04AM +0530, Raghavendra K T wrote:
>> Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to debugfs to make the code common for other users as well.
>
> You forgot the kerneldoc for the function explaining what it is and how to use it, and the EXPORT_SYMBOL_GPL() marking for the global function, as that's the only way it will be able to be used, right?

Right. Thanks for finding this. I'll update the patch for that.

> thanks,
> greg k-h
Re: [PATCH RFC V2 5/5] kvm guest : pv-ticketlocks support for linux guests running on KVM hypervisor
On 10/24/2011 03:31 PM, Sasha Levin wrote:
> On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:
>> +#else /* CONFIG_PARAVIRT_SPINLOCKS */
>> +#define kvm_guest_early_init() do { } while (0)
>
> This should be defined as an empty function.

Yes, agreed. I'll change it to an empty function.

- Raghu
Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32
On Mon, Oct 24, 2011 at 02:58:47PM +0530, Raghavendra K T wrote:
> On 10/24/2011 03:49 AM, Greg KH wrote:
>> On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote:
>>> Renaming of xen functions and change unsigned to u32.
>>
>> Why not just rename when you move the functions? Why the extra step?
>
> The intention was only clarity. Yes, if this patch is an overhead, I'll combine both patches.

Yeah, it makes more sense combined, as it originally confused me why you were adding a xen_* function to the debugfs core code :)

thanks,
greg k-h
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 10/21/2011 11:19 AM, Jan Kiszka wrote:
> Currently, MSI messages can only be injected into in-kernel irqchips by defining a corresponding IRQ route for each message. This is not only unhandy if the MSI messages are generated on the fly by user space; IRQ routes are also a limited resource that user space has to manage carefully.

By itself, this does not provide enough value to offset the cost of a new ABI, especially as userspace will need to continue supporting the old method for a very long while.

> By providing direct injection, we can both avoid using up limited resources and simplify the necessary steps for user land. The API already provides a channel (flags) to revoke an injected but not yet delivered message, which will become important for in-kernel MSI-X vector masking support.

With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jun 09, 2011 at 06:41:56AM -0400, Mark Wu wrote:
>> On 06/09/2011 05:14 AM, Tejun Heo wrote:
>>> Hello,
>>>
>>> On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
>>>> On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu <d...@redhat.com> wrote:
>>>>> Hi Rusty,
>>>>> Yes, I can't figure out an instance of disk probing in parallel either, but as per the following commit, I think we still need to use a lock for safety. What's your opinion?
>>>>>
>>>>> commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
>>>>> Author: Tejun Heo <t...@kernel.org>
>>>>> Date:   Sat Feb 21 11:04:45 2009 +0900
>>>>>
>>>>>     [SCSI] sd: revive sd_index_lock
>>>>>
>>>>>     Commit f27bac2761cab5a2e212dea602d22457a9aa6943, which converted sd to use ida instead of idr, incorrectly removed sd_index_lock around id allocation and free. idr/ida do have internal locks, but they protect their free object lists, not the allocation itself. The caller is responsible for that. This missing synchronization led to the same id being assigned to multiple devices, leading to oops.
>>>>
>>>> I'm confused. Tejun, Greg, anyone: can probes happen in parallel? If so, I'll have to review all my drivers.
>>>
>>> Unless async is explicitly used, probe happens sequentially. IOW, if there's no async_schedule() call, things won't happen in parallel. That said, I think it wouldn't be such a bad idea to protect the ida with a spinlock regardless, unless the probe code explicitly requires serialization. Thanks.
>>
>> Since the virtio blk driver doesn't use async probe, it needn't use a spinlock to protect the ida. So remove the lock from the patch.
>>
>> From fbb396df9dbf8023f1b268be01b43529a3993d57 Mon Sep 17 00:00:00 2001
>> From: Mark Wu <d...@redhat.com>
>> Date: Thu, 9 Jun 2011 06:34:07 -0400
>> Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
>>
>> Current index allocation in virtio-blk is based on a monotonically increasing variable "index". It could cause some confusion about disk names in the case of hot-plugging disks. And it's impossible to find the lowest available index by just maintaining a simple index. So it's changed to use ida to allocate the index, referring to the index allocation in scsi disk.
>>
>> Signed-off-by: Mark Wu <d...@redhat.com>
>> Acked-by: Michael S. Tsirkin <m...@redhat.com>
>
> This got lost in the noise and missed 3.1, which is unfortunate. How about we apply this as is and look at cleanups as a next step?

Rusty, any opinion on merging this for 3.2? I expect the merge window will open right after the summit, so we need to decide soon ...

> ---
>  drivers/block/virtio_blk.c | 28 +++++++++++++++++++++-----
>  1 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 079c088..bf81ab6 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -8,10 +8,13 @@
>  #include <linux/scatterlist.h>
>  #include <linux/string_helpers.h>
>  #include <scsi/scsi_cmnd.h>
> +#include <linux/idr.h>
>
>  #define PART_BITS 4
>
> -static int major, index;
> +static int major;
> +static DEFINE_IDA(vd_index_ida);
>
>  struct workqueue_struct *virtblk_wq;
>
>  struct virtio_blk
> @@ -23,6 +26,7 @@ struct virtio_blk
>
>  	/* The disk structure for the kernel. */
>  	struct gendisk *disk;
> +	u32 index;
>
>  	/* Request tracking. */
>  	struct list_head reqs;
> @@ -343,12 +347,23 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>  	struct request_queue *q;
>  	int err;
>  	u64 cap;
> -	u32 v, blk_size, sg_elems, opt_io_size;
> +	u32 v, blk_size, sg_elems, opt_io_size, index;
>  	u16 min_io_size;
>  	u8 physical_block_exp, alignment_offset;
>
> -	if (index_to_minor(index) >= 1 << MINORBITS)
> -		return -ENOSPC;
> +	do {
> +		if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
> +			return -ENOMEM;
> +		err = ida_get_new(&vd_index_ida, &index);
> +	} while (err == -EAGAIN);
> +
> +	if (err)
> +		return err;
> +
> +	if (index_to_minor(index) >= 1 << MINORBITS) {
> +		err = -ENOSPC;
> +		goto out_free_index;
> +	}
>
>  	/* We need to know how many segments before we allocate. */
>  	err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
> @@ -421,7 +436,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>  	vblk->disk->private_data = vblk;
>  	vblk->disk->fops = &virtblk_fops;
>  	vblk->disk->driverfs_dev = &vdev->dev;
> -	index++;
> +	vblk->index = index;
>
>  	/* configure queue flush support */
>  	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
> @@ -516,6 +531,8 @@ out_free_vq:
>  	vdev->config->del_vqs(vdev);
> out_free_vblk:
>  	kfree(vblk);
> +out_free_index:
> +	ida_remove(&vd_index_ida, index);
> out:
>  	return err;
>  }
> @@ -538,6 +555,7 @@ static void __devexit virtblk_remove(struct
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On 2011-10-24 12:02, Michael S. Tsirkin wrote: On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote: On Thu, Jun 09, 2011 at 06:41:56AM -0400, Mark Wu wrote: On 06/09/2011 05:14 AM, Tejun Heo wrote: Hello, On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote: On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote: Hi Rusty, Yes, I can't figure out an instance of disk probing in parallel either, but as per the following commit, I think we still need use lock for safety. What's your opinion? commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6 Author: Tejun Heo t...@kernel.org Date: Sat Feb 21 11:04:45 2009 +0900 [SCSI] sd: revive sd_index_lock Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to use ida instead of idr incorrectly removed sd_index_lock around id allocation and free. idr/ida do have internal locks but they protect their free object lists not the allocation itself. The caller is responsible for that. This missing synchronization led to the same id being assigned to multiple devices leading to oops. I'm confused. Tejun, Greg, anyone can probes happen in parallel? If so, I'll have to review all my drivers. Unless async is explicitly used, probe happens sequentially. IOW, if there's no async_schedule() call, things won't happen in parallel. That said, I think it wouldn't be such a bad idea to protect ida with spinlock regardless unless the probe code explicitly requires serialization. Thanks. Since virtio blk driver doesn't use async probe, it needn't use spinlock to protect ida. So remove the lock from patch. From fbb396df9dbf8023f1b268be01b43529a3993d57 Mon Sep 17 00:00:00 2001 From: Mark Wu d...@redhat.com Date: Thu, 9 Jun 2011 06:34:07 -0400 Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index Current index allocation in virtio-blk is based on a monotonically increasing variable index. It could cause some confusion about disk name in the case of hot-plugging disks. 
And it's impossible to find the lowest available index by just maintaining a simple index. So it's changed to use ida to allocate index via referring to the index allocation in scsi disk. Signed-off-by: Mark Wu d...@redhat.com Acked-by: Michael S. Tsirkin m...@redhat.com This got lost in the noise and missed 3.1 which is unfortunate. How about we apply this as is and look at cleanups as a next step? Rusty, any opinion on merging this for 3.2? I expect merge window will open right after the summit, I can toss it into for-3.2/drivers, if there's consensus to do that now. -- Jens Axboe -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
On 10/23/2011 09:05 PM, Raghavendra K T wrote: Add two hypercalls to KVM hypervisor to support pv-ticketlocks. KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or it is woken up because of an event like interrupt. KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu. The presence of these hypercalls is indicated to guest via KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK. Qemu needs a corresponding patch to pass up the presence of this feature to guest via cpuid. Patch to qemu will be sent separately. There is no Xen/KVM hypercall interface to await kick from. +/* + * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU + * hypercall or a event like interrupt. + * + * @vcpu : vcpu which is blocking. + */ +static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu) +{ + DEFINE_WAIT(wait); + + /* + * Blocking on vcpu-wq allows us to wake up sooner if required to + * service pending events (like interrupts). + * + * Also set state to TASK_INTERRUPTIBLE before checking vcpu-kicked to + * avoid racing with kvm_pv_kick_cpu_op(). + */ + prepare_to_wait(vcpu-wq, wait, TASK_INTERRUPTIBLE); + + /* + * Somebody has already tried kicking us. Acknowledge that + * and terminate the wait. + */ + if (vcpu-kicked) { + vcpu-kicked = 0; + goto end_wait; + } + + /* Let's wait for either KVM_HC_KICK_CPU or someother event + * to wake us up. + */ + + srcu_read_unlock(vcpu-kvm-srcu, vcpu-srcu_idx); + schedule(); + vcpu-srcu_idx = srcu_read_lock(vcpu-kvm-srcu); + +end_wait: + finish_wait(vcpu-wq, wait); +} This hypercall can be replaced by a HLT instruction, no? I'm pretty sure this misses a lot of stuff from kvm_vcpu_block(). + +/* + * kvm_pv_kick_cpu_op: Kick a vcpu. + * + * @cpu - vcpu to be kicked. + */ +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu) +{ + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu); + Is the vcpu number meaningful? We should reuse an existing identifier like the APIC ID. 
+ if (vcpu) { + vcpu->kicked = 1; Need to use smp memory barriers here. + wake_up_interruptible(&vcpu->wq); + } +} + int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
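The "need smp memory barriers" comment above is about ordering between the kicker's write of vcpu->kicked and the waiter's read of it from another CPU. As an illustration only (not the kernel code), here is a userspace Python analogue of the same wait/kick protocol, where a condition variable's lock provides the ordering that prepare_to_wait() plus SMP barriers provide in the kernel:

```python
import threading

class Vcpu:
    """Userspace analogue of the kicked-flag protocol (illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.kicked = False

    def wait_for_kick(self, timeout=None):
        # The flag check and the sleep happen under one lock, so a kick
        # arriving just before we sleep cannot be lost -- the role played
        # by prepare_to_wait()/TASK_INTERRUPTIBLE in the patch.
        with self._cond:
            if self.kicked:          # somebody already tried kicking us
                self.kicked = False
                return True
            woken = self._cond.wait(timeout)
            self.kicked = False
            return woken

    def kick(self):
        # Set the flag, then wake: mirrors kvm_pv_kick_cpu_op().
        with self._cond:
            self.kicked = True
            self._cond.notify()
```

The race the patch's comment warns about is exactly a kick landing between the flag check and the sleep; holding the lock across both closes it.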
Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests
On 10/23/2011 09:07 PM, Raghavendra K T wrote: Added configuration support to enable debug information for KVM Guests in debugfs Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Signed-off-by: Suzuki Poulose suz...@in.ibm.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1f03f82..ed34269 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -562,6 +562,15 @@ config KVM_GUEST This option enables various optimizations for running under the KVM hypervisor. +config KVM_DEBUG_FS + bool "Enable debug information for KVM Guests in debugfs" + depends on KVM_GUEST + default n + ---help--- + This option enables collection of various statistics for KVM guests. + Statistics are displayed in the debugfs filesystem. Enabling this option + may incur significant overhead. + source "arch/x86/lguest/Kconfig" This might be better implemented through tracepoints, which can be enabled dynamically. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
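For a sense of what the consumer side of such an option looks like: statistics in debugfs are flat files containing counters, one per file. A small reader sketch in Python -- the /sys/kernel/debug/kvm default path is the existing host-side KVM stats directory and is only an assumption here; the guest-side directory this patch adds may be named differently:

```python
import os

def read_debugfs_stats(path="/sys/kernel/debug/kvm"):
    """Read every flat file under a debugfs stats directory into a dict.
    The default path is an assumption; debugfs must be mounted and
    readable (usually root only)."""
    stats = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full) as f:
                stats[name] = int(f.read().strip())
    return stats
```

The same helper works against any directory laid out this way, which also makes it easy to test without debugfs.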
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2011-10-24 11:45, Avi Kivity wrote: On 10/21/2011 11:19 AM, Jan Kiszka wrote: Currently, MSI messages can only be injected to in-kernel irqchips by defining a corresponding IRQ route for each message. This is not only unhandy if the MSI messages are generated on the fly by user space; IRQ routes are a limited resource that user space has to manage carefully. By itself, this does not provide enough value to offset the cost of a new ABI, especially as userspace will need to continue supporting the old method for a very long while. Yes, but less sophisticated than it would be now. By providing direct injection, we can both avoid using up limited resources and simplify the necessary steps for user land. The API already provides a channel (flags) to revoke an injected but not yet delivered message, which will become important for in-kernel MSI-X vector masking support. With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: with the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
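For context, the MSI message being discussed is just an (address, data) pair; whichever API carries it, the fields the irqchip needs fall out with a few shifts. A decoding sketch following the x86 MSI layout (destination ID in address bits 19:12, vector in data bits 7:0):

```python
MSI_ADDR_BASE = 0xFEE00000  # fixed upper bits of an x86 MSI address

def decode_msi(addr, data):
    """Split an x86 MSI (address, data) pair into its routing fields."""
    assert (addr & 0xFFF00000) == MSI_ADDR_BASE, "not an x86 MSI address"
    return {
        "dest_id": (addr >> 12) & 0xFF,     # APIC destination ID
        "redir_hint": (addr >> 3) & 1,
        "dest_mode": (addr >> 2) & 1,       # 0 = physical, 1 = logical
        "vector": data & 0xFF,
        "delivery_mode": (data >> 8) & 0x7, # 0 = fixed, 1 = lowest prio, ...
    }
```

A direct-injection ioctl would essentially carry these two words per message instead of a pre-registered route index.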
qemu-kvm guest which won't 'cont' (emulation failure?)
I have a qemu-kvm guest (apparently a Ubuntu 11.04 x86-64 install) which has stopped and refuses to continue: (qemu) info status VM status: paused (qemu) cont (qemu) info status VM status: paused The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on 24-core Opteron 6176 box, and has nine other 2GB production guests on it running absolutely fine. It's been a while since I've seen one of these. When I last saw a cluster of them, they were emulation failures (big real mode instructions, maybe?). I also remember a message about abnormal exit in the dmesg previously, but I don't have that here. This time, there is no host kernel output at all, just the paused guest. I have qemu monitor access and can even strace the relevant qemu process if necessary: is it possible to use this to diagnose what's caused this guest to stop, e.g. the unsupported instruction if it's an emulation failure? Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Am 24.10.2011 12:00, schrieb Chris Webb: I have a qemu-kvm guest (apparently a Ubuntu 11.04 x86-64 install) which has stopped and refuses to continue: (qemu) info status VM status: paused (qemu) cont (qemu) info status VM status: paused The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on 24-core Opteron 6176 box, and has nine other 2GB production guests on it running absolutely fine. It's been a while since I've seen one of these. When I last saw a cluster of them, they were emulation failures (big real mode instructions, maybe?). I also remember a message about abnormal exit in the dmesg previously, but I don't have that here. This time, there is no host kernel output at all, just the paused guest. I have qemu monitor access and can even strace the relevant qemu process if necessary: is it possible to use this to diagnose what's caused this guest to stop, e.g. the unsupported instruction if it's an emulation failure? Another common cause for stopped VMs are I/O errors, for example writes to a sparse image when the disk is full. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Kevin Wolf kw...@redhat.com writes: Am 24.10.2011 12:00, schrieb Chris Webb: I have qemu monitor access and can even strace the relevant qemu process if necessary: is it possible to use this to diagnose what's caused this guest to stop, e.g. the unsupported instruction if it's an emulation failure? Another common cause for stopped VMs are I/O errors, for example writes to a sparse image when the disk is full. This guest is backed by LVM LVs so I don't think they can return ENOSPC, but I could imagine read errors, so I've just done a trivial test to make sure I can read them end-to-end: 0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M 3136+0 records in 3136+0 records out 3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s 0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M 276+0 records in 276+0 records out 289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s Is there any way to ask qemu why a guest has stopped, so I can distinguish IO problems from emulation problems from anything else? Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
On 10/24/2011 03:31 PM, Sasha Levin wrote: On Mon, 2011-10-24 at 00:35 +0530, Raghavendra K T wrote: Add two hypercalls to KVM hypervisor to support pv-ticketlocks. +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu) +{ + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu); + + if (vcpu) { + vcpu->kicked = 1; I'm not sure about it, but maybe we want a memory barrier over here? Yes, thanks for pointing this out. Avi Kivity also pointed out the same. I'll add a barrier() here. + wake_up_interruptible(&vcpu->wq); + } +} -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM call agenda for October 25
Hi Please send in any agenda items you are interested in covering. Thanks, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] kvm: use this_cpu_xxx replace percpu_xxx funcs
On 10/24/2011 04:50 AM, Alex,Shi wrote: On Thu, 2011-10-20 at 15:34 +0800, Alex,Shi wrote: percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them for further code clean up. And in a preempt-safe scenario, the __this_cpu_xxx funcs have a bit better performance since __this_cpu_xxx has no redundant preempt_disable(). Avi: Would you like to give some comments on this? Sorry, was travelling: Acked-by: Avi Kivity a...@redhat.com -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Nested Virtualization Of Hyper-V 2K8R2
Hi Anyone got any further ideas on how I get the Hyper-V guest to work ? My kvm is 0.14 (Ubuntu 11.04 Server) - is this just too old ? Jim On 19/10/2011 16:07, Jim wrote: On 19/10/2011 16:06, Jim wrote: Hi Joerg, I added the -cpu phenom,-hv but it made no difference. I then tried to call it from the command line (rather then via virsh) and get this : # /usr/bin/kvm -cpu phenom,-hv *CPU feature hv not found* I played around a little and found 'svm' seemed to be a supported cpu flag but both +svm and -svm made no difference either. Alas kvm -cpu ? only listed CPUs and not the options the various ones support. Am I on too low a version of kvm perhaps ? This is an Ubuntu 11.04 server system and I've just used the Ubuntu packages - I did not build kvm myself. Thanks Jim My CPU reports as : *processor: 0-3 i.e. 4 cores* vendor_id: AuthenticAMD cpu family: 16 model: 2 model name: Quad-Core AMD Opteron(tm) Processor 1354 stepping: 3 cpu MHz: 1100.000 cache size: 512 KB physical id: 0 siblings: 4 core id: 3 cpu cores: 4 apicid: 3 initial apicid: 3 fpu: yes fpu_exception: yes cpuid level: 5 wp: yes flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock bogomips: 4400.04 TLB size: 1024 4K pages clflush size: 64 cache_alignment: 64 address sizes: 48 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate On 19/10/2011 15:19, Joerg Roedel wrote: Hi Jim, On Tue, Oct 18, 2011 at 07:28:52PM +0100, Jim wrote: Sure, the KVM command is : /usr/bin/kvm -enable-nesting -no-kvm-irqchip -S -M pc-0.14 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -name hyperv1 -uuid 8c5d8f1f-5767-b388-d408-1b53a1b66e72 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/hyperv1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=localtime -no-reboot -boot d -drive file=/srv/hyperv/hyperv1.vmimg,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/srv/virtual-machines/fromiscsi/iso/W2K8ENTR2SP1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/srv/virtual-machines/fromiscsi/iso/virtio-win-1.1.16.iso,if=none,media=cdrom,id=drive-ide0-1-1,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev tap,fd=17,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:2a:be:2f,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga std -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 This is missing a -cpu parameter. Please try again with adding '-cpu phenom,-hv'. This is the combination I used during testing and development. Joerg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Am 24.10.2011 12:58, schrieb Chris Webb: Kevin Wolf kw...@redhat.com writes: Am 24.10.2011 12:00, schrieb Chris Webb: I have qemu monitor access and can even strace the relevant qemu process if necessary: is it possible to use this to diagnose what's caused this guest to stop, e.g. the unsupported instruction if it's an emulation failure? Another common cause for stopped VMs are I/O errors, for example writes to a sparse image when the disk is full. This guest are backed by LVM LVs so I don't think they can return EFULL, but I could imagine read errors, so I've just done a trivial test to make sure I can read them end-to-end: 0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M 3136+0 records in 3136+0 records out 3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s 0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M 276+0 records in 276+0 records out 289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s Is there any way to ask qemu why a guest has stopped, so I can distinguish IO problems from emulation problems from anything else? In qemu 1.0 we'll have an extended 'info status' that includes the stop reason, but 0.14 doesn't have this yet (was committed to git master only recently). If you attach a QMP monitor (see QMP/README, don't forget to send the capabilities command, it's part of creating the connection) you will receive messages for I/O errors, though. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
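A minimal sketch of what Kevin suggests, in Python: connect to the QMP socket, send qmp_capabilities, then read JSON lines -- stop reasons such as I/O errors arrive as asynchronous event messages. The socket path is whatever was given to qemu's -qmp unix:...,server option; error handling is omitted and this is not a complete client:

```python
import json
import socket

def qmp_command(name, **args):
    """Serialize a QMP command as one JSON line."""
    cmd = {"execute": name}
    if args:
        cmd["arguments"] = args
    return json.dumps(cmd)

def qmp_connect(path):
    """Connect to a QMP unix socket and negotiate capabilities.
    Returns a file object; subsequent readline() calls yield command
    replies and asynchronous events (e.g. I/O error notifications)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    f = sock.makefile("rw")
    json.loads(f.readline())                 # greeting banner
    f.write(qmp_command("qmp_capabilities") + "\n")
    f.flush()
    json.loads(f.readline())                 # expect {"return": {}}
    return f
```

Skipping the qmp_capabilities step is the classic mistake -- the monitor refuses all other commands until the handshake is done.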
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
On 10/24/2011 03:44 PM, Avi Kivity wrote: On 10/23/2011 09:05 PM, Raghavendra K T wrote: Add two hypercalls to KVM hypervisor to support pv-ticketlocks. + +end_wait: + finish_wait(&vcpu->wq, &wait); +} This hypercall can be replaced by a HLT instruction, no? I'm pretty sure this misses a lot of stuff from kvm_vcpu_block(). Yes, agreed. HLT sounds like a better idea. I'll try this out. + if (vcpu) { + vcpu->kicked = 1; Need to use smp memory barriers here. Agree. + wake_up_interruptible(&vcpu->wq); + } +} + int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Kevin Wolf kw...@redhat.com writes: In qemu 1.0 we'll have an extended 'info status' that includes the stop reason, but 0.14 doesn't have this yet (was committed to git master only recently). Right, okay. I might take a look at cherry-picking and back-porting that to our version of qemu-kvm if it's not too entangled with other changes. It would be very useful in these situations. If you attach a QMP monitor (see QMP/README, don't forget to send the capabilities command, it's part of creating the connection) you will receive messages for I/O errors, though. Thanks. I don't think I can do this with an already-running qemu-kvm that's in a stopped state, can I? Only with a new qemu-kvm invocation, and then waiting to try to catch the problem again. Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Am 24.10.2011 13:29, schrieb Chris Webb: Kevin Wolf kw...@redhat.com writes: In qemu 1.0 we'll have an extended 'info status' that includes the stop reason, but 0.14 doesn't have this yet (was committed to git master only recently). Right, okay. I might take a look at cherry-picking and back-porting that to our version of qemu-kvm if it's not too entangled with other changes. It would be very useful in these situations. I'm afraid that it depends on many other changes, but you can try. If you attach a QMP monitor (see QMP/README, don't forget to send the capabilities command, it's part of creating the connection) you will receive messages for I/O errors, though. Thanks. I don't think I can do this with an already-running qemu-kvm that's in a stopped state can I, only with a new qemu-kvm invocation and wait to try to catch the problem again? Good point... The only other thing that I can think of would be attaching gdb and setting a breakpoint in vm_stop() or something. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for October 25
On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote: On 10/24/2011 01:04 PM, Juan Quintela wrote: Please send in any agenda items you are interested in covering. - What's left to merge for 1.0. Things on my list, FWIW: * current target-arm pullreq * PL041 support (needs another patch round to fix a minor bug Andrzej spotted) * cpu_single_env must be thread-local I also think that it's somewhat unfortunate that we now will compile on ARM hosts so that we always abort on startup (due to the reliance on a working makecontext()) but I'm not really sure how to deal with that one. -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Kevin Wolf kw...@redhat.com writes: Good point... The only other thing that I can think of would be attaching gdb and setting a breakpoint in vm_stop() or something. Perfect, that seems to have identified what's going on very nicely: (gdb) break vm_stop Breakpoint 1 at 0x407d10: file /home/root/packages/qemu-kvm/src-UMBurO/cpus.c, line 318. (gdb) fg Continuing. Breakpoint 1, vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318 318 /home/root/packages/qemu-kvm/src-UMBurO/cpus.c: No such file or directory. in /home/root/packages/qemu-kvm/src-UMBurO/cpus.c (gdb) bt #0 vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318 #1 0x0058585f in ide_handle_rw_error (s=0x20330d8, error=28, op=8) at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:468 #2 0x00588376 in ide_dma_cb (opaque=0x20330d8, ret=<value optimized out>) at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:494 #3 0x00590092 in dma_bdrv_cb (opaque=0x2043a10, ret=-28) at /home/root/packages/qemu-kvm/src-UMBurO/dma-helpers.c:94 #4 0x0044d64a in qcow2_aio_write_cb (opaque=0x2034900, ret=-28) at block/qcow2.c:714 #5 0x0043df6d in posix_aio_process_queue (opaque=<value optimized out>) at posix-aio-compat.c:462 #6 0x0043e07d in posix_aio_read (opaque=0x17c8110) at posix-aio-compat.c:503 #7 0x00415fca in main_loop_wait (nonblocking=<value optimized out>) at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1383 #8 0x0042ca37 in kvm_main_loop () at /home/root/packages/qemu-kvm/src-UMBurO/qemu-kvm.c:1589 #9 0x004170a3 in main (argc=32, argv=<value optimized out>, envp=<value optimized out>) at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1429 I see what's happened here: we're not explicitly setting format=raw when we start that guest and someone's uploaded a qcow2 image directly to a block device. Ouch. Sorry for the noise! Best wishes, Chris. 
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
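The root cause here -- a qcow2 image sitting on a device the management layer assumed was raw -- can be caught up front: qcow2 files begin with the 4-byte magic 'QFI\xfb' followed by a big-endian version word. A probing sketch (and the reason to always pass format=... explicitly: a guest can write this very magic into its own raw disk):

```python
import struct

QCOW_MAGIC = b"QFI\xfb"

def probe_image_format(path):
    """Best-effort probe: 'qcow2 (vN)' if the qcow2 magic is present
    at offset 0, otherwise 'raw'."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) == 8 and header[:4] == QCOW_MAGIC:
        version = struct.unpack(">I", header[4:])[0]
        return "qcow2 (v%d)" % version
    return "raw"
```

Running this against the LV before boot would have flagged the mismatch that only surfaced here as an ENOSPC-driven vm_stop().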
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, and specifically vfio. 
-- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
* Avi Kivity a...@redhat.com [2011-10-24 12:14:21]: +/* + * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU + * hypercall or an event like an interrupt. + * + * @vcpu : vcpu which is blocking. + */ +static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu) +{ [snip] +} This hypercall can be replaced by a HLT instruction, no? Good point. Assuming yield_on_hlt=1, that would allow the vcpu to be put to sleep and let other vcpus make progress. I guess with that change, we can also drop the need for the other hypercall introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping because of a HLT instruction can be woken up by an IPI issued by the vcpu releasing a lock. - vatsa -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. 
Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[kvm-autotest]client.tests.kvm.tests.cgroup: Add TestDeviceAccess subtest
Hi guys, I have a new subtest which tests the 'devices' cgroup subsystem and improves the logging a bit. Please find the pull request on github: https://github.com/autotest/autotest/pull/48 Cheers, Lukáš -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [kvm-autotest]client.tests.kvm.tests.cgroup: Add TestDeviceAccess subtest
This subtest tries to attach a scsi_debug disk with different cgroup devices.list settings. * subtests the devices.{allow, deny, list} cgroup functionality * new function get_maj_min(dev) which returns the (major, minor) numbers of dev * rm_drive: support for rm_device without a drive (only remove the host file) * improved logging Signed-off-by: Lukas Doktor ldok...@redhat.com --- client/tests/kvm/tests/cgroup.py | 234 +- 1 files changed, 203 insertions(+), 31 deletions(-) diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py index 6c64532..c83f91a 100644 --- a/client/tests/kvm/tests/cgroup.py +++ b/client/tests/kvm/tests/cgroup.py @@ -50,7 +50,7 @@ def run_cgroup(test, params, env): return abs(float(actual-reference) / reference) -def get_dd_cmd(direction, dev='vd?', count=None, bs=None): +def get_dd_cmd(direction, dev=None, count=None, bs=None): Generates dd_cmd string @param direction: {read,write,bi} dd direction @@ -59,6 +59,11 @@ def run_cgroup(test, params, env): @param bs: bs parameter of dd @return: dd command string +if dev is None: +if get_device_driver() == "virtio": +dev = 'vd?' +else: +dev = '[sh]d?'
if direction == "read": params = "if=$FILE of=/dev/null iflag=direct" elif direction == "write": @@ -82,6 +87,21 @@ def run_cgroup(test, params, env): return params.get('drive_format', 'virtio') + + def get_maj_min(dev): + +Returns the major and minor numbers of the dev device +@return: Tuple of (major, minor) numbers of the dev device + +try: +ret = utils.system_output("ls -l %s" % dev) +ret = re.match(r'[bc][rwx-]{9} \d+ \w+ \w+ (\d+), (\d+)', + ret).groups() +except Exception, details: +raise error.TestFail("Couldn't get %s maj and min numbers: %s" % + (dev, details)) +return ret + + def add_file_drive(vm, driver=get_device_driver(), host_file=None): Hot-add a drive based on file to a vm @@ -173,14 +193,17 @@ def run_cgroup(test, params, env): err = False # TODO: Implement also via QMP -vm.monitor.cmd("pci_del %s" % device) -time.sleep(3) -qtree = vm.monitor.info('qtree', debug=False) -if qtree.count('addr %s.0' % device) != 0: -err = True -vm.destroy() - -if isinstance(host_file, str):# scsi device +if device: +vm.monitor.cmd("pci_del %s" % device) +time.sleep(3) +qtree = vm.monitor.info('qtree', debug=False) +if qtree.count('addr %s.0' % device) != 0: +err = True +vm.destroy() + +if host_file is None: # Do not remove +pass +elif isinstance(host_file, str):# scsi device utils.system("echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host") else: # file host_file.close() @@ -334,7 +357,7 @@ def run_cgroup(test, params, env): _TestBlkioBandwidth.__init__(self, vms, modules) # Read from the last vd* in a loop until test removes the # /tmp/cgroup_lock file (and kills us) -self.dd_cmd = get_dd_cmd("read", bs="100K") +self.dd_cmd = get_dd_cmd("read", dev='vd?', bs="100K") class TestBlkioBandwidthWeigthWrite(_TestBlkioBandwidth): @@ -350,7 +373,7 @@ def run_cgroup(test, params, env): # Write on the last vd* in a loop until test removes the # /tmp/cgroup_lock file (and kills us) _TestBlkioBandwidth.__init__(self, vms, modules) -self.dd_cmd = get_dd_cmd("write", 
dev='vd?', bs="100K") class _TestBlkioThrottle: @@ -376,10 +399,6 @@ def run_cgroup(test, params, env): self.devices = None # Temporary virt devices (PCI drive 1 per vm) self.dd_cmd = None # DD command used to test the throughput self.speeds = None # cgroup throughput -if get_device_driver() == "virtio": -self.dev = "vd?" -else: -self.dev = "[sh]d?" def cleanup(self): @@ -417,13 +436,8 @@ def run_cgroup(test, params, env): driver="virtio") else: (self.files, self.devices) = add_scsi_drive(self.vm) -try: -dev = utils.system_output("ls -l %s" % self.files).split()[4:6] -dev[0] = dev[0][:-1]# Remove tailing ',' -except: -time.sleep(5) -raise error.TestFail("Couldn't get %s maj and min numbers - %
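An aside on the get_maj_min() helper in this patch: the major/minor numbers are available directly from stat(2), which avoids shelling out to ls and regex-parsing its locale-dependent output. An alternative sketch (a drop-in under the assumption that dev is a device-node path):

```python
import os
import stat

def get_maj_min(dev):
    """Return (major, minor) of a block or char device node.
    Alternative to parsing `ls -l` output with a regex."""
    st = os.stat(dev)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError("%s is not a device node" % dev)
    return os.major(st.st_rdev), os.minor(st.st_rdev)
```

This also fails loudly on non-device paths instead of producing a confusing regex mismatch.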
Re: [Qemu-devel] KVM call agenda for October 25
On 24.10.2011 14:02, Peter Maydell wrote: On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote: On 10/24/2011 01:04 PM, Juan Quintela wrote: Please send in any agenda items you are interested in covering. - What's left to merge for 1.0. I also think that it's somewhat unfortunate that we will now compile on ARM hosts but always abort on startup (due to the reliance on a working makecontext()), and I'm not really sure how to deal with that one. FWIW we're also not working / not building on Darwin ppc+Intel, which is related to a) softfloat integer types, b) GThread initialization, c) unknown issues. Bisecting did not work well and I am lacking time and ideas to investigate and fix this. For softfloat there are several solutions around, in need of a decision. Nice to merge would be the Cocoa sheet issue, once verified. Andreas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
On 10/24/2011 02:27 PM, Srivatsa Vaddagiri wrote: Good point. Assuming yield_on_hlt=1, that would allow the vcpu to be put to sleep and let other vcpus make progress. I guess with that change, we can also drop the need for the other hypercall introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping because of a HLT instruction can be woken up by an IPI issued by the vcpu releasing a lock. Not if interrupts are disabled. My original plan was to use NMIs for wakeups, but it turns out NMIs can be coalesced under certain rare circumstances; this requires workarounds by the generic NMI code that make NMIs too slow. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
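As a loose user-space analogy of the halt/kick protocol discussed above (all names below are invented for illustration; real KVM blocks the vcpu in the host and wakes it via the proposed kick hypercall or an IPI):

```python
import threading

class HaltKickWaiter:
    """Toy model of a pv-ticketlock waiter that halts until kicked."""

    def __init__(self):
        self._kicked = threading.Event()

    def halt(self, timeout=None):
        # Block (the "HLT" analogue) until another thread kicks us;
        # returns True if the kick arrived within the timeout.
        return self._kicked.wait(timeout)

    def kick(self):
        # Wake the halted waiter (the "kick hypercall" analogue).
        self._kicked.set()
```

The lock releaser calls kick() on the waiter it hands the ticket to; a waiter that spun too long calls halt() and sleeps until then.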
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2011-10-24 14:43, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. How should QEMU know if the reason for pending has been cleared at device level if the device is outside the scope of QEMU? This model only works for PV devices when you agree that spurious IRQs are OK. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. 
E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. irqfd has no notion of level. You can only communicate a rising edge and then need a side channel for the state of the edge reason. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? Ideally: not at all (for MSI). If we manage the mask at device level, we only need to send the message if there is actually something to deliver to the interrupt controller and masked input events would be lost on real HW as well. That said, we still need to address the irqfd level topic for the finite amount of legacy interrupt lines. If a line is masked at an IRQ controller, the device needs to keep the controller up to date w.r.t. the line state, or the controller has to poll the current state on unmask to avoid spurious injections. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2011-10-24 15:11, Jan Kiszka wrote: On 2011-10-24 14:43, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. How should QEMU know if the reason for pending has been cleared at device level if the device is outside the scope of QEMU? This model only works for PV devices when you agree that spurious IRQs are OK. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. 
We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. irqfd has no notion of level. You can only communicate a rising edge and then need a side channel for the state of the edge reason. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? Ideally: not at all (for MSI). If we manage the mask at device level, we only need to send the message if there is actually something to deliver to the interrupt controller and masked input events would be lost on real HW as well. This wouldn't work out nicely either. We rather need a combined model: Devices need to maintain the PBA actively, i.e. set/clear bits themselves and do not rely on the core here (with the core being either QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only checks the PBA if it is about to deliver some message and refrains from doing so if the bit became 0 in the meantime (specifically during the masked period). For QEMU device models, that means no additional IOCTLs, just memory sharing of the PBA which is required anyway. But that means QEMU-external device models need to gain at least basic MSI-X knowledge.
Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock
* Avi Kivity a...@redhat.com [2011-10-24 15:09:25]: I guess with that change, we can also drop the need for the other hypercall introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping because of a HLT instruction can be woken up by an IPI issued by the vcpu releasing a lock. Not if interrupts are disabled. Hmm, yes... so we need a kick hypercall then. - vatsa
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Mon, Oct 24, 2011 at 03:11:25PM +0200, Jan Kiszka wrote: On 2011-10-24 14:43, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. How should QEMU know if the reason for pending has been cleared at device level if the device is outside the scope of QEMU? This model only works for PV devices when you agree that spurious IRQs are OK. A read of irq status clears pending in the same way it clears the irq line for level. I don't think this generates spurious irqs. Yes, it only works for PV. For assigned devices, the only way I see to implement PBA correctly is by masking the vector in the device and looking at the actual pending bit. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source.
In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. irqfd has no notion of level. You can only communicate a rising edge and then need a side channel for the state of the edge reason. True. But we only need that for PBA read which is unused ATM. So kvm can just send the read to userspace, have qemu query vfio or whatever. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? Ideally: not at all (for MSI). If we manage the mask at device level, we only need to send the message if there is actually something to deliver to the interrupt controller and masked input events would be lost on real HW as well. Not sure I understand. we certainly shouldn't send masked interrupts to the APIC if for no other reason that the message value is invalid while masked. That said, we still need to address the irqfd level topic for the finite amount of legacy interrupt lines. If a line is masked at an IRQ controller, the device need to keep the controller up to date /wrt to the line state, or the controller has to poll the current state on unmask to avoid spurious injections. Jan Yes, level interrupts are tricky. 
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote: On 2011-10-24 15:11, Jan Kiszka wrote: On 2011-10-24 14:43, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. How should QEMU know if the reason for pending has been cleared at device level if the device is outside the scope of QEMU? This model only works for PV devices when you agree that spurious IRQs are OK. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. 
We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. irqfd has no notion of level. You can only communicate a rising edge and then need a side channel for the state of the edge reason. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? Ideally: not at all (for MSI). If we manage the mask at device level, we only need to send the message if there is actually something to deliver to the interrupt controller and masked input events would be lost on real HW as well. This wouldn't work out nicely as well. We rather need a combined model: Devices need to maintain the PBA actively, i.e. set/clear bits themselves and do not rely on the core here (with the core being either QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only checks the PBA if it is about to deliver some message and refrains from doing so if the bit became 0 in the meantime (specifically during the masked period). For QEMU device models, that means no additional IOCTLs, just memory sharing of the PBA which is required anyway. Sorry, I don't understand the above two paragraphs. Maybe I am confused by terminology here. We really only need to check PBA when it's read. Whether the message is delivered only depends on the mask bit. But that means QEMU-external device models need to gain at least basic MSI-X knowledge.
And if they gain this awareness, they could also use it to send full-blown messages directly (e.g. device-id/vector tuples) instead of encoding them into finite GSI numbers. But that's an add-on topic. Moreover, we still need a corresponding side channel for line-based interrupts. Jan Agree on all points with the above.
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2011-10-24 16:40, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote: On 2011-10-24 15:11, Jan Kiszka wrote: On 2011-10-24 14:43, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. How should QEMU know if the reason for pending has been cleared at device level if the device is outside the scope of QEMU? This model only works for PV devices when you agree that spurious IRQs are OK. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. 
We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. irqfd has no notion of level. You can only communicate a rising edge and then need a side channel for the state of the edge reason. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? Ideally: not at all (for MSI). If we manage the mask at device level, we only need to send the message if there is actually something to deliver to the interrupt controller and masked input events would be lost on real HW as well. This wouldn't work out nicely as well. We rather need a combined model: Devices need to maintain the PBA actively, i.e. set/clear bits themselves and do not rely on the core here (with the core being either QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only checks the PBA if it is about to deliver some message and refrains from doing so if the bit became 0 in the meantime (specifically during the masked period). For QEMU device models, that means no additional IOCTLs, just memory sharing of the PBA which is required anyway. Sorry, I don't understand the above two paragraphs. Maybe I am confused by terminology here. We really only need to check PBA when it's read. Whether the message is delivered only depends on the mask bit.
This is what I have in mind:
- devices set PBA bit if MSI message cannot be sent due to mask (*)
- core checks/clears PBA bit on unmask, injects message if bit was set
- devices clear PBA bit if message reason is resolved before unmask (*)
The marked (*) lines differ from the current user space model where only the core does PBA manipulation (including clearance via a special function). Basically, the PBA becomes a communication channel also between device and MSI core. And this model also works if core and device run in different processes provided they set up the PBA as shared memory. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
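The three-step handshake above can be sketched as a toy Python model (the class and field names are invented for illustration; this only mirrors the proposed semantics, not any actual KVM/QEMU code):

```python
class MsixVectorModel:
    """Toy model of one MSI-X vector with a mask bit and a PBA bit."""

    def __init__(self):
        self.masked = False
        self.pending = False      # the vector's PBA bit
        self.delivered = 0        # messages that reached the "APIC"

    def raise_irq(self):
        # Device side: deliver if unmasked, otherwise set the PBA bit (*).
        if self.masked:
            self.pending = True
        else:
            self.delivered += 1

    def clear_pending(self):
        # Device side: retract the event if its reason is resolved
        # before unmask (*).
        self.pending = False

    def mask(self):
        self.masked = True

    def unmask(self):
        # Core side: check/clear the PBA bit and inject if it was set.
        self.masked = False
        if self.pending:
            self.pending = False
            self.delivered += 1
```

E.g. mask(); raise_irq(); unmask() delivers one message, while mask(); raise_irq(); clear_pending(); unmask() delivers none, which is the retraction case the (*) lines add.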
Re: [Qemu-devel] KVM call agenda for October 25
On Mon, 24 Oct 2011 13:02:05 +0100 Peter Maydell peter.mayd...@linaro.org wrote: On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote: On 10/24/2011 01:04 PM, Juan Quintela wrote: Please send in any agenda items you are interested in covering. - What's left to merge for 1.0. Things on my list, FWIW: * current target-arm pullreq * PL041 support (needs another patch round to fix a minor bug Andrzej spotted) * cpu_single_env must be thread-local I submitted today the second round of QAPI conversions, which converts all existing QMP query commands to the QAPI (plus some fixes). I expect that to make 1.0.
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Mon, Oct 24, 2011 at 05:00:27PM +0200, Jan Kiszka wrote: On 2011-10-24 16:40, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote: On 2011-10-24 15:11, Jan Kiszka wrote: On 2011-10-24 14:43, Michael S. Tsirkin wrote: On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote: On 2011-10-24 13:09, Avi Kivity wrote: On 10/24/2011 12:19 PM, Jan Kiszka wrote: With the new feature it may be worthwhile, but I'd like to see the whole thing, with numbers attached. It's not a performance issue, it's a resource limitation issue: With the new API we can stop worrying about user space device models consuming limited IRQ routes of the KVM subsystem. Only if those devices are in the same process (or have access to the vmfd). Interrupt routing together with irqfd allows you to disaggregate the device model. Instead of providing a competing implementation with new limitations, we need to remove the limitations of the old implementation. That depends on where we do the cut. Currently we let the IRQ source signal an abstract edge on a pre-allocated pseudo IRQ line. But we cannot build correct MSI-X on top of the current irqfd model as we lack the level information (for PBA emulation). *) I don't agree here. IMO PBA emulation would need to clear pending bits on interrupt status register read. So clearing pending bits could be done by ioctl from qemu while setting them would be done from irqfd. How should QEMU know if the reason for pending has been cleared at device level if the device is outside the scope of QEMU? This model only works for PV devices when you agree that spurious IRQs are OK. So we either need to extend the existing model anyway -- or push per-vector masking back to the IRQ source. In the latter case, it would be a very good chance to give up on limited pseudo GSIs with static routes and do MSI messaging from external IRQ sources to KVM directly. But all those considerations affect different APIs than what I'm proposing here. 
We will always need a way to inject MSIs in the context of the VM as there will always be scenarios where devices are better run in that very same context, for performance or simplicity or whatever reasons. E.g., I could imagine that one would like to execute an emulated IRQ remapper rather in the hypervisor context than over-microkernelized in a separate process. Jan *) Realized this while trying to generalize the proposed MSI-X MMIO acceleration for assigned devices to arbitrary device models, vhost-net, I'm actually working on a qemu patch to get pba emulation working correctly. I think it's doable with existing irqfd. irqfd has no notion of level. You can only communicate a rising edge and then need a side channel for the state of the edge reason. and specifically vfio. Interesting. How would you clear the pseudo interrupt level? Ideally: not at all (for MSI). If we manage the mask at device level, we only need to send the message if there is actually something to deliver to the interrupt controller and masked input events would be lost on real HW as well. This wouldn't work out nicely as well. We rather need a combined model: Devices need to maintain the PBA actively, i.e. set/clear bits themselves and do not rely on the core here (with the core being either QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only checks the PBA if it is about to deliver some message and refrains from doing so if the bit became 0 in the meantime (specifically during the masked period). For QEMU device models, that means no additional IOCTLs, just memory sharing of the PBA which is required anyway. Sorry, I don't understand the above two paragraphs. Maybe I am confused by terminology here. We really only need to check PBA when it's read. Whether the message is delivered only depends on the mask bit.
This is what I have in mind:
- devices set PBA bit if MSI message cannot be sent due to mask (*)
- core checks/clears PBA bit on unmask, injects message if bit was set
- devices clear PBA bit if message reason is resolved before unmask (*)
OK, but practically, when exactly does the device clear PBA? The marked (*) lines differ from the current user space model where only the core does PBA manipulation (including clearance via a special function). Basically, the PBA becomes a communication channel also between device and MSI core. And this model also works if core and device run in different processes provided they set up the PBA as shared memory. Jan
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2011-10-24 18:05, Michael S. Tsirkin wrote: This is what I have in mind:
- devices set PBA bit if MSI message cannot be sent due to mask (*)
- core checks/clears PBA bit on unmask, injects message if bit was set
- devices clear PBA bit if message reason is resolved before unmask (*)
OK, but practically, when exactly does the device clear PBA? Consider a network adapter that signals messages in an RX ring: If the corresponding vector is masked while the guest empties the ring, I strongly assume that the device is supposed to take back the pending bit in that case so that there is no interrupt injection on a later vector unmask operation. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[PATCH] virt: Revert only update macaddr cache when capture dhcp ACK pkt
Revert commit d9bab5bef598b4b415d004eb62e9cd32c3243565, that changes how the macaddr cache is updated. This patch brought a lot of regressions on our internal tests, so it'll be dropped until a possibly safer version of the fix is proposed. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/virt/virt_env_process.py |   10 ++--------
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/client/virt/virt_env_process.py b/client/virt/virt_env_process.py
index a1ec07a..25285b8 100644
--- a/client/virt/virt_env_process.py
+++ b/client/virt/virt_env_process.py
@@ -403,20 +403,14 @@ def _update_address_cache(address_cache, line):
         address_cache["last_seen"] = matches[0]
     if re.search("Client.Ethernet.Address", line, re.IGNORECASE):
         matches = re.findall(r"\w*:\w*:\w*:\w*:\w*:\w*", line)
-        if matches:
-            address_cache["last_mac"] = matches[0]
-    if re.search("DHCP-Message", line, re.IGNORECASE):
-        matches = re.findall(r"ACK", line)
-        if matches and (address_cache.get("last_seen") and
-                        address_cache.get("last_mac")):
-            mac_address = address_cache.get("last_mac").lower()
+        if matches and address_cache.get("last_seen"):
+            mac_address = matches[0].lower()
             if time.time() - address_cache.get("time_%s" % mac_address, 0) > 5:
                 logging.debug("(address cache) Adding cache entry: %s ---> %s",
                               mac_address, address_cache.get("last_seen"))
                 address_cache[mac_address] = address_cache.get("last_seen")
                 address_cache["time_%s" % mac_address] = time.time()
                 del address_cache["last_seen"]
-                del address_cache["last_mac"]


 def _take_screendumps(test, params, env):
--
1.7.7
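The retained branch of the diff boils down to pulling a MAC-shaped token out of a tcpdump line and lower-casing it. A small self-contained sketch (the sample line is invented for the example, and the regex quoting is reconstructed since the archive stripped it):

```python
import re

# MAC pattern from the retained branch of the diff (reconstructed quoting).
MAC_RE = r"\w*:\w*:\w*:\w*:\w*:\w*"

def extract_mac(line):
    """Return the first MAC-like token in `line`, lower-cased, or None."""
    matches = re.findall(MAC_RE, line)
    return matches[0].lower() if matches else None

# Illustrative tcpdump-style line (made up for this example):
sample = "Client-Ethernet-Address 52:54:00:AB:CD:EF (oui Unknown)"
```

Here extract_mac(sample) yields "52:54:00:ab:cd:ef", the key under which the cache entry would be stored.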
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
> On 2011-10-24 18:05, Michael S. Tsirkin wrote:
> > This is what I have in mind:
> > - devices set PBA bit if MSI message cannot be sent due to mask (*)
> > - core checks/clears PBA bit on unmask, injects message if bit was set
> > - devices clear PBA bit if message reason is resolved before unmask (*)
>
> OK, but practically, when exactly does the device clear PBA? Consider a network adapter that signals messages in an RX ring: if the corresponding vector is masked while the guest empties the ring, I strongly assume that the device is supposed to take back the pending bit in that case, so that there is no interrupt injection on a later vector unmask operation.
>
> Jan

Do you mean virtio here? Do you expect this optimization to give a significant performance gain?
Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
> > On 2011-10-24 18:05, Michael S. Tsirkin wrote:
> > > This is what I have in mind:
> > > - devices set PBA bit if MSI message cannot be sent due to mask (*)
> > > - core checks/clears PBA bit on unmask, injects message if bit was set
> > > - devices clear PBA bit if message reason is resolved before unmask (*)
> >
> > OK, but practically, when exactly does the device clear PBA? Consider a network adapter that signals messages in an RX ring: if the corresponding vector is masked while the guest empties the ring, I strongly assume that the device is supposed to take back the pending bit in that case, so that there is no interrupt injection on a later vector unmask operation.
>
> Do you mean virtio here? Do you expect this optimization to give a significant performance gain?

It would also be challenging to implement this in a race-free manner. Clearing on interrupt status read seems straightforward.
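To make the protocol under discussion concrete, here is a small Python model of the proposed per-vector pending-bit (PBA) state machine. This is purely illustrative, distilled from the bullet points in this thread and not kernel code; the class and method names are assumptions:

```python
class MsixVector:
    """Toy model of the proposed MSI-X pending-bit (PBA) protocol:
    device sets PBA while masked, core check-and-clears on unmask,
    device may retract PBA if the interrupt reason goes away."""

    def __init__(self):
        self.masked = False
        self.pending = False      # the PBA bit
        self.delivered = []       # messages actually injected

    def device_signal(self, msg):
        # Device side: if the vector is masked, latch the pending
        # bit instead of sending the message.
        if self.masked:
            self.pending = True
        else:
            self.delivered.append(msg)

    def device_retract(self):
        # Device side: the reason for the interrupt was resolved
        # before unmask (e.g. the guest drained the RX ring), so
        # take the pending bit back.
        self.pending = False

    def guest_unmask(self, msg):
        # Core side: on unmask, clear PBA and inject if it was set.
        self.masked = False
        if self.pending:
            self.pending = False
            self.delivered.append(msg)

v = MsixVector()
v.masked = True
v.device_signal("rx")    # masked: only sets PBA
v.device_retract()       # guest emptied the ring meanwhile
v.guest_unmask("rx")     # no spurious injection on unmask
print(v.delivered)       # -> []
```

The race Jan and Michael discuss lives between `device_retract()` and `guest_unmask()`: in a real implementation those run concurrently, which is why clearing on interrupt status read is suggested as the simpler variant.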
Re: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode
On 10/23/11 10:47 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Tue, Oct 18, 2011 at 11:25:54PM -0700, Roopa Prabhu wrote:
> > The v1 version of this RFC patch was posted at http://www.spinics.net/lists/netdev/msg174245.html
> >
> > Today macvtap used in a virtualized environment does not have support to propagate MAC, VLAN and interface flags from guest to lowerdev. This means that to be able to register additional VLANs, unicast and multicast addresses, or change pkt filter flags in the guest, the lowerdev has to be put in promiscuous mode. Today the only macvlan mode that supports this is the PASSTHRU mode, and it puts the lower dev in promiscuous mode. PASSTHRU mode was added primarily for the SRIOV usecase. In PASSTHRU mode there is a 1-1 mapping between macvtap and physical NIC or VF.
> >
> > There are two problems with putting the lowerdev in promiscuous mode (ie SRIOV VF's):
> > - Some SRIOV cards don't support promiscuous mode today (a thread on the Intel driver indicates that: http://lists.openwall.net/netdev/2011/09/27/6)
> > - For the SRIOV NICs that support it, putting the lowerdev in promiscuous mode leads to additional traffic being sent up to the guest for virtio-net to filter, resulting in extra overheads.
> >
> > Both of the above problems can be solved by offloading filtering to the lowerdev hw, ie the lowerdev does not need to be in promiscuous mode as long as the guest filters are passed down to the lowerdev. This patch basically adds the infrastructure to set and get MAC and VLAN filters on an interface via rtnetlink, and adds support in macvlan and macvtap to allow set and get filter operations.
>
> Looks sane to me. Some minor comments below.

Earlier versions of this patch provided the TUNSETTXFILTER macvtap interface for setting address filtering. In response to feedback, this version introduces a netlink interface for the same.
Response to some of the questions raised during v1:

- Netlink interface: This patch provides the following netlink interface to set mac and vlan filters:

  [IFLA_RX_FILTER] = {
      [IFLA_ADDR_FILTER] = {
          [IFLA_ADDR_FILTER_FLAGS]
          [IFLA_ADDR_FILTER_UC_LIST] = {
              [IFLA_ADDR_LIST_ENTRY]
          }
          [IFLA_ADDR_FILTER_MC_LIST] = {
              [IFLA_ADDR_LIST_ENTRY]
          }
      }
      [IFLA_VLAN_FILTER] = {
          [IFLA_VLAN_BITMAP]
      }
  }

  Note: IFLA_VLAN_FILTER is a nested attribute and contains only IFLA_VLAN_BITMAP today. The idea is that IFLA_VLAN_FILTER can be extended tomorrow to use a vlan list option if some implementations prefer a list instead.

  And it provides the following rtnl_link_ops to set/get MAC/VLAN filters:

  int (*set_rx_addr_filter)(struct net_device *dev, struct nlattr *tb[]);
  int (*set_rx_vlan_filter)(struct net_device *dev, struct nlattr *tb[]);
  size_t (*get_rx_addr_filter_size)(const struct net_device *dev);
  size_t (*get_rx_vlan_filter_size)(const struct net_device *dev);
  int (*fill_rx_addr_filter)(struct sk_buff *skb, const struct net_device *dev);
  int (*fill_rx_vlan_filter)(struct sk_buff *skb, const struct net_device *dev);

  Note: The choice of rtnl_link_ops was because I saw the use case for this in virtual devices that need to do filtering in sw, like macvlan and tun. Hw devices usually have filtering in hw, with the netdev uc and mc lists to indicate active filters. But I can move from rtnl_link_ops to netdev_ops if that is the preferred way to go and if there is a need to support this interface on all kinds of interfaces. Please suggest.

- Protection against address spoofing: This patch adds filtering support only for macvtap PASSTHRU mode. PASSTHRU mode is used mainly with SRIOV VF's, and SRIOV VF's come with anti mac/vlan spoofing support (the recently added IFLA_VF_SPOOFCHK). In the 802.1Qbh case the port profile has a knob to enable/disable the anti-spoof check. Lowerdevice drivers also enforce limits on the number of address registrations allowed.
- Support for multiqueue devices: Enable filtering on individual queues (?): AFAIK, there is no netdev interface to install per-queue hw filters for a multi-queue interface. And also I don't know of any hw that provides an interface to set hw filters on a per-queue basis.

> VMDq hardware would support this, no?

Am not really sure. This patch uses netdev to pass filters to hw, and I don't see any netdev infrastructure that would support per-queue filters. Maybe Greg (CC'ed) or anyone else from Intel can answer this. Greg, Michael had brought up this question during the first version of these patches as well. Will be nice to get the VMDq requirements for propagating guest filters to hw clarified. Do you see any special VMDq nic requirement we can cover in this patch?
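For reference, the IFLA_VLAN_BITMAP attribute in the layout above is essentially a fixed-size mask with one bit per possible VLAN ID. A hypothetical Python sketch of the bookkeeping such a filter implies (the helper names are illustrative, not the patch's code):

```python
VLAN_N_VID = 4096   # 12-bit VLAN ID space

def vlan_bitmap_new():
    # One bit per possible VLAN ID, packed into 32-bit words,
    # roughly the payload a nested IFLA_VLAN_BITMAP attribute
    # might carry.
    return [0] * (VLAN_N_VID // 32)

def vlan_bitmap_set(bitmap, vid):
    # Mark VLAN `vid` as allowed through the filter.
    bitmap[vid // 32] |= 1 << (vid % 32)

def vlan_bitmap_test(bitmap, vid):
    # Would a frame tagged with `vid` pass the filter?
    return bool(bitmap[vid // 32] & (1 << (vid % 32)))

bm = vlan_bitmap_new()
vlan_bitmap_set(bm, 100)
vlan_bitmap_set(bm, 4094)
print(vlan_bitmap_test(bm, 100), vlan_bitmap_test(bm, 101))  # -> True False
```

A bitmap keeps the attribute size constant (512 bytes) regardless of how many VLANs are registered, which is the trade-off against the list alternative mentioned in the note above.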
Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests
On 10/24/2011 03:31 PM, Sasha Levin wrote:
> On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:
> > Added configuration support to enable debug information for KVM Guests in debugfs
> >
> > +config KVM_DEBUG_FS
> > +	bool "Enable debug information for KVM Guests in debugfs"
> > +	depends on KVM_GUEST
>
> Shouldn't it depend on DEBUG_FS as well?

Thanks again for pointing this out. Will correct this too.
11 Seconds waiting between pings, but network-throughput is fast
Dear KVM-List,

I have a really strange network issue and I'm running out of ideas how to track this down. I have two tap devices on one bridge:

bridge name	bridge id		STP enabled	interfaces
br0		8000.00e081c682e7	no		eth0
							vm0
							vm1

Each of them is connected to a virtual machine with virtio. The machines are both running MS Windows Server 2008 R2 and are configured equally from the kvm perspective. I can transfer data over the network from a windows share in both directions at over 40 MB/sec. On one server, I have a noticeable latency over RDP. When I do a ping from this server, the first response comes immediately, but there is an 11-second delay before the second ping is sent on its way! I know this sounds crazy, but I've installed wireshark to track this down: the response for every ping comes immediately, but there are huge pauses (about 11 seconds) between the pings being sent out. The CPU utilization is nearly zero and the disk I/O is fast (raid10, lvm, virtio).

Any help or ideas on that are appreciated,

Andreas Piening
Hard limit for the cpu usage of a VM
Hi,

I was previously using Xen and recently moved to KVM. I am using libvirt to manage these VMs. In Xen's credit scheduler, I had the ability to set a cap on the CPU usage of a VM, but I was not able to find a similar substitute in KVM. I find that we can use cgroups to provide shares for a VM, but that is weight-based and doesn't set a hard cap for the VM. I tried using cpulimit, but I find it inaccurate, and we can give values only between 0-100, so I think it cannot support multi-core environments. Can anyone suggest a method to set a hard limit on a VM's CPU usage?

Thank you.

-
Regards,
Sethuraman Subbiah
Graduate Student - NC State University
M.S in Computer Science
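One candidate answer, assuming a host kernel with CFS bandwidth control (CONFIG_CFS_BANDWIDTH, which was merged around the 3.2 timeframe), is the cpu cgroup's cfs_quota_us/cfs_period_us pair, which imposes an absolute runtime cap per period rather than a relative weight. A hedged sketch of the arithmetic (paths and numbers are illustrative):

```python
def cfs_quota_for_cap(cap_percent, period_us=100000):
    """Translate a per-VM CPU cap into CFS bandwidth values.

    cap_percent may exceed 100 on multi-core hosts: e.g. 150 means
    1.5 CPUs' worth of runtime per scheduling period. Returns
    (quota_us, period_us) suitable for writing into the cgroup's
    cpu.cfs_quota_us and cpu.cfs_period_us files."""
    quota_us = int(period_us * cap_percent / 100)
    return quota_us, period_us

# Cap a VM at 1.5 CPUs:
print(cfs_quota_for_cap(150))   # -> (150000, 100000)
```

The values would then be written under the VM's cpu cgroup directory (for example something like /sys/fs/cgroup/cpu/<vm-group>/cpu.cfs_quota_us, path depends on the distribution's cgroup mount); some libvirt versions also expose this pair through the domain's cputune settings, though availability depends on the libvirt and kernel versions in use.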
RE: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode
-----Original Message-----
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Roopa Prabhu
Sent: Monday, October 24, 2011 11:15 AM
To: Michael S. Tsirkin
Cc: net...@vger.kernel.org; s...@us.ibm.com; dragos.tatu...@gmail.com; a...@arndb.de; kvm@vger.kernel.org; da...@davemloft.net; mc...@broadcom.com; dwa...@cisco.com; shemmin...@vyatta.com; eric.duma...@gmail.com; ka...@trash.net; be...@cisco.com; Rose, Gregory V
Subject: Re: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode

> On 10/23/11 10:47 PM, Michael S. Tsirkin m...@redhat.com wrote:
> > > AFAIK, there is no netdev interface to install per-queue hw filters for a multi-queue interface. And also I don't know of any hw that provides an interface to set hw filters on a per-queue basis.
> >
> > VMDq hardware would support this, no?
>
> Am not really sure. This patch uses netdev to pass filters to hw, and I don't see any netdev infrastructure that would support per-queue filters. Maybe Greg (CC'ed) or anyone else from Intel can answer this. Greg, Michael had brought up this question during the first version of these patches as well. Will be nice to get the VMDq requirements for propagating guest filters to hw clarified. Do you see any special VMDq nic requirement we can cover in this patch? This is for VMDq queues directly connected to guest nics.

Thanks. So far as I know there is no support for VMDq in the Linux kernel, and while I know some folks have been working on it, I can't really speak to that work or their plans. Much would depend on the implementation. For now it makes sense to me to get support for multiple MAC and VLAN filters per virtual function (or virtual nic), and it seems to me you're going in the right direction for this. We'll have a look at your next set of patches and take it from there.
- Greg
Extreme time-drifts under windows server 2008 R2
Hi KVM-list,

I sent an email with my problem already a few hours ago (attached below, for reference). After some additional examination of the system I figured out new facts that completely change my problem: When I open up the date and time settings dialog in windows, the time seems to be frozen! But after (guess what?) 11 seconds, the next second is displayed! The system clock seems to run at a tenth of the real-time clock. I have a time drift of about 10 seconds per second on my windows server 2008 R2 guest.

I searched the web and found a command that should be entered as Administrator to eliminate the time drift:

bcdedit /set {default} USEPLATFORMCLOCK

But it hasn't changed the situation for me. I have set -localtime as a kvm parameter; what else can I do? Has someone had a similar problem before and solved it? I'm not sure if it is related to the problem, but in the host kernel I have Tickless System (Dynamic Ticks) ENABLED, and High Resolution Timer Support DISABLED ... for no specific reason. What are the correct settings here?

Thank you in advance!

Andreas Piening

-
Dear KVM-List,

I have a really strange network issue and I'm running out of ideas how to track this down. I have two tap devices on one bridge:

bridge name	bridge id		STP enabled	interfaces
br0		8000.00e081c682e7	no		eth0
							vm0
							vm1

Each of them is connected to a virtual machine with virtio. The machines are both running MS Windows Server 2008 R2 and are configured equally from the kvm perspective. I can transfer data over the network from a windows share in both directions at over 40 MB/sec. On one server, I have a noticeable latency over RDP. When I do a ping from this server, the first response comes immediately, but there is an 11-second delay before the second ping is sent on its way! I know this sounds crazy, but I've installed wireshark to track this down: the response for every ping comes immediately, but there are huge pauses (about 11 seconds) between the pings being sent out.
The CPU utilization is nearly zero and the disk I/O is fast (raid10, lvm, virtio).

Any help or ideas on that are appreciated,

Andreas Piening
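The reported symptom (roughly one guest second per eleven wall-clock seconds) can be quantified from two timestamp pairs taken on host and guest. A trivial helper for measuring the drift ratio before and after any tuning (illustrative only):

```python
def drift_ratio(host_t0, guest_t0, host_t1, guest_t1):
    """Ratio of guest-clock progress to host-clock progress over the
    same interval. 1.0 means the clocks run at the same rate; a value
    near 0.09 would match the reported behaviour of one guest second
    per ~11 host seconds."""
    return (guest_t1 - guest_t0) / (host_t1 - host_t0)

# 11 host seconds elapse while the guest clock advances 1 second:
print(round(drift_ratio(0.0, 0.0, 11.0, 1.0), 3))   # -> 0.091
```

Measuring the ratio this way makes it easy to tell whether a change (e.g. the bcdedit setting, or different host timer config) actually moved the guest clock toward 1.0 or merely shifted the offset.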
Re: [Qemu-devel] [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed
On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
> > On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang jasow...@redhat.com wrote:
> > > This lets the virtio-net driver send a gratuitous packet via a new config bit - VIRTIO_NET_S_ANNOUNCE - in each config update interrupt. When this bit is set by the backend, the driver schedules a workqueue to send the gratuitous packet through NETDEV_NOTIFY_PEERS. This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
> > >
> > > Signed-off-by: Jason Wang jasow...@redhat.com
> >
> > This seems like a huge layering violation. Imagine this in real hardware, for example.
>
> Commits 06c4648d46d1b757d6b9591a86810be79818b60c and 99606477a5888b0ead0284fecb13417b1da8e3af document the need for this:
> 	NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a different physical link.
> and
> 	In real hardware such notifications are only generated when the device comes up or the address changes.
> So the hypervisor could get the same behaviour by sending link up/down events; this is just an optimization so the guest won't do unnecessary stuff like trying to reconfigure an IP address. Maybe LOCATION_CHANGE would be a better name? ANNOUNCE_SELF?
>
> > There may be a good reason why virtual devices might want this kind of reconfiguration cheat, which is unnecessary for normal machines,
>
> I think yes, the difference with real hardware is that the guest can change location without the link getting dropped. FWIW, Xen seems to use this capability too. So does ms netvsc.
>
> > but it'd have to be spelled out clearly in the spec to justify it...
> > Cheers, Rusty.
>
> Agree, and I'd like to see the spec too. The interface seems to involve the guest clearing the status bit when it detects an event?

I would describe this in the spec. The interface needs the guest to clear the status bit; this lets the back-end know it has finished the work, as we may need to send the gratuitous packets many times.

> Also - how does it interact with the link up event?
> We probably don't want to schedule this when we detect a link status change or during initialization, as this patch seems to do? What if the link goes down while the work is running? Is that OK?

Looks like there are duplications if the guest enables arp_notify when the vm is started, but we need to handle the situation of resuming a stopped virtual machine. For the link-down race, I don't see any real issue: the packet is either dropped or queued.
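The handshake described in this thread, where the backend sets an announce bit and the guest clears it after sending the gratuitous packet so the backend can re-request, can be modeled as follows. This is a hypothetical Python sketch of the protocol being discussed, not the actual driver; the bit value is illustrative:

```python
VIRTIO_NET_S_ANNOUNCE = 0x2   # illustrative bit value, not from the spec

class Backend:
    """Host side: owns the device status word."""
    def __init__(self):
        self.status = 0

    def request_announce(self):
        # e.g. after live migration: ask the guest to announce itself.
        self.status |= VIRTIO_NET_S_ANNOUNCE

    def announce_pending(self):
        return bool(self.status & VIRTIO_NET_S_ANNOUNCE)

class GuestDriver:
    """Guest side: reacts to config-update interrupts."""
    def __init__(self, backend):
        self.backend = backend
        self.garp_sent = 0

    def config_changed(self):
        # On a config-update interrupt: if the announce bit is set,
        # send a gratuitous packet (NETDEV_NOTIFY_PEERS in the real
        # driver) and clear the bit to acknowledge completion, so the
        # backend can request another round later if needed.
        if self.backend.status & VIRTIO_NET_S_ANNOUNCE:
            self.garp_sent += 1
            self.backend.status &= ~VIRTIO_NET_S_ANNOUNCE

be = Backend()
drv = GuestDriver(be)
be.request_announce()
drv.config_changed()
print(drv.garp_sent, be.announce_pending())   # -> 1 False
```

Clearing the bit as the acknowledgement is what distinguishes this from a plain level-triggered flag: the backend can observe completion and re-arm the request, which matches the "send the gratuitous packets many times" requirement above.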
Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
On Mon, 24 Oct 2011 12:02:18 +0200, Jens Axboe ax...@kernel.dk wrote:
> On 2011-10-24 12:02, Michael S. Tsirkin wrote:
> > On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote:
> > > Rusty, any opinion on merging this for 3.2? I expect the merge window will open right after the summit.
>
> I can toss it into for-3.2/drivers, if there's consensus to do that now.

I'd like to see the final patch... we got the new simplified ida stuff in, so I assume it uses that? But assume silence from me means consent: it's obviously the Right Thing.

Thanks,
Rusty.
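For readers unfamiliar with the ida being discussed: it hands out the smallest available small integer and allows indices to be released for reuse, which is exactly what virtio-blk needs for stable disk indices behind the vda/vdb/... names. A toy Python model of that allocation discipline (not the kernel API):

```python
class SimpleIda:
    """Toy model of the kernel's ida: a smallest-free-integer allocator
    with release-for-reuse semantics."""

    def __init__(self):
        self.used = set()

    def alloc(self):
        # Return the lowest non-negative integer not currently in use.
        i = 0
        while i in self.used:
            i += 1
        self.used.add(i)
        return i

    def free(self, i):
        # Release an index so a later alloc() can hand it out again.
        self.used.discard(i)

ida = SimpleIda()
a, b, c = ida.alloc(), ida.alloc(), ida.alloc()   # 0, 1, 2
ida.free(b)
print(ida.alloc())   # -> 1 (the released index is reused)
```

The reuse property is the point of moving virtio-blk from a bare counter to an ida: hot-unplugging a disk frees its index instead of leaking it.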