[PATCH] virtio_blk: Add help function to format mass of disks
The current virtio block's naming algorithm just supports 18278
(26^3 + 26^2 + 26) disks. If there are mass of virtio blocks,
there will be disks with the same name.

Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add
function virtblk_name_format() for virtio block to support mass
of disks naming.

Signed-off-by: Ren Mingxin <re...@cn.fujitsu.com>
---
 drivers/block/virtio_blk.c |   38 ++
 1 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c4a60ba..86516c8 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
 	return err;
 }
 
+static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
+{
+	const int base = 'z' - 'a' + 1;
+	char *begin = buf + strlen(prefix);
+	char *begin = buf + strlen(prefix);
+	char *end = buf + buflen;
+	char *p;
+	int unit;
+
+	p = end - 1;
+	*p = '\0';
+	unit = base;
+	do {
+		if (p == begin)
+			return -EINVAL;
+		*--p = 'a' + (index % unit);
+		index = (index / unit) - 1;
+	} while (index >= 0);
+
+	memmove(begin, p, end - p);
+	memcpy(buf, prefix, strlen(prefix));
+
+	return 0;
+}
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
@@ -442,18 +467,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 
 	q->queuedata = vblk;
 
-	if (index < 26) {
-		sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
-	} else if (index < (26 + 1) * 26) {
-		sprintf(vblk->disk->disk_name, "vd%c%c",
-			'a' + index / 26 - 1, 'a' + index % 26);
-	} else {
-		const unsigned int m1 = (index / 26 - 1) / 26 - 1;
-		const unsigned int m2 = (index / 26 - 1) % 26;
-		const unsigned int m3 = index % 26;
-		sprintf(vblk->disk->disk_name, "vd%c%c%c",
-			'a' + m1, 'a' + m2, 'a' + m3);
-	}
+	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
 
 	vblk->disk->major = major;
 	vblk->disk->first_minor = index_to_minor(index);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
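The naming loop in the patch is a bijective base-26 encoding: index 0 maps to "a", 25 to "z", 26 to "aa", 701 to "zz", 702 to "aaa", and 18278 (the first index the old three-branch code could not name) to "aaaa". A minimal userspace sketch of the same loop follows. The function name disk_name_format() is hypothetical, and the sketch uses `base` directly instead of the separate `unit` variable; it is an illustration, not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace version of the loop in virtblk_name_format().
 * Writes prefix followed by a bijective base-26 suffix into buf.
 * Returns 0 on success, -EINVAL if the suffix does not fit in buflen.
 */
static int disk_name_format(const char *prefix, int index, char *buf, int buflen)
{
	const int base = 'z' - 'a' + 1;		/* 26 letters */
	char *begin = buf + strlen(prefix);	/* where the suffix must start */
	char *end = buf + buflen;
	char *p = end - 1;

	assert(prefix && buf && index >= 0);

	*p = '\0';
	do {
		if (p == begin)
			return -EINVAL;		/* no room for another letter */
		*--p = 'a' + (index % base);	/* emit the lowest "digit" */
		index = (index / base) - 1;	/* the -1 makes the encoding bijective */
	} while (index >= 0);

	/* The suffix was built right-aligned; move it up, then add the prefix. */
	memmove(begin, p, end - p);
	memcpy(buf, prefix, strlen(prefix));
	return 0;
}
```

With a 32-byte buffer, disk_name_format("vd", 26, buf, sizeof(buf)) leaves "vdaa" in buf. Building the suffix from the end of the buffer avoids having to know the number of letters in advance.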
Re: [PATCH 00/13] KVM: MMU: fast page fault
On 04/09/2012 09:26 PM, Xiao Guangrong wrote:
> Yes, if Xwindow is not enabled, the benefit is limited. :)

I'm more interested in migration.

We could optimize the framebuffer by disabling dirty logging when
VNC/Spice is not connected (which should usually be the case), or when
the SDL window is minimized (shouldn't be that often, unfortunately).

Related, qxl doesn't seem to stop the dirty log when switching to
accelerated mode. vmsvga gets it right:

    case SVGA_REG_ENABLE:
        s->enable = value;
        s->config &= !!value;
        s->width = -1;
        s->height = -1;
        s->invalidated = 1;
        s->vga.invalidate(&s->vga);
        if (s->enable) {
            s->fb_size = ((s->depth + 7) >> 3) * s->new_width * s->new_height;
            vga_dirty_log_stop(&s->vga);
        } else {
            vga_dirty_log_start(&s->vga);
        }
        break;

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 00/13] KVM: MMU: fast page fault
On 04/09/2012 10:46 PM, Marcelo Tosatti wrote:
> Perhaps the mmu_lock hold times by get_dirty are a large component here?
> If that can be alleviated, not only RO->RW faults benefit.

Currently the longest holder in normal use is probably reading the dirty
log and write protecting the shadow page tables. We could fix that by
switching to O(1) write protection (write-protecting PML4Es instead of
PTEs).

It would be interesting to combine O(1) write protection with lockless
write-enabling.

-- 
error compiling committee.c: too many arguments to function
Re: source for virt io backend driver
On Tue, Apr 10, 2012 at 4:47 AM, Steven <wangwangk...@gmail.com> wrote:
> I found this post
> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/89334
>
> So the current block driver seems completely emulated by the qemu driver.

That's right:

qemu/hw/virtio-blk.c

Stefan
Re: [PATCH 00/13] KVM: MMU: fast page fault
On Tue, 10 Apr 2012 13:39:14 +0300
Avi Kivity <a...@redhat.com> wrote:

> On 04/09/2012 10:46 PM, Marcelo Tosatti wrote:
> > Perhaps the mmu_lock hold times by get_dirty are a large component here?
> > If that can be alleviated, not only RO->RW faults benefit.
>
> Currently the longest holder in normal use is probably reading the dirty
> log and write protecting the shadow page tables. We could fix that by
> switching to O(1) write protection (write-protecting PML4Es instead of
> PTEs).
>
> It would be interesting to combine O(1) write protection with lockless
> write-enabling.

As Marcelo suggested while reviewing srcu-less dirty logging, we can
mitigate get_dirty's mmu_lock hold time problem cleanly, locally in
get_dirty_log(), by using cond_resched_lock() -- although we need to
introduce cond_resched_lock_cb() to conditionally flush the TLB.

I have already started that work. Actually, I introduced rmap-based
get_dirty for that kind of fine-grained contention control.

I think we should do our best not to affect the MMU so much just for
the limited time of live migration.

	Takuya
Re: [PATCH 0/2] adding tracepoints to vhost
On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasow...@redhat.com> wrote:
> To help analyze vhost, the following series adds basic tracepoints to
> vhost. Operations of both virtqueues and vhost works are traced in the
> current implementation; net code is untouched. A top-like statistics
> display script is introduced to help with troubleshooting.
>
> TODO:
> - net specific tracepoints?
>
> ---
>
> Jason Wang (2):
>       vhost: basic tracepoints
>       tools: virtio: add a top-like utility for displaying vhost statistics
>
>  drivers/vhost/trace.h   |  153
>  drivers/vhost/vhost.c   |   17 ++
>  tools/virtio/vhost_stat |  360 +++
>  3 files changed, 528 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vhost/trace.h
>  create mode 100755 tools/virtio/vhost_stat

Perhaps this can replace the vhost log feature? I'm not sure if
tracepoints support the right data types but it seems like vhost
debugging could be done using tracing with less code.

Stefan
Re: Question about emulation of KVM?
On Sat, Apr 7, 2012 at 6:18 AM, R <1989012...@gmail.com> wrote:
> I am trying to use the x86_emulate_instruction() function, but it seems
> that it fails to emulate some instructions. My program gets stuck
> somewhere; it keeps emulating one instruction.
> Are there instructions that this function cannot emulate?

Yes, there are instructions that are not supported by the emulator, but
they should produce a kernel message. Check dmesg(1) to see if an error
was logged.

You can also enable the kvm:* tracepoints in the kernel to get detailed
information on guest behavior, including emulated instruction opcodes.

Stefan
Re: vhost-blk development
On Mon, Apr 9, 2012 at 11:59 PM, Michael Baysek <mbay...@liquidweb.com> wrote:
> Hi all. I'm interested in any developments on the vhost-blk in kernel
> accelerator for disk i/o.
>
> I had seen a patchset on LKML https://lkml.org/lkml/2011/7/28/175 but
> that is rather old. Are there any newer developments going on with the
> vhost-blk stuff?

Hi Michael,
I'm curious what you are looking for in vhost-blk. Are you trying to
improve disk performance for KVM guests?

Perhaps you'd like to share your configuration, workload, and other
details so that we can discuss how to improve performance.

Stefan
Re: [PATCH 00/13] KVM: MMU: fast page fault
On 04/10/2012 07:40 PM, Takuya Yoshikawa wrote:
> On Tue, 10 Apr 2012 13:39:14 +0300
> Avi Kivity <a...@redhat.com> wrote:
>
> > On 04/09/2012 10:46 PM, Marcelo Tosatti wrote:
> > > Perhaps the mmu_lock hold times by get_dirty are a large component here?
> > > If that can be alleviated, not only RO->RW faults benefit.
> >
> > Currently the longest holder in normal use is probably reading the dirty
> > log and write protecting the shadow page tables. We could fix that by
> > switching to O(1) write protection (write-protecting PML4Es instead of
> > PTEs).
> >
> > It would be interesting to combine O(1) write protection with lockless
> > write-enabling.
>
> As Marcelo suggested while reviewing srcu-less dirty logging, we can
> mitigate get_dirty's mmu_lock hold time problem cleanly, locally in
> get_dirty_log(), by using cond_resched_lock() -- although we need to
> introduce cond_resched_lock_cb() to conditionally flush the TLB.

Although that can reduce the contention, it does not reduce the overhead
of dirty logging.

> I have already started that work. Actually, I introduced rmap-based
> get_dirty for that kind of fine-grained contention control.

I do not think this way is better than O(1). Avi has explained the reason
many times, and I agree with that. :)

> I think we should do our best not to affect the MMU so much just for
> the limited time of live migration.

No, I do not really agree with that.

We really can get great benefit from O(1), especially if lockless
write-protect is introduced for O(1). Live migration is very useful for
cloud computing architectures to balance the load across all nodes.

And there is no reason to disallow us from touching the MMU code. Yes, it
needs to stay simple, but that does not mean stopping the development of
the MMU.

On the other hand, a mechanism like yours to improve dirty logging also
needs to introduce lots of code, and it does not make the MMU clearer. :)
Re: [PATCH] KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset.
On 04/09/2012 05:38 PM, Gleb Natapov wrote:
> On reset all PMU counters should be enabled in GLOBAL_CTRL MSR.
>
> Signed-off-by: Gleb Natapov <g...@redhat.com>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 173df38..2e88438 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -459,17 +459,17 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
>  	pmu->available_event_types = ~entry->ebx & ((1ull << bitmap_len) - 1);
>  
>  	if (pmu->version == 1) {
> -		pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
> -		return;
> +		pmu->nr_arch_fixed_counters = 0;
> +	} else {
> +		pmu->nr_arch_fixed_counters = min((int)(entry->edx & 0x1f),
> +				X86_PMC_MAX_FIXED);
> +		pmu->counter_bitmask[KVM_PMC_FIXED] =
> +			((u64)1 << ((entry->edx >> 5) & 0xff)) - 1;
>  	}
>  
> -	pmu->nr_arch_fixed_counters = min((int)(entry->edx & 0x1f),
> -			X86_PMC_MAX_FIXED);
> -	pmu->counter_bitmask[KVM_PMC_FIXED] =
> -		((u64)1 << ((entry->edx >> 5) & 0xff)) - 1;
> -	pmu->global_ctrl_mask = ~(((1 << pmu->nr_arch_gp_counters) - 1)
> -			| (((1ull << pmu->nr_arch_fixed_counters) - 1)
> -				<< X86_PMC_IDX_FIXED));
> +	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
> +		(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
> +	pmu->global_ctrl_mask = ~pmu->global_ctrl;
>  }

This is not called on INIT (not sure it should be). On the other hand
update_cpuid() is not the best place to initialize stuff. Oh well, this
can be fixed later (not sure it's possible), I'll apply this to master.

-- 
error compiling committee.c: too many arguments to function
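As a worked illustration of the GLOBAL_CTRL value computed in the patch: the low bits enable the general-purpose counters and the bits starting at X86_PMC_IDX_FIXED enable the fixed counters. The sketch below is standalone userspace code with hypothetical function names, and it assumes X86_PMC_IDX_FIXED is 32, which is not stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of X86_PMC_IDX_FIXED; not taken from the patch itself. */
#define FIXED_IDX_BASE 32

/* Hypothetical helper: one enable bit per GP counter and per fixed counter. */
static uint64_t pmu_global_ctrl(unsigned int nr_gp, unsigned int nr_fixed)
{
	assert(nr_gp < 32 && nr_fixed < 32);
	return ((1ull << nr_gp) - 1) |
	       (((1ull << nr_fixed) - 1) << FIXED_IDX_BASE);
}

/* Hypothetical helper: every bit that is not a valid enable bit is reserved. */
static uint64_t pmu_global_ctrl_mask(unsigned int nr_gp, unsigned int nr_fixed)
{
	return ~pmu_global_ctrl(nr_gp, nr_fixed);
}
```

Under that assumption, 4 general-purpose and 3 fixed counters give a GLOBAL_CTRL value of 0x70000000f, and a version 1 PMU with 2 general-purpose counters and no fixed counters gives 0x3.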
Re: [PATCH 0/2] adding tracepoints to vhost
On Tue, Apr 10, 2012 at 12:40:50PM +0100, Stefan Hajnoczi wrote:
> On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasow...@redhat.com> wrote:
> > To help analyze vhost, the following series adds basic tracepoints to
> > vhost. Operations of both virtqueues and vhost works are traced in the
> > current implementation; net code is untouched. A top-like statistics
> > display script is introduced to help with troubleshooting.
> >
> > TODO:
> > - net specific tracepoints?
> >
> > ---
> >
> > Jason Wang (2):
> >       vhost: basic tracepoints
> >       tools: virtio: add a top-like utility for displaying vhost statistics
> >
> >  drivers/vhost/trace.h   |  153
> >  drivers/vhost/vhost.c   |   17 ++
> >  tools/virtio/vhost_stat |  360 +++
> >  3 files changed, 528 insertions(+), 2 deletions(-)
> >  create mode 100644 drivers/vhost/trace.h
> >  create mode 100755 tools/virtio/vhost_stat
>
> Perhaps this can replace the vhost log feature? I'm not sure if
> tracepoints support the right data types but it seems like vhost
> debugging could be done using tracing with less code.
>
> Stefan

vhost log is not a debugging tool, it logs memory accesses for migration.
Re: [PATCH 0/2] adding tracepoints to vhost
On Tue, Apr 10, 2012 at 1:42 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Tue, Apr 10, 2012 at 12:40:50PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasow...@redhat.com> wrote:
> > > To help analyze vhost, the following series adds basic tracepoints to
> > > vhost. Operations of both virtqueues and vhost works are traced in the
> > > current implementation; net code is untouched. A top-like statistics
> > > display script is introduced to help with troubleshooting.
> > >
> > > TODO:
> > > - net specific tracepoints?
> > >
> > > ---
> > >
> > > Jason Wang (2):
> > >       vhost: basic tracepoints
> > >       tools: virtio: add a top-like utility for displaying vhost statistics
> > >
> > >  drivers/vhost/trace.h   |  153
> > >  drivers/vhost/vhost.c   |   17 ++
> > >  tools/virtio/vhost_stat |  360 +++
> > >  3 files changed, 528 insertions(+), 2 deletions(-)
> > >  create mode 100644 drivers/vhost/trace.h
> > >  create mode 100755 tools/virtio/vhost_stat
> >
> > Perhaps this can replace the vhost log feature? I'm not sure if
> > tracepoints support the right data types but it seems like vhost
> > debugging could be done using tracing with less code.
> >
> > Stefan
>
> vhost log is not a debugging tool, it logs memory accesses for migration.

Thanks. I totally misunderstood its purpose.

Stefan
[PATCH v2] KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()
From: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>

We do not need to zap all shadow pages of the guest when we create or
destroy a slot in this function.

To change this, we make kvm_mmu_zap_all()/kvm_arch_flush_shadow()
zap only those which have mappings into a given slot.

The way we iterate through active shadow pages is also changed to avoid
checking unrelated pages again and again.

Furthermore, the condition to see if we have any mmio sptes to clear is
changed so that we will not do flush for newly created slots.

With all these changes applied, the total amount of time needed to flush
shadow pages of a usual Linux guest, running Fedora with 4GB memory,
during a shutdown was reduced from 90ms to 60ms.

Furthermore, the total number of flushes needed to boot and shutdown
that guest was also reduced from 52 to 31.

Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>
Cc: Takuya Yoshikawa <takuya.yoshik...@gmail.com>
---
 [ Added cc to my gmail account because my address may change (only) a bit
   in a few months. ]

 rebased against next-candidates

 arch/ia64/kvm/kvm-ia64.c        |    2 +-
 arch/powerpc/kvm/powerpc.c      |    2 +-
 arch/s390/kvm/kvm-s390.c        |    2 +-
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/mmu.c              |   22 ++
 arch/x86/kvm/x86.c              |   13 ++---
 include/linux/kvm_host.h        |    2 +-
 virt/kvm/kvm_main.c             |   15 ++-
 8 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 9d80ff8..360abe5 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1626,7 +1626,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	return;
 }
 
-void kvm_arch_flush_shadow(struct kvm *kvm)
+void kvm_arch_flush_shadow(struct kvm *kvm, int slot)
 {
 	kvm_flush_remote_tlbs(kvm);
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 58ad860..5680337 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -319,7 +319,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 }
 
-void kvm_arch_flush_shadow(struct kvm *kvm)
+void kvm_arch_flush_shadow(struct kvm *kvm, int slot)
 {
 }
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d30c835..8c25606 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -879,7 +879,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	return;
 }
 
-void kvm_arch_flush_shadow(struct kvm *kvm)
+void kvm_arch_flush_shadow(struct kvm *kvm, int slot)
 {
 }
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f624ca7..422f23a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -715,7 +715,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
 void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 				     struct kvm_memory_slot *slot,
 				     gfn_t gfn_offset, unsigned long mask);
-void kvm_mmu_zap_all(struct kvm *kvm);
+void kvm_mmu_zap_all(struct kvm *kvm, int slot);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 29ad6f9..a50f7ba 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3930,16 +3930,30 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	kvm_flush_remote_tlbs(kvm);
 }
 
-void kvm_mmu_zap_all(struct kvm *kvm)
+/**
+ * kvm_mmu_zap_all - zap all shadows which have mappings into a given slot
+ * @kvm: the kvm instance
+ * @slot: id of the target slot
+ *
+ * If @slot is -1, zap all shadow pages.
+ */
+void kvm_mmu_zap_all(struct kvm *kvm, int slot)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
+	int zapped;
 
 	spin_lock(&kvm->mmu_lock);
 restart:
-	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
-		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
-			goto restart;
+	zapped = 0;
+	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
+		if ((slot >= 0) && !test_bit(slot, sp->slot_bitmap))
+			continue;
+
+		zapped |= kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+	}
+	if (zapped)
+		goto restart;
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 	spin_unlock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0d9a578..eac378c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5038,7 +5038,7 @@ int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 	 * to ensure that the updated hypercall appears atomically across all
 	 * VCPUs.
 	 */
-	kvm_mmu_zap_all(vcpu->kvm);
+
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On 04/10/2012 03:28 PM, Ren Mingxin wrote:
> The current virtio block's naming algorithm just supports 18278
> (26^3 + 26^2 + 26) disks. If there are mass of virtio blocks,
> there will be disks with the same name.
>
> Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add
> function virtblk_name_format() for virtio block to support mass
> of disks naming.
>
> Signed-off-by: Ren Mingxin <re...@cn.fujitsu.com>

Makes sense to me.

Acked-by: Asias He <as...@redhat.com>

> ---
>  drivers/block/virtio_blk.c |   38 ++
>  1 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index c4a60ba..86516c8 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
>  	return err;
>  }
>  
> +static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
> +{
> +	const int base = 'z' - 'a' + 1;
> +	char *begin = buf + strlen(prefix);
> +	char *begin = buf + strlen(prefix);
> +	char *end = buf + buflen;
> +	char *p;
> +	int unit;
> +
> +	p = end - 1;
> +	*p = '\0';
> +	unit = base;
> +	do {
> +		if (p == begin)
> +			return -EINVAL;
> +		*--p = 'a' + (index % unit);
> +		index = (index / unit) - 1;
> +	} while (index >= 0);
> +
> +	memmove(begin, p, end - p);
> +	memcpy(buf, prefix, strlen(prefix));
> +
> +	return 0;
> +}
> +
>  static int __devinit virtblk_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_blk *vblk;
> @@ -442,18 +467,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
>  
>  	q->queuedata = vblk;
>  
> -	if (index < 26) {
> -		sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
> -	} else if (index < (26 + 1) * 26) {
> -		sprintf(vblk->disk->disk_name, "vd%c%c",
> -			'a' + index / 26 - 1, 'a' + index % 26);
> -	} else {
> -		const unsigned int m1 = (index / 26 - 1) / 26 - 1;
> -		const unsigned int m2 = (index / 26 - 1) % 26;
> -		const unsigned int m3 = index % 26;
> -		sprintf(vblk->disk->disk_name, "vd%c%c%c",
> -			'a' + m1, 'a' + m2, 'a' + m3);
> -	}
> +	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
>  
>  	vblk->disk->major = major;
>  	vblk->disk->first_minor = index_to_minor(index);

-- 
Asias
Re: [PATCH 0/2] adding tracepoints to vhost
On Tue, Apr 10, 2012 at 8:42 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Tue, Apr 10, 2012 at 12:40:50PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Apr 10, 2012 at 3:58 AM, Jason Wang <jasow...@redhat.com> wrote:
> > > To help analyze vhost, the following series adds basic tracepoints to
> > > vhost. Operations of both virtqueues and vhost works are traced in the
> > > current implementation; net code is untouched. A top-like statistics
> > > display script is introduced to help with troubleshooting.
> > >
> > > TODO:
> > > - net specific tracepoints?
> > >
> > > ---
> > >
> > > Jason Wang (2):
> > >       vhost: basic tracepoints
> > >       tools: virtio: add a top-like utility for displaying vhost statistics
> > >
> > >  drivers/vhost/trace.h   |  153
> > >  drivers/vhost/vhost.c   |   17 ++
> > >  tools/virtio/vhost_stat |  360 +++
> > >  3 files changed, 528 insertions(+), 2 deletions(-)
> > >  create mode 100644 drivers/vhost/trace.h
> > >  create mode 100755 tools/virtio/vhost_stat
> >
> > Perhaps this can replace the vhost log feature? I'm not sure if
> > tracepoints support the right data types but it seems like vhost
> > debugging could be done using tracing with less code.
> >
> > Stefan
>
> vhost log is not a debugging tool, it logs memory accesses for migration.

Great, it would be much appreciated if there were some docs about this.

-- 
Regards,

Zhi Yong Wu
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On 04/10/2012 10:28 AM, Ren Mingxin wrote:
> The current virtio block's naming algorithm just supports 18278
> (26^3 + 26^2 + 26) disks. If there are mass of virtio blocks,
> there will be disks with the same name.
>
> Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add
> function virtblk_name_format() for virtio block to support mass
> of disks naming.
>
> Signed-off-by: Ren Mingxin <re...@cn.fujitsu.com>
> ---
>  drivers/block/virtio_blk.c |   38 ++
>  1 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index c4a60ba..86516c8 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
>  	return err;
>  }
>  
> +static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
> +{
> +	const int base = 'z' - 'a' + 1;
> +	char *begin = buf + strlen(prefix);
> +	char *begin = buf + strlen(prefix);

Duplicate line.

> +	char *end = buf + buflen;
> +	char *p;
> +	int unit;
> +
> +	p = end - 1;
> +	*p = '\0';
> +	unit = base;

Why not use 'base' below? neither unit nor base change.

> +	do {
> +		if (p == begin)
> +			return -EINVAL;
> +		*--p = 'a' + (index % unit);
> +		index = (index / unit) - 1;
> +	} while (index >= 0);
> +
> +	memmove(begin, p, end - p);
> +	memcpy(buf, prefix, strlen(prefix));
> +
> +	return 0;
> +}
> +

-- 
error compiling committee.c: too many arguments to function
[PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
I took a stab at implementing PV EOI using shared memory.
This should reduce the number of exits an interrupt
causes as much as by half.

A partially complete draft for both host and guest parts
is below.

The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
We set it before injecting an interrupt and clear
before injecting a nested one. Guest tests it using
a test and clear operation - this is necessary
so that host can detect interrupt nesting -
and if set, it can skip the EOI MSR.

There's a new MSR to set the address of said register
in guest memory. Otherwise not much changes:
- Guest EOI is not required
- ISR is automatically cleared before injection

Some things are incomplete: add feature negotiation
options, qemu support for said options.
No testing was done beyond compiling the kernel.

I would appreciate early feedback.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

--

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index d854101..8430f41 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -457,8 +457,13 @@ static inline u32 safe_apic_wait_icr_idle(void) { return 0; }
 
 #endif /* CONFIG_X86_LOCAL_APIC */
 
+DECLARE_EARLY_PER_CPU(unsigned long, apic_eoi);
+
 static inline void ack_APIC_irq(void)
 {
+	if (__test_and_clear_bit(0, &__get_cpu_var(apic_eoi)))
+		return;
+
 	/*
 	 * ack_APIC_irq() actually gets compiled as a single instruction
 	 * ... yummie.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e216ba0..0ee1472 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,12 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	struct {
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+		int vector;
+	} eoi;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..e22b9f8 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -37,6 +37,8 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_EOI_EN      0x4b564d04
+#define MSR_KVM_EOI_ENABLED 0x1
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 11544d8..1b3f9fa 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -89,6 +89,9 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_bios_cpu_apicid);
  */
 DEFINE_EARLY_PER_CPU(int, x86_cpu_to_logical_apicid, BAD_APICID);
 
+DEFINE_EARLY_PER_CPU(unsigned long, apic_eoi, 0);
+EXPORT_EARLY_PER_CPU_SYMBOL(apic_eoi);
+
 /*
  * Knob to control our willingness to enable the local APIC.
  *
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b8ba6e4..8b50f3a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -39,6 +39,7 @@
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
 #include <asm/idle.h>
+#include <asm/apic.h>
 
 static int kvmapf = 1;
 
@@ -307,6 +308,9 @@ void __cpuinit kvm_guest_cpu_init(void)
 		       smp_processor_id());
 	}
 
+	wrmsrl(MSR_KVM_EOI_EN, __pa(this_cpu_ptr(&apic_eoi)) |
+	       MSR_KVM_EOI_ENABLED);
+
 	if (has_steal_clock)
 		kvm_register_steal_time();
 }
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8584322..9e38e12 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -265,7 +265,61 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 			irq->level, irq->trig_mode);
 }
 
-static inline int apic_find_highest_isr(struct kvm_lapic *apic)
+static int eoi_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
+				      sizeof(val));
+}
+
+static int eoi_get_user(struct kvm_vcpu *vcpu, u32 *val)
+{
+
+	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
+				      sizeof(*val));
+}
+
+static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.eoi.msr_val & MSR_KVM_EOI_ENABLED);
+}
+
+static int eoi_get_pending_vector(struct kvm_vcpu *vcpu)
+{
+	u32 val;
+	if (eoi_get_user(vcpu, &val) < 0)
+		apic_debug("Can't read EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpi->arch.eoi.msr_val);
+	if (!(val & 0x1))
+		vcpu->arch.eoi.vector = -1;
+	return vcpu->arch.eoi.vector;
+}
+
+static void eoi_set_pending_vector(struct kvm_vcpu *vcpu, int vector)
+{
+	BUG_ON(vcpu->arch.eoi.vector != -1);
+	if (eoi_put_user(vcpu, 0x1) < 0) {
+		apic_debug("Can't set EOI MSR value: 0x%llx\n",
+			   (unsigned long
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On Tue, Apr 10, 2012 at 04:16:10PM +0300, Avi Kivity wrote:
> On 04/10/2012 10:28 AM, Ren Mingxin wrote:
> > The current virtio block's naming algorithm just supports 18278
> > (26^3 + 26^2 + 26) disks. If there are mass of virtio blocks,
> > there will be disks with the same name.
> >
> > Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add
> > function virtblk_name_format() for virtio block to support mass
> > of disks naming.
> >
> > Signed-off-by: Ren Mingxin <re...@cn.fujitsu.com>
> > ---
> >  drivers/block/virtio_blk.c |   38 ++
> >  1 files changed, 26 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index c4a60ba..86516c8 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
> >  	return err;
> >  }
> >  
> > +static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
> > +{
> > +	const int base = 'z' - 'a' + 1;
> > +	char *begin = buf + strlen(prefix);
> > +	char *begin = buf + strlen(prefix);
>
> Duplicate line.
>
> > +	char *end = buf + buflen;
> > +	char *p;
> > +	int unit;
> > +
> > +	p = end - 1;
> > +	*p = '\0';
> > +	unit = base;
>
> Why not use 'base' below? neither unit nor base change.

Yes it's a bit strange, it was the same in Tejun's patch.
Tejun, any idea?

> > +	do {
> > +		if (p == begin)
> > +			return -EINVAL;
> > +		*--p = 'a' + (index % unit);
> > +		index = (index / unit) - 1;
> > +	} while (index >= 0);
> > +
> > +	memmove(begin, p, end - p);
> > +	memcpy(buf, prefix, strlen(prefix));
> > +
> > +	return 0;
> > +}
> > +
>
> -- 
> error compiling committee.c: too many arguments to function
Re: KVM call agenda for April, Tuesday 10
As there are no topics, call is cancelled. Sorry for the late notice.
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On 04/10/2012 04:27 PM, Michael S. Tsirkin wrote:
> I took a stab at implementing PV EOI using shared memory.
> This should reduce the number of exits an interrupt
> causes by as much as half.
>
> A partially complete draft for both host and guest parts
> is below.
>
> The idea is simple: there's a bit, per APIC, in guest memory,
> that tells the guest that it does not need EOI.
> We set it before injecting an interrupt and clear
> before injecting a nested one. Guest tests it using
> a test and clear operation - this is necessary
> so that host can detect interrupt nesting -
> and if set, it can skip the EOI MSR.
>
> There's a new MSR to set the address of said register
> in guest memory. Otherwise not much changes:
> - Guest EOI is not required
> - ISR is automatically cleared before injection
>
> Some things are incomplete: add feature negotiation
> options, qemu support for said options.
> No testing was done beyond compiling the kernel.
>
> I would appreciate early feedback.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
>
> --
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index d854101..8430f41 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -457,8 +457,13 @@ static inline u32 safe_apic_wait_icr_idle(void) { return 0; }
>
>  #endif /* CONFIG_X86_LOCAL_APIC */
>
> +DECLARE_EARLY_PER_CPU(unsigned long, apic_eoi);
> +
>  static inline void ack_APIC_irq(void)
>  {
> +    if (__test_and_clear_bit(0, &__get_cpu_var(apic_eoi)))
> +        return;
> +

While __test_and_clear_bit() is implemented in a single instruction,
it's not required to be. Better have the instruction there explicitly.

>      /*
>       * ack_APIC_irq() actually gets compiled as a single instruction
>       * ... yummie.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e216ba0..0ee1472 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -481,6 +481,12 @@ struct kvm_vcpu_arch {
>          u64 length;
>          u64 status;
>      } osvw;
> +
> +    struct {
> +        u64 msr_val;
> +        struct gfn_to_hva_cache data;
> +        int vector;
> +    } eoi;
>  };

Needs to be cleared on INIT.

> @@ -307,6 +308,9 @@ void __cpuinit kvm_guest_cpu_init(void)
>                 smp_processor_id());
>      }
>
> +    wrmsrl(MSR_KVM_EOI_EN, __pa(this_cpu_ptr(&apic_eoi)) |
> +           MSR_KVM_EOI_ENABLED);
> +

Clear on kexec.

>      if (has_steal_clock)
>          kvm_register_steal_time();
>  }
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 8584322..9e38e12 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -265,7 +265,61 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
>              irq->level, irq->trig_mode);
>  }
>
> -static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> +static int eoi_put_user(struct kvm_vcpu *vcpu, u32 val)
> +{
> +
> +    return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
> +                                  sizeof(val));
> +}
> +
> +static int eoi_get_user(struct kvm_vcpu *vcpu, u32 *val)
> +{
> +
> +    return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
> +                                 sizeof(*val));
> +}
> +
> +static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
> +{
> +    return (vcpu->arch.eoi.msr_val & MSR_KVM_EOI_ENABLED);
> +}
> +
> +static int eoi_get_pending_vector(struct kvm_vcpu *vcpu)
> +{
> +    u32 val;
> +    if (eoi_get_user(vcpu, &val) < 0)
> +        apic_debug("Can't read EOI MSR value: 0x%llx\n",
> +                   (unsigned long long)vcpu->arch.eoi.msr_val);
> +    if (!(val & 0x1))
> +        vcpu->arch.eoi.vector = -1;
> +    return vcpu->arch.eoi.vector;
> +}
> +
> +static void eoi_set_pending_vector(struct kvm_vcpu *vcpu, int vector)
> +{
> +    BUG_ON(vcpu->arch.eoi.vector != -1);
> +    if (eoi_put_user(vcpu, 0x1) < 0) {
> +        apic_debug("Can't set EOI MSR value: 0x%llx\n",
> +                   (unsigned long long)vcpu->arch.eoi.msr_val);
> +        return;
> +    }
> +    vcpu->arch.eoi.vector = vector;
> +}
> +
> +static int eoi_clr_pending_vector(struct kvm_vcpu *vcpu)
> +{
> +    int vector;
> +    vector = vcpu->arch.eoi.vector;
> +    if (vector != -1 && eoi_put_user(vcpu, 0x0) < 0) {
> +        apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> +                   (unsigned long long)vcpu->arch.eoi.msr_val);
> +        return -1;
> +    }
> +    vcpu->arch.eoi.vector = -1;
> +    return vector;
> +}
> +
> +static inline int __apic_find_highest_isr(struct kvm_lapic *apic)
>  {
>      int result;
>
> @@ -275,6 +329,17 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic)
>      return result;
>  }
>
> +static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> +{
> +    int vector;
> +    if (eoi_enabled(apic->vcpu)) {
> +        vector = eoi_get_pending_vector(apic->vcpu);
> +        if (vector != -1)
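For readers following the thread, the handshake described at the top of this message can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the patch itself: `struct pv_eoi_model`, `host_inject()` and `guest_ack()` are hypothetical names, the shared guest-memory word is modelled as a plain field, and interrupt priorities are ignored (as in the RFC, any nesting simply revokes the optimization).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the PV EOI handshake; not kernel code. */
struct pv_eoi_model {
    uint32_t shared_flag;    /* stands in for the per-APIC word in guest memory */
    int pending_vector;      /* host-side: vector whose EOI may be skipped, or -1 */
    unsigned int isr_depth;  /* vectors in ISR that still need a real EOI */
    unsigned int eoi_exits;  /* EOI writes that would have trapped to the host */
};

static void model_init(struct pv_eoi_model *m)
{
    m->shared_flag = 0;
    m->pending_vector = -1;
    m->isr_depth = 0;
    m->eoi_exits = 0;
}

/* Host side: called when an interrupt is injected into the guest. */
static void host_inject(struct pv_eoi_model *m, int vector)
{
    assert(vector >= 0);

    /* If the guest already consumed the flag, the skipped EOI has happened. */
    if (!(m->shared_flag & 1u))
        m->pending_vector = -1;

    if (m->pending_vector != -1) {
        /* Nesting: revoke the optimization, so both the outer and the
         * new vector go through ISR and need an explicit EOI. */
        m->shared_flag = 0;
        m->pending_vector = -1;
        m->isr_depth += 2;
    } else if (m->isr_depth > 0) {
        /* Something is already in service the normal way: no elision. */
        m->isr_depth += 1;
    } else {
        /* Nothing in service: tell the guest it may skip the EOI. */
        m->shared_flag = 1;
        m->pending_vector = vector;
    }
}

/* Guest side: models ack_APIC_irq(); returns true if an exit was needed. */
static bool guest_ack(struct pv_eoi_model *m)
{
    if (m->shared_flag & 1u) {   /* test and clear */
        m->shared_flag &= ~1u;
        return false;            /* EOI skipped, no exit */
    }
    m->eoi_exits++;              /* real EOI write, traps to the host */
    if (m->isr_depth > 0)
        m->isr_depth--;
    return true;
}
```

In this model a lone interrupt costs no EOI exit, while two nested interrupts cost two, which is the behaviour the description above aims for.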
Re: [PATCH 0/2] adding tracepoints to vhost
On Tue, Apr 10, 2012 at 09:10:48PM +0800, Zhi Yong Wu wrote:
> > > Perhaps this can replace the vhost log feature? I'm not sure if
> > > tracepoints support the right data types but it seems like vhost
> > > debugging could be done using tracing with less code.
> > >
> > > Stefan
> >
> > vhost log is not a debugging tool, it logs memory accesses for
> > migration.
>
> Great, it is very appreciated if there's some docs about this

About what? vhost logging? See the comment near the
definition of VHOST_SET_LOG_BASE in vhost.h

> --
> Regards,
>
> Zhi Yong Wu
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On 04/10/2012 05:26 PM, Michael S. Tsirkin wrote: u64 status; } osvw; + + struct { + u64 msr_val; + struct gfn_to_hva_cache data; + int vector; + } eoi; }; Needs to be cleared on INIT. You mean kvm_arch_vcpu_reset? Yes, or kvm_lapic_reset(). @@ -307,6 +308,9 @@ void __cpuinit kvm_guest_cpu_init(void) smp_processor_id()); } + wrmsrl(MSR_KVM_EOI_EN, __pa(this_cpu_ptr(apic_eoi)) | +MSR_KVM_EOI_ENABLED); + Clear on kexec. With register_reboot_notifier? Yes, we already clear some kvm msrs there. - apic_set_vector(vector, apic-regs + APIC_ISR); + if (eoi_enabled(vcpu)) { + /* Anything pending? If yes disable eoi optimization. */ + if (unlikely(apic_find_highest_isr(apic) = 0)) { + int v = eoi_clr_pending_vector(vcpu); ISR != pending, that's IRR. If ISR vector has lower priority than the new vector, then we don't need to disable eoi avoidance. Yes. But we can and it's easier than figuring out priorities. I am guessing such collisions are rare, right? It's pretty easy, if there is something in IRR but kvm_lapic_has_interrupt() returns -1, then we need to disable eoi avoidance. I'll add a trace to make sure. + if (v != -1) + apic_set_vector(v, apic-regs + APIC_ISR); + } else { + eoi_set_pending_vector(vcpu, vector); + set_isr = false; Weird. Just set it normally. Remember that reading the ISR needs to return the correct value. Marcelo said linux does not normally read ISR - not true? It's true and it's irrelevant. We aren't coding a feature to what linux does now, but for what linux or another guest may do in the future. Note this has no effect if the PV optimization is not enabled. We need to process the avoided EOI before any APIC read/writes, to be sure the guest sees the updated values. Same for IOAPIC, EOI affects remote_irr. That may been we need to sample it after every exit, or perhaps disable the feature for level-triggered interrupts. Disabling would be very sad. Can we sample on remote irr read? That can be done from another vcpu. 
Why do we care about level-triggered interrupts? Everything uses MSI or edge-triggered IOAPIC interrupts these days. -- error compiling committee.c: too many arguments to function
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 05:33:18PM +0300, Avi Kivity wrote: On 04/10/2012 05:26 PM, Michael S. Tsirkin wrote: u64 status; } osvw; + + struct { + u64 msr_val; + struct gfn_to_hva_cache data; + int vector; + } eoi; }; Needs to be cleared on INIT. You mean kvm_arch_vcpu_reset? Yes, or kvm_lapic_reset(). @@ -307,6 +308,9 @@ void __cpuinit kvm_guest_cpu_init(void) smp_processor_id()); } + wrmsrl(MSR_KVM_EOI_EN, __pa(this_cpu_ptr(apic_eoi)) | + MSR_KVM_EOI_ENABLED); + Clear on kexec. With register_reboot_notifier? Yes, we already clear some kvm msrs there. - apic_set_vector(vector, apic-regs + APIC_ISR); + if (eoi_enabled(vcpu)) { + /* Anything pending? If yes disable eoi optimization. */ + if (unlikely(apic_find_highest_isr(apic) = 0)) { + int v = eoi_clr_pending_vector(vcpu); ISR != pending, that's IRR. If ISR vector has lower priority than the new vector, then we don't need to disable eoi avoidance. Yes. But we can and it's easier than figuring out priorities. I am guessing such collisions are rare, right? It's pretty easy, if there is something in IRR but kvm_lapic_has_interrupt() returns -1, then we need to disable eoi avoidance. I only see kvm_apic_has_interrupt - is this what you mean? I'll add a trace to make sure. + if (v != -1) + apic_set_vector(v, apic-regs + APIC_ISR); + } else { + eoi_set_pending_vector(vcpu, vector); + set_isr = false; Weird. Just set it normally. Remember that reading the ISR needs to return the correct value. Marcelo said linux does not normally read ISR - not true? It's true and it's irrelevant. We aren't coding a feature to what linux does now, but for what linux or another guest may do in the future. Right. So you think reading ISR has value in combination with PV EOI for future guests? I'm not arguing either way just curious. Note this has no effect if the PV optimization is not enabled. We need to process the avoided EOI before any APIC read/writes, to be sure the guest sees the updated values. 
Same for IOAPIC, EOI affects remote_irr. That may mean we need to sample it after every exit, or perhaps disable the feature for level-triggered interrupts. Disabling would be very sad. Can we sample on remote irr read? That can be done from another vcpu. We still can handle it, right? Where's the code that handles that read? Why do we care about level-triggered interrupts? Everything uses MSI or edge-triggered IOAPIC interrupts these days. Well lots of emulated devices don't yet. They probably should but it's nice to be able to test with e.g. e1000 emulation not just virtio. Besides, kvm_get_apic_interrupt simply doesn't know about the triggering mode at the moment. -- MST
[PATCH 1/2] virt tests: add dd_test
This patch adds dd_test which tries simple read/write from/to an attached disk. The purpose is to test readonly vs. readwrite subsystem. The dd_test.py is highly parametrizable so it can be used in other tests by changing parameters. It also adds another image parameter image_readonly (bool) which is required for this test. Signed-off-by: Lukas Doktor ldok...@redhat.com --- client/virt/kvm_vm.py |6 ++- client/virt/subtests.cfg.sample | 35 + client/virt/tests/dd_test.py| 105 +++ 3 files changed, 144 insertions(+), 2 deletions(-) create mode 100644 client/virt/tests/dd_test.py diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py index 80ea453..cbc0494 100644 --- a/client/virt/kvm_vm.py +++ b/client/virt/kvm_vm.py @@ -310,7 +310,7 @@ class VM(virt_vm.BaseVM): boot=False, blkdebug=None, bus=None, port=None, bootindex=None, removable=None, min_io_size=None, opt_io_size=None, physical_block_size=None, - logical_block_size=None): + logical_block_size=None, readonly=False): name = None dev = if format == ahci: @@ -362,6 +362,7 @@ class VM(virt_vm.BaseVM): cmd += _add_option(snapshot, snapshot, bool) cmd += _add_option(boot, boot, bool) cmd += _add_option(id, name) +cmd += _add_option(readonly, readonly, bool) return cmd + dev def add_nic(help, vlan, model=None, mac=None, device_id=None, netdev_id=None, @@ -758,7 +759,8 @@ class VM(virt_vm.BaseVM): image_params.get(min_io_size), image_params.get(opt_io_size), image_params.get(physical_block_size), -image_params.get(logical_block_size)) +image_params.get(logical_block_size), +image_params.get(image_readonly)) redirs = [] for redir_name in params.objects(redirs): diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample index 2192840..ec9fbc3 100644 --- a/client/virt/subtests.cfg.sample +++ b/client/virt/subtests.cfg.sample @@ -398,6 +398,41 @@ variants: create_image_stg = yes image_size_stg = 10M +- dd_test: install setup image_copy unattended_install.cdrom +type = dd_test +images += stg1 
+image_name_stg1 = sgt1 +image_size_stg1 = 1M +image_snapshot_stg1 = no +drive_index_stg1 = 3 +dd_count = 1 +# last input and output disk +dd_if_select = -1 +dd_of_select = -1 +variants: +- readwrite: +dd_stat = 0 +variants: +- zero2disk: +dd_if = ZERO +dd_of = /dev/[shv]d? +- disk2null: +dd_if = /dev/[shv]d? +dd_of = NULL +- readonly: +# ide, ahci don't support readonly disks +no ide, ahci +image_readonly_stg1 = yes +variants: +- zero2disk: +dd_if = ZERO +dd_of = /dev/[shv]d? +dd_stat = 1 +- disk2null: +dd_if = /dev/[shv]d? +dd_of = NULL +dd_stat = 0 + - virsh_migrate: install setup image_copy unattended_install.cdrom type = virsh_migrate diff --git a/client/virt/tests/dd_test.py b/client/virt/tests/dd_test.py new file mode 100644 index 000..48d32b5 --- /dev/null +++ b/client/virt/tests/dd_test.py @@ -0,0 +1,105 @@ + +Configurable on-guest dd test. +@author: Lukas Doktor ldok...@redhat.com +@copyright: 2012 Red Hat, Inc. + +import logging +from autotest_lib.client.common_lib import error +from autotest_lib.client.virt.aexpect import ShellCmdError +from autotest_lib.client.virt.aexpect import ShellTimeoutError + + +def run_dd_test(test, params, env): + +Executes dd with defined parameters and checks the return number and output + +def _get_file(filename, select): + Picks the actual file based on select value +if filename == NULL: +return /dev/null +elif filename == ZERO: +return /dev/zero +elif filename == RANDOM: +return /dev/random +elif filename == URANDOM: +return /dev/urandom +else: +# get all matching filenames +try: +disks = sorted(session.cmd(ls -1d %s % filename).split('\n')) +except ShellCmdError: # No matching file (creating new?) +disks = [filename] +if disks[-1] == '': +disks = disks[:-1] +try: +
[PATCH 2/2] virt: Fix usb_stick block device subsystem
* Add ehci controller when usbstick is selected * Add default number of usb_max_port With those mini-changes it's possible to run every test with usb_stick block device without further test/cfg modifications. Signed-off-by: Lukas Doktor ldok...@redhat.com --- client/virt/guest-hw.cfg.sample |2 ++ client/virt/kvm_vm.py |2 +- 2 files changed, 3 insertions(+), 1 deletions(-) diff --git a/client/virt/guest-hw.cfg.sample b/client/virt/guest-hw.cfg.sample index 0729117..655ac9b 100644 --- a/client/virt/guest-hw.cfg.sample +++ b/client/virt/guest-hw.cfg.sample @@ -71,6 +71,8 @@ variants: cd_format=ahci - usb_stick: drive_format=usb2 +usbs += default-ehci +usb_type_default-ehci = usb-ehci - usb_cdrom: cd_format=usb2 - xenblk: diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py index cbc0494..32d7330 100644 --- a/client/virt/kvm_vm.py +++ b/client/virt/kvm_vm.py @@ -238,7 +238,7 @@ class VM(virt_vm.BaseVM): usb_dev = self.usb_dev_dict.get(usb) controller = usb -max_port = int(usb_params.get(usb_max_port)) +max_port = int(usb_params.get(usb_max_port, 6)) if len(usb_dev) max_port: bus = %s.0 % usb self.usb_dev_dict[usb].append(dev) -- 1.7.7.6
Re: [PATCH 0/2] adding tracepoints to vhost
On Tue, Apr 10, 2012 at 9:45 PM, Michael S. Tsirkin m...@redhat.com wrote:
> On Tue, Apr 10, 2012 at 09:10:48PM +0800, Zhi Yong Wu wrote:
> > > > Perhaps this can replace the vhost log feature? I'm not sure if
> > > > tracepoints support the right data types but it seems like vhost
> > > > debugging could be done using tracing with less code.
> > > >
> > > > Stefan
> > >
> > > vhost log is not a debugging tool, it logs memory accesses for
> > > migration.
> >
> > Great, it is very appreciated if there's some docs about this
>
> About what? vhost logging? See the comment near the

Yeah, thanks

> definition of VHOST_SET_LOG_BASE in vhost.h

--
Regards,

Zhi Yong Wu
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 05:03:22PM +0300, Avi Kivity wrote:
> On 04/10/2012 04:27 PM, Michael S. Tsirkin wrote:
> > I took a stab at implementing PV EOI using shared memory.
> > This should reduce the number of exits an interrupt
> > causes by as much as half.
> >
> > A partially complete draft for both host and guest parts
> > is below.
> >
> > The idea is simple: there's a bit, per APIC, in guest memory,
> > that tells the guest that it does not need EOI.
> > We set it before injecting an interrupt and clear
> > before injecting a nested one. Guest tests it using
> > a test and clear operation - this is necessary
> > so that host can detect interrupt nesting -
> > and if set, it can skip the EOI MSR.
> >
> > There's a new MSR to set the address of said register
> > in guest memory. Otherwise not much changes:
> > - Guest EOI is not required
> > - ISR is automatically cleared before injection
> >
> > Some things are incomplete: add feature negotiation
> > options, qemu support for said options.
> > No testing was done beyond compiling the kernel.
> >
> > I would appreciate early feedback.
> >
> > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> >
> > --
> >
> > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> > index d854101..8430f41 100644
> > --- a/arch/x86/include/asm/apic.h
> > +++ b/arch/x86/include/asm/apic.h
> > @@ -457,8 +457,13 @@ static inline u32 safe_apic_wait_icr_idle(void) { return 0; }
> >
> >  #endif /* CONFIG_X86_LOCAL_APIC */
> >
> > +DECLARE_EARLY_PER_CPU(unsigned long, apic_eoi);
> > +
> >  static inline void ack_APIC_irq(void)
> >  {
> > +    if (__test_and_clear_bit(0, &__get_cpu_var(apic_eoi)))
> > +        return;
> > +
>
> While __test_and_clear_bit() is implemented in a single instruction,
> it's not required to be. Better have the instruction there explicitly.
>
> >      /*
> >       * ack_APIC_irq() actually gets compiled as a single instruction
> >       * ... yummie.
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index e216ba0..0ee1472 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -481,6 +481,12 @@ struct kvm_vcpu_arch {
> >          u64 length;
> >          u64 status;
> >      } osvw;
> > +
> > +    struct {
> > +        u64 msr_val;
> > +        struct gfn_to_hva_cache data;
> > +        int vector;
> > +    } eoi;
> >  };
>
> Needs to be cleared on INIT.

You mean kvm_arch_vcpu_reset?

> > @@ -307,6 +308,9 @@ void __cpuinit kvm_guest_cpu_init(void)
> >                 smp_processor_id());
> >      }
> >
> > +    wrmsrl(MSR_KVM_EOI_EN, __pa(this_cpu_ptr(&apic_eoi)) |
> > +           MSR_KVM_EOI_ENABLED);
> > +
>
> Clear on kexec.

With register_reboot_notifier?

> >      if (has_steal_clock)
> >          kvm_register_steal_time();
> >  }
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 8584322..9e38e12 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -265,7 +265,61 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
> >              irq->level, irq->trig_mode);
> >  }
> >
> > -static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> > +static int eoi_put_user(struct kvm_vcpu *vcpu, u32 val)
> > +{
> > +
> > +    return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
> > +                                  sizeof(val));
> > +}
> > +
> > +static int eoi_get_user(struct kvm_vcpu *vcpu, u32 *val)
> > +{
> > +
> > +    return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
> > +                                 sizeof(*val));
> > +}
> > +
> > +static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +    return (vcpu->arch.eoi.msr_val & MSR_KVM_EOI_ENABLED);
> > +}
> > +
> > +static int eoi_get_pending_vector(struct kvm_vcpu *vcpu)
> > +{
> > +    u32 val;
> > +    if (eoi_get_user(vcpu, &val) < 0)
> > +        apic_debug("Can't read EOI MSR value: 0x%llx\n",
> > +                   (unsigned long long)vcpu->arch.eoi.msr_val);
> > +    if (!(val & 0x1))
> > +        vcpu->arch.eoi.vector = -1;
> > +    return vcpu->arch.eoi.vector;
> > +}
> > +
> > +static void eoi_set_pending_vector(struct kvm_vcpu *vcpu, int vector)
> > +{
> > +    BUG_ON(vcpu->arch.eoi.vector != -1);
> > +    if (eoi_put_user(vcpu, 0x1) < 0) {
> > +        apic_debug("Can't set EOI MSR value: 0x%llx\n",
> > +                   (unsigned long long)vcpu->arch.eoi.msr_val);
> > +        return;
> > +    }
> > +    vcpu->arch.eoi.vector = vector;
> > +}
> > +
> > +static int eoi_clr_pending_vector(struct kvm_vcpu *vcpu)
> > +{
> > +    int vector;
> > +    vector = vcpu->arch.eoi.vector;
> > +    if (vector != -1 && eoi_put_user(vcpu, 0x0) < 0) {
> > +        apic_debug("Can't clear EOI MSR value: 0x%llx\n",
> > +                   (unsigned long long)vcpu->arch.eoi.msr_val);
> > +        return -1;
> > +    }
> > +    vcpu->arch.eoi.vector = -1;
> > +    return vector;
> > +}
> > +
> > +static inline int __apic_find_highest_isr(struct kvm_lapic *apic)
> >  {
> >      int result;
> >
> > @@ -275,6 +329,17 @@ static inline int apic_find_highest_isr(struct kvm_lapic *apic)
> >      return result;
> >  }
> >
> > +static inline int
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On 04/10/2012 05:53 PM, Michael S. Tsirkin wrote: Yes. But we can and it's easier than figuring out priorities. I am guessing such collisions are rare, right? It's pretty easy, if there is something in IRR but kvm_lapic_has_interrupt() returns -1, then we need to disable eoi avoidance. I only see kvm_apic_has_interrupt - is this what you mean? Yes, sorry. It's not clear whether to do the check in kvm_apic_has_interrupt() or kvm_apic_get_interrupt() - the latter is called only after interrupts are enabled, so it looks like a better place (EOIs while interrupts are disabled have no effect). But need to make sure those functions are actually called, since they're protected by KVM_REQ_EVENT. I'll add a trace to make sure. + if (v != -1) + apic_set_vector(v, apic-regs + APIC_ISR); + } else { + eoi_set_pending_vector(vcpu, vector); + set_isr = false; Weird. Just set it normally. Remember that reading the ISR needs to return the correct value. Marcelo said linux does not normally read ISR - not true? It's true and it's irrelevant. We aren't coding a feature to what linux does now, but for what linux or another guest may do in the future. Right. So you think reading ISR has value in combination with PV EOI for future guests? I'm not arguing either way just curious. I don't. But we need to preserve the same interface the APIC has presented for thousands of years (well, almost). Note this has no effect if the PV optimization is not enabled. We need to process the avoided EOI before any APIC read/writes, to be sure the guest sees the updated values. Same for IOAPIC, EOI affects remote_irr. That may been we need to sample it after every exit, or perhaps disable the feature for level-triggered interrupts. Disabling would be very sad. Can we sample on remote irr read? That can be done from another vcpu. We still can handle it, right? Where's the code that handles that read? Better to keep everything per-cpu. 
The code is in virt/kvm/ioapic.c Why do we care about level-triggered interrupts? Everything uses MSI or edge-triggered IOAPIC interrupts these days. Well lots of emulated devices don't yet. They probably should but it's nice to be able to test with e.g. e1000 emulation not just virtio. e1000 doesn't support msi? Besides, kvm_get_apic_interrupt simply doesn't know about the triggering mode at the moment. -- error compiling committee.c: too many arguments to function
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 06:00:51PM +0300, Avi Kivity wrote: On 04/10/2012 05:53 PM, Michael S. Tsirkin wrote: Yes. But we can and it's easier than figuring out priorities. I am guessing such collisions are rare, right? It's pretty easy, if there is something in IRR but kvm_lapic_has_interrupt() returns -1, then we need to disable eoi avoidance. I only see kvm_apic_has_interrupt - is this what you mean? Yes, sorry. It's not clear whether to do the check in kvm_apic_has_interrupt() or kvm_apic_get_interrupt() - the latter is called only after interrupts are enabled, so it looks like a better place (EOIs while interrupts are disabled have no effect). But need to make sure those functions are actually called, since they're protected by KVM_REQ_EVENT. Sorry not sure what you mean by make sure - read the code carefully? I'll add a trace to make sure. + if (v != -1) + apic_set_vector(v, apic-regs + APIC_ISR); + } else { + eoi_set_pending_vector(vcpu, vector); + set_isr = false; Weird. Just set it normally. Remember that reading the ISR needs to return the correct value. Marcelo said linux does not normally read ISR - not true? It's true and it's irrelevant. We aren't coding a feature to what linux does now, but for what linux or another guest may do in the future. Right. So you think reading ISR has value in combination with PV EOI for future guests? I'm not arguing either way just curious. I don't. But we need to preserve the same interface the APIC has presented for thousands of years (well, almost). Talk about overstatements :) Note this has no effect if the PV optimization is not enabled. We need to process the avoided EOI before any APIC read/writes, to be sure the guest sees the updated values. Same for IOAPIC, EOI affects remote_irr. That may been we need to sample it after every exit, or perhaps disable the feature for level-triggered interrupts. Disabling would be very sad. Can we sample on remote irr read? That can be done from another vcpu. 
We still can handle it, right? Where's the code that handles that read? Better to keep everything per-cpu. The code is in virt/kvm/ioapic.c Hmm. Disabling for level handles the ack notifiers issue as well, which I forgot about. It's a tough call. You think looking at TMR in kvm_get_apic_interrupt is safe? Why do we care about level-triggered interrupts? Everything uses MSI or edge-triggered IOAPIC interrupts these days. Well lots of emulated devices don't yet. They probably should but it's nice to be able to test with e.g. e1000 emulation not just virtio. e1000 doesn't support msi? qemu emulation doesn't. Besides, kvm_get_apic_interrupt simply doesn't know about the triggering mode at the moment. -- error compiling committee.c: too many arguments to function
Re: [PATCH] virtio_blk: Add help function to format mass of disks
Hello, guys.

On Tue, Apr 10, 2012 at 04:34:06PM +0300, Michael S. Tsirkin wrote:
> > Why not use 'base' below? neither unit nor base change.
>
> Yes it's a bit strange, it was the same in Tejun's patch.
> Tejun, any idea?

It was years ago, so I don't recall much. I think I wanted to use a
variable name which signifies its role - I worked out the rather
convoluted base number logic on paper first and I probably wanted to
keep the distinctions. I don't think it really matters at this point
tho. Just make sure those functions are marked deprecated so that no
one else copies them.

Thanks.

--
tejun
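For context on the "convoluted base number logic" mentioned above, here is a minimal user-space sketch of the naming scheme, written with a single `base` constant as the review question suggests. It is an illustration only: `name_format()` is a hypothetical stand-in modelled on the helper under discussion, the two length checks at the top are extra defensive assumptions, and `-EINVAL` comes from the C library's `errno.h` rather than from kernel headers. The `- 1` after the division is what makes the sequence run a..z, aa..zz, aaa.. with no "zero digit".

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical user-space sketch: formats prefix + letters for a disk index.
 * Index 0 -> "a", 25 -> "z", 26 -> "aa", 701 -> "zz", 702 -> "aaa", ... */
static int name_format(const char *prefix, int index, char *buf, int buflen)
{
    const int base = 'z' - 'a' + 1;   /* 26 letters */
    size_t plen;
    char *begin, *end, *p;

    assert(prefix != NULL && buf != NULL && index >= 0);
    plen = strlen(prefix);
    if (buflen <= 0 || plen >= (size_t)buflen)
        return -EINVAL;

    begin = buf + plen;               /* first slot available for letters */
    end = buf + buflen;
    p = end - 1;
    *p = '\0';

    /* Build the suffix right to left at the end of the buffer. */
    do {
        if (p == begin)
            return -EINVAL;           /* name does not fit */
        *--p = 'a' + (index % base);
        index = (index / base) - 1;   /* bijective base-26 step */
    } while (index >= 0);

    memmove(begin, p, end - p);       /* slide suffix (and NUL) next to prefix */
    memcpy(buf, prefix, plen);
    return 0;
}
```

Three letters cover 26 + 26^2 + 26^3 = 18278 indices, so index 18278 is the first one that needs a fourth letter.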
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On Tue, Apr 10, 2012 at 08:49:43AM -0700, Tejun Heo wrote:
> Hello, guys.
>
> On Tue, Apr 10, 2012 at 04:34:06PM +0300, Michael S. Tsirkin wrote:
> > > Why not use 'base' below? neither unit nor base change.
> >
> > Yes it's a bit strange, it was the same in Tejun's patch.
> > Tejun, any idea?
>
> It was years ago, so I don't recall much. I think I wanted to use a
> variable name which signifies its role - I worked out the rather
> convoluted base number logic on paper first and I probably wanted to
> keep the distinctions. I don't think it really matters at this point
> tho. Just make sure those functions are marked deprecated so that no
> one else copies them.
>
> Thanks.

I guess I'll keep it the same so it's easier to deduplicate if someone
wants to.

> --
> tejun
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On 04/10/2012 06:14 PM, Michael S. Tsirkin wrote:
> On Tue, Apr 10, 2012 at 06:00:51PM +0300, Avi Kivity wrote:
> > On 04/10/2012 05:53 PM, Michael S. Tsirkin wrote:
> > > > > Yes. But we can and it's easier than figuring out priorities.
> > > > > I am guessing such collisions are rare, right?
> > > >
> > > > It's pretty easy, if there is something in IRR but
> > > > kvm_lapic_has_interrupt() returns -1, then we need to disable eoi
> > > > avoidance.
> > >
> > > I only see kvm_apic_has_interrupt - is this what you mean?
> >
> > Yes, sorry.
> >
> > It's not clear whether to do the check in kvm_apic_has_interrupt() or
> > kvm_apic_get_interrupt() - the latter is called only after interrupts
> > are enabled, so it looks like a better place (EOIs while interrupts are
> > disabled have no effect). But need to make sure those functions are
> > actually called, since they're protected by KVM_REQ_EVENT.
>
> Sorry not sure what you mean by make sure - read the code carefully?

Yes. And I mean, get called at the right time.

> > Better to keep everything per-cpu. The code is in virt/kvm/ioapic.c
>
> Hmm. Disabling for level handles the ack notifiers
> issue as well, which I forgot about.
> It's a tough call. You think looking at
> TMR in kvm_get_apic_interrupt is safe?

Yes, it's read only from the guest point of view IIRC.

> > > > Why do we care about level-triggered interrupts? Everything uses MSI or
> > > > edge-triggered IOAPIC interrupts these days.
> > >
> > > Well lots of emulated devices don't yet.
> > > They probably should but it's nice to be able to
> > > test with e.g. e1000 emulation not just virtio.
> >
> > e1000 doesn't support msi?
>
> qemu emulation doesn't.

Can be changed if someone's really interested. But really, avoiding
EOIs for e1000 won't help it much.

--
error compiling committee.c: too many arguments to function
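The TMR idea raised above can be illustrated with a short sketch. In the local APIC, the Trigger Mode Register holds one bit per vector, and a set bit marks that interrupt as level-triggered. The code below is a hypothetical user-space model, not KVM code: the names `model_vector_is_level()` and `model_can_elide_eoi()` are made up for illustration, and the TMR is assumed to be packed as eight 32-bit words rather than laid out as in the APIC register page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_APIC_VECTORS 256

/* Hypothetical helper: is this vector marked level-triggered in the TMR?
 * The TMR is modelled as 256 bits packed into eight 32-bit words. */
static bool model_vector_is_level(const uint32_t tmr[MODEL_APIC_VECTORS / 32],
                                  int vector)
{
    assert(vector >= 0 && vector < MODEL_APIC_VECTORS);
    return (tmr[vector / 32] >> (vector % 32)) & 1u;
}

/* Hypothetical policy from the thread: only skip the EOI for
 * edge-triggered vectors, since a level-triggered EOI also has to
 * reach the IOAPIC (remote_irr) and the ack notifiers. */
static bool model_can_elide_eoi(const uint32_t tmr[MODEL_APIC_VECTORS / 32],
                                int vector)
{
    return !model_vector_is_level(tmr, vector);
}
```

With such a check at injection time, level-triggered interrupts would take the normal EOI path while edge-triggered and MSI interrupts keep the optimization.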
Re: [PATCH v2] kvm: Disable MSI/MSI-X in assigned device reset path
On Mon, 2012-04-09 at 11:35 +0300, Avi Kivity wrote: On 04/08/2012 08:37 PM, Jan Kiszka wrote: The core problem is not the ordering. The problem is that the kernel is susceptible to ordering mistakes of userspace. And that is because the kernel panics on PCI errors of devices that are in user hands - a critical kernel bug IMHO. Certainly. But this userspace patch won't fix it. No, it won't in general and I don't think it makes sense to mangle pci-sysfs config space support to the nuances of a user space driver. We really need a userspace driver interface that limits the config space interactions and provides a channel to support error reporting and userspace recovery. This type of thing can be done with VFIO if we could ever get off the ground and get some consensus around it. Please feel free to contribute to that discussion if you ever want to get away from this clunky device assignment interface we have now. Proper reset of MSI or even the whole PCI config space is another issue, but one the kernel should not worry about - still, it should be fixed (therefore this patch). And I was asking what is the right way to do it. Reset the device and read back the register values, or do an emulated reset and push down the register values. Reading back the register values is currently a noop since the kernel restores the config space to the incoming state after reset. KVM does stash away the original config space of the device to be restored prior to releasing the device. We could restore to that each time, but that would mean implementing a device reset ioctl in kvm, and we'd still need this patch for compatibility and we still have the issues Michael brings up with the config restore updating things like MSI that we need to then manually sync with kvm. I fear suggesting it, but perhaps another way to achieve this result would be to de-assign and re-assign the device in reset. 
But even if we disallowed userland to disable MMIO and PIO access to the device, we would not be able to exclude that there are secret channels in the device's interface having the same effect. So we likely need to enhance PCI error handling to catch and handle faults for certain devices differently - those we cannot trust to behave properly while they are under userland/guest control. Why not all of them? I think Jan is probably suggesting that we do need user space error handling for all userland/guest controlled devices, but some classes of errors on certain devices may be handled automatically by the userspace interface layer... which we could do with VFIO (well, assuming the APEI spec lets us nak the bios reporting a fatal error). So do we want to invent new solutions for each of these or do we want to move to a new interface? Thanks, Alex
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 07:08:26PM +0300, Avi Kivity wrote:
> On 04/10/2012 06:14 PM, Michael S. Tsirkin wrote:
> > On Tue, Apr 10, 2012 at 06:00:51PM +0300, Avi Kivity wrote:
> > > On 04/10/2012 05:53 PM, Michael S. Tsirkin wrote:
> > > > > > Yes. But we can and it's easier than figuring out priorities.
> > > > > > I am guessing such collisions are rare, right?
> > > > >
> > > > > It's pretty easy, if there is something in IRR but
> > > > > kvm_lapic_has_interrupt() returns -1, then we need to disable eoi
> > > > > avoidance.
> > > >
> > > > I only see kvm_apic_has_interrupt - is this what you mean?
> > >
> > > Yes, sorry.
> > >
> > > It's not clear whether to do the check in kvm_apic_has_interrupt() or
> > > kvm_apic_get_interrupt() - the latter is called only after interrupts
> > > are enabled, so it looks like a better place (EOIs while interrupts are
> > > disabled have no effect). But need to make sure those functions are
> > > actually called, since they're protected by KVM_REQ_EVENT.
> >
> > Sorry not sure what you mean by make sure - read the code carefully?
>
> Yes. And I mean, get called at the right time.

OK, Review will help here.

> > > Better to keep everything per-cpu. The code is in virt/kvm/ioapic.c
> >
> > Hmm. Disabling for level handles the ack notifiers
> > issue as well, which I forgot about.
> > It's a tough call. You think looking at
> > TMR in kvm_get_apic_interrupt is safe?
>
> Yes, it's read only from the guest point of view IIRC.
>
> > > > > Why do we care about level-triggered interrupts? Everything uses MSI or
> > > > > edge-triggered IOAPIC interrupts these days.
> > > >
> > > > Well lots of emulated devices don't yet.
> > > > They probably should but it's nice to be able to
> > > > test with e.g. e1000 emulation not just virtio.
> > >
> > > e1000 doesn't support msi?
> >
> > qemu emulation doesn't.
>
> Can be changed if someone's really interested. But really, avoiding
> EOIs for e1000 won't help it much.

It will help test EOI avoidance.

> --
> error compiling committee.c: too many arguments to function
Re: vhost-blk development
Hi Stefan.

Well, I'm trying to determine which I/O method currently has the least performance overhead and gives the best performance for both reads and writes.

I am doing my testing by putting the entire guest onto a ramdisk. I'm working on an i5-760 with 16GB RAM with VT-d enabled. I am running the standard CentOS 6 kernel with the 0.12.1.2 release of qemu-kvm that comes stock on CentOS 6. The guest is configured with 512 MB RAM, using 4 CPU cores, with its /dev/vda being the ramdisk on the host.

I'm not closed to building a custom kernel or kvm if I can get better performance reliably. However, my initial attempts with the 3.3.1 kernel and latest kvm gave mixed results.

I've been using iozone 3.98 with -O -l32 -i0 -i1 -i2 -e -+n -r4K -s250M to measure performance.

So, I was interested in vhost-blk since it seemed like a promising avenue to take a look at. If you have any other thoughts, that would also be helpful.

-Mike

- Original Message -
From: Stefan Hajnoczi stefa...@gmail.com
To: Michael Baysek mbay...@liquidweb.com
Cc: kvm@vger.kernel.org
Sent: Tuesday, April 10, 2012 4:55:26 AM
Subject: Re: vhost-blk development

On Mon, Apr 9, 2012 at 11:59 PM, Michael Baysek mbay...@liquidweb.com wrote:

Hi all. I'm interested in any developments on the vhost-blk in kernel accelerator for disk i/o. I had seen a patchset on LKML https://lkml.org/lkml/2011/7/28/175 but that is rather old. Are there any newer developments going on with the vhost-blk stuff?

Hi Michael, I'm curious what you are looking for in vhost-blk. Are you trying to improve disk performance for KVM guests? Perhaps you'd like to share your configuration, workload, and other details so that we can discuss how to improve performance.

Stefan
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
Heh, I was working on that too.

On Tue, Apr 10, 2012 at 05:26:18PM +0300, Michael S. Tsirkin wrote:
On Tue, Apr 10, 2012 at 05:03:22PM +0300, Avi Kivity wrote:
On 04/10/2012 04:27 PM, Michael S. Tsirkin wrote:

I took a stab at implementing PV EOI using shared memory. This should reduce the number of exits an interrupt causes by as much as half.

A partially complete draft for both host and guest parts is below.

The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. We set it before injecting an interrupt and clear before injecting a nested one. Guest tests it using a test and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR.

There's a new MSR to set the address of said register in guest memory. Otherwise not much changes:
- Guest EOI is not required
- ISR is automatically cleared before injection

Some things are incomplete: add feature negotiation options, qemu support for said options. No testing was done beyond compiling the kernel.

I would appreciate early feedback.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

--
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index d854101..8430f41 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -457,8 +457,13 @@ static inline u32 safe_apic_wait_icr_idle(void) { return 0; }
 #endif /* CONFIG_X86_LOCAL_APIC */
 
+DECLARE_EARLY_PER_CPU(unsigned long, apic_eoi);
+
 static inline void ack_APIC_irq(void)
 {
+	if (__test_and_clear_bit(0, &__get_cpu_var(apic_eoi)))
+		return;
+

While __test_and_clear_bit() is implemented in a single instruction, it's not required to be. Better have the instruction there explicitly.

 	/*
 	 * ack_APIC_irq() actually gets compiled as a single instruction
 	 * ... yummie.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e216ba0..0ee1472 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,12 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+
+	struct {
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+		int vector;
+	} eoi;
 };

Needs to be cleared on INIT.

You mean kvm_arch_vcpu_reset?

@@ -307,6 +308,9 @@ void __cpuinit kvm_guest_cpu_init(void)
 			smp_processor_id());
 	}
 
+	wrmsrl(MSR_KVM_EOI_EN, __pa(this_cpu_ptr(&apic_eoi)) |
+	       MSR_KVM_EOI_ENABLED);
+

Clear on kexec.

With register_reboot_notifier?

 	if (has_steal_clock)
 		kvm_register_steal_time();
 }
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8584322..9e38e12 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -265,7 +265,61 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq)
 			irq->level, irq->trig_mode);
 }
 
-static inline int apic_find_highest_isr(struct kvm_lapic *apic)
+static int eoi_put_user(struct kvm_vcpu *vcpu, u32 val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, &val,
+				      sizeof(val));
+}
+
+static int eoi_get_user(struct kvm_vcpu *vcpu, u32 *val)
+{
+
+	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.eoi.data, val,
+				     sizeof(*val));
+}
+
+static inline bool eoi_enabled(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.eoi.msr_val & MSR_KVM_EOI_ENABLED);
+}
+
+static int eoi_get_pending_vector(struct kvm_vcpu *vcpu)
+{
+	u32 val;
+	if (eoi_get_user(vcpu, &val) < 0)
+		apic_debug("Can't read EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.eoi.msr_val);
+	if (!(val & 0x1))
+		vcpu->arch.eoi.vector = -1;
+	return vcpu->arch.eoi.vector;
+}
+
+static void eoi_set_pending_vector(struct kvm_vcpu *vcpu, int vector)
+{
+	BUG_ON(vcpu->arch.eoi.vector != -1);
+	if (eoi_put_user(vcpu, 0x1) < 0) {
+		apic_debug("Can't set EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.eoi.msr_val);
+		return;
+	}
+	vcpu->arch.eoi.vector = vector;
+}
+
+static int eoi_clr_pending_vector(struct kvm_vcpu *vcpu)
+{
+	int vector;
+	vector = vcpu->arch.eoi.vector;
+	if (vector != -1 && eoi_put_user(vcpu, 0x0) < 0) {
+		apic_debug("Can't clear EOI MSR value: 0x%llx\n",
+			   (unsigned long long)vcpu->arch.eoi.msr_val);
+		return -1;
+	}
+	vcpu->arch.eoi.vector = -1;
+	return vector;
+}
+
+static inline int __apic_find_highest_isr(struct kvm_lapic *apic)
 {
 	int
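
The handshake described in the cover text (host sets a bit before injecting, clears it before a nested injection, guest does test-and-clear and skips the EOI write if the bit was set) can be modelled in user space. This is only a sketch under stated assumptions: `pv_eoi_word`, `eoi_msr_writes`, `toy_host_inject` and `toy_ack_irq` are hypothetical names, C11 atomics stand in for the explicit single instruction asked for in the review, and a counter stands in for the exiting EOI write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stands in for the per-cpu apic_eoi word shared between guest and host. */
static atomic_ulong pv_eoi_word;

/* Counts simulated EOI writes, i.e. the exits the scheme tries to avoid. */
static unsigned long eoi_msr_writes;

/*
 * Host side: set bit 0 before injecting an interrupt, clear it before
 * injecting a nested one, as the cover text describes.
 */
static void toy_host_inject(bool nested)
{
	if (nested)
		atomic_fetch_and(&pv_eoi_word, ~1UL);
	else
		atomic_fetch_or(&pv_eoi_word, 1UL);
}

/* Guest side: atomically clear bit 0 and report whether it was set. */
static bool toy_test_and_clear_bit0(atomic_ulong *word)
{
	return (atomic_fetch_and(word, ~1UL) & 1UL) != 0;
}

/* Guest side: analogue of the patched ack_APIC_irq(). */
static void toy_ack_irq(void)
{
	if (toy_test_and_clear_bit0(&pv_eoi_word))
		return;		/* host signalled that no EOI is needed */
	eoi_msr_writes++;	/* a real guest would write the APIC EOI here */
}
```

Because the guest clears the bit itself, the host can later read the word and tell whether the guest has already acknowledged the interrupt, which is what eoi_get_pending_vector() in the patch relies on.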
[PATCH] KVM: Introduce generic interrupt injection for in-kernel irqchips
Currently, MSI messages can only be injected to in-kernel irqchips by defining a corresponding IRQ route for each message. This is not only unhandy if the MSI messages are generated on the fly by user space, IRQ routes are also a limited resource that user space has to manage carefully.

By providing a direct injection path, we can both avoid using up limited resources and simplify the necessary steps for user land. This path is provided in a way that allows for use with other interrupt sources as well. Besides MSIs, external interrupt lines can also be manipulated through this interface, obsoleting KVM_IRQ_LINE_STATUS.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

This picks up Avi's first suggestion as I still think it is the better option to provide a direct MSI injection channel.

 Documentation/virtual/kvm/api.txt |   46 +
 include/linux/kvm.h               |   26 +
 include/linux/kvm_host.h          |    2 +
 virt/kvm/irq_comm.c               |   29 +++
 virt/kvm/kvm_main.c               |   20
 5 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 81ff39f..c70be58 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1482,6 +1482,52 @@ See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
 by assigned_dev_id. In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
 evaluated.
 
+4.61 KVM_GENERAL_IRQ
+
+Capability: KVM_CAP_GENERAL_IRQ
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_general_irq (in/out)
+Returns: 0 on success, <0 on error
+
+Inject an interrupt event to the guest. Only valid if in-kernel irqchip is
+enabled.
+
+struct kvm_general_irq {
+	__u32 type;
+	__u32 op;
+	__s32 status;
+	__u32 pad;
+	union {
+		__u32 line;
+		struct {
+			__u32 address_lo;
+			__u32 address_hi;
+			__u32 data;
+		} msi;
+		__u8 pad[32];
+	} u;
+};
+
+Supported IRQ types are:
+
+#define KVM_IRQTYPE_EXTERNAL_LINE	0
+#define KVM_IRQTYPE_MSI			1
+
+Available operations are:
+
+#define KVM_IRQOP_LOWER			0
+#define KVM_IRQOP_RAISE			1
+#define KVM_IRQOP_TRIGGER		2
+
+The level of an external interrupt line can either be raised or lowered, a
+MSI can only be triggered.
+
+If 0 is returned from the IOCTL, the status field was updated as well to
+reflect the injection result. It will be >0 on interrupt delivery, 0 if the
+interrupt was coalesced with an already pending one, and <0 if the guest
+blocked the delivery or some delivery error occurred.
+
 4.62 KVM_CREATE_SPAPR_TCE
 
 Capability: KVM_CAP_SPAPR_TCE
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 7a9dd4b..cb3afaf 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -590,6 +590,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_SYNC_REGS 74
 #define KVM_CAP_PCI_2_3 75
 #define KVM_CAP_KVMCLOCK_CTRL 76
+#define KVM_CAP_GENERAL_IRQ 77
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -715,6 +716,29 @@ struct kvm_one_reg {
 	__u64 addr;
 };
 
+#define KVM_IRQTYPE_EXTERNAL_LINE	0
+#define KVM_IRQTYPE_MSI			1
+
+#define KVM_IRQOP_LOWER			0
+#define KVM_IRQOP_RAISE			1
+#define KVM_IRQOP_TRIGGER		2
+
+struct kvm_general_irq {
+	__u32 type;
+	__u32 op;
+	__s32 status;
+	__u32 pad;
+	union {
+		__u32 line;
+		struct {
+			__u32 address_lo;
+			__u32 address_hi;
+			__u32 data;
+		} msi;
+		__u8 pad[32];
+	} u;
+};
+
 /*
  * ioctls for VM fds
  */
@@ -789,6 +813,8 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_PCI_2_3 */
 #define KVM_ASSIGN_SET_INTX_MASK _IOW(KVMIO, 0xa4, \
 				       struct kvm_assigned_pci_dev)
+/* Available with KVM_CAP_GENERAL_IRQ */
+#define KVM_GENERAL_IRQ _IOWR(KVMIO, 0xa5, struct kvm_general_irq)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 49c2f2f..31d3b44 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -773,6 +773,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
 			unsigned flags);
 void kvm_free_irq_routing(struct kvm *kvm);
 
+int kvm_general_irq(struct kvm *kvm, struct kvm_general_irq *irq);
+
 #else
 
 static inline void kvm_free_irq_routing(struct kvm *kvm) {}
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9f614b4..e487d3f 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -138,6 +138,35 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 	return
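
For illustration, this is roughly how user space might fill the structure proposed in this patch for an MSI. It is a sketch only: the structure and constants are copied from the patch text with fixed-width stdint types, KVM_GENERAL_IRQ exists only as proposed here, and `make_msi_irq` and `irq_status_str` are hypothetical helpers. The status mapping assumes the documented result convention is positive for delivered, zero for coalesced and negative for blocked. The actual call would be an ioctl on the VM file descriptor with the request number defined in the patch; it is left out so the sketch builds without patched headers.

```c
#include <stdint.h>
#include <string.h>

/* Constants and layout as proposed in the patch (not a mainline API). */
#define KVM_IRQTYPE_EXTERNAL_LINE	0
#define KVM_IRQTYPE_MSI			1

#define KVM_IRQOP_LOWER			0
#define KVM_IRQOP_RAISE			1
#define KVM_IRQOP_TRIGGER		2

struct kvm_general_irq {
	uint32_t type;
	uint32_t op;
	int32_t status;
	uint32_t pad;
	union {
		uint32_t line;
		struct {
			uint32_t address_lo;
			uint32_t address_hi;
			uint32_t data;
		} msi;
		uint8_t pad[32];
	} u;
};

/* Hypothetical helper: build a "trigger this MSI" request. */
static struct kvm_general_irq make_msi_irq(uint64_t address, uint32_t data)
{
	struct kvm_general_irq irq;

	memset(&irq, 0, sizeof(irq));	/* also zeroes the padding */
	irq.type = KVM_IRQTYPE_MSI;
	irq.op = KVM_IRQOP_TRIGGER;	/* an MSI can only be triggered */
	irq.u.msi.address_lo = (uint32_t)address;
	irq.u.msi.address_hi = (uint32_t)(address >> 32);
	irq.u.msi.data = data;
	return irq;
}

/* Hypothetical helper: interpret the status field after a successful call. */
static const char *irq_status_str(int32_t status)
{
	if (status > 0)
		return "delivered";
	if (status == 0)
		return "coalesced";
	return "blocked";
}
```

The 32-byte pad in the union fixes the structure at 48 bytes, which leaves room for further interrupt types without changing the ioctl number.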
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 08:59:21PM +0300, Gleb Natapov wrote: Heh, I was working on that too.
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 10:30:04PM +0300, Michael S. Tsirkin wrote: On Tue, Apr 10, 2012 at 08:59:21PM +0300, Gleb Natapov wrote: Heh, I was working on that too.
KVM qemu-kvm ext4_fill_flex_info() Denial of Service Vulnerability
Hi all.

Yesterday, Secunia released an advisory about qemu-kvm: https://secunia.com/advisories/48645/

This seems to describe an 'old' kernel bug, but I don't know if there is a 'link' between the ext4 issue and kvm. Can you explain this issue a bit?

Thanks in advance.

--
Agostino Sarubbo
ago -at- gentoo.org
Gentoo/AMD64 Arch Security Liaison
GPG: 0x7CD2DC5D
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 10:33:54PM +0300, Gleb Natapov wrote:

We don't try to match what HV does 100% anyway.

We should. The same code will be used for HV.

Only where it makes sense, that is where the functionality is sufficiently similar.

We have to notify IOAPIC about EOI ASAP. It may hold another interrupt for us that has to be delivered.

You mean the ack notifiers? We can skip just for the vectors which have ack notifiers or only if there are no notifiers.

No. I mean:

	if (!ent->fields.mask && (ioapic->irr & (1 << i)))
		ioapic_service(ioapic, i);

Hmm.
Re: [PATCHv0 dont apply] RFC: kvm eoi PV using shared memory
On Tue, Apr 10, 2012 at 10:40:14PM +0300, Michael S. Tsirkin wrote:
On Tue, Apr 10, 2012 at 10:33:54PM +0300, Gleb Natapov wrote:

We don't try to match what HV does 100% anyway.

We should. The same code will be used for HV.

Only where it makes sense, that is where the functionality is sufficiently similar.

You can sprinkle additional ifs in the code, but I do not see the point.

We have to notify IOAPIC about EOI ASAP. It may hold another interrupt for us that has to be delivered.

You mean the ack notifiers? We can skip just for the vectors which have ack notifiers or only if there are no notifiers.

No. I mean:

	if (!ent->fields.mask && (ioapic->irr & (1 << i)))
		ioapic_service(ioapic, i);

Hmm.

--
Gleb.
Re: [RFC PATCH 0/3] IOMMU groups
Ping. Does this approach look like it could satisfy your desire for a more integrated group layer? I'd really like to move VFIO forward, we've been stalled on this long enough.

David Woodhouse, I think this provides the quirking you're looking for for devices like the Ricoh, do you have any other requirements for a group layer?

Thanks,
Alex

On Mon, 2012-04-02 at 15:14 -0600, Alex Williamson wrote:

This series attempts to make IOMMU device grouping a slightly more integral part of the device model. iommu_device_groups were originally introduced to support the VFIO user space driver interface which needs to understand the granularity of device isolation in order to ensure security of devices when assigned for user access. This information was provided via a simple group identifier from the IOMMU driver allowing VFIO to walk devices and assemble groups itself.

The feedback received from this was that groups should be the effective unit of work for the IOMMU API. The existing model of allowing domains to be created and individual devices attached ignores many of the restrictions of the IOMMU, whether by design, by topology or by defective devices. Additionally we should be able to use the grouping information at the dma ops layer for managing domains and quirking devices.

This series is a sketch at implementing only those aspects and leaving everything else about the multifaceted hairball of Isolation groups for another API. Please comment and let me know if this seems like the direction we should be headed.

Thanks,
Alex

---

Alex Williamson (3):
      iommu: Create attach/detach group interface
      iommu: Create basic group infrastructure and update AMD-Vi & Intel VT-d
      iommu: Introduce iommu_group

 drivers/iommu/amd_iommu.c   |   50 ++
 drivers/iommu/intel-iommu.c |   76
 drivers/iommu/iommu.c       |  210 ++-
 include/linux/device.h      |    2
 include/linux/iommu.h       |   43 +
 5 files changed, 301 insertions(+), 80 deletions(-)
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On 04/10/2012 11:53 PM, Michael S. Tsirkin wrote:
On Tue, Apr 10, 2012 at 08:49:43AM -0700, Tejun Heo wrote:

Hello, guys.

On Tue, Apr 10, 2012 at 04:34:06PM +0300, Michael S. Tsirkin wrote:

Why not use 'base' below? Neither unit nor base change.

Yes it's a bit strange, it was the same in Tejun's patch. Tejun, any idea?

It was years ago, so I don't recall much. I think I wanted to use a variable name which signifies its role - I worked out the rather convoluted base number logic on paper first and I probably wanted to keep the distinctions. I don't think it really matters at this point tho. Just make sure those functions are marked deprecated so that no one else copies them.

Thanks.

I guess I'll keep it same so it's easier to deduplicate if someone wants to.

Why not fix it both in sd_format_disk_name() and virtblk_name_format()? Ren, mind sending v2 to drop the duplicate line?

--
tejun
___
Virtualization mailing list
virtualizat...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

--
Asias
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On 04/10/2012 09:16 PM, Avi Kivity wrote:
On 04/10/2012 10:28 AM, Ren Mingxin wrote:

The current virtio block's naming algorithm just supports 18278 (26^3 + 26^2 + 26) disks. If there are mass of virtio blocks, there will be disks with the same name.

Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add function virtblk_name_format() for virtio block to support mass of disks naming.

Signed-off-by: Ren Mingxin re...@cn.fujitsu.com
---
 drivers/block/virtio_blk.c |   38 ++
 1 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c4a60ba..86516c8 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -374,6 +374,31 @@ static int init_vq(struct virtio_blk *vblk)
 	return err;
 }
 
+static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
+{
+	const int base = 'z' - 'a' + 1;
+	char *begin = buf + strlen(prefix);
+	char *begin = buf + strlen(prefix);

Duplicate line.

Oh, obviously missed :-(

--
Thanks,
Ren
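
The naming scheme under discussion can be tried out in user space. The sketch below follows the loop from the posted patch (with the duplicate declaration dropped and `base` used in place of the separate `unit` variable); `name_format` is a hypothetical stand-in for virtblk_name_format(), and the two argument checks at the top are additions for the sketch, not part of the patch.

```c
#include <errno.h>
#include <string.h>

/*
 * Format "<prefix><letters>" for a disk index: 0 -> a, 25 -> z, 26 -> aa,
 * 701 -> zz, 702 -> aaa, and so on. Returns 0, or -EINVAL if the name does
 * not fit in buf.
 */
static int name_format(const char *prefix, int index, char *buf, int buflen)
{
	const int base = 'z' - 'a' + 1;
	char *begin, *end, *p;

	/* Added for this sketch: reject inputs the loop cannot handle. */
	if (index < 0 || buflen <= 0 || strlen(prefix) >= (size_t)buflen)
		return -EINVAL;

	begin = buf + strlen(prefix);
	end = buf + buflen;

	/* Build the letters right to left at the end of the buffer. */
	p = end - 1;
	*p = '\0';
	do {
		if (p == begin)
			return -EINVAL;	/* out of room */
		*--p = 'a' + (index % base);
		/* The "- 1" makes "aa" follow "z" instead of "ba". */
		index = (index / base) - 1;
	} while (index >= 0);

	/* Move the letters (and terminator) up behind the prefix. */
	memmove(begin, p, end - p);
	memcpy(buf, prefix, strlen(prefix));

	return 0;
}
```

Index 18277 gives "vdzzz", the last name the old three-letter code could produce, and 18278 continues with "vdaaaa", which is the case the patch is meant to cover.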
Re: [PATCH] virtio_blk: Add help function to format mass of disks
On 04/11/2012 09:21 AM, Asias He wrote:
On 04/10/2012 11:53 PM, Michael S. Tsirkin wrote:
On Tue, Apr 10, 2012 at 08:49:43AM -0700, Tejun Heo wrote:

Hello, guys.

On Tue, Apr 10, 2012 at 04:34:06PM +0300, Michael S. Tsirkin wrote:

Why not use 'base' below? Neither unit nor base change.

Yes it's a bit strange, it was the same in Tejun's patch. Tejun, any idea?

It was years ago, so I don't recall much. I think I wanted to use a variable name which signifies its role - I worked out the rather convoluted base number logic on paper first and I probably wanted to keep the distinctions. I don't think it really matters at this point tho. Just make sure those functions are marked deprecated so that no one else copies them.

Thanks.

I guess I'll keep it same so it's easier to deduplicate if someone wants to.

So, I'd keep this in the next version.

Why not fix it both in sd_format_disk_name() and virtblk_name_format()? Ren, mind sending v2 to drop the duplicate line?

I'll send v2 soon.

--
Thanks,
Ren
Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks
On Sat, Mar 31, 2012 at 12:07:58AM +0200, Thomas Gleixner wrote:
On Fri, 30 Mar 2012, H. Peter Anvin wrote:

What is the current status of this patchset? I haven't looked at it too closely because I have been focused on 3.4 up until now...

The real question is whether these heuristics are the correct approach or not.

If I look at it from the non virtualized kernel side then this is ass backwards. We know already that we are holding a spinlock which might cause other (v)cpus going into eternal spin. The non virtualized kernel solves this by disabling preemption and therefore getting out of the critical section as fast as possible.

The virtualization problem reminds me a lot of the problem which RT kernels are observing where non raw spinlocks are turned into sleeping spinlocks and therefore can cause throughput issues for non RT workloads.

Though the virtualized situation is even worse. Any preempted guest section which holds a spinlock is prone to cause unbounded delays.

The paravirt ticketlock solution can only mitigate the problem, but not solve it. With massive overcommit there is always a way to trigger worst case scenarios unless you are educating the scheduler to cope with that.

So if we need to fiddle with the scheduler and frankly that's the only way to get a real gain (the numbers, which are achieved by these patches, are not that impressive) then the question arises whether we should turn the whole thing around.

I know that Peter is going to go berserk on me, but if we are running a paravirt guest then it's simple to provide a mechanism which allows the host (aka hypervisor) to check that in the guest just by looking at some global state.

So if a guest exits due to an external event it's easy to inspect the state of that guest and avoid scheduling away when it was interrupted in a spinlock held section. That guest/host shared state needs to be modified to indicate the guest to invoke an exit when the last nested lock has been released.

Remember that the host is scheduling other processes than vcpus of guests. The case where a higher priority task (whatever that task is) interrupts a vcpu which holds a spinlock should be frequent, in an overcommit scenario. Whenever that is the case, other vcpus _must_ be able to stop spinning.

Now extrapolate that to guests with large number of vcpus. There is no replacement for sleep-in-hypervisor-instead-of-spin.

Of course this needs to be time bound, so a rogue guest cannot monopolize the cpu forever, but that's the least to worry about problem simply because a guest which does not get out of a spinlocked region within a certain amount of time is borked and eligible to killing anyway.

Thoughts ?

Thanks,
tglx
Re: [PATCH v2 uq/master] kvm: Drop unused kvm_pit_in_kernel
On Thu, Mar 22, 2012 at 12:00:48AM +0100, Jan Kiszka wrote:

From: Jan Kiszka jan.kis...@siemens.com

This is now implied by kvm_irqchip_in_kernel.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Rebased over latest uq/master.

 kvm-all.c  |    6 --
 kvm-stub.c |    6 --
 kvm.h      |    2 --
 3 files changed, 0 insertions(+), 14 deletions(-)

Applied, thanks.
[PATCH 0/4] Export offsets of VMCS fields as note information for kdump
This patch set exports offsets of VMCS fields as note information for kdump. We call it VMCSINFO. The purpose of VMCSINFO is to retrieve runtime state of guest machine image, such as registers, in host machine's crash dump as VMCS format. The problem is that the VMCS internal layout is hidden by Intel in its specification. So, we reverse engineer it in the way implemented in this patch set.

Please note that this processing never affects any existing kvm logic. The VMCSINFO is exported via sysfs to kexec-tools just like VMCOREINFO.

Here is an example:
Processor: Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz

$ cat /sys/kernel/vmcsinfo
1cba8c0 2000

crash> rd -p 1cba8c0 1000
1cba8c0: 127b0009 53434d56                  {...VMCS
1cba8d0: 4f464e49 4e4f495349564552          INFOREVISION
1cba8e0: 49460a643d44495f 5f4e495028444c45  _ID=d.FIELD(PIN_
1cba8f0: 4d565f4445534142 4f435f434558455f  BASED_VM_EXEC_CO
1cba900: 303d294c4f52544e 0a30383130343831  NTROL)=01840180.
1cba910: 504328444c454946 5f44455341425f55  FIELD(CPU_BASED_
1cba920: 5f434558455f4d56 294c4f52544e4f43  VM_EXEC_CONTROL)
1cba930: 393130343931303d 28444c4549460a30  =01940190.FIELD(
1cba940: 5241444e4f434553 4558455f4d565f59  SECONDARY_VM_EXE
1cba950: 4f52544e4f435f43 30346566303d294c  C_CONTROL)=0fe40
1cba960: 4c4549460a306566 4958455f4d562844  fe0.FIELD(VM_EXI
1cba970: 4f52544e4f435f54 346531303d29534c  T_CONTROLS)=01e4
1cba980: 4549460a30653130 4e455f4d5628444c  01e0.FIELD(VM_EN
1cba990: 544e4f435f595254 33303d29534c4f52  TRY_CONTROLS)=03
1cba9a0: 460a303133303431 45554728444c4549  140310.FIELD(GUE
1cba9b0: 45535f53455f5453 3d29524f5443454c  ST_ES_SELECTOR)=
1cba9c0: 4549460a30303530 545345554728444c  0500.FIELD(GUEST
1cba9d0: 454c45535f53435f 35303d29524f5443  _CS_SELECTOR)=05
..

TODO:
1. In kexec-tools, get VMCSINFO via sysfs and dump it as note information into vmcore.
2. Dump VMCS region of each guest vcpu and VMCSINFO into qemu-process core file. To do this, we will modify kernel core dumper, gdb gcore and crash gcore.
3. Dump guest image from the qemu-process core file into a vmcore.

zhangyanfei (4):
  x86: Add helper variables and functions to hold VMCSINFO
  KVM: VMX: Add functions to fill VMCSINFO
  ksysfs: export VMCSINFO via sysfs
  kexec: Add crash_save_vmcsinfo to update VMCSINFO

 arch/x86/include/asm/vmcsinfo.h |   42 +
 arch/x86/kernel/Makefile        |    2 +
 arch/x86/kernel/vmcsinfo.c      |   70
 arch/x86/kvm/vmx.c              |  350 +++
 include/linux/kexec.h           |    1 +
 kernel/kexec.c                  |   14 ++
 kernel/ksysfs.c                 |   19 ++
 7 files changed, 498 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/vmcsinfo.h
 create mode 100644 arch/x86/kernel/vmcsinfo.c
[PATCH 1/4] x86: Add helper variables and functions to hold VMCSINFO
This patch provides a set of variables to hold the VMCSINFO and also
some helper functions to help fill the VMCSINFO.

Signed-off-by: zhangyanfei <zhangyan...@cn.fujitsu.com>
---
 arch/x86/include/asm/vmcsinfo.h |   42 +++
 arch/x86/kernel/Makefile        |    2 +
 arch/x86/kernel/vmcsinfo.c      |   70 +++
 3 files changed, 114 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/vmcsinfo.h
 create mode 100644 arch/x86/kernel/vmcsinfo.c

diff --git a/arch/x86/include/asm/vmcsinfo.h b/arch/x86/include/asm/vmcsinfo.h
new file mode 100644
index 000..cfdc984
--- /dev/null
+++ b/arch/x86/include/asm/vmcsinfo.h
@@ -0,0 +1,42 @@
+#ifndef _ASM_X86_VMCSINFO_H
+#define _ASM_X86_VMCSINFO_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+#include <linux/elf.h>
+
+/*
+ * Currently, 2 pages are enough for vmcsinfo.
+ */
+#define VMCSINFO_BYTES (8192)
+#define VMCSINFO_NOTE_NAME "VMCSINFO"
+#define VMCSINFO_NOTE_NAME_BYTES ALIGN(sizeof(VMCSINFO_NOTE_NAME), 4)
+#define VMCSINFO_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
+#define VMCSINFO_NOTE_SIZE (VMCSINFO_NOTE_HEAD_BYTES*2 \
+				+ VMCSINFO_BYTES \
+				+ VMCSINFO_NOTE_NAME_BYTES)
+
+extern size_t vmcsinfo_size;
+extern size_t vmcsinfo_max_size;
+
+extern void update_vmcsinfo_note(void);
+extern void vmcsinfo_append_str(const char *fmt, ...);
+extern unsigned long paddr_vmcsinfo_note(void);
+
+#define VMCSINFO_REVISION_ID(id) \
+	vmcsinfo_append_str("REVISION_ID=%x\n", id)
+#define VMCSINFO_FIELD16(name, value) \
+	vmcsinfo_append_str("FIELD(%s)=%04x\n", #name, value)
+#define VMCSINFO_FIELD32(name, value) \
+	vmcsinfo_append_str("FIELD(%s)=%08x\n", #name, value)
+#define VMCSINFO_FIELD64(name, value) \
+	vmcsinfo_append_str("FIELD(%s)=%016llx\n", #name, value)
+
+#ifdef CONFIG_X86_64
+#define VMCSINFO_FIELD(name, value) VMCSINFO_FIELD64(name, value)
+#else
+#define VMCSINFO_FIELD(name, value) VMCSINFO_FIELD32(name, value)
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_VMCSINFO_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 532d2e0..63edf33 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -102,6 +102,8 @@ obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
 obj-$(CONFIG_SWIOTLB)			+= pci-swiotlb.o
 obj-$(CONFIG_OF)			+= devicetree.o
+obj-y					+= vmcsinfo.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/vmcsinfo.c b/arch/x86/kernel/vmcsinfo.c
new file mode 100644
index 000..c1306ef
--- /dev/null
+++ b/arch/x86/kernel/vmcsinfo.c
@@ -0,0 +1,70 @@
+/*
+ * Architecture specific (i386/x86_64) functions for storing vmcs
+ * field information.
+ *
+ * Created by: zhangyanfei (zhangyan...@cn.fujitsu.com)
+ *
+ * Copyright (C) Fujitsu Corporation, 2012. All rights reserved.
+ *
+ * This source code is licensed under the GNU General Public License,
+ * Version 2. See the file COPYING for more details.
+ */
+
+#include <asm/vmcsinfo.h>
+#include <linux/module.h>
+#include <linux/elf.h>
+
+static unsigned char vmcsinfo_data[VMCSINFO_BYTES];
+static u32 vmcsinfo_note[VMCSINFO_NOTE_SIZE/4];
+size_t vmcsinfo_max_size = sizeof(vmcsinfo_data);
+size_t vmcsinfo_size;
+EXPORT_SYMBOL(vmcsinfo_size);
+
+void update_vmcsinfo_note(void)
+{
+	u32 *buf = vmcsinfo_note;
+	struct elf_note note;
+
+	if (!vmcsinfo_size)
+		return;
+
+	note.n_namesz = strlen(VMCSINFO_NOTE_NAME) + 1;
+	note.n_descsz = vmcsinfo_size;
+	note.n_type = 0;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) + 3)/4;
+	memcpy(buf, VMCSINFO_NOTE_NAME, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, vmcsinfo_data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+EXPORT_SYMBOL(update_vmcsinfo_note);
+
+void vmcsinfo_append_str(const char *fmt, ...)
+{
+	va_list args;
+	char buf[0x50];
+	int r;
+
+	va_start(args, fmt);
+	r = vsnprintf(buf, sizeof(buf), fmt, args);
+	va_end(args);
+
+	if (r + vmcsinfo_size > vmcsinfo_max_size)
+		r = vmcsinfo_max_size - vmcsinfo_size;
+
+	memcpy(&vmcsinfo_data[vmcsinfo_size], buf, r);
+
+	vmcsinfo_size += r;
+}
+EXPORT_SYMBOL(vmcsinfo_append_str);
+
+unsigned long paddr_vmcsinfo_note(void)
+{
+	return __pa((unsigned long)(char *)vmcsinfo_note);
+}
--
1.7.1
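For readers unfamiliar with the note layout that update_vmcsinfo_note()
produces, here is a minimal userspace sketch of the same packing: a
three-word header, then the name padded to 4 bytes, then the descriptor
padded to 4 bytes. It is not the patch's code. The names note_hdr, round4
and build_note are made up for this illustration, and the sketch leaves
out the empty terminating note that the patch appends.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct elf_note: three 32-bit words. */
struct note_hdr {
	uint32_t n_namesz;	/* length of name, including the NUL */
	uint32_t n_descsz;	/* length of the descriptor payload */
	uint32_t n_type;
};

/* Round n up to the next multiple of 4, as the patch does with (n + 3)/4. */
static size_t round4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Pack one note (header, padded name, padded descriptor) into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t build_note(uint8_t *out, size_t outlen, const char *name,
			 const void *desc, uint32_t descsz)
{
	struct note_hdr hdr;
	size_t namesz = strlen(name) + 1;
	size_t total = sizeof(hdr) + round4(namesz) + round4(descsz);

	if (total > outlen)
		return 0;

	memset(out, 0, total);		/* padding bytes end up as zero */
	hdr.n_namesz = (uint32_t)namesz;
	hdr.n_descsz = descsz;
	hdr.n_type = 0;
	memcpy(out, &hdr, sizeof(hdr));
	memcpy(out + sizeof(hdr), name, namesz);
	memcpy(out + sizeof(hdr) + round4(namesz), desc, descsz);
	return total;
}
```

With the name "VMCSINFO" (9 bytes with the NUL, padded to 12) and a 5-byte
descriptor (padded to 8), the note occupies 12 + 12 + 8 = 32 bytes.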
[PATCH 2/4] KVM: VMX: Add functions to fill VMCSINFO
This patch fills VMCSINFO with a VMCS revision identifier and the encoded
offsets of VMCS fields when the kvm_intel module is initialized.

The reason why we put the VMCSINFO processing at the initialization of
the kvm_intel module is that it is dangerous to rob VMX resources while
the kvm module is loaded.

Note that the offsets of the fields below will not be filled into
VMCSINFO:
1. fields defined in the Intel specification (Intel® 64 and IA-32
   Architectures Software Developer’s Manual, Volume 3C) but not defined
   in *vmcs_field*.
2. fields that don't exist because their corresponding control bits are
   not set.

Signed-off-by: zhangyanfei <zhangyan...@cn.fujitsu.com>
---
 arch/x86/kvm/vmx.c |  350
 1 files changed, 350 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ad85adf..e98fafa 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -41,6 +41,7 @@
 #include <asm/i387.h>
 #include <asm/xcr.h>
 #include <asm/perf_event.h>
+#include <asm/vmcsinfo.h>
 
 #include "trace.h"
 
@@ -2599,6 +2600,353 @@ static __init int alloc_kvm_area(void)
 	return 0;
 }
 
+/*
+ * For calculating offsets of fields in VMCS data, we index every 16-bit
+ * field by this kind of format:
+ *         |---------- 16 bits -----------|
+ *         +-------------+-+------------+-+
+ *         | high 7 bits |1| low 7 bits |0|
+ *         +-------------+-+------------+-+
+ * In high byte, the lowest bit must be 1; In low byte, the lowest bit
+ * must be 0. The two bits are set like this in case indexes in VMCS
+ * data are read as big endian mode.
+ * The remaining 14 bits of the index indicate the real offset of the
+ * field. Because the size of a VMCS region is at most 4 KBytes, so
+ * 14 bits are enough to index the whole VMCS region.
+ *
+ * ENCODING_OFFSET: encode the offset into the index of this kind.
+ */
+#define OFFSET_HIGH_SHIFT (7)
+#define OFFSET_LOW_MASK ((1 << OFFSET_HIGH_SHIFT) - 1) /* 0x7f */
+#define OFFSET_HIGH_MASK (OFFSET_LOW_MASK << OFFSET_HIGH_SHIFT) /* 0x3f80 */
+#define ENCODING_OFFSET(offset) \
+	((((offset) & OFFSET_LOW_MASK) << 1) + \
+	((((offset) & OFFSET_HIGH_MASK) << 2) | 0x100))
+
+/*
+ * We separate these five control fields from other fields
+ * because some fields only exist on processors that support
+ * the 1-setting of control bits in the five control fields.
+ */
+static inline void append_control_field(void)
+{
+#define CONTROL_FIELD_OFFSET(field) \
+	VMCSINFO_FIELD32(field, vmcs_read32(field))
+
+	CONTROL_FIELD_OFFSET(PIN_BASED_VM_EXEC_CONTROL);
+	CONTROL_FIELD_OFFSET(CPU_BASED_VM_EXEC_CONTROL);
+	if (cpu_has_secondary_exec_ctrls()) {
+		CONTROL_FIELD_OFFSET(SECONDARY_VM_EXEC_CONTROL);
+	}
+	CONTROL_FIELD_OFFSET(VM_EXIT_CONTROLS);
+	CONTROL_FIELD_OFFSET(VM_ENTRY_CONTROLS);
+}
+
+static inline void append_field16(void)
+{
+#define FIELD_OFFSET16(field) \
+	VMCSINFO_FIELD16(field, vmcs_read16(field));
+
+	FIELD_OFFSET16(GUEST_ES_SELECTOR);
+	FIELD_OFFSET16(GUEST_CS_SELECTOR);
+	FIELD_OFFSET16(GUEST_SS_SELECTOR);
+	FIELD_OFFSET16(GUEST_DS_SELECTOR);
+	FIELD_OFFSET16(GUEST_FS_SELECTOR);
+	FIELD_OFFSET16(GUEST_GS_SELECTOR);
+	FIELD_OFFSET16(GUEST_LDTR_SELECTOR);
+	FIELD_OFFSET16(GUEST_TR_SELECTOR);
+	FIELD_OFFSET16(HOST_ES_SELECTOR);
+	FIELD_OFFSET16(HOST_CS_SELECTOR);
+	FIELD_OFFSET16(HOST_SS_SELECTOR);
+	FIELD_OFFSET16(HOST_DS_SELECTOR);
+	FIELD_OFFSET16(HOST_FS_SELECTOR);
+	FIELD_OFFSET16(HOST_GS_SELECTOR);
+	FIELD_OFFSET16(HOST_TR_SELECTOR);
+}
+
+static inline void append_field64(void)
+{
+#define FIELD_OFFSET64(field) \
+	VMCSINFO_FIELD64(field, vmcs_read64(field));
+
+	FIELD_OFFSET64(IO_BITMAP_A);
+	FIELD_OFFSET64(IO_BITMAP_A_HIGH);
+	FIELD_OFFSET64(IO_BITMAP_B);
+	FIELD_OFFSET64(IO_BITMAP_B_HIGH);
+	FIELD_OFFSET64(VM_EXIT_MSR_STORE_ADDR);
+	FIELD_OFFSET64(VM_EXIT_MSR_STORE_ADDR_HIGH);
+	FIELD_OFFSET64(VM_EXIT_MSR_LOAD_ADDR);
+	FIELD_OFFSET64(VM_EXIT_MSR_LOAD_ADDR_HIGH);
+	FIELD_OFFSET64(VM_ENTRY_MSR_LOAD_ADDR);
+	FIELD_OFFSET64(VM_ENTRY_MSR_LOAD_ADDR_HIGH);
+	FIELD_OFFSET64(TSC_OFFSET);
+	FIELD_OFFSET64(TSC_OFFSET_HIGH);
+	FIELD_OFFSET64(VMCS_LINK_POINTER);
+	FIELD_OFFSET64(VMCS_LINK_POINTER_HIGH);
+	FIELD_OFFSET64(GUEST_IA32_DEBUGCTL);
+	FIELD_OFFSET64(GUEST_IA32_DEBUGCTL_HIGH);
+
+	if (cpu_has_vmx_msr_bitmap()) {
+		FIELD_OFFSET64(MSR_BITMAP);
+		FIELD_OFFSET64(MSR_BITMAP_HIGH);
+	}
+
+	if (cpu_has_vmx_tpr_shadow()) {
+		FIELD_OFFSET64(VIRTUAL_APIC_PAGE_ADDR);
+		FIELD_OFFSET64(VIRTUAL_APIC_PAGE_ADDR_HIGH);
+	}
+
+	if (cpu_has_secondary_exec_ctrls()) {
+		if (vmcs_config.cpu_based_2nd_exec_ctrl
+
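The index format described in the comment above ENCODING_OFFSET can be
tried out in userspace. The sketch below is a reading of that comment, not
code from the patch: the operators in the archived macro did not survive,
so the shifts and masks here are an assumption based on the bit layout the
comment describes, and encode_offset()/decode_offset() are hypothetical
names (the patch only defines the encoding direction).

```c
#include <stdint.h>

#define OFFSET_HIGH_SHIFT 7
#define OFFSET_LOW_MASK   ((1u << OFFSET_HIGH_SHIFT) - 1)		/* 0x7f */
#define OFFSET_HIGH_MASK  (OFFSET_LOW_MASK << OFFSET_HIGH_SHIFT)	/* 0x3f80 */

/*
 * Spread a 14-bit offset over a 16-bit index: the low 7 bits go to
 * bits 1..7, the high 7 bits go to bits 9..15, bit 8 is forced to 1
 * and bit 0 stays 0.
 */
static uint16_t encode_offset(uint16_t offset)
{
	return (uint16_t)(((offset & OFFSET_LOW_MASK) << 1) |
			  ((offset & OFFSET_HIGH_MASK) << 2) | 0x100);
}

/* Inverse of encode_offset(): drop the two marker bits again. */
static uint16_t decode_offset(uint16_t index)
{
	return (uint16_t)(((index >> 1) & OFFSET_LOW_MASK) |
			  ((index >> 2) & OFFSET_HIGH_MASK));
}
```

Under this reading, the value 01840180 shown for PIN_BASED_VM_EXEC_CONTROL
in the cover letter splits into the indexes 0180 and 0184, which decode to
0x40 and 0x42, i.e. two adjacent 16-bit units.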
Re: [PATCH 00/13] KVM: MMU: fast page fault
On Tue, Apr 10, 2012 at 01:04:13PM +0300, Avi Kivity wrote:
On 04/09/2012 10:46 PM, Marcelo Tosatti wrote:

Perhaps the mmu_lock hold times by get_dirty are a large component here?

That's my concern, because it affects the scaling of migration for wider
guests. If that can be alleviated, not only RO->RW faults benefit.

Those are the most common types of faults on modern hardware, no?

Depends on your workload, of course. If there is memory pressure,
0->PRESENT might be very frequent. My point is that reduction of mmu_lock
contention is a good thing overall.
[PATCH 3/4] ksysfs: export VMCSINFO via sysfs
This patch creates a sysfs file that exports where VMCSINFO is allocated,
as below:

$ cat /sys/kernel/vmcsinfo
1cb88a0 2000

The number on the left-hand side is the physical address of VMCSINFO,
while the one on the right-hand side is the maximum size of VMCSINFO.

Signed-off-by: zhangyanfei <zhangyan...@cn.fujitsu.com>
---
 kernel/ksysfs.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 4e316e1..becbb68 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -18,6 +18,8 @@
 #include <linux/stat.h>
 #include <linux/sched.h>
 #include <linux/capability.h>
+#include <asm/vmcsinfo.h>
+#include <asm/virtext.h>
 
 #define KERNEL_ATTR_RO(_name) \
 static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
@@ -133,6 +135,20 @@ KERNEL_ATTR_RO(vmcoreinfo);
 
 #endif /* CONFIG_KEXEC */
 
+#ifdef CONFIG_X86
+static ssize_t vmcsinfo_show(struct kobject *kobj,
+			     struct kobj_attribute *attr, char *buf)
+{
+	if (cpu_has_vmx())
+		return sprintf(buf, "%lx %x\n",
+			       paddr_vmcsinfo_note(),
+			       (unsigned int)vmcsinfo_max_size);
+	return 0;
+}
+KERNEL_ATTR_RO(vmcsinfo);
+
+#endif /* CONFIG_X86 */
+
 /* whether file capabilities are enabled */
 static ssize_t fscaps_show(struct kobject *kobj,
 			   struct kobj_attribute *attr, char *buf)
@@ -182,6 +198,9 @@ static struct attribute * kernel_attrs[] = {
 	&kexec_crash_size_attr.attr,
 	&vmcoreinfo_attr.attr,
 #endif
+#ifdef CONFIG_X86
+	&vmcsinfo_attr.attr,
+#endif
 	NULL
 };
 
--
1.7.1
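A consumer such as kexec-tools would have to parse the two hexadecimal
numbers from that file. A minimal sketch of such a parser follows. It is
only an illustration of the file format described above, not code from
kexec-tools, and parse_vmcsinfo() is a hypothetical name.

```c
#include <stdio.h>

/*
 * Parse one line in the "<paddr> <max_size>" format, where both values
 * are hexadecimal without a 0x prefix. Returns 0 on success and -1 if
 * the line does not contain two hex numbers.
 */
static int parse_vmcsinfo(const char *line, unsigned long long *paddr,
			  unsigned long long *max_size)
{
	if (sscanf(line, "%llx %llx", paddr, max_size) != 2)
		return -1;
	return 0;
}
```

For the example output above, this yields the physical address 0x1cb88a0
and a maximum size of 0x2000 bytes, which matches the two pages that
VMCSINFO_BYTES reserves.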
[PATCH 4/4] kexec: Add crash_save_vmcsinfo to update VMCSINFO
crash_save_vmcsinfo updates the VMCSINFO when the kernel crashes. If no
VMCSINFO has been saved before, this function does nothing.

Signed-off-by: zhangyanfei <zhangyan...@cn.fujitsu.com>
---
 include/linux/kexec.h |    1 +
 kernel/kexec.c        |   14 ++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0d7d6a1..6e8ff13 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -145,6 +145,7 @@ void arch_crash_save_vmcoreinfo(void);
 __printf(1, 2)
 void vmcoreinfo_append_str(const char *fmt, ...);
 unsigned long paddr_vmcoreinfo_note(void);
+void crash_save_vmcsinfo(void);
 
 #define VMCOREINFO_OSRELEASE(value) \
 	vmcoreinfo_append_str("OSRELEASE=%s\n", value)
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 4e2e472..19843ef 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -38,6 +38,7 @@
 #include <asm/uaccess.h>
 #include <asm/io.h>
 #include <asm/sections.h>
+#include <asm/vmcsinfo.h>
 
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
@@ -1094,6 +1095,7 @@ void crash_kexec(struct pt_regs *regs)
 			crash_setup_regs(&fixed_regs, regs);
 			crash_save_vmcoreinfo();
+			crash_save_vmcsinfo();
 			machine_crash_shutdown(&fixed_regs);
 			machine_kexec(kexec_crash_image);
 		}
@@ -1458,6 +1460,18 @@ unsigned long __attribute__ ((weak)) paddr_vmcoreinfo_note(void)
 	return __pa((unsigned long)(char *)vmcoreinfo_note);
 }
 
+#ifdef CONFIG_X86
+void crash_save_vmcsinfo(void)
+{
+	if (!vmcsinfo_size)
+		return;
+	vmcsinfo_append_str("CRASHTIME=%ld", get_seconds());
+	update_vmcsinfo_note();
+}
+#else
+void crash_save_vmcsinfo(void) {}
+#endif /* CONFIG_X86 */
+
 static int __init crash_save_vmcoreinfo_init(void)
 {
 	VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
--
1.7.1
Linux Crash Caused By KVM?
Hi all,

I have run into some problems while using KVM. The test environment is:

Summary:      Dell R610, 1 x Xeon E5645 2.40GHz, 47.1GB / 48GB 1333MHz DDR3
System:       Dell PowerEdge R610 (Dell 08GXHX)
Processors:   1 (of 2) x Xeon E5645 2.40GHz 5860MHz FSB (HT enabled, 6 cores, 24 threads)
Memory:       47.1GB / 48GB 1333MHz DDR3 == 12 x 4GB
Disk:         sda: 299GB (72%) JBOD
Disk:         sdb (host9): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk:         sdc (host11): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk:         sdd (host12): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk:         sde (host10): 5.0TB JBOD == 1 x VIRTUAL-DISK
Disk-Control: mpt2sas0: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
Disk-Control: host9:
Disk-Control: host10:
Disk-Control: host11:
Disk-Control: host12:
Chipset:      Intel 82801IB (ICH9)
Network:      br1 (bridge): 14:fe:b5:dc:2c:6e
Network:      em1 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit, 14:fe:b5:dc:2c:6e, 1000Mb/s full-duplex
Network:      em2 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit, 14:fe:b5:dc:2c:70, 1000Mb/s full-duplex
Network:      em3 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit, 14:fe:b5:dc:2c:72, 1000Mb/s full-duplex
Network:      em4 (bnx2): Broadcom NetXtreme II BCM5709 Gigabit, 14:fe:b5:dc:2c:74, 1000Mb/s full-duplex
Network:      vnet0 (tun): fe:16:3e:49:fb:05, 10Mb/s full-duplex
Network:      vnet1 (tun): fe:16:3e:cb:c0:d1, 10Mb/s full-duplex
Network:      vnet2 (tun): fe:16:3e:1e:c1:c4, 10Mb/s full-duplex
Network:      vnet3 (tun): fe:16:3e:d5:58:f4, 10Mb/s full-duplex
Network:      vnet4 (tun): fe:16:3e:15:b4:16, 10Mb/s full-duplex
Network:      vnet5 (tun): fe:16:3e:d2:07:47, 10Mb/s full-duplex
Network:      vnet6 (tun): fe:16:3e:e1:2b:b9, 10Mb/s full-duplex
OS:           RHEL Server 6.1 (Santiago), Linux 2.6.32-220.2.1.el6.x86_64 x86_64, 64-bit
BIOS:         Dell 3.0.0 01/31/2011

While using KVM, the following issues have occurred:

1. Host crash caused by:

a. Kernel panic

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux
    DUMPFILE: ../vmcore_2012.13.46 [PARTIAL DUMP]
        CPUS: 24
        DATE: Wed Jan 11 13:34:13 2012
      UPTIME: 25 days, 04:11:05
LOAD AVERAGE: 223.16, 172.97, 158.23
       TASKS: 1464
    NODENAME: dell2.localdomain
     RELEASE: 2.6.32-131.12.1.el6.x86_64
     VERSION: #1 SMP Sun Jul 31 16:44:56 EDT 2011
     MACHINE: x86_64 (2394 Mhz)
      MEMORY: 48 GB
       PANIC: kernel BUG at arch/x86/kernel/traps.c:547!
         PID: 11851
     COMMAND: qemu-kvm
        TASK: 880c071c3500 [THREAD_INFO: 880c132d8000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

PID: 11851 TASK: 880c071c3500 CPU: 1 COMMAND: qemu-kvm
 #0 [880028207be0] machine_kexec at 810310cb
 #1 [880028207c40] crash_kexec at 810b6392
 #2 [880028207d10] oops_end at 814de670
 #3 [880028207d40] die at 8100f2eb
 #4 [880028207d70] do_trap at 814ddf64
 #5 [880028207dd0] do_invalid_op at 8100ceb5
 #6 [880028207e70] invalid_op at 8100bf5b
    [exception RIP: do_nmi+554]
    RIP: 814de43a RSP: 880028207f28 RFLAGS: 00010002
    RAX: 880c132d9fd8 RBX: 880028207f58 RCX: c101
    RDX: 8800 RSI: RDI: 880028207f58
    RBP: 880028207f48 R8: 88005ebf9800 R9: 880028203fc0
    R10: 0034 R11: 03e8 R12: cc20
    R13: 816024a0 R14: 88005ebf9800 R15: 7000
    ORIG_RAX: CS: 0010 SS: 0018
 #7 [880028207f50] nmi at 814ddc90
    [exception RIP: bad_to_user+37]
    RIP: 814e4e2b RSP: 880028207bb0 RFLAGS: 00010046
    RAX: 880c132d9fd8 RBX: 880c132d9c48 RCX: 0001
    RDX: RSI: 0001000b RDI: 880028207c08
    RBP: 880028207c48 R8: 88005ebf9800 R9: 880028203fc0
    R10: 0034 R11: 03e8 R12: cc20
    R13: 816024a0 R14: 88005ebf9800 R15: 7000
    ORIG_RAX: CS: 0010 SS: 0018
--- NMI exception stack ---

For this problem, I found that the panic is caused by BUG_ON(in_nmi()),
which means that an NMI happened while another NMI was being handled. But
I checked the Intel manual and found this: "While an NMI interrupt handler
is executing, the processor disables additional calls to the NMI handler
until the next IRET instruction is executed." So, how can this happen?

b. Qemu process's CPU deadlock

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-131.12.1.el6.x86_64/vmlinux
    DUMPFILE:
Re: [PATCH] kvm: set gsi_bits and max_gsi correctly
On Wed, Mar 28, 2012 at 02:18:05PM -0400, Jason Baron wrote:
> The current kvm_init_irq_routing() doesn't set up the used_gsi_bitmap
> correctly, and as a consequence pins max_gsi to 32 when it really
> should be 1024. I ran into this limitation while testing pci
> passthrough, where I consistently got an -ENOSPC return from
> kvm_get_irq_route_gsi() called from assigned_dev_update_msix_mmio().
>
> Signed-off-by: Jason Baron <jba...@redhat.com>

Applied to uq/master, thanks.
[PATCH v2] virtio_blk: Add help function to format mass of disks
The current virtio block naming algorithm supports only 18278
(26^3 + 26^2 + 26) disks. If there are more virtio block devices than
that, some disks end up with the same name.

Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, I add the
function virtblk_name_format() so that virtio block can name a large
number of disks.

Signed-off-by: Ren Mingxin <re...@cn.fujitsu.com>
---
v1->v2: wipe off the duplicate line
---
 drivers/block/virtio_blk.c |   37 +
 1 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index c4a60ba..07b8bf9 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -374,6 +374,30 @@ static int init_vq(struct virtio_blk *vblk)
 	return err;
 }
 
+static int virtblk_name_format(char *prefix, int index, char *buf, int buflen)
+{
+	const int base = 'z' - 'a' + 1;
+	char *begin = buf + strlen(prefix);
+	char *end = buf + buflen;
+	char *p;
+	int unit;
+
+	p = end - 1;
+	*p = '\0';
+	unit = base;
+	do {
+		if (p == begin)
+			return -EINVAL;
+		*--p = 'a' + (index % unit);
+		index = (index / unit) - 1;
+	} while (index >= 0);
+
+	memmove(begin, p, end - p);
+	memcpy(buf, prefix, strlen(prefix));
+
+	return 0;
+}
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
@@ -442,18 +466,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 
 	q->queuedata = vblk;
 
-	if (index < 26) {
-		sprintf(vblk->disk->disk_name, "vd%c", 'a' + index % 26);
-	} else if (index < (26 + 1) * 26) {
-		sprintf(vblk->disk->disk_name, "vd%c%c",
-			'a' + index / 26 - 1, 'a' + index % 26);
-	} else {
-		const unsigned int m1 = (index / 26 - 1) / 26 - 1;
-		const unsigned int m2 = (index / 26 - 1) % 26;
-		const unsigned int m3 = index % 26;
-		sprintf(vblk->disk->disk_name, "vd%c%c%c",
-			'a' + m1, 'a' + m2, 'a' + m3);
-	}
+	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
 
 	vblk->disk->major = major;
 	vblk->disk->first_minor = index_to_minor(index);
--
1.7.1
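To see what names the new helper produces around the old 18278-disk
boundary, the loop can be lifted into a small userspace function. This is
a sketch, not the kernel code: name_format() is a hypothetical name, and
the up-front check on index and buflen is an addition for the standalone
version that the patch does not have.

```c
#include <errno.h>
#include <string.h>

/*
 * Write prefix followed by a letter suffix (a..z, aa..zz, aaa..., with
 * no upper limit other than the buffer size) into buf.
 * Returns 0 on success or -EINVAL if the name does not fit.
 */
static int name_format(const char *prefix, int index, char *buf, int buflen)
{
	const int base = 'z' - 'a' + 1;
	char *begin = buf + strlen(prefix);
	char *end = buf + buflen;
	char *p = end - 1;

	/* Added for this sketch: need room for prefix, one letter, NUL. */
	if (index < 0 || (int)strlen(prefix) + 2 > buflen)
		return -EINVAL;

	/* Build the suffix backwards from the end of the buffer. */
	*p = '\0';
	do {
		if (p == begin)
			return -EINVAL;
		*--p = 'a' + (index % base);
		index = (index / base) - 1;
	} while (index >= 0);

	/* Move the suffix next to the prefix, then copy the prefix in. */
	memmove(begin, p, end - p);
	memcpy(buf, prefix, strlen(prefix));
	return 0;
}
```

The "- 1" after each division is what makes "vdaa" follow "vdz" instead of
"vdba": every suffix length starts again at all 'a's. Index 18277 gives
"vdzzz", the last name the old code could produce, and index 18278 gives
"vdaaaa".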
Re: [PATCH v2] KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()
On 04/10/2012 09:05 PM, Takuya Yoshikawa wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 29ad6f9..a50f7ba 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3930,16 +3930,30 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
>  	kvm_flush_remote_tlbs(kvm);
>  }
>
> -void kvm_mmu_zap_all(struct kvm *kvm)
> +/**
> + * kvm_mmu_zap_all - zap all shadows which have mappings into a given slot
> + * @kvm: the kvm instance
> + * @slot: id of the target slot
> + *
> + * If @slot is -1, zap all shadow pages.
> + */
> +void kvm_mmu_zap_all(struct kvm *kvm, int slot)
>  {
>  	struct kvm_mmu_page *sp, *node;
>  	LIST_HEAD(invalid_list);
> +	int zapped;
>
>  	spin_lock(&kvm->mmu_lock);
>  restart:
> -	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
> -		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> -			goto restart;
> +	zapped = 0;
> +	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
> +		if ((slot >= 0) && !test_bit(slot, sp->slot_bitmap))
> +			continue;
> +
> +		zapped |= kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);

You should goto restart here, like the original code. Also, the safe
version of list_for_each is not needed.