Re: [kvm-devel] [kvm-ppc-devel] [PATCH 1/5]Add some trace markers and exposeinterfaces in kernel for tracing
Liu, Eric E wrote:
Hollis Blanchard wrote:
On Wednesday 16 April 2008 01:45:34 Liu, Eric E wrote:
[...]
Actually... we could have kvmtrace itself insert the metadata, so there would be no chance of it being overwritten in the kernel buffers. The header could be written in tip_open_output(), and update fs_size accordingly.

Yes, letting kvmtrace insert the metadata is more reasonable.

I wanted to note that the kvmtrace tool may, but should not need to, know everything about the data format. I am thinking of, e.g., changing kernel implementations that change endianness, or even flags we don't know yet but might need in the future.

What about adding another debugfs entry that the kernel can use to expose the kvmtrace metadata defined by the kernel implementation? The kvmtrace tool could then use that to build up the record, using one entry for kernel-defined metadata and another for any metadata defined by the kvmtrace tool itself.

What about this one:

struct metadata {
    u32 kmagic; /* stores kernel defined metadata read from debugfs entry */
    u32 umagic; /* stores userspace tool defined metadata */
    u32 extra;  /* it is redundant, only use to fit into record. */
};

That should give us the flexibility to keep the format if we get more metadata requirements in the future.

--
Grüsse / regards,
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
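The proposal above could be sketched in userspace C roughly as follows. This is only an illustration under assumptions: the struct layout is the one quoted in the mail, but make_metadata(), write_metadata(), and magic_byte_order() are hypothetical helper names, and using kmagic to detect a byte-swapped trace is one possible reading of the endianness remark, not something the thread settles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout quoted in the thread, with u32 spelled as uint32_t. */
struct metadata {
    uint32_t kmagic; /* kernel-defined metadata, read from a debugfs entry */
    uint32_t umagic; /* metadata defined by the userspace tool */
    uint32_t extra;  /* padding so the header fits a record slot */
};

/* Hypothetical helper: combine the kernel value and the tool value. */
static struct metadata make_metadata(uint32_t kmagic, uint32_t umagic)
{
    struct metadata md = { .kmagic = kmagic, .umagic = umagic, .extra = 0 };
    return md;
}

/*
 * Hypothetical helper: copy the header to the start of an output buffer.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t write_metadata(unsigned char *buf, size_t len,
                             const struct metadata *md)
{
    if (len < sizeof(*md))
        return 0;
    memcpy(buf, md, sizeof(*md));
    return sizeof(*md);
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Assumed use of kmagic by a trace reader: 0 means the trace has the
 * reader's byte order, 1 means it is byte-swapped, -1 means unknown magic.
 */
static int magic_byte_order(uint32_t seen, uint32_t expected)
{
    if (seen == expected)
        return 0;
    if (seen == swap32(expected))
        return 1;
    return -1;
}
```

A writer would call write_metadata() once before the first trace record; a reader would pass the first word of the file to magic_byte_order() before interpreting anything else.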
Re: [kvm-devel] kvm-65/66 bug with Solaris 10 U4 ?
Avi Kivity wrote:
Actually kvm is affected by pae: it enables nx support. Please try (separately)

1. Boot with 'noexec=off' on the host kernel command line

2.6.24.4-64.fc8PAE noexec=off: Using normal F8 modules qemu-kvm dies in the same way

2. Loading the kernel modules that come with kvm-66

Against 2.6.24.4-64.fc8 it works. I can't compile them against 2.6.24.4-64.fc8PAE as the module magic name mismatches, and I don't know how to change kernel-devel to know it's PAE.

Probably won't be able to do any more tests that require a reboot till Tuesday now, but feel free to leave me some things to try.

Ian
Re: [kvm-devel] kvm-65/66 bug with Solaris 10 U4 ?
Ian Kirk wrote:
Avi Kivity wrote:
I do this regularly, basically you need to install kernel-devel and that's it.

Yes, that is very easy isn't it. Oops to my stupidity. I've got it built and will give it a go tomorrow and report back on each test case.

Please don't flame on kvm-devel, even if the flames are self-directed.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[kvm-devel] (no subject)
Dear managers: greetings!

With sincere wishes: may 2008 bring you countless happiness, countless gains, countless money, countless good fortune, and a countlessly fulfilling life! May your whole family be joyful, happy, and healthy!

I am the person in charge of 深圳市珊湖岛进出口有限公司 (an import and export company in Shenzhen). We can provide export customs declaration forms, export verification forms, and a range of related paperwork; act as agent for export customs declaration, commodity inspection, domestic and overseas transport, and so on; arrange EU export licences and EU certificates of origin; and we also have booths at the 广州国际贸易交易会 (Guangzhou international trade fair) available for transfer. If you are interested, please contact us by email or phone.

Tel: 0755-81153047.
Fax: 0755-81172940.
Mobile: 15817477278.
Contact: 钟文辉.

Respectfully yours.
Re: [kvm-devel] kvm-65/66 bug with Solaris 10 U4 ?
Avi Kivity wrote:
Yes, that is very easy isn't it. Oops to my stupidity. I've got it built and will give it a go tomorrow and report back on each test case.

Please don't flame on kvm-devel, even if the flames are self-directed.

Er, OK...
[kvm-devel] [PATCH 6/6] kvm: qemu: Enable EPT support for real mode
From 73c33765f3d879001818cd0719038c78a0c65561 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Fri, 18 Apr 2008 17:15:39 +0800
Subject: [PATCH] kvm: qemu: Enable EPT support for real mode

This patch builds an identity page table on the last page of the VGA BIOS and uses it as the guest page table in nonpaging mode for EPT.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 qemu/hw/pc.c | 14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index ae87ab9..dcb98c6 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -780,6 +780,9 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
     int index;
     BlockDriverState *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
     BlockDriverState *fd[MAX_FD];
+#ifdef USE_KVM
+    uint32_t *table_items;
+#endif
 
     if (ram_size >= 0xe000 ) {
         above_4g_mem_size = ram_size - 0xe000;
@@ -857,6 +860,17 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
         exit(1);
     }
 
+#ifdef USE_KVM
+    if (kvm_allowed) {
+        /* set up identity map for EPT at the last page of VGA BIOS region.
+         * 0xe7 = _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED |
+         *        _PAGE_DIRTY | _PAGE_PSE */
+        table_items = (void *)(phys_ram_base + vga_bios_offset + 0xf000);
+        for (i = 0; i < 1024; i++)
+            table_items[i] = (i << 22) + 0xe7;
+    }
+#endif
+
     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
         ram_addr = qemu_ram_alloc(above_4g_mem_size);
-- 
1.5.4.5
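The loop in this patch can be tried outside QEMU. The sketch below is a standalone illustration, not the QEMU code: build_identity_table() and entry_base() are hypothetical names, and the caller supplies the 4 KiB buffer instead of using phys_ram_base. It assumes what the patch comment states, namely 1024 32-bit entries with the PSE bit set, so each entry maps one 4 MiB region onto itself.

```c
#include <stdint.h>

#define IDENTITY_ENTRIES 1024u /* one 4 KiB page of 32-bit entries */
#define LARGE_PAGE_SHIFT 22    /* each entry covers 4 MiB */
#define IDENTITY_FLAGS   0xe7u /* flag value given in the patch comment */

/* Fill a page directory so that entry i maps the i-th 4 MiB region to itself. */
static void build_identity_table(uint32_t *table)
{
    for (uint32_t i = 0; i < IDENTITY_ENTRIES; i++)
        table[i] = (i << LARGE_PAGE_SHIFT) + IDENTITY_FLAGS;
}

/* Physical base address mapped by an entry: clear the low 22 bits. */
static uint32_t entry_base(uint32_t entry)
{
    return entry & ~((1u << LARGE_PAGE_SHIFT) - 1u);
}
```

With 1024 entries of 4 MiB each, the table covers the whole 32-bit address space, which is why a single page is enough for a guest that has paging disabled.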
[kvm-devel] [PATCH 1/6] KVM: VMX: EPT Feature Detection
From 9e723871299268e844c9e72f3903ba5f4eb71751 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Fri, 18 Apr 2008 17:02:59 +0800
Subject: [PATCH 1/5] KVM: VMX: EPT Feature Detection

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c | 63 +++
 arch/x86/kvm/vmx.h | 25
 2 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8e5d664..d93250d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -42,6 +42,9 @@ module_param(enable_vpid, bool, 0);
 static int flexpriority_enabled = 1;
 module_param(flexpriority_enabled, bool, 0);
 
+static int enable_ept;
+module_param(enable_ept, bool, 0);
+
 struct vmcs {
 	u32 revision_id;
 	u32 abort;
@@ -107,6 +110,11 @@ static struct vmcs_config {
 	u32 vmentry_ctrl;
 } vmcs_config;
 
+struct vmx_capability {
+	u32 ept;
+	u32 vpid;
+} vmx_capability;
+
 #define VMX_SEGMENT_FIELD(seg) \
 	[VCPU_SREG_##seg] = { \
 		.selector = GUEST_##seg##_SELECTOR, \
@@ -214,6 +222,32 @@ static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
 }
 
+static inline int cpu_has_vmx_invept_individual_addr(void)
+{
+	return (!!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT));
+}
+
+static inline int cpu_has_vmx_invept_context(void)
+{
+	return (!!(vmx_capability.ept & VMX_EPT_EXTENT_CONTEXT_BIT));
+}
+
+static inline int cpu_has_vmx_invept_global(void)
+{
+	return (!!(vmx_capability.ept & VMX_EPT_EXTENT_GLOBAL_BIT));
+}
+
+static inline int cpu_has_vmx_ept(void)
+{
+	return (vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_ENABLE_EPT);
+}
+
+static inline int vm_need_ept(void)
+{
+	return (cpu_has_vmx_ept() && enable_ept);
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return ((cpu_has_vmx_virtualize_apic_accesses()) &&
@@ -985,7 +1019,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
 static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 {
 	u32 vmx_msr_low, vmx_msr_high;
-	u32 min, opt;
+	u32 min, opt, min2, opt2;
 	u32 _pin_based_exec_control = 0;
 	u32 _cpu_based_exec_control = 0;
 	u32 _cpu_based_2nd_exec_control = 0;
@@ -1003,6 +1037,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_CR8_LOAD_EXITING |
 	      CPU_BASED_CR8_STORE_EXITING |
 #endif
+	      CPU_BASED_CR3_LOAD_EXITING |
+	      CPU_BASED_CR3_STORE_EXITING |
 	      CPU_BASED_USE_IO_BITMAPS |
 	      CPU_BASED_MOV_DR_EXITING |
 	      CPU_BASED_USE_TSC_OFFSETING;
@@ -1018,11 +1054,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 					   ~CPU_BASED_CR8_STORE_EXITING;
 #endif
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) {
-		min = 0;
-		opt = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+		min2 = 0;
+		opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 			SECONDARY_EXEC_WBINVD_EXITING |
-			SECONDARY_EXEC_ENABLE_VPID;
-		if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS2,
+			SECONDARY_EXEC_ENABLE_VPID |
+			SECONDARY_EXEC_ENABLE_EPT;
+		if (adjust_vmx_controls(min2, opt2,
+					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
 			return -EIO;
 	}
@@ -1031,6 +1069,16 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES))
 		_cpu_based_exec_control &= ~CPU_BASED_TPR_SHADOW;
 #endif
+	if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
+		/* CR3 accesses don't need to cause VM Exits when EPT enabled */
+		min &= ~(CPU_BASED_CR3_LOAD_EXITING |
+			 CPU_BASED_CR3_STORE_EXITING);
+		if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
+					&_cpu_based_exec_control) < 0)
+			return -EIO;
+		rdmsr(MSR_IA32_VMX_EPT_VPID_CAP,
+		      vmx_capability.ept, vmx_capability.vpid);
+	}
 
 	min = 0;
 #ifdef CONFIG_X86_64
@@ -1638,6 +1686,9 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 				CPU_BASED_CR8_LOAD_EXITING;
 #endif
 	}
+	if (!vm_need_ept())
+		exec_control |= CPU_BASED_CR3_STORE_EXITING |
+				CPU_BASED_CR3_LOAD_EXITING;
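The detection above treats EPT as an optional control: it is requested in the "opt" set, and the code later checks whether the bit survived. A minimal userspace sketch of that min/opt negotiation follows. It rests on assumptions: the body of adjust_vmx_controls() is not quoted in this mail, so demo_adjust_controls() is a hypothetical stand-in that takes the two halves of the capability MSR as plain parameters instead of calling rdmsr(), and the DEMO_CTL_* bit positions are illustrative rather than taken from vmx.h.

```c
#include <stdint.h>

/* Illustrative control bits; the real values live in vmx.h. */
#define DEMO_CTL_EPT  (1u << 1)
#define DEMO_CTL_VPID (1u << 5)

/*
 * Negotiate a control word against a VMX capability MSR, modelled as:
 *   msr_low  - bits that the CPU requires to be 1
 *   msr_high - bits that the CPU allows to be 1
 * Returns 0 and stores the usable controls, or -1 if a required
 * (minimum) control is not available.
 */
static int demo_adjust_controls(uint32_t ctl_min, uint32_t ctl_opt,
                                uint32_t msr_low, uint32_t msr_high,
                                uint32_t *result)
{
    uint32_t ctl = ctl_min | ctl_opt;

    ctl &= msr_high; /* drop anything the CPU cannot set */
    ctl |= msr_low;  /* add anything the CPU insists on */

    if (ctl_min & ~ctl)
        return -1;   /* a mandatory control went missing */

    *result = ctl;
    return 0;
}

/* Same shape as cpu_has_vmx_ept(): test whether the optional bit survived. */
static int demo_has_ept(uint32_t secondary_ctl)
{
    return !!(secondary_ctl & DEMO_CTL_EPT);
}
```

Requesting EPT only as optional means hardware without it still initialises, and a vm_need_ept()-style check falls back to shadow paging with CR3 exiting left enabled.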
[kvm-devel] [PATCH 0/6] Enable EPT on KVM v3
Hi

This patchset enables EPT on KVM. The most obvious improvement is that the separate construction of the EPT table has been discarded completely. EPT now reuses the ordinary MMU code to build the EPT table. The code size is greatly reduced, and this also solves the display problem. But I think it also has an impact on scalability...

Currently, S/R and live migration still have problems. I am working on them now.

--
Thanks
Yang, Sheng
[kvm-devel] [PATCH 3/6] KVM: MMU: Add EPT support
From cb851671421832d37c7d90976b603b59a5c75c79 Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Fri, 18 Apr 2008 17:05:06 +0800
Subject: [PATCH 3/5] KVM: MMU: Add EPT support

Enable kvm_set_spte() to generate EPT entries.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c | 44 ++--
 arch/x86/kvm/x86.c | 3 +++
 include/asm-x86/kvm_host.h | 3 +++
 3 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 108886d..1828837 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -161,6 +161,12 @@ static struct kmem_cache *mmu_page_header_cache;
 
 static u64 __read_mostly shadow_trap_nonpresent_pte;
 static u64 __read_mostly shadow_notrap_nonpresent_pte;
+static u64 __read_mostly shadow_base_present_pte;
+static u64 __read_mostly shadow_nx_mask;
+static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
+static u64 __read_mostly shadow_user_mask;
+static u64 __read_mostly shadow_accessed_mask;
+static u64 __read_mostly shadow_dirty_mask;
 
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 {
@@ -169,6 +175,23 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);
 
+void kvm_mmu_set_base_ptes(u64 base_pte)
+{
+	shadow_base_present_pte = base_pte;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
+
+void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
+		u64 dirty_mask, u64 nx_mask, u64 x_mask)
+{
+	shadow_user_mask = user_mask;
+	shadow_accessed_mask = accessed_mask;
+	shadow_dirty_mask = dirty_mask;
+	shadow_nx_mask = nx_mask;
+	shadow_x_mask = x_mask;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
+
 static int is_write_protection(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.cr0 & X86_CR0_WP;
@@ -207,7 +230,7 @@ static int is_writeble_pte(unsigned long pte)
 
 static int is_dirty_pte(unsigned long pte)
 {
-	return pte & PT_DIRTY_MASK;
+	return pte & shadow_dirty_mask;
 }
 
 static int is_rmap_pte(u64 pte)
@@ -522,7 +545,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 		return;
 	sp = page_header(__pa(spte));
 	pfn = spte_to_pfn(*spte);
-	if (*spte & PT_ACCESSED_MASK)
+	if (*spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
 		kvm_release_pfn_dirty(pfn);
@@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	 * whether the guest actually used the pte (in order to detect
 	 * demand paging).
 	 */
-	spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
+	spte = shadow_base_present_pte | shadow_dirty_mask;
 	if (!speculative)
 		pte_access |= PT_ACCESSED_MASK;
 	if (!dirty)
 		pte_access &= ~ACC_WRITE_MASK;
-	if (!(pte_access & ACC_EXEC_MASK))
-		spte |= PT64_NX_MASK;
-
-	spte |= PT_PRESENT_MASK;
+	if (pte_access & ACC_EXEC_MASK) {
+		if (shadow_x_mask)
+			spte |= shadow_x_mask;
+	} else if (shadow_nx_mask)
+		spte |= shadow_nx_mask;
 	if (pte_access & ACC_USER_MASK)
-		spte |= PT_USER_MASK;
+		spte |= shadow_user_mask;
 	if (largepage)
 		spte |= PT_PAGE_SIZE_MASK;
 
@@ -1164,7 +1188,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 			}
 
 			table[index] = __pa(new_table->spt) | PT_PRESENT_MASK
-				| PT_WRITABLE_MASK | PT_USER_MASK;
+				| PT_WRITABLE_MASK | shadow_user_mask;
 		}
 		table_addr = table[index] & PT64_BASE_ADDR_MASK;
 	}
@@ -1608,7 +1632,7 @@ static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
 {
 	u64 *spte = vcpu->arch.last_pte_updated;
 
-	return !!(spte && (*spte & PT_ACCESSED_MASK));
+	return !!(spte && (*spte & shadow_accessed_mask));
 }
 
 static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0ce5563..0735efb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2417,6 +2417,9 @@ int kvm_arch_init(void *opaque)
 
 	kvm_x86_ops = ops;
 	kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
+	kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
+	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
+			PT_DIRTY_MASK, PT64_NX_MASK, 0);
 	return 0;
 
 out:
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 31aa7d6..9f62773 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -432,6 +432,9 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
 int kvm_mmu_setup(struct kvm_vcpu *vcpu);
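The idea of this patch, replacing hard-coded PT_*_MASK bits with masks chosen at init time, can be shown in a small standalone sketch. Everything named demo_* here is hypothetical. The x86 values use the bit positions from the PT_*_MASK defines in this series; the EPT-style values (read permission in bit 0, execute permission in bit 2, no user/accessed/dirty bits) are an assumption for illustration, since the EPT masks themselves are not quoted in this mail.

```c
#include <stdint.h>

/* Hypothetical access flags standing in for ACC_EXEC_MASK / ACC_USER_MASK. */
#define DEMO_ACC_EXEC 1u
#define DEMO_ACC_USER 4u

/* One set of masks per page-table format, mirroring the shadow_* variables. */
struct demo_spte_masks {
    uint64_t base_present;
    uint64_t user;
    uint64_t accessed;
    uint64_t dirty;
    uint64_t nx; /* set to forbid execution (x86 style) */
    uint64_t x;  /* set to allow execution (EPT style); exclusive with nx */
};

/* Classic x86 format: what kvm_arch_init() installs in the patch. */
static const struct demo_spte_masks demo_x86_masks = {
    .base_present = 1ULL << 0,
    .user         = 1ULL << 2,
    .accessed     = 1ULL << 5,
    .dirty        = 1ULL << 6,
    .nx           = 1ULL << 63,
    .x            = 0,
};

/* Assumed EPT-like format: explicit execute bit, no NX bit. */
static const struct demo_spte_masks demo_ept_masks = {
    .base_present = 1ULL << 0,
    .user         = 0,
    .accessed     = 0,
    .dirty        = 0,
    .nx           = 0,
    .x            = 1ULL << 2,
};

/* Same decision structure as the reworked part of mmu_set_spte(). */
static uint64_t demo_make_spte(const struct demo_spte_masks *m, unsigned access)
{
    uint64_t spte = m->base_present | m->dirty;

    if (access & DEMO_ACC_EXEC)
        spte |= m->x;  /* zero for formats without an execute bit */
    else
        spte |= m->nx; /* zero for formats without an NX bit */

    if (access & DEMO_ACC_USER)
        spte |= m->user;

    return spte;
}
```

Because the unused mask in each format is zero, one code path serves both layouts; that appears to be what lets the series drop the separate EPT table construction mentioned in the cover letter.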
[kvm-devel] [PATCH 2/6] KVM: MMU: Move some defination
From a5ee291f056256f8a892393410bc5923ff575a3b Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Fri, 18 Apr 2008 17:03:53 +0800
Subject: [PATCH 2/5] KVM: MMU: Move some defination for building common entries

Move some definitions to mmu.h in order to build common table entries.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c | 25 -
 arch/x86/kvm/mmu.h | 24
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 078a7f1..108886d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -84,31 +84,6 @@ static int dbg = 1;
 #define PT32_PT_BITS 10
 #define PT32_ENT_PER_PAGE (1 << PT32_PT_BITS)
 
-#define PT_WRITABLE_SHIFT 1
-
-#define PT_PRESENT_MASK (1ULL << 0)
-#define PT_WRITABLE_MASK (1ULL << PT_WRITABLE_SHIFT)
-#define PT_USER_MASK (1ULL << 2)
-#define PT_PWT_MASK (1ULL << 3)
-#define PT_PCD_MASK (1ULL << 4)
-#define PT_ACCESSED_MASK (1ULL << 5)
-#define PT_DIRTY_MASK (1ULL << 6)
-#define PT_PAGE_SIZE_MASK (1ULL << 7)
-#define PT_PAT_MASK (1ULL << 7)
-#define PT_GLOBAL_MASK (1ULL << 8)
-#define PT64_NX_SHIFT 63
-#define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
-
-#define PT_PAT_SHIFT 7
-#define PT_DIR_PAT_SHIFT 12
-#define PT_DIR_PAT_MASK (1ULL << PT_DIR_PAT_SHIFT)
-
-#define PT32_DIR_PSE36_SIZE 4
-#define PT32_DIR_PSE36_SHIFT 13
-#define PT32_DIR_PSE36_MASK \
-	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)
-
-
 #define PT_FIRST_AVAIL_BITS_SHIFT 9
 #define PT64_SECOND_AVAIL_BITS_SHIFT 52
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e64e9f5..271c011 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -9,6 +9,30 @@
 #define TDP_ROOT_LEVEL PT32E_ROOT_LEVEL
 #endif
 
+#define PT_WRITABLE_SHIFT 1
+
+#define PT_PRESENT_MASK (1ULL << 0)
+#define PT_WRITABLE_MASK (1ULL << PT_WRITABLE_SHIFT)
+#define PT_USER_MASK (1ULL << 2)
+#define PT_PWT_MASK (1ULL << 3)
+#define PT_PCD_MASK (1ULL << 4)
+#define PT_ACCESSED_MASK (1ULL << 5)
+#define PT_DIRTY_MASK (1ULL << 6)
+#define PT_PAGE_SIZE_MASK (1ULL << 7)
+#define PT_PAT_MASK (1ULL << 7)
+#define PT_GLOBAL_MASK (1ULL << 8)
+#define PT64_NX_SHIFT 63
+#define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
+
+#define PT_PAT_SHIFT 7
+#define PT_DIR_PAT_SHIFT 12
+#define PT_DIR_PAT_MASK (1ULL << PT_DIR_PAT_SHIFT)
+
+#define PT32_DIR_PSE36_SIZE 4
+#define PT32_DIR_PSE36_SHIFT 13
+#define PT32_DIR_PSE36_MASK \
+	(((1ULL << PT32_DIR_PSE36_SIZE) - 1) << PT32_DIR_PSE36_SHIFT)
+
 static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 {
 	if (unlikely(vcpu->kvm->arch.n_free_mmu_pages < KVM_MIN_FREE_MMU_PAGES))
-- 
1.5.4.5
[kvm-devel] [PATCH 4/6] KVM: Export necessary function for EPT
From 5d4a79e5edfc09b54bd83a3a289cbb82058e3daa Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Fri, 18 Apr 2008 17:05:20 +0800
Subject: [PATCH 4/5] KVM: Export necessary function for EPT

The function gfn_to_hva is necessary for handling EPT violation.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 virt/kvm/kvm_main.c | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0998455..d028e07 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -522,6 +522,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
 		return bad_hva();
 	return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
 }
+EXPORT_SYMBOL_GPL(gfn_to_hva);
 
 /*
  * Requires current->mm->mmap_sem to be held
-- 
1.5.4.5
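The arithmetic in the exported function is simple enough to check by hand. The sketch below reproduces only the address calculation visible in the diff context; struct demo_memslot, DEMO_BAD_HVA, and the explicit range check are hypothetical stand-ins for the kernel's memslot lookup and bad_hva(), and a 4 KiB page size is assumed.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL          /* assumed page size */
#define DEMO_BAD_HVA   (~(uint64_t)0)   /* hypothetical error marker */

/* Hypothetical, trimmed-down memory slot: a run of guest frames backed
 * by a contiguous range of host virtual addresses. */
struct demo_memslot {
    uint64_t base_gfn;       /* first guest frame number in the slot */
    uint64_t npages;         /* number of pages in the slot */
    uint64_t userspace_addr; /* host virtual address of the first page */
};

/* Translate a guest frame number to a host virtual address. */
static uint64_t demo_gfn_to_hva(const struct demo_memslot *slot, uint64_t gfn)
{
    if (gfn < slot->base_gfn || gfn >= slot->base_gfn + slot->npages)
        return DEMO_BAD_HVA;
    return slot->userspace_addr + (gfn - slot->base_gfn) * DEMO_PAGE_SIZE;
}
```

For a slot starting at frame 0x100 and backed at 0x7f0000000000, frame 0x102 lands two pages in, at 0x7f0000002000.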
[kvm-devel] [PATCH 5/6] KVM: VMX: Enable EPT feature for KVM
From 43eb727046349aac3df52317dbbfd3b4b33c084d Mon Sep 17 00:00:00 2001
From: Sheng Yang [EMAIL PROTECTED]
Date: Fri, 18 Apr 2008 17:07:31 +0800
Subject: [PATCH 5/5] KVM: VMX: Enable EPT feature for KVM

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c | 11 ++-
 arch/x86/kvm/vmx.c | 227 ++--
 arch/x86/kvm/vmx.h | 11 ++
 include/asm-x86/kvm_host.h | 1 +
 4 files changed, 241 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1828837..c7b7335 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1187,8 +1187,15 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 				return -ENOMEM;
 			}
 
-			table[index] = __pa(new_table->spt) | PT_PRESENT_MASK
-				| PT_WRITABLE_MASK | shadow_user_mask;
+			if (shadow_user_mask)
+				table[index] = __pa(new_table->spt)
+					| PT_PRESENT_MASK | PT_WRITABLE_MASK
+					| shadow_user_mask;
+			else
+				table[index] = __pa(new_table->spt)
+					| PT_PRESENT_MASK | PT_WRITABLE_MASK
+					| shadow_x_mask;
+			table[index] = __pa(new_table->spt) | 0x7;
 		}
 		table_addr = table[index] & PT64_BASE_ADDR_MASK;
 	}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d93250d..2a85930 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -42,7 +42,7 @@ module_param(enable_vpid, bool, 0);
 static int flexpriority_enabled = 1;
 module_param(flexpriority_enabled, bool, 0);
 
-static int enable_ept;
+static int enable_ept = 1;
 module_param(enable_ept, bool, 0);
 
 struct vmcs {
@@ -284,6 +284,18 @@ static inline void __invvpid(int ext, u16 vpid, gva_t gva)
 		  : : "a"(&operand), "c"(ext) : "cc", "memory");
 }
 
+static inline void __invept(int ext, u64 eptp, gpa_t gpa)
+{
+	struct {
+		u64 eptp, gpa;
+	} operand = {eptp, gpa};
+
+	asm volatile (ASM_VMX_INVEPT
+			/* CF==1 or ZF==1 --> rc = -1 */
+			"; ja 1f ; ud2 ; 1:\n"
+			: : "a" (&operand), "c" (ext) : "cc", "memory");
+}
+
 static struct kvm_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -335,6 +347,33 @@ static inline void vpid_sync_vcpu_all(struct vcpu_vmx *vmx)
 	__invvpid(VMX_VPID_EXTENT_SINGLE_CONTEXT, vmx->vpid, 0);
 }
 
+static inline void ept_sync_global(void)
+{
+	if (cpu_has_vmx_invept_global())
+		__invept(VMX_EPT_EXTENT_GLOBAL, 0, 0);
+}
+
+static inline void ept_sync_context(u64 eptp)
+{
+	if (vm_need_ept()) {
+		if (cpu_has_vmx_invept_context())
+			__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
+		else
+			ept_sync_global();
+	}
+}
+
+static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa)
+{
+	if (vm_need_ept()) {
+		if (cpu_has_vmx_invept_individual_addr())
+			__invept(VMX_EPT_EXTENT_INDIVIDUAL_ADDR,
+					eptp, gpa);
+		else
+			ept_sync_context(eptp);
+	}
+}
+
 static unsigned long vmcs_readl(unsigned long field)
 {
 	unsigned long value;
@@ -422,6 +461,8 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 		eb |= 1u << 1;
 	if (vcpu->arch.rmode.active)
 		eb = ~0;
+	if (vm_need_ept())
+		eb &= ~(1u << PF_VECTOR); /* bypass_guest_pf = 0 */
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
@@ -1352,8 +1393,64 @@ static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 	vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & ~KVM_GUEST_CR4_MASK;
 }
 
+static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
+{
+	if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
+		if (!load_pdptrs(vcpu, vcpu->arch.cr3)) {
+			printk(KERN_ERR "EPT: Fail to load pdptrs!\n");
+			return;
+		}
+		vmcs_write64(GUEST_PDPTR0, vcpu->arch.pdptrs[0]);
+		vmcs_write64(GUEST_PDPTR1, vcpu->arch.pdptrs[1]);
+		vmcs_write64(GUEST_PDPTR2, vcpu->arch.pdptrs[2]);
+		vmcs_write64(GUEST_PDPTR3, vcpu->arch.pdptrs[3]);
+	}
+}
+
+static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+
+static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
+					unsigned long cr0,
+					struct kvm_vcpu *vcpu)
+{
+	if (!(cr0 & X86_CR0_PG)) {
+		/* From paging/starting to nonpaging */
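The three ept_sync_* helpers in this patch form a fallback chain: use the narrowest invalidation the CPU supports, otherwise widen the scope. The sketch below models only that selection logic so it can be run in userspace; it issues no INVEPT instruction, and struct demo_ept_caps, the enum, and the demo_pick_* functions are hypothetical names standing in for the cpu_has_vmx_invept_*() and vm_need_ept() checks.

```c
/* Which invalidation scope would be used; NONE means nothing is issued. */
enum demo_invept_extent {
    DEMO_EXTENT_NONE,
    DEMO_EXTENT_INDIVIDUAL_ADDR,
    DEMO_EXTENT_CONTEXT,
    DEMO_EXTENT_GLOBAL,
};

/* Hypothetical capability flags standing in for the vmx_capability bits. */
struct demo_ept_caps {
    int ept_in_use;      /* plays the role of vm_need_ept() */
    int individual_addr;
    int context;
    int global;
};

/* Mirrors ept_sync_global(): only acts if the global extent exists. */
static enum demo_invept_extent demo_pick_global(const struct demo_ept_caps *c)
{
    return c->global ? DEMO_EXTENT_GLOBAL : DEMO_EXTENT_NONE;
}

/* Mirrors ept_sync_context(): prefer context scope, else go global. */
static enum demo_invept_extent demo_pick_context(const struct demo_ept_caps *c)
{
    if (!c->ept_in_use)
        return DEMO_EXTENT_NONE;
    return c->context ? DEMO_EXTENT_CONTEXT : demo_pick_global(c);
}

/* Mirrors ept_sync_individual_addr(): prefer one address, else widen. */
static enum demo_invept_extent
demo_pick_individual(const struct demo_ept_caps *c)
{
    if (!c->ept_in_use)
        return DEMO_EXTENT_NONE;
    return c->individual_addr ? DEMO_EXTENT_INDIVIDUAL_ADDR
                              : demo_pick_context(c);
}
```

Widening is always safe for correctness, since a broader invalidation flushes a superset of the mappings; the cost is only extra refill work afterwards.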
Re: [kvm-devel] [PATCH 1/1] QEMU/KVM: Support for PCI Passthrough
On Fri, Apr 18, 2008 at 2:39 PM, Amit Shah [EMAIL PROTECTED] wrote:
* On Monday 14 Apr 2008 06:01:07 Samuel Masham wrote:
Please keep the userspace support alive. I am particularly interested in using pci-passthrough with qemu running non-x86 system emulation (at the moment mips). My hope is that pci-passthrough could help with developing drivers and testing across architectures...

OK; keeping support around won't be too much of a hassle, though the current support for pci-passthrough and the irqhook module are developed with x86 in mind (and only tested on x86). Since it's not tested on any other architecture, I've marked it TARGET_I386 and TARGET_X86_64 for now. Feel free to extend it to other architectures.

Thanks, I will let you know if I get anything useful out of it.

Samuel
[kvm-devel] What must NOT be neglected?
Invitation! Heads and staff of security services, legal counsel, company directors, and financial and commercial directors are invited to take part in the event:

...Economic security of the enterprise...
12 - 16 May 2008, Saint Petersburg
Information department: (812)... 983... 37... 96

Main blocks of the programme:...

Fundamentals of enterprise economic security. General principles of economic security theory. Main areas of ensuring enterprise security. External and internal threats.

Assessing the effectiveness of the economic security service. Functions, tasks, and areas of activity of the security service. Work planning.

Systems and methods for analysing and managing economic risks.

Corporate takeovers: tools for detecting and countering corporate takeovers. Legal and economic aspects of hostile takeovers. Takeover scenarios.

The place of business intelligence in ensuring the economic security of a business. The concept of business intelligence. The role of business intelligence in management decisions. Business intelligence and business security.

Surprise inspections. Inspection procedure, types of inspections, and grounds for conducting them.

Integrated site security system.

Fundamentals of enterprise information security. Legal basis for protecting confidential information. Measures for protecting confidential information. Information security measures related to personnel work.

Technical means of industrial espionage and means of detecting them. Protection of computer information.

Practical demonstration of access control and management tools and of tools for countering industrial espionage.

The full programme is sent on request (812)... 983... 37... 96
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
On Thu, Apr 17, 2008 at 2:58 PM, H. Peter Anvin [EMAIL PROTECTED] wrote:
Nguyen Anh Quynh wrote:
This patch replaces the current assembly code of Extboot option rom with new C code. Patch is against kvm-66. This version returns an error code in case int 13 handler cannot handle a requested function.

Signed-off-by: Nguyen Anh Quynh [EMAIL PROTECTED]

+ /* -fomit-frame-pointer might clobber %ebp */
+ pushl %ebp
+ call setup
+ popl %ebp

No, it might not. %ebx, %ebp, %esi, and %edi are guaranteed preserved; %eax, %ecx and %edx are clobbered.

It's also prudent to call cld before jumping to C code.

OK, I will fix these in the next version.

Thanks,
Q
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
On Thu, Apr 17, 2008 at 3:00 PM, H. Peter Anvin [EMAIL PROTECTED] wrote:
>> +    .globl linux_boot
>> +linux_boot:
>> +    cli
>> +    cld
>> +    mov $0x9000, %ax
>> +    mov %ax, %ds
>> +    mov %ax, %es
>> +    mov %ax, %fs
>> +    mov %ax, %gs
>> +    mov %ax, %ss
>> +    mov $0x8ffe, %sp
>> +    ljmp $0x9000 + 0x20, $0
>
> The hard use of segment 9000 is really highly unfortunate for bzImage, since it restricts its heap more than necessary. I suggest following the patterns used by the (new) Qemu loader.

Actually, this code is left over from Anthony's original code, and it seems he took it from qemu version 0.8.

Anthony, could you explain why you want to hijack the Linux boot process here? If I understand correctly, we can just let the original int19 execute, and if Linux boot is desired, it would work in the normal way. So why do you want to do this?

Thanks,
Q
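The addresses implied by the quoted linux_boot snippet can be worked out with a small sketch. It assumes only the standard real-mode rule that a linear address is segment * 16 + offset; the helper name real_mode_linear is made up for illustration and is not part of extboot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: linear address of a real-mode segment:offset pair. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/*
 * Values taken from the quoted snippet:
 *   ljmp $0x9000 + 0x20, $0   -> far jump to segment 0x9020, offset 0
 *   %ss = 0x9000, %sp = 0x8ffe -> initial stack pointer
 */
void linux_boot_addresses(uint32_t *entry, uint32_t *stack_top)
{
    assert(entry != 0 && stack_top != 0);
    *entry = real_mode_linear(0x9000 + 0x20, 0);
    *stack_top = real_mode_linear(0x9000, 0x8ffe);
}
```

With these numbers the jump target is linear address 0x90200 and the stack starts at 0x98ffe, so everything is pinned inside the 64 KiB window that begins at 0x90000. That fixed placement is what the comment about "the hard use of segment 9000" refers to.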
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
On Thu, Apr 17, 2008 at 4:36 PM, Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> On Thu, Apr 17, 2008 at 10:30:27AM +0900, Nguyen Anh Quynh wrote:
>> +++ b/extboot/farvar.h
>> @@ -0,0 +1,113 @@
>> +// Code to access multiple segments within gcc.
>> +//
>> +// Copyright (C) 2008 Kevin O'Connor [EMAIL PROTECTED]
>> +//
>> +// This file may be distributed under the terms of the GNU GPLv3 license.
>
> IANAL, but wouldn't this make extboot GPLv3 only? How will that interact with the GPLv2 extboot qemu?

I am not sure if that is fine, but it might be better to have the same license for all the code. I will contact Kevin when the next version is ready (I am still fixing something).

Thanks,
Q
Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2
On Tue, 15 Apr 2008 16:06:43 +0300 Avi Kivity [EMAIL PROTECTED] wrote:
>> ...
>> handle_vmentry_failure: invalid guest state
>> handle_vmentry_failure: start emulation
>> handle_vmentry_failure: emulation failed
>
> What instruction failed, exactly?

I added code to dump the instruction, and it seems that it's the emulation of 0xe6 (== out imm8, al) that failed. I made modifications to emulate it (see below) and now I have another problem in kvm userspace with the following message (and the emulation doesn't work):

enterprise:~ $ kvm_run: Operation not permitted
enterprise:~ $ kvm_run returned -1

> You need to load rip as well.

Ooops, yes. So jump far emulation is now like:

+    case 0xea: /* jmp far */ {
+        struct kvm_segment kvm_seg;
+        long int eip;
+        int ret;
+
+        kvm_x86_ops->get_segment(ctxt->vcpu, &kvm_seg, VCPU_SREG_CS);
+
+        ret = load_segment_descriptor(ctxt->vcpu, kvm_seg.selector, 9, VCPU_SREG_CS);
+        if (ret < 0){
+            printk(KERN_INFO "%s: Failed to load CS descriptor\n", __FUNCTION__);
+            goto cannot_emulate;
+        }
+
+        switch (c->op_bytes) {
+        case 2:
+            eip = insn_fetch(s16, 2, c->eip);
+            break;
+        case 4:
+            eip = insn_fetch(s32, 4, c->eip);
+            break;
+        default:
+            DPRINTF("jmp far: Invalid op_bytes\n");
+            goto cannot_emulate;
+        }
+        printk(KERN_INFO "eip == 0x%lx\n", eip);
+        c->eip = eip;
+        break;
+    }

It seems that the jump to cs:eip works and now I have the following error:

[18535.446917] handle_vmentry_failure: invalid guest state
[18535.449519] handle_vmentry_failure: start emulation
[18535.457519] eip == 0x6e18
[18535.467685] handle_vmentry_failure: emulation of 0xe6 failed

For the emulation of 0xe6 I used the following code, which I found in nitin's tree:

+    case 0xe6: /* out imm8, al */
+    case 0xe7: /* out imm8, ax/eax */ {
+        struct kvm_io_device *pio_dev;
+
+        pio_dev = vcpu_find_pio_dev(ctxt->vcpu, c->src.val);
+        kvm_iodevice_write(pio_dev, c->src.val,
+                           (c->d & ByteOp) ? 1 : c->op_bytes,
+                           &c->regs[VCPU_REGS_RAX]);
+    }
+    break;

I will look more closely at where the problem is and, as you suggested, I will display the instruction to be emulated and the register state before and after, and compare with the expected state.

Thanks for your help,
Regards,
Guillaume
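As background for the jmp far case above, here is a minimal standalone sketch of how the operand of opcode 0xEA is laid out: the offset comes first (2 or 4 bytes, depending on operand size), followed by a 16-bit selector, all little-endian. The names far_ptr and decode_jmp_far are hypothetical and are not KVM code; the sketch also zero-extends the offset, which is an assumption of the example rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the target of a direct far jump. */
struct far_ptr {
    uint16_t selector;  /* new CS selector */
    uint32_t offset;    /* new IP/EIP */
};

/*
 * Decode the operand of opcode 0xEA (jmp ptr16:16 / ptr16:32).
 * `operand` points just past the opcode byte, `len` is the number of
 * bytes available, and `op_bytes` is the operand size (2 or 4).
 * Returns 0 on success, -1 for an unsupported size or a short buffer.
 */
int decode_jmp_far(const uint8_t *operand, size_t len, int op_bytes,
                   struct far_ptr *out)
{
    size_t i;

    assert(operand != NULL && out != NULL);

    if (op_bytes != 2 && op_bytes != 4)
        return -1;
    if (len < (size_t)op_bytes + 2)
        return -1;

    /* Offset: op_bytes little-endian bytes. */
    out->offset = 0;
    for (i = 0; i < (size_t)op_bytes; i++)
        out->offset |= (uint32_t)operand[i] << (8 * i);

    /* Selector: the two bytes that follow the offset. */
    out->selector = (uint16_t)(operand[op_bytes] |
                               ((unsigned)operand[op_bytes + 1] << 8));
    return 0;
}
```

In this encoding both the new EIP and the new CS selector are taken from the instruction bytes, so an emulator has to consume op_bytes + 2 bytes after the opcode.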
Re: [kvm-devel] kvm-65/66 bug with Solaris 10 U4 ?
Ian Kirk wrote:
> I can't compile them against 2.6.24.4-64.fc8PAE as the module magic name mismatches, and I don't know how to change kernel-devel to know it's PAE.

You just need to install the kernel-PAE-devel package, and then build against that.

Chris Lalancette
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
Nguyen Anh Quynh wrote:
> Actually, this code is left over from Anthony's original code, and it seems he took it from qemu version 0.8.
>
> Anthony, could you explain why you want to hijack the Linux boot process here? If I understand correctly, we can just let the original int19 execute, and if Linux boot is desired, it would work in the normal way. So why do you want to do this?

I'm having exactly the *opposite* question... why does extboot have code to hook int 13h?

-hpa
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Daniel P. Berrange wrote:
>> Those cases aren't always discoverable. Linux-aio just falls back to using synchronous IO. It's pretty terrible. We need a new AIO interface for Linux (and yes, we're working on this). Once we have something better, we'll change that to be the default and things will Just Work for most users.
>
> If QEMU can't discover cases where it won't work, what criteria should the end user use to decide between the impls, or for that matter, what criteria should a management api/app like libvirt use? If the only decision logic is 'try it & benchmark your VM' then it's not a particularly useful option.

Good use of Linux-AIO requires that you basically know which cases it handles well, and which ones it doesn't. Falling back to synchronous I/O with no indication (except speed) is a pretty atrocious API imho. But that's what the Linux folks decided to do.

I suspect what you have to do is:

1. Try opening the file with O_DIRECT.
2. Use fstat to check the filesystem type and block device type.
3. If it's on a whitelist of filesystem types,
4. and a whitelist of block device types,
5. and the kernel version is later than an fs+bd-dependent value,
6. then select an alignment size (kernel version dependent) and use Linux-AIO with it.

Otherwise don't use Linux-AIO. You may then decide to use Glibc's POSIX-AIO (which uses threads), or use threads for I/O yourself.

In future, the above recipe will be more complicated, in that you have to use the same decision tree to decide between:

- Synchronous IO.
- Your own thread based IO.
- Glibc POSIX-AIO using threads.
- Linux-AIO.
- Virtio thing or whatever is based around vringfd.
- Syslets if they gain traction and perform well.

> I've basically got a choice of making libvirt always add '-aio linux' or never add it at all. My inclination is to the latter since it is compatible with existing QEMU which has no -aio option. Presumably '-aio linux' is intended to provide some performance benefit so it'd be nice to use it. If we can't express some criteria under which it should be turned on, I can't enable it; whereas if you can express some criteria, then QEMU should apply them automatically.

I'm of the view that '-aio auto' would be a really good option - and when it's proven itself, it should be the default. It could work on all QEMU hosts: it would pick synchronous IO when there is nothing else.

The criteria for selecting a good AIO strategy on Linux are quite complex, and might be worth hard coding. In that case, putting that into QEMU itself would be much better than every program which launches QEMU having its own implementation of the criteria.

> Pushing this choice of AIO impls to the app or user invoking QEMU just does not seem like a win here.

I think having the choice is very good, because whatever the hard coded selection criteria, there will be times when it's wrong (ideally in conservative ways - it should always be functional, just suboptimal). So I do support this patch to add the switch.

But _forcing_ the user to decide is not good, since the criteria are rather obscure and change with things like filesystem. At least, a set of command line options to QEMU ought to work when you copy a VM to another machine!

So I think '-aio auto', which invokes the selection criteria of the day and is guaranteed to work (conservatively picking a slower method if it cannot be sure a faster one will work), would be the most useful option of all.

-- Jamie
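The selection recipe above could be sketched as a pure decision function. This is only an illustration of the '-aio auto' idea under stated assumptions: the probe results (whether O_DIRECT worked, the filesystem magic, and so on) are passed in rather than gathered, all type and function names are hypothetical, and the minimum kernel versions in the whitelist are invented placeholders, not tested thresholds. Nothing here is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum aio_backend { AIO_SYNC, AIO_THREADS, AIO_LINUX };

/* Hypothetical probe results, gathered elsewhere (steps 1, 2, 4 and 5). */
struct aio_probe {
    bool o_direct_ok;    /* opening the image with O_DIRECT succeeded */
    long fs_magic;       /* filesystem type of the image file */
    bool blockdev_ok;    /* underlying block device type is whitelisted */
    int kernel_version;  /* encoded as major*10000 + minor*100 + patch */
    bool have_threads;   /* a thread-based fallback is available */
};

struct fs_rule {
    long fs_magic;
    int min_kernel;      /* placeholder value, invented for this sketch */
};

static const struct fs_rule fs_whitelist[] = {
    { 0xEF53,     20619 },  /* ext2/ext3/ext4 superblock magic */
    { 0x58465342, 20619 },  /* XFS superblock magic */
};

/* Conservative choice: Linux-AIO only when every check passes. */
enum aio_backend choose_aio_backend(const struct aio_probe *p)
{
    size_t i;

    assert(p != NULL);

    if (p->o_direct_ok && p->blockdev_ok) {
        for (i = 0; i < sizeof(fs_whitelist) / sizeof(fs_whitelist[0]); i++) {
            if (fs_whitelist[i].fs_magic == p->fs_magic &&
                p->kernel_version >= fs_whitelist[i].min_kernel)
                return AIO_LINUX;
        }
    }
    /* Anything uncertain falls back to a slower but functional method. */
    return p->have_threads ? AIO_THREADS : AIO_SYNC;
}
```

The fallback order mirrors the argument in the message: the automatic mode should never fail, it should only be slower when the faster path cannot be shown to be safe.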
Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2
On Fri, 18 Apr 2008 14:18:16 +0200 Guillaume Thouvenin [EMAIL PROTECTED] wrote:
> I added code to dump the instruction, and it seems that it's the emulation of 0xe6 (== out imm8, al) that failed. I made modifications to emulate it (see below) and now I have another problem in kvm userspace with the following message (and the emulation doesn't work):
>
> enterprise:~ $ kvm_run: Operation not permitted
> enterprise:~ $ kvm_run returned -1

OK, for this one it seems to be a wrong value in the opcode_table[]. Now it generates an oops. I'm investigating...

Regards,
Guillaume

---
Apr 18 14:48:53 enterprise kernel: [22321.010006] handle_vmentry_failure: invalid guest state
Apr 18 14:48:53 enterprise kernel: [22321.011953] handle_vmentry_failure: start emulation
Apr 18 14:48:53 enterprise kernel: [22321.015875] c->op_bytes == 2
Apr 18 14:48:53 enterprise kernel: [22321.019862] eip == 0x6e18
Message from [EMAIL PROTECTED] at Fri Apr 18 14:48:54 2008 ...
enterprise kernel: [22321.027850] Oops: [2] SMP
Message from [EMAIL PROTECTED] at Fri Apr 18 14:48:54 2008 ...
enterprise kernel: [22321.027850] Code: 75 58 48 8b 7d 00 e8 64 4f ff ff f6 85 98 00 00 00 01 ba 01 00 00 00 75 04 0f b6 55 4c 48 8b 75 58 48 8d 8d a0 00 00 00 48 89 c7 ff 50 08 e9 f1 07 00 00 8a 45 4c 3c 02 74 0a 3c 04 0f 85 73 13
Message from [EMAIL PROTECTED] at Fri Apr 18 14:48:54 2008 ...
enterprise kernel: [22321.027850] CR2: 0008
Apr 18 14:48:54 enterprise kernel: [22321.027850] PGD 36f1a8067 PUD 327c17067 PMD 0
Apr 18 14:48:54 enterprise kernel: [22321.027850] CPU 1
Apr 18 14:48:54 enterprise kernel: [22321.027850] Modules linked in: kvm_intel kvm aic94xx libsas scsi_transport_sas [last unloaded: kvm]
Apr 18 14:48:54 enterprise kernel: [22321.027850] Pid: 7814, comm: qemu-system-x86 Tainted: G D 2.6.25 #207
Apr 18 14:48:54 enterprise kernel: [22321.027850] RIP: 0010:[88043933] [88043933] :kvm:x86_emulate_insn+0x2d97/0x414c
Apr 18 14:48:54 enterprise kernel: [22321.027850] RSP: 0018:81033005fb68 EFLAGS: 00010202
Apr 18 14:48:54 enterprise kernel: [22321.027850] RAX: RBX: 810344cf9440 RCX: 810344cf9498
Apr 18 14:48:54 enterprise kernel: [22321.027850] RDX: 0001 RSI: 007a RDI:
Apr 18 14:48:54 enterprise kernel: [22321.027850] RBP: 810344cf93f8 R08: R09:
Apr 18 14:48:54 enterprise kernel: [22321.027850] R10: R11: R12:
Apr 18 14:48:54 enterprise kernel: [22321.027850] R13: 88051e50 R14: 810344cf9498 R15: 7ad6
Apr 18 14:48:54 enterprise kernel: [22321.027850] FS: 4108b950() GS:810397c250c0() knlGS:
Apr 18 14:48:54 enterprise kernel: [22321.027850] CS: 0010 DS: 002b ES: 002b CR0: 80050033
Apr 18 14:48:54 enterprise kernel: [22321.027850] CR2: 0008 CR3: 0003301b2000 CR4: 26e0
Apr 18 14:48:54 enterprise kernel: [22321.027850] DR0: DR1: DR2:
Apr 18 14:48:54 enterprise kernel: [22321.027850] DR3: DR6: 0ff0 DR7: 0400
Apr 18 14:48:54 enterprise kernel: [22321.027850] Process qemu-system-x86 (pid: 7814, threadinfo 81033005e000, task 810396023080)
Apr 18 14:48:54 enterprise kernel: [22321.027850] Stack: 81033005fb04 0088 810344cf9438 810344cf9440
Apr 18 14:48:54 enterprise kernel: [22321.027850] 00040040 00055e1c 00055e1c 810344cf9498
Apr 18 14:48:54 enterprise kernel: [22321.027850] 0089 8805087a 810344cf80c0
Apr 18 14:48:54 enterprise kernel: [22321.027850] Call Trace:
Apr 18 14:48:54 enterprise kernel: [22321.027850] [88038d91] ? :kvm:emulate_instruction+0x1e5/0x2b9
Apr 18 14:48:54 enterprise kernel: [22321.027850] [88057cd1] ? :kvm_intel:kvm_handle_exit+0xea/0x1e8
Apr 18 14:48:54 enterprise kernel: [22321.027850] [88057a96] ? :kvm_intel:vmx_intr_assist+0x68/0x1b9
Apr 18 14:48:54 enterprise kernel: [22321.027850] [80563398] ? __down_read+0x12/0xa1
Apr 18 14:48:54 enterprise kernel: [22321.027850] [8803b940] ? :kvm:kvm_arch_vcpu_ioctl_run+0x4ae/0x631
Apr 18 14:48:54 enterprise kernel: [22321.027850] [80291ec9] ? touch_atime+0xae/0xed
Apr 18 14:48:54 enterprise kernel: [22321.027850] [8803672e] ? :kvm:kvm_vcpu_ioctl+0xf3/0x3a1
Apr 18 14:48:54 enterprise kernel: [22321.027850] [802802c0] ? do_sync_read+0xd1/0x118
Apr 18 14:48:54 enterprise kernel: [22321.027850] [880363b1] ? :kvm:kvm_vm_ioctl+0x1ab/0x1c3
Apr 18 14:48:54 enterprise kernel: [22321.027850] [8028ae49] ? vfs_ioctl+0x21/0x6b
Apr 18 14:48:54 enterprise kernel: [22321.027850] [8028b0e6] ? do_vfs_ioctl+0x253/0x264
Apr 18 14:48:54 enterprise kernel: [22321.027850]
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
Nguyen Anh Quynh wrote:
> On Thu, Apr 17, 2008 at 3:00 PM, H. Peter Anvin [EMAIL PROTECTED] wrote:
>>> +    .globl linux_boot
>>> +linux_boot:
>>> +    cli
>>> +    cld
>>> +    mov $0x9000, %ax
>>> +    mov %ax, %ds
>>> +    mov %ax, %es
>>> +    mov %ax, %fs
>>> +    mov %ax, %gs
>>> +    mov %ax, %ss
>>> +    mov $0x8ffe, %sp
>>> +    ljmp $0x9000 + 0x20, $0
>>
>> The hard use of segment 9000 is really highly unfortunate for bzImage, since it restricts its heap more than necessary. I suggest following the patterns used by the (new) Qemu loader.
>
> Actually, this code is left over from Anthony's original code, and it seems he took it from qemu version 0.8.
>
> Anthony, could you explain why you want to hijack the Linux boot process here? If I understand correctly, we can just let the original int19 execute, and if Linux boot is desired, it would work in the normal way. So why do you want to do this?

The thinking is to eliminate the need to hijack the boot sector when using the -kernel option. However, the linux boot stuff in extboot has been broken since hpa rewrote the boot code. It can be removed for now and I'll eventually revisit it.

Regards,
Anthony Liguori

> Thanks,
> Q
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
H. Peter Anvin wrote:
> Nguyen Anh Quynh wrote:
>> Actually, this code is left over from Anthony's original code, and it seems he took it from qemu version 0.8.
>>
>> Anthony, could you explain why you want to hijack the Linux boot process here? If I understand correctly, we can just let the original int19 execute, and if Linux boot is desired, it would work in the normal way. So why do you want to do this?
>
> I'm having exactly the *opposite* question... why does extboot have code to hook int 13h?

extboot is primarily intended to allow scsi boot or boot from a pv disk. It hooks int13h to fake out disk access.

Regards,
Anthony Liguori

> -hpa
Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2
Guillaume Thouvenin wrote:
> On Tue, 15 Apr 2008 16:06:43 +0300 Avi Kivity [EMAIL PROTECTED] wrote:
>>> ...
>>> handle_vmentry_failure: invalid guest state
>>> handle_vmentry_failure: start emulation
>>> handle_vmentry_failure: emulation failed
>>
>> What instruction failed, exactly?
>
> I added code to dump the instruction, and it seems that it's the emulation of 0xe6 (== out imm8, al) that failed. I made modifications to emulate it (see below) and now I have another problem in kvm userspace with the following message (and the emulation doesn't work):
>
> enterprise:~ $ kvm_run: Operation not permitted
> enterprise:~ $ kvm_run returned -1
>
>> You need to load rip as well.
>
> Ooops, yes. So jump far emulation is now like:
>
> +    case 0xea: /* jmp far */ {
> +        struct kvm_segment kvm_seg;
> +        long int eip;
> +        int ret;
> +
> +        kvm_x86_ops->get_segment(ctxt->vcpu, &kvm_seg, VCPU_SREG_CS);
> +
> +        ret = load_segment_descriptor(ctxt->vcpu, kvm_seg.selector, 9, VCPU_SREG_CS);
> +        if (ret < 0){
> +            printk(KERN_INFO "%s: Failed to load CS descriptor\n", __FUNCTION__);
> +            goto cannot_emulate;
> +        }
> +
> +        switch (c->op_bytes) {
> +        case 2:
> +            eip = insn_fetch(s16, 2, c->eip);
> +            break;
> +        case 4:
> +            eip = insn_fetch(s32, 4, c->eip);
> +            break;
> +        default:
> +            DPRINTF("jmp far: Invalid op_bytes\n");
> +            goto cannot_emulate;
> +        }
> +        printk(KERN_INFO "eip == 0x%lx\n", eip);
> +        c->eip = eip;
> +        break;
> +    }
>
> It seems that the jump to cs:eip works and now I have the following error:
>
> [18535.446917] handle_vmentry_failure: invalid guest state
> [18535.449519] handle_vmentry_failure: start emulation
> [18535.457519] eip == 0x6e18
> [18535.467685] handle_vmentry_failure: emulation of 0xe6 failed
>
> For the emulation of 0xe6 I used the following code, which I found in nitin's tree:

This doesn't seem right. You should have been able to break out of the emulator long before encountering an out instruction. The next instruction you encounter should be a mov instruction. Are you sure you're updating eip correctly?

Regards,
Anthony Liguori

> +    case 0xe6: /* out imm8, al */
> +    case 0xe7: /* out imm8, ax/eax */ {
> +        struct kvm_io_device *pio_dev;
> +
> +        pio_dev = vcpu_find_pio_dev(ctxt->vcpu, c->src.val);
> +        kvm_iodevice_write(pio_dev, c->src.val,
> +                           (c->d & ByteOp) ? 1 : c->op_bytes,
> +                           &c->regs[VCPU_REGS_RAX]);
> +    }
> +    break;
>
> I will look more closely at where the problem is and, as you suggested, I will display the instruction to be emulated and the register state before and after, and compare with the expected state.
>
> Thanks for your help,
> Regards,
> Guillaume
Re: [kvm-devel] [PATCH 3/6] KVM: MMU: Add EPT support
Yang, Sheng wrote:
> @@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>       * whether the guest actually used the pte (in order to detect
>       * demand paging).
>       */
> -    spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
> +    spte = shadow_base_present_pte | shadow_dirty_mask;
>      if (!speculative)
>          pte_access |= PT_ACCESSED_MASK;
>      if (!dirty)
>          pte_access &= ~ACC_WRITE_MASK;
> -    if (!(pte_access & ACC_EXEC_MASK))
> -        spte |= PT64_NX_MASK;
> -
> -    spte |= PT_PRESENT_MASK;
> +    if (pte_access & ACC_EXEC_MASK) {
> +        if (shadow_x_mask)
> +            spte |= shadow_x_mask;
> +    } else if (shadow_nx_mask)
> +        spte |= shadow_nx_mask;

This looks like it may be a bug. The old behavior sets NX if (pte_access & ACC_EXEC_MASK). The new behavior unconditionally sets NX and never sets PRESENT.

Also, the if (shadow_x_mask) checks are unnecessary. spte |= 0 is a nop.

>      if (pte_access & ACC_USER_MASK)
> -        spte |= PT_USER_MASK;
> +        spte |= shadow_user_mask;
>      if (largepage)
>          spte |= PT_PAGE_SIZE_MASK;

Regards,
Anthony Liguori
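The remark that the mask checks are redundant can be verified with a small standalone sketch. It compares the guarded form from the patch with an unguarded form for two illustrative mask settings: one resembling classic shadow paging (only an NX bit, bit 63) and one resembling EPT (only an execute bit, bit 2). The struct, the function names and the value chosen for ACC_EXEC_MASK are assumptions of this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_EXEC_MASK 1u  /* illustrative value for this sketch only */

/* Hypothetical container for the two configurable masks. */
struct shadow_masks {
    uint64_t x_mask;   /* bit set for executable pages (0 if unused) */
    uint64_t nx_mask;  /* bit set for non-executable pages (0 if unused) */
};

/* Form used in the patch: test each mask before OR-ing it in. */
uint64_t exec_bits_guarded(uint64_t spte, unsigned access,
                           const struct shadow_masks *m)
{
    if (access & ACC_EXEC_MASK) {
        if (m->x_mask)
            spte |= m->x_mask;
    } else if (m->nx_mask)
        spte |= m->nx_mask;
    return spte;
}

/* Simplified form: OR-ing in a zero mask changes nothing. */
uint64_t exec_bits_plain(uint64_t spte, unsigned access,
                         const struct shadow_masks *m)
{
    if (access & ACC_EXEC_MASK)
        spte |= m->x_mask;
    else
        spte |= m->nx_mask;
    return spte;
}

/* Returns 1 if both forms agree for every combination tried here. */
int exec_bits_agree(void)
{
    const struct shadow_masks cfgs[] = {
        { 0, 1ULL << 63 },  /* shadow-paging style: NX bit only */
        { 1ULL << 2, 0 },   /* EPT style: execute bit only */
    };
    unsigned access;
    unsigned i;

    for (i = 0; i < 2; i++)
        for (access = 0; access <= 1; access++)
            if (exec_bits_guarded(1, access, &cfgs[i]) !=
                exec_bits_plain(1, access, &cfgs[i]))
                return 0;
    return 1;
}
```

Both forms set the NX-style bit only when the access lacks execute permission, which is the same condition as the removed `!(pte_access & ACC_EXEC_MASK)` test.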
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
On 4/18/08, Anthony Liguori [EMAIL PROTECTED] wrote:
> Nguyen Anh Quynh wrote:
>> On Thu, Apr 17, 2008 at 3:00 PM, H. Peter Anvin [EMAIL PROTECTED] wrote:
>>>> +    .globl linux_boot
>>>> +linux_boot:
>>>> +    cli
>>>> +    cld
>>>> +    mov $0x9000, %ax
>>>> +    mov %ax, %ds
>>>> +    mov %ax, %es
>>>> +    mov %ax, %fs
>>>> +    mov %ax, %gs
>>>> +    mov %ax, %ss
>>>> +    mov $0x8ffe, %sp
>>>> +    ljmp $0x9000 + 0x20, $0
>>>
>>> The hard use of segment 9000 is really highly unfortunate for bzImage, since it restricts its heap more than necessary. I suggest following the patterns used by the (new) Qemu loader.
>>
>> Actually, this code is left over from Anthony's original code, and it seems he took it from qemu version 0.8.
>>
>> Anthony, could you explain why you want to hijack the Linux boot process here? If I understand correctly, we can just let the original int19 execute, and if Linux boot is desired, it would work in the normal way. So why do you want to do this?
>
> The thinking is to eliminate the need to hijack the boot sector when using the -kernel option.

I see, but does that offer any advantage over the current approach?

Thanks,
Q
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
Anthony Liguori wrote:
> The thinking is to eliminate the need to hijack the boot sector when using the -kernel option. However, the linux boot stuff in extboot has been broken since hpa rewrote the boot code. It can be removed for now and I'll eventually revisit it.

It probably makes more sense to have a different boot ROM for that. I thought this was extboot, but apparently not.

I probably can throw something together.

-hpa
Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2
On Fri, 18 Apr 2008 08:23:07 -0500 Anthony Liguori [EMAIL PROTECTED] wrote:
> This doesn't seem right. You should have been able to break out of the emulator long before encountering an out instruction. The next instruction you encounter should be a mov instruction. Are you sure you're updating eip correctly?

I think that eip is updated correctly, but you're right: I think that the condition to stop emulation is not implemented properly. I emulate a lot of mov instructions and I remain stuck in the emulation loop until I reach the out instruction. The loop is the following:

[...]
    cs_rpl = vmcs_read16(GUEST_CS_SELECTOR) & SELECTOR_RPL_MASK;
    ss_rpl = vmcs_read16(GUEST_SS_SELECTOR) & SELECTOR_RPL_MASK;

    while (cs_rpl != ss_rpl) {
        if (emulate_instruction(vcpu, NULL, 0, 0, 0) == EMULATE_FAIL) {
            printk(KERN_INFO "%s: emulation of 0x%x failed\n",
                   __FUNCTION__, vcpu->arch.emulate_ctxt.decode.b);
            return -1;
        }
        cs_rpl = vmcs_read16(GUEST_CS_SELECTOR) & SELECTOR_RPL_MASK;
        ss_rpl = vmcs_read16(GUEST_SS_SELECTOR) & SELECTOR_RPL_MASK;
    }
    printk(KERN_INFO "%s: VMX friendly state recovered\n", __FUNCTION__);
    // I never reach this point

Maybe the CS and SS selectors are not updated properly. I will add traces to see their values before and after the emulation.

Regards,
Guillaume
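The stop condition in the loop above compares the requested privilege level (RPL) of CS and SS, which is the low two bits of each selector. The following standalone sketch models that loop outside the kernel so the condition can be tested in isolation. The guest_state struct, the step callback and the max_steps bound are all hypothetical; in particular, the bound is an addition of this sketch and is not in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SELECTOR_RPL_MASK 0x03  /* low two bits of a selector hold the RPL */

/* Hypothetical stand-in for the guest CS/SS selectors kept in the VMCS. */
struct guest_state {
    uint16_t cs;
    uint16_t ss;
};

/* Hypothetical callback: emulate one instruction, return 0 on success. */
typedef int (*step_fn)(struct guest_state *g);

int rpl_matches(const struct guest_state *g)
{
    return (g->cs & SELECTOR_RPL_MASK) == (g->ss & SELECTOR_RPL_MASK);
}

/*
 * Emulate until CS.RPL == SS.RPL. Returns the number of emulated
 * instructions, or -1 if a step fails or max_steps is exhausted.
 */
int emulate_until_rpl_match(struct guest_state *g, step_fn step, int max_steps)
{
    int n = 0;

    assert(g != NULL && step != NULL);

    while (!rpl_matches(g)) {
        if (n >= max_steps || step(g) != 0)
            return -1;
        n++;
    }
    return n;
}

/* Demo step: pretends the guest reloads SS with an RPL equal to CS.RPL. */
int demo_step_reload_ss(struct guest_state *g)
{
    g->ss = (uint16_t)((g->ss & ~SELECTOR_RPL_MASK) |
                       (g->cs & SELECTOR_RPL_MASK));
    return 0;
}

/* Demo step: an instruction that never touches the selectors. */
int demo_step_noop(struct guest_state *g)
{
    (void)g;
    return 0;
}
```

A bounded loop of this kind returns an error instead of spinning when the selectors never converge, which makes a stale CS or SS value easier to spot when tracing.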
Re: [kvm-devel] pv clock: kvm is incompatible with xen :-(
Jeremy Fitzhardinge wrote:
> Gerd Hoffmann wrote:
>> Wall clock is off a few hours though. Oops.
>>
>> I think the way wall clock and system clock work together in xen (Jeremy correct me if I'm wrong) is that the wall clock specifies the point in time where the system clock started going. As kvm fills in host system time into the guest system time fields the guest wall clock fields should be filled with the host boot time timestamp I'd say.
>
> Yes. The wallclock field in the shared info structure is the wallclock time at boot; you compute the current time by adding the system timestamp to it. System time changes are effected by retroactively changing the boot time of the machine, though that can also change because of suspend/resume/migrate. In general the kernel only reads the wallclock time at boot, and then maintains it for itself from then on. I think.

Thanks.

I'm looking at the guest side of the issue right now, trying to identify common code, and while doing so noticed that xen does the version-check-loop in both get_time_values_from_xen(void) and xen_clocksource_read(void), and I can't see any obvious reason for that. The loop in xen_clocksource_read(void) is not needed IMHO. Can I drop it?

cheers,
Gerd

--
http://kraxel.fedorapeople.org/xenner/
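The two ideas discussed above, "current time = wallclock at boot + system time" and the version-check loop, can be illustrated with a simplified sketch. It uses C11 atomics and a made-up struct shared_time; it is not the real Xen or KVM shared-info layout, and the function names are hypothetical. The convention assumed here is that the producer makes the version odd while it is updating the fields and even again when it is done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for a shared time record. */
struct shared_time {
    atomic_uint version;                 /* odd while an update is in progress */
    _Atomic uint64_t system_time_ns;     /* time since boot */
    _Atomic uint64_t wallclock_boot_ns;  /* wall-clock time at boot */
};

/* Producer side: bump the version before and after writing the fields. */
void publish_time(struct shared_time *s, uint64_t system_ns, uint64_t boot_ns)
{
    unsigned v;

    assert(s != NULL);
    v = atomic_load_explicit(&s->version, memory_order_relaxed);
    atomic_store_explicit(&s->version, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->system_time_ns, system_ns, memory_order_relaxed);
    atomic_store_explicit(&s->wallclock_boot_ns, boot_ns, memory_order_relaxed);
    atomic_store_explicit(&s->version, v + 2, memory_order_release);
}

/* Consumer side: retry until a consistent, even-versioned snapshot is read. */
uint64_t current_wall_time_ns(struct shared_time *s)
{
    unsigned v1, v2;
    uint64_t system_ns, boot_ns;

    assert(s != NULL);
    do {
        v1 = atomic_load_explicit(&s->version, memory_order_acquire);
        system_ns = atomic_load_explicit(&s->system_time_ns,
                                         memory_order_relaxed);
        boot_ns = atomic_load_explicit(&s->wallclock_boot_ns,
                                       memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        v2 = atomic_load_explicit(&s->version, memory_order_relaxed);
    } while ((v1 & 1) || v1 != v2);

    return boot_ns + system_ns;
}
```

One retry loop around a single snapshot is enough to get consistent values. Whether a second loop in a caller adds anything depends on whether that caller reads further shared fields after the snapshot was taken, which is the question raised in the message.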
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
Anthony Liguori wrote:
> Nguyen Anh Quynh wrote:
>>> The thinking is to eliminate the need to hijack the boot sector when using the -kernel option.
>>
>> I see, but does that offer any advantage over the current approach?
>
> You no longer have to specify a -hda option when using -kernel.

Plus, you don't have funny side effects if you do -- and I suspect there is ad hoc code in the disk driver which can be removed.

-hpa
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
On 4/18/08, Anthony Liguori [EMAIL PROTECTED] wrote:
> Nguyen Anh Quynh wrote:
>>> The thinking is to eliminate the need to hijack the boot sector when using the -kernel option.
>>
>> I see, but does that offer any advantage over the current approach?
>
> You no longer have to specify a -hda option when using -kernel.

Without -hda, how can we load a disk image? Or do you mean you only want to test the kernel?

Thanks,
Q
Re: [kvm-devel] [PATCH 3/3] Implement linux-aio backend
On Thu, Apr 17, 2008 at 02:26:52PM -0500, Anthony Liguori wrote:
> This patch introduces a Linux-aio backend that is disabled by default. To use this backend effectively, the user should disable caching and select it with the appropriate -aio option. For instance:
>
> qemu-system-x86_64 -drive foo.img,cache=off -aio linux
>
> There's no universal way to asynchronously wait with linux-aio. At some point, signals were added to signal completion. More recently, an eventfd interface was added. This patch relies on the latter.
>
> We try hard to detect whether the right support is available in configure to avoid compile failures.
>
> +    do {
> +        err = io_submit(aio_ctxt_id, 1, iocbs);
> +    } while (err == -1 && errno == EINTR);
> +
> +    if (err != 1) {
> +        fprintf(stderr, "failed to submit aio request: %m\n");
> +        exit(1);
> +    }
> +
> +    outstanding_requests++;
> +
> +    return &aiocb->common;
> +}
> +
> +static void la_wait(void)
> +{
> +    main_loop_wait(10);
> +}

Sleeping in the context of vcpus is extremely bad (e.g. virtio-block blocks in write() throttling, which kills performance). It should wait on IO completions instead (qemu-kvm.c creates a pthread waitqueue to resolve that issue).

Other than that it looks fine to me; I will give it a try.
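To make "wait on IO completions instead" concrete, here is a minimal Linux-only sketch of blocking on an eventfd with poll() rather than sleeping for a fixed interval. It only shows the waiting side: the function name wait_for_completions is hypothetical, and in a real backend the kernel would signal the eventfd when requests complete, whereas in a test a plain write() to the descriptor can stand in for that. It is not the code from the patch or from qemu-kvm.c.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Block until the eventfd `efd` is signalled or `timeout_ms` expires.
 * Returns the counter value read from the eventfd (the number of
 * completions signalled since the last read), 0 on timeout, -1 on error.
 */
long wait_for_completions(int efd, int timeout_ms)
{
    struct pollfd pfd;
    uint64_t count;
    int rc;

    assert(efd >= 0);

    pfd.fd = efd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if a signal interrupts the wait. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc <= 0)
        return rc;  /* 0: timed out, -1: poll failed */

    /* Reading an eventfd returns its 8-byte counter and resets it to 0. */
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;

    return (long)count;
}
```

Compared with main_loop_wait(10), the caller wakes up as soon as a completion is signalled and does not burn a fixed 10 ms when work is already done.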
Re: [kvm-devel] [PATCH 3/6] KVM: MMU: Add EPT support
On Friday 18 April 2008 21:30:14 Anthony Liguori wrote:
> Yang, Sheng wrote:
>> @@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
>>       * whether the guest actually used the pte (in order to detect
>>       * demand paging).
>>       */
>> -    spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
>> +    spte = shadow_base_present_pte | shadow_dirty_mask;
>>      if (!speculative)
>>          pte_access |= PT_ACCESSED_MASK;
>>      if (!dirty)
>>          pte_access &= ~ACC_WRITE_MASK;
>> -    if (!(pte_access & ACC_EXEC_MASK))
>> -        spte |= PT64_NX_MASK;
>> -
>> -    spte |= PT_PRESENT_MASK;
>> +    if (pte_access & ACC_EXEC_MASK) {
>> +        if (shadow_x_mask)
>> +            spte |= shadow_x_mask;
>> +    } else if (shadow_nx_mask)
>> +        spte |= shadow_nx_mask;
>
> This looks like it may be a bug. The old behavior sets NX if (pte_access & ACC_EXEC_MASK). The new behavior unconditionally sets NX and never sets PRESENT.
>
> Also, the if (shadow_x_mask) checks are unnecessary. spte |= 0 is a nop.

Thanks for the comment! I realized that the two checks of shadow_nx/x_mask are unnecessary... In fact, the correct behavior is to set either shadow_x_mask or shadow_nx_mask; maybe there is a better approach for this. Logic that is assured by the program itself is always safer. But I will remove the redundant code first.

But I don't think it's a bug. The old behavior set NX if (!(pte_access & ACC_EXEC_MASK)), the same as the new one.

I am also curious about the PRESENT bit. You see, the PRESENT bit was set at the beginning of the code, and I really don't know why the duplicate one exists there...

>>      if (pte_access & ACC_USER_MASK)
>> -        spte |= PT_USER_MASK;
>> +        spte |= shadow_user_mask;
>>      if (largepage)
>>          spte |= PT_PAGE_SIZE_MASK;

--
Thanks
Yang, Sheng

> Regards,
> Anthony Liguori
Re: [kvm-devel] [PATCH 3/3] Implement linux-aio backend
Marcelo Tosatti wrote:

On Thu, Apr 17, 2008 at 02:26:52PM -0500, Anthony Liguori wrote:

This patch introduces a Linux-aio backend that is disabled by default. To use this backend effectively, the user should disable caching and select it with the appropriate -aio option. For instance:

    qemu-system-x86_64 -drive foo.img,cache=off -aio linux

There's no universal way to wait asynchronously with linux-aio. At some point, signals were added to signal completion. More recently, an eventfd interface was added. This patch relies on the latter. We try hard to detect whether the right support is available in configure to avoid compile failures.

+    do {
+        err = io_submit(aio_ctxt_id, 1, iocbs);
+    } while (err == -1 && errno == EINTR);
+
+    if (err != 1) {
+        fprintf(stderr, "failed to submit aio request: %m\n");
+        exit(1);
+    }
+
+    outstanding_requests++;
+
+    return &aiocb->common;
+}
+
+static void la_wait(void)
+{
+    main_loop_wait(10);
+}

Sleeping in the context of vcpus is extremely bad (e.g. virtio-block blocks in write() throttling, which kills performance). It should wait on IO completions instead (qemu-kvm.c creates a pthread waitqueue to resolve that issue).

Other than that looks fine to me, will give it a try.

FWIW, I'm not getting wonderful results in KVM. It's hard to tell though because time seems wildly inaccurate (even with kvm clock in the guest). The time issue appears unrelated to this set of patches.

Regards, Anthony Liguori
Re: [kvm-devel] Extboot Option ROM rewritten in C - v3
Nguyen Anh Quynh wrote:

You no longer have to specify a -hda option when using -kernel.

Without -hda, how can we load a disk image? Or do you mean you only want to test the kernel?

Right. You may be booting from NFS, iSCSI, or something like that.

Regards, Anthony Liguori

Thanks, Q
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Jamie Lokier wrote:

I've basically got a choice of making libvirt always add '-aio linux' or never add it at all. My inclination is to the latter since it is compatible with existing QEMU, which has no -aio option. Presumably '-aio linux' is intended to provide some performance benefit, so it'd be nice to use it. If we can't express some criteria under which it should be turned on, I can't enable it; whereas if you can express some criteria, then QEMU should apply them automatically.

I'm of the view that '-aio auto' would be a really good option - and when it's proven itself, it should be the default. It could work on all QEMU hosts: it would pick synchronous IO when there is nothing else.

Right now, not specifying the -aio option is equivalent to your proposed -aio auto. I guess I should include an 'info aio' to let the user know what type of aio they are using. We can add selection criteria later, but semantically, not specifying an explicit -aio option allows QEMU to choose whichever one it thinks is best.

Regards, Anthony Liguori
Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2
Guillaume Thouvenin wrote:

On Fri, 18 Apr 2008 08:23:07 -0500 Anthony Liguori [EMAIL PROTECTED] wrote:

This doesn't seem right. You should have been able to break out of the emulator long before encountering an out instruction. The next instruction you encounter should be a mov instruction. Are you sure you're updating eip correctly?

I think that eip is updated correctly, but you're right, I think that the condition to stop emulation is not well implemented. I emulate a lot of mov instructions and I remain blocked in the emulation loop until I reach the out instruction. The loop is the following:

    [...]
    cs_rpl = vmcs_read16(GUEST_CS_SELECTOR) & SELECTOR_RPL_MASK;
    ss_rpl = vmcs_read16(GUEST_SS_SELECTOR) & SELECTOR_RPL_MASK;
    while (cs_rpl != ss_rpl) {
        if (emulate_instruction(vcpu, NULL, 0, 0, 0) == EMULATE_FAIL) {
            printk(KERN_INFO "%s: emulation of 0x%x failed\n", __FUNCTION__, vcpu->arch.emulate_ctxt.decode.b);
            return -1;
        }
        cs_rpl = vmcs_read16(GUEST_CS_SELECTOR) & SELECTOR_RPL_MASK;
        ss_rpl = vmcs_read16(GUEST_SS_SELECTOR) & SELECTOR_RPL_MASK;
    }
    printk(KERN_INFO "%s: VMX friendly state recovered\n", __FUNCTION__); // I never reach this point

Maybe the CS and SS selectors are not well updated. I will add a trace to see their values before and after the emulation.

I'd prefer you not do an emulate_instruction loop at all. Just emulate one instruction on vmentry failure and let VT tell you what instructions you need to emulate. It's only four instructions, so I don't think the performance is going to matter. Take a look at the patch I posted previously.

Regards, Anthony Liguori

Regards, Guillaume
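For readers following the loop condition, the RPL comparison can be sketched on its own. This is an illustration only: selector_rpl and rpl_mismatch are hypothetical helpers, and the selector values in the usage note are made up. The only fact relied on is that the requested privilege level is the low two bits of an x86 segment selector.

```c
#include <assert.h>
#include <stdint.h>

/* The requested privilege level occupies the low two bits of an x86
 * segment selector. */
#define SELECTOR_RPL_MASK 0x03

/* Extract the RPL field from a 16-bit selector value. */
unsigned selector_rpl(uint16_t selector)
{
    return selector & SELECTOR_RPL_MASK;
}

/* Nonzero while CS and SS disagree on RPL, which is the condition the
 * quoted loop uses to decide that emulation has to continue. */
int rpl_mismatch(uint16_t cs, uint16_t ss)
{
    return selector_rpl(cs) != selector_rpl(ss);
}
```

With a CS of 0x0008 and an SS of 0x002b the helper reports a mismatch; once both selectors carry the same low two bits it returns zero, which is the exit condition the loop above is waiting for.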
Re: [kvm-devel] [PATCH 1/1] Enble a guest to access a device's memory mapped I/O regions directly.
[EMAIL PROTECTED] wrote:

From: Ben-Ami Yassour [EMAIL PROTECTED]

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c         | 59 +--
 arch/x86/kvm/paging_tmpl.h | 19 +
 include/linux/kvm_host.h   |  2 +-
 virt/kvm/kvm_main.c        | 17 +++-
 4 files changed, 69 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 078a7f1..c89029d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -112,6 +112,8 @@ static int dbg = 1;
 #define PT_FIRST_AVAIL_BITS_SHIFT 9
 #define PT64_SECOND_AVAIL_BITS_SHIFT 52
 
+#define PT_SHADOW_IO_MARK (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+

Please rename this PT_SHADOW_MMIO_MASK.

 #define VALID_PAGE(x) ((x) != INVALID_PAGE)
 #define PT64_LEVEL_BITS 9
@@ -237,6 +239,9 @@ static int is_dirty_pte(unsigned long pte)
 static int is_rmap_pte(u64 pte)
 {
+    if (pte & PT_SHADOW_IO_MARK)
+        return false;
+
     return is_shadow_present_pte(pte);
 }

Why avoid rmap on mmio pages? Sure, it's unnecessary work, but having fewer cases improves overall reliability. You can use pfn_valid() in gfn_to_pfn() and kvm_release_pfn_*() to conditionally update the page refcounts.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
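To make the effect of the tag bit concrete, here is a small standalone sketch of the check the hunk adds. It is a simplification under stated assumptions: is_present_pte is a hypothetical stand-in that only tests the present bit (the kernel's is_shadow_present_pte() does more), and the macro uses the name suggested in the review rather than the one in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PRESENT_MASK            (1ULL << 0)
#define PT_FIRST_AVAIL_BITS_SHIFT  9
/* Name suggested in the review; the patch itself calls it PT_SHADOW_IO_MARK. */
#define PT_SHADOW_MMIO_MASK        (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)

/* Simplified, hypothetical stand-in for is_shadow_present_pte():
 * it only looks at the present bit. */
int is_present_pte(uint64_t pte)
{
    return (pte & PT_PRESENT_MASK) != 0;
}

/* Mirrors the patched is_rmap_pte(): entries tagged as MMIO are never
 * tracked in the reverse map, everything else follows the present check. */
int is_rmap_pte(uint64_t pte)
{
    if (pte & PT_SHADOW_MMIO_MASK)
        return 0;
    return is_present_pte(pte);
}
```

The review's objection is about exactly this extra branch: dropping it would put MMIO entries through the same rmap path as ordinary pages, at the cost of some unnecessary work.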
Re: [kvm-devel] [PATCH 3/6] KVM: MMU: Add EPT support
Yang, Sheng wrote:

On Friday 18 April 2008 21:30:14 Anthony Liguori wrote:

Yang, Sheng wrote:

@@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
     * whether the guest actually used the pte (in order to detect
     * demand paging).
     */
-    spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
+    spte = shadow_base_present_pte | shadow_dirty_mask;
     if (!speculative)
         pte_access |= PT_ACCESSED_MASK;
     if (!dirty)
         pte_access &= ~ACC_WRITE_MASK;
-    if (!(pte_access & ACC_EXEC_MASK))
-        spte |= PT64_NX_MASK;
-
-    spte |= PT_PRESENT_MASK;
+    if (pte_access & ACC_EXEC_MASK) {
+        if (shadow_x_mask)
+            spte |= shadow_x_mask;
+    } else if (shadow_nx_mask)
+        spte |= shadow_nx_mask;

This looks like it may be a bug. The old behavior sets NX if (pte_access & ACC_EXEC_MASK). The new behavior unconditionally sets NX and never sets PRESENT. Also, the if (shadow_x_mask) checks are unnecessary. spte |= 0 is a nop.

Thanks for the comment! I realize the two checks of shadow_nx/x_mask are unnecessary... In fact, the correct behavior is to set either shadow_x_mask or shadow_nx_mask; maybe there is a better approach for this. Logic enforced by the program itself is always safer. But I will remove the redundant code first.

But I don't think it's a bug. The old behavior set NX if (!(pte_access & ACC_EXEC_MASK)), the same as the new one.

The new behavior sets NX regardless of whether (pte_access & ACC_EXEC_MASK). Is the desired change to unconditionally set NX?

I am also curious about the PRESENT bit. You see, the PRESENT bit was set at the beginning of the code, and I really don't know why the duplicate one exists there...

Looking at the code, you appear to be right. In the future, I think you should separate any cleanups (like removing the redundant setting of PRESENT) into a separate patch and stick to just programmatic changes of PT_USER_MASK => shadow_user_mask, etc. in this patch. That makes it a lot easier to review correctness.

Regards, Anthony Liguori

     if (pte_access & ACC_USER_MASK)
-        spte |= PT_USER_MASK;
+        spte |= shadow_user_mask;
     if (largepage)
         spte |= PT_PAGE_SIZE_MASK;
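The point that the inner mask checks are redundant can be shown with a standalone sketch. apply_exec_bits is a hypothetical helper, not the kernel function, and the mask values used in the usage note (bit 63 for a no-execute style mask, bit 2 for an execute style mask) are picked purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_EXEC_MASK 1u

/* Hypothetical helper: sets the execute-related bits of a shadow pte
 * without testing the masks first. OR-ing in a zero mask leaves spte
 * unchanged, so no guard is needed for the case where a paging mode
 * does not define one of the two masks. */
uint64_t apply_exec_bits(uint64_t spte, unsigned pte_access,
                         uint64_t x_mask, uint64_t nx_mask)
{
    if (pte_access & ACC_EXEC_MASK)
        spte |= x_mask;     /* no-op when x_mask == 0 */
    else
        spte |= nx_mask;    /* no-op when nx_mask == 0 */

    return spte;
}
```

Calling it with x_mask == 0 and an executable access returns spte untouched, and with nx_mask == 0 and a non-executable access it does the same, which matches the observation that spte |= 0 is a nop.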
Re: [kvm-devel] [PATCH 1/1] Enble a guest to access a device's memory mapped I/O regions directly.
[EMAIL PROTECTED] wrote:

From: Ben-Ami Yassour [EMAIL PROTECTED]

Signed-off-by: Ben-Ami Yassour [EMAIL PROTECTED]
Signed-off-by: Muli Ben-Yehuda [EMAIL PROTECTED]
---
 libkvm/libkvm.c           | 24
 qemu/hw/pci-passthrough.c | 89 +++--
 qemu/hw/pci-passthrough.h |  2 +
 3 files changed, 40 insertions(+), 75 deletions(-)

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index de91328..8c02af9 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -400,7 +400,7 @@ void *kvm_create_userspace_phys_mem(kvm_context_t kvm, unsigned long phys_start,
 {
     int r;
     int prot = PROT_READ;
-    void *ptr;
+    void *ptr = NULL;
     struct kvm_userspace_memory_region memory = {
         .memory_size = len,
         .guest_phys_addr = phys_start,
@@ -410,16 +410,24 @@ void *kvm_create_userspace_phys_mem(kvm_context_t kvm, unsigned long phys_start,
     if (writable)
         prot |= PROT_WRITE;
-    ptr = mmap(NULL, len, prot, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
-    if (ptr == MAP_FAILED) {
-        fprintf(stderr, "create_userspace_phys_mem: %s", strerror(errno));
-        return 0;
-    }
+    if (len > 0) {
+        ptr = mmap(NULL, len, prot, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+        if (ptr == MAP_FAILED) {
+            fprintf(stderr, "create_userspace_phys_mem: %s",
+                strerror(errno));
+            return 0;
+        }
-    memset(ptr, 0, len);
+        memset(ptr, 0, len);
+    }
     memory.userspace_addr = (unsigned long)ptr;
-    memory.slot = get_free_slot(kvm);
+
+    if (len > 0)
+        memory.slot = get_free_slot(kvm);
+    else
+        memory.slot = get_slot(phys_start);
+
     r = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &memory);
     if (r == -1) {
         fprintf(stderr, "create_userspace_phys_mem: %s", strerror(errno));

This looks like support for zero-length memory slots? Why is it needed? It needs to be in a separate patch.

diff --git a/qemu/hw/pci-passthrough.c b/qemu/hw/pci-passthrough.c
index 7ffcc7b..a5894d9 100644
--- a/qemu/hw/pci-passthrough.c
+++ b/qemu/hw/pci-passthrough.c
@@ -25,18 +25,6 @@ typedef __u64 resource_size_t;
 extern kvm_context_t kvm_context;
 extern FILE *logfile;
-CPUReadMemoryFunc *pt_mmio_read_cb[3] = {
-    pt_mmio_readb,
-    pt_mmio_readw,
-    pt_mmio_readl
-};
-
-CPUWriteMemoryFunc *pt_mmio_write_cb[3] = {
-    pt_mmio_writeb,
-    pt_mmio_writew,
-    pt_mmio_writel
-};
-

There's at least one use case for keeping mmio in userspace: reverse-engineering a device driver. So if it doesn't cause too much trouble, please keep this as an option.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
[kvm-devel] [PATCH] [QEMU POWERPC] FPRs no longer live in kvm_vcpu
Signed-off-by: Hollis Blanchard [EMAIL PROTECTED]

diff --git a/qemu/qemu-kvm-powerpc.c b/qemu/qemu-kvm-powerpc.c
--- a/qemu/qemu-kvm-powerpc.c
+++ b/qemu/qemu-kvm-powerpc.c
@@ -72,7 +72,6 @@
     for (i = 0; i < 32; i++) {
         regs.gpr[i] = env->gpr[i];
-        regs.fpr[i] = env->fpr[i];
     }
     rc = kvm_set_regs(kvm_context, env->cpu_index, &regs);
@@ -113,7 +112,6 @@
     for (i = 0; i < 32; i++) {
         env->gpr[i] = regs.gpr[i];
-        env->fpr[i] = regs.fpr[i];
     }
 }
Re: [kvm-devel] direct mmio for passthrough - kernel part
[EMAIL PROTECTED] wrote:

This patch for PCI passthrough devices enables a guest to access a device's memory mapped I/O regions directly, without requiring the host to trap and emulate every MMIO access.

Updated from last version: we create a memory slot for each MMIO region of the guest's devices, and then use the /sys/bus/pci/.../resource# mapping to find the hfn for that MMIO region.

The kernel part and the userspace part of this patchset apply to Amit's pv-dma tree. Tested on a Lenovo M57p with an e1000 NIC assigned directly to an FC8 guest. Comments are appreciated.

I see no support for cache attributes in the page attributes table or mtrr. I guess for most devices this will work (as they will be set as uncacheable by the mtrrs), but for display cards we'd need to set the vram as write-combining to get reasonable performance. This requires mtrr and pat emulation in kvm so we detect the guest's intentions.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] [PATCH 0/5] SVM CR8 optimization patches
Joerg Roedel wrote:

This patch series implements optimizations to the CR8 intercept handling in SVM. With these patches applied, CR8 reads are not intercepted anymore. The writes to CR8 are only intercepted if the TPR masks interrupts. This significantly reduces the total number of CR8 intercepts when running 64-bit versions of Windows.

Some quick numbers, boot and shutdown of Vista 64:

Without these patches: ~38.000.000 CR8 writes intercepted
With these patches: ~38.000 CR8 writes intercepted

Applied all, thanks. Good patchset.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] [PATCH] SVM: remove selective CR0 comment
Joerg Roedel wrote:

There is no selective CR0 intercept bug. The code in the comment sets the CR0.PG bit. But KVM always sets the CR0.PG bit for SVM to implement paged real mode, so the 'mov %eax,%cr0' instruction does not change the CR0.PG bit. Selective CR0 intercepts only occur when a bit is actually changed, so it is the right behavior that there is no intercept on this instruction.

Applied, thanks.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
Alex Davis wrote:

Host software: Linux 2.6.24.4, KVM 65 (I am using the kernel modules from this release), X11 7.2 from Xorg, SDL 1.2.13, GCC 4.1.1, Glibc 2.4

Host hardware: Asus P5B Deluxe (P965 chipset based) motherboard, 4 GB RAM, Intel E6700 CPU

Guest software: Slackware 12.0 installed from CD-ROM.

Command used to start the first KVM instance:

    /usr/local/bin/qemu-system-x86_64 -hda /spare/vdisk1.img -cdrom /dev/cdrom -boot c -m 384 -net nic,macaddr=DE:AD:BE:EF:11:29 -net tap,ifname=tap0,script=no

Command used to start the second KVM instance:

    /usr/local/bin/qemu-system-x86_64 -hda /spare/vdisk2.img -cdrom /dev/cdrom -boot c -m 384 -net nic,macaddr=DE:AD:BE:EF:11:30 -net tap,ifname=tap1,script=no

tap0 and tap1 are bridged on the host. The guest OS was installed on /spare/vdisk1.img, which was initially created by

    /usr/local/bin/qemu-img create -f qcow /spare/vdisk.img 10G

After the guest installation completed, vdisk1 was copied to vdisk2.

The second instance always stops after printing "Checking if the processor honours the WP bit even in supervisor mode... Ok." It stays hung until I press the return key in the first instance; sometimes clicking in another X window will wake it up as well. This is a test machine, so I can test patches (almost) at will.

Strange. Does pinning each guest to a different cpu help (use 'taskset 1 qemu ... vdisk1.img', 'taskset 2 qemu ... vdisk2.img')?

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Anthony Liguori wrote:

I'm of the view that '-aio auto' would be a really good option - and when it's proven itself, it should be the default. It could work on all QEMU hosts: it would pick synchronous IO when there is nothing else.

Right now, not specifying the -aio option is equivalent to your proposed -aio auto. I guess I should include an 'info aio' to let the user know what type of aio they are using. We can add selection criteria later, but semantically, not specifying an explicit -aio option allows QEMU to choose whichever one it thinks is best.

Great. I guess the next step is to add selection criteria, otherwise a million Wikis will tell everyone to use '-aio linux' :-)

Do you know what the selection criteria should be - or is there a document/paper somewhere which says (ideally from benchmarks)? I'm interested for an unrelated project using AIO - so I'm willing to help get this right to some extent.

-- Jamie
Re: [kvm-devel] [PATCH] pass virtio disk geometry via config space
Ryan Harper wrote:

From: Ryan Harper [EMAIL PROTECTED]

Rather than faking up some geometry, allow the backend to push the disk geometry via virtio pci config option. Keep the old geo code around for compatibility.

Applied, thanks.

 struct virtio_blk_config {
     uint64_t capacity;
     uint32_t size_max;
     uint32_t seg_max;
+    uint16_t cylinders;
+    uint8_t heads;
+    uint8_t sectors;
 };

I packed the structure here to avoid gcc surprises on odd architectures.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
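As an illustration of what packing buys here, the sketch below declares the structure with the fields shown in the hunk and the GCC/Clang packed attribute. The exact form of the attribute in the applied patch is not shown in this thread, so treat the spelling as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields as quoted in the patch. The packed attribute (GCC/Clang
 * extension) removes any padding the compiler would otherwise add,
 * so the layout is the same on every architecture. */
struct virtio_blk_config {
    uint64_t capacity;
    uint32_t size_max;
    uint32_t seg_max;
    uint16_t cylinders;
    uint8_t heads;
    uint8_t sectors;
} __attribute__((packed));
```

Packed, the structure is exactly 8 + 4 + 4 + 2 + 1 + 1 = 20 bytes, with the geometry fields at offsets 16, 18 and 19. Without the attribute, an ABI that aligns uint64_t to 8 bytes would typically round the size up to 24, which is the kind of surprise a guest-visible config layout should avoid.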
Re: [kvm-devel] [PATCH 0/3] Qemu crashes with pci passthrough
Glauber de Oliveira Costa wrote:

Hi,

I've got some qemu crashes while trying to pass through an ide device to a kvm guest. After some investigation, it turned out that register_ioport_{read/write} will abort on errors instead of returning a meaningful error. However, even if we do return an error, the asynchronous nature of pci config space mapping updates makes it a little bit hard to treat.

This series of patches basically treats errors in the mapping functions in the pci layer. If anything goes wrong, we unregister the pci device, unmapping any mappings that happened to be successful already.

After these patches are applied, a lot of warnings appear. And, you know, every time there is a warning, god kills a kitten. But I'm not planning on touching the other pieces of qemu code for this until we settle (or not) on this solution.

Comments are very welcome, especially from qemu folks (since it is a bit invasive).

Have you considered, instead of rolling back the changes you already made before the failure, having a function which checks whether an ioport registration will be successful? This may simplify the code.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
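The check-before-register idea can be sketched with a toy port table. Nothing here is qemu code: port_taken, ioport_range_free and ioport_register are hypothetical names, and the table is a deliberately simple stand-in for whatever bookkeeping the real ioport layer does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IOPORTS 0x10000

/* Toy ownership table: true means the port is already claimed. */
static bool port_taken[MAX_IOPORTS];

/* True if every port in [start, start + length) is in range and free. */
bool ioport_range_free(unsigned start, unsigned length)
{
    if (length == 0 || start >= MAX_IOPORTS || length > MAX_IOPORTS - start)
        return false;

    for (unsigned i = start; i < start + length; i++)
        if (port_taken[i])
            return false;

    return true;
}

/* Claims the range only after the check has passed, so a failure
 * leaves the table untouched and there is nothing to roll back. */
int ioport_register(unsigned start, unsigned length)
{
    if (!ioport_range_free(start, length))
        return -1;

    for (unsigned i = start; i < start + length; i++)
        port_taken[i] = true;

    return 0;
}
```

The design trade-off is that validation and commit become two passes over the range, in exchange for never having to undo a partial registration.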
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Anthony Liguori wrote:

Right now, not specifying the -aio option is equivalent to your proposed -aio auto. I guess I should include an 'info aio' to let the user know what type of aio they are using. We can add selection criteria later, but semantically, not specifying an explicit -aio option allows QEMU to choose whichever one it thinks is best.

For the majority of deployments posix aio should be sufficient. The few that need something else can use Linux aio. Of course, a managed environment can use Linux aio unconditionally if it knows the kernel has all the needed goodies.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] [PATCH 3/3] Implement linux-aio backend
On Fri, Apr 18, 2008 at 10:18:33AM -0500, Anthony Liguori wrote:

Sleeping in the context of vcpus is extremely bad (e.g. virtio-block blocks in write() throttling, which kills performance). It should wait on IO completions instead (qemu-kvm.c creates a pthread waitqueue to resolve that issue).

Other than that looks fine to me, will give it a try.

FWIW, I'm not getting wonderful results in KVM. It's hard to tell though because time seems wildly inaccurate (even with kvm clock in the guest). The time issue appears unrelated to this set of patches.

Oh, you won't get completion signals on the aio eventfd. You might want to try the select-with-timeout() stuff. Will submit that with proper signalfd emulation shortly.
Re: [kvm-devel] VM Snapshots ?
Hi Uri,

The method you propose in fact doesn't work (tested with KVM 65), at least with Windows XP as the guest. After performing steps 1 to 7 with no errors:

- In step 8, the VM in question is already loaded and its user interface is shown in the X window (as mentioned, Windows XP in my tests).
- After step 9, the VM seems to be unstopped (no more '[Stopped]' in the X window caption) but in fact it doesn't run. The X window appears to respond to mouse events, i.e. the "Press Ctrl-Alt to exit grab" message appears on mouse click, but the Windows XP interface does not respond. Also, the top command shows near 0% CPU usage for the qemu process, so it seems that Windows XP is not resumed after 'cont' in the qemu monitor.

Is this supposed to work? Or does this type of guest not support these operations?

Thanks,
Duilio Protti
Intel Corporation
Re: [kvm-devel] disappointing speed with virtio_blk
Hi Marcelo,

http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg14732.html

I tried it this evening with kvm 66 - which should include your patch, right?

No, it's not included. The issue is being worked on.

My bad, sorry. Now I know I really have that patch: qemu-kvm hangs :(

I was trying kvm 66 with only the patch listed above applied on an otherwise perfectly working vm with virtio_blk root partition. Last line of the booting kernel in my vnc window:

Serial: 8250/16550 driver $Revision 1.90... (you know the rest)

An strace of the qemu-kvm gave the following in rapid succession:

clock_gettime(CLOCK_MONOTONIC, {2565, 306799672}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307065342}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307354930}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307618803}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307886312}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3300}}, NULL) = 0
rt_sigtimedwait([USR1 USR2 ALRM IO], {si_signo=SIGALRM, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value={int=0, ptr=0}}, 0xbfe5af88, 8) = 14
rt_sigaction(SIGALRM, NULL, {0x804d8f8, ~[KILL STOP RTMIN RT_1], 0}, 8) = 0
select(12, [6 11], [], [], {0, 0}) = 0 (Timeout)
select(0, [], NULL, NULL, {0, 0}) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2565, 342895116}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 343164113}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 343454002}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 343716804}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 343980012}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3300}}, NULL) = 0
rt_sigtimedwait([USR1 USR2 ALRM IO], {si_signo=SIGALRM, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value={int=0, ptr=0}}, 0xbfe5af88, 8) = 14
rt_sigaction(SIGALRM, NULL, {0x804d8f8, ~[KILL STOP RTMIN RT_1], 0}, 8) = 0
select(12, [6 11], [], [], {0, 0}) = 0 (Timeout)
select(0, [], NULL, NULL, {0, 0}) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {2565, 379035364}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 379307884}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 379589434}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 379919100}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 380183834}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3300}}, NULL) = 0
rt_sigtimedwait([USR1 USR2 ALRM IO], {si_signo=SIGALRM, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value={int=0, ptr=0}}, 0xbfe5af88, 8) = 14
rt_sigaction(SIGALRM, NULL, {0x804d8f8, ~[KILL STOP RTMIN RT_1], 0}, 8) = 0
select(12, [6 11], [], [], {0, 0}) = 0 (Timeout)
select(0, [], NULL, NULL, {0, 0}) = 0 (Timeout)
...

Hope that helps.

Kind regards,
Gerd

-- 
Address (better: trap) for people I really don't want to get mail from: [EMAIL PROTECTED]
Re: [kvm-devel] disappointing speed with virtio_blk
Hi Gerd,

On Fri, Apr 18, 2008 at 11:27:58PM +0200, Gerd von Egidy wrote:

Hi Marcelo,

http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg14732.html

I tried it this evening with kvm 66 - which should include your patch, right?

No, it's not included. The issue is being worked on.

My bad, sorry. Now I know I really have that patch: qemu-kvm hangs :(

I was trying kvm 66 with only the patch listed above applied on an otherwise perfectly working vm with virtio_blk root partition. Last line of the booting kernel in my vnc window:

Serial: 8250/16550 driver $Revision 1.90... (you know the rest)

When the hang happens, can you run kvm-stat --once (the script can be found in the kvm-66 directory) and paste the result? Can you confirm that reverting the patch fixes it?

An strace of the qemu-kvm gave the following in rapid succession:

clock_gettime(CLOCK_MONOTONIC, {2565, 306799672}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307065342}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307354930}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307618803}) = 0
clock_gettime(CLOCK_MONOTONIC, {2565, 307886312}) = 0
timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 3300}}, NULL) = 0
rt_sigtimedwait([USR1 USR2 ALRM IO], {si_signo=SIGALRM, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value={int=0, ptr=0}}, 0xbfe5af88, 8) = 14
rt_sigaction(SIGALRM, NULL, {0x804d8f8, ~[KILL STOP RTMIN RT_1], 0}, 8) = 0

This won't help much.
Re: [kvm-devel] pv clock: kvm is incompatible with xen :-(
Gerd Hoffmann wrote: I'm looking at the guest side of the issue right now, trying to identify common code, and while doing so noticed that xen does the version-check-loop in both get_time_values_from_xen(void) and xen_clocksource_read(void), and I can't see any obvious reason for that. The loop in xen_clocksource_read(void) is not needed IMHO. Can I drop it?

No. The get_nsec_offset() needs to be atomic with respect to the get_time_values() parameters. There could be a loopless __get_time_values() for use in this case, but given that it almost never loops, I don't think it's worthwhile.

J
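For readers following the thread, here is a minimal single-threaded sketch of a version-checked read of this kind, showing why the offset calculation sits inside the retry loop: the base time and the offset must come from the same snapshot. The struct layout and the names time_info, publish and read_clock are hypothetical and are not the Xen or KVM definitions; 1 tick is treated as 1 ns to keep the arithmetic simple, and real code would need memory barriers where the comments say so.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared time record; NOT the real Xen/KVM layout. */
struct time_info {
    uint32_t version;        /* odd while the writer is mid-update */
    uint64_t tsc_timestamp;  /* tick count when system_time was sampled */
    uint64_t system_time;    /* nanoseconds at tsc_timestamp */
};

/* Writer side: make version odd, update the fields, make it even again. */
void publish(struct time_info *ti, uint64_t tsc, uint64_t ns)
{
    assert((ti->version & 1) == 0);  /* no update already in progress */
    ti->version++;                   /* odd: update in progress */
    /* a real implementation needs a write barrier here */
    ti->tsc_timestamp = tsc;
    ti->system_time = ns;
    /* ... and another one here */
    ti->version++;                   /* even: update complete */
}

/*
 * Reader side: the base (system_time) and the offset (now - tsc_timestamp)
 * are both taken inside the loop, so they always belong to one snapshot.
 * now_tsc stands in for reading the hardware counter.
 */
uint64_t read_clock(const volatile struct time_info *ti, uint64_t now_tsc)
{
    uint32_t v;
    uint64_t ns;

    do {
        v = ti->version;
        /* read barrier in real code */
        ns = ti->system_time + (now_tsc - ti->tsc_timestamp);
        /* read barrier in real code */
    } while ((v & 1) || v != ti->version);

    return ns;
}
```

If the offset were computed outside the loop, an update landing between the two steps could pair a new base with an offset measured against the old timestamp.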
[kvm-devel] [patch 0/2] virtio-blk async IO
Use the asynchronous version of block IO functions, otherwise guests can block for long periods of time waiting for the operations to complete.
[kvm-devel] [patch 1/2] QEMU/KVM: provide a reset method for virtio
So drivers can do whatever necessary on reset.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm-userspace.aio/qemu/hw/virtio.c
===================================================================
--- kvm-userspace.aio.orig/qemu/hw/virtio.c
+++ kvm-userspace.aio/qemu/hw/virtio.c
@@ -166,6 +166,9 @@ void virtio_reset(void *opaque)
     VirtIODevice *vdev = opaque;
     int i;
 
+    if (vdev->reset)
+        vdev->reset(vdev);
+
     vdev->features = 0;
     vdev->queue_sel = 0;
     vdev->status = 0;
Index: kvm-userspace.aio/qemu/hw/virtio.h
===================================================================
--- kvm-userspace.aio.orig/qemu/hw/virtio.h
+++ kvm-userspace.aio/qemu/hw/virtio.h
@@ -119,6 +119,7 @@ struct VirtIODevice
     uint32_t (*get_features)(VirtIODevice *vdev);
     void (*set_features)(VirtIODevice *vdev, uint32_t val);
     void (*update_config)(VirtIODevice *vdev, uint8_t *config);
+    void (*reset)(VirtIODevice *vdev);
     VirtQueue vq[VIRTIO_PCI_QUEUE_MAX];
 };
[kvm-devel] [patch 2/2] QEMU/KVM: virtio-blk async IO
virtio-blk should not use synchronous requests, as that can block vcpus outside of guest mode for long periods of time for no reason.

The generic block layer could complete AIOs before re-entering guest mode, so that cached reads and writes can be reported ASAP, a job for the block layer.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm-userspace.aio/qemu/hw/virtio-blk.c
===================================================================
--- kvm-userspace.aio.orig/qemu/hw/virtio-blk.c
+++ kvm-userspace.aio/qemu/hw/virtio-blk.c
@@ -77,54 +77,117 @@ static VirtIOBlock *to_virtio_blk(VirtIO
     return (VirtIOBlock *)vdev;
 }
 
+typedef struct VirtIOBlockReq
+{
+    VirtIODevice *vdev;
+    VirtQueue *vq;
+    struct iovec in_sg_status;
+    unsigned int pending;
+    unsigned int len;
+    unsigned int elem_idx;
+    int status;
+} VirtIOBlockReq;
+
+static void virtio_blk_rw_complete(void *opaque, int ret)
+{
+    VirtIOBlockReq *req = opaque;
+    struct virtio_blk_inhdr *in;
+    VirtQueueElement elem;
+
+    req->status |= ret;
+    if (--req->pending > 0)
+        return;
+
+    elem.index = req->elem_idx;
+    in = (void *)req->in_sg_status.iov_base;
+
+    in->status = req->status ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK;
+    virtqueue_push(req->vq, &elem, req->len);
+    virtio_notify(req->vdev, req->vq);
+    qemu_free(req);
+}
+
 static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBlock *s = to_virtio_blk(vdev);
     VirtQueueElement elem;
+    VirtIOBlockReq *req;
     unsigned int count;
 
     while ((count = virtqueue_pop(vq, &elem)) != 0) {
         struct virtio_blk_inhdr *in;
         struct virtio_blk_outhdr *out;
-        unsigned int wlen;
         off_t off;
         int i;
 
+        /*
+         * FIXME: limit the number of in-flight requests
+         */
+        req = qemu_malloc(sizeof(VirtIOBlockReq));
+        if (!req)
+            return;
+        memset(req, 0, sizeof(*req));
+        memcpy(&req->in_sg_status, &elem.in_sg[elem.in_num - 1],
+               sizeof(req->in_sg_status));
+        req->vdev = vdev;
+        req->vq = vq;
+        req->elem_idx = elem.index;
+
         out = (void *)elem.out_sg[0].iov_base;
         in = (void *)elem.in_sg[elem.in_num - 1].iov_base;
         off = out->sector;
 
         if (out->type & VIRTIO_BLK_T_SCSI_CMD) {
-            wlen = sizeof(*in);
+            unsigned int len = sizeof(*in);
+
             in->status = VIRTIO_BLK_S_UNSUPP;
+            virtqueue_push(vq, &elem, len);
+            virtio_notify(vdev, vq);
+            qemu_free(req);
+
         } else if (out->type & VIRTIO_BLK_T_OUT) {
-            wlen = sizeof(*in);
+            req->pending = elem.out_num - 1;
 
             for (i = 1; i < elem.out_num; i++) {
-                bdrv_write(s->bs, off,
+                bdrv_aio_write(s->bs, off,
                            elem.out_sg[i].iov_base,
-                           elem.out_sg[i].iov_len / 512);
+                           elem.out_sg[i].iov_len / 512,
+                           virtio_blk_rw_complete,
+                           req);
                 off += elem.out_sg[i].iov_len / 512;
+                req->len += elem.out_sg[i].iov_len;
             }
 
-            in->status = VIRTIO_BLK_S_OK;
         } else {
-            wlen = sizeof(*in);
+            req->pending = elem.in_num - 1;
 
             for (i = 0; i < elem.in_num - 1; i++) {
-                bdrv_read(s->bs, off,
+                bdrv_aio_read(s->bs, off,
                           elem.in_sg[i].iov_base,
-                          elem.in_sg[i].iov_len / 512);
+                          elem.in_sg[i].iov_len / 512,
+                          virtio_blk_rw_complete,
+                          req);
                 off += elem.in_sg[i].iov_len / 512;
-                wlen += elem.in_sg[i].iov_len;
+                req->len += elem.in_sg[i].iov_len;
             }
-
-            in->status = VIRTIO_BLK_S_OK;
         }
-
-        virtqueue_push(vq, &elem, wlen);
-        virtio_notify(vdev, vq);
     }
+    /*
+     * FIXME: Want to check for completions before returning to guest mode,
+     * so cached reads and writes are reported as quickly as possible. But
+     * that should be done in the generic block layer.
+     */
+}
+
+static void virtio_blk_reset(VirtIODevice *vdev)
+{
+    VirtIOBlock *s = to_virtio_blk(vdev);
+
+    /*
+     * This should cancel pending requests, but can't do nicely until there
+     * are per-device request lists.
+     */
+    qemu_aio_flush();
 }
 
 static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
@@ -156,6 +219,7 @@ void *virtio_blk_init(PCIBus *bus, uint1
     s->vdev.update_config = virtio_blk_update_config;
     s->vdev.get_features = virtio_blk_get_features;
+    s->vdev.reset = virtio_blk_reset;
     s->bs = bs;
     bs->devfn = s->vdev.pci_dev.devfn;
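The completion logic in the patch above (one guest request fanned out into several asynchronous segment I/Os, with the guest notified only when the last one finishes) can be sketched in isolation roughly as follows. This is a standalone illustration, not QEMU code: blk_req, submit and segment_done are hypothetical names, and the "notify the guest" step is reduced to setting a flag.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one guest request split into segments. */
struct blk_req {
    unsigned int pending;   /* segment I/Os still in flight */
    int status;             /* OR of all segment return codes */
    int completed;          /* set once the guest would be notified */
};

/* Arm the request; pending is set before any callback can fire. */
void submit(struct blk_req *req, unsigned int nseg)
{
    req->pending = nseg;
    req->status = 0;
    req->completed = 0;
}

/* Completion callback, invoked once per finished segment. */
void segment_done(void *opaque, int ret)
{
    struct blk_req *req = opaque;

    assert(req->pending > 0);
    req->status |= ret;          /* any non-zero ret marks the request failed */
    if (--req->pending > 0)
        return;                  /* other segments still outstanding */

    /* Last segment: the real code pushes the element to the virtqueue,
     * notifies the guest and frees the request at this point. */
    req->completed = 1;
}
```

A request with three segments therefore reports completion only on the third callback, and a single failing segment is enough to make the final status non-zero.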
[kvm-devel] kvm-trace help
I am trying to add a trace marker and the data is coming out all 0's. e.g.,

0 (+ 0) PTE_WRITE vcpu = 0x0001 pid = 0x240d [ gpa = 0x gpte = 0x ]

Patch is attached. I know the data is non-zero as I added an if check before calling the trace to only do the trace if the data is non-zero. Anyone have suggestions on what I am missing?

thanks,
david

diff -rb -U 10 kvm-66.orig/kernel/include/asm/kvm.h kvm-66/kernel/include/asm/kvm.h
--- kvm-66.orig/kernel/include/asm/kvm.h 2008-04-16 08:29:14.0 -0600
+++ kvm-66/kernel/include/asm/kvm.h 2008-04-18 12:41:07.0 -0600
@@ -221,12 +221,14 @@
 #define KVM_TRC_MSR_READ         (KVM_TRC_HANDLER + 0x0B)
 #define KVM_TRC_MSR_WRITE        (KVM_TRC_HANDLER + 0x0C)
 #define KVM_TRC_CPUID            (KVM_TRC_HANDLER + 0x0D)
 #define KVM_TRC_INTR             (KVM_TRC_HANDLER + 0x0E)
 #define KVM_TRC_NMI              (KVM_TRC_HANDLER + 0x0F)
 #define KVM_TRC_VMMCALL          (KVM_TRC_HANDLER + 0x10)
 #define KVM_TRC_HLT              (KVM_TRC_HANDLER + 0x11)
 #define KVM_TRC_CLTS             (KVM_TRC_HANDLER + 0x12)
 #define KVM_TRC_LMSW             (KVM_TRC_HANDLER + 0x13)
 #define KVM_TRC_APIC_ACCESS      (KVM_TRC_HANDLER + 0x14)
+#define KVM_TRC_PTE_WRITE        (KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_PTE_FLOODED      (KVM_TRC_HANDLER + 0x16)
 
 #endif
diff -rb -U 10 kvm-66.orig/kernel/include/asm-x86/kvm.h kvm-66/kernel/include/asm-x86/kvm.h
--- kvm-66.orig/kernel/include/asm-x86/kvm.h 2008-04-16 08:29:14.0 -0600
+++ kvm-66/kernel/include/asm-x86/kvm.h 2008-04-18 12:41:07.0 -0600
@@ -221,12 +221,14 @@
 #define KVM_TRC_MSR_READ         (KVM_TRC_HANDLER + 0x0B)
 #define KVM_TRC_MSR_WRITE        (KVM_TRC_HANDLER + 0x0C)
 #define KVM_TRC_CPUID            (KVM_TRC_HANDLER + 0x0D)
 #define KVM_TRC_INTR             (KVM_TRC_HANDLER + 0x0E)
 #define KVM_TRC_NMI              (KVM_TRC_HANDLER + 0x0F)
 #define KVM_TRC_VMMCALL          (KVM_TRC_HANDLER + 0x10)
 #define KVM_TRC_HLT              (KVM_TRC_HANDLER + 0x11)
 #define KVM_TRC_CLTS             (KVM_TRC_HANDLER + 0x12)
 #define KVM_TRC_LMSW             (KVM_TRC_HANDLER + 0x13)
 #define KVM_TRC_APIC_ACCESS      (KVM_TRC_HANDLER + 0x14)
+#define KVM_TRC_PTE_WRITE        (KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_PTE_FLOODED      (KVM_TRC_HANDLER + 0x16)
 
 #endif
diff -rb -U 10 kvm-66.orig/kernel/mmu.c kvm-66/kernel/mmu.c
--- kvm-66.orig/kernel/mmu.c 2008-04-16 08:29:14.0 -0600
+++ kvm-66/kernel/mmu.c 2008-04-18 11:50:16.0 -0600
@@ -1662,20 +1662,22 @@
 			if (r)
 				return;
 			memcpy((void *)&gpte + (gpa % 8), new, 4);
 		} else if ((bytes == 8) && (gpa % 8 == 0)) {
 			memcpy((void *)&gpte, new, 8);
 		}
 	} else {
 		if ((bytes == 4) && (gpa % 4 == 0))
 			memcpy((void *)&gpte, new, 4);
 	}
+	KVMTRACE_4D(PTE_WRITE, vcpu, (u32) gpa, (u32)(gpa >> 32),
+		    (u32) gpte, (u32)(gpte >> 32), handler);
 	if (!is_present_pte(gpte))
 		return;
 	gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 
 	down_read(&current->mm->mmap_sem);
 	if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) {
 		gfn &= ~(KVM_PAGES_PER_HPAGE-1);
 		vcpu->arch.update_pte.largepage = 1;
 	}
 	pfn = gfn_to_pfn(vcpu->kvm, gfn);
@@ -1711,21 +1713,22 @@
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 	mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes);
 	spin_lock(&vcpu->kvm->mmu_lock);
 	kvm_mmu_free_some_pages(vcpu);
 	++vcpu->kvm->stat.mmu_pte_write;
 	kvm_mmu_audit(vcpu, "pre pte write");
 	if (gfn == vcpu->arch.last_pt_write_gfn
 	    && !last_updated_pte_accessed(vcpu)) {
 		++vcpu->arch.last_pt_write_count;
-		if (vcpu->arch.last_pt_write_count >= 3)
+		if (vcpu->arch.last_pt_write_count >= 4)
+			KVMTRACE_0D(PTE_FLOODED, vcpu, handler);
 			flooded = 1;
 	} else {
 		vcpu->arch.last_pt_write_gfn = gfn;
 		vcpu->arch.last_pt_write_count = 1;
 		vcpu->arch.last_pte_updated = NULL;
 	}
 	index = kvm_page_table_hashfn(gfn);
 	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
 	hlist_for_each_entry_safe(sp, node, n, bucket, hash_link) {
 		if (sp->gfn != gfn || sp->role.metaphysical)
diff -rb -U 10 kvm-66.orig/user/formats kvm-66/user/formats
--- kvm-66.orig/user/formats 2008-04-15 07:35:58.0 -0600
+++ kvm-66/user/formats 2008-04-18 12:46:36.0 -0600
@@ -15,10 +15,12 @@
 0x0002000B %(tsc)d (+%(reltsc)8d) MSR_READ vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ MSR# = 0x%(1)08x, data = 0x%(3)08x %(2)08x ]
 0x0002000C %(tsc)d (+%(reltsc)8d) MSR_WRITE vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ MSR# = 0x%(1)08x, data = 0x%(3)08x %(2)08x ]
 0x0002000D %(tsc)d (+%(reltsc)8d) CPUID vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ func = 0x%(1)08x, eax = 0x%(2)08x, ebx = 0x%(3)08x, ecx = 0x%(4)08x edx = 0x%(5)08x]
 0x0002000E %(tsc)d (+%(reltsc)8d) INTR vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ vector = 0x%(1)02x ]
 0x0002000F %(tsc)d (+%(reltsc)8d) NMI vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x
 0x00020010 %(tsc)d (+%(reltsc)8d) VMMCALL
Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
--- On Fri, 4/18/08, Avi Kivity [EMAIL PROTECTED] wrote:
From: Avi Kivity [EMAIL PROTECTED]
Subject: Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
To: Alex Davis [EMAIL PROTECTED]
Cc: kvm-devel@lists.sourceforge.net
Date: Friday, April 18, 2008, 12:12 PM

Alex Davis wrote:

Host software: Linux 2.6.24.4 KVM 65 (I am using the kernel modules from this release). X11 7.2 from Xorg SDL 1.2.13 GCC 4.1.1 Glibc 2.4

Host hardware: Asus P5B Deluxe (P965 chipset based) motherboard 4 GB RAM Intel E6700 CPU

Guest software: Slackware 12.0 installed from CD-ROM.

Command used to start first KVM instance:
/usr/local/bin/qemu-system-x86_64 -hda /spare/vdisk1.img -cdrom /dev/cdrom -boot c -m 384 -net nic,macaddr=DE:AD:BE:EF:11:29 -net tap,ifname=tap0,script=no

Command used to start second KVM instance:
/usr/local/bin/qemu-system-x86_64 -hda /spare/vdisk2.img -cdrom /dev/cdrom -boot c -m 384 -net nic,macaddr=DE:AD:BE:EF:11:30 -net tap,ifname=tap1,script=no

tap0 and tap1 are bridged on the host. The guest OS was installed on /spare/vdisk1.img, which was initially created by
/usr/local/bin/qemu-img create -f qcow /spare/vdisk.img 10G
After the guest installation completed, vdisk1 was copied to vdisk2.

The second instance always stops after printing "Checking if the processor honours the WP bit even in supervisor mode... Ok." It stays hung until I press the return key in the first instance; sometimes clicking in another X window will wake it up as well. This is a test machine so I can test patches (almost) at will.

Strange. Does pinning each guest to a different cpu help (use 'taskset 1 qemu ... vdisk1.img', 'taskset 2 qemu ... vdisk2.img')?

taskset made no difference. Upgrading to kvm-66 didn't help either.

Any sufficiently difficult bug is indistinguishable from a feature.
Re: [kvm-devel] [patch 0/2] virtio-blk async IO
Hi Marcelo,

Use the asynchronous version of block IO functions, otherwise guests can block for long periods of time waiting for the operations to complete.

just tried these patches. Results are similar to the last ones: the guest comes up fine but after running 2 or 3 minutes of bonnie++ the guest-vm hangs. This time I used screen on the guest console to try switching to another process - hanging too.

Here is the kvm_stat --once output:
efer_reload 0 0
exits 3325114 196
fpu_reload 185671 0
halt_exits 1869229
halt_wakeup 24807 0
host_state_reload 138730859
insn_emulation 1924291 130
insn_emulation_fail 0 0
invlpg 0 0
io_exits 35002030
irq_exits 225446 3
irq_window 0 0
mmio_exits 917561 0
mmu_cache_miss 55436 0
mmu_flooded 64416 0
mmu_pde_zapped 46914 0
mmu_pte_updated 565547 0
mmu_pte_write 650181 0
mmu_recycled 0 0
mmu_shadow_zapped 64416 0
pf_fixed 1229672 0
pf_guest 94338 0
remote_tlb_flush 0 0
request_irq 0 0
signal_exits 1 0
tlb_flush 602678 4

Kind regards,
Gerd
--
Address (better: trap) for people I really don't want to get mail from: james(at)cactusamerica.com
Re: [kvm-devel] kvm-trace help
David S. Ahern wrote: I am trying to add a trace marker and the data is coming out all 0's. e.g., 0 (+ 0) PTE_WRITE vcpu = 0x0001 pid = 0x240d [ gpa = 0x gpte = 0x ] Patch is attached. I know the data is non-zero as I added an if check before calling the trace to only do the trace if the data is non-zero. Anyone have suggestions on what I am missing? thanks, david

Hi David,

I read your patch and found this:
+#define KVM_TRC_PTE_WRITE (KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_PTE_FLOODED (KVM_TRC_HANDLER + 0x16)
but in your formats file:
+0x00020015 %(tsc)d (+%(reltsc)8d) PTE_FLOODED vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x
+0x00020016 %(tsc)d (+%(reltsc)8d) PTE_WRITE vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ gpa = 0x%(2)08x %(1)08x gpte = 0x%(4)08x %(3)08x ]
You have swapped the two values, right?
Re: [kvm-devel] kvm-trace help
inline.

Liu, Eric E wrote: David S. Ahern wrote: I am trying to add a trace marker and the data is coming out all 0's. e.g., 0 (+ 0) PTE_WRITE vcpu = 0x0001 pid = 0x240d [ gpa = 0x gpte = 0x ] Patch is attached. I know the data is non-zero as I added an if check before calling the trace to only do the trace if the data is non-zero. Anyone have suggestions on what I am missing? thanks, david

Hi, david I read your patch and find this:
+#define KVM_TRC_PTE_WRITE (KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_PTE_FLOODED (KVM_TRC_HANDLER + 0x16)
but in your formats file
+0x00020015 %(tsc)d (+%(reltsc)8d) PTE_FLOODED vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x
+0x00020016 %(tsc)d (+%(reltsc)8d) PTE_WRITE vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ gpa = 0x%(2)08x %(1)08x gpte = 0x%(4)08x %(3)08x ]
You mistake the value, right?

Which value? Do you mean the 0x00020015 and 0x00020016? kvm.h shows KVM_TRC_APIC_ACCESS as KVM_TRC_HANDLER + 0x14. I added the PTE_WRITE and PTE_FLOODED after that in kvm.h with the values 0x15 and 0x16. Then in the formats file it shows APIC_ACCESS as 0x00020014, and I added the new PTE entries after that as 20015 and 20016. The kvmtrace_format tool does show those lines in its output, which makes me believe these values are ok.

What has me puzzled is the 0 values for gpa and gpte. I believe they are not 0 because I added if (gpa || gpte) before the KVMTRACE_4D(PTE_WRITE, ...) line and the lines still show up in the trace output.

david
Re: [kvm-devel] kvm-trace help
David S. Ahern wrote: inline. Liu, Eric E wrote: David S. Ahern wrote: I am trying to add a trace marker and the data is coming out all 0's. e.g., 0 (+ 0) PTE_WRITE vcpu = 0x0001 pid = 0x240d [ gpa = 0x gpte = 0x ] Patch is attached. I know the data is non-zero as I added an if check before calling the trace to only do the trace if the data is non-zero. Anyone have suggestions on what I am missing? thanks, david

Hi, david I read your patch and find this:
+#define KVM_TRC_PTE_WRITE (KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_PTE_FLOODED (KVM_TRC_HANDLER + 0x16)
but in your formats file
+0x00020015 %(tsc)d (+%(reltsc)8d) PTE_FLOODED vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x
+0x00020016 %(tsc)d (+%(reltsc)8d) PTE_WRITE vcpu = 0x%(vcpu)08x pid = 0x%(pid)08x [ gpa = 0x%(2)08x %(1)08x gpte = 0x%(4)08x %(3)08x ]
You mistake the value, right?

Which value? Do you mean the 0x00020015 and 0x00020016? kvm.h shows KVM_TRC_APIC_ACCESS as KVM_TRC_HANDLER + 0x14. I added the PTE_WRITE and PTE_FLOODED after that in kvm.h with the values 0x15 and 0x16. Then in the formats file it shows APIC_ACCESS as 0x00020014, and I added the new PTE entries after that as 20015 and 20016. The kvmtrace_format tool does show those lines in its output, which makes me believe these values are ok.

I mean that the value you wrote for PTE_WRITE in the formats file (0x00020016) should be the same as the KVM_TRC_PTE_WRITE you define in kvm.h, but that one is currently 0x00020015. If they do not match, what you get in the text file will be mixed up.

What has me puzzled is the 0 values for gpa and gpte. I believe they are not 0 because I added if (gpa || gpte) before the KVMTRACE_4D(PTE_WRITE, ...) line and the lines still show up in the trace output.

david
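To make the two conventions in this thread concrete, here is a small standalone sketch. It shows how a 64-bit value is split into the low and high 32-bit words that a four-argument marker carries, how a formats line such as "0x%(2)08x %(1)08x" prints them back as high word then low word, and why the ID in the formats file has to equal the ID defined in kvm.h. The helper names split64 and format_hi_lo are hypothetical; the base value 0x00020000 is inferred from the 0x000200xx IDs quoted above, not taken from the kernel headers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Base inferred from APIC_ACCESS being HANDLER + 0x14 == 0x00020014. */
#define TRC_HANDLER     0x00020000u
#define TRC_PTE_WRITE   (TRC_HANDLER + 0x15)  /* as in the patched kvm.h */
#define TRC_PTE_FLOODED (TRC_HANDLER + 0x16)  /* as in the patched kvm.h */

/* Split a 64-bit value into the two 32-bit words passed to the marker. */
void split64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    assert(lo && hi);
    *lo = (uint32_t)v;          /* argument %(1) in the formats line */
    *hi = (uint32_t)(v >> 32);  /* argument %(2) in the formats line */
}

/* Render "0x<hi> <lo>", mirroring "0x%(2)08x %(1)08x". */
int format_hi_lo(char *buf, size_t n, uint32_t lo, uint32_t hi)
{
    return snprintf(buf, n, "0x%08" PRIx32 " %08" PRIx32, hi, lo);
}
```

With these definitions a PTE_WRITE record is logged under 0x00020015. If the formats file lists PTE_WRITE under 0x00020016 instead, the formatter would presumably decode the data-less PTE_FLOODED records with the PTE_WRITE layout, which is one plausible source of lines with all-zero gpa and gpte.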
Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
--- On Fri, 4/18/08, Avi Kivity [EMAIL PROTECTED] wrote:
From: Avi Kivity [EMAIL PROTECTED]
Subject: Re: [kvm-devel] Second KVM process hangs eating 80-100% CPU on host during startup
To: Alex Davis [EMAIL PROTECTED]
Cc: kvm-devel@lists.sourceforge.net
Date: Friday, April 18, 2008, 12:12 PM

Alex Davis wrote:

Host software: Linux 2.6.24.4 KVM 65 (I am using the kernel modules from this release). X11 7.2 from Xorg SDL 1.2.13 GCC 4.1.1 Glibc 2.4

Host hardware: Asus P5B Deluxe (P965 chipset based) motherboard 4 GB RAM Intel E6700 CPU

Guest software: Slackware 12.0 installed from CD-ROM.

Command used to start first KVM instance:
/usr/local/bin/qemu-system-x86_64 -hda /spare/vdisk1.img -cdrom /dev/cdrom -boot c -m 384 -net nic,macaddr=DE:AD:BE:EF:11:29 -net tap,ifname=tap0,script=no

Command used to start second KVM instance:
/usr/local/bin/qemu-system-x86_64 -hda /spare/vdisk2.img -cdrom /dev/cdrom -boot c -m 384 -net nic,macaddr=DE:AD:BE:EF:11:30 -net tap,ifname=tap1,script=no

tap0 and tap1 are bridged on the host. The guest OS was installed on /spare/vdisk1.img, which was initially created by
/usr/local/bin/qemu-img create -f qcow /spare/vdisk.img 10G
After the guest installation completed, vdisk1 was copied to vdisk2.

The second instance always stops after printing "Checking if the processor honours the WP bit even in supervisor mode... Ok." It stays hung until I press the return key in the first instance; sometimes clicking in another X window will wake it up as well. This is a test machine so I can test patches (almost) at will.

Strange. Does pinning each guest to a different cpu help (use 'taskset 1 qemu ... vdisk1.img', 'taskset 2 qemu ... vdisk2.img')?

Some additional information: I upgraded the guest to 2.6.25, and added some printk's to init_32.c and init/calibrate.c in the kernel source tree. Here's the output from dmesg for the guest boot:

[0.004000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[0.004000] Before cpa_init.
[0.004000] CPA: page pool initialized 1 of 1 pages preallocated
[0.004000] After cpa_init.
[0.004000] After pagealloc
[0.004000] After cpu_hotplug_init
[0.004000] After kmem_cache_init
[0.004000] After setup_percpu_pageset
[0.004000] After numa_policy_init
[0.004005] After late_time_init
[0.004622] Before read_current_timer(pre_start)
[0.005314] After read_current_timer()
[0.006493] Before read_current_timer(start)
[ 16.065027] Before read_current_timer(post_start)
[ 16.065753] Before read_current_timer(post_end)
[ 16.066437] Before read_current_timer(start)
[ 16.073007] Before read_current_timer(post_start)
[ 16.081007] Before read_current_timer(post_end)
[ 16.081703] Before read_current_timer(start)
[ 16.089008] Before read_current_timer(post_start)
[ 16.097008] Before read_current_timer(post_end)
[ 16.097695] Before read_current_timer(start)
[ 16.105010] Before read_current_timer(post_start)
[ 16.113009] Before read_current_timer(post_end)
[ 16.113697] Before read_current_timer(start)
[ 16.121010] Before read_current_timer(post_start)
[ 16.129010] Before read_current_timer(post_end)
[ 16.129697] calibrate_delay_direct() failed to get a good estimate for loops_per_jiffy.
[ 16.129698] Probably due to long platform interrupts. Consider using lpj= boot option.
[ 16.132180] Calibrating delay loop... 5308.41 BogoMIPS (lpj=10616832)
[ 16.237019] After calibrate_delay

Notice how the time jumped from about 0 seconds to 16 seconds. That's where I woke it up by typing in another window. The code seems to be hanging in the call to read_current_timer(start) in function calibrate_delay_direct in init/calibrate.c. Also notice that calibrate_delay_direct() failed.