[PATCH] x86: Rename mtrr_state struct and macro names
From: Sheng Yang [EMAIL PROTECTED]

Prepare for exporting them.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index cb7d3b6..b9574a6 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,9 +14,9 @@
 #include <asm/pat.h>
 #include "mtrr.h"
 
-struct mtrr_state {
-	struct mtrr_var_range var_ranges[MAX_VAR_RANGES];
-	mtrr_type fixed_ranges[NUM_FIXED_RANGES];
+struct mtrr_state_type {
+	struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+	mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
 	unsigned char enabled;
 	unsigned char have_fixed;
 	mtrr_type def_type;
@@ -35,7 +35,7 @@ static struct fixed_range_block fixed_range_blocks[] = {
 };
 
 static unsigned long smp_changes_mask;
-static struct mtrr_state mtrr_state = {};
+static struct mtrr_state_type mtrr_state = {};
 static int mtrr_state_set;
 u64 mtrr_tom2;
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 885c826..edadf7b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -49,7 +49,7 @@
 
 u32 num_var_ranges = 0;
 
-unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
@@ -574,7 +574,7 @@ struct mtrr_value {
 	unsigned long lsize;
 };
 
-static struct mtrr_value mtrr_state[MAX_VAR_RANGES];
+static struct mtrr_value mtrr_state[MTRR_MAX_VAR_RANGES];
 
 static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
 {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2dc4ec6..9885382 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -11,8 +11,9 @@
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
-#define NUM_FIXED_RANGES 88
-#define MAX_VAR_RANGES 256
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
 #define MTRRfix64K_00000_MSR 0x250
 #define MTRRfix16K_80000_MSR 0x258
 #define MTRRfix16K_A0000_MSR 0x259
@@ -33,7 +34,7 @@
    an 8 bit field: */
 typedef u8 mtrr_type;
 
-extern unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
 	u32	vendor;
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest
Avi Kivity wrote:
Han, Weidong wrote:

If we devolve this to the iommu API, the same io page table can be shared by all iommus, so long as they all use the same page table format.

I don't understand how to handle this by iommu API. Let me explain my thoughts more clearly:

VT-d spec says: Context-entries programmed with the same domain identifier must always reference the same address translation structure (through the ASR field). Similarly, context-entries referencing the same address translation structure must be programmed with the same domain id.

In the native VT-d driver, dmar_domain is per device, and has its own VT-d page table, which is dynamically set up before each DMA. So it is impossible for the same VT-d page table to be shared by all iommus. Moreover, different iommus in a system may have different page table levels.

Right. This use case is in essence to prevent unintended sharing. It is also likely to have low page table height, since dma sizes are relatively small.

I think it's enough that the iommu API tells us the iommu of a device.

While this is tangential to our conversation, why? Even for the device driver use case, this only makes the API more complex. If the API hides the existence of multiple iommus, it's easier to use and harder to make a mistake.

Whereas on the KVM side, the same VT-d page table can be shared by the devices which are under the same iommu and assigned to the same guest, because all of the guest's memory is statically mapped in the VT-d page table. But it needs to wrap dmar_domain; this patch wraps it with a reference count for multiple devices related to the same dmar_domain. This patch already adds an API (intel_iommu_device_get_iommu()) in intel-iommu.c, which returns the iommu of a device.

There is a missed optimization here. Suppose we have two devices each under a different iommu. With the patch, each will be in a different dmar_domain and so will have a different page table. The amount of memory used is doubled.
You cannot let two devices each under a different iommu share one dmar_domain, because dmar_domain has a pointer to iommu.

Suppose the iommu API hides the existence of multiple iommus. You allocate a translation and add devices to it. When you add a device, the iommu API checks which iommu is needed and programs it accordingly, but only one io page table is used.

The other benefit is that iommu developers understand these issues while kvm developers don't, so it's best managed by the iommu API. This way if things change (as usual, becoming more complicated), the iommu developers can make the changes in their code and hide the complexity from kvm or other users.

I'm probably (badly) duplicating Joerg's iommu API here, but this is how it could go:

iommu_translation_create() - creates an iommu translation object; this allocates the page tables but doesn't do anything with them
iommu_translation_map() - adds pages to the translation
iommu_translation_attach() - attach a device to the translation; this locates the iommu and programs it

_detach(), _unmap(), and _free() undo these operations.

In fact, the exported APIs added for KVM VT-d also provide create/map/attach/detach/free functions, although these iommu APIs are more readable. Because the KVM VT-d usage is different from the native usage, it's inevitable that the native VT-d code must be extended to support KVM VT-d (such as wrapping dmar_domain). Devices under different iommus cannot share the same dmar_domain, thus they cannot share a VT-d page table. If we want to handle this by iommu APIs, I suspect we need to change lots of native VT-d driver code. David/Jesse, what's your opinion?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
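The create/map/attach proposal in this message can be sketched in user-space C. This is only a toy model under loud assumptions: the three function names come from the email, but every signature, struct, and field below (`struct iommu_unit`, `struct device`, `TOY_PAGES`, the `refs` counter) is hypothetical, and none of it is Joerg's actual API or the intel-iommu driver. It illustrates the one point under discussion: two devices behind different iommus end up referencing a single page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGES 16	/* size of the toy page table (illustrative only) */

/* One hardware IOMMU unit; hypothetical stand-in, not struct intel_iommu. */
struct iommu_unit {
	int programmed;			/* context entries currently written */
};

/* One io page table, independent of any particular IOMMU unit. */
struct iommu_translation {
	unsigned long pfn[TOY_PAGES];	/* toy table: gfn -> host pfn */
	int refs;			/* number of attached devices */
};

struct device {
	struct iommu_unit *iommu;		/* unit in front of this device */
	struct iommu_translation *ctx;		/* table its context entry uses */
};

/* Allocate the page table; no hardware is touched yet. */
struct iommu_translation *iommu_translation_create(void)
{
	return calloc(1, sizeof(struct iommu_translation));
}

/* Add one page to the translation. */
int iommu_translation_map(struct iommu_translation *tr,
			  unsigned long gfn, unsigned long pfn)
{
	assert(tr);
	if (gfn >= TOY_PAGES)
		return -1;
	tr->pfn[gfn] = pfn;
	return 0;
}

/* Locate the device's IOMMU and point its context entry at the table. */
int iommu_translation_attach(struct iommu_translation *tr, struct device *dev)
{
	assert(tr && dev && dev->iommu);
	if (dev->ctx)
		return -1;		/* already attached somewhere */
	dev->ctx = tr;
	dev->iommu->programmed++;
	tr->refs++;
	return 0;
}

/* Undo attach: clear the context entry and drop the reference. */
void iommu_translation_detach(struct device *dev)
{
	if (!dev->ctx)
		return;
	dev->ctx->refs--;
	dev->iommu->programmed--;
	dev->ctx = NULL;
}

/* Free the table, but only once no device references it. */
int iommu_translation_free(struct iommu_translation *tr)
{
	if (tr->refs)
		return -1;
	free(tr);
	return 0;
}
```

In this model the caller never names an iommu: attach finds it through the device, which is the "hide the existence of multiple iommus" property argued for above. Whether real hardware with differing page table levels can share one table is exactly the open question in the thread, and the sketch does not settle it.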
[ kvm-Bugs-2143498 ] FreeBSD fails to reboot
Bugs item #2143498, was opened at 2008-10-03 05:38
Message generated for change (Comment added) made by rtg20
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2143498&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Matt Lehner (mlehner)
Assigned to: Nobody/Anonymous (nobody)
Summary: FreeBSD fails to reboot

Initial Comment:
Xeon E5430, ubuntu 8.04 x86_64, currently kvm-62. Reports of same issue in kvm76:
https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/239107

Host kernel: 2.6.24-19
Guest: FreeBSD 6.2, i386 and amd64

Problem:
FreeBSD will start normally as a guest OS. When a reboot is issued to the guest, it will not come back up without destroying the guest, and then starting it again.

Screenshots from a reboot:
http://lehner.pair.com/screenshot1.png
Note that the drive is listed.

http://lehner.pair.com/screenshot2.png
In the second screenshot, the drive should be listed right after the Timecounters lines. If the guest is destroyed at that point, and restarted it will come up fine.

This can be reproduced 100% of the time. Happens both when using a file or a partition for the guest OS.

----------------------------------------------------------------------

Comment By: Roman Yepishev (rtg20)
Date: 2008-10-09 10:26

Message:
KVM-76: When passed --no-kvm option - the reboot is happening without any problems, so this has something to do with kernel kvm module as well.

----------------------------------------------------------------------

Comment By: Roman Yepishev (rtg20)
Date: 2008-10-07 23:35

Message:
Reproducible for kvm-74,kvm-75,kvm-76 with FreeBSD 6.2, 6.3 and 7.0.
This is KVM-specific bug. QEMU 0.9.1 does not suffer from it.
[PATCH] x86 emulator: Add Src2 decode set
Instructions like shld have three operands, so we need to add a Src2
decode set. We start with Src2None, Src2CL, and Src2Imm8 to support shld,
and we will expand it later.

Signed-off-by: Guillaume Thouvenin [EMAIL PROTECTED]
---
 arch/x86/kvm/x86_emulate.c        |   47 +++++++++++++++++++++++++++++++++++------------
 include/asm-x86/kvm_x86_emulate.h |    1 +
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index a391e21..c9ef2da 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -59,16 +59,21 @@
 #define SrcImm      (5<<4)	/* Immediate operand. */
 #define SrcImmByte  (6<<4)	/* 8-bit sign-extended immediate operand. */
 #define SrcMask     (7<<4)
+/* Source 2 operand type */
+#define Src2None    (0<<7)
+#define Src2CL      (1<<7)
+#define Src2Imm8    (2<<7)
+#define Src2Mask    (7<<7)
 /* Generic ModRM decode. */
-#define ModRM       (1<<7)
+#define ModRM       (1<<24)
 /* Destination is only written; never read. */
-#define Mov         (1<<8)
-#define BitOp       (1<<9)
-#define MemAbs      (1<<10)	/* Memory operand is absolute displacement */
-#define String      (1<<12)	/* String instruction (rep capable) */
-#define Stack       (1<<13)	/* Stack instruction (push/pop) */
-#define Group       (1<<14)	/* Bits 3:5 of modrm byte extend opcode */
-#define GroupDual   (1<<15)	/* Alternate decoding of mod == 3 */
+#define Mov         (1<<25)
+#define BitOp       (1<<26)
+#define MemAbs      (1<<27)	/* Memory operand is absolute displacement */
+#define String      (1<<28)	/* String instruction (rep capable) */
+#define Stack       (1<<29)	/* Stack instruction (push/pop) */
+#define Group       (1<<30)	/* Bits 3:5 of modrm byte extend opcode */
+#define GroupDual   (1<<31)	/* Alternate decoding of mod == 3 */
 #define GroupMask   0xff	/* Group number stored in bits 0:7 */
 
 enum {
@@ -76,7 +81,7 @@ enum {
 	Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
 };
 
-static u16 opcode_table[256] = {
+static u32 opcode_table[256] = {
 	/* 0x00 - 0x07 */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
@@ -195,7 +200,7 @@ static u16 opcode_table[256] = {
 	ImplicitOps, ImplicitOps, Group | Group4, Group | Group5,
 };
 
-static u16 twobyte_table[256] = {
+static u32 twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	0, Group | GroupDual | Group7, 0, 0, 0, 0, ImplicitOps, 0,
 	ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
@@ -253,7 +258,7 @@ static u16 twobyte_table[256] = {
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
 };
 
-static u16 group_table[] = {
+static u32 group_table[] = {
 	[Group1_80*8] =
 	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
 	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
@@ -297,7 +302,7 @@ static u16 group_table[] = {
 	SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
 };
 
-static u16 group2_table[] = {
+static u32 group2_table[] = {
 	[Group7*8] =
 	SrcNone | ModRM, 0, 0, 0,
 	SrcNone | ModRM | DstMem | Mov, 0,
@@ -1043,6 +1048,24 @@ done_prefixes:
 		break;
 	}
 
+	/*
+	 * Decode and fetch the second source operand: register, memory
+	 * or immediate.
+	 */
+	switch (c->d & Src2Mask) {
+	case Src2None:
+		break;
+	case Src2CL:
+		c->src2.val = c->regs[VCPU_REGS_RCX];
+		break;
+	case Src2Imm8:
+		c->src2.type = OP_IMM;
+		c->src2.ptr = (unsigned long *)c->eip;
+		c->src2.bytes = 1;
+		c->src2.val = insn_fetch(u8, 1, c->eip);
+		break;
+	}
+
 	/* Decode and fetch the destination operand: register or memory. */
 	switch (c->d & DstMask) {
 	case ImplicitOps:
diff --git a/include/asm-x86/kvm_x86_emulate.h b/include/asm-x86/kvm_x86_emulate.h
index 4e8c1e4..00de896 100644
--- a/include/asm-x86/kvm_x86_emulate.h
+++ b/include/asm-x86/kvm_x86_emulate.h
@@ -123,6 +123,7 @@ struct decode_cache {
 	u8 ad_bytes;
 	u8 rex_prefix;
 	struct operand src;
+	struct operand src2;
 	struct operand dst;
 	bool has_seg_override;
 	u8 seg_override;
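The switch of the decode tables from u16 to u32 follows from the renumbering: once ModRM and the flags after it move to bits 24 and above, a 16-bit table entry can no longer hold them. A minimal sketch of that reasoning, assuming the shift operators dropped by the archive are `<<` (so `(124)` reads as `1<<24` and `(17)` as `1<<7`); the helper names `lost_in_u16` and `src2_kind` are made up for illustration and do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as read from the patch, with the stripped "<<" restored. */
#define Src2None  (0u << 7)
#define Src2CL    (1u << 7)
#define Src2Imm8  (2u << 7)
#define Src2Mask  (7u << 7)
#define ModRM     (1u << 24)
#define GroupDual (1u << 31)

/* Nonzero if a 16-bit table entry would silently drop some of the flags. */
int lost_in_u16(uint32_t flags)
{
	return (uint16_t)flags != flags;
}

/* Extract the second-source selector, as "c->d & Src2Mask" does. */
uint32_t src2_kind(uint32_t flags)
{
	assert((Src2Mask & ModRM) == 0);	/* the two fields must not overlap */
	return flags & Src2Mask;
}
```

For example, `Src2CL` alone still fits in 16 bits, but `ModRM | Src2CL` does not, which is why all four tables change type in the same patch.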
[PATCH 2/6] x86: Export some definition of MTRR
So that KVM can reuse the type definitions, which it needs in order to
support shadow MTRR.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kernel/cpu/mtrr/generic.c |   12 +++---------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |   17 -----------------
 include/asm-x86/mtrr.h             |   25 +++++++++++++++++++++++++
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b9574a6..aa414ab 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,14 +14,6 @@
 #include <asm/pat.h>
 #include "mtrr.h"
 
-struct mtrr_state_type {
-	struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
-	mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
-	unsigned char enabled;
-	unsigned char have_fixed;
-	mtrr_type def_type;
-};
-
 struct fixed_range_block {
 	int base_msr; /* start address of an MTRR block */
 	int ranges; /* number of MTRRs in this block */
@@ -35,10 +27,12 @@ static struct fixed_range_block fixed_range_blocks[] = {
 };
 
 static unsigned long smp_changes_mask;
-static struct mtrr_state_type mtrr_state = {};
 static int mtrr_state_set;
 u64 mtrr_tom2;
 
+struct mtrr_state_type mtrr_state = {};
+EXPORT_SYMBOL_GPL(mtrr_state);
+
 #undef MODULE_PARAM_PREFIX
 #define MODULE_PARAM_PREFIX "mtrr."
 
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 9885382..ffd6040 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -8,12 +8,6 @@
 #define MTRRcap_MSR 0x0fe
 #define MTRRdefType_MSR 0x2ff
 
-#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
-#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
-
-#define MTRR_NUM_FIXED_RANGES 88
-#define MTRR_MAX_VAR_RANGES 256
-
 #define MTRRfix64K_00000_MSR 0x250
 #define MTRRfix16K_80000_MSR 0x258
 #define MTRRfix16K_A0000_MSR 0x259
@@ -30,10 +24,6 @@
 #define MTRR_CHANGE_MASK_VARIABLE 0x02
 #define MTRR_CHANGE_MASK_DEFTYPE 0x04
 
-/* In the Intel processor's MTRR interface, the MTRR type is always held in
-   an 8 bit field: */
-typedef u8 mtrr_type;
-
 extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
@@ -71,13 +61,6 @@ struct set_mtrr_context {
 	u32 ccr3;
 };
 
-struct mtrr_var_range {
-	u32 base_lo;
-	u32 base_hi;
-	u32 mask_lo;
-	u32 mask_hi;
-};
-
 void set_mtrr_done(struct set_mtrr_context *ctxt);
 void set_mtrr_cache_disable(struct set_mtrr_context *ctxt);
 void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
diff --git a/include/asm-x86/mtrr.h b/include/asm-x86/mtrr.h
index a69a01a..2c8657b 100644
--- a/include/asm-x86/mtrr.h
+++ b/include/asm-x86/mtrr.h
@@ -57,6 +57,31 @@ struct mtrr_gentry {
 };
 #endif /* !__i386__ */
 
+struct mtrr_var_range {
+	u32 base_lo;
+	u32 base_hi;
+	u32 mask_lo;
+	u32 mask_hi;
+};
+
+/* In the Intel processor's MTRR interface, the MTRR type is always held in
+   an 8 bit field: */
+typedef u8 mtrr_type;
+
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
+struct mtrr_state_type {
+	struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+	mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
+	unsigned char enabled;
+	unsigned char have_fixed;
+	mtrr_type def_type;
+};
+
+#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
+#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
+
 /* These are the various ioctls */
 #define MTRRIOC_ADD_ENTRY        _IOW(MTRR_IOCTL_BASE,  0, struct mtrr_sentry)
 #define MTRRIOC_SET_ENTRY        _IOW(MTRR_IOCTL_BASE,  1, struct mtrr_sentry)
--
1.5.4.5
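For readers unfamiliar with the definitions this patch exports, here is a small user-space sketch of how they fit together. The two macros and `struct mtrr_var_range` are copied from the patch (with `u32` spelled `uint32_t`); the helpers `var_range_base()` and `var_range_mask()` are hypothetical additions showing how the lo/hi halves form the 64-bit MSR values, and are not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* From the patch: variable-range MSRs come in base/mask pairs from 0x200. */
#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)

/* From the patch, with kernel u32 replaced by uint32_t. */
struct mtrr_var_range {
	uint32_t base_lo;
	uint32_t base_hi;
	uint32_t mask_lo;
	uint32_t mask_hi;
};

/* Hypothetical helper: reassemble the 64-bit PHYSBASE value. */
uint64_t var_range_base(const struct mtrr_var_range *r)
{
	assert(r);
	return ((uint64_t)r->base_hi << 32) | r->base_lo;
}

/* Hypothetical helper: reassemble the 64-bit PHYSMASK value. */
uint64_t var_range_mask(const struct mtrr_var_range *r)
{
	assert(r);
	return ((uint64_t)r->mask_hi << 32) | r->mask_lo;
}
```

Variable range 3, for instance, lives in MSRs 0x206 (base) and 0x207 (mask).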
[PATCH 0/6] MTRR/PAT support for EPT (v3)
Hi, Avi

Here is the latest update of MTRR/PAT support.

Changes from v2: drop the use of the MSR bitmap, add MSR_IA32_CR_PAT to save/restore, and rebase on the latest upstream.

Thanks!
--
regards
Yang, Sheng
[PATCH 6/6] Enable MTRR for EPT
The effective memory type under EPT is a combination of MSR_IA32_CR_PAT
and the memory type field of the EPT entry.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c         |   11 ++++++++++-
 arch/x86/kvm/svm.c         |    6 ++++++
 arch/x86/kvm/vmx.c         |   12 +++++++++---
 arch/x86/kvm/x86.c         |    2 +-
 include/asm-x86/kvm_host.h |    3 ++-
 5 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f590142..79cb4a9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask;	/* mutual exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
+static u64 __read_mostly shadow_mt_mask;
 
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 {
@@ -183,13 +184,14 @@ void kvm_mmu_set_base_ptes(u64 base_pte)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask)
+		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask)
 {
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
 	shadow_dirty_mask = dirty_mask;
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
+	shadow_mt_mask = mt_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -1546,6 +1548,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 {
 	u64 spte;
 	int ret = 0;
+	u64 mt_mask = shadow_mt_mask;
+
 	/*
 	 * We don't set the accessed bit, since we sometimes want to see
 	 * whether the guest actually used the pte (in order to detect
@@ -1564,6 +1568,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 		spte |= shadow_user_mask;
 	if (largepage)
 		spte |= PT_PAGE_SIZE_MASK;
+	if (mt_mask) {
+		mt_mask = get_memory_type(vcpu, gfn) <<
+			  kvm_x86_ops->get_mt_mask_shift();
+		spte |= mt_mask;
+	}
 
 	spte |= (u64)pfn << PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9c4ce65..05efc4e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1912,6 +1912,11 @@ static int get_npt_level(void)
 #endif
 }
 
+static int svm_get_mt_mask_shift(void)
+{
+	return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -1967,6 +1972,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.set_tss_addr = svm_set_tss_addr,
 	.get_tdp_level = get_npt_level,
+	.get_mt_mask_shift = svm_get_mt_mask_shift,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 809427e..3d56554 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3572,6 +3572,11 @@ static int get_ept_level(void)
 	return VMX_EPT_DEFAULT_GAW + 1;
 }
 
+static int vmx_get_mt_mask_shift(void)
+{
+	return VMX_EPT_MT_EPTE_SHIFT;
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -3627,6 +3632,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
+	.get_mt_mask_shift = vmx_get_mt_mask_shift,
 };
 
 static int __init vmx_init(void)
@@ -3682,10 +3688,10 @@ static int __init vmx_init(void)
 	if (vm_need_ept()) {
 		bypass_guest_pf = 0;
 		kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
-			VMX_EPT_WRITABLE_MASK |
-			VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
+			VMX_EPT_WRITABLE_MASK);
 		kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
-				VMX_EPT_EXECUTABLE_MASK);
+				VMX_EPT_EXECUTABLE_MASK,
+				VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
 		kvm_enable_tdp();
 	} else
 		kvm_disable_tdp();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index df98a1f..dda478e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2614,7 +2614,7 @@ int kvm_arch_init(void *opaque)
 	kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
 	kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
 	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
-			PT_DIRTY_MASK, PT64_NX_MASK, 0);
+			PT_DIRTY_MASK, PT64_NX_MASK, 0, 0);
 	return 0;
 
 out:
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 1c25cb7..4b06ca8 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -480,6 +480,7 @@ struct kvm_x86_ops {
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int
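The core of this patch is the set_spte() change: when a memory-type mask is configured, the guest frame's memory type is shifted into the EPT entry's memory-type field. A tiny stand-alone sketch of that step, assuming the stripped operators in the archived diff are `<<` and `|=`; the name `spte_with_memtype`, the shift value 3, and the type value 6 (write-back) used in the usage note are illustrative assumptions here, since the patch itself only refers to `VMX_EPT_MT_EPTE_SHIFT` and `get_memory_type()` symbolically:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the added block in set_spte():
 * "mt_mask = type << shift; spte |= mt_mask;" is applied only when a
 * memory-type mask was configured (nonzero), as on the EPT path.
 */
uint64_t spte_with_memtype(uint64_t spte, uint64_t configured_mt_mask,
			   unsigned int type, unsigned int shift)
{
	assert(shift < 64);
	if (configured_mt_mask)
		spte |= (uint64_t)type << shift;
	return spte;
}
```

With a shift of 3 and type 6, an entry holding only the low permission bits 0x7 becomes 0x37; with the mask left at 0, as in the non-EPT kvm_arch_init() call, the entry is returned unchanged.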
[PATCH 4/6] KVM: VMX: Add PAT support for EPT
GUEST_PAT support is a new feature introduced by the Intel Core i7
architecture. With it, the CPU saves/loads the guest and host PAT
automatically, since the EPT memory type in the guest depends on
MSR_IA32_CR_PAT.

Also add save/restore for MSR_IA32_CR_PAT.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |   29 ++++++++++++++++++++++++++---
 arch/x86/kvm/vmx.h |    7 +++++++
 arch/x86/kvm/x86.c |    2 +-
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a2911cb..809427e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -962,6 +962,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 		pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", msr_index, data);
 
 		break;
+	case MSR_IA32_CR_PAT:
+		if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+			vmcs_write64(GUEST_IA32_PAT, data);
+			vcpu->arch.pat = data;
+			break;
+		}
+		/* Otherwise falls through to kvm_set_msr_common */
 	default:
 		vmx_load_host_state(vmx);
 		msr = find_msr_entry(vmx, msr_index);
@@ -1181,12 +1188,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif
-	opt = 0;
+	opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
 				&_vmexit_control) < 0)
 		return -EIO;
 
-	min = opt = 0;
+	min = 0;
+	opt = VM_ENTRY_LOAD_IA32_PAT;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
 				&_vmentry_control) < 0)
 		return -EIO;
@@ -2092,8 +2100,9 @@ static void vmx_disable_intercept_for_msr(struct page *msr_bitmap, u32 msr)
  */
 static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 {
-	u32 host_sysenter_cs;
+	u32 host_sysenter_cs, msr_low, msr_high;
 	u32 junk;
+	u64 host_pat;
 	unsigned long a;
 	struct descriptor_table dt;
 	int i;
@@ -2181,6 +2190,20 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	rdmsrl(MSR_IA32_SYSENTER_EIP, a);
 	vmcs_writel(HOST_IA32_SYSENTER_EIP, a);   /* 22.2.3 */
 
+	if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) {
+		rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
+		host_pat = msr_low | ((u64) msr_high << 32);
+		vmcs_write64(HOST_IA32_PAT, host_pat);
+	}
+	if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+		rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
+		host_pat = msr_low | ((u64) msr_high << 32);
+		/* Write the default value follow host pat */
+		vmcs_write64(GUEST_IA32_PAT, host_pat);
+		/* Keep arch.pat sync with GUEST_IA32_PAT */
+		vmx->vcpu.arch.pat = host_pat;
+	}
+
 	for (i = 0; i < NR_VMX_MSR; ++i) {
 		u32 index = vmx_msr_index[i];
 		u32 data_low, data_high;
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 3e010d2..3ad61dc 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -63,10 +63,13 @@
 
 #define VM_EXIT_HOST_ADDR_SPACE_SIZE            0x00000200
 #define VM_EXIT_ACK_INTR_ON_EXIT                0x00008000
+#define VM_EXIT_SAVE_IA32_PAT			0x00040000
+#define VM_EXIT_LOAD_IA32_PAT			0x00080000
 
 #define VM_ENTRY_IA32E_MODE                     0x00000200
 #define VM_ENTRY_SMM                            0x00000400
 #define VM_ENTRY_DEACT_DUAL_MONITOR             0x00000800
+#define VM_ENTRY_LOAD_IA32_PAT			0x00004000
 
 /* VMCS Encodings */
 enum vmcs_field {
@@ -112,6 +115,8 @@ enum vmcs_field {
 	VMCS_LINK_POINTER_HIGH          = 0x00002801,
 	GUEST_IA32_DEBUGCTL             = 0x00002802,
 	GUEST_IA32_DEBUGCTL_HIGH        = 0x00002803,
+	GUEST_IA32_PAT			= 0x00002804,
+	GUEST_IA32_PAT_HIGH		= 0x00002805,
 	GUEST_PDPTR0                    = 0x0000280a,
 	GUEST_PDPTR0_HIGH               = 0x0000280b,
 	GUEST_PDPTR1                    = 0x0000280c,
@@ -120,6 +125,8 @@ enum vmcs_field {
 	GUEST_PDPTR2_HIGH               = 0x0000280f,
 	GUEST_PDPTR3                    = 0x00002810,
 	GUEST_PDPTR3_HIGH               = 0x00002811,
+	HOST_IA32_PAT			= 0x00002c00,
+	HOST_IA32_PAT_HIGH		= 0x00002c01,
 	PIN_BASED_VM_EXEC_CONTROL       = 0x00004000,
 	CPU_BASED_VM_EXEC_CONTROL       = 0x00004002,
 	EXCEPTION_BITMAP                = 0x00004004,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b335129..df98a1f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -452,7 +452,7 @@ static u32 msrs_to_save[] =
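The vmx_vcpu_setup() hunk reads the host PAT as two 32-bit halves and joins them before writing the VMCS field. A minimal user-space sketch of that step, assuming the operator the archive dropped in `((u64) msr_high 32)` is `<<`; the helper names `msr_combine` and `pat_entry` are hypothetical, and the sample value in the usage note is the architectural power-on default of IA32_PAT rather than anything taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Join the EDX:EAX halves returned by rdmsr into one 64-bit value. */
uint64_t msr_combine(uint32_t low, uint32_t high)
{
	/* Cast before shifting: shifting a 32-bit value by 32 is undefined. */
	return low | ((uint64_t)high << 32);
}

/* Hypothetical helper: memory type held in PAT entry i (0..7). */
unsigned int pat_entry(uint64_t pat, unsigned int i)
{
	assert(i < 8);
	return (unsigned int)((pat >> (8 * i)) & 0x7);
}
```

With both halves equal to 0x00070406, the combined value is 0x0007040600070406, and entries 0 to 3 decode as 6 (WB), 4 (WT), 7 (UC-) and 0 (UC).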
[PATCH 1/6] x86: Rename mtrr_state struct and macro names
Prepare for exporting them.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kernel/cpu/mtrr/generic.c |    8 ++++----
 arch/x86/kernel/cpu/mtrr/main.c    |    4 ++--
 arch/x86/kernel/cpu/mtrr/mtrr.h    |    7 ++++---
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index cb7d3b6..b9574a6 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,9 +14,9 @@
 #include <asm/pat.h>
 #include "mtrr.h"
 
-struct mtrr_state {
-	struct mtrr_var_range var_ranges[MAX_VAR_RANGES];
-	mtrr_type fixed_ranges[NUM_FIXED_RANGES];
+struct mtrr_state_type {
+	struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+	mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
 	unsigned char enabled;
 	unsigned char have_fixed;
 	mtrr_type def_type;
@@ -35,7 +35,7 @@ static struct fixed_range_block fixed_range_blocks[] = {
 };
 
 static unsigned long smp_changes_mask;
-static struct mtrr_state mtrr_state = {};
+static struct mtrr_state_type mtrr_state = {};
 static int mtrr_state_set;
 u64 mtrr_tom2;
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 885c826..edadf7b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -49,7 +49,7 @@
 
 u32 num_var_ranges = 0;
 
-unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
@@ -574,7 +574,7 @@ struct mtrr_value {
 	unsigned long lsize;
 };
 
-static struct mtrr_value mtrr_state[MAX_VAR_RANGES];
+static struct mtrr_value mtrr_state[MTRR_MAX_VAR_RANGES];
 
 static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
 {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2dc4ec6..9885382 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -11,8 +11,9 @@
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
-#define NUM_FIXED_RANGES 88
-#define MAX_VAR_RANGES 256
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
 #define MTRRfix64K_00000_MSR 0x250
 #define MTRRfix16K_80000_MSR 0x258
 #define MTRRfix16K_A0000_MSR 0x259
@@ -33,7 +34,7 @@
    an 8 bit field: */
 typedef u8 mtrr_type;
 
-extern unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
 	u32	vendor;
--
1.5.4.5
[PATCH 3/6] KVM: Improve MTRR structure
Also reset the MMU context when an MTRR is set.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c         |   61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 include/asm-x86/kvm_host.h |    5 ++++-
 2 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d3f06..b335129 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -39,6 +39,7 @@
 #include <asm/uaccess.h>
 #include <asm/msr.h>
 #include <asm/desc.h>
+#include <asm/mtrr.h>
 
 #define MAX_IO_MSRS 256
 #define CR0_RESERVED_BITS						\
@@ -650,10 +651,38 @@ static bool msr_mtrr_valid(unsigned msr)
 
 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
+	u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
+
 	if (!msr_mtrr_valid(msr))
 		return 1;
 
-	vcpu->arch.mtrr[msr - 0x200] = data;
+	if (msr == MSR_MTRRdefType) {
+		vcpu->arch.mtrr_state.def_type = data;
+		vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10;
+	} else if (msr == MSR_MTRRfix64K_00000)
+		p[0] = data;
+	else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
+		p[1 + msr - MSR_MTRRfix16K_80000] = data;
+	else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
+		p[3 + msr - MSR_MTRRfix4K_C0000] = data;
+	else if (msr == MSR_IA32_CR_PAT)
+		vcpu->arch.pat = data;
+	else {	/* Variable MTRRs */
+		int idx, is_mtrr_mask;
+		u64 *pt;
+
+		idx = (msr - 0x200) / 2;
+		is_mtrr_mask = msr - 0x200 - 2 * idx;
+		if (!is_mtrr_mask)
+			pt =
+			  (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
+		else
+			pt =
+			  (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
+		*pt = data;
+	}
+
+	kvm_mmu_reset_context(vcpu);
 	return 0;
 }
 
@@ -749,10 +778,37 @@ int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 
 static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 {
+	u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
+
 	if (!msr_mtrr_valid(msr))
 		return 1;
 
-	*pdata = vcpu->arch.mtrr[msr - 0x200];
+	if (msr == MSR_MTRRdefType)
+		*pdata = vcpu->arch.mtrr_state.def_type +
+			 (vcpu->arch.mtrr_state.enabled << 10);
+	else if (msr == MSR_MTRRfix64K_00000)
+		*pdata = p[0];
+	else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
+		*pdata = p[1 + msr - MSR_MTRRfix16K_80000];
+	else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
+		*pdata = p[3 + msr - MSR_MTRRfix4K_C0000];
+	else if (msr == MSR_IA32_CR_PAT)
+		*pdata = vcpu->arch.pat;
+	else {	/* Variable MTRRs */
+		int idx, is_mtrr_mask;
+		u64 *pt;
+
+		idx = (msr - 0x200) / 2;
+		is_mtrr_mask = msr - 0x200 - 2 * idx;
+		if (!is_mtrr_mask)
+			pt =
+			  (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
+		else
+			pt =
+			  (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
+		*pdata = *pt;
+	}
+
 	return 0;
 }
 
@@ -3941,6 +3997,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 	/* We do fxsave: this must be aligned. */
 	BUG_ON((unsigned long)&vcpu->arch.host_fx_image & 0xF);
 
+	vcpu->arch.mtrr_state.have_fixed = 1;
 	vcpu_load(vcpu);
 	r = kvm_arch_vcpu_reset(vcpu);
 	if (r == 0)
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4b5d1eb..1c25cb7 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -21,6 +21,7 @@
 
 #include <asm/pvclock-abi.h>
 #include <asm/desc.h>
+#include <asm/mtrr.h>
 
 #define KVM_MAX_VCPUS 16
 #define KVM_MEMORY_SLOTS 32
@@ -86,6 +87,7 @@
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
 #define KVM_MAX_CPUID_ENTRIES 40
+#define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 
 extern spinlock_t kvm_lock;
@@ -329,7 +331,8 @@ struct kvm_vcpu_arch {
 	bool nmi_injected;
 	bool nmi_window_open;
 
-	u64 mtrr[0x100];
+	struct mtrr_state_type mtrr_state;
+	u32 pat;
 };
 
 struct kvm_mem_alias {
--
1.5.4.5
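The index arithmetic in set_msr_mtrr()/get_msr_mtrr() is easier to follow in isolation. The sketch below restates it in user-space C, assuming the archive's stripped operators are `&`, `>>`, `>=`, `<=` and `&&`, and that the truncated MSR names are the usual `_00000`/`_80000`/`_A0000`/`_C0000` ones. The MSR numbers are the architectural ones; the function names `fixed_mtrr_slot`, `split_var_mtrr` and `deftype_enable_bits` are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MTRRfix64K_00000 0x250
#define MSR_MTRRfix16K_80000 0x258
#define MSR_MTRRfix16K_A0000 0x259
#define MSR_MTRRfix4K_C0000  0x268
#define MSR_MTRRfix4K_F8000  0x26f

/*
 * Which 64-bit slot of fixed_ranges[] a fixed-range MSR maps to:
 * slot 0 is the 64K MSR, slots 1-2 the 16K MSRs, slots 3-10 the 4K MSRs.
 * 11 slots of 8 type bytes each give the 88 fixed ranges.
 * Returns -1 for MSRs that are not fixed-range MTRRs.
 */
int fixed_mtrr_slot(uint32_t msr)
{
	if (msr == MSR_MTRRfix64K_00000)
		return 0;
	if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
		return 1 + (int)(msr - MSR_MTRRfix16K_80000);
	if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
		return 3 + (int)(msr - MSR_MTRRfix4K_C0000);
	return -1;
}

/* Variable ranges: even offsets from 0x200 are bases, odd ones are masks. */
void split_var_mtrr(uint32_t msr, int *idx, int *is_mask)
{
	assert(msr >= 0x200 && idx && is_mask);
	*idx = (int)(msr - 0x200) / 2;
	*is_mask = (int)(msr - 0x200) - 2 * *idx;
}

/* Bits 11:10 of MTRRdefType: MTRR enable and fixed-range enable. */
unsigned int deftype_enable_bits(uint64_t data)
{
	return (unsigned int)((data & 0xc00) >> 10);
}
```

For example, MSR 0x259 lands in slot 2, MSR 0x26f in slot 10, and MSR 0x203 is the mask half of variable range 1.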
Re: [PATCH] x86 emulator: Add Src2 decode set
Guillaume Thouvenin wrote:
> Instruction like shld has three operands, so we need to add a Src2
> decode set. We start with Src2None, Src2CL, and Src2Imm8 to support
> shld and we will expand it later.

Please add Src2One (implied '1') as well, so we can switch the existing shift operators to Src2 later.

> Signed-off-by: Guillaume Thouvenin [EMAIL PROTECTED]
> ---
>  arch/x86/kvm/x86_emulate.c        |   47 +++++++++++++++++++++++++++++++++++------------
>  include/asm-x86/kvm_x86_emulate.h |    1 +
>  2 files changed, 36 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
> index a391e21..c9ef2da 100644
> --- a/arch/x86/kvm/x86_emulate.c
> +++ b/arch/x86/kvm/x86_emulate.c
> @@ -59,16 +59,21 @@
>  #define SrcImm      (5<<4)	/* Immediate operand. */
>  #define SrcImmByte  (6<<4)	/* 8-bit sign-extended immediate operand. */
>  #define SrcMask     (7<<4)
> +/* Source 2 operand type */
> +#define Src2None    (0<<7)
> +#define Src2CL      (1<<7)
> +#define Src2Imm8    (2<<7)
> +#define Src2Mask    (7<<7)

Please allocate bits for this at the end to avoid renumbering.

> +	/*
> +	 * Decode and fetch the second source operand: register, memory
> +	 * or immediate.
> +	 */
> +	switch (c->d & Src2Mask) {
> +	case Src2None:
> +		break;
> +	case Src2CL:
> +		c->src2.val = c->regs[VCPU_REGS_RCX];

Mask to a single byte; also set the operand length.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
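The review comments in this reply can be read as a small change to the Src2 decode switch: mask CL to one byte, set the operand length, and add an implied-1 source. The following is a user-space sketch of that reading, not the code that was eventually merged; the enum values, `struct src2_operand`, and `decode_src2()` are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector; the patch encodes these as bits in the flags word. */
enum src2_kind { SRC2_NONE, SRC2_CL, SRC2_IMM8, SRC2_ONE };

/* Cut-down stand-in for the emulator's struct operand. */
struct src2_operand {
	uint64_t val;
	unsigned int bytes;
};

/*
 * Decode the second source operand.
 * rcx:  guest RCX, of which only the low byte (CL) is the operand.
 * imm8: the immediate byte already fetched from the instruction stream.
 */
struct src2_operand decode_src2(enum src2_kind kind, uint64_t rcx, uint8_t imm8)
{
	struct src2_operand op = { 0, 0 };

	switch (kind) {
	case SRC2_NONE:
		break;
	case SRC2_CL:
		op.val = rcx & 0xff;	/* mask to a single byte */
		op.bytes = 1;		/* and record the operand length */
		break;
	case SRC2_IMM8:
		op.val = imm8;
		op.bytes = 1;
		break;
	case SRC2_ONE:
		op.val = 1;		/* implied '1', e.g. for shift-by-one */
		op.bytes = 1;
		break;
	}
	assert(op.bytes <= 1);
	return op;
}
```

Without the mask, a guest RCX of 0x1234 would hand 0x1234 to the shift helper instead of the intended count 0x34, which is the case the review comment guards against.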
[PATCH 1/1] KVM: IRQ ACK notifier should be used with in-kernel irqchip
Also remove the unnecessary parameter of unregister irq ack notifier. Signed-off-by: Sheng Yang [EMAIL PROTECTED] --- include/linux/kvm_host.h |3 +-- virt/kvm/irq_comm.c |8 ++-- virt/kvm/kvm_main.c |2 +- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3833c48..41955ed 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -313,8 +313,7 @@ void kvm_set_irq(struct kvm *kvm, int irq, int level); void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi); void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian); -void kvm_unregister_irq_ack_notifier(struct kvm *kvm, -struct kvm_irq_ack_notifier *kian); +void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian); #ifdef CONFIG_DMAR int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn, diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index d0169f5..54b251d 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -50,11 +50,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi) void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian) { + /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */ + ASSERT(irqchip_in_kernel(kvm)); + ASSERT(kian); hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list); } -void kvm_unregister_irq_ack_notifier(struct kvm *kvm, -struct kvm_irq_ack_notifier *kian) +void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian) { + if (!kian) + return; hlist_del(&kian->link); } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cf0ab8e..d2ae1c9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -145,7 +145,7 @@ static void kvm_free_assigned_device(struct kvm *kvm, if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested) free_irq(assigned_dev->host_irq, (void *)assigned_dev); - kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier); + kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier); if (cancel_work_sync(&assigned_dev->interrupt_work)) /* We had pending work. That means we will have to take -- 1.5.4.5
Re: [PATCH] x86 emulator: Add Src2 decode set
Avi Kivity wrote: #define SrcMask (7<<4) +/* Source 2 operand type */ +#define Src2None (0<<7) +#define Src2CL (1<<7) +#define Src2Imm8 (2<<7) Src2ImmByte, like SrcImmByte. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest
Han, Weidong wrote: There is a missed optimization here. Suppose we have two devices each under a different iommu. With the patch, each will be in a different dmar_domain and so will have a different page table. The amount of memory used is doubled. You cannot let two devices each under a different iommu share one dmar_domain, because dmar_domain has a pointer to iommu. I don't want them to share dmar_domains (these are implementation details anyway), just io page tables. kvm --- something (owns io page table) --- dmar_domain (uses shared io page table) --- device Even if we don't implement io page table sharing right away, implementing the 'something' in the iommu api means we can later implement sharing without changing the iommu/kvm interface. In fact, the exported APIs added for KVM VT-d also do create/map/attach/detach/free functions. Whereas these iommu APIs are more readable. No; the existing iommu API talks about dmar domains and exposes the existence of multiple iommus, so it is more complex. Because kvm VT-d usage is different from native usage, it's inevitable to extend native VT-d code to support KVM VT-d (such as wrap dmar_domain). For devices under different iommus, they cannot share the same dmar_domain, thus they cannot share VT-d page table. If we want to handle this by iommu APIs, I suspect we need to change lots of native VT-d driver code. As mentioned above, we can start with implementing the API without actual sharing (basically, your patch, but as an addition to the API rather than a change to kvm); we can add io pagetable sharing later. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 1/1] KVM: IRQ ACK notifier should be used with in-kernel irqchip
Sheng Yang wrote: Also remove unnecessary parameter of unregister irq ack notifier. diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index d0169f5..54b251d 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -50,11 +50,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi) void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian) { + /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */ + ASSERT(irqchip_in_kernel(kvm)); + ASSERT(kian); hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list); } We don't want a BUG() here if the user specifies -no-kvm-irqchip; is there a check on the irq assignment ioctls before calling this? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
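The concern above is that a user-selectable configuration should produce an error return, not an assertion failure. A rough sketch of the kind of ioctl-side check being asked about follows; every name in it is hypothetical, and it does not reflect the actual KVM assignment code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kvm. */
struct vm_sketch {
	bool irqchip_in_kernel;	/* true when the in-kernel irqchip is in use */
	int nr_ack_notifiers;	/* stands in for the ack notifier list */
};

/* Hypothetical ioctl-side check: fail the request instead of asserting. */
static int assign_irq_sketch(struct vm_sketch *vm)
{
	assert(vm != NULL);	/* programming error, not user input */

	if (!vm->irqchip_in_kernel)
		return -EINVAL;	/* userspace irqchip: reject cleanly */

	vm->nr_ack_notifiers++;	/* stands in for registering the notifier */
	return 0;
}
```

With a check like this on the caller's side, an assertion inside the register function would only fire on a genuine kernel bug.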
Re: [PATCH 6/6] Enable MTRR for EPT
Sheng Yang wrote: The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory type field of EPT entry. @@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ static u64 __read_mostly shadow_user_mask; static u64 __read_mostly shadow_accessed_mask; static u64 __read_mostly shadow_dirty_mask; +static u64 __read_mostly shadow_mt_mask; For shadow, the mt mask is different based on the level of the page table, so we need an array here. This can of course be left until shadow pat is implemented. + if (mt_mask) { + mt_mask = get_memory_type(vcpu, gfn) << + kvm_x86_ops->get_mt_mask_shift(); + spte |= mt_mask; + } For shadow, it's not a simple shift, since for large pages one of the bits is at position 12. So we would need the callback to calculate the mask value. Perhaps even simpler, have a 4x8 array, with the first index the page table level and the second index the memory type. The initialization code can prepare the array like it prepares the other masks. This can wait until we have a shadow pat implementation. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
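The 4x8 table suggested above could be filled once at initialization and then indexed when building an spte. The sketch below is only an illustration under stated assumptions: it treats the second index as a 3-bit PAT-style index (bit 0 = PWT, bit 1 = PCD, bit 2 = PAT) rather than a raw MTRR type, numbers levels from 1 for 4K pages, and treats every level above 1 as a large-page entry. The table and function names are hypothetical. It relies on the x86 page-table layout in which PWT is bit 3, PCD is bit 4, and the PAT bit is bit 7 in a 4K PTE but bit 12 in a large-page entry.

```c
#include <assert.h>
#include <stdint.h>

#define PT_LEVELS	4
#define MT_ENTRIES	8

#define PTE_PWT		(1ULL << 3)
#define PTE_PCD		(1ULL << 4)
#define PTE_PAT_4K	(1ULL << 7)	/* PAT bit in a 4K PTE */
#define PTE_PAT_LARGE	(1ULL << 12)	/* PAT bit in a large-page entry */

/* Hypothetical table: [level - 1][index] -> bits to OR into the spte. */
static uint64_t shadow_mt_mask_table[PT_LEVELS][MT_ENTRIES];

/* Fill the table once, the way the other shadow masks are prepared. */
static void init_shadow_mt_mask_table(void)
{
	for (int level = 1; level <= PT_LEVELS; level++) {
		uint64_t pat_bit = (level == 1) ? PTE_PAT_4K : PTE_PAT_LARGE;

		for (int idx = 0; idx < MT_ENTRIES; idx++) {
			uint64_t mask = 0;

			if (idx & 1)
				mask |= PTE_PWT;
			if (idx & 2)
				mask |= PTE_PCD;
			if (idx & 4)
				mask |= pat_bit;
			shadow_mt_mask_table[level - 1][idx] = mask;
		}
	}
}

/* Look up the mask for a page table level (1..4) and index (0..7). */
static uint64_t shadow_mt_mask(int level, int idx)
{
	assert(level >= 1 && level <= PT_LEVELS);
	assert(idx >= 0 && idx < MT_ENTRIES);
	return shadow_mt_mask_table[level - 1][idx];
}
```

The lookup replaces the shift entirely, so the level-dependent position of the PAT bit is handled by the table contents rather than by a callback.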
Re: [PATCH 1/1] KVM: IRQ ACK notifier should be used with in-kernel irqchip
On Thursday 09 October 2008 16:34:47 Avi Kivity wrote: Sheng Yang wrote: Also remove unnecessary parameter of unregister irq ack notifier. diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index d0169f5..54b251d 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -50,11 +50,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi) void kvm_register_irq_ack_notifier(struct kvm *kvm, struct kvm_irq_ack_notifier *kian) { + /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */ + ASSERT(irqchip_in_kernel(kvm)); + ASSERT(kian); hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list); } We don't want a BUG() here if the user specifies -no-kvm-irqchip; is there a check on the irq assignment ioctls before calling this? Yes. kvm_register_irq_ack_notifier() should only be called when irqchip_in_kernel() is true (conversely, the ack notifier is only useful with an in-kernel irqchip, so we shouldn't call it otherwise). I can't see how this would be useful with a userspace irqchip, so I added an ASSERT here. -- regards Yang, Sheng
Re: [PATCH] x86 emulator: Add a Src2 decode set and SrcOne operand type
Instructions like shld have three operands, so we need to add a Src2 decode set. We start with Src2None, Src2CL, and Src2ImmByte to support shld and we will expand it later. The Src2 operand types are placed at the end of the flag set to avoid renumbering. For Src2CL we mask to a single byte and set the operand length. This patch also adds the SrcOne operand type, for when we need to decode an implied '1' as with the regular shift instructions. If needed I can split this patch into two patches, one for the Src2 decode set and another one for SrcOne. Signed-off-by: Guillaume Thouvenin [EMAIL PROTECTED] --- arch/x86/kvm/x86_emulate.c| 37 + include/asm-x86/kvm_x86_emulate.h |1 + 2 files changed, 34 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c index a391e21..b5d7bc8 100644 --- a/arch/x86/kvm/x86_emulate.c +++ b/arch/x86/kvm/x86_emulate.c @@ -58,6 +58,7 @@ #define SrcMem32 (4<<4) /* Memory operand (32-bit). */ #define SrcImm (5<<4) /* Immediate operand. */ #define SrcImmByte (6<<4) /* 8-bit sign-extended immediate operand. */ +#define SrcOne (7<<4) /* Implied '1' */ #define SrcMask (7<<4) /* Generic ModRM decode. 
*/ #define ModRM (1<<7) @@ -70,13 +71,18 @@ #define Group (1<<14) /* Bits 3:5 of modrm byte extend opcode */ #define GroupDual (1<<15) /* Alternate decoding of mod == 3 */ #define GroupMask 0xff /* Group number stored in bits 0:7 */ +/* Source 2 operand type */ +#define Src2None (0<<29) +#define Src2CL (1<<29) +#define Src2ImmByte (2<<29) +#define Src2Mask (7<<29) enum { Group1_80, Group1_81, Group1_82, Group1_83, Group1A, Group3_Byte, Group3, Group4, Group5, Group7, }; -static u16 opcode_table[256] = { +static u32 opcode_table[256] = { /* 0x00 - 0x07 */ ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM, @@ -195,7 +201,7 @@ static u16 opcode_table[256] = { ImplicitOps, ImplicitOps, Group | Group4, Group | Group5, }; -static u16 twobyte_table[256] = { +static u32 twobyte_table[256] = { /* 0x00 - 0x0F */ 0, Group | GroupDual | Group7, 0, 0, 0, 0, ImplicitOps, 0, ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0, @@ -253,7 +259,7 @@ static u16 twobyte_table[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; -static u16 group_table[] = { +static u32 group_table[] = { [Group1_80*8] = ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM, @@ -297,7 +303,7 @@ static u16 group_table[] = { SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp, }; -static u16 group2_table[] = { +static u32 group2_table[] = { [Group7*8] = SrcNone | ModRM, 0, 0, 0, SrcNone | ModRM | DstMem | Mov, 0, @@ -1041,6 +1047,29 @@ done_prefixes: c->src.bytes = 1; c->src.val = insn_fetch(s8, 1, c->eip); break; + case SrcOne: + c->src.bytes = 1; + c->src.val = 1; + break; + } + + /* +* Decode and fetch the second source operand: register, memory +* or immediate. 
+*/ + switch (c->d & Src2Mask) { + case Src2None: + break; + case Src2CL: + c->src2.bytes = 1; + c->src2.val = c->regs[VCPU_REGS_RCX] & 0x8; + break; + case Src2ImmByte: + c->src2.type = OP_IMM; + c->src2.ptr = (unsigned long *)c->eip; + c->src2.bytes = 1; + c->src2.val = insn_fetch(u8, 1, c->eip); + break; } /* Decode and fetch the destination operand: register or memory. */ diff --git a/include/asm-x86/kvm_x86_emulate.h b/include/asm-x86/kvm_x86_emulate.h index 4e8c1e4..00de896 100644 --- a/include/asm-x86/kvm_x86_emulate.h +++ b/include/asm-x86/kvm_x86_emulate.h @@ -123,6 +123,7 @@ struct decode_cache { u8 ad_bytes; u8 rex_prefix; struct operand src; + struct operand src2; struct operand dst; bool has_seg_override; u8 seg_override;
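The patch changes the decode tables from u16 to u32 because the Src2 flags now sit in bits 29 to 31. A small sketch of why the wider type is needed, written with unsigned constants to avoid shifting into the sign bit of an int; the macro names mirror the patch, while the helper functions are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define Src2CL		(1u << 29)
#define Src2ImmByte	(2u << 29)
#define Src2Mask	(7u << 29)

/* The Src2 field lives entirely above bit 15. */
_Static_assert((Src2Mask & 0xffffu) == 0, "Src2 flags need more than 16 bits");

/* Extract the Src2 field from a 32-bit decode-flags word. */
static uint32_t src2_field(uint32_t d)
{
	return d & Src2Mask;
}

/* What would survive if the flags were stored in a 16-bit table entry. */
static uint32_t src2_field_if_u16(uint32_t d)
{
	uint16_t narrow = (uint16_t)d;	/* bits 16..31 are discarded */

	return narrow & Src2Mask;
}
```

Stored in a 16-bit entry, every opcode would decode as Src2None, which is why all four tables are widened together.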
Re: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest
Han, Weidong wrote: I don't want them to share dmar_domains (these are implementation details anyway), just io page tables. kvm --- something (owns io page table) --- dmar_domain (uses shared io page table) --- device Letting dmar_domains share an io page table is not allowed. The VT-d spec allows one domain to correspond to one page table, and vice versa. Since the io pagetables are read only for the iommu (right?), I don't see what prevents several iommus from accessing the same pagetable. It's just a bunch of memory. If we want something that owns the io page table, shared by all devices assigned to one guest, we need to redefine dmar_domain so that it covers all devices assigned to a guest. Then we need to rewrite most of the native VT-d code for kvm. Xen doesn't use dmar_domain; instead it implements something as a domain structure (with domain id) to own the page table. I imagine Xen shares the io pagetables with the EPT pagetables as well. So io pagetable sharing is allowed. One guest has only one something instance, thus has only one page table. It looks like: xen --- something (owns io page table) --- device. But on the KVM side, I think we can reuse the native VT-d code and needn't duplicate it. I agree that at this stage, we don't want to do optimization, we need something working first. But let's at least ensure the API allows the optimization later on (and also, that iommu implementation details are hidden from kvm). What I'm proposing is moving the list of kvm_vtd_domains inside the iommu API. The only missing piece is populating a new dmar_domain when a new device is added. We already have intel_iommu_iova_to_pfn(), we need to add a way to read the protection bits and the highest mapped iova (oh, and intel_iommu_iova_to_pfn() has a bug: it shifts right instead of left). Later we can make the something (that already contains the list) also own the io page table; and non-kvm users can still use the same code (the list will always be of length 1 for these users). -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Call for help: moving the kvm wiki
As you may have noticed, the kvm wiki is overrun by spammers. In the past I regularly cleaned up the spam, but some time ago I gave up. So I'm looking for a volunteer to locate a spam-free public wiki host (candidates include wiki.kernel.org and fedorahosted.org) and transfer the contents (minus the spam). I don't think we need to transfer the editing history, but the conversion should adapt to the target's wiki syntax. My requirements for the wiki are: - hosted by a public provider (not a private system) - ad free - open for editing by users (requiring an account is fine; but there should be no need for approval and no immutable pages) - customization of the theme a bonus This is a great way for non-coders to contribute to kvm development; the wiki is a useful tool and it's a pity to let the spammers take it over. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Broken userspace module Makefile
Hi Avi, After Xiantao's irq_common patches were checked in, we found that it's impossible to compile with VT-d userspace now. Essentially the problem is that the Makefile has been missing a $ since the unifdef patch was checked in half a year ago. But after fixing it, I found it's still impossible to get unifdef to run correctly. First, unifdef reports an error when processing include/linux/kvm.h, and I can't find out what's wrong yet. Second, it seems that at least my unifdef can't deal with #if defined(CONFIG_X86) || defined(CONFIG_IA64). My unifdef version is 1.0 (20030701), the latest from Debian testing. I also tried the one from fc9, with the same result. What do you think? -- regards Yang,Sheng -- From: Sheng Yang [EMAIL PROTECTED] Date: Thu, 9 Oct 2008 20:45:02 +0800 Subject: [PATCH 1/1] kvm: Fix broken Makefile of kernel module Signed-off-by: Sheng Yang [EMAIL PROTECTED] --- kernel/Makefile |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/Makefile b/kernel/Makefile index f2a71fa..e352f77 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -65,7 +65,7 @@ header-sync: $(LINUX)/arch/$(ARCH_DIR)/include/asm/./kvm*.h \ $T/include/asm-$(ARCH_DIR)/ - set -e && for i in $(find $T -name '*.h'); do \ + set -e && for i in $$(find $T -name '*.h'); do \ $(call unifdef,$$i); done $(call hack, include/linux/kvm.h) set -e && for i in $$(find $T -type f -printf '%P '); \ @@ -79,7 +79,7 @@ source-sync: $(LINUX)/virt/kvm/./*.[cSh] \ $T/ - set -e && for i in $(find $T -name '*.c'); do \ + set -e && for i in $$(find $T -name '*.c'); do \ $(call unifdef,$$i); done for i in $(hack-files); \ -- 1.5.3
RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest
Avi Kivity wrote: Han, Weidong wrote: I don't want them to share dmar_domains (these are implementation details anyway), just io page tables. kvm --- something (owns io page table) --- dmar_domain (uses shared io page table) --- device Letting dmar_domains share an io page table is not allowed. The VT-d spec allows one domain to correspond to one page table, and vice versa. Since the io pagetables are read only for the iommu (right?), I don't see what prevents several iommus from accessing the same pagetable. It's just a bunch of memory. I think the reason is that hardware may use the domain identifier to tag its internal caches. If we want something that owns the io page table, shared by all devices assigned to one guest, we need to redefine dmar_domain so that it covers all devices assigned to a guest. Then we need to rewrite most of the native VT-d code for kvm. Xen doesn't use dmar_domain; instead it implements something as a domain structure (with domain id) to own the page table. I imagine Xen shares the io pagetables with the EPT pagetables as well. So io pagetable sharing is allowed. In Xen, the VT-d page table isn't shared with the EPT pagetable or the P2M pagetable. But they can be shared if the format is the same. One guest has only one something instance, thus has only one page table. It looks like: xen --- something (owns io page table) --- device. But on the KVM side, I think we can reuse the native VT-d code and needn't duplicate it. I agree that at this stage, we don't want to do optimization, we need something working first. But let's at least ensure the API allows the optimization later on (and also, that iommu implementation details are hidden from kvm). What I'm proposing is moving the list of kvm_vtd_domains inside the iommu API. The only missing piece is populating a new dmar_domain when a new device is added. We already have I will move kvm_vtd_domain inside the iommu API, and also hide the get_kvm_vtd_domain() and release_kvm_vtd_domain() implementation details from kvm. intel_iommu_iova_to_pfn(), we need to add a way to read the protection bits and the highest mapped iova (oh, and intel_iommu_iova_to_pfn() has a bug: it shifts right instead of left). Why do we need the protection bits and the highest mapped iova? Shifting right instead of left in intel_iommu_iova_to_pfn() is not a bug, because it returns a pfn, not an address. Regards, Weidong Later we can make the something (that already contains the list) also own the io page table; and non-kvm users can still use the same code (the list will always be of length 1 for these users).
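The disagreement about the shift direction comes down to what the function is meant to return. A minimal sketch of the two conversions, assuming 4K pages (a shift of 12); the helper names are hypothetical and unrelated to the actual intel_iommu_iova_to_pfn() implementation:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4K page size */

/* Address -> page frame number: shift right, dropping the page offset. */
static uint64_t addr_to_pfn(uint64_t addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Page frame number -> address of the page start: shift left. */
static uint64_t pfn_to_addr(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}
```

A function that starts from a physical address and returns a pfn shifts right; a left shift would only be correct if it were meant to return an address.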
Re: [PATCH 6/6] Enable MTRR for EPT
On Thursday 09 October 2008 16:44:19 Avi Kivity wrote: Sheng Yang wrote: The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory type field of EPT entry. @@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ static u64 __read_mostly shadow_user_mask; static u64 __read_mostly shadow_accessed_mask; static u64 __read_mostly shadow_dirty_mask; +static u64 __read_mostly shadow_mt_mask; For shadow, the mt mask is different based on the level of the page table, so we need an array here. This can of course be left until shadow pat is implemented. + if (mt_mask) { + mt_mask = get_memory_type(vcpu, gfn) << + kvm_x86_ops->get_mt_mask_shift(); + spte |= mt_mask; + } For shadow, it's not a simple shift, since for large pages one of the bits is at position 12. So we would need the callback to calculate the mask value. Perhaps even simpler, have a 4x8 array, with the first index the page table level and the second index the memory type. The initialization code can prepare the array like it prepares the other masks. This can wait until we have a shadow pat implementation. Yes, of course. For now this mask is only used by EPT, so I did it this way. The later shadow MTRR/PAT work would address this as well. :) -- regards Yang, Sheng
Re: [PATCH 0/6] MTRR/PAT support for EPT (v3)
Sheng Yang wrote: On Thursday 09 October 2008 17:03:24 Avi Kivity wrote: Sheng Yang wrote: Hi, Avi Here is the latest update of MTRR/PAT support. Change from v2: Discard the using of MSR bitmap, add MSR_IA32_CR_PAT to save/restore, as well as rebase on latest upstream. Applied all; my comments about shadow can be addressed later. There is also the danger of the guest setting the wrong MTRR type for RAM, thus introducing incompatible memory types (between qemu and the guest). If this is a problem, we should ignore the guest's mtrr (and pat) for RAM and use write-back instead. Do you mean host(qemu) would access this memory and if we set it to guest MTRR, host access would be broken? We would cover this in our shadow MTRR patch, for we encountered this in video ram when doing some experiment with VGA assignment. No, I think that the cpu requires that all accesses to a page be done using the same memory type. We are allowing the guest to break that, since qemu mappings will use writeback and guest mapping will use guest specified memory types. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: Compile faillure with 2.6.27-rc9-git1
On Wed, Oct 08, 2008 at 05:55:47PM +0200, Xavier Gnata wrote: Hi, I'm trying to compile kvm-76 on a box running 2.6.27-rc9-git1 (yeah ok...rc+git...). I get this error: In file included from /usr/local/src/kvm-76/kernel/x86/svm.c:16: /usr/local/src/kvm-76/kernel/include/linux/kvm_host.h:128: error: field 'mmu_notifier' has incomplete type Sorry for the noise if it is an irrelevant or well-known issue. It seems you are missing something in .config, probably CONFIG_KVM or CONFIG_VIRTUALIZATION. Did you enable KVM when you built the kernel? You can run make menuconfig and search for MMU_NOTIFIER to find which config options it depends on. In most cases, simply enabling KVM support when compiling the kernel is enough. -- regards Yang, Sheng
Re: [PATCH 0/6] MTRR/PAT support for EPT (v3)
On Thursday 09 October 2008 17:03:24 Avi Kivity wrote: Sheng Yang wrote: Hi, Avi Here is the latest update of MTRR/PAT support. Change from v2: Discard the using of MSR bitmap, add MSR_IA32_CR_PAT to save/restore, as well as rebase on latest upstream. Applied all; my comments about shadow can be addressed later. There is also the danger of the guest setting the wrong MTRR type for RAM, thus introducing incompatible memory types (between qemu and the guest). If this is a problem, we should ignore the guest's mtrr (and pat) for RAM and use write-back instead. Do you mean that the host (qemu) would access this memory, and if we set it to the guest MTRR type, host access would be broken? We would cover this in our shadow MTRR patch, since we encountered this with video RAM when experimenting with VGA assignment. -- regards Yang, Sheng
Re: [ Re: unhandled vm exit: 0x80000021 vcpu_id 0]
On Wed, Oct 8, 2008 at 7:16 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi Yang, I often hibernate my Linux, so maybe the module load message is missing from the dmesg because it is too old. I have rebooted the system and I attach a clean dmesg. Yeah, now I can see the load info of kvm-76. What do you mean by "Windows always triggers an APIC write error before Jan's patch silenced it"? Which Windows? At least Windows XP likes to do this; upstream, Jan's patch cleans it up. However, when I try to run qemu/kvm using the winxp image, no error appears in the dmesg. I can see the error in the output of the qemu/kvm command. It's indeed hard to debug with such limited info... I still suggest you file a bug first. And if you have time, please try the attached patch and update the info. -- regards Yang, Sheng Regards, Pier Luigi Original Message Subject: Re: unhandled vm exit: 0x80000021 vcpu_id 0 Date: Fri, 3 Oct 2008 08:57:31 +0800 From: Sheng Yang [EMAIL PROTECTED] To: [EMAIL PROTECTED] [EMAIL PROTECTED] CC: [EMAIL PROTECTED], kvm@vger.kernel.org References: [EMAIL PROTECTED] On Fri, Oct 03, 2008 at 12:16:20AM +0200, [EMAIL PROTECTED] wrote: Hi, I understand the particularity (checkpoint) of this case. Hi Pier Thanks for your understanding. :) Anyway, attached are the dmesg log and the output of the dmesg command. But it's strange that I can see almost nothing related to kvm in the log. If you built kvm as modules (I suppose you did, because you tried many versions), at least something like load kvm module xxx should appear (and Windows always triggered an APIC write error before Jan's patch silenced it). Is this the dmesg from when the error was happening? -- regards Yang, Sheng Thanks for your helpfulness. Regards. Sheng Yang wrote: On Mon, Sep 29, 2008 at 6:18 PM, [EMAIL PROTECTED] p. [EMAIL PROTECTED] it wrote: Hi, I have successfully installed Windows XP SP2 on kvm. After the installation I launched the setup of Checkpoint - Pointsec for full disk encryption. Hi Pier Can you file a bug for this? But sadly Checkpoint is commercial software, so we may not be able to deal with it directly and immediately. The first step of the installation ran successfully, but when the system reboots and Pointsec loads the initial code, the following error happens: == unhandled vm exit: 0x80000021 vcpu_id 0 rax 0007 rbx 1490 rcx rdx 19a0 rsi rdi rsp 0080 rbp 96bf r8 r9 r10 r11 r12 r13 r14 r15 rip 002a rflags 00023202 cs 14a2 (/ p 0 dpl 0 db 0 s 0 type 9 l 0 g 0 avl 0) ds 19a0 (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) es 1a31 (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) ss 1a29 (/ p 0 dpl 0 db 0 s 0 type 1 l 0 g 0 avl 0) fs (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) gs (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) tr 0058 (00201ffa/ p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0) ldt (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) gdt 20/1dd8 idt 201df0/188 cr0 8019 cr2 0 cr3 144 cr4 0 cr8 0 efer 0 What's this... CR0.PE clear, CR0.PG set... And the segment registers are also strange. Maybe something is wrong in the real mode emulation... Aborted == I am able to boot this system (image) using qemu (with kqemu enabled for user code), but not using kvm. I have also tried the options -no-kvm-irqchip, -no-kvm-pit and -no-acpi, without success. Only the -no-kvm option works. I have tried these kvm releases: from 65 to 76; and these kernel (vanilla) releases: from 2.6.23.1 to 2.6.26.5. Thanks for your patience... My computer is a Dell D630 equipped with an Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz. The HOST Linux distributions used are: Fedora 8/9 for i386, and Fedora 9 for x86_64. Can you show dmesg as well? That also helps. -- regards, Yang, Sheng Index:
RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest
Avi Kivity wrote: Han, Weidong wrote: There is a missed optimization here. Suppose we have two devices each under a different iommu. With the patch, each will be in a different dmar_domain and so will have a different page table. The amount of memory used is doubled. You cannot let two devices each under a different iommu share one dmar_domain, because dmar_domain has a pointer to iommu. I don't want them to share dmar_domains (these are implementation details anyway), just io page tables. kvm --- something (owns io page table) --- dmar_domain (uses shared io page table) --- device Letting dmar_domains share an io page table is not allowed. The VT-d spec allows one domain to correspond to one page table, and vice versa. If we want something that owns the io page table, shared by all devices assigned to one guest, we need to redefine dmar_domain so that it covers all devices assigned to a guest. Then we need to rewrite most of the native VT-d code for kvm. Xen doesn't use dmar_domain; instead it implements something as a domain structure (with domain id) to own the page table. One guest has only one something instance, thus has only one page table. It looks like: xen --- something (owns io page table) --- device. But on the KVM side, I think we can reuse the native VT-d code and needn't duplicate it. Regards, Weidong Even if we don't implement io page table sharing right away, implementing the 'something' in the iommu api means we can later implement sharing without changing the iommu/kvm interface. In fact, the exported APIs added for KVM VT-d also do create/map/attach/detach/free functions. Whereas these iommu APIs are more readable. No; the existing iommu API talks about dmar domains and exposes the existence of multiple iommus, so it is more complex. Because kvm VT-d usage is different from native usage, it's inevitable to extend native VT-d code to support KVM VT-d (such as wrap dmar_domain). For devices under different iommus, they cannot share the same dmar_domain, thus they cannot share VT-d page table. If we want to handle this by iommu APIs, I suspect we need to change lots of native VT-d driver code. As mentioned above, we can start with implementing the API without actual sharing (basically, your patch, but as an addition to the API rather than a change to kvm); we can add io pagetable sharing later.
Re: [PATCH] x86 emulator: Add Src2 decode set
On Thu, 09 Oct 2008 11:06:50 +0200 Avi Kivity [EMAIL PROTECTED] wrote: The regular shift instructions (shl, rcl, etc) come in three varieties: shift by 1, shift by imm8, and shift by CL. Right now they use SrcImmByte and decode the implied '1' and CL by hand. If we change them to use Src2, they can reuse the Src2CL and Src2One support that you are adding now. Ok I see, but shld, shrd, etc come only in two varieties: immediate and CL. So maybe it would be better to replace SrcImplicit (which is not really useful) with SrcOne? Then we will have ... case SrcOne: c->src.val = 1; break; ... and we can reuse Src2CL as well.
Re: [PATCH] x86 emulator: Add Src2 decode set
Guillaume Thouvenin wrote: I will add Src2One but I don't understand exactly what you mean by switching shift operators to Src2 later. I also applied other remarks, thanks for your help. The patch follows. The regular shift instructions (shl, rcl, etc) come in three varieties: shift by 1, shift by imm8, and shift by CL. Right now they use SrcImmByte and decode the implied '1' and CL by hand. If we change them to use Src2, they can reuse the Src2CL and Src2One support that you are adding now. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
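A minimal sketch of what reusing a Src2 decode step for the three shift varieties could look like. The enum, struct and function names are hypothetical and are not the emulator's actual flags or tables; only the opcode bytes are architectural (group 2 shifts: 0xC0/0xC1 take imm8, 0xD0/0xD1 shift by 1, 0xD2/0xD3 shift by CL).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; the real emulator packs its flags into an opcode table. */
enum src2_kind { SRC2_NONE, SRC2_CL, SRC2_IMM_BYTE, SRC2_ONE };

struct decode_input {
    uint8_t cl;     /* low byte of the guest's RCX */
    uint8_t imm8;   /* immediate byte fetched from the instruction stream */
};

/* Resolve the second source operand (here: the shift count) from its decode flag. */
static uint8_t decode_src2(enum src2_kind kind, const struct decode_input *in)
{
    switch (kind) {
    case SRC2_CL:
        return in->cl;
    case SRC2_IMM_BYTE:
        return in->imm8;
    case SRC2_ONE:
        return 1;       /* the implied '1' no longer needs hand decoding */
    case SRC2_NONE:
    default:
        return 0;
    }
}

/* Map a group 2 shift opcode byte to the Src2 flag it would carry in the table. */
static enum src2_kind shift_src2_kind(uint8_t opcode)
{
    switch (opcode) {
    case 0xC0: case 0xC1:
        return SRC2_IMM_BYTE;
    case 0xD0: case 0xD1:
        return SRC2_ONE;
    case 0xD2: case 0xD3:
        return SRC2_CL;
    default:
        return SRC2_NONE;
    }
}
```

With this shape the per-instruction code never looks at CL or the immediate itself; it only consumes the decoded count.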
Re: [PATCH 0/6] MTRR/PAT support for EPT (v3)
Sheng Yang wrote: Hi, Avi Here is the latest update of MTRR/PAT support. Change from v2: Discard the using of MSR bitmap, add MSR_IA32_CR_PAT to save/restore, as well as rebase on latest upstream. Applied all; my comments about shadow can be addressed later. There is also the danger of the guest setting the wrong MTRR type for RAM, thus introducing incompatible memory types (between qemu and the guest). If this is a problem, we should ignore the guest's mtrr (and pat) for RAM and use write-back instead. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
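One way to read the fallback suggested above, as a small sketch. The helper name and the is_guest_ram flag are assumptions made for illustration and are not part of the applied patches; the numeric values are the architectural MTRR memory type encodings.

```c
#include <assert.h>

/* MTRR memory type encodings as defined by the x86 architecture. */
enum mtrr_mem_type {
    MEM_UC = 0,     /* uncacheable */
    MEM_WC = 1,     /* write-combining */
    MEM_WT = 4,     /* write-through */
    MEM_WP = 5,     /* write-protected */
    MEM_WB = 6,     /* write-back */
};

/*
 * Hypothetical policy helper: for guest RAM, ignore whatever the guest
 * programmed and use write-back; for everything else honour the guest type.
 */
static enum mtrr_mem_type effective_mem_type(int is_guest_ram,
                                             enum mtrr_mem_type guest_type)
{
    if (is_guest_ram)
        return MEM_WB;
    return guest_type;
}
```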
Re: [PATCH] x86 emulator: Add Src2 decode set
On Thu, 09 Oct 2008 10:11:57 +0200 Avi Kivity [EMAIL PROTECTED] wrote: Guillaume Thouvenin wrote: Instruction like shld has three operands, so we need to add a Src2 decode set. We start with Src2None, Src2CL, and Src2Imm8 to support shld and we will expand it later. Please add Src2One (implied '1') as well, so we can switch the existing shift operators to Src2 later. I will add Src2One but I don't understand exactly what you mean by switching shift operators to Src2 later. I also applied other remarks, thanks for your help. The patch follows.
Re: KVM rpm/deb packages for recent releases.
Daniel P. Berrange wrote: On Wed, Oct 08, 2008 at 12:06:43PM -0700, jd wrote: Hi - I am looking for installable packages (both rpms and deb) for recent versions of KVM (kvm-70 and above). For SUSE/SLES I found, which seems useful (looks official) http://download.opensuse.org/repositories/Virtualization:/KVM/ Anything similar for RHEL/CentOS or Ubuntu/debian ? - RHEL and CentOS seems to be at kvm-36. Is there a process to make higher version of kvm supported on such distros? Would any one from RH and Novell know/comment here? No version of KVM is supported on RHEL. Xen is the virtualization technology in RHEL-5. Whatever CentOS is shipping is not derived from anything in RHEL-5. For up2date RPMs, Fedora (rawhide) is the place to look. We aim to track latest releases in both upstream kernel (for modules) and kvm (for the userspace). NB, we don't patch KVM modules to be newer than what's in Linus' official releases. but if you still like here are all required packages: http://www.lfarkas.org/linux/packages/centos/5/x86_64/ -- Levente Si vis pacem para bellum!
Re: KVM rpm/deb packages for recent releases.
On Wed, Oct 8, 2008 at 4:06 PM, jd [EMAIL PROTECTED] wrote: Hi - I am looking for installable packages (both rpms and deb) for recent versions of KVM (kvm-70 and above). kvm-72 is in debian testing/unstable. You say for etch and a half ? I didn't find it in backports.org, but you could try downloading the source package from testing and compile it (you need to add a deb-src line to the sources.list), something like: apt-get build-dep kvm; apt-get source kvm; cd kvm-*; debuild -us -uc (I didn't test it, but it shouldn't be too different, perhaps you could not have some dependency you can install from testing or compile this way too:) Thanks, Rodrigo
RE: Broken userspace module Makefile
CONFIG_X86 is defined when compiling every qemu object, so even if unifdef doesn't work, we shouldn't hit problems related to this header file. Maybe other potential issues cause the problem you met. Anyway, we had better get unifdef working the right way. :) BTW, judging from the manual, it seems unifdef can't handle a case like #if defined(CONFIG_X86) || defined(CONFIG_IA64); can anyone clarify? Thanks Xiantao Sheng Yang wrote: Hi, Avi After Xiantao's irq_common patches were checked in, we found that it's impossible to compile with VT-d userspace now. Essentially the problem is that the Makefile has been missing a $ since the unifdef patch was checked in half a year ago... But after I fixed it, I found it's still impossible to get unifdef to run correctly... First, unifdef reports an error when processing include/linux/kvm.h, but I can't find out what's wrong yet. Second, it seems at least my unifdef can't deal with #if defined(CONFIG_X86) || defined(CONFIG_IA64) My unifdef version is 1.0(20030701), the latest from debian testing. I also tried the one for fc9, same result. What do you think... 
--
regards
Yang, Sheng
--
From: Sheng Yang [EMAIL PROTECTED]
Date: Thu, 9 Oct 2008 20:45:02 +0800
Subject: [PATCH 1/1] kvm: Fix broken Makefile of kernel module

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 kernel/Makefile |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index f2a71fa..e352f77 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -65,7 +65,7 @@ header-sync:
 $(LINUX)/arch/$(ARCH_DIR)/include/asm/./kvm*.h \
 $T/include/asm-$(ARCH_DIR)/
- set -e for i in $(find $T -name '*.h'); do \
+ set -e for i in $$(find $T -name '*.h'); do \
 $(call unifdef,$$i); done
 $(call hack, include/linux/kvm.h)
 set -e for i in $$(find $T -type f -printf '%P '); \
@@ -79,7 +79,7 @@ source-sync:
 $(LINUX)/virt/kvm/./*.[cSh] \
 $T/
- set -e for i in $(find $T -name '*.c'); do \
+ set -e for i in $$(find $T -name '*.c'); do \
 $(call unifdef,$$i); done
 for i in $(hack-files); \
--
1.5.3
Re: Broken userspace module Makefile
On Friday 10 October 2008 09:47:15 Zhang, Xiantao wrote: CONFIG_X86 is defined when compiling every qemu object, so even if unifdef doesn't work, we shouldn't hit problems related to this header file. Maybe other potential issues cause the problem you met. Anyway, we had better get unifdef working the right way. :) BTW, judging from the manual, it seems unifdef can't handle a case like #if defined(CONFIG_X86) || defined(CONFIG_IA64); can anyone clarify? Yeah, CONFIG_X86 is for qemu. But kernel/ is not part of the qemu code, and can't be covered by qemu/config-host.mak... regards Yang, Sheng Thanks Xiantao Sheng Yang wrote: Hi, Avi After Xiantao's irq_common patches were checked in, we found that it's impossible to compile with VT-d userspace now. Essentially the problem is that the Makefile has been missing a $ since the unifdef patch was checked in half a year ago... But after I fixed it, I found it's still impossible to get unifdef to run correctly... First, unifdef reports an error when processing include/linux/kvm.h, but I can't find out what's wrong yet. Second, it seems at least my unifdef can't deal with #if defined(CONFIG_X86) || defined(CONFIG_IA64) My unifdef version is 1.0(20030701), the latest from debian testing. I also tried the one for fc9, same result. What do you think... 
RE: Broken userspace module Makefile
The following patch should solve the issue you met until unifdef works again.

diff --git a/libkvm/config-i386.mak b/libkvm/config-i386.mak
index 2706b70..3579985 100644
--- a/libkvm/config-i386.mak
+++ b/libkvm/config-i386.mak
@@ -1,6 +1,6 @@
 LIBDIR := /lib
 CFLAGS += -m32
-CFLAGS += -D__i386__
+CFLAGS += -D__i386__ -DCONFIG_X86
 libkvm-$(ARCH)-objs := libkvm-x86.o
diff --git a/libkvm/config-x86_64.mak b/libkvm/config-x86_64.mak
index e638977..9d02eb0 100644
--- a/libkvm/config-x86_64.mak
+++ b/libkvm/config-x86_64.mak
@@ -1,6 +1,6 @@
 LIBDIR := /lib64
 CFLAGS += -m64
-CFLAGS += -D__x86_64__
+CFLAGS += -D__x86_64__ -DCONFIG_X86
 libkvm-$(ARCH)-objs := libkvm-x86.o

Yeah, CONFIG_X86 is for qemu. But kernel/ is not part of the qemu code, and can't be covered by qemu/config-host.mak... No, when you use ./configure in userspace, it generates a qemu_cflag which includes -DCONFIG_X86 for compiling qemu's objects. So when compiling qemu's objects, CONFIG_X86 is defined for every target! :) Xiantao regards Yang, Sheng Thanks Xiantao Sheng Yang wrote: Hi, Avi After Xiantao's irq_common patches were checked in, we found that it's impossible to compile with VT-d userspace now. Essentially the problem is that the Makefile has been missing a $ since the unifdef patch was checked in half a year ago... But after I fixed it, I found it's still impossible to get unifdef to run correctly... First, unifdef reports an error when processing include/linux/kvm.h, but I can't find out what's wrong yet. Second, it seems at least my unifdef can't deal with #if defined(CONFIG_X86) || defined(CONFIG_IA64) My unifdef version is 1.0(20030701), the latest from debian testing. I also tried the one for fc9, same result. What do you think... 
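The effect of adding -DCONFIG_X86 to CFLAGS, as the libkvm patch in this message does, can be seen in isolation with a small sketch. The macro HAVE_ARCH_FIELDS and the helper are hypothetical stand-ins for whatever include/linux/kvm.h guards; the #define at the top only simulates the compiler flag so that the snippet is self-contained.

```c
#include <assert.h>

/* Simulates passing -DCONFIG_X86 on the command line (which defines it to 1). */
#ifndef CONFIG_X86
#define CONFIG_X86 1
#endif

/*
 * Same shape as the guard quoted in the thread. If unifdef has not resolved
 * it in the exported header, the compiler evaluates it, so the build only
 * sees the guarded part when one of the symbols is defined.
 */
#if defined(CONFIG_X86) || defined(CONFIG_IA64)
#define HAVE_ARCH_FIELDS 1      /* hypothetical: guarded declarations visible */
#else
#define HAVE_ARCH_FIELDS 0      /* hypothetical: guarded declarations dropped */
#endif

/* Report whether the guarded section was compiled in. */
static int arch_fields_visible(void)
{
    return HAVE_ARCH_FIELDS;
}
```

Removing the simulated define and building without -DCONFIG_X86 makes the helper return 0, which corresponds to the case where libkvm is built without the flag and the header is left unprocessed.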
RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest
Han, Weidong wrote: Avi Kivity wrote: Han, Weidong wrote: I don't want them to share dmar_domains (these are implementation details anyway), just io page tables. kvm --- something (owns io page table) --- dmar_domain (uses shared io page table) --- device Letting dmar_domains share an io page table is not allowed. The VT-d spec has one domain correspond to one page table, and vice versa. Since the io pagetables are read only for the iommu (right?), I don't see what prevents several iommus from accessing the same pagetable. It's just a bunch of memory. I think the reason is that hardware may use the domain identifier to tag its internal caches. If we want something that owns the io page table, shared by all devices assigned to one guest, we need to redefine dmar_domain so that it covers all devices assigned to a guest. Then we need to rewrite most of the native VT-d code for kvm. Xen doesn't use dmar_domain; instead it implements the something as a domain structure (with a domain id) that owns the page table. I imagine Xen shares the io pagetables with the EPT pagetables as well. So io pagetable sharing is allowed. In Xen, the VT-d page table is not shared with the EPT pagetable or the P2M pagetable. But they can be shared if the format is the same. One guest has only one something instance, and thus has only one page table. It looks like: xen --- something (owns io page table) --- device. But on the KVM side, I think we can reuse the native VT-d code; there is no need to duplicate it. I agree that at this stage, we don't want to do optimization, we need something working first. But let's at least ensure the API allows the optimization later on (and also, that iommu implementation details are hidden from kvm). What I'm proposing is moving the list of kvm_vtd_domains inside the iommu API. The only missing piece is populating a new dmar_domain when a new device is added. 
We already have I will move kvm_vtd_domain inside the iommu API, and also hide the get_kvm_vtd_domain() and release_kvm_vtd_domain() implementation details from kvm. It's hard to move kvm_vtd_domain inside the current iommu API. It's kvm specific. It's not elegant to include kvm_vtd_domain code in the native VT-d code. I think leaving it on the kvm side is cleaner at this point. Moreover, it's very simple. I read Joerg's iommu API foils just now, and I think it's good. Native AMD iommu code will be in 2.6.28, so after 2.6.28 is a suitable time to implement a generic iommu API for kvm based on both Intel and AMD iommu. What's your opinion? Regards, Weidong intel_iommu_iova_to_pfn(), we need to add a way to read the protection bits and the highest mapped iova (oh, and intel_iommu_iova_to_pfn() has a bug: it shifts right instead of left). Why do we need the protection bits and the highest mapped iova? Shifting right instead of left in intel_iommu_iova_to_pfn() is not a bug, because it returns a pfn, not an address. Regards, Weidong Later we can make the something (that already contains the list) also own the io page table; and non-kvm users can still use the same code (the list will always be of length 1 for these users).
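The disagreement about the shift direction comes down to which way the conversion goes. A minimal sketch, assuming 4 KiB pages; the helper names and the SKETCH_PAGE_SHIFT constant are hypothetical and this is not the body of intel_iommu_iova_to_pfn():

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

/* Address to page frame number: drop the in-page offset bits (right shift). */
static uint64_t addr_to_pfn(uint64_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Page frame number back to the page's base address (left shift). */
static uint64_t pfn_to_addr(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}
```

A function that is documented to return a pfn therefore shifts right, and only a caller that wants an address needs the left shift afterwards.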
Re: exit timing analysis v1 - comments & discussions welcome
I modified the code according to your comments and my ideas; the new values are shown in column impISF (irq delivery, Stat, FindFirstBit). I changed some code of the statistic updating and the interrupt delivery and got this:

      base     impirq (d3)   impstat (d5)   impboth   impISF   section
a)    12.57%   11.13%        12.05%         11.03%    12.28%   exit, saving guest state (booke_interrupt.S)
b)     7.37%    9.38%         8.69%          8.07%    10.13%   reaching kvmppc_handle_exit
c)     7.38%    7.20%         7.49%          9.78%     7.85%   syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
d1)    2.49%    3.39%         2.56%          3.30%     3.70%   some checks for all exits
d2)    8.84%    8.56%         9.28%          8.31%     6.07%   finding first bit in kvmppc_check_and_deliver_interrupts
d3)    6.53%    5.25%         6.63%          5.10%     4.27%   can_deliver in kvmppc_check_and_deliver_interrupts
d4)   13.66%   15.37%        14.12%         14.92%    13.96%   clear&deliver exception in kvmppc_check_and_deliver_interrupts
d5)    3.65%    4.57%         2.68%          4.41%     3.77%   updating kvm_stat statistics
e)     6.55%    6.30%         6.30%          5.89%     6.74%   returning from kvmppc_handle_exit to booke_interrupt.S
f1)   30.90%   28.78%        30.16%         29.16%    31.19%   restoring guest tlb
f2)    4.81%    4.77%         5.06%          4.66%     5.17%   restoring guest state ([s]regs)

We all see the measurement inaccuracy, but the last columns look good at the improved sections d2, d3 and d4. I'll remove this detailed tracing soon and run a larger test, hoping that it will not have the inaccuracy. But for now I still wonder about the ~14% for clear&deliver - that should just not be that much. It should be worth looking into that section in more detail first.

Christian Ehrhardt wrote: Hollis Blanchard wrote: On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote: Wondering about that 30.5% for postprocessing and kvmppc_check_and_deliver_interrupts I quickly checked that in detail - part d is now divided in 4 subparts. 
I also looked at the return to guest path to check whether the expected part (restoring tlb) is really the main time eater there. The result shows clearly that it is.

more detailed breakdown:
a)  10.94% - exit, saving guest state (booke_interrupt.S)
b)   8.12% - reaching kvmppc_handle_exit
c)   7.59% - syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
d1)  3.33% - some checks for all exits
d2)  8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
d3) 17.20% - can_deliver/clear&deliver exception in kvmppc_check_and_deliver_interrupts
d4)  4.47% - updating kvm_stat statistics
e)   6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
f1) 29.18% - restoring guest tlb
f2)  4.69% - restoring guest state ([s]regs)

These fractions are % of our ~12µs syscall exit.
=> restoring tlb on each reenter = 4µs constant overhead
=> looking a bit into irq delivery and other constant things like kvm_stat updating
... Now I go for the TLB replacement in f1.

Hang on... does d3 make sense to you? It doesn't to me, and if there's a bug there it will be easier to fix than rewriting the TLB code. :)

I have not given up on improving that part either :-)

I think your core runs at 667MHz, right? So that's 1.5 ns/cycle. 17.20% of 12µs is 2064ns, or about 1300 cycles. (Check my math.)

I get the same results. 1% ~ 80 cycles.

Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure where that time is going. We're assuming the find_first_bit() loop usually executes once, for syscall. Does it actually execute more than that? I don't expect any of kvmppc_can_deliver_interrupt(), kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt() to take lots of time.

You can see below that I already had a more detailed breakdown in my old mail: [...] 
      base     impirq (d3)   impstat (d5)   impboth   section
d2)    8.84%    8.56%         9.28%          8.31%    finding first bit in kvmppc_check_and_deliver_interrupts
d3)    6.53%    5.25%         6.63%          5.10%    can_deliver in kvmppc_check_and_deliver_interrupts
d4)   13.66%   15.37%        14.12%         14.92%    clear&deliver exception in kvmppc_check_and_deliver_interrupts
[...]

Could it be cache effects? exception_priority[] and priority_exception[] are 16 bytes each, and our L1 cacheline is 32 bytes, so they should both fit into one... except they're not aligned.

I would be so happy if I had hardware performance counters for things like cache misses :-)

Also, it looks like we use the generic find_first_bit(). That may be more expensive than we'd like. However, since vcpu->arch.pending_exceptions is a single long (not an arbitrary sized bitfield), we should be able to use ffs() instead, which has an optimized PowerPC implementation. That might help a lot.

good idea. I'll check this and some other small improvements I have in mind.

We might even be able to replace find_next_bit()