[PATCH] x86: Rename mtrr_state struct and macro names

2008-10-09 Thread Avi Kivity
From: Sheng Yang [EMAIL PROTECTED]

Prepare for exporting them.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
Signed-off-by: Avi Kivity [EMAIL PROTECTED]

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index cb7d3b6..b9574a6 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,9 +14,9 @@
 #include <asm/pat.h>
 #include "mtrr.h"
 
-struct mtrr_state {
-   struct mtrr_var_range var_ranges[MAX_VAR_RANGES];
-   mtrr_type fixed_ranges[NUM_FIXED_RANGES];
+struct mtrr_state_type {
+   struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+   mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
unsigned char enabled;
unsigned char have_fixed;
mtrr_type def_type;
@@ -35,7 +35,7 @@ static struct fixed_range_block fixed_range_blocks[] = {
 };
 
 static unsigned long smp_changes_mask;
-static struct mtrr_state mtrr_state = {};
+static struct mtrr_state_type mtrr_state = {};
 static int mtrr_state_set;
 u64 mtrr_tom2;
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 885c826..edadf7b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -49,7 +49,7 @@
 
 u32 num_var_ranges = 0;
 
-unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
@@ -574,7 +574,7 @@ struct mtrr_value {
unsigned long   lsize;
 };
 
-static struct mtrr_value mtrr_state[MAX_VAR_RANGES];
+static struct mtrr_value mtrr_state[MTRR_MAX_VAR_RANGES];
 
 static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
 {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2dc4ec6..9885382 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -11,8 +11,9 @@
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
-#define NUM_FIXED_RANGES 88
-#define MAX_VAR_RANGES 256
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
 #define MTRRfix64K_0_MSR 0x250
 #define MTRRfix16K_8_MSR 0x258
 #define MTRRfix16K_A_MSR 0x259
@@ -33,7 +34,7 @@
an 8 bit field: */
 typedef u8 mtrr_type;
 
-extern unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
u32 vendor;
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest

2008-10-09 Thread Han, Weidong
Avi Kivity wrote:
 Han, Weidong wrote:
 If we devolve this to the iommu API, the same io page table can be
 shared by all iommus, so long as they all use the same page table
 format. 
 
 
 I don't understand how to handle this by iommu API. Let me explain
 my thoughts more clearly: 
 
 VT-d spec says:
  Context-entries programmed with the same domain identifier must
 always reference the same address translation structure (through the
 ASR field). Similarly, context-entries referencing the same address
 translation structure must be programmed with the same domain id.
 
 In native VT-d driver, dmar_domain is per device, and has its own
 VT-d page table, which is dynamically setup before each DMA. So it is
 impossible that the same VT-d page table is shared by all iommus.
 Moreover, different iommus in the system may have different page table
 levels.
 
 Right.  This use case is in essence to prevent unintended sharing.  It
 is also likely to have low page table height, since dma sizes are
 relatively small.
 
 I think it's enough that the iommu API tells us which iommu a device
 is under.
 
 
 While this is tangential to our conversation, why?  Even for the
 device driver use case, this only makes the API more complex.  If the
 API hides the existence of multiple iommus, it's easier to use and
 harder to make a mistake.
 
 On the KVM side, however, the same VT-d page table can be shared by
 devices that are under the same iommu and assigned to the same guest,
 because all of the guest's memory is statically mapped in the VT-d page
 table. This requires wrapping dmar_domain; this patch wraps it with a
 reference count so that multiple devices can relate to the same
 dmar_domain.
 
 This patch already adds an API (intel_iommu_device_get_iommu()) in
 intel-iommu.c, which returns the iommu of a device.
 
 There is a missed optimization here.  Suppose we have two devices each
 under a different iommu.  With the patch, each will be in a different
 dmar_domain and so will have a different page table.  The amount of
 memory used is doubled.

You cannot let two devices that are each under a different iommu share
one dmar_domain, because dmar_domain has a pointer to its iommu.

 
 Suppose the iommu API hides the existence of multiple iommus.  You
 allocate a translation and add devices to it.  When you add a device,
 the iommu API checks which iommu is needed and programs it
 accordingly, but only one io page table is used.
 
 The other benefit is that iommu developers understand these issues
 while kvm developers don't, so it's best managed by the iommu API. 
 This way if things change (as usual, becoming more complicated), the
 iommu can make the changes in their code and hide the complexity from
 kvm or other users.
 
 I'm probably (badly) duplicating Joerg's iommu API here, but this is
 how it could go:
 
 iommu_translation_create() - creates an iommu translation object; this
 allocates the page tables but doesn't do anything with them
 iommu_translation_map() - adds pages to the translation
 iommu_translation_attach() - attach a device to the translation; this
 locates the iommu and programs it
 _detach(), _unmap(), and _free() undo these operations.
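
 The lifecycle outlined above can be sketched as a toy user-space model.
 This is only an illustration under stated assumptions: the
 iommu_translation_* names come from the message above, but every type,
 field, and the device-to-iommu lookup below is hypothetical, and none of
 it is the Linux iommu API. It shows a single page table being shared
 while attach() selects the iommu for each device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_IOMMUS 2   /* hypothetical: two iommu units in the system */
#define TOY_NR_PAGES  16  /* hypothetical: tiny flat "page table" */

/* One hardware iommu; it only remembers which table it was programmed with. */
struct toy_iommu {
	const unsigned long *root;   /* page table currently programmed */
	int users;                   /* devices attached through this iommu */
};

/* A device sits under exactly one iommu. */
struct toy_device {
	int iommu_id;
};

/* The translation object owns a single page table shared by all iommus. */
struct iommu_translation {
	unsigned long table[TOY_NR_PAGES];  /* gfn -> pfn, 0 means unmapped */
	int attached;                       /* number of attached devices */
};

static struct toy_iommu toy_iommus[TOY_NR_IOMMUS];

/* Allocate the page table but do not touch any hardware yet. */
static struct iommu_translation *iommu_translation_create(void)
{
	return calloc(1, sizeof(struct iommu_translation));
}

/* Add one page to the translation. */
static int iommu_translation_map(struct iommu_translation *t,
				 size_t gfn, unsigned long pfn)
{
	if (gfn >= TOY_NR_PAGES)
		return -1;
	t->table[gfn] = pfn;
	return 0;
}

/* Locate the device's iommu and point it at the shared table. */
static int iommu_translation_attach(struct iommu_translation *t,
				    const struct toy_device *dev)
{
	struct toy_iommu *iommu;

	if (dev->iommu_id < 0 || dev->iommu_id >= TOY_NR_IOMMUS)
		return -1;
	iommu = &toy_iommus[dev->iommu_id];
	iommu->root = t->table;
	iommu->users++;
	t->attached++;
	return 0;
}

/* Undo attach; the iommu is cleared once its last device is gone. */
static void iommu_translation_detach(struct iommu_translation *t,
				     const struct toy_device *dev)
{
	struct toy_iommu *iommu = &toy_iommus[dev->iommu_id];

	assert(iommu->users > 0 && t->attached > 0);
	if (--iommu->users == 0)
		iommu->root = NULL;
	t->attached--;
}

/* Free the translation once no device is attached. */
static void iommu_translation_free(struct iommu_translation *t)
{
	assert(t->attached == 0);
	free(t);
}
```

 In this model two devices under different iommus end up with both
 iommus pointing at the same table, which is the memory saving being
 argued for. Whether real hardware permits that (same page table format
 and level on every iommu) is exactly the open question in this thread.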

In fact, the exported APIs added for KVM VT-d already provide
create/map/attach/detach/free functions, although the iommu APIs you
describe are more readable.

Because KVM's VT-d usage differs from native usage, it is inevitable
that the native VT-d code must be extended to support KVM VT-d (for
example, by wrapping dmar_domain). Devices under different iommus
cannot share the same dmar_domain, so they cannot share a VT-d page
table. If we want to handle this in the iommu API, I suspect we need to
change a lot of native VT-d driver code.

David/Jesse, what's your opinion?



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ kvm-Bugs-2143498 ] FreeBSD fails to reboot

2008-10-09 Thread SourceForge.net
Bugs item #2143498, was opened at 2008-10-03 05:38
Message generated for change (Comment added) made by rtg20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2143498&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Matt Lehner (mlehner)
Assigned to: Nobody/Anonymous (nobody)
Summary: FreeBSD fails to reboot

Initial Comment:
Xeon E5430, ubuntu 8.04 x86_64, currently kvm-62. Reports of same issue in 
kvm76:

https://bugs.launchpad.net/ubuntu/+source/kvm/+bug/239107

Host kernel: 2.6.24-19
Guest: FreeBSD 6.2, i386 and amd64

Problem: FreeBSD will start normally as a guest OS. When a reboot is issued 
to the guest, it will not come back up without destroying the guest, and then 
starting it again.

Screenshots from a reboot:
http://lehner.pair.com/screenshot1.png
Note that the drive is listed.

http://lehner.pair.com/screenshot2.png

In the second screenshot, the drive should be listed right after the 
Timecounters lines.

If the guest is destroyed at that point, and restarted it will come up fine. 
This can be reproduced 100% of the time. Happens both when using a file or a 
partition for the guest OS.

--

Comment By: Roman Yepishev (rtg20)
Date: 2008-10-09 10:26

Message:
KVM-76: When the --no-kvm option is passed, the reboot happens without any
problems, so this has something to do with the kernel kvm module as well.

--

Comment By: Roman Yepishev (rtg20)
Date: 2008-10-07 23:35

Message:
Reproducible for kvm-74,kvm-75,kvm-76 with FreeBSD 6.2, 6.3 and 7.0.

This is a KVM-specific bug. QEMU 0.9.1 does not suffer from it.

--



[PATCH] x86 emulator: Add Src2 decode set

2008-10-09 Thread Guillaume Thouvenin
Instructions like shld have three operands, so we need to add a Src2
decode set. We start with Src2None, Src2CL, and Src2Imm8 to support
shld, and we will expand it later.

Signed-off-by: Guillaume Thouvenin [EMAIL PROTECTED]
---
 arch/x86/kvm/x86_emulate.c|   47 --
 include/asm-x86/kvm_x86_emulate.h |1 
 2 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index a391e21..c9ef2da 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -59,16 +59,21 @@
 #define SrcImm      (5<<4)      /* Immediate operand. */
 #define SrcImmByte  (6<<4)      /* 8-bit sign-extended immediate operand. */
 #define SrcMask     (7<<4)
+/* Source 2 operand type */
+#define Src2None    (0<<7)
+#define Src2CL      (1<<7)
+#define Src2Imm8    (2<<7)
+#define Src2Mask    (7<<7)
 /* Generic ModRM decode. */
-#define ModRM       (1<<7)
+#define ModRM       (1<<24)
 /* Destination is only written; never read. */
-#define Mov         (1<<8)
-#define BitOp       (1<<9)
-#define MemAbs      (1<<10)     /* Memory operand is absolute displacement */
-#define String      (1<<12)     /* String instruction (rep capable) */
-#define Stack       (1<<13)     /* Stack instruction (push/pop) */
-#define Group       (1<<14)     /* Bits 3:5 of modrm byte extend opcode */
-#define GroupDual   (1<<15)     /* Alternate decoding of mod == 3 */
+#define Mov         (1<<25)
+#define BitOp       (1<<26)
+#define MemAbs      (1<<27)     /* Memory operand is absolute displacement */
+#define String      (1<<28)     /* String instruction (rep capable) */
+#define Stack       (1<<29)     /* Stack instruction (push/pop) */
+#define Group       (1<<30)     /* Bits 3:5 of modrm byte extend opcode */
+#define GroupDual   (1<<31)     /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff        /* Group number stored in bits 0:7 */
 
 enum {
@@ -76,7 +81,7 @@ enum {
Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
 };
 
-static u16 opcode_table[256] = {
+static u32 opcode_table[256] = {
/* 0x00 - 0x07 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
@@ -195,7 +200,7 @@ static u16 opcode_table[256] = {
ImplicitOps, ImplicitOps, Group | Group4, Group | Group5,
 };
 
-static u16 twobyte_table[256] = {
+static u32 twobyte_table[256] = {
/* 0x00 - 0x0F */
0, Group | GroupDual | Group7, 0, 0, 0, 0, ImplicitOps, 0,
ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
@@ -253,7 +258,7 @@ static u16 twobyte_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
 };
 
-static u16 group_table[] = {
+static u32 group_table[] = {
[Group1_80*8] =
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
@@ -297,7 +302,7 @@ static u16 group_table[] = {
SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
 };
 
-static u16 group2_table[] = {
+static u32 group2_table[] = {
[Group7*8] =
SrcNone | ModRM, 0, 0, 0,
SrcNone | ModRM | DstMem | Mov, 0,
@@ -1043,6 +1048,24 @@ done_prefixes:
break;
}
 
+   /*
+* Decode and fetch the second source operand: register, memory
+* or immediate.
+*/
+   switch (c->d & Src2Mask) {
+   case Src2None:
+   break;
+   case Src2CL:
+   c->src2.val = c->regs[VCPU_REGS_RCX];
+   break;
+   case Src2Imm8:
+   c->src2.type = OP_IMM;
+   c->src2.ptr = (unsigned long *)c->eip;
+   c->src2.bytes = 1;
+   c->src2.val = insn_fetch(u8, 1, c->eip);
+   break;
+   }
+
/* Decode and fetch the destination operand: register or memory. */
switch (c->d & DstMask) {
case ImplicitOps:
diff --git a/include/asm-x86/kvm_x86_emulate.h b/include/asm-x86/kvm_x86_emulate.h
index 4e8c1e4..00de896 100644
--- a/include/asm-x86/kvm_x86_emulate.h
+++ b/include/asm-x86/kvm_x86_emulate.h
@@ -123,6 +123,7 @@ struct decode_cache {
u8 ad_bytes;
u8 rex_prefix;
struct operand src;
+   struct operand src2;
struct operand dst;
bool has_seg_override;
u8 seg_override;


[PATCH 2/6] x86: Export some definition of MTRR

2008-10-09 Thread Sheng Yang
So that KVM can reuse the type definitions, which it needs to support shadow MTRR.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kernel/cpu/mtrr/generic.c |   12 +++-
 arch/x86/kernel/cpu/mtrr/mtrr.h|   17 -
 include/asm-x86/mtrr.h |   25 +
 3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b9574a6..aa414ab 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,14 +14,6 @@
 #include <asm/pat.h>
 #include "mtrr.h"
 
-struct mtrr_state_type {
-   struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
-   mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
-   unsigned char enabled;
-   unsigned char have_fixed;
-   mtrr_type def_type;
-};
-
 struct fixed_range_block {
int base_msr; /* start address of an MTRR block */
int ranges;   /* number of MTRRs in this block  */
@@ -35,10 +27,12 @@ static struct fixed_range_block fixed_range_blocks[] = {
 };
 
 static unsigned long smp_changes_mask;
-static struct mtrr_state_type mtrr_state = {};
 static int mtrr_state_set;
 u64 mtrr_tom2;
 
+struct mtrr_state_type mtrr_state = {};
+EXPORT_SYMBOL_GPL(mtrr_state);
+
 #undef MODULE_PARAM_PREFIX
 #define MODULE_PARAM_PREFIX "mtrr."
 
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 9885382..ffd6040 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -8,12 +8,6 @@
 #define MTRRcap_MSR 0x0fe
 #define MTRRdefType_MSR 0x2ff
 
-#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
-#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
-
-#define MTRR_NUM_FIXED_RANGES 88
-#define MTRR_MAX_VAR_RANGES 256
-
 #define MTRRfix64K_0_MSR 0x250
 #define MTRRfix16K_8_MSR 0x258
 #define MTRRfix16K_A_MSR 0x259
@@ -30,10 +24,6 @@
 #define MTRR_CHANGE_MASK_VARIABLE  0x02
 #define MTRR_CHANGE_MASK_DEFTYPE   0x04
 
-/* In the Intel processor's MTRR interface, the MTRR type is always held in
-   an 8 bit field: */
-typedef u8 mtrr_type;
-
 extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
@@ -71,13 +61,6 @@ struct set_mtrr_context {
u32 ccr3;
 };
 
-struct mtrr_var_range {
-   u32 base_lo;
-   u32 base_hi;
-   u32 mask_lo;
-   u32 mask_hi;
-};
-
 void set_mtrr_done(struct set_mtrr_context *ctxt);
 void set_mtrr_cache_disable(struct set_mtrr_context *ctxt);
 void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
diff --git a/include/asm-x86/mtrr.h b/include/asm-x86/mtrr.h
index a69a01a..2c8657b 100644
--- a/include/asm-x86/mtrr.h
+++ b/include/asm-x86/mtrr.h
@@ -57,6 +57,31 @@ struct mtrr_gentry {
 };
 #endif /* !__i386__ */
 
+struct mtrr_var_range {
+   u32 base_lo;
+   u32 base_hi;
+   u32 mask_lo;
+   u32 mask_hi;
+};
+
+/* In the Intel processor's MTRR interface, the MTRR type is always held in
+   an 8 bit field: */
+typedef u8 mtrr_type;
+
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
+struct mtrr_state_type {
+   struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+   mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
+   unsigned char enabled;
+   unsigned char have_fixed;
+   mtrr_type def_type;
+};
+
+#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
+#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
+
 /*  These are the various ioctls  */
 #define MTRRIOC_ADD_ENTRY        _IOW(MTRR_IOCTL_BASE,  0, struct mtrr_sentry)
 #define MTRRIOC_SET_ENTRY        _IOW(MTRR_IOCTL_BASE,  1, struct mtrr_sentry)
-- 
1.5.4.5



[PATCH 0/6] MTRR/PAT support for EPT (v3)

2008-10-09 Thread Sheng Yang
Hi, Avi

Here is the latest update of MTRR/PAT support.

Change from v2:
Dropped the use of the MSR bitmap, added MSR_IA32_CR_PAT to save/restore, and
rebased on the latest upstream.

Thanks!
--
regards
Yang, Sheng


[PATCH 6/6] Enable MTRR for EPT

2008-10-09 Thread Sheng Yang
The effective memory type under EPT is a combination of MSR_IA32_CR_PAT and
the memory type field of the EPT entry.
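
As a rough illustration of the "memory type field of the EPT entry", the
sketch below packs a 3-bit memory type into a 64-bit entry and reads it back.
It is a standalone model, not the KVM code: the helper names are made up, and
the shift of 3 (memory type in bits 5:3) is an assumption taken from the
Intel SDM's EPT entry layout rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_EPT_MT_SHIFT 3      /* assumption: memory type occupies bits 5:3 */
#define TOY_EPT_MT_MASK  (0x7ull << TOY_EPT_MT_SHIFT)

/* Place a 3-bit memory type into an EPT-style entry, replacing any old one. */
static uint64_t spte_set_memtype(uint64_t spte, unsigned int type)
{
	assert(type < 8);
	spte &= ~TOY_EPT_MT_MASK;
	return spte | ((uint64_t)type << TOY_EPT_MT_SHIFT);
}

/* Read the memory type back out of the entry. */
static unsigned int spte_get_memtype(uint64_t spte)
{
	return (unsigned int)((spte & TOY_EPT_MT_MASK) >> TOY_EPT_MT_SHIFT);
}
```

The patch itself takes the per-page type from get_memory_type(vcpu, gfn) and
the shift from a new kvm_x86_ops callback, so that SVM can return 0 and leave
the entry untouched.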

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/mmu.c |   11 ++-
 arch/x86/kvm/svm.c |6 ++
 arch/x86/kvm/vmx.c |   12 +---
 arch/x86/kvm/x86.c |2 +-
 include/asm-x86/kvm_host.h |3 ++-
 5 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f590142..79cb4a9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual 
exclusive with nx_mask */
 static u64 __read_mostly shadow_user_mask;
 static u64 __read_mostly shadow_accessed_mask;
 static u64 __read_mostly shadow_dirty_mask;
+static u64 __read_mostly shadow_mt_mask;
 
 void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
 {
@@ -183,13 +184,14 @@ void kvm_mmu_set_base_ptes(u64 base_pte)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-   u64 dirty_mask, u64 nx_mask, u64 x_mask)
+   u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask)
 {
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
shadow_dirty_mask = dirty_mask;
shadow_nx_mask = nx_mask;
shadow_x_mask = x_mask;
+   shadow_mt_mask = mt_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -1546,6 +1548,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 {
u64 spte;
int ret = 0;
+   u64 mt_mask = shadow_mt_mask;
+
/*
 * We don't set the accessed bit, since we sometimes want to see
 * whether the guest actually used the pte (in order to detect
@@ -1564,6 +1568,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
spte |= shadow_user_mask;
if (largepage)
spte |= PT_PAGE_SIZE_MASK;
+   if (mt_mask) {
+   mt_mask = get_memory_type(vcpu, gfn) <<
+ kvm_x86_ops->get_mt_mask_shift();
+   spte |= mt_mask;
+   }
 
spte |= (u64)pfn << PAGE_SHIFT;
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9c4ce65..05efc4e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1912,6 +1912,11 @@ static int get_npt_level(void)
 #endif
 }
 
+static int svm_get_mt_mask_shift(void)
+{
+   return 0;
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -1967,6 +1972,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
+   .get_mt_mask_shift = svm_get_mt_mask_shift,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 809427e..3d56554 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3572,6 +3572,11 @@ static int get_ept_level(void)
return VMX_EPT_DEFAULT_GAW + 1;
 }
 
+static int vmx_get_mt_mask_shift(void)
+{
+   return VMX_EPT_MT_EPTE_SHIFT;
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -3627,6 +3632,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,
+   .get_mt_mask_shift = vmx_get_mt_mask_shift,
 };
 
 static int __init vmx_init(void)
@@ -3682,10 +3688,10 @@ static int __init vmx_init(void)
if (vm_need_ept()) {
bypass_guest_pf = 0;
kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
-   VMX_EPT_WRITABLE_MASK |
-   VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
+   VMX_EPT_WRITABLE_MASK);
kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
-   VMX_EPT_EXECUTABLE_MASK);
+   VMX_EPT_EXECUTABLE_MASK,
+   VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
kvm_enable_tdp();
} else
kvm_disable_tdp();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index df98a1f..dda478e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2614,7 +2614,7 @@ int kvm_arch_init(void *opaque)
kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
-   PT_DIRTY_MASK, PT64_NX_MASK, 0);
+   PT_DIRTY_MASK, PT64_NX_MASK, 0, 0);
return 0;
 
 out:
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 1c25cb7..4b06ca8 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -480,6 +480,7 @@ struct kvm_x86_ops {
 
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int 

[PATCH 4/6] KVM: VMX: Add PAT support for EPT

2008-10-09 Thread Sheng Yang
GUEST_PAT support is a new feature introduced with the Intel Core i7
architecture. With it, the CPU saves and loads the guest and host PAT
automatically, since the EPT memory type in the guest depends on
MSR_IA32_CR_PAT.

Also add save/restore for MSR_IA32_CR_PAT.
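
For readers unfamiliar with the PAT MSR, here is a small standalone sketch of
the two operations the patch relies on: joining the two 32-bit halves that an
rdmsr-style read returns, and picking one 3-bit entry out of the 64-bit
value. The helper names are hypothetical and exist only for this example; the
layout (eight entries, one per byte, type in the low three bits) follows the
Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Join the two 32-bit halves returned by an rdmsr-style read. */
static uint64_t msr_join(uint32_t low, uint32_t high)
{
	return (uint64_t)low | ((uint64_t)high << 32);
}

/* Return PAT entry n (0..7); each entry sits in the low 3 bits of byte n. */
static unsigned int pat_entry(uint64_t pat, unsigned int n)
{
	assert(n < 8);
	return (unsigned int)((pat >> (8 * n)) & 0x7);
}
```

With the architectural power-on PAT value 0x0007040600070406, entry 0 decodes
to 6 (write-back) and entry 3 to 0 (uncacheable).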

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/vmx.c |   29 ++---
 arch/x86/kvm/vmx.h |7 +++
 arch/x86/kvm/x86.c |2 +-
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a2911cb..809427e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -962,6 +962,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 data)
pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", msr_index, data);
 
break;
+   case MSR_IA32_CR_PAT:
+   if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+   vmcs_write64(GUEST_IA32_PAT, data);
+   vcpu->arch.pat = data;
+   break;
+   }
+   /* Otherwise falls through to kvm_set_msr_common */
default:
vmx_load_host_state(vmx);
msr = find_msr_entry(vmx, msr_index);
@@ -1181,12 +1188,13 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
 #ifdef CONFIG_X86_64
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif
-   opt = 0;
+   opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
&_vmexit_control) < 0)
return -EIO;
 
-   min = opt = 0;
+   min = 0;
+   opt = VM_ENTRY_LOAD_IA32_PAT;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
&_vmentry_control) < 0)
return -EIO;
@@ -2092,8 +2100,9 @@ static void vmx_disable_intercept_for_msr(struct page 
*msr_bitmap, u32 msr)
  */
 static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 {
-   u32 host_sysenter_cs;
+   u32 host_sysenter_cs, msr_low, msr_high;
u32 junk;
+   u64 host_pat;
unsigned long a;
struct descriptor_table dt;
int i;
@@ -2181,6 +2190,20 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
rdmsrl(MSR_IA32_SYSENTER_EIP, a);
vmcs_writel(HOST_IA32_SYSENTER_EIP, a);   /* 22.2.3 */
 
+   if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) {
+   rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
+   host_pat = msr_low | ((u64) msr_high << 32);
+   vmcs_write64(HOST_IA32_PAT, host_pat);
+   }
+   if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+   rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
+   host_pat = msr_low | ((u64) msr_high << 32);
+   /* Write the default value follow host pat */
+   vmcs_write64(GUEST_IA32_PAT, host_pat);
+   /* Keep arch.pat sync with GUEST_IA32_PAT */
+   vmx->vcpu.arch.pat = host_pat;
+   }
+
for (i = 0; i < NR_VMX_MSR; ++i) {
u32 index = vmx_msr_index[i];
u32 data_low, data_high;
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 3e010d2..3ad61dc 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -63,10 +63,13 @@
 
 #define VM_EXIT_HOST_ADDR_SPACE_SIZE            0x00000200
 #define VM_EXIT_ACK_INTR_ON_EXIT                0x00008000
+#define VM_EXIT_SAVE_IA32_PAT                   0x00040000
+#define VM_EXIT_LOAD_IA32_PAT                   0x00080000
 
 #define VM_ENTRY_IA32E_MODE                     0x00000200
 #define VM_ENTRY_SMM                            0x00000400
 #define VM_ENTRY_DEACT_DUAL_MONITOR             0x00000800
+#define VM_ENTRY_LOAD_IA32_PAT                  0x00004000
 
 /* VMCS Encodings */
 enum vmcs_field {
@@ -112,6 +115,8 @@ enum vmcs_field {
VMCS_LINK_POINTER_HIGH  = 0x2801,
GUEST_IA32_DEBUGCTL = 0x2802,
GUEST_IA32_DEBUGCTL_HIGH= 0x2803,
+   GUEST_IA32_PAT  = 0x2804,
+   GUEST_IA32_PAT_HIGH = 0x2805,
GUEST_PDPTR0= 0x280a,
GUEST_PDPTR0_HIGH   = 0x280b,
GUEST_PDPTR1= 0x280c,
@@ -120,6 +125,8 @@ enum vmcs_field {
GUEST_PDPTR2_HIGH   = 0x280f,
GUEST_PDPTR3= 0x2810,
GUEST_PDPTR3_HIGH   = 0x2811,
+   HOST_IA32_PAT   = 0x2c00,
+   HOST_IA32_PAT_HIGH  = 0x2c01,
PIN_BASED_VM_EXEC_CONTROL   = 0x4000,
CPU_BASED_VM_EXEC_CONTROL   = 0x4002,
EXCEPTION_BITMAP= 0x4004,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b335129..df98a1f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -452,7 +452,7 @@ static u32 msrs_to_save[] = 

[PATCH 1/6] x86: Rename mtrr_state struct and macro names

2008-10-09 Thread Sheng Yang
Prepare for exporting them.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kernel/cpu/mtrr/generic.c |8 
 arch/x86/kernel/cpu/mtrr/main.c|4 ++--
 arch/x86/kernel/cpu/mtrr/mtrr.h|7 ---
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index cb7d3b6..b9574a6 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,9 +14,9 @@
 #include <asm/pat.h>
 #include "mtrr.h"
 
-struct mtrr_state {
-   struct mtrr_var_range var_ranges[MAX_VAR_RANGES];
-   mtrr_type fixed_ranges[NUM_FIXED_RANGES];
+struct mtrr_state_type {
+   struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+   mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
unsigned char enabled;
unsigned char have_fixed;
mtrr_type def_type;
@@ -35,7 +35,7 @@ static struct fixed_range_block fixed_range_blocks[] = {
 };
 
 static unsigned long smp_changes_mask;
-static struct mtrr_state mtrr_state = {};
+static struct mtrr_state_type mtrr_state = {};
 static int mtrr_state_set;
 u64 mtrr_tom2;
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 885c826..edadf7b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -49,7 +49,7 @@
 
 u32 num_var_ranges = 0;
 
-unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
@@ -574,7 +574,7 @@ struct mtrr_value {
unsigned long   lsize;
 };
 
-static struct mtrr_value mtrr_state[MAX_VAR_RANGES];
+static struct mtrr_value mtrr_state[MTRR_MAX_VAR_RANGES];
 
 static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
 {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2dc4ec6..9885382 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -11,8 +11,9 @@
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
-#define NUM_FIXED_RANGES 88
-#define MAX_VAR_RANGES 256
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
 #define MTRRfix64K_0_MSR 0x250
 #define MTRRfix16K_8_MSR 0x258
 #define MTRRfix16K_A_MSR 0x259
@@ -33,7 +34,7 @@
an 8 bit field: */
 typedef u8 mtrr_type;
 
-extern unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 
 struct mtrr_ops {
u32 vendor;
-- 
1.5.4.5



[PATCH 3/6] KVM: Improve MTRR structure

2008-10-09 Thread Sheng Yang
Also reset the MMU context when an MTRR is set.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 arch/x86/kvm/x86.c |   61 ++-
 include/asm-x86/kvm_host.h |5 +++-
 2 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2d3f06..b335129 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -39,6 +39,7 @@
 #include <asm/uaccess.h>
 #include <asm/msr.h>
 #include <asm/desc.h>
+#include <asm/mtrr.h>
 
 #define MAX_IO_MSRS 256
 #define CR0_RESERVED_BITS  \
@@ -650,10 +651,38 @@ static bool msr_mtrr_valid(unsigned msr)
 
 static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
+   u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
+
if (!msr_mtrr_valid(msr))
return 1;
 
-   vcpu->arch.mtrr[msr - 0x200] = data;
+   if (msr == MSR_MTRRdefType) {
+   vcpu->arch.mtrr_state.def_type = data;
+   vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10;
+   } else if (msr == MSR_MTRRfix64K_0)
+   p[0] = data;
+   else if (msr == MSR_MTRRfix16K_8 || msr == MSR_MTRRfix16K_A)
+   p[1 + msr - MSR_MTRRfix16K_8] = data;
+   else if (msr >= MSR_MTRRfix4K_C && msr <= MSR_MTRRfix4K_F8000)
+   p[3 + msr - MSR_MTRRfix4K_C] = data;
+   else if (msr == MSR_IA32_CR_PAT)
+   vcpu->arch.pat = data;
+   else {  /* Variable MTRRs */
+   int idx, is_mtrr_mask;
+   u64 *pt;
+
+   idx = (msr - 0x200) / 2;
+   is_mtrr_mask = msr - 0x200 - 2 * idx;
+   if (!is_mtrr_mask)
+   pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
+   else
+   pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
+   *pt = data;
+   }
+
+   kvm_mmu_reset_context(vcpu);
return 0;
 }
 
@@ -749,10 +778,37 @@ int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 
*pdata)
 
 static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 {
+   u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
+
if (!msr_mtrr_valid(msr))
return 1;
 
-   *pdata = vcpu->arch.mtrr[msr - 0x200];
+   if (msr == MSR_MTRRdefType)
+   *pdata = vcpu->arch.mtrr_state.def_type +
+(vcpu->arch.mtrr_state.enabled << 10);
+   else if (msr == MSR_MTRRfix64K_0)
+   *pdata = p[0];
+   else if (msr == MSR_MTRRfix16K_8 || msr == MSR_MTRRfix16K_A)
+   *pdata = p[1 + msr - MSR_MTRRfix16K_8];
+   else if (msr >= MSR_MTRRfix4K_C && msr <= MSR_MTRRfix4K_F8000)
+   *pdata = p[3 + msr - MSR_MTRRfix4K_C];
+   else if (msr == MSR_IA32_CR_PAT)
+   *pdata = vcpu->arch.pat;
+   else {  /* Variable MTRRs */
+   int idx, is_mtrr_mask;
+   u64 *pt;
+
+   idx = (msr - 0x200) / 2;
+   is_mtrr_mask = msr - 0x200 - 2 * idx;
+   if (!is_mtrr_mask)
+   pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
+   else
+   pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
+   *pdata = *pt;
+   }
+
return 0;
 }
 
@@ -3941,6 +3997,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
/* We do fxsave: this must be aligned. */
BUG_ON((unsigned long)&vcpu->arch.host_fx_image & 0xF);
 
+   vcpu-arch.mtrr_state.have_fixed = 1;
vcpu_load(vcpu);
r = kvm_arch_vcpu_reset(vcpu);
if (r == 0)
diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
index 4b5d1eb..1c25cb7 100644
--- a/include/asm-x86/kvm_host.h
+++ b/include/asm-x86/kvm_host.h
@@ -21,6 +21,7 @@
 
 #include <asm/pvclock-abi.h>
 #include <asm/desc.h>
+#include <asm/mtrr.h>
 
 #define KVM_MAX_VCPUS 16
 #define KVM_MEMORY_SLOTS 32
@@ -86,6 +87,7 @@
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
 #define KVM_MAX_CPUID_ENTRIES 40
+#define KVM_NR_FIXED_MTRR_REGION 88
 #define KVM_NR_VAR_MTRR 8
 
 extern spinlock_t kvm_lock;
@@ -329,7 +331,8 @@ struct kvm_vcpu_arch {
bool nmi_injected;
bool nmi_window_open;
 
-   u64 mtrr[0x100];
+   struct mtrr_state_type mtrr_state;
+   u32 pat;
 };
 
 struct kvm_mem_alias {
-- 
1.5.4.5
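
The MSR-to-slot arithmetic in set_msr_mtrr()/get_msr_mtrr() can be checked in
isolation with a sketch like the one below. It is a standalone model, not the
patch code: the TOY_* names and helper functions are hypothetical, and the MSR
numbers are the architectural ones from the Intel SDM. It assumes the
88-byte fixed_ranges array is viewed as eleven u64 slots, as the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR numbers (Intel SDM); names are local to this sketch. */
#define TOY_MTRR_PHYSBASE0   0x200u
#define TOY_MTRR_FIX64K      0x250u
#define TOY_MTRR_FIX16K_8    0x258u
#define TOY_MTRR_FIX16K_A    0x259u
#define TOY_MTRR_FIX4K_C     0x268u
#define TOY_MTRR_FIX4K_F8    0x26fu

/* Map a fixed-range MSR to its slot in an 11-entry u64 view, or -1. */
static int fixed_slot(uint32_t msr)
{
	if (msr == TOY_MTRR_FIX64K)
		return 0;
	if (msr == TOY_MTRR_FIX16K_8 || msr == TOY_MTRR_FIX16K_A)
		return 1 + (int)(msr - TOY_MTRR_FIX16K_8);
	if (msr >= TOY_MTRR_FIX4K_C && msr <= TOY_MTRR_FIX4K_F8)
		return 3 + (int)(msr - TOY_MTRR_FIX4K_C);
	return -1;
}

/* Split a variable-range MSR into (range index, base-or-mask). */
static void var_slot(uint32_t msr, int *idx, int *is_mask)
{
	assert(msr >= TOY_MTRR_PHYSBASE0);
	*idx = (int)((msr - TOY_MTRR_PHYSBASE0) / 2);
	*is_mask = (int)((msr - TOY_MTRR_PHYSBASE0) % 2);
}

/* Extract the two enable bits (FE = bit 10, E = bit 11) of MTRRdefType. */
static unsigned int deftype_enabled(uint64_t data)
{
	return (unsigned int)((data & 0xc00) >> 10);
}
```

For example, MSR 0x203 maps to variable range 1, mask half, and MSR 0x26f maps
to the last of the eleven fixed slots.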



Re: [PATCH] x86 emulator: Add Src2 decode set

2008-10-09 Thread Avi Kivity
Guillaume Thouvenin wrote:
> Instruction like shld has three operands, so we need to add a Src2
> decode set. We start with Src2None, Src2CL, and Src2Imm8 to support
> shld and we will expand it later.

Please add Src2One (implied '1') as well, so we can switch the existing
shift operators to Src2 later.

> Signed-off-by: Guillaume Thouvenin [EMAIL PROTECTED]
> ---
>  arch/x86/kvm/x86_emulate.c        |   47
>  include/asm-x86/kvm_x86_emulate.h |    1
>  2 files changed, 36 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
> index a391e21..c9ef2da 100644
> --- a/arch/x86/kvm/x86_emulate.c
> +++ b/arch/x86/kvm/x86_emulate.c
> @@ -59,16 +59,21 @@
>  #define SrcImm      (5<<4)   /* Immediate operand. */
>  #define SrcImmByte  (6<<4)   /* 8-bit sign-extended immediate operand. */
>  #define SrcMask     (7<<4)
> +/* Source 2 operand type */
> +#define Src2None    (0<<7)
> +#define Src2CL      (1<<7)
> +#define Src2Imm8    (2<<7)
> +#define Src2Mask    (7<<7)

Please allocate bits for this at the end to avoid renumbering.

> +   /*
> +    * Decode and fetch the second source operand: register, memory
> +    * or immediate.
> +    */
> +   switch (c->d & Src2Mask) {
> +   case Src2None:
> +       break;
> +   case Src2CL:
> +       c->src2.val = c->regs[VCPU_REGS_RCX];

Mask to a single byte; also set the operand length.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[PATCH 1/1] KVM: IRQ ACK notifier should be used with in-kernel irqchip

2008-10-09 Thread Sheng Yang
Also remove unnecessary parameter of unregister irq ack notifier.

Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 include/linux/kvm_host.h |3 +--
 virt/kvm/irq_comm.c  |8 ++--
 virt/kvm/kvm_main.c  |2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3833c48..41955ed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -313,8 +313,7 @@ void kvm_set_irq(struct kvm *kvm, int irq, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
   struct kvm_irq_ack_notifier *kian);
-void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
-struct kvm_irq_ack_notifier *kian);
+void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian);
 
 #ifdef CONFIG_DMAR
 int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index d0169f5..54b251d 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -50,11 +50,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
                                   struct kvm_irq_ack_notifier *kian)
 {
+   /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */
+   ASSERT(irqchip_in_kernel(kvm));
+   ASSERT(kian);
    hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
 }
 
-void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
-                                    struct kvm_irq_ack_notifier *kian)
+void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian)
 {
+   if (!kian)
+       return;
    hlist_del(&kian->link);
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cf0ab8e..d2ae1c9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -145,7 +145,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
    if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested)
        free_irq(assigned_dev->host_irq, (void *)assigned_dev);
 
-   kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
+   kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier);
 
    if (cancel_work_sync(&assigned_dev->interrupt_work))
        /* We had pending work. That means we will have to take
-- 
1.5.4.5



Re: [PATCH] x86 emulator: Add Src2 decode set

2008-10-09 Thread Avi Kivity
Avi Kivity wrote:
>>  #define SrcMask     (7<<4)
>> +/* Source 2 operand type */
>> +#define Src2None    (0<<7)
>> +#define Src2CL      (1<<7)
>> +#define Src2Imm8    (2<<7)

Src2ImmByte like SrcImmByte.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest

2008-10-09 Thread Avi Kivity
Han, Weidong wrote:
>> There is a missed optimization here.  Suppose we have two devices each
>> under a different iommu.  With the patch, each will be in a different
>> dmar_domain and so will have a different page table.  The amount of
>> memory used is doubled.
>
> You cannot let two devices each under a different iommu share one
> dmar_domain, because dmar_domain has a pointer to iommu.

I don't want them to share dmar_domains (these are implementation
details anyway), just io page tables.

kvm --- something (owns io page table) --- dmar_domain (uses shared io
page table) --- device

Even if we don't implement io page table sharing right away,
implementing the 'something' in the iommu api means we can later
implement sharing without changing the iommu/kvm interface.

> In fact, the exported APIs added for KVM VT-d also do
> create/map/attach/detach/free functions. Whereas these iommu APIs are
> more readable.

No; the existing iommu API talks about dmar domains and exposes the
existence of multiple iommus, so it is more complex.

> Because kvm VT-d usage is different with native usage, it's inevitable
> extend native VT-d code to support KVM VT-d (such as wrap dmar_domain).
> For devices under different iommus, they cannot share the same
> dmar_domain, thus they cannot share VT-d page table. If we want to
> handle this by iommu APIs, I suspect we need to change lots of native
> VT-d driver code.

As mentioned above, we can start with implementing the API without
actual sharing (basically, your patch, but as an addition to the API
rather than a change to kvm); we can add io pagetable sharing later.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 1/1] KVM: IRQ ACK notifier should be used with in-kernel irqchip

2008-10-09 Thread Avi Kivity
Sheng Yang wrote:
> Also remove unnecessary parameter of unregister irq ack notifier.
>
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index d0169f5..54b251d 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -50,11 +50,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
>                                    struct kvm_irq_ack_notifier *kian)
>  {
> +   /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */
> +   ASSERT(irqchip_in_kernel(kvm));
> +   ASSERT(kian);
>     hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
>  }

We don't want a BUG() here if the user specifies -no-kvm-irqchip; is
there a check on the irq assignment ioctls before calling this?
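
The concern above is about where the precondition gets enforced: a request that comes from userspace should fail with an error code before it can reach an assertion. A minimal standalone sketch of that split follows. The struct and function names are hypothetical stand-ins and do not come from the KVM sources.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct kvm; only the fields needed here. */
struct vm {
	bool irqchip_in_kernel;
	int nr_ack_notifiers;
};

/* Registration itself may assume the precondition already holds. */
static void register_ack_notifier(struct vm *vm)
{
	assert(vm->irqchip_in_kernel);
	vm->nr_ack_notifiers++;
}

/*
 * Hypothetical ioctl-level entry point: reject the request with an error
 * when there is no in-kernel irqchip, so the assertion above can never
 * fire on user input.
 */
static int assign_irq(struct vm *vm)
{
	if (vm == NULL || !vm->irqchip_in_kernel)
		return -EINVAL;
	register_ack_notifier(vm);
	return 0;
}
```

With this shape, a guest started without the in-kernel irqchip gets -EINVAL from the assignment call and no notifier is registered.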


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 6/6] Enable MTRR for EPT

2008-10-09 Thread Avi Kivity
Sheng Yang wrote:
> The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory
> type field of EPT entry.
>
> @@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask;   /* mutual exclusive with nx_mask */
>  static u64 __read_mostly shadow_user_mask;
>  static u64 __read_mostly shadow_accessed_mask;
>  static u64 __read_mostly shadow_dirty_mask;
> +static u64 __read_mostly shadow_mt_mask;

For shadow, the mt mask is different based on the level of the page
table, so we need an array here.  This can of course be left until
shadow pat is implemented.

> +   if (mt_mask) {
> +       mt_mask = get_memory_type(vcpu, gfn) <<
> +                 kvm_x86_ops->get_mt_mask_shift();
> +       spte |= mt_mask;
> +   }
   

For shadow, it's not a simple shift, since for large pages one of the
bits is at position 12.  So we would need the callback to calculate the
mask value.

Perhaps even simpler, have a 4x8 array, with the first index the page
table level and the second index the memory type.  The initialization
code can prepare the array like it prepares the other masks.

This can wait until we have a shadow pat implementation.
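
The 4x8 table suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the second index is treated as a 3-bit PAT index (PAT:PCD:PWT), level 1 is a 4K leaf entry, and every higher level is treated as a large-page leaf entry whose PAT bit sits at position 12. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PT_LEVELS    4
#define PAT_INDEXES  8

#define PTE_PWT        (1ull << 3)
#define PTE_PCD        (1ull << 4)
#define PTE_PAT_4K     (1ull << 7)   /* PAT bit in a 4K leaf entry */
#define PTE_PAT_LARGE  (1ull << 12)  /* PAT bit in a large-page leaf entry */

static uint64_t mt_mask[PT_LEVELS][PAT_INDEXES];

/* Fill the table once, the way the other shadow masks are set up at init. */
static void init_mt_masks(void)
{
	for (int level = 1; level <= PT_LEVELS; level++) {
		uint64_t pat_bit = (level == 1) ? PTE_PAT_4K : PTE_PAT_LARGE;

		for (int idx = 0; idx < PAT_INDEXES; idx++) {
			uint64_t mask = 0;

			if (idx & 1)
				mask |= PTE_PWT;
			if (idx & 2)
				mask |= PTE_PCD;
			if (idx & 4)
				mask |= pat_bit;
			mt_mask[level - 1][idx] = mask;
		}
	}
}

/* Lookup replaces the shift: level is 1-based, idx is a 3-bit PAT index. */
static uint64_t get_mt_mask(int level, int idx)
{
	assert(level >= 1 && level <= PT_LEVELS);
	assert(idx >= 0 && idx < PAT_INDEXES);
	return mt_mask[level - 1][idx];
}
```

The lookup then costs one array access in the spte path, and the per-level difference in bit positions is confined to the initialization code.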

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 1/1] KVM: IRQ ACK notifier should be used with in-kernel irqchip

2008-10-09 Thread Sheng Yang
On Thursday 09 October 2008 16:34:47 Avi Kivity wrote:
> Sheng Yang wrote:
>> Also remove unnecessary parameter of unregister irq ack notifier.
>>
>> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
>> index d0169f5..54b251d 100644
>> --- a/virt/kvm/irq_comm.c
>> +++ b/virt/kvm/irq_comm.c
>> @@ -50,11 +50,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
>>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
>>                                    struct kvm_irq_ack_notifier *kian)
>>  {
>> +   /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */
>> +   ASSERT(irqchip_in_kernel(kvm));
>> +   ASSERT(kian);
>>     hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
>>  }
>
> We don't want a BUG() here if the user specifies -no-kvm-irqchip; is
> there a check on the irq assignment ioctls before calling this?

Yes. kvm_register_irq_ack_notifier should only be called when
irqchip_in_kernel() is true (conversely, the ack notifier is only useful
with an in-kernel irqchip, so we shouldn't call it without one). I can't
see how this would be useful with a userspace irqchip, so I added an
ASSERT here.

--
regards
Yang, Sheng


Re: [PATCH] x86 emulator: Add a Src2 decode set and SrcOne operand type

2008-10-09 Thread Guillaume Thouvenin
 Instructions like shld have three operands, so we need to add a Src2
decode set. We start with Src2None, Src2CL, and Src2ImmByte to support
shld and will expand it later. The Src2 operand types are placed at
the end of the set to avoid renumbering. For Src2CL we mask to a single
byte and set the operand length.

 This patch also adds a SrcOne operand type for decoding an implied '1',
as with the regular shift instructions.

 If needed I can split this patch into two patches, one for the Src2
decode set and another for SrcOne.


Signed-off-by: Guillaume Thouvenin [EMAIL PROTECTED]
---
 arch/x86/kvm/x86_emulate.c|   37 +
 include/asm-x86/kvm_x86_emulate.h |1 +
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index a391e21..b5d7bc8 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -58,6 +58,7 @@
 #define SrcMem32    (4<<4) /* Memory operand (32-bit). */
 #define SrcImm      (5<<4) /* Immediate operand. */
 #define SrcImmByte  (6<<4) /* 8-bit sign-extended immediate operand. */
+#define SrcOne      (7<<4) /* Implied '1' */
 #define SrcMask     (7<<4)
 /* Generic ModRM decode. */
 #define ModRM       (1<<7)
@@ -70,13 +71,18 @@
 #define Group       (1<<14) /* Bits 3:5 of modrm byte extend opcode */
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff    /* Group number stored in bits 0:7 */
+/* Source 2 operand type */
+#define Src2None    (0<<29)
+#define Src2CL      (1<<29)
+#define Src2ImmByte (2<<29)
+#define Src2Mask    (7<<29)
 
 enum {
Group1_80, Group1_81, Group1_82, Group1_83,
Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
 };
 
-static u16 opcode_table[256] = {
+static u32 opcode_table[256] = {
/* 0x00 - 0x07 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
@@ -195,7 +201,7 @@ static u16 opcode_table[256] = {
ImplicitOps, ImplicitOps, Group | Group4, Group | Group5,
 };
 
-static u16 twobyte_table[256] = {
+static u32 twobyte_table[256] = {
/* 0x00 - 0x0F */
0, Group | GroupDual | Group7, 0, 0, 0, 0, ImplicitOps, 0,
ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
@@ -253,7 +259,7 @@ static u16 twobyte_table[256] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
 };
 
-static u16 group_table[] = {
+static u32 group_table[] = {
[Group1_80*8] =
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
@@ -297,7 +303,7 @@ static u16 group_table[] = {
SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
 };
 
-static u16 group2_table[] = {
+static u32 group2_table[] = {
[Group7*8] =
SrcNone | ModRM, 0, 0, 0,
SrcNone | ModRM | DstMem | Mov, 0,
@@ -1041,6 +1047,29 @@ done_prefixes:
        c->src.bytes = 1;
        c->src.val = insn_fetch(s8, 1, c->eip);
        break;
+   case SrcOne:
+       c->src.bytes = 1;
+       c->src.val = 1;
+       break;
+   }
+
+   /*
+    * Decode and fetch the second source operand: register, memory
+    * or immediate.
+    */
+   switch (c->d & Src2Mask) {
+   case Src2None:
+       break;
+   case Src2CL:
+       c->src2.bytes = 1;
+       c->src2.val = c->regs[VCPU_REGS_RCX] & 0x8;
+       break;
+   case Src2ImmByte:
+       c->src2.type = OP_IMM;
+       c->src2.ptr = (unsigned long *)c->eip;
+       c->src2.bytes = 1;
+       c->src2.val = insn_fetch(u8, 1, c->eip);
+       break;
    }
 
/* Decode and fetch the destination operand: register or memory. */
diff --git a/include/asm-x86/kvm_x86_emulate.h b/include/asm-x86/kvm_x86_emulate.h
index 4e8c1e4..00de896 100644
--- a/include/asm-x86/kvm_x86_emulate.h
+++ b/include/asm-x86/kvm_x86_emulate.h
@@ -123,6 +123,7 @@ struct decode_cache {
u8 ad_bytes;
u8 rex_prefix;
struct operand src;
+   struct operand src2;
struct operand dst;
bool has_seg_override;
u8 seg_override;
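
The Src2 field in this patch lives in bits 29-31 of the decode flags, which is why the opcode tables move from u16 to u32. A minimal standalone sketch of the field extraction and of taking CL out of RCX follows; the macro and helper names are hypothetical. Note that the patch as posted masks RCX with 0x8, which keeps only bit 3. A single-byte mask is 0xff, and that is what the sketch uses.

```c
#include <assert.h>
#include <stdint.h>

/* Src2 operand type, kept in the top three bits of a 32-bit flags word. */
#define SRC2_SHIFT   29
#define SRC2_NONE    (0u << SRC2_SHIFT)
#define SRC2_CL      (1u << SRC2_SHIFT)
#define SRC2_IMMBYTE (2u << SRC2_SHIFT)
#define SRC2_MASK    (7u << SRC2_SHIFT)

/* The field does not fit in 16 bits, hence the wider table entries. */
static inline void src2_check_width(void)
{
	assert(SRC2_MASK > UINT16_MAX);
}

/* Hypothetical helper: isolate the Src2 field from a decode flags word. */
static inline uint32_t src2_type(uint32_t flags)
{
	return flags & SRC2_MASK;
}

/* Hypothetical helper: CL is the low byte of RCX, so mask with 0xff. */
static inline uint8_t src2_cl_value(uint64_t rcx)
{
	return (uint8_t)(rcx & 0xff);
}
```

With a shift count of 16 in CL, for instance, masking with 0x8 would yield 0 while masking with 0xff yields 16.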


Re: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest

2008-10-09 Thread Avi Kivity
Han, Weidong wrote:
>> I don't want them to share dmar_domains (these are implementation
>> details anyway), just io page tables.
>>
>> kvm --- something (owns io page table) --- dmar_domain (uses shared
>> io page table) --- device
>
> Let dmar_domains share io page table is not allowed. VT-d spec allows
> one domain corresponds to one page table, vice versa.

Since the io pagetables are read only for the iommu (right?), I don't
see what prevents several iommus from accessing the same pagetable.
It's just a bunch of memory.

> If we want
> something owns the io page table, which shared by all assigned devices
> to one guest, we need to redefine dmar_domain which covers all devices
> assigned to a guest. Then we need to rewrite most of native VT-d code
> for kvm. Xen doesn't use dmar_domain, instead it implements something
> as a domain structure (with domain id) to own page table.

I imagine Xen shares the io pagetables with the EPT pagetables as
well.  So io pagetable sharing is allowed.

> One guest has
> only one something instance, thus has only one page table. It looks
> like: xen --- something (owns io page table) --- device. But, in KVM
> side, I think we can reuse native VT-d code, needn't to duplicate
> another VT-d code.

I agree that at this stage, we don't want to do optimization, we need
something working first.  But let's at least ensure the API allows the
optimization later on (and also, that iommu implementation details are
hidden from kvm).

What I'm proposing is moving the list of kvm_vtd_domains inside the
iommu API.  The only missing piece is populating a new dmar_domain when
a new device is added.  We already have intel_iommu_iova_to_pfn(), we
need to add a way to read the protection bits and the highest mapped
iova (oh, and intel_iommu_iova_to_pfn() has a bug: it shifts right
instead of left).

Later we can make the something (that already contains the list) also
own the io page table; and non-kvm users can still use the same code
(the list will always be of length 1 for these users).
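
The layering proposed above can be sketched as a small data structure: one object that owns the io page table and keeps a list of per-iommu domains that refer to it. This is only an illustration of the proposal, with hypothetical names throughout. Whether the hardware actually permits several domains to share one page table is exactly what this thread is debating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct io_pgtable { int unused; };      /* stand-in for the real io page table */

/* Stand-in for a dmar_domain: tied to one iommu, refers to a page table. */
struct hw_domain {
	int iommu_id;
	struct io_pgtable *pgtable;     /* shared, not owned */
	struct hw_domain *next;
};

/* The 'something': owns the page table and the list of per-iommu domains. */
struct io_space {
	struct io_pgtable pgtable;
	struct hw_domain *domains;
	size_t nr_domains;
};

/* Return the domain serving iommu_id, creating it on first use. */
static struct hw_domain *io_space_attach(struct io_space *s, int iommu_id)
{
	struct hw_domain *d;

	assert(s != NULL);
	for (d = s->domains; d != NULL; d = d->next)
		if (d->iommu_id == iommu_id)
			return d;

	d = malloc(sizeof(*d));
	if (d == NULL)
		return NULL;
	d->iommu_id = iommu_id;
	d->pgtable = &s->pgtable;
	d->next = s->domains;
	s->domains = d;
	s->nr_domains++;
	return d;
}

/* Free every per-iommu domain; the page table lives inside io_space. */
static void io_space_release(struct io_space *s)
{
	while (s->domains != NULL) {
		struct hw_domain *d = s->domains;

		s->domains = d->next;
		free(d);
	}
	s->nr_domains = 0;
}
```

A caller that only ever attaches devices behind one iommu ends up with a list of length 1, which matches the non-kvm case described above.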

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Call for help: moving the kvm wiki

2008-10-09 Thread Avi Kivity
As you may have noticed, the kvm wiki is overrun by spammers.  In the
past I regularly cleaned up the spam, but some time ago I gave up.

So I'm looking for a volunteer to locate a spam-free public wiki host
(candidates include wiki.kernel.org and fedorahosted.org) and transfer
the contents (minus the spam).  I don't think we need to transfer the
editing history, but the conversion should adapt to the target's wiki
syntax.

My requirements for the wiki are:
- hosted by a public provider (not a private system)
- ad free
- open for editing by users (requiring an account is fine; but there
should be no need for approval and no immutable pages)
- customization of the theme a bonus

This is a great way for non-coders to contribute to kvm development; the
wiki is a useful tool and it's a pity to let the spammers take it over.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Broken userspace module Makefile

2008-10-09 Thread Sheng Yang
Hi, Avi

After Xiantao's irq_common patches were checked in, we found that it's
impossible to compile with VT-d userspace now. Essentially, the problem is
that the Makefile has been missing a $ since the unifdef patch was checked
in half a year ago...

But after fixing it, I found it's still impossible to get unifdef to run
correctly...

First, unifdef reports an error when processing include/linux/kvm.h, but I
can't figure out what's wrong yet.

Second, it seems that at least my unifdef can't deal with

#if defined(CONFIG_X86) || defined(CONFIG_IA64)

My unifdef version is 1.0(20030701), the latest from debian testing. I also
tried the one from fc9, with the same result.

What do you think?
--
regards
Yang,Sheng

--
From: Sheng Yang [EMAIL PROTECTED]
Date: Thu, 9 Oct 2008 20:45:02 +0800
Subject: [PATCH 1/1] kvm: Fix broken Makefile of kernel module


Signed-off-by: Sheng Yang [EMAIL PROTECTED]
---
 kernel/Makefile |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index f2a71fa..e352f77 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -65,7 +65,7 @@ header-sync:
 $(LINUX)/arch/$(ARCH_DIR)/include/asm/./kvm*.h \
  $T/include/asm-$(ARCH_DIR)/
 
-   set -e && for i in $(find $T -name '*.h'); do \
+   set -e && for i in $$(find $T -name '*.h'); do \
        $(call unifdef,$$i); done
    $(call hack, include/linux/kvm.h)
    set -e && for i in $$(find $T -type f -printf '%P '); \
@@ -79,7 +79,7 @@ source-sync:
 $(LINUX)/virt/kvm/./*.[cSh] \
 $T/
 
-   set -e && for i in $(find $T -name '*.c'); do \
+   set -e && for i in $$(find $T -name '*.c'); do \
        $(call unifdef,$$i); done
 
for i in $(hack-files); \
-- 
1.5.3




RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest

2008-10-09 Thread Han, Weidong
Avi Kivity wrote:
> Han, Weidong wrote:
>>> I don't want them to share dmar_domains (these are implementation
>>> details anyway), just io page tables.
>>>
>>> kvm --- something (owns io page table) --- dmar_domain (uses
>>> shared io page table) --- device
>>
>> Let dmar_domains share io page table is not allowed. VT-d spec allows
>> one domain corresponds to one page table, vice versa.
>
> Since the io pagetables are read only for the iommu (right?), I don't
> see what prevents several iommus from accessing the same pagetable.
> It's just a bunch of memory.

I think the reason is that hardware may use the domain identifier to tag
its internal caches.

>> If we want
>> something owns the io page table, which shared by all assigned
>> devices to one guest, we need to redefine dmar_domain which covers
>> all devices assigned to a guest. Then we need to rewrite most of
>> native VT-d code for kvm. Xen doesn't use dmar_domain, instead it
>> implements something as a domain structure (with domain id) to own
>> page table.
>
> I imagine Xen shares the io pagetables with the EPT pagetables as
> well.  So io pagetable sharing is allowed.

In Xen, the VT-d page table is not shared with the EPT pagetable or the
P2M pagetable. But they can be shared if the format is the same.

>> One guest has
>> only one something instance, thus has only one page table. It looks
>> like: xen --- something (owns io page table) --- device. But, in
>> KVM side, I think we can reuse native VT-d code, needn't to
>> duplicate another VT-d code.
>
> I agree that at this stage, we don't want to do optimization, we need
> something working first.  But let's at least ensure the API allows the
> optimization later on (and also, that iommu implementation details are
> hidden from kvm).
>
> What I'm proposing is moving the list of kvm_vtd_domains inside the
> iommu API.  The only missing piece is populating a new dmar_domain
> when a new device is added.  We already have

I will move kvm_vtd_domain inside the iommu API, and also hide
get_kvm_vtd_domain() and release_kvm_vtd_domain() implementation details
from kvm.

> intel_iommu_iova_to_pfn(), we need to add a way to read the
> protection bits and the highest mapped iova (oh, and
> intel_iommu_iova_to_pfn() has a bug: it shifts right instead of left).
 

Why do we need the protection bits and the highest mapped iova? 

Shifting right instead of left in intel_iommu_iova_to_pfn() is not a
bug, because it returns pfn, not address.
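
The direction of the shift depends only on which way the conversion goes. A minimal sketch, assuming 4K pages; the helper names are hypothetical and this is not the intel_iommu_iova_to_pfn() code itself:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12

/* Address -> page frame number: drop the offset bits (shift right). */
static inline uint64_t addr_to_pfn(uint64_t addr)
{
	return addr >> PAGE_SHIFT_4K;
}

/* Page frame number -> address of the page start (shift left). */
static inline uint64_t pfn_to_addr(uint64_t pfn)
{
	return pfn << PAGE_SHIFT_4K;
}
```

So a function that returns a pfn for a given address shifts right, while one that returns an address for a given pfn would shift left.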

Regards,
Weidong

> Later we can make the something (that already contains the list)
> also own the io page table; and non-kvm users can still use the same
> code (the list will always be of length 1 for these users).



Re: [PATCH 6/6] Enable MTRR for EPT

2008-10-09 Thread Sheng Yang
On Thursday 09 October 2008 16:44:19 Avi Kivity wrote:
> Sheng Yang wrote:
>> The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and
>> memory type field of EPT entry.
>>
>> @@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
>>  static u64 __read_mostly shadow_user_mask;
>>  static u64 __read_mostly shadow_accessed_mask;
>>  static u64 __read_mostly shadow_dirty_mask;
>> +static u64 __read_mostly shadow_mt_mask;
>
> For shadow, the mt mask is different based on the level of the page
> table, so we need an array here.  This can of course be left until
> shadow pat is implemented.
>
>> +   if (mt_mask) {
>> +       mt_mask = get_memory_type(vcpu, gfn) <<
>> +                 kvm_x86_ops->get_mt_mask_shift();
>> +       spte |= mt_mask;
>> +   }
>
> For shadow, it's not a simple shift, since for large pages one of the
> bits is at position 12.  So we would need the callback to calculate the
> mask value.
>
> Perhaps even simpler, have a 4x8 array, with the first index the page
> table level and the second index the memory type.  The initialization
> code can prepare the array like it prepares the other masks.
>
> This can wait until we have a shadow pat implementation.

Yes, of course. Now this mask is just used by EPT, so I do it like this. Later 
shadow mtrr/pat would solve this as well. :)

-- 
regards
Yang, Sheng


Re: [PATCH 0/6] MTRR/PAT support for EPT (v3)

2008-10-09 Thread Avi Kivity
Sheng Yang wrote:
> On Thursday 09 October 2008 17:03:24 Avi Kivity wrote:
>> Sheng Yang wrote:
>>> Hi, Avi
>>>
>>> Here is the latest update of MTRR/PAT support.
>>>
>>> Change from v2:
>>> Discard the using of MSR bitmap, add MSR_IA32_CR_PAT to save/restore, as
>>> well as rebase on latest upstream.
>>
>> Applied all; my comments about shadow can be addressed later.
>>
>> There is also the danger of the guest setting the wrong MTRR type for
>> RAM, thus introducing incompatible memory types (between qemu and the
>> guest).  If this is a problem, we should ignore the guest's mtrr (and
>> pat) for RAM and use write-back instead.
>
> Do you mean host(qemu) would access this memory and if we set it to guest
> MTRR, host access would be broken? We would cover this in our shadow MTRR
> patch, for we encountered this in video ram when doing some experiment with
> VGA assignment.

No, I think that the cpu requires that all accesses to a page be done
using the same memory type.  We are allowing the guest to break that,
since qemu mappings will use writeback and guest mapping will use guest
specified memory types.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: Compile faillure with 2.6.27-rc9-git1

2008-10-09 Thread Sheng Yang
On Wed, Oct 08, 2008 at 05:55:47PM +0200, Xavier Gnata wrote:
> Hi,
>
> I'm trying to compile kvm-76 on a box running 2.6.27-rc9-git1 (yeah
> ok...rc+git...).
> I get this error:
> In file included from /usr/local/src/kvm-76/kernel/x86/svm.c:16:
> /usr/local/src/kvm-76/kernel/include/linux/kvm_host.h:128: error: field
> 'mmu_notifier' has incomplete type
>
> Sorry for the noise if it is a non relevant or a well know issue.

It seems you are missing something in .config, probably CONFIG_KVM or
CONFIG_VIRTUALIZATION. Did you enable KVM when you built the
kernel? You can run make menuconfig and search for MMU_NOTIFIER to find
which config options it depends on. In most cases, simply enabling KVM
support when compiling the kernel is enough.

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6] MTRR/PAT support for EPT (v3)

2008-10-09 Thread Sheng Yang
On Thursday 09 October 2008 17:03:24 Avi Kivity wrote:
> Sheng Yang wrote:
>> Hi, Avi
>>
>> Here is the latest update of MTRR/PAT support.
>>
>> Change from v2:
>> Discard the using of MSR bitmap, add MSR_IA32_CR_PAT to save/restore, as
>> well as rebase on latest upstream.
>
> Applied all; my comments about shadow can be addressed later.
>
> There is also the danger of the guest setting the wrong MTRR type for
> RAM, thus introducing incompatible memory types (between qemu and the
> guest).  If this is a problem, we should ignore the guest's mtrr (and
> pat) for RAM and use write-back instead.

Do you mean that the host (qemu) would access this memory, and that if we
apply the guest MTRR to it, host access would break? We will cover this in
our shadow MTRR patch, since we ran into it with video RAM while
experimenting with VGA assignment.

--
regards
Yang, Sheng


Re: [ Re: unhandled vm exit: 0x80000021 vcpu_id 0]

2008-10-09 Thread Sheng Yang
On Wed, Oct 8, 2008 at 7:16 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> Hi Yang,
> I often hibernate my Linux, so may be that the loadmodule message is
> missing in the dmesg because it is too old.
>
> I have rebooted the system and I attach a clean dmesg.

Yeah, now I can see the load info of kvm-76.

> What means "Windows always trig a apic write error before Jan's patch
> make them slience"? which Windows?

At least Windows XP likes to do this; upstream, Jan's patch now cleans it up.

> However, when I try ro run qemu/kvm using the winxp image, no error
> happens in the dmesg.  I can see the error as output of the qemu/kvm
> command.

It's indeed hard to debug with such limited info... I still suggest you
file a bug first.
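
For reference, the exit code in the subject line, 0x80000021, can be split using the VMX exit-reason layout: bit 31 flags a VM-entry failure and the low 16 bits hold the basic exit reason. Here the basic reason is 33, which the Intel SDM lists as a VM-entry failure due to invalid guest state. A small sketch of that decoding, with hypothetical macro and helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_ENTRY_FAILURE (1u << 31)  /* VM-entry failure flag */
#define EXIT_REASON_BASIC_MASK    0xffffu      /* basic exit reason */

/* True when the exit code reports a failed VM entry. */
static inline bool exit_is_entry_failure(uint32_t reason)
{
	return (reason & EXIT_REASON_ENTRY_FAILURE) != 0;
}

/* Basic exit reason from the low 16 bits of the exit code. */
static inline uint32_t exit_basic_reason(uint32_t reason)
{
	return reason & EXIT_REASON_BASIC_MASK;
}
```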

And if you have time, please try the attached patch and update info.

--
regards
Yang, Sheng

 Reagrds,
  Pier Luigi


  Original Message 
 Subject:    Re: unhandled vm exit: 0x80000021 vcpu_id 0
 Date:   Fri, 3 Oct 2008 08:57:31 +0800
 From:   Sheng Yang [EMAIL PROTECTED]
 To: [EMAIL PROTECTED] [EMAIL PROTECTED]
 CC: [EMAIL PROTECTED], kvm@vger.kernel.org
 References: [EMAIL PROTECTED]



 On Fri, Oct 03, 2008 at 12:16:20AM +0200, [EMAIL PROTECTED] wrote:

 Hi,
 I understand the particularity (checkpoint) of this case.

 Hi Pier

 Thanks for your understanding. :)

 Any way, in the attachment the dmesg log and the output of the
 dmesg
 command.

 But it's strange that I almost can't see anything correlated with kvm
 in the
 log. If you built kvm as a modules(I suppose you did it because you
 tried
 many versions), at least something like load kvm module xxx should
 appear(and Windows always trig a apic write error before Jan's patch
 make
 them slience).

 Is this the dmesg when the error was happening?

 --
 regards
 Yang, Sheng


 thanks for your helpfulness.

 Regards.

 Sheng Yang wrote:
  On Mon, Sep 29, 2008 at 6:18 PM, [EMAIL PROTECTED] p.
 [EMAIL PROTECTED]
 it wrote:
 
  Hi,
  I have successfully installed windows XP SP2 on kvm. After the
  installation I have launched the setup of  Checkpoint -
 Pointsec
 for
  the entire disk encryption.
 
 
  Hi Pier
 
  Can you issue a bug for this? But sadly Checkpoint is a
 commercial
  software, we may not deal with it directly and immediately.
 
 
  The first step of installation was run successfully, but when the
  system reboots and Pointsec loads the initial code, the
 following
  error happens:
 

 ==
  unhandled vm exit: 0x80000021 vcpu_id 0
  rax 0007 rbx 1490 rcx 
 rdx
  19a0
  rsi  rdi  rsp 0080
 rbp
  96bf
  r8   r9   r10 
 r11
  
  r12  r13  r14 
 r15
  
  rip 002a rflags 00023202
  cs 14a2 (/ p 0 dpl 0 db 0 s 0 type 9 l 0 g 0 avl
 0)
  ds 19a0 (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl
 0)
  es 1a31 (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl
 0)
  ss 1a29 (/ p 0 dpl 0 db 0 s 0 type 1 l 0 g 0 avl
 0)
  fs  (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl
 0)
  gs  (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl
 0)
  tr 0058 (00201ffa/ p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl
 0)
  ldt  (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0
 avl
 0)
  gdt 20/1dd8
  idt 201df0/188
  cr0 8019 cr2 0 cr3 144 cr4 0 cr8 0 efer 0
 
 
  What's this... CR0.PE clear, CR0.PG set... And segment register
 also
  strange. May be some real emulation wrong...
 
 
  Aborted
 

 ==
  I am able to boot this system (image) using qemu (with kqemu
 enabled
  for user code), but not using kvm.
  I have also tried with the options: -no-kvm-irqchip -no-kvm-pit -
 no-
  acpi without success. Only the -no-kvm option works.
  I have tried these kvm releases: from 65 to 76; and these kernel
  (vanilla) releases: from 2.6.23.1 to 2.6.26.5.
 
 
  Thanks for your patient...
 
  My computer is a Dell D630 equipped with an Intel(R) Core(TM)2 Duo CPU
  T7300 @ 2.00GHz.
  The host Linux distributions used are Fedora 8/9 for i386 and Fedora
  9 for x86_64.
 
 
  Can you show dmesg as well? That also helps.
 
 











-- 
regards,
Yang, Sheng

RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest

2008-10-09 Thread Han, Weidong
Avi Kivity wrote:
 Han, Weidong wrote:
 
 There is a missed optimization here.  Suppose we have two devices
 each under a different iommu.  With the patch, each will be in a
 different dmar_domain and so will have a different page table.  The
 amount of memory used is doubled. 
 
 
 You cannot let two devices that are each under a different iommu share one
 dmar_domain, because dmar_domain has a pointer to the iommu.
 
 
 
 I don't want them to share dmar_domains (these are implementation
 details anyway), just io page tables.
 
 
 kvm --- something (owns io page table) --- dmar_domain (uses shared
 io page table) --- device
 

Letting dmar_domains share an io page table is not allowed. The VT-d spec
ties one domain to one page table, and vice versa. If we want
something that owns the io page table, shared by all devices assigned
to one guest, we need to redefine dmar_domain so that it covers all devices
assigned to a guest. Then we would need to rewrite most of the native VT-d
code for kvm. Xen doesn't use dmar_domain; instead it implements something
like a domain structure (with a domain id) to own the page table. One guest
has only one instance of this something, and thus only one page table. It
looks like: xen --- something (owns io page table) --- device. But on the
KVM side, I think we can reuse the native VT-d code and needn't duplicate
it.

Regards,
Weidong

 Even if we don't implement io page table sharing right away,
 implementing the 'something' in the iommu api means we can later
 implement sharing without changing the iommu/kvm interface.
 
 In fact, the exported APIs added for KVM VT-d also do
 create/map/attach/detach/free functions. Whereas these iommu APIs
 are more readable. 
 
 
 
 
 No; the existing iommu API talks about dmar domains and exposes the
 existence of multiple iommus, so it is more complex.
 
 Because kvm VT-d usage is different from native usage, it's
 inevitable that we extend the native VT-d code to support KVM VT-d (such
 as wrapping dmar_domain). Devices under different iommus cannot share
 the same dmar_domain, thus they cannot share a VT-d page table. If we
 want to handle this in the iommu APIs, I suspect we need to change lots
 of native VT-d driver code. 
 
 
 As mentioned above, we can start with implementing the API without
 actual sharing (basically, your patch, but as an addition to the API
 rather than a change to kvm); we can add io pagetable sharing later.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86 emulator: Add Src2 decode set

2008-10-09 Thread Guillaume Thouvenin
On Thu, 09 Oct 2008 11:06:50 +0200
Avi Kivity [EMAIL PROTECTED] wrote:

 
 The regular shift instructions (shl, rcl, etc) come in three varieties:
 shift by 1, shift by imm8, and shift by CL.  Right now they use
 SrcImmByte and decode the implied '1' and CL by hand.  If we change them
 to use Src2, they can reuse the Src2CL and Src2One support that you are
 adding now.

OK, I see, but shld, shrd, etc. come in only two varieties: immediate and
CL. So maybe it would be better to replace SrcImplicit (which is not
really useful) with SrcOne? Then we would have
 
    ...
    case SrcOne:
        c->src.val = 1;
        break;
    ...

and we can reuse Src2CL as well.


Re: [PATCH] x86 emulator: Add Src2 decode set

2008-10-09 Thread Avi Kivity
Guillaume Thouvenin wrote:
 I will add Src2One but I don't understand exactly what you mean by
 switching shift operators to Src2 later. I also applied other remarks,
 thanks for your help. The patch follows.
   

The regular shift instructions (shl, rcl, etc) come in three varieties:
shift by 1, shift by imm8, and shift by CL.  Right now they use
SrcImmByte and decode the implied '1' and CL by hand.  If we change them
to use Src2, they can reuse the Src2CL and Src2One support that you are
adding now.
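
A minimal sketch of that idea, assuming nothing about the emulator's real
data structures: the enum values and the two helpers below are hypothetical
names chosen only to mirror Src2None, Src2CL, Src2ImmByte and Src2One. The
point is that one decode step yields the shift count, so the instruction
body no longer special-cases the implied '1' or CL by hand.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Src2 decode flags discussed above. */
enum src2_kind { SRC2_NONE, SRC2_CL, SRC2_IMM_BYTE, SRC2_ONE };

/*
 * Return the value of the second source operand. 'cl' is the guest's CL
 * register and 'imm8' the immediate byte taken from the instruction
 * stream; both are passed in to keep the sketch self-contained.
 */
static uint8_t decode_src2(enum src2_kind kind, uint8_t cl, uint8_t imm8)
{
	switch (kind) {
	case SRC2_ONE:
		return 1;	/* implied '1' */
	case SRC2_CL:
		return cl;	/* shift by CL */
	case SRC2_IMM_BYTE:
		return imm8;	/* shift by imm8 */
	case SRC2_NONE:
	default:
		return 0;	/* no second source operand */
	}
}

/*
 * Shift left using the decoded count. The count is masked to 5 bits, as
 * x86 does for 32-bit operands.
 */
static uint32_t emulate_shl32(uint32_t val, enum src2_kind kind,
			      uint8_t cl, uint8_t imm8)
{
	return val << (decode_src2(kind, cl, imm8) & 0x1f);
}
```

With this shape, all three shift varieties go through the same
emulate_shl32() path and differ only in the decode flag.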

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 0/6] MTRR/PAT support for EPT (v3)

2008-10-09 Thread Avi Kivity
Sheng Yang wrote:
 Hi, Avi

 Here is the latest update of MTRR/PAT support.

 Change from v2:
 Dropped the use of the MSR bitmap, added MSR_IA32_CR_PAT to save/restore, and
 rebased on latest upstream.

   


Applied all; my comments about shadow can be addressed later.

There is also the danger of the guest setting the wrong MTRR type for
RAM, thus introducing incompatible memory types (between qemu and the
guest).  If this is a problem, we should ignore the guest's mtrr (and
pat) for RAM and use write-back instead.
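
The fallback could look roughly like the sketch below. It is only an
illustration: effective_mem_type() and the is_ram flag are hypothetical and
not part of the applied patches; the numeric values are the standard x86
MTRR memory type encodings.

```c
#include <stdbool.h>

/* x86 MTRR memory type encodings. */
enum mem_type {
	MT_UC = 0,	/* uncacheable */
	MT_WC = 1,	/* write-combining */
	MT_WT = 4,	/* write-through */
	MT_WP = 5,	/* write-protected */
	MT_WB = 6	/* write-back */
};

/*
 * Hypothetical helper: pick the memory type to use for a guest page.
 * For RAM the guest's MTRR/PAT choice is ignored and write-back is
 * forced, so the guest and qemu never disagree; for anything else
 * (e.g. MMIO) the guest's type is kept.
 */
static enum mem_type effective_mem_type(bool is_ram, enum mem_type guest_type)
{
	return is_ram ? MT_WB : guest_type;
}
```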

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] x86 emulator: Add Src2 decode set

2008-10-09 Thread Guillaume Thouvenin
On Thu, 09 Oct 2008 10:11:57 +0200
Avi Kivity [EMAIL PROTECTED] wrote:

 Guillaume Thouvenin wrote:
  Instructions like shld have three operands, so we need to add a Src2
  decode set. We start with Src2None, Src2CL, and Src2Imm8 to support
  shld and we will expand it later.
 

 
 Please add Src2One (implied '1') as well, so we can switch the existing
 shift operators to Src2 later.

I will add Src2One but I don't understand exactly what you mean by
switching shift operators to Src2 later. I also applied other remarks,
thanks for your help. The patch follows.


Re: KVM rpm/deb packages for recent releases.

2008-10-09 Thread Farkas Levente
Daniel P. Berrange wrote:
 On Wed, Oct 08, 2008 at 12:06:43PM -0700, jd wrote:
 Hi 

  -  I am looking for installable packages (both rpms and deb) for recent 
 versions of KVM (kvm-70 and above).  

 For SUSE/SLES I found, which seems useful (looks official)
 http://download.opensuse.org/repositories/Virtualization:/KVM/
  
 Anything similar for RHEL/CentOS or Ubuntu/debian ?
   
  - RHEL and CentOS seems to be at  kvm-36. Is there a process to make 
 higher version of kvm supported on such distros? Would any one from 
 RH and Novell know/comment here?
 
 No version of KVM is supported on RHEL.  Xen is the virtualization
 technology in RHEL-5. Whatever CentOS is shipping is not derived
 from anything in RHEL-5.
 
 For up2date RPMs, Fedora (rawhide) is the place to look. We aim to track
 latest releases in both upstream kernel (for modules) and kvm (for the
 userspace). NB, we don't patch KVM modules to be newer than what's in
 Linus' official releases.

But if you still want them, here are all the required packages:
http://www.lfarkas.org/linux/packages/centos/5/x86_64/

-- 
  Levente   Si vis pacem para bellum!


Re: KVM rpm/deb packages for recent releases.

2008-10-09 Thread Rodrigo Campos
On Wed, Oct 8, 2008 at 4:06 PM, jd [EMAIL PROTECTED] wrote:
 Hi

  -  I am looking for installable packages (both rpms and deb) for recent 
 versions of KVM (kvm-70 and above).

kvm-72 is in debian testing/unstable. Do you mean for etch-and-a-half?

I didn't find it in backports.org, but you could try downloading the
source package from testing and compiling it (you need to add a deb-src
line to sources.list), something like: apt-get build-dep kvm;
apt-get source kvm; cd kvm-*; debuild -us -uc (I didn't test it, but
it shouldn't be too different; you may be missing some dependency,
which you can install from testing or compile the same way :)


Thanks,
Rodrigo


RE: Broken userspace module Makefile

2008-10-09 Thread Zhang, Xiantao
CONFIG_X86 is defined when compiling all of qemu's objects, so even if
unifdef doesn't work, we shouldn't hit problems related to this
header file.  Maybe some other potential issue causes the problem you met.
Anyway, we had better get unifdef working properly. :)
BTW, judging from the manual, unifdef seems unable to handle a case like
#if defined(CONFIG_X86) || defined(CONFIG_IA64); can anyone clarify?
Thanks
Xiantao


Sheng Yang wrote:
 Hi, Avi
 
 After Xiantao's irq_common patches were checked in, we found that it's
 impossible to compile with VT-d userspace now. Essentially the
 problem is that the Makefile has been missing a $ since the unifdef patch
 was checked in half a year ago...
 
 But after I fixed it, I found it's still impossible to get unifdef to run
 correctly...
 
 First, unifdef reports an error when processing include/linux/kvm.h, but I
 can't find out what's wrong yet.
 
 Second, it seems at least my unifdef can't deal with
 
 #if defined(CONFIG_X86) || defined(CONFIG_IA64)
 
 My unifdef version is 1.0(20030701), the latest from debian testing.
 I also tried the one for fc9, same result.
 
 What do you think...
 --
 regards
 Yang,Sheng
 
 --
 From: Sheng Yang [EMAIL PROTECTED]
 Date: Thu, 9 Oct 2008 20:45:02 +0800
 Subject: [PATCH 1/1] kvm: Fix broken Makefile of kernel module
 
 
 Signed-off-by: Sheng Yang [EMAIL PROTECTED]
 ---
  kernel/Makefile |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/kernel/Makefile b/kernel/Makefile
 index f2a71fa..e352f77 100644
 --- a/kernel/Makefile
 +++ b/kernel/Makefile
 @@ -65,7 +65,7 @@ header-sync:
$(LINUX)/arch/$(ARCH_DIR)/include/asm/./kvm*.h \
   $T/include/asm-$(ARCH_DIR)/
 
 - set -e && for i in $(find $T -name '*.h'); do \
 + set -e && for i in $$(find $T -name '*.h'); do \
   $(call unifdef,$$i); done
   $(call hack, include/linux/kvm.h)
   set -e && for i in $$(find $T -type f -printf '%P '); \
 @@ -79,7 +79,7 @@ source-sync:
$(LINUX)/virt/kvm/./*.[cSh] \
$T/
 
 - set -e && for i in $(find $T -name '*.c'); do \
 + set -e && for i in $$(find $T -name '*.c'); do \
   $(call unifdef,$$i); done
 
   for i in $(hack-files); \
 --
 1.5.3



Re: Broken userspace module Makefile

2008-10-09 Thread Sheng Yang
On Friday 10 October 2008 09:47:15 Zhang, Xiantao wrote:
 CONFIG_X86 is defined when compiling all of qemu's objects, so even if
 unifdef doesn't work, we shouldn't hit problems related to this
 header file.  Maybe some other potential issue causes the problem you met.
 Anyway, we had better get unifdef working properly. :)
 BTW, judging from the manual, unifdef seems unable to handle a case like
 #if defined(CONFIG_X86) || defined(CONFIG_IA64); can anyone clarify?

Yeah, CONFIG_X86 is for qemu. But kernel/ is not part of the qemu code, and 
can't be covered by qemu/config-host.mak...

regards
Yang, Sheng



RE: Broken userspace module Makefile

2008-10-09 Thread Zhang, Xiantao
The following patch should work around the issue you met until unifdef
works again. 

diff --git a/libkvm/config-i386.mak b/libkvm/config-i386.mak
index 2706b70..3579985 100644
--- a/libkvm/config-i386.mak
+++ b/libkvm/config-i386.mak
@@ -1,6 +1,6 @@

 LIBDIR := /lib
 CFLAGS += -m32
-CFLAGS += -D__i386__
+CFLAGS += -D__i386__ -DCONFIG_X86

 libkvm-$(ARCH)-objs := libkvm-x86.o
diff --git a/libkvm/config-x86_64.mak b/libkvm/config-x86_64.mak
index e638977..9d02eb0 100644
--- a/libkvm/config-x86_64.mak
+++ b/libkvm/config-x86_64.mak
@@ -1,6 +1,6 @@

 LIBDIR := /lib64
 CFLAGS += -m64
-CFLAGS += -D__x86_64__
+CFLAGS += -D__x86_64__ -DCONFIG_X86

 libkvm-$(ARCH)-objs := libkvm-x86.o

 
 Yeah, CONFIG_X86 is for qemu. But kernel/ is not part of the qemu code,
 and can't be covered by qemu/config-host.mak...

No, when you run ./configure in userspace, it generates a qemu_cflag
which includes -DCONFIG_X86 for compiling qemu's objects.  So when
compiling qemu's objects, CONFIG_X86 is defined for every target! :)
Xiantao





RE: [PATCH] [RESEND] VT-d: Support multiple device assignment to one guest

2008-10-09 Thread Han, Weidong
Han, Weidong wrote:
 Avi Kivity wrote:
 Han, Weidong wrote:
 
 I don't want them to share dmar_domains (these are implementation
 details anyway), just io page tables.
 
 
 kvm --- something (owns io page table) --- dmar_domain (uses
 shared io page table) --- device
 
 
 
 Letting dmar_domains share an io page table is not allowed. The VT-d spec
 ties one domain to one page table, and vice versa.
 
 Since the io pagetables are read only for the iommu (right?), I don't
 see what prevents several iommus from accessing the same pagetable.
 It's just a bunch of memory.
 
 I think the reason is that hardware may use the domain identifier to
 tag its internal caches. 
 
 
 If we want
 something that owns the io page table, shared by all devices assigned
 to one guest, we need to redefine dmar_domain so that it covers
 all devices assigned to a guest. Then we would need to rewrite most of
 the native VT-d code for kvm. Xen doesn't use dmar_domain; instead it
 implements something like a domain structure (with a domain id) to own
 the page table.
 
 I imagine, Xen shares the io pagetables with the EPT pagetables as
 well.  So io pagetable sharing is allowed.
 
 In Xen, VT-d page table doesn't share with EPT pagetable and P2M
 pagetable. But they can share if the format is the same. 
 
 
 One guest has
 only one something instance, thus has only one page table. It
 looks like: xen --- something (owns io page table) --- device.
 But, in KVM side, I think we can reuse native VT-d code, needn't to
 duplicate another VT-d code. 
 
 
 I agree that at this stage, we don't want to do optimization, we need
 something working first.  But let's at least ensure the API allows
 the optimization later on (and also, that iommu implementation
 details are hidden from kvm). 
 
 What I'm proposing is moving the list of kvm_vtd_domains inside the
 iommu API.  The only missing piece is populating a new dmar_domain
 when a new device is added.  We already have
 
 I will move kvm_vtd_domain inside the iommu API, and also hide
 get_kvm_vtd_domain() and release_kvm_vtd_domain() implementation
 details from kvm.  

It's hard to move kvm_vtd_domain inside the current iommu API. It's kvm
specific, and it's not elegant to include kvm_vtd_domain stuff in the native
VT-d code. I think leaving it on the kvm side is cleaner at this point.
Moreover, it's very simple. I read Joerg's iommu API foils just now, and I
think they're good. Native AMD iommu code will be in 2.6.28, so after 2.6.28
would be a suitable time to implement a generic iommu API for kvm based on
both the Intel and AMD iommus. What's your opinion? 

Regards,
Weidong

 
 intel_iommu_iova_to_pfn(), we need to add a way to read the
 protection bits and the highest mapped iova (oh, and
 intel_iommu_iova_to_pfn() has a bug: it shifts right instead of
 left). 
 
 
 Why do we need the protection bits and the highest mapped iova?
 
 Shifting right instead of left in intel_iommu_iova_to_pfn() is not a
 bug, because it returns pfn, not address. 
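
To illustrate the direction of the shift (a sketch only, assuming 4 KiB
pages; PAGE_SHIFT_4K and both helper names are made up for this example and
are not the VT-d driver's identifiers): going from an address to a page
frame number shifts right, and only the reverse conversion shifts left.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12	/* assumption: 4 KiB pages */

/* Address -> page frame number: drop the in-page offset bits. */
static uint64_t addr_to_pfn(uint64_t addr)
{
	return addr >> PAGE_SHIFT_4K;
}

/* Page frame number -> base address of that page. */
static uint64_t pfn_to_addr(uint64_t pfn)
{
	return pfn << PAGE_SHIFT_4K;
}
```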
 
 Regards,
 Weidong
 
 Later we can make the something (that already contains the list)
 also own the io page table; and non-kvm users can still use the same
 code (the list will always be of length 1 for these users).



Re: exit timing analysis v1 - comments & discussions welcome

2008-10-09 Thread Christian Ehrhardt
I modified the code according to your comments and my ideas, the new 
values are shown in column impISF (irq delivery, Stat, FindFirstBit)


I changed some code of the statistic updating and the interrupt delivery 
and got this:

      base  - impirq (d3) - impstat (d5) - impboth - impISF
a)  12.57% -  11.13% -  12.05%  -  11.03% - 12.28%  exit, saving guest state (booke_interrupt.S)
b)   7.37% -   9.38% -   8.69%  -   8.07% - 10.13%  reaching kvmppc_handle_exit
c)   7.38% -   7.20% -   7.49%  -   9.78% -  7.85%  syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
d1)  2.49% -   3.39% -   2.56%  -   3.30% -  3.70%  some checks for all exits
d2)  8.84% -   8.56% -   9.28%  -   8.31% -  6.07%  finding first bit in kvmppc_check_and_deliver_interrupts
d3)  6.53% -   5.25% -   6.63%  -   5.10% -  4.27%  can_deliver in kvmppc_check_and_deliver_interrupts
d4) 13.66% -  15.37% -  14.12%  -  14.92% - 13.96%  clear&deliver exception in kvmppc_check_and_deliver_interrupts
d5)  3.65% -   4.57% -   2.68%  -   4.41% -  3.77%  updating kvm_stat statistics
e)   6.55% -   6.30% -   6.30%  -   5.89% -  6.74%  returning from kvmppc_handle_exit to booke_interrupt.S
f1) 30.90% -  28.78% -  30.16%  -  29.16% - 31.19%  restoring guest tlb
f2)  4.81% -   4.77% -   5.06%  -   4.66% -  5.17%  restoring guest state ([s]regs)


The measurement inaccuracy is obvious, but the last column looks good in 
the improved sections d2, d3 and d4.
I'll remove this detailed tracing soon and run a larger test, hoping 
that it will not show the same inaccuracy.
But for now I still wonder about the ~14% for clear&deliver - that 
should just not be that much.
It is probably worth looking into that section in more detail 
first.


Christian Ehrhardt wrote:

Hollis Blanchard wrote:

On Wed, 2008-10-08 at 15:49 +0200, Christian Ehrhardt wrote:
 
Wondering about that 30.5% for postprocessing and 
kvmppc_check_and_deliver_interrupts I quickly checked that in detail 
- part d is now divided in 4 subparts.
I also looked at the return to guest path if the expected part 
(restoring tlb) is really the main time eater there. The result 
shows clearly that it is.


more detailed breakdown:
a)  10.94% - exit, saving guest state (booke_interrupt.S)
b)   8.12% - reaching kvmppc_handle_exit
c)   7.59% - syscall exit is checked and an interrupt is queued using kvmppc_queue_exception
d1)  3.33% - some checks for all exits
d2)  8.29% - finding first bit in kvmppc_check_and_deliver_interrupts
d3) 17.20% - can_deliver/clear&deliver exception in kvmppc_check_and_deliver_interrupts
d4)  4.47% - updating kvm_stat statistics
e)   6.13% - returning from kvmppc_handle_exit to booke_interrupt.S
f1) 29.18% - restoring guest tlb
f2)  4.69% - restoring guest state ([s]regs)

These fractions are % of our ~12µs syscall exit.
=> restoring tlb on each reenter = 4µs constant overhead
=> looking a bit into irq delivery and other constant things like kvm_stat updating




...
 

Now I go for the TLB replacement in f1.



Hang on... does d3 make sense to you? It doesn't to me, and if there's a
bug there it will be easier to fix than rewriting the TLB code. :)
  

I did not give up improving that part too :-)

I think your core runs at 667MHz, right? So that's 1.5 ns/cycle. 17.20%
of 12µs is 2064ns, or about 1300 cycles. (Check my math.)
  

I get the same results. 1% ~ 80 cycles.
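
The conversion can be written down as a small helper. This is just the
arithmetic from the lines above, under the same assumptions (a ~12µs exit
and a 667MHz core); share_to_cycles() is a name made up for this sketch.

```c
/*
 * Convert a percentage share of an exit into CPU cycles.
 *   exit_ns       total exit time in nanoseconds (~12000 here)
 *   share_percent share of that time spent in one section
 *   cpu_mhz       core clock in MHz (667 assumed above)
 * One cycle lasts 1000 / cpu_mhz ns, i.e. about 1.5 ns at 667 MHz.
 */
static double share_to_cycles(double exit_ns, double share_percent,
			      double cpu_mhz)
{
	double section_ns = exit_ns * share_percent / 100.0;

	return section_ns * cpu_mhz / 1000.0;
}
```

With these inputs, 1% comes out at roughly 80 cycles, and the 17.20% of d3
(2064ns) at a bit under 1400 cycles, the same ballpark as the "about 1300"
estimated above.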

Now when I look at kvmppc_core_deliver_interrupts(), I'm not sure where
that time is going. We're assuming the find_first_bit() loop usually
executes once, for syscall. Does it actually execute more than that? I
don't expect any of kvmppc_can_deliver_interrupt(),
kvmppc_booke_clear_exception(), or kvmppc_booke_deliver_interrupt() to
take lots of time.
  
You can see below that I already had a more detailed breakdown in my 
old mail:

[...]
d2)  8.84% -   8.56% -   9.28%  -   8.31% finding first bit in kvmppc_check_and_deliver_interrupts
d3)  6.53% -   5.25% -   6.63%  -   5.10% can_deliver in kvmppc_check_and_deliver_interrupts
d4) 13.66% -  15.37% -  14.12%  -  14.92% clear&deliver exception in kvmppc_check_and_deliver_interrupts

[...]

Could it be cache effects? exception_priority[] and priority_exception[]
are 16 bytes each, and our L1 cacheline is 32 bytes, so they should both
fit into one... except they're not aligned.
  
I would be so happy if I had hardware performance counters for things like 
cache misses :-)

Also, it looks like we use the generic find_first_bit(). That may be
more expensive than we'd like. However, since
vcpu->arch.pending_exceptions is a single long (not an arbitrary sized
bitfield), we should be able to use ffs() instead, which has an
optimized PowerPC implementation. That might help a lot.
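
As a userspace illustration of the difference (not the kernel helpers
themselves): the sketch below compares a generic bit-by-bit scan with a
single-word lookup. __builtin_ctzl() is a GCC/Clang builtin used here as a
stand-in for an optimized ffs-style primitive; the function names are
hypothetical.

```c
#include <limits.h>

#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Generic scan: index of the lowest set bit, or WORD_BITS if none. */
static unsigned int first_bit_loop(unsigned long word)
{
	unsigned int i;

	for (i = 0; i < WORD_BITS; i++)
		if (word & (1UL << i))
			return i;
	return WORD_BITS;
}

/*
 * Single-word variant: count trailing zeros in one step. The builtin is
 * undefined for 0, so that case is handled explicitly.
 */
static unsigned int first_bit_builtin(unsigned long word)
{
	return word ? (unsigned int)__builtin_ctzl(word) : WORD_BITS;
}
```

Both return the same index for any single long; the second avoids the loop
and typically compiles down to a count-zeros instruction where the CPU has
one.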
  

good idea.
I'll check this and some other small improvements I have in mind.


We might even be able to replace find_next_bit()