

[COMMIT master] qemu-kvm: event writeback can overwrite interrupts with -no-kvm-irqchip

2010-05-05 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

Interrupts that are injected during a vcpu event save/writeback cycle
are lost.

Fix by writing back the state before injecting interrupts.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 43d599d..749587a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -885,11 +885,6 @@ int pre_kvm_run(kvm_context_t kvm, CPUState *env)
 {
     kvm_arch_pre_run(env, env->kvm_run);
 
-    if (env->kvm_vcpu_dirty) {
-        kvm_arch_load_regs(env, KVM_PUT_RUNTIME_STATE);
-        env->kvm_vcpu_dirty = 0;
-    }
-
     pthread_mutex_unlock(&qemu_mutex);
     return 0;
 }
@@ -907,6 +902,10 @@ int kvm_run(CPUState *env)
     int fd = env->kvm_fd;
 
   again:
+    if (env->kvm_vcpu_dirty) {
+        kvm_arch_load_regs(env, KVM_PUT_RUNTIME_STATE);
+        env->kvm_vcpu_dirty = 0;
+    }
     push_nmi(kvm);
 #if !defined(__s390__)
     if (!kvm->irqchip_in_kernel)
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] qemu-kvm: Process exit requests in kvm loop

2010-05-05 Thread Avi Kivity
From: Jan Kiszka jan.kis...@siemens.com

This unbreaks the monitor quit command for qemu-kvm.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..43d599d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2047,6 +2047,9 @@ int kvm_main_loop(void)
             vm_stop(EXCP_DEBUG);
             kvm_debug_cpu_requested = NULL;
         }
+        if (qemu_exit_requested()) {
+            exit(0);
+        }
     }
 
     pause_all_threads();


[COMMIT master] replace set_msr_entry with kvm_msr_entry

2010-05-05 Thread Avi Kivity
From: Glauber Costa glom...@redhat.com

This is yet another function that upstream qemu implements,
so we can just use its implementation.

Signed-off-by: Glauber Costa glom...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 748ff69..439c31a 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -693,13 +693,6 @@ int kvm_arch_qemu_create_context(void)
     return 0;
 }
 
-static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
-                          uint64_t data)
-{
-    entry->index = index;
-    entry->data  = data;
-}
-
 /* returns 0 on success, non-0 on failure */
 static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env)
 {
@@ -960,19 +953,19 @@ void kvm_arch_load_regs(CPUState *env, int level)
     /* msrs */
     n = 0;
     /* Remember to increase msrs size if you add new registers below */
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_CS,  env->sysenter_cs);
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS,  env->sysenter_cs);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star)
-        set_msr_entry(&msrs[n++], MSR_STAR, env->star);
+        kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
     if (kvm_has_vm_hsave_pa)
-        set_msr_entry(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
+        kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
     if (lm_capable_kernel) {
-        set_msr_entry(&msrs[n++], MSR_CSTAR, env->cstar);
-        set_msr_entry(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
-        set_msr_entry(&msrs[n++], MSR_FMASK, env->fmask);
-        set_msr_entry(&msrs[n++], MSR_LSTAR, env->lstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
+        kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
+        kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
     }
 #endif
     if (level == KVM_PUT_FULL_STATE) {
@@ -983,20 +976,20 @@ void kvm_arch_load_regs(CPUState *env, int level)
          * huge jump-backs that would occur without any writeback at all.
          */
         if (smp_cpus == 1 || env->tsc != 0) {
-            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+            kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
         }
-        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
-        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }
 #ifdef KVM_CAP_MCE
     if (env->mcg_cap) {
         if (level == KVM_PUT_RESET_STATE)
-            set_msr_entry(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
         else if (level == KVM_PUT_FULL_STATE) {
-            set_msr_entry(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-            set_msr_entry(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
             for (i = 0; i < (env->mcg_cap & 0xff); i++)
-                set_msr_entry(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
+                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5239eaf..76c1adb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -552,6 +552,8 @@ static int kvm_put_sregs(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
 }
 
+#endif
+
 static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
                               uint32_t index, uint64_t value)
 {
@@ -559,6 +561,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
     entry->data = value;
 }
 
+#ifdef KVM_UPSTREAM
 static int kvm_put_msrs(CPUState *env, int level)
 {
     struct {


[COMMIT master] introduce qemu_ram_map

2010-05-05 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

This allows drivers to register an mmap'ed region in the ram block
mappings, to be used by the device assignment driver.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/cpu-common.h b/cpu-common.h
index 29b4ea5..4b0ba60 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -40,6 +40,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
 }
 
 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_map(ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 9748496..4f94e87 100644
--- a/exec.c
+++ b/exec.c
@@ -2805,6 +2805,34 @@ static void *file_ram_alloc(ram_addr_t memory, const char *path)
 }
 #endif
 
+ram_addr_t qemu_ram_map(ram_addr_t size, void *host)
+{
+    RAMBlock *new_block;
+
+    size = TARGET_PAGE_ALIGN(size);
+    new_block = qemu_malloc(sizeof(*new_block));
+
+    new_block->host = host;
+
+    new_block->offset = last_ram_offset;
+    new_block->length = size;
+
+    new_block->next = ram_blocks;
+    ram_blocks = new_block;
+
+    phys_ram_dirty = qemu_realloc(phys_ram_dirty,
+        (last_ram_offset + size) >> TARGET_PAGE_BITS);
+    memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS),
+           0xff, size >> TARGET_PAGE_BITS);
+
+    last_ram_offset += size;
+
+    if (kvm_enabled())
+        kvm_setup_guest_memory(new_block->host, size);
+
+    return new_block->offset;
+}
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size)
 {
     RAMBlock *new_block;


[COMMIT master] remove unused kvm_dirty_bitmap array

2010-05-05 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/qemu-kvm.c b/qemu-kvm.c
index ae6570a..779bc5b 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2156,7 +2156,6 @@ void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
  * dirty pages logging
  */
 /* FIXME: use unsigned long pointer instead of unsigned char */
-unsigned char *kvm_dirty_bitmap = NULL;
 int kvm_physical_memory_set_dirty_tracking(int enable)
 {
     int r = 0;
@@ -2165,17 +2164,9 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
         return 0;
 
     if (enable) {
-        if (!kvm_dirty_bitmap) {
-            unsigned bitmap_size = BITMAP_SIZE(phys_ram_size);
-            kvm_dirty_bitmap = qemu_malloc(bitmap_size);
-            r = kvm_dirty_pages_log_enable_all(kvm_context);
-        }
+        r = kvm_dirty_pages_log_enable_all(kvm_context);
     } else {
-        if (kvm_dirty_bitmap) {
-            r = kvm_dirty_pages_log_reset(kvm_context);
-            qemu_free(kvm_dirty_bitmap);
-            kvm_dirty_bitmap = NULL;
-        }
+        r = kvm_dirty_pages_log_reset(kvm_context);
     }
     return r;
 }


[COMMIT master] use upstream memslot management code

2010-05-05 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

Drop qemu-kvm's implementation in favour of qemu's, they are
functionally equivalent.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 8e3cf38..1f13a6d 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -256,10 +256,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
     AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
     PCIRegion *real_region = &r_dev->real_device.regions[region_num];
-    pcibus_t old_ephys = region->e_physbase;
-    pcibus_t old_esize = region->e_size;
-    int first_map = (region->e_size == 0);
-    int ret = 0;
+    int ret = 0, flags = 0;
 
     DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
           e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -267,30 +264,22 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
     region->e_physbase = e_phys;
     region->e_size = e_size;
 
-    if (!first_map)
-        kvm_destroy_phys_mem(kvm_context, old_ephys,
-                             TARGET_PAGE_ALIGN(old_esize));
-
     if (e_size > 0) {
+
+        if (region_num == PCI_ROM_SLOT)
+            flags |= IO_MEM_ROM;
+
+        cpu_register_physical_memory(e_phys, e_size, region->memory_index | flags);
+
         /* deal with MSI-X MMIO page */
         if (real_region->base_addr <= r_dev->msix_table_addr &&
             real_region->base_addr + real_region->size >=
             r_dev->msix_table_addr) {
             int offset = r_dev->msix_table_addr - real_region->base_addr;
-            ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
-            if (ret == 0)
-                DEBUG("munmap done, virt_base 0x%p\n",
-                      region->u.r_virtbase + offset);
-            else {
-                fprintf(stderr, "%s: fail munmap msix table!\n", __func__);
-                exit(1);
-            }
+
             cpu_register_physical_memory(e_phys + offset,
                     TARGET_PAGE_SIZE, r_dev->mmio_index);
         }
-        ret = kvm_register_phys_mem(kvm_context, e_phys,
-                                    region->u.r_virtbase,
-                                    TARGET_PAGE_ALIGN(e_size), 0);
     }
 
     if (ret != 0) {
@@ -539,6 +528,15 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
             pci_dev->v_addrs[i].u.r_virtbase +=
                 (cur_region->base_addr & 0xFFF);
 
+
+            if (!slow_map) {
+                void *virtbase = pci_dev->v_addrs[i].u.r_virtbase;
+
+                pci_dev->v_addrs[i].memory_index = qemu_ram_map(cur_region->size,
+                                                                virtbase);
+            } else
+                pci_dev->v_addrs[i].memory_index = 0;
+
             pci_register_bar((PCIDevice *) pci_dev, i,
                              cur_region->size, t,
                              slow_map ? assigned_dev_iomem_map_slow
@@ -726,10 +724,6 @@ static void free_assigned_device(AssignedDevice *dev)
                 kvm_remove_ioperm_data(region->u.r_baseport, region->r_size);
                 continue;
             } else if (pci_region->type & IORESOURCE_MEM) {
-                if (region->e_size > 0)
-                    kvm_destroy_phys_mem(kvm_context, region->e_physbase,
-                                         TARGET_PAGE_ALIGN(region->e_size));
-
                 if (region->u.r_virtbase) {
                     int ret = munmap(region->u.r_virtbase,
                                      (pci_region->size + 0xFFF) & 0xF000);
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 1cbfc36..d561112 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -63,7 +63,7 @@ typedef struct {
 
 typedef struct {
     pcibus_t e_physbase;
-    uint32_t memory_index;
+    ram_addr_t memory_index;
     union {
         void *r_virtbase;    /* mmapped access address for memory regions */
        uint32_t r_baseport; /* the base guest port for I/O regions */
diff --git a/kvm-all.c b/kvm-all.c
index 87b7f1e..9ac35aa 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -30,7 +30,6 @@
 
 /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
 #define PAGE_SIZE TARGET_PAGE_SIZE
-#ifdef KVM_UPSTREAM
 
 //#define DEBUG_KVM
 
@@ -42,6 +41,8 @@
     do { } while (0)
 #endif
 
+#ifdef KVM_UPSTREAM
+
 typedef struct KVMSlot
 {
     target_phys_addr_t start_addr;
@@ -76,6 +77,8 @@ struct KVMState
 
 static KVMState *kvm_state;
 
+#endif
+
 static KVMSlot *kvm_alloc_slot(KVMState *s)
 {
     int i;
@@ -152,6 +155,7 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
     return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
 }
 
+#ifdef KVM_UPSTREAM
 static void kvm_reset_vcpu(void *opaque)
 {
     CPUState *env = opaque;

[COMMIT master] qemu-kvm tests: enhanced msr test

2010-05-05 Thread Avi Kivity
From: Naphtali Sprei nsp...@redhat.com

Changed the code structure and added a few tests for some of the MSRs.

Signed-off-by: Naphtali Sprei nsp...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/kvm/user/test/x86/msr.c b/kvm/user/test/x86/msr.c
index 0d6f286..662cb4f 100644
--- a/kvm/user/test/x86/msr.c
+++ b/kvm/user/test/x86/msr.c
@@ -2,7 +2,80 @@
 
 #include "libcflat.h"
 
-#define MSR_KERNEL_GS_BASE 0xc0000102 /* SwapGS GS shadow */
+struct msr_info {
+    int index;
+    char *name;
+    struct tc {
+        int valid;
+        unsigned long long value;
+        unsigned long long expected;
+    } val_pairs[20];
+};
+
+
+#define addr_64 0x123456789abcULL
+
+struct msr_info msr_info[] =
+{
+    { .index = 0x001b, .name = "MSR_IA32_APICBASE",
+      .val_pairs = {
+            { .valid = 1, .value = 0x56789900, .expected = 0x56789900},
+            { .valid = 1, .value = 0x56789D01, .expected = 0x56789D01},
+        }
+    },
+    { .index = 0x0174, .name = "IA32_SYSENTER_CS",
+      .val_pairs = {{ .valid = 1, .value = 0x1234, .expected = 0x1234}}
+    },
+    { .index = 0x0175, .name = "MSR_IA32_SYSENTER_ESP",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0x0176, .name = "IA32_SYSENTER_EIP",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0x01a0, .name = "MSR_IA32_MISC_ENABLE",
+      // reserved: 1:2, 4:6, 8:10, 13:15, 17, 19:21, 24:33, 35:63
+      .val_pairs = {{ .valid = 1, .value = 0x400c51889, .expected = 0x400c51889}}
+    },
+    { .index = 0x0277, .name = "MSR_IA32_CR_PAT",
+      .val_pairs = {{ .valid = 1, .value = 0x07070707, .expected = 0x07070707}}
+    },
+    { .index = 0xc0000100, .name = "MSR_FS_BASE",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc0000101, .name = "MSR_GS_BASE",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc0000102, .name = "MSR_KERNEL_GS_BASE",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc0000080, .name = "MSR_EFER",
+      .val_pairs = {{ .valid = 1, .value = 0xD00, .expected = 0xD00}}
+    },
+    { .index = 0xc0000082, .name = "MSR_LSTAR",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc0000083, .name = "MSR_CSTAR",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc0000084, .name = "MSR_SYSCALL_MASK",
+      .val_pairs = {{ .valid = 1, .value = 0xffffffff, .expected = 0xffffffff}}
+    },
+
+//MSR_IA32_DEBUGCTLMSR needs svm feature LBRV
+//MSR_VM_HSAVE_PA only AMD host
+};
+
+static int find_msr_info(int msr_index)
+{
+    int i;
+    for (i = 0; i < sizeof(msr_info)/sizeof(msr_info[0]) ; i++) {
+        if (msr_info[i].index == msr_index) {
+            return i;
+        }
+    }
+    return -1;
+}
+
 
 int nr_passed, nr_tests;
 
@@ -32,23 +105,42 @@ static unsigned long long rdmsr(unsigned index)
 
 #endif
 
-static void test_kernel_gs_base(void)
-{
-#ifdef __x86_64__
-	unsigned long long v1 = 0x123456789abc, v2;
 
-	wrmsr(MSR_KERNEL_GS_BASE, v1);
-	v2 = rdmsr(MSR_KERNEL_GS_BASE);
-	report("MSR_KERNEL_GS_BASE", v1 == v2);
-#endif
+
+static void test_msr_rw(int msr_index, unsigned long long input, unsigned long long expected)
+{
+    unsigned long long r = 0;
+    int index;
+    char *sptr;
+    if ((index = find_msr_info(msr_index)) != -1) {
+        sptr = msr_info[index].name;
+    } else {
+        printf("couldn't find name for msr # 0x%x, skipping\n", msr_index);
+        return;
+    }
+    wrmsr(msr_index, input);
+    r = rdmsr(msr_index);
+    if (expected != r) {
+        printf("testing %s: output = 0x%x:0x%x expected = 0x%x:0x%x\n", sptr, r >> 32, r, expected >> 32, expected);
+    }
+    report(sptr, expected == r);
 }
 
 int main(int ac, char **av)
 {
-	test_kernel_gs_base();
+    int i, j;
+    for (i = 0 ; i < sizeof(msr_info) / sizeof(msr_info[0]); i++) {
+        for (j = 0; j < sizeof(msr_info[i].val_pairs) / sizeof(msr_info[i].val_pairs[0]); j++) {
+            if (msr_info[i].val_pairs[j].valid) {
+                test_msr_rw(msr_info[i].index, msr_info[i].val_pairs[j].value, msr_info[i].val_pairs[j].expected);
+            } else {
+                break;
+            }
+        }
+    }
 
-	printf("%d tests, %d failures\n", nr_tests, nr_tests - nr_passed);
+    printf("%d tests, %d failures\n", nr_tests, nr_tests - nr_passed);
 
-	return nr_passed == nr_tests ? 0 : 1;
+    return nr_passed == nr_tests ? 0 : 1;
 }
 


[COMMIT master] KVM: Document KVM_GET_MP_STATE and KVM_SET_MP_STATE

2010-05-05 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Acked-by: Pekka Enberg penb...@cs.helsinki.fi
Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index baa8fde..a237518 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -848,6 +848,50 @@ function properly, this is the place to put them.
__u8  pad[64];
 };
 
+4.37 KVM_GET_MP_STATE
+
+Capability: KVM_CAP_MP_STATE
+Architectures: x86, ia64
+Type: vcpu ioctl
+Parameters: struct kvm_mp_state (out)
+Returns: 0 on success; -1 on error
+
+struct kvm_mp_state {
+   __u32 mp_state;
+};
+
+Returns the vcpu's current multiprocessing state (though also valid on
+uniprocessor guests).
+
+Possible values are:
+
+ - KVM_MP_STATE_RUNNABLE:       the vcpu is currently running
+ - KVM_MP_STATE_UNINITIALIZED:  the vcpu is an application processor (AP)
+                                which has not yet received an INIT signal
+ - KVM_MP_STATE_INIT_RECEIVED:  the vcpu has received an INIT signal, and is
+                                now ready for a SIPI
+ - KVM_MP_STATE_HALTED:         the vcpu has executed a HLT instruction and
+                                is waiting for an interrupt
+ - KVM_MP_STATE_SIPI_RECEIVED:  the vcpu has just received a SIPI (vector
+                                accessible via KVM_GET_VCPU_EVENTS)
+
+This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
+irqchip, the multiprocessing state must be maintained by userspace.
+
+4.38 KVM_SET_MP_STATE
+
+Capability: KVM_CAP_MP_STATE
+Architectures: x86, ia64
+Type: vcpu ioctl
+Parameters: struct kvm_mp_state (in)
+Returns: 0 on success; -1 on error
+
+Sets the vcpu's current multiprocessing state; see KVM_GET_MP_STATE for
+arguments.
+
+This ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an in-kernel
+irqchip, the multiprocessing state must be maintained by userspace.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by


[COMMIT master] KVM: Minor MMU documentation edits

2010-05-05 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Reported by Andrew Jones.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index da04671..0cc28fb 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -75,8 +75,8 @@ direct mode; otherwise it operates in shadow mode (see below).
 Memory
 ==
 
-Guest memory (gpa) is part of user address space of the process that is using
-kvm.  Userspace defines the translation between guest addresses and user
+Guest memory (gpa) is part of the user address space of the process that is
+using kvm.  Userspace defines the translation between guest addresses and user
 addresses (gpa-hva); note that two gpas may alias to the same gva, but not
 vice versa.
 
@@ -111,7 +111,7 @@ is not related to a translation directly.  It points to other shadow pages.
 
 A leaf spte corresponds to either one or two translations encoded into
 one paging structure entry.  These are always the lowest level of the
-translation stack, with an optional higher level translations left to NPT/EPT.
+translation stack, with optional higher level translations left to NPT/EPT.
 Leaf ptes point at guest pages.
 
 The following table shows translations encoded by leaf ptes, with higher-level
@@ -167,7 +167,7 @@ Shadow pages contain the following information:
 Either the guest page table containing the translations shadowed by this
 page, or the base page frame for linear translations.  See role.direct.
   spt:
-A pageful of 64-bit sptes containig the translations for this page.
+A pageful of 64-bit sptes containing the translations for this page.
 Accessed by both kvm and hardware.
 The page pointed to by spt will have its page-private pointing back
 at the shadow page structure.
@@ -235,7 +235,7 @@ the amount of emulation we have to do when the guest modifies multiple gptes,
 or when the a guest page is no longer used as a page table and is used for
 random guest data.
 
-As a side effect we have resynchronize all reachable unsynchronized shadow
+As a side effect we have to resynchronize all reachable unsynchronized shadow
 pages on a tlb flush.
 
 


[COMMIT master] KVM: MMU: fix hashing for TDP and non-paging modes

2010-05-05 Thread Avi Kivity
From: Eric Northup digitale...@google.com

For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().

Signed-off-by: Eric Northup digitale...@google.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..9696d65 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2059,10 +2059,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
		hpa_t root = vcpu->arch.mmu.root_hpa;
 
		ASSERT(!VALID_PAGE(root));
-		if (tdp_enabled)
-			direct = 1;
		if (mmu_check_root(vcpu, root_gfn))
			return 1;
+		if (tdp_enabled) {
+			direct = 1;
+			root_gfn = 0;
+		}
		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
				      PT64_ROOT_LEVEL, direct,
				      ACC_ALL, NULL);
@@ -2072,8 +2074,6 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
		return 0;
	}
	direct = !is_paging(vcpu);
-	if (tdp_enabled)
-		direct = 1;
	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];
 
@@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
			root_gfn = 0;
		if (mmu_check_root(vcpu, root_gfn))
			return 1;
+		if (tdp_enabled) {
+			direct = 1;
+			root_gfn = i << 30;
+		}
		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
				      PT32_ROOT_LEVEL, direct,
				      ACC_ALL, NULL);


[COMMIT master] KVM: VMX: Add definition for msr autoload entry

2010-05-05 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index fb9a080..4497318 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -25,6 +25,8 @@
  *
  */
 
+#include <linux/types.h>
+
 /*
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
@@ -394,6 +396,10 @@ enum vmcs_field {
 #define ASM_VMX_INVEPT		".byte 0x66, 0x0f, 0x38, 0x80, 0x08"
 #define ASM_VMX_INVVPID		".byte 0x66, 0x0f, 0x38, 0x81, 0x08"
 
-
+struct vmx_msr_entry {
+	u32 index;
+	u32 reserved;
+	u64 value;
+} __aligned(16);
 
 #endif


[COMMIT master] KVM: Fix mmu shrinker error

2010-05-05 Thread Avi Kivity
From: Gui Jianfeng guijianf...@cn.fujitsu.com

kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() reclaims
only one sp, but that's not the case. This causes the mmu shrinker to
return a wrong number. This patch fixes the counting error.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9696d65..18d2f58 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2902,13 +2902,13 @@ restart:
	kvm_flush_remote_tlbs(kvm);
 }
 
-static void kvm_mmu_remove_one_alloc_mmu_page(struct kvm *kvm)
+static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
 {
	struct kvm_mmu_page *page;
 
	page = container_of(kvm->arch.active_mmu_pages.prev,
			    struct kvm_mmu_page, link);
-	kvm_mmu_zap_page(kvm, page);
+	return kvm_mmu_zap_page(kvm, page) + 1;
 }
 
 static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
@@ -2920,7 +2920,7 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
	spin_lock(&kvm_lock);
 
	list_for_each_entry(kvm, &vm_list, vm_list) {
-		int npages, idx;
+		int npages, idx, freed_pages;
 
		idx = srcu_read_lock(&kvm->srcu);
		spin_lock(&kvm->mmu_lock);
@@ -2928,8 +2928,8 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
			 kvm->arch.n_free_mmu_pages;
		cache_count += npages;
		if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
-			kvm_mmu_remove_one_alloc_mmu_page(kvm);
-			cache_count--;
+			freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm);
+			cache_count -= freed_pages;
			kvm_freed = kvm;
		}
		nr_to_scan--;


[COMMIT master] KVM: MMU: move unsync/sync tracepoints to proper place

2010-05-05 Thread Avi Kivity
From: Xiao Guangrong xiaoguangr...@cn.fujitsu.com

Move the unsync/sync tracepoints to the proper place, so we can
observe an unsync page's lifetime.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 18d2f58..51eb6d6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1189,6 +1189,7 @@ static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
	WARN_ON(!sp->unsync);
+	trace_kvm_mmu_sync_page(sp);
	sp->unsync = 0;
	--kvm->stat.mmu_unsync;
 }
@@ -1202,7 +1203,6 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
		return 1;
	}
 
-	trace_kvm_mmu_sync_page(sp);
	if (rmap_write_protect(vcpu->kvm, sp->gfn))
		kvm_flush_remote_tlbs(vcpu->kvm);
	kvm_unlink_unsync_page(vcpu->kvm, sp);
@@ -1730,7 +1730,6 @@ static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
	struct kvm_mmu_page *s;
	struct hlist_node *node, *n;
 
-	trace_kvm_mmu_unsync_page(sp);
	index = kvm_page_table_hashfn(sp->gfn);
	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
	/* don't unsync if pagetable is shadowed with multiple roles */
@@ -1740,6 +1739,7 @@ static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
		if (s->role.word != sp->role.word)
			return 1;
	}
+	trace_kvm_mmu_unsync_page(sp);
	++vcpu->kvm->stat.mmu_unsync;
	sp->unsync = 1;
 


[COMMIT master] KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)

2010-05-05 Thread Avi Kivity
From: Shane Wang shane.w...@intel.com

Per the documentation, for the feature control MSR:

  Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
of VMXON in SMX operation causes a general-protection exception.
  Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
of VMXON outside SMX operation causes a general-protection exception.

This patch enables this check, taking SMX operation into account, for VMXON in KVM.

Signed-off-by: Shane Wang shane.w...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..f932485 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -202,8 +202,9 @@
 #define MSR_IA32_EBL_CR_POWERON		0x002a
 #define MSR_IA32_FEATURE_CONTROL	0x003a
 
-#define FEATURE_CONTROL_LOCKED				(1<<0)
-#define FEATURE_CONTROL_VMXON_ENABLED			(1<<2)
+#define FEATURE_CONTROL_LOCKED				(1<<0)
+#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
+#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX	(1<<2)
 
 #define MSR_IA32_APICBASE		0x001b
 #define MSR_IA32_APICBASE_BSP		(1<<8)
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 86c9f91..46b8277 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -46,6 +46,7 @@
 
 /* Global pointer to shared data; NULL means no measured launch. */
 struct tboot *tboot __read_mostly;
+EXPORT_SYMBOL(tboot);
 
 /* timeout for APs (in secs) to enter wait-for-SIPI state during shutdown */
 #define AP_WAIT_TIMEOUT1
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c4f3955..d2a47ae 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -27,6 +27,7 @@
 #include <linux/moduleparam.h>
 #include <linux/ftrace_event.h>
 #include <linux/slab.h>
+#include <linux/tboot.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -1272,9 +1273,16 @@ static __init int vmx_disabled_by_bios(void)
u64 msr;
 
rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
-   return (msr  (FEATURE_CONTROL_LOCKED |
-  FEATURE_CONTROL_VMXON_ENABLED))
-   == FEATURE_CONTROL_LOCKED;
+   if (msr  FEATURE_CONTROL_LOCKED) {
+   if (!(msr  FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)
+tboot_enabled())
+   return 1;
+   if (!(msr  FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
+!tboot_enabled())
+   return 1;
+   }
+
+   return 0;
/* locked but not enabled */
 }
 
@@ -1282,21 +1290,23 @@ static int hardware_enable(void *garbage)
 {
int cpu = raw_smp_processor_id();
u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
-   u64 old;
+   u64 old, test_bits;
 
	if (read_cr4() & X86_CR4_VMXE)
return -EBUSY;
 
	INIT_LIST_HEAD(&per_cpu(vcpus_on_cpu, cpu));
	rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
-   if ((old & (FEATURE_CONTROL_LOCKED |
-   FEATURE_CONTROL_VMXON_ENABLED))
-   != (FEATURE_CONTROL_LOCKED |
-   FEATURE_CONTROL_VMXON_ENABLED))
+
+   test_bits = FEATURE_CONTROL_LOCKED;
+   test_bits |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+   if (tboot_enabled())
+   test_bits |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
+
+   if ((old & test_bits) != test_bits) {
/* enable and lock */
-   wrmsrl(MSR_IA32_FEATURE_CONTROL, old |
-  FEATURE_CONTROL_LOCKED |
-  FEATURE_CONTROL_VMXON_ENABLED);
+   wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
+   }
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
asm volatile (ASM_VMX_VMXON_RAX
	      : : "a"(&phys_addr), "m"(phys_addr)
diff --git a/include/linux/tboot.h b/include/linux/tboot.h
index bf2a0c7..1dba6ee 100644
--- a/include/linux/tboot.h
+++ b/include/linux/tboot.h
@@ -150,6 +150,7 @@ extern int tboot_force_iommu(void);
 
 #else
 
+#define tboot_enabled()			0
 #define tboot_probe()  do { } while (0)
 #define tboot_shutdown(shutdown_type)  do { } while (0)
 #define tboot_sleep(sleep_state, pm1a_control, pm1b_control)   \
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: VMX: Avoid writing HOST_CR0 every entry

2010-05-05 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
entry.  That is slow, so instead, set HOST_CR0 to have TS set unconditionally
(which is a safe value), and issue a clts() just before exiting vcpu context
if the task indeed owns the fpu.

Saves ~50 cycles/exit.
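The pattern generalizes: instead of refreshing a field before every entry, write a conservative superset once and repair state lazily when leaving. A standalone toy model of the write-count savings (illustration only, not KVM code; all names here are invented) looks like:

```c
#include <assert.h>

#define X86_CR0_TS 0x8UL

/* Counts simulated expensive VMWRITEs to HOST_CR0. */
static int vmcs_writes;
static unsigned long host_cr0_field;

static void vmcs_write_host_cr0(unsigned long v)
{
	vmcs_writes++;
	host_cr0_field = v;
}

/* Old scheme: cr0.ts may have changed, so refresh HOST_CR0 on every entry. */
static int run_old(int entries)
{
	vmcs_writes = 0;
	for (int i = 0; i < entries; i++)
		vmcs_write_host_cr0(0);		/* per-entry VMWRITE */
	return vmcs_writes;
}

/* New scheme: write HOST_CR0 once with TS set (always a safe value) and
 * only issue a clts() when leaving vcpu context if the task owns the fpu. */
static int run_new(int entries, int task_owns_fpu, unsigned long *cr0)
{
	vmcs_writes = 0;
	vmcs_write_host_cr0(*cr0 | X86_CR0_TS);	/* one-time setup */
	for (int i = 0; i < entries; i++)
		;				/* nothing to write per entry */
	if (task_owns_fpu)
		*cr0 &= ~X86_CR0_TS;		/* clts() */
	return vmcs_writes;
}
```

With 1000 entries the old scheme does 1000 simulated writes, the new one a single setup write.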

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ba0fd42..777e00d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -812,6 +812,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
}
 #endif
+   if (current_thread_info()->status & TS_USEDFPU)
+   clts();
 }
 
 static void vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -2510,7 +2512,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
vmcs_write32(CR3_TARGET_COUNT, 0);   /* 22.2.1 */
 
-   vmcs_writel(HOST_CR0, read_cr0());  /* 22.2.3 */
+   vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS);  /* 22.2.3 */
vmcs_writel(HOST_CR4, read_cr4());  /* 22.2.3, 22.2.5 */
vmcs_writel(HOST_CR3, read_cr3());  /* 22.2.3  FIXME: shadow tables */
 
@@ -3863,11 +3865,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
vmx_set_interrupt_shadow(vcpu, 0);
 
-   /*
-* Loading guest fpu may have cleared host cr0.ts
-*/
-   vmcs_writel(HOST_CR0, read_cr0());
-
asm(
/* Store host registers */
		"push %%"R"dx; push %%"R"bp;"
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6f8dad..8e267ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1723,8 +1723,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
-   kvm_put_guest_fpu(vcpu);
	kvm_x86_ops->vcpu_put(vcpu);
+   kvm_put_guest_fpu(vcpu);
 }
 
 static int is_efer_nx(void)


[COMMIT master] KVM: Fix wallclock version writing race

2010-05-05 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Wallclock writing uses an unprotected global variable to hold the version;
this can cause one guest to interfere with another if both write their
wallclock at the same time.
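For background, the guest detects an in-flight update through version parity (an odd version means an update is in progress), which is why the version must live in, and be re-read from, the guest's own page rather than a shared global. A minimal standalone sketch of that protocol (illustration only, not the actual pvclock ABI code; names invented here):

```c
#include <assert.h>

/* Toy per-guest wall clock with a seqcount-style version field. */
struct wall {
	unsigned int version;
	unsigned int sec;
};

static void begin_update(struct wall *w)
{
	if (w->version & 1)	/* first time write, random junk */
		++w->version;
	++w->version;		/* now odd: update in progress */
}

static void end_update(struct wall *w)
{
	++w->version;		/* even again: update complete */
}

/* Returns 1 and fills *sec only if a consistent snapshot was read. */
static int try_read(const struct wall *w, unsigned int *sec)
{
	unsigned int v = w->version;

	if (v & 1)
		return 0;	/* writer active, caller must retry */
	*sec = w->sec;
	return w->version == v;
}
```

A reader that samples the structure while the writer is between begin_update() and end_update() sees an odd version and retries, so it never observes a torn value.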

Acked-by: Glauber Costa glom...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e267ab..4d0a968 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -754,14 +754,22 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 {
-   static int version;
+   int version;
+   int r;
struct pvclock_wall_clock wc;
struct timespec boot;
 
if (!wall_clock)
return;
 
-   version++;
+   r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+   if (r)
+   return;
+
+   if (version & 1)
+   ++version;  /* first time write, random junk */
+
+   ++version;
 
	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
 


[COMMIT master] KVM: x86 emulator: cleanup nop emulation

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Make it more explicit what we are checking for.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a99d49c..03a7291 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2799,8 +2799,8 @@ special_insn:
goto done;
break;
case 0x90: /* nop / xchg r8,rax */
-   if (!(c->rex_prefix & 1)) { /* nop */
-   c->dst.type = OP_NONE;
+   if (c->dst.ptr == (unsigned long *)&c->regs[VCPU_REGS_RAX]) {
+   c->dst.type = OP_NONE;  /* nop */
break;
}
case 0x91 ... 0x97: /* xchg reg,rax */


[COMMIT master] KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

On VMX it is expensive to call get_cached_descriptor() just to get the
segment base, since multiple vmcs_reads are done instead of only one.
Introduce a new callback, get_cached_segment_base(), for efficiency.
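The cost argument can be sketched in isolation (toy code, not KVM's; the read counter and field names are invented): building a full descriptor takes several simulated reads, while a dedicated base accessor takes one.

```c
#include <assert.h>

/* Counts simulated expensive vmcs_read calls. */
static int reads;

struct descriptor {
	unsigned long base, limit, flags;
};

static unsigned long vmcs_read(unsigned long field)
{
	reads++;
	return field;
}

/* Fetching the whole cached descriptor costs three reads. */
static void get_cached_descriptor(struct descriptor *d)
{
	d->base  = vmcs_read(1);
	d->limit = vmcs_read(2);
	d->flags = vmcs_read(3);
}

/* A dedicated base accessor needs only one read. */
static unsigned long get_cached_segment_base(void)
{
	return vmcs_read(1);
}
```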

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f751657..df53ba2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -132,6 +132,7 @@ struct x86_emulate_ops {
  int seg, struct kvm_vcpu *vcpu);
u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
+   unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu);
void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7c8ed56..8228778 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2097,17 +2097,6 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
return true;
 }
 
-static u32 get_cached_descriptor_base(struct x86_emulate_ctxt *ctxt,
- struct x86_emulate_ops *ops,
- int seg)
-{
-   struct desc_struct desc;
-   if (ops->get_cached_descriptor(&desc, seg, ctxt->vcpu))
-   return get_desc_base(&desc);
-   else
-   return ~0;
-}
-
 static void save_state_to_tss16(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops,
struct tss_segment_16 *tss)
@@ -2383,7 +2372,7 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
	int ret;
	u16 old_tss_sel = ops->get_segment_selector(VCPU_SREG_TR, ctxt->vcpu);
ulong old_tss_base =
-   get_cached_descriptor_base(ctxt, ops, VCPU_SREG_TR);
+   ops->get_cached_segment_base(VCPU_SREG_TR, ctxt->vcpu);
u32 desc_limit;
 
/* FIXME: old_tss_base == ~0 ? */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 673efbe..29cc2b1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3669,6 +3669,12 @@ static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu)
	kvm_x86_ops->get_gdt(vcpu, dt);
 }
 
+static unsigned long emulator_get_cached_segment_base(int seg,
+ struct kvm_vcpu *vcpu)
+{
+   return get_segment_base(vcpu, seg);
+}
+
 static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg,
   struct kvm_vcpu *vcpu)
 {
@@ -3759,6 +3765,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_cached_descriptor = emulator_set_cached_descriptor,
.get_segment_selector = emulator_get_segment_selector,
.set_segment_selector = emulator_set_segment_selector,
+   .get_cached_segment_base = emulator_get_cached_segment_base,
.get_gdt = emulator_get_gdt,
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,


[COMMIT master] KVM: x86 emulator: cleanup xchg emulation

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Dst operand is already initialized during decoding stage. No need to
reinitialize.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a81e6bf..a99d49c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2804,8 +2804,8 @@ special_insn:
break;
}
case 0x91 ... 0x97: /* xchg reg,rax */
-   c->src.type = c->dst.type = OP_REG;
-   c->src.bytes = c->dst.bytes = c->op_bytes;
+   c->src.type = OP_REG;
+   c->src.bytes = c->op_bytes;
	c->src.ptr = (unsigned long *) &c->regs[VCPU_REGS_RAX];
	c->src.val = *(c->src.ptr);
goto xchg;


[COMMIT master] KVM: handle emulation failure case first

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

If emulation failed return immediately.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 01bb1f3..4121a9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3879,22 +3879,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 
 restart:
	r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
-   shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
-
-   if (r == 0)
-   kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
-
-   if (vcpu->arch.pio.count) {
-   if (!vcpu->arch.pio.in)
-   vcpu->arch.pio.count = 0;
-   return EMULATE_DO_MMIO;
-   }
-
-   if (vcpu->mmio_needed) {
-   if (vcpu->mmio_is_write)
-   vcpu->mmio_needed = 0;
-   return EMULATE_DO_MMIO;
-   }
 
if (r) { /* emulation failed */
/*
@@ -3910,6 +3894,21 @@ restart:
return EMULATE_FAIL;
}
 
+   shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
+   kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+
+   if (vcpu->arch.pio.count) {
+   if (!vcpu->arch.pio.in)
+   vcpu->arch.pio.count = 0;
+   return EMULATE_DO_MMIO;
+   }
+
+   if (vcpu->mmio_needed) {
+   if (vcpu->mmio_is_write)
+   vcpu->mmio_needed = 0;
+   return EMULATE_DO_MMIO;
+   }
+
	if (vcpu->arch.exception.pending)
		vcpu->arch.emulate_ctxt.restart = false;
 


[COMMIT master] KVM: x86 emulator: handle far address source operand

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

The ljmp/lcall instruction operand contains an address and a segment;
it can be up to 10 bytes long. Currently we decode it as two different
operands. Fix that by introducing a new kind of operand that can hold
an entire far address.
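A far pointer immediate is the offset (op_bytes wide) followed by a 2-byte selector, which is where c->src.bytes = c->op_bytes + 2 comes from (10 bytes when op_bytes is 8). A standalone sketch of decoding one (hypothetical helper, not the emulator's code; x86 immediates are little-endian):

```c
#include <assert.h>
#include <stdint.h>

struct far_addr {
	uint64_t offset;
	uint16_t selector;
};

/* Total operand size: offset plus the 2-byte selector. */
static unsigned int faddr_size(unsigned int op_bytes)
{
	return op_bytes + 2;
}

/* Decode a little-endian far pointer immediate from an instruction
 * byte stream: op_bytes of offset, then a 16-bit segment selector. */
static struct far_addr decode_faddr(const uint8_t *insn, unsigned int op_bytes)
{
	struct far_addr fa = {0, 0};
	unsigned int i;

	for (i = 0; i < op_bytes; i++)
		fa.offset |= (uint64_t)insn[i] << (8 * i);
	fa.selector = (uint16_t)(insn[op_bytes] | (insn[op_bytes + 1] << 8));
	return fa;
}
```

For example, the bytes of an lcall ptr16:32 target 0x0008:0x00401000 decode to offset 0x00401000 and selector 0x0008.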

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 288cbed..69a64a6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -143,7 +143,11 @@ struct x86_emulate_ops {
 struct operand {
enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
unsigned int bytes;
-   unsigned long val, orig_val, *ptr;
+   unsigned long orig_val, *ptr;
+   union {
+   unsigned long val;
+   char valptr[sizeof(unsigned long) + 2];
+   };
 };
 
 struct fetch_cache {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 03a7291..687ea09 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -67,6 +67,8 @@
 #define SrcImmUByte (8<<4)  /* 8-bit unsigned immediate operand. */
 #define SrcImmU (9<<4)  /* Immediate operand, unsigned */
 #define SrcSI   (0xa<<4)   /* Source is in the DS:RSI */
+#define SrcImmFAddr (0xb<<4)   /* Source is immediate far address */
+#define SrcMemFAddr (0xc<<4)   /* Source is far address in memory */
 #define SrcMask (0xf<<4)
 /* Generic ModRM decode. */
 #define ModRM   (1<<8)
@@ -88,10 +90,6 @@
 #define Src2CL  (1<<29)
 #define Src2ImmByte (2<<29)
 #define Src2One (3<<29)
-#define Src2Imm16   (4<<29)
-#define Src2Mem16   (5<<29) /* Used for Ep encoding. First argument has to be
-  in memory and second argument is located
-  immediately after the first one in memory. */
 #define Src2Mask    (7<<29)
 
 enum {
@@ -175,7 +173,7 @@ static u32 opcode_table[256] = {
/* 0x90 - 0x97 */
DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg,
/* 0x98 - 0x9F */
-   0, 0, SrcImm | Src2Imm16 | No64, 0,
+   0, 0, SrcImmFAddr | No64, 0,
ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
/* 0xA0 - 0xA7 */
ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
@@ -215,7 +213,7 @@ static u32 opcode_table[256] = {
ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
/* 0xE8 - 0xEF */
SrcImm | Stack, SrcImm | ImplicitOps,
-   SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
+   SrcImmFAddr | No64, SrcImmByte | ImplicitOps,
SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
/* 0xF0 - 0xF7 */
@@ -350,7 +348,7 @@ static u32 group_table[] = {
[Group5*8] =
DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM,
SrcMem | ModRM | Stack, 0,
-   SrcMem | ModRM | Stack, SrcMem | ModRM | Src2Mem16 | ImplicitOps,
+   SrcMem | ModRM | Stack, SrcMemFAddr | ModRM | ImplicitOps,
SrcMem | ModRM | Stack, 0,
[Group7*8] =
0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv,
@@ -576,6 +574,13 @@ static u32 group2_table[] = {
(_type)_x;  \
 })
 
+#define insn_fetch_arr(_arr, _size, _eip)\
+({ rc = do_insn_fetch(ctxt, ops, (_eip), _arr, (_size));   \
+   if (rc != X86EMUL_CONTINUE) \
+   goto done;  \
+   (_eip) += (_size);  \
+})
+
 static inline unsigned long ad_mask(struct decode_cache *c)
 {
	return (1UL << (c->ad_bytes << 3)) - 1;
@@ -1160,6 +1165,17 @@ done_prefixes:
 c->regs[VCPU_REGS_RSI]);
	c->src.val = 0;
	break;
+   case SrcImmFAddr:
+   c->src.type = OP_IMM;
+   c->src.ptr = (unsigned long *)c->eip;
+   c->src.bytes = c->op_bytes + 2;
+   insn_fetch_arr(c->src.valptr, c->src.bytes, c->eip);
+   break;
+   case SrcMemFAddr:
+   c->src.type = OP_MEM;
+   c->src.ptr = (unsigned long *)c->modrm_ea;
+   c->src.bytes = c->op_bytes + 2;
+   break;
}
 
/*
@@ -1179,22 +1195,10 @@ done_prefixes:
	c->src2.bytes = 1;
	c->src2.val = insn_fetch(u8, 1, c->eip);
	break;
-   case Src2Imm16:
-   c->src2.type = OP_IMM;
-   c->src2.ptr = (unsigned long *)c->eip;
-   c->src2.bytes = 2;
-   c->src2.val = insn_fetch(u16, 2, c->eip);
-   break;
	case Src2One:
	c->src2.bytes = 1;
	c->src2.val = 1;
	break;
-   case Src2Mem16:
-   c->src2.type = OP_MEM;
-   c->src2.bytes = 2;
-   

[COMMIT master] KVM: x86 emulator: set RFLAGS outside x86 emulator code

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Removes the need for set_flags() callback.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b7e00cb..a87d95f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -142,7 +142,6 @@ struct x86_emulate_ops {
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
-   void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 437f31b..291e220 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3034,7 +3034,6 @@ writeback:
/* Commit shadow register state. */
	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
	ctxt->eip = c->eip;
-   ops->set_rflags(ctxt->vcpu, ctxt->eflags);
 
 done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3544ea9..f42be00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3761,11 +3761,6 @@ static void emulator_set_segment_selector(u16 sel, int seg,
	kvm_set_segment(vcpu, &kvm_seg, seg);
 }
 
-static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
-   kvm_x86_ops->set_rflags(vcpu, rflags);
-}
-
 static struct x86_emulate_ops emulate_ops = {
.read_std= kvm_read_guest_virt_system,
.write_std   = kvm_write_guest_virt_system,
@@ -3784,7 +3779,6 @@ static struct x86_emulate_ops emulate_ops = {
.get_cr  = emulator_get_cr,
.set_cr  = emulator_set_cr,
.cpl = emulator_get_cpl,
-   .set_rflags  = emulator_set_rflags,
.get_dr  = emulator_get_dr,
.set_dr  = emulator_set_dr,
.set_msr = kvm_set_msr,
@@ -3896,6 +3890,7 @@ restart:
 
	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+   kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);

	if (vcpu->arch.pio.count) {


[COMMIT master] KVM: fill in run->mmio details in (read|write)_emulated function

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Fill in run->mmio details in (read|write)_emulated function just like
pio does. There is no point in filling only vcpu fields there just to
copy them into vcpu->run a little bit later.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfad042..55496f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3341,9 +3341,10 @@ mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0);
 
	vcpu->mmio_needed = 1;
-   vcpu->mmio_phys_addr = gpa;
-   vcpu->mmio_size = bytes;
-   vcpu->mmio_is_write = 0;
+   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0;
 
return X86EMUL_UNHANDLEABLE;
 }
@@ -3391,10 +3392,11 @@ mmio:
return X86EMUL_CONTINUE;
 
	vcpu->mmio_needed = 1;
-   vcpu->mmio_phys_addr = gpa;
-   vcpu->mmio_size = bytes;
-   vcpu->mmio_is_write = 1;
-   memcpy(vcpu->mmio_data, val, bytes);
+   vcpu->run->exit_reason = KVM_EXIT_MMIO;
+   vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+   vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+   vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+   memcpy(vcpu->run->mmio.data, val, bytes);
 
return X86EMUL_CONTINUE;
 }
@@ -3805,7 +3807,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 {
int r, shadow_mask;
struct decode_cache *c;
-   struct kvm_run *run = vcpu->run;
 
kvm_clear_exception_queue(vcpu);
	vcpu->arch.mmio_fault_cr2 = cr2;
@@ -3892,14 +3893,6 @@ restart:
return EMULATE_DO_MMIO;
}
 
-   if (r || vcpu->mmio_is_write) {
-   run->exit_reason = KVM_EXIT_MMIO;
-   run->mmio.phys_addr = vcpu->mmio_phys_addr;
-   memcpy(run->mmio.data, vcpu->mmio_data, 8);
-   run->mmio.len = vcpu->mmio_size;
-   run->mmio.is_write = vcpu->mmio_is_write;
-   }
-
if (r) {
if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
goto done;


[COMMIT master] KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and
X86EMUL_CMPXCHG_FAILED have the same value so caller cannot
distinguish why function such as emulator_cmpxchg_emulated()
(which can return both X86EMUL_PROPAGATE_FAULT and
X86EMUL_CMPXCHG_FAILED) failed.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 6c4f491..0cf4311 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -51,8 +51,9 @@ struct x86_emulate_ctxt;
 #define X86EMUL_UNHANDLEABLE        1
 /* Terminate emulation but return success to the caller. */
 #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */
-#define X86EMUL_RETRY_INSTR 2 /* retry the instruction for some reason */
-#define X86EMUL_CMPXCHG_FAILED  2 /* cmpxchg did not see expected value */
+#define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */
+#define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
+
 struct x86_emulate_ops {
/*
 * read_std: Read bytes of standard (non-emulated/special) memory.


[COMMIT master] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Use callbacks from x86_emulate_ops to access segments instead of calling
into kvm directly.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8228778..f56ec48 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -622,31 +622,35 @@ static void set_seg_override(struct decode_cache *c, int seg)
	c->seg_override = seg;
 }
 
-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops, int seg)
 {
	if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS)
		return 0;

-   return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg);
+   return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }
 
 static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
+  struct x86_emulate_ops *ops,
   struct decode_cache *c)
 {
	if (!c->has_seg_override)
		return 0;

-   return seg_base(ctxt, c->seg_override);
+   return seg_base(ctxt, ops, c->seg_override);
 }
 
-static unsigned long es_base(struct x86_emulate_ctxt *ctxt)
+static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops)
 {
-   return seg_base(ctxt, VCPU_SREG_ES);
+   return seg_base(ctxt, ops, VCPU_SREG_ES);
 }
 
-static unsigned long ss_base(struct x86_emulate_ctxt *ctxt)
+static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
+struct x86_emulate_ops *ops)
 {
-   return seg_base(ctxt, VCPU_SREG_SS);
+   return seg_base(ctxt, ops, VCPU_SREG_SS);
 }
 
 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
@@ -941,7 +945,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
	memset(c, 0, sizeof(struct decode_cache));
	c->eip = ctxt->eip;
	c->fetch.start = c->fetch.end = c->eip;
-   ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
+   ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
switch (mode) {
@@ -1065,7 +1069,7 @@ done_prefixes:
set_seg_override(c, VCPU_SREG_DS);
 
	if (!(!c->twobyte && c->b == 0x8d))
-   c->modrm_ea += seg_override_base(ctxt, c);
+   c->modrm_ea += seg_override_base(ctxt, ops, c);

	if (c->ad_bytes != 8)
		c->modrm_ea = (u32)c->modrm_ea;
@@ -1161,7 +1165,7 @@ done_prefixes:
	c->src.type = OP_MEM;
	c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
	c->src.ptr = (unsigned long *)
-   register_address(c,  seg_override_base(ctxt, c),
+   register_address(c,  seg_override_base(ctxt, ops, c),
 c->regs[VCPU_REGS_RSI]);
	c->src.val = 0;
break;
@@ -1257,7 +1261,7 @@ done_prefixes:
	c->dst.type = OP_MEM;
	c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
	c->dst.ptr = (unsigned long *)
-   register_address(c, es_base(ctxt),
+   register_address(c, es_base(ctxt, ops),
 c->regs[VCPU_REGS_RDI]);
	c->dst.val = 0;
break;
@@ -1516,7 +1520,8 @@ exception:
return X86EMUL_PROPAGATE_FAULT;
 }
 
-static inline void emulate_push(struct x86_emulate_ctxt *ctxt)
+static inline void emulate_push(struct x86_emulate_ctxt *ctxt,
+   struct x86_emulate_ops *ops)
 {
	struct decode_cache *c = &ctxt->decode;
 
@@ -1524,7 +1529,7 @@ static inline void emulate_push(struct x86_emulate_ctxt *ctxt)
	c->dst.bytes = c->op_bytes;
	c->dst.val = c->src.val;
	register_address_increment(c, &c->regs[VCPU_REGS_RSP], -c->op_bytes);
-   c->dst.ptr = (void *) register_address(c, ss_base(ctxt),
+   c->dst.ptr = (void *) register_address(c, ss_base(ctxt, ops),
    c->regs[VCPU_REGS_RSP]);
 }
 
@@ -1535,7 +1540,7 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
	struct decode_cache *c = &ctxt->decode;
	int rc;

-   rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt),
+   rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt, ops),
    c->regs[VCPU_REGS_RSP]),
    dest, len);
if (rc != X86EMUL_CONTINUE)
@@ -1588,15 +1593,14 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
return rc;
 }
 
-static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
+static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt,
+ struct 

[COMMIT master] KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Add (set|get)_msr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c37296d..f751657 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -139,6 +139,8 @@ struct x86_emulate_ops {
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
+   int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8a4aa73..7c8ed56 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1875,7 +1875,7 @@ setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
 }
 
 static int
-emulate_syscall(struct x86_emulate_ctxt *ctxt)
+emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
	struct decode_cache *c = &ctxt->decode;
	struct kvm_segment cs, ss;
@@ -1890,7 +1890,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
	setup_syscalls_segments(ctxt, &cs, &ss);
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
	msr_data >>= 32;
	cs.selector = (u16)(msr_data & 0xfffc);
	ss.selector = (u16)(msr_data + 8);
@@ -1907,17 +1907,17 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 #ifdef CONFIG_X86_64
	c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
 
-   kvm_x86_ops->get_msr(ctxt->vcpu,
-   ctxt->mode == X86EMUL_MODE_PROT64 ?
-   MSR_LSTAR : MSR_CSTAR, &msr_data);
+   ops->get_msr(ctxt->vcpu,
+ctxt->mode == X86EMUL_MODE_PROT64 ?
+MSR_LSTAR : MSR_CSTAR, &msr_data);
	c->eip = msr_data;

-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
	ctxt->eflags &= ~(msr_data | EFLG_RF);
 #endif
	} else {
		/* legacy mode */
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
		c->eip = (u32)msr_data;

		ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
@@ -1927,7 +1927,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 }
 
 static int
-emulate_sysenter(struct x86_emulate_ctxt *ctxt)
+emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
	struct decode_cache *c = &ctxt->decode;
	struct kvm_segment cs, ss;
@@ -1949,7 +1949,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 
	setup_syscalls_segments(ctxt, &cs, &ss);
 
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
	switch (ctxt->mode) {
	case X86EMUL_MODE_PROT32:
		if ((msr_data & 0xfffc) == 0x0) {
@@ -1979,17 +1979,17 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
	kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
	kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);

-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
	c->eip = msr_data;

-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
	c->regs[VCPU_REGS_RSP] = msr_data;
 
return X86EMUL_CONTINUE;
 }
 
 static int
-emulate_sysexit(struct x86_emulate_ctxt *ctxt)
+emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
	struct decode_cache *c = &ctxt->decode;
	struct kvm_segment cs, ss;
@@ -2012,7 +2012,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 
	cs.dpl = 3;
	ss.dpl = 3;
-   kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
switch (usermode) {
case X86EMUL_MODE_PROT32:
cs.selector = (u16)(msr_data + 16);
@@ -3099,7 +3099,7 @@ twobyte_insn:
}
break;
case 0x05:  /* syscall */
-   rc = emulate_syscall(ctxt);
+   rc = emulate_syscall(ctxt, ops);
if (rc != X86EMUL_CONTINUE)
goto done;
else
@@ -3155,7 +3155,7 @@ twobyte_insn:
/* wrmsr */
		msr_data = (u32)c->regs[VCPU_REGS_RAX]
| 

[COMMIT master] KVM: x86 emulator: move interruptibility state tracking out of emulator

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 97a42e8..c40b405 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1843,20 +1843,6 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
 }
 
-static void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask)
-{
-   u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(ctxt->vcpu, mask);
-   /*
-* an sti; sti; sequence only disable interrupts for the first
-* instruction. So, if the last instruction, be it emulated or
-* not, left the system with the INT_STI flag enabled, it
-* means that the last instruction is an sti. We should not
-* leave the flag on in this case. The same goes for mov ss
-*/
-   if (!(int_shadow & mask))
-   ctxt->interruptibility = mask;
-}
-
 static inline void
 setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops, struct desc_struct *cs,
@@ -2516,7 +2502,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
	int rc = X86EMUL_CONTINUE;
	int saved_dst_type = c->dst.type;

-   ctxt->interruptibility = 0;
	ctxt->decode.mem_read.pos = 0;
 
if (ctxt-mode == X86EMUL_MODE_PROT64  (c-d  No64)) {
@@ -2789,7 +2774,7 @@ special_insn:
}
 
		if (c->modrm_reg == VCPU_SREG_SS)
-			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
+			ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
 
		rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
 
@@ -2958,7 +2943,7 @@ special_insn:
if (emulator_bad_iopl(ctxt, ops))
			kvm_inject_gp(ctxt->vcpu, 0);
		else {
-			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
+			ctxt->interruptibility = KVM_X86_SHADOW_INT_STI;
			ctxt->eflags |= X86_EFLAGS_IF;
			c->dst.type = OP_NONE;	/* Disable writeback. */
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d84d531..2b29ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3793,12 +3793,26 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
	vcpu->arch.regs_dirty = ~0;
 }
 
+static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
+{
+	u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask);
+	/*
+	 * an sti; sti; sequence only disable interrupts for the first
+	 * instruction. So, if the last instruction, be it emulated or
+	 * not, left the system with the INT_STI flag enabled, it
+	 * means that the last instruction is an sti. We should not
+	 * leave the flag on in this case. The same goes for mov ss
+	 */
+	if (!(int_shadow & mask))
+		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+}
+
 int emulate_instruction(struct kvm_vcpu *vcpu,
unsigned long cr2,
u16 error_code,
int emulation_type)
 {
-   int r, shadow_mask;
+   int r;
	struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
 
kvm_clear_exception_queue(vcpu);
@@ -3826,6 +3840,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
memset(c, 0, sizeof(struct decode_cache));
	memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+	vcpu->arch.emulate_ctxt.interruptibility = 0;
 
	r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
trace_kvm_emulate_insn_start(vcpu);
@@ -3893,8 +3908,7 @@ restart:
return EMULATE_FAIL;
}
 
-	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
-	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+	toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
	memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] KVM: x86 emulator: do not inject exception directly into vcpu

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Return exception as a result of instruction emulation and handle
injection in KVM code.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

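The pattern introduced here, recording a pending fault in the emulation context instead of injecting it into the vcpu, can be sketched in isolation. The struct and helper names below mirror the patch but are simplified stand-ins (no real KVM types are used):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vector numbers standing in for UD_VECTOR/GP_VECTOR/PF_VECTOR. */
enum { VEC_UD = 6, VEC_GP = 13, VEC_PF = 14 };

struct emul_ctxt {
	int exception;           /* pending vector, or -1 for none */
	unsigned int error_code;
	bool error_code_valid;
	unsigned long cr2;       /* faulted address for #PF */
	bool restart;            /* string-instruction restart flag */
};

/* Record the fault in the context; the caller injects it later. */
static void emulate_exception(struct emul_ctxt *ctxt, int vec,
			      unsigned int error, bool valid)
{
	ctxt->exception = vec;
	ctxt->error_code = error;
	ctxt->error_code_valid = valid;
	ctxt->restart = false;   /* a fault aborts any pending restart */
}

static void emulate_gp(struct emul_ctxt *ctxt, unsigned int err)
{
	emulate_exception(ctxt, VEC_GP, err, true);
}

static void emulate_pf(struct emul_ctxt *ctxt, unsigned long addr,
		       unsigned int err)
{
	ctxt->cr2 = addr;
	emulate_exception(ctxt, VEC_PF, err, true);
}
```

The design win is decoupling: the emulator core stays free of vcpu accessors, and the caller decides when and how to deliver the queued exception.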
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index a87d95f..51cfd73 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -216,6 +216,12 @@ struct x86_emulate_ctxt {
int interruptibility;
 
bool restart; /* restart string instruction after writeback */
+
+   int exception; /* exception that happens during emulation or -1 */
+   u32 error_code; /* error code for exception */
+   bool error_code_valid;
+   unsigned long cr2; /* faulted address in case of #PF */
+
/* decode cache */
struct decode_cache decode;
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c40b405..b43ac98 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -653,6 +653,37 @@ static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
return seg_base(ctxt, ops, VCPU_SREG_SS);
 }
 
+static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
+ u32 error, bool valid)
+{
+	ctxt->exception = vec;
+	ctxt->error_code = error;
+	ctxt->error_code_valid = valid;
+	ctxt->restart = false;
+}
+
+static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err)
+{
+   emulate_exception(ctxt, GP_VECTOR, err, true);
+}
+
+static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr,
+  int err)
+{
+	ctxt->cr2 = addr;
+	emulate_exception(ctxt, PF_VECTOR, err, true);
+}
+
+static void emulate_ud(struct x86_emulate_ctxt *ctxt)
+{
+   emulate_exception(ctxt, UD_VECTOR, 0, false);
+}
+
+static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err)
+{
+   emulate_exception(ctxt, TS_VECTOR, err, true);
+}
+
 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
			      struct x86_emulate_ops *ops,
			      unsigned long eip, u8 *dest)
@@ -1285,7 +1316,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt,
		rc = ops->read_emulated(addr, mc->data + mc->end, n, &err,
					ctxt->vcpu);
		if (rc == X86EMUL_PROPAGATE_FAULT)
-			kvm_inject_page_fault(ctxt->vcpu, addr, err);
+			emulate_pf(ctxt, addr, err);
		if (rc != X86EMUL_CONTINUE)
			return rc;
		mc->end += n;
@@ -1366,13 +1397,13 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt,
	get_descriptor_table_ptr(ctxt, ops, selector, &dt);
 
	if (dt.size < index * 8 + 7) {
-		kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+		emulate_gp(ctxt, selector & 0xfffc);
		return X86EMUL_PROPAGATE_FAULT;
	}
	addr = dt.address + index * 8;
	ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
	if (ret == X86EMUL_PROPAGATE_FAULT)
-		kvm_inject_page_fault(ctxt->vcpu, addr, err);
+		emulate_pf(ctxt, addr, err);
 
return ret;
 }
@@ -1391,14 +1422,14 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
	get_descriptor_table_ptr(ctxt, ops, selector, &dt);
 
	if (dt.size < index * 8 + 7) {
-		kvm_inject_gp(ctxt->vcpu, selector & 0xfffc);
+		emulate_gp(ctxt, selector & 0xfffc);
		return X86EMUL_PROPAGATE_FAULT;
	}
 
	addr = dt.address + index * 8;
	ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
	if (ret == X86EMUL_PROPAGATE_FAULT)
-		kvm_inject_page_fault(ctxt->vcpu, addr, err);
+		emulate_pf(ctxt, addr, err);
 
return ret;
 }
@@ -1517,7 +1548,7 @@ load:
	ops->set_cached_descriptor(&seg_desc, seg, ctxt->vcpu);
	return X86EMUL_CONTINUE;
 exception:
-	kvm_queue_exception_e(ctxt->vcpu, err_vec, err_code);
+	emulate_exception(ctxt, err_vec, err_code, true);
return X86EMUL_PROPAGATE_FAULT;
 }
 
@@ -1578,7 +1609,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
break;
case X86EMUL_MODE_VM86:
		if (iopl < 3) {
-			kvm_inject_gp(ctxt->vcpu, 0);
+			emulate_gp(ctxt, 0);
return X86EMUL_PROPAGATE_FAULT;
}
change_mask |= EFLG_IF;
@@ -1829,7 +1860,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
						&err,
						ctxt->vcpu);
		if (rc == X86EMUL_PROPAGATE_FAULT)
-			kvm_inject_page_fault(ctxt->vcpu,
+			emulate_pf(ctxt,
  (unsigned 

[COMMIT master] KVM: x86 emulator: make (get|set)_dr() callback return error if it fails

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Make (get|set)_dr() callback return error if it fails instead of
injecting exception behind emulator's back.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

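The return-value convention this patch establishes (a positive result means "raise #UD", a negative one "raise #GP", zero is success, and only a thin wrapper does the actual injection) can be sketched standalone. Everything below is a simplified illustration; `set_dr_core`, `injected`, and the fault enum are hypothetical names, not KVM's:

```c
#include <assert.h>

enum fault { FAULT_NONE, FAULT_UD, FAULT_GP };

static enum fault injected; /* records what the wrapper would inject */

/*
 * Hypothetical core helper following the patch's convention:
 * return 1 when the access should raise #UD, -1 for #GP, 0 on success.
 */
static int set_dr_core(int dr, unsigned long long val, int cr4_de)
{
	if ((dr == 4 || dr == 5) && cr4_de)
		return 1;                  /* #UD: aliases disabled by CR4.DE */
	if ((dr == 6 || dr == 7) && (val >> 32))
		return -1;                 /* #GP: upper 32 bits set */
	return 0;
}

/* Public wrapper: translate the error code into an injected fault. */
static int set_dr(int dr, unsigned long long val, int cr4_de)
{
	int res = set_dr_core(dr, val, cr4_de);

	if (res > 0)
		injected = FAULT_UD;
	else if (res < 0)
		injected = FAULT_GP;
	return res;
}
```

Keeping the core side-effect free lets the emulator's callback report the failure to its caller instead of reaching into the vcpu itself, which is the whole point of the series.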
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 061f7d3..d5979ec 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3151,9 +3151,14 @@ twobyte_insn:
goto done;
}
 
-	ops->set_dr(c->modrm_reg,c->regs[c->modrm_rm] &
-		((ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U),
-		ctxt->vcpu);
+	if (ops->set_dr(c->modrm_reg, c->regs[c->modrm_rm] &
+			((ctxt->mode == X86EMUL_MODE_PROT64) ?
+			 ~0ULL : ~0U), ctxt->vcpu) < 0) {
+		/* #UD condition is already handled by the code above */
+		kvm_inject_gp(ctxt->vcpu, 0);
+		goto done;
+	}
+
	c->dst.type = OP_NONE;	/* no writeback */
break;
case 0x30:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6c799d..dfad042 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -573,7 +573,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
-int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 {
switch (dr) {
case 0 ... 3:
@@ -582,29 +582,21 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
		vcpu->arch.eff_db[dr] = val;
break;
case 4:
-		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-			kvm_queue_exception(vcpu, UD_VECTOR);
-			return 1;
-		}
+		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
+			return 1; /* #UD */
		/* fall through */
	case 6:
-		if (val & 0xffffffff00000000ULL) {
-			kvm_inject_gp(vcpu, 0);
-			return 1;
-		}
+		if (val & 0xffffffff00000000ULL)
+			return -1; /* #GP */
		vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1;
break;
	case 5:
-		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-			kvm_queue_exception(vcpu, UD_VECTOR);
-			return 1;
-		}
+		if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
+			return 1; /* #UD */
		/* fall through */
	default: /* 7 */
-		if (val & 0xffffffff00000000ULL) {
-			kvm_inject_gp(vcpu, 0);
-			return 1;
-		}
+		if (val & 0xffffffff00000000ULL)
+			return -1; /* #GP */
		vcpu->arch.dr7 = (val & DR7_VOLATILE) | DR7_FIXED_1;
		if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
			kvm_x86_ops->set_dr7(vcpu, vcpu->arch.dr7);
@@ -615,28 +607,37 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 
return 0;
 }
+
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
+{
+   int res;
+
+	res = __kvm_set_dr(vcpu, dr, val);
+	if (res > 0)
+		kvm_queue_exception(vcpu, UD_VECTOR);
+	else if (res < 0)
+		kvm_inject_gp(vcpu, 0);
+
+   return res;
+}
 EXPORT_SYMBOL_GPL(kvm_set_dr);
 
-int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
+static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 {
switch (dr) {
case 0 ... 3:
		*val = vcpu->arch.db[dr];
break;
case 4:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
return 1;
-   }
/* fall through */
case 6:
		*val = vcpu->arch.dr6;
break;
case 5:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) {
-   kvm_queue_exception(vcpu, UD_VECTOR);
+   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
return 1;
-   }
/* fall through */
default: /* 7 */
		*val = vcpu->arch.dr7;
@@ -645,6 +646,15 @@ int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 
return 0;
 }
+
+int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
+{
+   if (_kvm_get_dr(vcpu, dr, val)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+   return 0;
+}
 EXPORT_SYMBOL_GPL(kvm_get_dr);
 
 static inline u32 bit(int bitno)
@@ 

[COMMIT master] KVM: x86 emulator: handle shadowed registers outside emulator

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Emulator shouldn't access vcpu directly.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

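The copy-in/commit-on-success pattern that this patch moves from the emulator core into the caller can be shown with a small standalone sketch; `vcpu`, `cache`, `emulate_one`, and `run_emulation` are illustrative names, not KVM's real types:

```c
#include <assert.h>
#include <string.h>

#define NR_REGS 16

struct vcpu  { unsigned long regs[NR_REGS]; };
struct cache { unsigned long regs[NR_REGS]; };

/* Hypothetical emulator body: works only on the shadow copy. */
static int emulate_one(struct cache *c)
{
	c->regs[0] += 1;  /* e.g. "inc rax" */
	return 0;         /* 0 = success */
}

/*
 * After the patch, the *caller* makes the shadow copy before decode
 * and commits it back after successful emulation, so the emulator
 * core never touches the vcpu directly.
 */
static int run_emulation(struct vcpu *v, struct cache *c)
{
	memcpy(c->regs, v->regs, sizeof(c->regs));   /* shadow copy in */
	if (emulate_one(c))
		return -1;                           /* failure: vcpu untouched */
	memcpy(v->regs, c->regs, sizeof(v->regs));   /* commit on success */
	return 0;
}
```

Centralizing the two `memcpy` calls in the caller is what lets the decode path (`x86_decode_insn`) and the task-switch path share one copy of the shadowing logic.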
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 42cb7d7..97a42e8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -941,12 +941,9 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
	/* we cannot decode insn before we complete previous rep insn */
	WARN_ON(ctxt->restart);
 
-	/* Shadow copy of register state. Committed on successful emulation. */
-	memset(c, 0, sizeof(struct decode_cache));
	c->eip = ctxt->eip;
	c->fetch.start = c->fetch.end = c->eip;
	ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
-	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 
switch (mode) {
case X86EMUL_MODE_REAL:
@@ -2486,16 +2483,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
	struct decode_cache *c = &ctxt->decode;
int rc;
 
-	memset(c, 0, sizeof(struct decode_cache));
	c->eip = ctxt->eip;
-	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
	c->dst.type = OP_NONE;
 
rc = emulator_do_task_switch(ctxt, ops, tss_selector, reason,
 has_error_code, error_code);
 
if (rc == X86EMUL_CONTINUE) {
-		memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
		rc = writeback(ctxt, ops);
		if (rc == X86EMUL_CONTINUE)
			ctxt->eip = c->eip;
@@ -2525,13 +2519,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
	ctxt->interruptibility = 0;
	ctxt->decode.mem_read.pos = 0;
 
-	/* Shadow copy of register state. Committed on successful emulation.
-	 * NOTE: we can copy them from vcpu as x86_decode_insn() doesn't
-	 * modify them.
-	 */
-
-	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
-
	if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
		goto done;
@@ -3031,8 +3018,6 @@ writeback:
 * without decoding
 */
	ctxt->decode.mem_read.end = 0;
-	/* Commit shadow register state. */
-	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
	ctxt->eip = c->eip;
 
 done:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f42be00..d84d531 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3799,7 +3799,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
int emulation_type)
 {
int r, shadow_mask;
-	struct decode_cache *c;
+	struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
 
	kvm_clear_exception_queue(vcpu);
	vcpu->arch.mmio_fault_cr2 = cr2;
@@ -3824,13 +3824,14 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
? X86EMUL_MODE_VM86 : cs_l
? X86EMUL_MODE_PROT64 : cs_db
? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
+	memset(c, 0, sizeof(struct decode_cache));
+	memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
 
	r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
trace_kvm_emulate_insn_start(vcpu);
 
/* Only allow emulation of specific instructions on #UD
 * (namely VMMCALL, sysenter, sysexit, syscall)*/
-	c = &vcpu->arch.emulate_ctxt.decode;
	if (emulation_type & EMULTYPE_TRAP_UD) {
		if (!c->twobyte)
return EMULATE_FAIL;
@@ -3871,6 +3872,10 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
return EMULATE_DONE;
}
 
+	/* this is needed for the vmware backdoor interface to work since it
+	   changes register values during IO operation */
+	memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+
 restart:
	r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
 
@@ -3891,6 +3896,7 @@ restart:
	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
+	memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 
	if (vcpu->arch.pio.count) {
@@ -4874,6 +4880,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
bool has_error_code, u32 error_code)
 {
+	struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;
int cs_db, cs_l, ret;
cache_all_regs(vcpu);
 
@@ -4888,6 +4895,8 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
? X86EMUL_MODE_VM86 : cs_l
? X86EMUL_MODE_PROT64 : cs_db

[COMMIT master] KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Currently the emulator returns -1 both when emulation failed and when IO is
needed, and the caller tries to guess whether emulation failed by looking at
other variables. Make it easier for the caller to recognise the error
condition by returning -1 only in case of failure. For this, a new
emulator-internal return value, X86EMUL_IO_NEEDED, is introduced. It is used
to distinguish between the error condition (which returns
X86EMUL_UNHANDLEABLE) and the condition that requires an IO exit to
userspace to continue emulation.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

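The dispatch this patch enables, switching on an explicit return code instead of guessing from side flags, can be sketched standalone. The constants and function names below mirror the patch's idea but are simplified stand-ins:

```c
#include <assert.h>

/* Hypothetical constants mirroring the emulator return codes. */
enum {
	EMUL_CONTINUE     = 0,
	EMUL_UNHANDLEABLE = 1,  /* real emulation failure */
	EMUL_IO_NEEDED    = 5,  /* exit to userspace, then resume */
};

enum outcome { OUT_DONE, OUT_DO_MMIO, OUT_FAIL };

/*
 * With a dedicated IO_NEEDED code the caller no longer has to inspect
 * mmio_needed and friends to decide whether -1 meant "failed" or
 * "needs an IO exit".
 */
static enum outcome handle_result(int r)
{
	switch (r) {
	case EMUL_CONTINUE:
		return OUT_DONE;
	case EMUL_IO_NEEDED:
		return OUT_DO_MMIO;
	default:
		return OUT_FAIL;
	}
}
```

This is why the patch can collapse the caller's `if (r) { ... if (!vcpu->mmio_needed) ... }` guesswork into two independent checks.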
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0cf4311..777240d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -53,6 +53,7 @@ struct x86_emulate_ctxt;
 #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */
 #define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */
 #define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
+#define X86EMUL_IO_NEEDED   5 /* IO is needed to complete emulation */
 
 struct x86_emulate_ops {
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 55496f4..adf82ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3230,7 +3230,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void 
*val, unsigned int bytes,
}
		ret = kvm_read_guest(vcpu->kvm, gpa, data, toread);
		if (ret < 0) {
-			r = X86EMUL_UNHANDLEABLE;
+			r = X86EMUL_IO_NEEDED;
goto out;
}
 
@@ -3286,7 +3286,7 @@ static int kvm_write_guest_virt_system(gva_t addr, void 
*val,
}
		ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite);
		if (ret < 0) {
-			r = X86EMUL_UNHANDLEABLE;
+			r = X86EMUL_IO_NEEDED;
goto out;
}
 
@@ -3346,7 +3346,7 @@ mmio:
	vcpu->run->mmio.len = vcpu->mmio_size = bytes;
	vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0;
 
-	return X86EMUL_UNHANDLEABLE;
+	return X86EMUL_IO_NEEDED;
 }
 
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -3818,8 +3818,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 */
cache_all_regs(vcpu);
 
-	vcpu->mmio_is_write = 0;
-
	if (!(emulation_type & EMULTYPE_NO_DECODE)) {
		int cs_db, cs_l;
		kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
@@ -3893,24 +3891,26 @@ restart:
return EMULATE_DO_MMIO;
}
 
-	if (r) {
-		if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
-			goto done;
-		if (!vcpu->mmio_needed) {
-			++vcpu->stat.insn_emulation_fail;
-			trace_kvm_emulate_insn_failed(vcpu);
-			kvm_report_emulation_failure(vcpu, "mmio");
-			return EMULATE_FAIL;
-		}
+	if (vcpu->mmio_needed) {
+		if (vcpu->mmio_is_write)
+			vcpu->mmio_needed = 0;
		return EMULATE_DO_MMIO;
	}
 
-	if (vcpu->mmio_is_write) {
-		vcpu->mmio_needed = 0;
-		return EMULATE_DO_MMIO;
+	if (r) { /* emulation failed */
+		/*
+		 * if emulation was due to access to shadowed page table
+		 * and it failed try to unshadow page and re-enter the
+		 * guest to let CPU execute the instruction.
+		 */
+		if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
+			return EMULATE_DONE;
+
+		trace_kvm_emulate_insn_failed(vcpu);
+		kvm_report_emulation_failure(vcpu, "mmio");
+		return EMULATE_FAIL;
	}
 
-done:
	if (vcpu->arch.exception.pending)
		vcpu->arch.emulate_ctxt.restart = false;
 


[COMMIT master] KVM: x86 emulator: advance RIP outside x86 emulator code

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Return new RIP as part of instruction emulation result instead of
updating KVM's RIP from x86 emulator code.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d7a18a0..437f31b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2496,8 +2496,9 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 
if (rc == X86EMUL_CONTINUE) {
		memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
-		kvm_rip_write(ctxt->vcpu, c->eip);
		rc = writeback(ctxt, ops);
+		if (rc == X86EMUL_CONTINUE)
+			ctxt->eip = c->eip;
}
 
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
@@ -2554,7 +2555,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
		if (address_mask(c, c->regs[VCPU_REGS_RCX]) == 0) {
		string_done:
			ctxt->restart = false;
-			kvm_rip_write(ctxt->vcpu, c->eip);
+			ctxt->eip = c->eip;
goto done;
}
/* The second termination condition only applies for REPE
@@ -3032,7 +3033,7 @@ writeback:
	ctxt->decode.mem_read.end = 0;
	/* Commit shadow register state. */
	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
-	kvm_rip_write(ctxt->vcpu, c->eip);
+	ctxt->eip = c->eip;
	ops->set_rflags(ctxt->vcpu, ctxt->eflags);
 
 done:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4121a9f..3544ea9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3896,6 +3896,7 @@ restart:
 
	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
 
	if (vcpu->arch.pio.count) {
		if (!vcpu->arch.pio.in)
@@ -4900,6 +4901,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
if (ret)
return EMULATE_FAIL;
 
+	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
return EMULATE_DONE;
 }


[COMMIT master] KVM: x86 emulator: make set_cr() callback return error if it fails

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Make set_cr() callback return error if it fails instead of injecting #GP
behind emulator's back.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index df53ba2..6c4f491 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -135,7 +135,7 @@ struct x86_emulate_ops {
	unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu);
void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
-   void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
+   int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f56ec48..061f7d3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2272,7 +2272,10 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
	struct decode_cache *c = &ctxt->decode;
int ret;
 
-	ops->set_cr(3, tss->cr3, ctxt->vcpu);
+	if (ops->set_cr(3, tss->cr3, ctxt->vcpu)) {
+		kvm_inject_gp(ctxt->vcpu, 0);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
	c->eip = tss->eip;
	ctxt->eflags = tss->eflags | 2;
	c->regs[VCPU_REGS_RAX] = tss->eax;
@@ -3135,7 +3138,10 @@ twobyte_insn:
		c->dst.type = OP_NONE;	/* no writeback */
		break;
	case 0x22: /* mov reg, cr */
-		ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu);
+		if (ops->set_cr(c->modrm_reg, c->modrm_val, ctxt->vcpu)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
		c->dst.type = OP_NONE;
break;
case 0x23: /* mov from reg to dr */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 29cc2b1..f6c799d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -414,57 +414,49 @@ out:
return changed;
 }
 
-void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
	cr0 |= X86_CR0_ET;
 
 #ifdef CONFIG_X86_64
-	if (cr0 & 0xffffffff00000000UL) {
-		kvm_inject_gp(vcpu, 0);
-		return;
-	}
+	if (cr0 & 0xffffffff00000000UL)
+		return 1;
 #endif
 
	cr0 &= ~CR0_RESERVED_BITS;
 
-	if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) {
-		kvm_inject_gp(vcpu, 0);
-		return;
-	}
+	if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD))
+		return 1;
 
-	if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE)) {
-		kvm_inject_gp(vcpu, 0);
-		return;
-	}
+	if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
+		return 1;
 
	if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
 #ifdef CONFIG_X86_64
		if ((vcpu->arch.efer & EFER_LME)) {
			int cs_db, cs_l;
 
-			if (!is_pae(vcpu)) {
-				kvm_inject_gp(vcpu, 0);
-				return;
-			}
+			if (!is_pae(vcpu))
+				return 1;
			kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
-			if (cs_l) {
-				kvm_inject_gp(vcpu, 0);
-				return;
-
-			}
+			if (cs_l)
+				return 1;
		} else
 #endif
-		if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3)) {
-			kvm_inject_gp(vcpu, 0);
-			return;
-		}
-
+		if (is_pae(vcpu) && !load_pdptrs(vcpu, vcpu->arch.cr3))
+			return 1;
	}
 
	kvm_x86_ops->set_cr0(vcpu, cr0);
 
kvm_mmu_reset_context(vcpu);
-   return;
+   return 0;
+}
+
+void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+{
+   if (__kvm_set_cr0(vcpu, cr0))
+   kvm_inject_gp(vcpu, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_set_cr0);
 
@@ -474,61 +466,56 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 }
 EXPORT_SYMBOL_GPL(kvm_lmsw);
 
-void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
unsigned long old_cr4 = kvm_read_cr4(vcpu);
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;
 
-	if (cr4 & CR4_RESERVED_BITS) {
-		kvm_inject_gp(vcpu, 0);
-		return;
-	}
+	if (cr4 & CR4_RESERVED_BITS)
+   return 1;
 

[COMMIT master] KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Add (set|get)_dr callbacks to x86_emulate_ops instead of calling
them directly.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 69a64a6..c37296d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -137,6 +137,8 @@ struct x86_emulate_ops {
void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
+   int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
+   int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f0007b..74cb6ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -590,10 +590,6 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int emulate_clts(struct kvm_vcpu *vcpu);
-int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr,
-   unsigned long *dest);
-int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr,
-   unsigned long value);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 687ea09..8a4aa73 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3132,7 +3132,7 @@ twobyte_insn:
			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
			goto done;
		}
-		emulator_get_dr(ctxt, c->modrm_reg, &c->regs[c->modrm_rm]);
+		ops->get_dr(c->modrm_reg, &c->regs[c->modrm_rm], ctxt->vcpu);
		c->dst.type = OP_NONE;	/* no writeback */
		break;
	case 0x22: /* mov reg, cr */
@@ -3145,7 +3145,10 @@ twobyte_insn:
			kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
			goto done;
		}
-		emulator_set_dr(ctxt, c->modrm_reg, c->regs[c->modrm_rm]);
+
+		ops->set_dr(c->modrm_reg, c->regs[c->modrm_rm] &
+			((ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U),
+			ctxt->vcpu);
		c->dst.type = OP_NONE;	/* no writeback */
break;
case 0x30:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4d0a968..71ff194 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3575,16 +3575,14 @@ int emulate_clts(struct kvm_vcpu *vcpu)
return X86EMUL_CONTINUE;
 }
 
-int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest)
+int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu)
 {
-	return kvm_get_dr(ctxt->vcpu, dr, dest);
+   return kvm_get_dr(vcpu, dr, dest);
 }
 
-int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long value)
+int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu)
 {
-	unsigned long mask = (ctxt->mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U;
-
-	return kvm_set_dr(ctxt->vcpu, dr, value & mask);
+   return kvm_set_dr(vcpu, dr, value);
 }
 
 void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
@@ -3766,6 +3764,8 @@ static struct x86_emulate_ops emulate_ops = {
.set_cr  = emulator_set_cr,
.cpl = emulator_get_cpl,
.set_rflags  = emulator_set_rflags,
+   .get_dr  = emulator_get_dr,
+   .set_dr  = emulator_set_dr,
 };
 
 static void cache_all_regs(struct kvm_vcpu *vcpu)


[COMMIT master] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots

2010-05-05 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

On svm, kvm_read_pdptr() may require reading guest memory, which can sleep.

Push the spinlock into mmu_alloc_roots(), and only take it after we've read
the pdptr.

Tested-by: Joerg Roedel joerg.roe...@amd.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 51eb6d6..de99638 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2065,11 +2065,13 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
direct = 1;
root_gfn = 0;
}
+		spin_lock(&vcpu->kvm->mmu_lock);
		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
				      PT64_ROOT_LEVEL, direct,
				      ACC_ALL, NULL);
		root = __pa(sp->spt);
		++sp->root_count;
+		spin_unlock(&vcpu->kvm->mmu_lock);
		vcpu->arch.mmu.root_hpa = root;
return 0;
}
@@ -2093,11 +2095,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
direct = 1;
			root_gfn = i << 30;
		}
+		spin_lock(&vcpu->kvm->mmu_lock);
		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
				      PT32_ROOT_LEVEL, direct,
				      ACC_ALL, NULL);
		root = __pa(sp->spt);
		++sp->root_count;
+		spin_unlock(&vcpu->kvm->mmu_lock);
+
		vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
	}
	vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
@@ -2466,7 +2471,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
goto out;
	spin_lock(&vcpu->kvm->mmu_lock);
	kvm_mmu_free_some_pages(vcpu);
+	spin_unlock(&vcpu->kvm->mmu_lock);
	r = mmu_alloc_roots(vcpu);
+	spin_lock(&vcpu->kvm->mmu_lock);
mmu_sync_roots(vcpu);
	spin_unlock(&vcpu->kvm->mmu_lock);
if (r)


[COMMIT master] KVM: x86 emulator: introduce read cache

2010-05-05 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Introduce a read cache, which is needed for instructions that require more
than one exit to userspace. After returning from userspace the instruction
will be re-executed using the cached read value.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

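The caching loop this patch adds can be demonstrated with a small standalone sketch. The names below (`backend_read`, `read_cached`, the counter) are illustrative stand-ins for the emulator's slow read path; the replay behaviour after an exit is the part worth seeing:

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 64

struct read_cache {
	unsigned char data[CACHE_SIZE];
	unsigned int pos, end;
};

static int backend_reads; /* counts trips to the slow read path */

/* Hypothetical slow path that would normally cost a userspace exit. */
static void backend_read(unsigned long addr, void *dest, unsigned n)
{
	backend_reads++;
	memset(dest, (int)(addr & 0xff), n); /* deterministic fake data */
}

/*
 * Mirrors read_emulated(): consume cached bytes first; only fall back
 * to the backend for bytes not yet cached.  When the instruction is
 * re-executed after a userspace exit, pos is rewound to 0 and the
 * reads replay from the cache instead of repeating the exits.
 */
static void read_cached(struct read_cache *mc, unsigned long addr,
			void *dest, unsigned size)
{
	unsigned char *out = dest;

	while (size) {
		unsigned n = size < 8u ? size : 8u;

		if (mc->pos >= mc->end) {              /* not cached yet */
			backend_read(addr, mc->data + mc->end, n);
			mc->end += n;
		}
		memcpy(out, mc->data + mc->pos, n);
		mc->pos += n;
		out += n;
		addr += n;
		size -= n;
	}
}
```

Resetting `pos` (but not `end`) before re-execution is what makes the second pass a pure replay; `end` is only cleared once the instruction completes, matching the patch's `mem_read.pos = 0` on entry and `mem_read.end = 0` at writeback.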
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0b2729b..288cbed 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -186,6 +186,7 @@ struct decode_cache {
unsigned long modrm_val;
struct fetch_cache fetch;
struct read_cache io_read;
+   struct read_cache mem_read;
 };
 
 struct x86_emulate_ctxt {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5ac0bb4..776874b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1263,6 +1263,33 @@ done:
return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
 }
 
+static int read_emulated(struct x86_emulate_ctxt *ctxt,
+			 struct x86_emulate_ops *ops,
+			 unsigned long addr, void *dest, unsigned size)
+{
+	int rc;
+	struct read_cache *mc = &ctxt->decode.mem_read;
+
+	while (size) {
+		int n = min(size, 8u);
+		size -= n;
+		if (mc->pos < mc->end)
+			goto read_cached;
+
+		rc = ops->read_emulated(addr, mc->data + mc->end, n, ctxt->vcpu);
+		if (rc != X86EMUL_CONTINUE)
+			return rc;
+		mc->end += n;
+
+	read_cached:
+		memcpy(dest, mc->data + mc->pos, n);
+		mc->pos += n;
+		dest += n;
+		addr += n;
+	}
+	return X86EMUL_CONTINUE;
+}
+
 static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
			   struct x86_emulate_ops *ops,
			   unsigned int size, unsigned short port,
@@ -1504,9 +1531,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
	struct decode_cache *c = &ctxt->decode;
int rc;
 
-   rc = ops-read_emulated(register_address(c, ss_base(ctxt),
-c-regs[VCPU_REGS_RSP]),
-   dest, len, ctxt-vcpu);
+   rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt),
+  c-regs[VCPU_REGS_RSP]),
+  dest, len);
if (rc != X86EMUL_CONTINUE)
return rc;
 
@@ -2475,6 +2502,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	int saved_dst_type = c->dst.type;
 
 	ctxt->interruptibility = 0;
+	ctxt->decode.mem_read.pos = 0;
 
/* Shadow copy of register state. Committed on successful emulation.
 * NOTE: we can copy them from vcpu as x86_decode_insn() doesn't
@@ -2529,20 +2557,16 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
}
 
 	if (c->src.type == OP_MEM) {
-		rc = ops->read_emulated((unsigned long)c->src.ptr,
-					&c->src.val,
-					c->src.bytes,
-					ctxt->vcpu);
+		rc = read_emulated(ctxt, ops, (unsigned long)c->src.ptr,
+				   &c->src.val, c->src.bytes);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		c->src.orig_val = c->src.val;
 	}
 
 	if (c->src2.type == OP_MEM) {
-		rc = ops->read_emulated((unsigned long)c->src2.ptr,
-					&c->src2.val,
-					c->src2.bytes,
-					ctxt->vcpu);
+		rc = read_emulated(ctxt, ops, (unsigned long)c->src2.ptr,
+				   &c->src2.val, c->src2.bytes);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
 
@@ -2553,8 +2577,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
 	if ((c->dst.type == OP_MEM) && !(c->d & Mov)) {
 		/* optimisation - avoid slow emulated read if Mov */
-		rc = ops->read_emulated((unsigned long)c->dst.ptr, &c->dst.val,
-					c->dst.bytes, ctxt->vcpu);
+		rc = read_emulated(ctxt, ops, (unsigned long)c->dst.ptr,
+				   &c->dst.val, c->dst.bytes);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 	}
@@ -2981,7 +3005,11 @@ writeback:
 		    (rc->end != 0 && rc->end == rc->pos))
 			ctxt->restart = false;
 	}
-
+	/*
+	 * reset read cache here in case string instruction is restarted
+	 * without decoding
+	 */
+	ctxt->decode.mem_read.end = 0;
 	/* Commit shadow register state. */

[COMMIT master] KVM: x86: properly update ready_for_interrupt_injection

2010-05-05 Thread Avi Kivity
From: Marcelo Tosatti mtosa...@redhat.com

The recent changes to emulate string instructions without entering guest
mode exposed a bug where pending interrupts are not properly reflected
in ready_for_interrupt_injection.

The result is that userspace overwrites a previously queued interrupt
when irqchips are emulated in userspace.

Fix by always updating state before returning to userspace.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..dff08e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4653,7 +4653,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 	}
 
 	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
-	post_kvm_run_save(vcpu);
 
 	vapic_exit(vcpu);
 
@@ -4703,6 +4702,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	r = __vcpu_run(vcpu);
 
 out:
+	post_kvm_run_save(vcpu);
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &sigsaved, NULL);
 
 


Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-05 Thread Neil Brown
On Wed, 5 May 2010 14:28:41 +0930
Rusty Russell ru...@rustcorp.com.au wrote:

> On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote:
> > Jens Axboe wrote:
> > > On Tue, May 04 2010, Rusty Russell wrote:
> > > > ISTR someone mentioning a desire for such an API years ago, so CC'ing the
> > > > usual I/O suspects...
> > >
> > > It would be nice to have a more fuller API for this, but the reality is
> > > that only the flush approach is really workable. Even just strict
> > > ordering of requests could only be supported on SCSI, and even there the
> > > kernel still lacks proper guarantees on error handling to prevent
> > > reordering there.
> >
> > There's a few I/O scheduling differences that might be useful:
> >
> > 1. The I/O scheduler could freely move WRITEs before a FLUSH but not
> >    before a BARRIER.  That might be useful for time-critical WRITEs,
> >    and those issued by high I/O priority.
>
> This is only because no one actually wants flushes or barriers, though
> I/O people seem to only offer that.  We really want "these writes must
> occur before this write".  That offers maximum choice to the I/O subsystem
> and potentially to smart (virtual?) disks.
>
> > 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is
> >    only for data belonging to a particular file (e.g. fdatasync with
> >    no file size change, even on btrfs if O_DIRECT was used for the
> >    writes being committed).  That would entail tagging FLUSHes and
> >    WRITEs with a fs-specific identifier (such as inode number), opaque
> >    to the scheduler which only checks equality.
>
> This is closer.  In userspace I'd be happy with "all prior writes to this
> struct file before all future writes".  Even if the original guarantees were
> stronger (ie. inode basis).  We currently implement transactions using 4 fsync
> /msync pairs.
>
>    write_recovery_data(fd);
>    fsync(fd);
>    msync(mmap);
>    write_recovery_header(fd);
>    fsync(fd);
>    msync(mmap);
>    overwrite_with_new_data(fd);
>    fsync(fd);
>    msync(mmap);
>    remove_recovery_header(fd);
>    fsync(fd);
>    msync(mmap);

Seems over-zealous.
If the recovery_header held a strong checksum of the recovery_data you would
not need the first fsync, and as long as you have two places to write recovery
data, you don't need the 3rd and 4th syncs.
Just:
  write_internally_checksummed_recovery_data_and_header_to_unused_log_space()
  fsync / msync
  overwrite_with_new_data()

To recover, you choose the most recent log_space and replay the content.
That may be a redundant operation, but that is no loss.

Also, I cannot see the point of msync if you have already performed an fsync,
and if there is a point, I would expect you to call msync before
fsync... Maybe there is some subtlety there that I am not aware of.

 
> Yet we really only need ordering, not guarantees about it actually hitting
> disk before returning.
>
> > In other words, FLUSH can be more relaxed than BARRIER inside the
> > kernel.  It's ironic that we think of fsync as stronger than
> > fbarrier outside the kernel :-)
>
> It's an implementation detail; barrier has less flexibility because it has
> less information about what is required. I'm saying I want to give you as
> much information as I can, even if you don't use it yet.

Only we know that approach doesn't work.
People will learn that they don't need to give the extra information to still
achieve the same result - just like they did with ext3 and fsync.
Then when we improve the implementation to only provide the guarantees that
you asked for, people will complain that they are getting empty files that
they didn't expect.

The abstraction I would like to see is a simple 'barrier' that contains no
data and has a filesystem-wide effect.

If a filesystem wanted a 'full' barrier such as the current BIO_RW_BARRIER,
it would send an empty barrier, then the data, then another empty barrier.
(However I suspect most filesystems don't really need barriers on both sides.)
A low level driver might merge these together if the underlying hardware
supported that combined operation (which I believe some do).
I think this merging would be less complex than the current need to split a
BIO_RW_BARRIER into the three separate operations when only a flush is
possible (I know it would make md code a lot nicer :-).

I would probably expose this to user-space as extra flags to sync_file_range:
   SYNC_FILE_RANGE_BARRIER_BEFORE
   SYNC_FILE_RANGE_BARRIER_AFTER

This would make it clear that a barrier does *not* imply a sync, it only
applies to data for which a sync has already been requested. So data that has
already been 'synced' is stored strictly before data which has not yet been
submitted with write() (or by changing a mmapped area).
The barrier would still be filesystem wide in that if you
SYNC_FILE_RANGE_WRITE one file, then SYNC_FILE_RANGE_BARRIER_BEFORE another
file on the same filesystem, the pages scheduled in the first file would be
affected by the barrier request on the second file.

Implementing 

Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Avi Kivity

On 05/04/2010 09:35 PM, Glauber Costa wrote:

Currently, in the linux kernel, we reset kvmclock if we are rebooting
into a crash kernel through kexec. The rationale, is that a new kernel
won't follow the same memory addresses, and the memory where kvmclock is
located in the first kernel, will be something else in the second one.

We don't do it in normal reboots, because the second kernel ends up
registering kvmclock again, which has the effect of turning off the
first instance.

This is, however, totally wrong. This assumes we're booting into
a kernel that also has kvmclock enabled. If for some reason we reboot
into something that doesn't do kvmclock, including but not limited to:
  * rebooting into an older kernel without kvmclock support,
  * rebooting with no-kvmclock,
  * rebooting into another OS,

we'll simply have the hypervisor writing into a random memory position
in the guest. Neat, uh?

Moreover, I believe the fix belongs in qemu, since it is the entity
best placed to detect all kinds of reboots (by means of a cpu_reset),
not to mention the presence of misbehaving guests that can forget
to turn kvmclock off.

This patch fixes the issue for me.

Signed-off-by: Glauber Costaglom...@redhat.com
---
  qemu-kvm-x86.c |   19 +++
  1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 439c31a..4b94e04 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque)
  }
  #endif /* KVM_CAP_USER_NMI */

+static int kvm_turn_off_clock(CPUState *env)
+{
+    struct {
+        struct kvm_msrs info;
+        struct kvm_msr_entry entries[100];
+    } msr_data;
+
+    struct kvm_msr_entry *msrs = msr_data.entries;
+    int n = 0;
+
+    kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, 0);
+    kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, 0);
   


This fails if the kernel doesn't support those MSRs.  Moreover, you need 
to use the new MSRs as well if we are ever to succeed in deprecating the 
old ones.



+    msr_data.info.nmsrs = n;
+
+    return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
+}
+
+
   


How about a different approach?  Query the supported MSRs 
(KVM_GET_MSR_LIST or thereabout) and reset them (with special cases for 
the TSC, and the old clock MSRs when the new ones are present)?


Long term we need a kernel reset function, but this will do for now.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-05 Thread Avi Kivity

On 05/04/2010 09:24 PM, H. Peter Anvin wrote:


I would like to request one change, however.  I would like to see the
alternatives code to be:

movb $0,reg
movb $1,reg

... instead of using xor (which has to be padded with NOPs, which is of
course pointless since the slot is a fixed size.)


Right.


I would suggest using
a byte-sized variable instead of a dword-size variable to save a few
bytes, too.
   


I used a bool, and the code already compiles to a byte mov.  Though it 
could be argued that a word instruction is better since it avoids a 
false dependency, and allows a preceding instruction that modifies %reg 
to be executed after the mov instruction.



Once the jump label framework is integrated and has matured, I think we
should consider using it to save the mov/test/jump.
   


IIRC that has an implied unlikely() which isn't suitable here?

Perhaps the immediate values patches.

--
error compiling committee.c: too many arguments to function



[PATCH v2 0/2] x86 FPU API

2010-05-05 Thread Avi Kivity
Currently all fpu accessors are wedded to task_struct.  However kvm also uses
the fpu in a different context.  Introduce an FPU API, and replace the
current uses with the new API.

While this patchset is oriented towards deeper changes, as a first step it
simplifies xsave for kvm.

v2:
eliminate useless padding in use_xsave() by using a larger instruction

Avi Kivity (2):
  x86: eliminate TS_XSAVE
  x86: Introduce 'struct fpu' and related API

 arch/x86/include/asm/i387.h|  135 +++-
 arch/x86/include/asm/processor.h   |6 ++-
 arch/x86/include/asm/thread_info.h |1 -
 arch/x86/include/asm/xsave.h   |7 +-
 arch/x86/kernel/cpu/common.c   |5 +-
 arch/x86/kernel/i387.c |  107 ++---
 arch/x86/kernel/process.c  |   20 +++---
 arch/x86/kernel/process_32.c   |2 +-
 arch/x86/kernel/process_64.c   |2 +-
 arch/x86/kernel/xsave.c|8 +-
 arch/x86/math-emu/fpu_aux.c|6 +-
 11 files changed, 181 insertions(+), 118 deletions(-)



[PATCH v2 1/2] x86: eliminate TS_XSAVE

2010-05-05 Thread Avi Kivity
The fpu code currently uses current->thread_info->status & TS_XSAVE as
a way to distinguish between XSAVE capable processors and older processors.
The decision is not really task specific; instead we use the task status to
avoid a global memory reference - the value should be the same across all
threads.

Eliminate this tie-in into the task structure by using an alternative
instruction keyed off the XSAVE cpu feature; this results in shorter and
faster code, without introducing a global memory reference.

Acked-by: Suresh Siddha suresh.b.sid...@intel.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/i387.h|   20 
 arch/x86/include/asm/thread_info.h |1 -
 arch/x86/kernel/cpu/common.c   |5 +
 arch/x86/kernel/i387.c |5 +
 arch/x86/kernel/xsave.c|6 +++---
 5 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h
index da29309..301fff5 100644
--- a/arch/x86/include/asm/i387.h
+++ b/arch/x86/include/asm/i387.h
@@ -56,6 +56,18 @@ extern int restore_i387_xstate_ia32(void __user *buf);
 
 #define X87_FSW_ES (1 << 7)	/* Exception Summary */
 
+static inline bool use_xsave(void)
+{
+	bool has_xsave;
+
+	alternative_io("mov $0, %0",
+		       "mov $1, %0",
+		       X86_FEATURE_XSAVE,
+		       "=g" (has_xsave));
+
+	return has_xsave;
+}
+
 #ifdef CONFIG_X86_64
 
 /* Ignore delayed exceptions from user space */
@@ -99,7 +111,7 @@ static inline void clear_fpu_state(struct task_struct *tsk)
/*
 * xsave header may indicate the init state of the FP.
 */
-	if ((task_thread_info(tsk)->status & TS_XSAVE) &&
+	if (use_xsave() &&
 	    !(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
 		return;
 
@@ -164,7 +176,7 @@ static inline void fxsave(struct task_struct *tsk)
 
 static inline void __save_init_fpu(struct task_struct *tsk)
 {
-	if (task_thread_info(tsk)->status & TS_XSAVE)
+	if (use_xsave())
xsave(tsk);
else
fxsave(tsk);
@@ -218,7 +230,7 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
  */
 static inline void __save_init_fpu(struct task_struct *tsk)
 {
-	if (task_thread_info(tsk)->status & TS_XSAVE) {
+	if (use_xsave()) {
 		struct xsave_struct *xstate = &tsk->thread.xstate->xsave;
 		struct i387_fxsave_struct *fx = &tsk->thread.xstate->fxsave;
 
@@ -266,7 +278,7 @@ end:
 
 static inline int restore_fpu_checking(struct task_struct *tsk)
 {
-	if (task_thread_info(tsk)->status & TS_XSAVE)
+	if (use_xsave())
 		return xrstor_checking(&tsk->thread.xstate->xsave);
 	else
 		return fxrstor_checking(&tsk->thread.xstate->fxsave);
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d017ed5..d4092fa 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -242,7 +242,6 @@ static inline struct thread_info *current_thread_info(void)
 #define TS_POLLING		0x0004	/* true if in idle loop
 					   and not sleeping */
 #define TS_RESTORE_SIGMASK	0x0008	/* restore signal mask in do_signal() */
-#define TS_XSAVE		0x0010	/* Use xsave/xrstor */
 
 #define tsk_is_polling(t) (task_thread_info(t)->status & TS_POLLING)
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4868e4a..c1c00d0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1243,10 +1243,7 @@ void __cpuinit cpu_init(void)
/*
 * Force FPU initialization:
 */
-	if (cpu_has_xsave)
-		current_thread_info()->status = TS_XSAVE;
-	else
-		current_thread_info()->status = 0;
+	current_thread_info()->status = 0;
clear_used_math();
mxcsr_feature_mask_init();
 
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 54c31c2..14ca1dc 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -102,10 +102,7 @@ void __cpuinit fpu_init(void)
 
mxcsr_feature_mask_init();
/* clean state in init */
-	if (cpu_has_xsave)
-		current_thread_info()->status = TS_XSAVE;
-	else
-		current_thread_info()->status = 0;
+	current_thread_info()->status = 0;
clear_used_math();
 }
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 782c3a3..c1b0a11 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -99,7 +99,7 @@ int save_i387_xstate(void __user *buf)
if (err)
return err;
 
-	if (task_thread_info(tsk)->status & TS_XSAVE)
+	if (use_xsave())
err = xsave_user(buf);
else
err = 

Re: [PATCHv2 00/23] next round of emulator cleanups

2010-05-05 Thread Avi Kivity

On 04/28/2010 07:15 PM, Gleb Natapov wrote:

This is the next round of emulator cleanups, making it even more detached
from kvm. The first patch introduces an IO read cache, which is needed to
correctly emulate instructions that require more than one IO read exit
during emulation.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH][RESEND] intel_txt: enable SMX flag for VMXON in KVM

2010-05-05 Thread Avi Kivity

On 05/05/2010 01:38 PM, Shane Wang wrote:

Per Intel SDM 3B 20.7, for IA32_FEATURE_CONTROL MSR
Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON 
in SMX operation causes a general-protection exception.
Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of 
VMXON outside SMX operation causes a general-protection exception.

This patch is to check the correct in/outside-SMX flag when detecting if VMX is 
disabled by BIOS, and to set in-SMX flag for VMXON after Intel TXT is launched 
in KVM.

   


Already committed as 9d4b473eeea, I forgot to confirm, sorry.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID

2010-05-05 Thread Avi Kivity

On 05/03/2010 06:52 PM, Glauber Costa wrote:

Right now, we are using individual KVM_CAP entities to communicate to
userspace which cpuid bits we support. This is suboptimal, since it
generates a delay between a feature arriving in the host and it
becoming available to the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which creates some complexity and delay in feature adoption.

Signed-off-by: Glauber Costaglom...@redhat.com
---
  arch/x86/include/asm/kvm_para.h |4 
  arch/x86/kvm/x86.c  |   27 +++
  2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
  #define KVM_FEATURE_CLOCKSOURCE   0
  #define KVM_FEATURE_NOP_IO_DELAY  1
  #define KVM_FEATURE_MMU_OP2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE2	3
   


Separate patch.



  #define MSR_KVM_WALL_CLOCK  0x11
  #define MSR_KVM_SYSTEM_TIME 0x12
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb84947..8a7cdda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1971,6 +1971,20 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		}
 		break;
 	}
+	case 0x40000000: {
   


Use symbolic name, please.


+		char signature[] = "KVMKVMKVM";
+		u32 *sigptr = (u32 *)signature;
+		entry->eax = 1;
   


Where did this come from?


+		entry->ebx = sigptr[0];
+		entry->ecx = sigptr[1];
+		entry->edx = sigptr[2];
   


Overflow, you're reading 12 bytes from a 10-byte variable.


+		break;
+	}
+	case 0x40000001:
+		entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
+			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
+			     (1 << KVM_FEATURE_CLOCKSOURCE2);
   


Indentation...

Also, have to initialize all fields, since the real cpu won't initialize 
them for you.


Sidenote: the real cpu may be a kvm vcpu, so it may in fact support 
those features.



+		break;
 	case 0x80000000:
 		entry->eax = min(entry->eax, 0x8000001a);
 		break;
@@ -2017,6 +2031,19 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
 	for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func)
 		do_cpuid_ent(&cpuid_entries[nent], func, 0,
 			     &nent, cpuid->nent);
+
+	r = -E2BIG;
+	if (nent >= cpuid->nent)
+		goto out_free;
+
+	do_cpuid_ent(&cpuid_entries[nent], 0x40000000, 0, &nent, cpuid->nent);
+	limit = cpuid_entries[nent - 1].eax;
   


The kvm cpuid does not follow the limit thing.


+	for (func = 0x40000001; func <= limit && nent < cpuid->nent; ++func)
+		do_cpuid_ent(&cpuid_entries[nent], func, 0,
+			     &nent, cpuid->nent);
+
r = -E2BIG;
   


To avoid confusion, please write Documentation/kvm/cpuid.txt based on 
the current qemu-kvm code, and implement this patch according to the 
documentation.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc

2010-05-05 Thread Avi Kivity

On 05/03/2010 06:52 PM, Glauber Costa wrote:

If the HV told us we can fully trust the TSC, skip any
correction

   



Signed-off-by: Glauber Costaglom...@redhat.com
---
  arch/x86/include/asm/kvm_para.h|5 +
  arch/x86/include/asm/pvclock-abi.h |1 +
  arch/x86/kernel/kvmclock.c |3 +++
  arch/x86/kernel/pvclock.c  |4 
  4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..6f1b878 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
   */
  #define KVM_FEATURE_CLOCKSOURCE2	3

+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC 24
   


This needs documentation (in cpuid.txt).  The flag doesn't mean the TSC 
is stable, rather it means the pvclock tsc stable bit is valid.



--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure

2010-05-05 Thread Avi Kivity

On 05/03/2010 06:52 PM, Glauber Costa wrote:

This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
is usually established via HV negotiation.

Signed-off-by: Glauber Costaglom...@redhat.com
CC: Jeremy Fitzhardingejer...@goop.org
---
  arch/x86/include/asm/pvclock-abi.h |3 ++-
  arch/x86/include/asm/pvclock.h |1 +
  arch/x86/kernel/pvclock.c  |9 +
  3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
u64   system_time;
u32   tsc_to_system_mul;
s8tsc_shift;
-   u8pad[3];
+   u8flags;
+   u8pad[2];
  } __attribute__((__packed__)); /* 32 bytes */

  struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..cd02f32 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@

  /* some helper functions for xen and kvm pv clock sources */
  cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
  unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
  void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..aa2262b 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
u32 tsc_to_nsec_mul;
int tsc_shift;
u32 version;
+   u8  flags;
  };

+static u8 valid_flags = 0;
+
   


Minor optimization: __read_mostly.

--
error compiling committee.c: too many arguments to function



Re: Booting/installing WindowsNT

2010-05-05 Thread Andre Przywara

Avi Kivity wrote:

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3.  In all other cases so far it BSoDs with STOP 0x3E error
  right before displaying that kernel message.

MSDN talks about a multiprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected the offline CPUs in the mptable that confuse NT. But -smp 
1,maxcpus=1 does not make a difference. I will try to dig deeper in 
this area.

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0x0000_0002 till 0x0000_0004 are reserved; the
CPUs that Michael and I used (K8RevF) actually have a max leaf of 1 here.

Default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes only read leaf 4 on Intel systems; it seems that NT4 is
missing this.

I will now think about a proper fix for this.


What about disabling ACPI?  smp should still work through the mptable.

Didn't make a difference.

Regards,
Andre.


--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



Re: Booting/installing WindowsNT

2010-05-05 Thread Avi Kivity

On 05/05/2010 11:32 AM, Andre Przywara wrote:

Avi Kivity wrote:

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3.  In all other cases so far it BSoDs with STOP 0x3E error
  right before displaying that kernel message.

MSDN talks about a multiprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected the offline CPUs in the mptable that confuse NT. But 
-smp 1,maxcpus=1 does not make a difference. I will try to dig 
deeper in this area.

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0x0000_0002 till 0x0000_0004 are reserved,
the CPUs that Michael and I used (K8RevF) actually have a max leaf of 1
here.

Default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes only read leaf 4 on Intel systems; it seems that NT4 is
missing this.

I will now think about a proper fix for this.


I don't understand.  Shouldn't the values for cpuid leaf 4 be the same 
for qemu64 whether the cpu is Intel or AMD?  The real cpuid shouldn't 
matter.



--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] replace set_msr_entry with kvm_msr_entry

2010-05-05 Thread Avi Kivity

On 05/04/2010 09:35 PM, Glauber Costa wrote:

this is yet another function that upstream qemu implements,
so we can just use its implementation.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: vCPU scalability for linux VMs

2010-05-05 Thread Avi Kivity

On 05/05/2010 04:45 AM, Alec Istomin wrote:

Gentlemen,
  Reaching out with a non-development question, sorry if it's not
  appropriate here.

  I'm looking for a way to improve Linux SMP VMs performance under KVM.

  My preliminary results show that single vCPU Linux VMs perform up to 10
  times better than 4vCPU Linux VMs (consolidated performance of 8 VMs on
  8 core pre-Nehalem server). I suspect that I'm missing something major
  and look for any means that can help improve SMP VMs performance.


   


So you have a total of 32 vcpus on 8 cores?  This is known to be 
problematic.  You may see some improvement by enabling hyperthreading.


There is ongoing work to improve this.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] qemu-kvm: Process exit requests in kvm loop

2010-05-05 Thread Avi Kivity

On 05/04/2010 12:28 PM, Jan Kiszka wrote:

This unbreaks the monitor quit command for qemu-kvm.

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: qemu-kvm: event writeback can overwrite interrupts with -no-kvm-irqchip

2010-05-05 Thread Avi Kivity

On 05/04/2010 05:15 AM, Marcelo Tosatti wrote:

Interrupts that are injected during a vcpu event save/writeback cycle
are lost.

Fix by writebacking the state before injecting interrupts.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [patch 0/6] qemu-kvm: use upstream memslot code

2010-05-05 Thread Avi Kivity

On 05/04/2010 01:48 AM, Marcelo Tosatti wrote:

See individual patches for details.


   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: Booting/installing WindowsNT

2010-05-05 Thread Michael Tokarev

05.05.2010 12:32, Andre Przywara wrote:

Avi Kivity wrote:

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3. In all other cases so far it BSoDs with STOP 0x3E error
right before displaying that kernel message.

MSDN talks about a multiprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected the offline CPUs in the mptable were confusing NT. But -smp
1,maxcpus=1 does not make a difference. I will try to dig deeper in
this area.

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0000_0002 through 0000_0004 are reserved; the
CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here.
The default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes read leaf 4 only on Intel systems; it seems NT4 is
missing this.


Confirmed, with -cpu qemu64,level=[123] it works for me as well.

Note again that after service pack 6 (I haven't tried other SPs),
the problem goes away entirely -- winNT SP6 works with the default
kvm cpu just fine.

Thanks!

/mjt


Re: Fix -mem-path with hugetlbfs

2010-05-05 Thread Avi Kivity

On 05/04/2010 12:12 AM, Marcelo Tosatti wrote:

Avi, please apply to both master and uq/master.

---

Fall back to qemu_vmalloc in case file_ram_alloc fails.
   


Applied to both, thanks.



Re: [qemu-kvm tests PATCH] qemu-kvm tests: enhanced msr test

2010-05-05 Thread Avi Kivity

On 05/02/2010 06:10 PM, Naphtali Sprei wrote:

Changed the code structure and added a few tests for some of the MSRs.
   


Applied, thanks.



Re: Booting/installing WindowsNT

2010-05-05 Thread Andre Przywara

Avi Kivity wrote:

On 05/05/2010 11:32 AM, Andre Przywara wrote:

Avi Kivity wrote:

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3.  In all other cases so far it BSoDs with STOP 0x3E error
  right before displaying that kernel message.

MSDN talks about a multiprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected the offline CPUs in the mptable were confusing NT. But 
-smp 1,maxcpus=1 does not make a difference. I will try to dig 
deeper in this area.

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0000_0002 through 0000_0004 are reserved; 
the CPU that Michael and I used (K8RevF) actually has a max leaf of 1 
here.

The default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes read leaf 4 only on Intel systems; it seems NT4 is 
missing this.

I will now think about a proper fix for this.


I don't understand.  Shouldn't the values for cpuid leaf 4 be the same 
for qemu64 whether the cpu is Intel or AMD?  The real cpuid shouldn't 
matter.
Yes, but if the max leaf value is smaller than 4, then the guest will 
not read it. It seems that NT does not like the entries returned by KVM 
for leaf 4. I am about to find out what exactly is causing that.
I have the theory that the stop is intentional, as NT4 Workstation does 
not _want_ to support certain SMP configurations (more than 2 
processors?). I have seen a similar issue with WinXP Pro and -smp 4 (which 
went away with -smp 4,cores=4).


Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



Re: [qemu-kvm tests PATCH] qemu-kvm tests: fix linker script problem

2010-05-05 Thread Avi Kivity

On 05/03/2010 02:34 PM, Naphtali Sprei wrote:

This is a fix to a previous patch by me.
It's on the 'next' branch as of now.

commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script
itself (flat.lds) to become part of the linked objects, which messed up
the output file; one such problem is that the symbol edata is no longer
the last symbol.



diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 61cc2f0..ad7aeac 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -19,7 +19,7 @@ CFLAGS += -m$(bits)
  libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)

  FLATLIBS = test/lib/libcflat.a $(libgcc)
-%.flat: %.o $(FLATLIBS) flat.lds
+%.flat: %.o $(FLATLIBS)
$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)

   


This drops the dependency, so if flat.lds changes, we don't rebuild.

I think you can replace $^ by $(filter %.o, $^) and retain the dependency.



Re: Booting/installing WindowsNT

2010-05-05 Thread Avi Kivity

On 05/05/2010 11:51 AM, Michael Tokarev wrote:

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0000_0002 through 0000_0004 are reserved; the
CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here.
The default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes read leaf 4 only on Intel systems; it seems NT4 is
missing this.

Confirmed, with -cpu qemu64,level=[123] it works for me as well.

Note again that after service pack 6 (I haven't tried other SPs),
the problem goes away entirely -- winNT SP6 works with the default
kvm cpu just fine.


Interesting, may be a guest bug that was fixed later.



Re: [qemu-kvm tests PATCH] qemu-kvm tests: merged stringio into emulator

2010-05-05 Thread Avi Kivity

On 05/03/2010 06:39 PM, Naphtali Sprei wrote:

based on 'next' branch.

Changed test-case stringio into C code and merged into emulator test-case.
Removed traces of stringio test-case.

   


Applied, thanks.



Re: KVM hook for code integrity checking

2010-05-05 Thread Avi Kivity

On 04/30/2010 05:53 PM, Suen Chun Hui wrote:

Dear KVM developers,

I'm currently working on an open source security patch to use KVM to
implement code verification on a guest VM at runtime. Thus, it would be
very helpful if someone could point me to the right function or place
for adding 2 hooks into the KVM paging code to:

1. Detect a new guest page (which I assume will imply a new pte and
imply a new spte).
Currently, I'm considering putting a hook in the function
mmu_set_spte(), but maybe there is a better place.
This hook will be used as the main entry point into the code
verification function
   


This is in general not possible.  Hosts with npt or ept will not see new 
guest ptes.


It could be done with physical pages, but you'll have no way of knowing 
if the pages are used in userspace, the kernel, or both.



2. Detect a write fault to a read-only spte (eg. for the case of
updating the dirty bit back to the guest pte)
Unfortunately, after reading the code many times, I'm unable to find an
appropriate place where this actually happens.
This hook will be used to prevent a secondary peek page from modifying
an existing verified code page.
   


set_spte() or mmu_set_spte() may work.



Re: [Autotest] [PATCH 7/9] KVM test: Introduce the local_login()

2010-05-05 Thread Michael Goldish
On 04/29/2010 02:44 AM, Amos Kong wrote:
 On Wed, Apr 28, 2010 at 03:01:40PM +0300, Michael Goldish wrote:
 On 04/26/2010 01:04 PM, Jason Wang wrote:
 This patch introduces a new method which is used to log into the guest
 through the guest serial console. The serial_mode must be set to
 session in order to make use of this patch.

 In what cases would we want to use this feature?  The serial console is
 not supported by all guests and I'm not sure it supports multiple
 concurrent sessions (does it?), so it's probably not possible to use it
 reliably as a replacement for the regular remote shell servers, or even
 as an alternative variant.
 
 We cannot get the system log over an ssh session when the network doesn't work
 (not yet up, down, unstable, ...). Using the serial console can provide more
 useful info. Controlling the guest over ssh in network-related test cases isn't
 reliable; the control channel should be independent.

Can you provide a usage example?  Which test is going to use this and
how?  Do you think it should be used in existing tests or in new tests only?

  
 
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
  client/tests/kvm/kvm_vm.py |   25 +
  1 files changed, 25 insertions(+), 0 deletions(-)

 diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
 index 0cdf925..a22893b 100755
 --- a/client/tests/kvm/kvm_vm.py
 +++ b/client/tests/kvm/kvm_vm.py
 @@ -814,7 +814,32 @@ class VM:
  command, ))
  return session
  
 +    def local_login(self, timeout=240):
 +        """
 +        Log into the guest via the serial console.
 +        If timeout expires while waiting for output from the guest (e.g. a
 +        password prompt or a shell prompt) -- fail.
 +        """
 +        serial_mode = self.params.get("serial_mode")
 +        username = self.params.get("username", "")
 +        password = self.params.get("password", "")
 +        prompt = self.params.get("shell_prompt", "[\#\$]")
 +        linesep = eval("'%s'" % self.params.get("shell_linesep", r"\n"))
 
 +        if serial_mode != "session":
 +            logging.debug("serial_mode is not 'session'")
 +            return None
 +        else:
 +            command = "nc -U %s" % self.serial_file_name
 +            assist = self.params.get("prompt_assist")
 +            session = kvm_utils.remote_login(command, password, prompt,
 +                                             linesep, timeout, "", username)
                                                                 ^
 You probably meant to pass the prompt assist string to remote_login()
 but instead you're passing "".

 +            if session:
 +                session.set_status_test_command(self.params.get("status_test_"
 +                                                                "command", ""))
 +            return session
 +
  def copy_files_to(self, local_path, remote_path, nic_index=0, 
 timeout=300):
  
  Transfer files to the guest.


 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest



Re: What changed since kvm-72 resulting in winNT to fail to boot (STOP 0x0000001E) ?

2010-05-05 Thread Michael Tokarev

02.05.2010 10:06, Avi Kivity wrote:

On 04/30/2010 11:06 PM, Michael Tokarev wrote:

I've a bugreport handy, see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=575439
about the apparent problem booting winNT 4 in kvm 0.12.
At least 2 people were hit by this issue. In short, when
booting winNT 4.0, it BSODs with error code 0x0000001E,
which means inaccessible boot device.

Note that it is when upgrading from -72 to 0.12  [...]


What about 0.11? Does it work?


After finding the cause of the other problem (in the thread
Booting/Installing Windows NT, all thanks going to Andre
Przywara), I can proceed with this issue finally.

I tried installing winNT here on old kvm and upgrading kvm.
So far I can say that if winNT were installed with kvm-72
and later, it boots just fine in kvm-0.12.

So I don't know what the problem is in this case.  Maybe
it is because the OP installed his winNT guest before
kvm-72 and now in 0.12 the guest is not able to find its
filesystem anymore, or maybe it's because there was some
bug fixed in service pack 1 (which I used here) that makes
the problem go away - I dunno.

But having in mind how picky winNT was about hardware
changes, I don't think it's worth the effort to debug
this problem further - winNT is a really ancient system,
and having an upgrade path for kvm from some ancient
development snapshot to the current version isn't that
important, IMHO.  Yes, a few people will be hit by this
issue, which is a sad thing, but seriously, we've more
interesting things to do ;)

Thanks!

/mjt


Re: Booting/installing WindowsNT

2010-05-05 Thread Andre Przywara

Michael Tokarev wrote:

05.05.2010 12:32, Andre Przywara wrote:

Avi Kivity wrote:

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3. In all other cases so far it BSoDs with STOP 0x3E error
right before displaying that kernel message.

MSDN talks about a multiprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected the offline CPUs in the mptable were confusing NT. But -smp
1,maxcpus=1 does not make a difference. I will try to dig deeper in
this area.

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0000_0002 through 0000_0004 are reserved; the
CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here.
The default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes read leaf 4 only on Intel systems; it seems NT4 is
missing this.


Confirmed, with -cpu qemu64,level=[123] it works for me as well.

The strange thing is that NT4 never reads leaf 4:
kvm-2341  [003]   228.527874: kvm_cpuid: func 40000000 rax 0 rbx 
4b4d564b rcx 564b4d56 rdx 4d
kvm-2341  [003]   228.530033: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [003]   228.530081: kvm_cpuid: func 80000000 rax 8000000a rbx 
68747541 rcx 444d4163 rdx 69746e65
kvm-2341  [003]   228.530084: kvm_cpuid: func 80000008 rax 3028 rbx 0 
rcx 0 rdx 0
kvm-2341  [003]   228.530147: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   228.538254: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   228.539902: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   236.273370: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   236.273381: kvm_cpuid: func 0 rax 4 rbx 68747541 rcx 
444d4163 rdx 69746e65


With level=4 it BSODs afterwards; with level=1 it gets beyond that:
kvm-2472  [002]   871.379192: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2472  [002]   871.379235: kvm_cpuid: func 0 rax 1 rbx 68747541 rcx 
444d4163 rdx 69746e65
kvm-2472  [002]   871.379238: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd



Interestingly it also accesses leaf 8000_0008, I thought that that leaf 
wasn't around in 1996.




Note again that after service pack 6 (I haven't tried other SPs),
the problem goes away entirely -- winNT SP6 works with the default
kvm cpu just fine.
I agree with Avi that it looks like a bug to me. I will see if I can 
learn more about it.


Regards,
Andre.



Re: [PATCH] KVM test: Add new subtest iozone_windows

2010-05-05 Thread Lucas Meneghel Rodrigues
On Wed, 2010-05-05 at 13:17 +0300, Michael Goldish wrote:
 On 05/04/2010 01:03 AM, Lucas Meneghel Rodrigues wrote:
  Following the new IOzone postprocessing changes, add a new
  KVM subtest iozone_windows, which takes advantage of the
  fact that there's a windows build for the test, so we can
  ship it on winutils.iso and run it, thereby providing
  the ability to track IO performance for Windows guests as well.
  The new test imports the postprocessing library directly
  from iozone, so it can postprocess the results right after
  the benchmark is finished on the windows guest.
  
  I'll update winutils.iso on the download page soon.
  
  Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
  ---
   client/tests/kvm/tests/iozone_windows.py |   40 
  ++
   client/tests/kvm/tests_base.cfg.sample   |7 -
   2 files changed, 46 insertions(+), 1 deletions(-)
   create mode 100644 client/tests/kvm/tests/iozone_windows.py
  
  diff --git a/client/tests/kvm/tests/iozone_windows.py 
  b/client/tests/kvm/tests/iozone_windows.py
  new file mode 100644
  index 000..86ec2c4
  --- /dev/null
  +++ b/client/tests/kvm/tests/iozone_windows.py
  @@ -0,0 +1,40 @@
  +import logging, time, os
  +from autotest_lib.client.common_lib import error
  +from autotest_lib.client.bin import utils
  +from autotest_lib.client.tests.iozone import postprocessing
  +import kvm_subprocess, kvm_test_utils, kvm_utils
  +
  +
  +def run_iozone_windows(test, params, env):
  +    """
  +    Run IOzone for windows on a windows guest:
  +    1) Log into a guest
  +    2) Execute the IOzone test contained in the winutils.iso
  +    3) Get results
  +    4) Postprocess it with the IOzone postprocessing module
  +
  +    @param test: kvm test object
  +    @param params: Dictionary with the test parameters
  +    @param env: Dictionary with test environment.
  +    """
  +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
  +    session = kvm_test_utils.wait_for_login(vm)
  +    results_path = os.path.join(test.resultsdir,
  +                                'raw_output_%s' % test.iteration)
  +    analysisdir = os.path.join(test.resultsdir,
  +                               'analysis_%s' % test.iteration)
  +
  +    # Run IOzone and record its results
  +    c = command=params.get("iozone_cmd")
 
 'command=' looks unnecessary here.

Funny, I only realized that I'd left this variable in now that you've
mentioned it :) Will fix it

  +    t = int(params.get("iozone_timeout"))
  +    logging.info("Running IOzone command on guest, timeout %ss", t)
  +    results = session.get_command_output(command=c, timeout=t)
 
 Does IOzone produce any output while it's running or only when it's
 done?  If the former is true, we might want to print that output as it's
 being produced:
 
 results = session.get_command_output(command=c, timeout=t,
                                      print_func=logging.debug)

Good point, it generates output while it's running, that just haven't
occurred to me.

Fixed all in r4467

http://autotest.kernel.org/changeset/4467

Thanks!



question on virtio

2010-05-05 Thread Michael S. Tsirkin
Hi!
I see this in virtio_ring.c:

/* Put entry in available array (but don't update avail->idx
 * until they do sync). */

Why is it done this way?
It seems that updating the index straight away would be simpler, while
this might allow the host to speculatively look up the buffer and handle
it, without waiting for the kick.

-- 
MST


Re: Booting/installing WindowsNT

2010-05-05 Thread Avi Kivity

On 05/05/2010 01:18 PM, Andre Przywara wrote:

Michael Tokarev wrote:

05.05.2010 12:32, Andre Przywara wrote:

Avi Kivity wrote:

On 05/04/2010 06:27 PM, Andre Przywara wrote:



3. In all other cases so far it BSoDs with STOP 0x3E error
right before displaying that kernel message.

MSDN talks about a multiprocessor configuration error:
http://msdn.microsoft.com/en-us/library/ms819006.aspx
I suspected the offline CPUs in the mptable were confusing NT. But -smp
1,maxcpus=1 does not make a difference. I will try to dig deeper in
this area.

OK, I tracked this down. It is the max CPUID level that differs.
In the AMD CPUID guide, leaves 0000_0002 through 0000_0004 are reserved;
the CPU that Michael and I used (K8RevF) actually has a max leaf of 1 
here.

The default qemu64 has a max leaf of 4.
So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me.
Modern OSes read leaf 4 only on Intel systems; it seems NT4 is
missing this.


Confirmed, with -cpu qemu64,level=[123] it works for me as well.

The strange thing is that NT4 never reads leaf 4:
kvm-2341  [003]   228.527874: kvm_cpuid: func 40000000 rax 0 rbx 
4b4d564b rcx 564b4d56 rdx 4d
kvm-2341  [003]   228.530033: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [003]   228.530081: kvm_cpuid: func 80000000 rax 8000000a 
rbx 68747541 rcx 444d4163 rdx 69746e65
kvm-2341  [003]   228.530084: kvm_cpuid: func 80000008 rax 3028 rbx 0 
rcx 0 rdx 0
kvm-2341  [003]   228.530147: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   228.538254: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   228.539902: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   236.273370: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2341  [002]   236.273381: kvm_cpuid: func 0 rax 4 rbx 68747541 rcx 
444d4163 rdx 69746e65


So maybe it's just a simple guest bug that was never encountered in real 
life because no processors had that leaf.




With level=4 it BSODs afterwards; with level=1 it gets beyond that:
kvm-2472  [002]   871.379192: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd
kvm-2472  [002]   871.379235: kvm_cpuid: func 0 rax 1 rbx 68747541 rcx 
444d4163 rdx 69746e65
kvm-2472  [002]   871.379238: kvm_cpuid: func 1 rax 623 rbx 800 rcx 
80002001 rdx 78bfbfd



Interestingly it also accesses leaf 8000_0008, I thought that that 
leaf wasn't around in 1996.


It's the bios:
src/mtrr.c:cpuid(0x80000008u, &eax, &ebx, &ecx, &edx);




Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-05 Thread H. Peter Anvin
Your code is functionally equivalent to the immediate values patch; neither 
uses a direct branch, which would be more efficient.

Avi Kivity <a...@redhat.com> wrote:

On 05/04/2010 09:24 PM, H. Peter Anvin wrote:

 I would like to request one change, however.  I would like to see the
 alternatives code to be:

  movb $0,reg
  movb $1,reg

 ... instead of using xor (which has to be padded with NOPs, which is of
 course pointless since the slot is a fixed size.)

Right.

 I would suggest using
 a byte-sized variable instead of a dword-size variable to save a few
 bytes, too.


I used a bool, and the code already compiles to a byte mov.  Though it 
could be argued that a word instruction is better since it avoids a 
false dependency, and allows a preceding instruction that modifies %reg 
to be executed after the mov instruction.

 Once the jump label framework is integrated and has matured, I think we
 should consider using it to save the mov/test/jump.


IIRC that has an implied unlikely() which isn't suitable here?

Perhaps the immediate values patches.



-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Re: [PATCH 1/2] x86: eliminate TS_XSAVE

2010-05-05 Thread H. Peter Anvin
You don't want to use bool since some gcc versions don't handle bool in asm 
well; use a u8 instead. 

Avi Kivity <a...@redhat.com> wrote:

On 05/04/2010 09:24 PM, H. Peter Anvin wrote:

 I would like to request one change, however.  I would like to see the
 alternatives code to be:

  movb $0,reg
  movb $1,reg

 ... instead of using xor (which has to be padded with NOPs, which is of
 course pointless since the slot is a fixed size.)

Right.

 I would suggest using
 a byte-sized variable instead of a dword-size variable to save a few
 bytes, too.


I used a bool, and the code already compiles to a byte mov.  Though it 
could be argued that a word instruction is better since it avoids a 
false dependency, and allows a preceding instruction that modifies %reg 
to be executed after the mov instruction.

 Once the jump label framework is integrated and has matured, I think we
 should consider using it to save the mov/test/jump.


IIRC that has an implied unlikely() which isn't suitable here?

Perhaps the immediate values patches.


Re: [PATCH v3 7/10] KVM MMU: allow more page become unsync at gfn mapping time

2010-05-05 Thread Xiao Guangrong


Marcelo Tosatti wrote:
 On Wed, Apr 28, 2010 at 11:55:49AM +0800, Xiao Guangrong wrote:
 In the current code, a shadow page can become unsync only if it is the
 only shadow page for its gfn. This rule is too strict; in fact, we can
 let every last-level mapping page (i.e., the pte page) become unsync,
 and sync them at invlpg or TLB flush time.

 This patch allows more pages to become unsync at gfn mapping time 

 Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
 
 Xiao,
 
 This patch breaks Fedora 8 32-bit install. Reverted patches 5-10.

Hi Marcelo,

Sorry for the delayed reply; I'm on holiday.

I have found the reason for this issue; two fix patches will be sent soon.
Could you please try them?

Thanks,
Xiao


[PATCH 1/2] KVM MMU: fix for forgot mark parent-unsync_children bit

2010-05-05 Thread Xiao Guangrong
When mapping a new parent to an unsync shadow page, we should set the
parent's unsync_children bit

Reported-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 97f2ea0..bf35a2f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1374,7 +1374,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
	if (sp->unsync_children) {
		set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
		kvm_mmu_mark_parents_unsync(sp);
-	}
+	} else if (sp->unsync)
+		kvm_mmu_mark_parents_unsync(sp);
+
trace_kvm_mmu_get_page(sp, false);
return sp;
}
-- 
1.6.1.2



[PATCH 2/2] KVM MMU: fix race in invlpg code

2010-05-05 Thread Xiao Guangrong
There is a race in the invlpg code, in sequences like:

A: hold mmu_lock and get 'sp'
B: release mmu_lock and do other things
C: hold mmu_lock and continue using 'sp'

If another path freed 'sp' during stage B, the kernel will crash.

This patch checks whether 'sp' is still live before using it in stage C

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h |   22 --
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 624b38f..13ea675 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -462,11 +462,16 @@ out_unlock:
 
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
-   struct kvm_mmu_page *sp = NULL;
+   struct kvm_mmu_page *sp = NULL, *s;
struct kvm_shadow_walk_iterator iterator;
+   struct hlist_head *bucket;
+   struct hlist_node *node, *tmp;
gfn_t gfn = -1;
u64 *sptep = NULL, gentry;
int invlpg_counter, level, offset = 0, need_flush = 0;
+   unsigned index;
+   bool live = false;
+   union kvm_mmu_page_role role;
 
	spin_lock(&vcpu->kvm->mmu_lock);
 
@@ -480,7 +485,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
	if (!sp->unsync)
		break;
-
+	role = sp->role;
WARN_ON(level != PT_PAGE_TABLE_LEVEL);
shift = PAGE_SHIFT -
  (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
@@ -519,10 +524,23 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
	mmu_guess_page_from_pte_write(vcpu, gfn_to_gpa(gfn) + offset, gentry);
	spin_lock(&vcpu->kvm->mmu_lock);
+	index = kvm_page_table_hashfn(gfn);
+	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+	hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link)
+		if (s == sp) {
+			if (s->gfn == gfn && s->role.word == role.word)
+				live = true;
+			break;
+		}
+
+   if (!live)
+   goto unlock_exit;
+
	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) == invlpg_counter) {
		++vcpu->kvm->stat.mmu_pte_updated;
		FNAME(update_pte)(vcpu, sp, sptep, gentry);
	}
+unlock_exit:
	spin_unlock(&vcpu->kvm->mmu_lock);
mmu_release_page_from_pte_write(vcpu);
 }
-- 
1.6.1.2




Re: [PATCH 1/2] KVM MMU: fix for forgot mark parent-unsync_children bit

2010-05-05 Thread Avi Kivity

On 05/05/2010 03:19 PM, Xiao Guangrong wrote:

When mapping a new parent to an unsync shadow page, we should set the
parent's unsync_children bit

Reported-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
  arch/x86/kvm/mmu.c |4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 97f2ea0..bf35a2f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1374,7 +1374,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
	if (sp->unsync_children) {
		set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
		kvm_mmu_mark_parents_unsync(sp);
-	}
+	} else if (sp->unsync)
+		kvm_mmu_mark_parents_unsync(sp);
+
trace_kvm_mmu_get_page(sp, false);
return sp;
}
   



Which patch does this fix?  If it wasn't merged yet, please repost with 
the fix included.




Re: [PATCH 2/2] KVM MMU: fix race in invlpg code

2010-05-05 Thread Avi Kivity

On 05/05/2010 03:21 PM, Xiao Guangrong wrote:

There is a race in the invlpg code, in sequences like:

A: hold mmu_lock and get 'sp'
B: release mmu_lock and do other things
C: hold mmu_lock and continue using 'sp'

If another path freed 'sp' during stage B, the kernel will crash.

This patch checks whether 'sp' is still live before using it in stage C

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
  arch/x86/kvm/paging_tmpl.h |   22 --
  1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 624b38f..13ea675 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -462,11 +462,16 @@ out_unlock:
 
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
-	struct kvm_mmu_page *sp = NULL;
+	struct kvm_mmu_page *sp = NULL, *s;
 	struct kvm_shadow_walk_iterator iterator;
+	struct hlist_head *bucket;
+	struct hlist_node *node, *tmp;
 	gfn_t gfn = -1;
 	u64 *sptep = NULL, gentry;
 	int invlpg_counter, level, offset = 0, need_flush = 0;
+	unsigned index;
+	bool live = false;
+	union kvm_mmu_page_role role;
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
@@ -480,7 +485,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
 		if (!sp->unsync)
 			break;
-
+		role = sp->role;
 		WARN_ON(level != PT_PAGE_TABLE_LEVEL);
 		shift = PAGE_SHIFT -
 			  (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
@@ -519,10 +524,23 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 
 	mmu_guess_page_from_pte_write(vcpu, gfn_to_gpa(gfn) + offset, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
+	index = kvm_page_table_hashfn(gfn);
+	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+	hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link)
+		if (s == sp) {
+			if (s->gfn == gfn && s->role.word == role.word)
+				live = true;
+			break;
+		}
+
+	if (!live)
+		goto unlock_exit;
+


Did you try the root_count method?  I think it's cleaner.



Re: [PATCH 1/2] KVM MMU: fix for forgot mark parent->unsync_children bit

2010-05-05 Thread Xiao Guangrong


Avi Kivity wrote:
 On 05/05/2010 03:19 PM, Xiao Guangrong wrote:
 When mapping a new parent to an unsync shadow page, we should mark
 the parent's unsync_children bit

 Reported-by: Marcelo Tosattimtosa...@redhat.com
 Signed-off-by: Xiao Guangrongxiaoguangr...@cn.fujitsu.com
 ---
   arch/x86/kvm/mmu.c |4 +++-
   1 files changed, 3 insertions(+), 1 deletions(-)

 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 97f2ea0..bf35a2f 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -1374,7 +1374,9 @@ static struct kvm_mmu_page
 *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
   if (sp->unsync_children) {
   set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
   kvm_mmu_mark_parents_unsync(sp);
 -}
 +} else if (sp->unsync)
 +kvm_mmu_mark_parents_unsync(sp);
 +
   trace_kvm_mmu_get_page(sp, false);
   return sp;
   }

 
 
 Which patch does this fix?  If it wasn't merged yet, please repost with
 the fix included.

Oh, OK, I'll send the previous patchset that was reverted by Marcelo.

Thanks,
Xiao


Re: [PATCH 2/2] KVM MMU: fix race in invlpg code

2010-05-05 Thread Xiao Guangrong


Avi Kivity wrote:

   spin_lock(&vcpu->kvm->mmu_lock);
 + index = kvm_page_table_hashfn(gfn);
 + bucket = &vcpu->kvm->arch.mmu_page_hash[index];
 + hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link)
 + if (s == sp) {
 + if (s->gfn == gfn && s->role.word == role.word)
 + live = true;
 + break;
 + }
 +
 + if (!live)
 + goto unlock_exit;
 +

 
 Did you try the root_count method?  I think it's cleaner.

Avi, Thanks for your idea.

I have considered this method, but I'm not sure when is the right time
to really free this page, and I think we also need a way to synchronize the
real-free path with this path. Do you have any comment on it? :-(

Xiao


Re: [PATCH 2/2] KVM MMU: fix race in invlpg code

2010-05-05 Thread Avi Kivity

On 05/05/2010 03:45 PM, Xiao Guangrong wrote:


Avi Kivity wrote:

   

   spin_lock(&vcpu->kvm->mmu_lock);
 + index = kvm_page_table_hashfn(gfn);
 + bucket = &vcpu->kvm->arch.mmu_page_hash[index];
 + hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link)
 + if (s == sp) {
 + if (s->gfn == gfn && s->role.word == role.word)
 + live = true;
 + break;
 + }
 +
 + if (!live)
 + goto unlock_exit;
 +

   

Did you try the root_count method?  I think it's cleaner.
 

Avi, Thanks for your idea.

I have considered this method, but I'm not sure when is the right time
to really free this page, and I think we also need a way to synchronize the
real-free path with this path. Do you have any comment on it? :-(
   


Same as mmu_free_roots():

	--sp->root_count;
	if (!sp->root_count && sp->role.invalid) {
		kvm_mmu_zap_page(vcpu->kvm, sp);
		goto unlock_exit;
	}




Re: [PATCH 4/4] KVM MMU: do not intercept invlpg if 'oos_shadow' is disabled

2010-05-05 Thread Xiao Guangrong


Avi Kivity wrote:
 On 04/30/2010 12:05 PM, Xiao Guangrong wrote:
 If 'oos_shadow' == 0, intercepting the invlpg command is really
 unnecessary.

 And it's good for us to compare the performance between enabling
 'oos_shadow' and disabling 'oos_shadow'.

 @@ -74,8 +74,9 @@ static int dbg = 0;
   module_param(dbg, bool, 0644);
   #endif

 -static int oos_shadow = 1;
 +int __read_mostly oos_shadow = 1;
   module_param(oos_shadow, bool, 0644);
 +EXPORT_SYMBOL_GPL(oos_shadow);

 
 Please rename to kvm_oos_shadow to reduce potential for conflict with
 other global names.
 
 But really, this is a debug option, I don't expect people to run with
 oos_shadow=0, so there's not much motivation to optimize it.

Agreed, but the 'oos_shadow' option is documented in 
Documentation/kernel-parameters.txt;
if it's just a debug option, I think we'd better not document it.

Thanks,
Xiao



[qemu-kvm tests PATCH v2] qemu-kvm tests: fix linker script problem

2010-05-05 Thread Naphtali Sprei

commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script
itself (flat.lds) to become part of the linked objects, which messes up
the output file; specifically, the symbol edata is no longer the last
symbol.


changes v1 -> v2
Instead of dropping the dependency, put it on a separate line/rule, so the
lds file will not be considered as one of the dependencies in the linking 
line/rule.

Signed-off-by: Naphtali Sprei nsp...@redhat.com
---
 kvm/user/config-x86-common.mak |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 61cc2f0..241c422 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -19,7 +19,8 @@ CFLAGS += -m$(bits)
 libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
 
 FLATLIBS = test/lib/libcflat.a $(libgcc)
-%.flat: %.o $(FLATLIBS) flat.lds
+%.flat: flat.lds
+%.flat: %.o $(FLATLIBS)
$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
 
 tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
-- 
1.6.3.3



[PATCH 5/5] KVM: SVM: Don't allow nested guest to VMMCALL into host

2010-05-05 Thread Joerg Roedel
This patch disables the possibility for an l2-guest to do a
VMMCALL directly into the host. This would happen if the
l1-hypervisor doesn't intercept VMMCALL and the l2-guest
executes this instruction.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/svm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bc087c7..2e9b57a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2036,6 +2036,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 		svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR8_MASK;
 	}
 
+	/* We don't want to see VMMCALLs from a nested guest */
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VMMCALL);
+
 	/*
 	 * We don't want a nested guest to be more powerful than the guest, so
 	 * all intercepts are ORed
-- 
1.7.1




[PATCH 0/5] Important fixes for KVM-AMD

2010-05-05 Thread Joerg Roedel
Hi Avi, Marcelo,

here is a set of patches which fix problems in kvm-amd. Patch 1 fixes a
stupid problem with the event reinjection introduced by me in my
previous patchset. Patch 2 was a helper to find the bug that patch 3
fixes. I kept it in the patchset because it may be helpful for debugging
other problems in the future. Patch 3 is the most important fix because
it makes kvm-amd on 32 bit hosts work again. Without this patch the
first vmrun fails with exit reason VMEXIT_INVALID. Patch 4 fixes the Xen
4.0 shipped with SLES11 in nested svm. The last patch in this series
fixes a potential l2-guest breakout scenario, because it may be possible
for the l2-guest to issue hypercalls directly to the host if the
l1-hypervisor does not intercept VMMCALL.

Thanks,

Joerg

Diffstat:

 arch/x86/include/asm/msr-index.h |2 +
 arch/x86/kvm/svm.c   |  108 --
 arch/x86/kvm/x86.c   |2 +-
 3 files changed, 106 insertions(+), 6 deletions(-)

Shortlog:

Joerg Roedel (5):
  KVM: X86: Fix stupid bug in exception reinjection path
  KVM: SVM: Dump vmcb contents on failed vmrun
  KVM: SVM: Fix wrong intercept masks on 32 bit
  KVM: SVM: Allow EFER.LMSLE to be set with nested svm
  KVM: SVM: Don't allow nested guest to VMMCALL into host




[PATCH 1/5] KVM: X86: Fix stupid bug in exception reinjection path

2010-05-05 Thread Joerg Roedel
The recently merged patch which allowed marking an exception
as reinjected has a bug: it always marks the exception as
reinjected. This breaks the nested-svm shadow-on-shadow
implementation.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..c83528e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -277,7 +277,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.has_error_code = has_error;
 		vcpu->arch.exception.nr = nr;
 		vcpu->arch.exception.error_code = error_code;
-		vcpu->arch.exception.reinject = true;
+		vcpu->arch.exception.reinject = reinject;
 		return;
 	}
 
-- 
1.7.1




[PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-05 Thread Joerg Roedel
This patch enables setting of EFER bit 13, which is allowed
on all SVM-capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/msr-index.h |2 ++
 arch/x86/kvm/svm.c   |2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..352767d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -20,6 +20,7 @@
 #define _EFER_LMA		10 /* Long mode active (read-only) */
 #define _EFER_NX		11 /* No execute enable */
 #define _EFER_SVME		12 /* Enable virtualization */
+#define _EFER_LMSLE		13 /* Long Mode Segment Limit Enable */
 #define _EFER_FFXSR		14 /* Enable Fast FXSAVE/FXRSTOR */
 
 #define EFER_SCE		(1<<_EFER_SCE)
@@ -27,6 +28,7 @@
 #define EFER_LMA		(1<<_EFER_LMA)
 #define EFER_NX			(1<<_EFER_NX)
 #define EFER_SVME		(1<<_EFER_SVME)
+#define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSR		(1<<_EFER_FFXSR)
 
 /* Intel MSRs. Some also available on other CPUs */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74f7b9d..bc087c7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)
 
 	if (nested) {
 		printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
-		kvm_enable_efer_bits(EFER_SVME);
+		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
 	}
 
 	for_each_possible_cpu(cpu) {
-- 
1.7.1




[PATCH 2/5] KVM: SVM: Dump vmcb contents on failed vmrun

2010-05-05 Thread Joerg Roedel
This patch adds a function to dump the vmcb into the kernel
log and calls it after a failed vmrun to ease debugging.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/svm.c |   95 
 1 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 889f660..0201b06 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2637,6 +2637,99 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_NPF]				= pf_interception,
 };
 
+void dump_vmcb(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	struct vmcb_save_area *save = &svm->vmcb->save;
+
+	pr_err("VMCB Control Area:\n");
+	pr_err("cr_read:            %04x\n", control->intercept_cr_read);
+	pr_err("cr_write:           %04x\n", control->intercept_cr_write);
+	pr_err("dr_read:            %04x\n", control->intercept_dr_read);
+	pr_err("dr_write:           %04x\n", control->intercept_dr_write);
+	pr_err("exceptions:         %08x\n", control->intercept_exceptions);
+	pr_err("intercepts:         %016llx\n", control->intercept);
+	pr_err("pause filter count: %d\n", control->pause_filter_count);
+	pr_err("iopm_base_pa:       %016llx\n", control->iopm_base_pa);
+	pr_err("msrpm_base_pa:      %016llx\n", control->msrpm_base_pa);
+	pr_err("tsc_offset:         %016llx\n", control->tsc_offset);
+	pr_err("asid:               %d\n", control->asid);
+	pr_err("tlb_ctl:            %d\n", control->tlb_ctl);
+	pr_err("int_ctl:            %08x\n", control->int_ctl);
+	pr_err("int_vector:         %08x\n", control->int_vector);
+	pr_err("int_state:          %08x\n", control->int_state);
+	pr_err("exit_code:          %08x\n", control->exit_code);
+	pr_err("exit_info1:         %016llx\n", control->exit_info_1);
+	pr_err("exit_info2:         %016llx\n", control->exit_info_2);
+	pr_err("exit_int_info:      %08x\n", control->exit_int_info);
+	pr_err("exit_int_info_err:  %08x\n", control->exit_int_info_err);
+	pr_err("nested_ctl:         %lld\n", control->nested_ctl);
+	pr_err("nested_cr3:         %016llx\n", control->nested_cr3);
+	pr_err("event_inj:          %08x\n", control->event_inj);
+	pr_err("event_inj_err:      %08x\n", control->event_inj_err);
+	pr_err("lbr_ctl:            %lld\n", control->lbr_ctl);
+	pr_err("next_rip:           %016llx\n", control->next_rip);
+	pr_err("VMCB State Save Area:\n");
+	pr_err("es:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->es.selector, save->es.attrib,
+		save->es.limit, save->es.base);
+	pr_err("cs:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->cs.selector, save->cs.attrib,
+		save->cs.limit, save->cs.base);
+	pr_err("ss:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->ss.selector, save->ss.attrib,
+		save->ss.limit, save->ss.base);
+	pr_err("ds:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->ds.selector, save->ds.attrib,
+		save->ds.limit, save->ds.base);
+	pr_err("fs:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->fs.selector, save->fs.attrib,
+		save->fs.limit, save->fs.base);
+	pr_err("gs:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->gs.selector, save->gs.attrib,
+		save->gs.limit, save->gs.base);
+	pr_err("gdtr: s: %04x a: %04x l: %08x b: %016llx\n",
+		save->gdtr.selector, save->gdtr.attrib,
+		save->gdtr.limit, save->gdtr.base);
+	pr_err("ldtr: s: %04x a: %04x l: %08x b: %016llx\n",
+		save->ldtr.selector, save->ldtr.attrib,
+		save->ldtr.limit, save->ldtr.base);
+	pr_err("idtr: s: %04x a: %04x l: %08x b: %016llx\n",
+		save->idtr.selector, save->idtr.attrib,
+		save->idtr.limit, save->idtr.base);
+	pr_err("tr:   s: %04x a: %04x l: %08x b: %016llx\n",
+		save->tr.selector, save->tr.attrib,
+		save->tr.limit, save->tr.base);
+	pr_err("cpl:            %d               efer:         %016llx\n",
+		save->cpl, save->efer);
+	pr_err("cr0:            %016llx cr2:          %016llx\n",
+		save->cr0, save->cr2);
+	pr_err("cr3:            %016llx cr4:          %016llx\n",
+		save->cr3, save->cr4);
+	pr_err("dr6:            %016llx dr7:          %016llx\n",
+		save->dr6, save->dr7);
+	pr_err("rip:            %016llx rflags:       %016llx\n",
+		save->rip, save->rflags);
+	pr_err("rsp:            %016llx rax:          %016llx\n",
+		save->rsp, save->rax);
+	pr_err("star:           %016llx lstar:        %016llx\n",
+		save->star, save->lstar);
+	pr_err("cstar:          %016llx sfmask:       %016llx\n",
+		save->cstar, 

[PATCH 3/5] KVM: SVM: Fix wrong intercept masks on 32 bit

2010-05-05 Thread Joerg Roedel
This patch makes KVM with SVM on 32 bit work again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out, which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
The bug was introduced by commits 95ba827313 and 3cfc3092.

Cc: Jan Kiszka jan.kis...@siemens.com
Cc: Gleb Natapov g...@redhat.com
Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/svm.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0201b06..74f7b9d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2290,7 +2290,7 @@ static int cpuid_interception(struct vcpu_svm *svm)
 static int iret_interception(struct vcpu_svm *svm)
 {
 	++svm->vcpu.stat.nmi_window_exits;
-	svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET);
 	svm->vcpu.arch.hflags |= HF_IRET_MASK;
 	return 1;
 }
@@ -2824,7 +2824,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 	vcpu->arch.hflags |= HF_NMI_MASK;
-	svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+	svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET);
 	++vcpu->stat.nmi_injections;
 }
 
@@ -2891,10 +2891,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 	if (masked) {
 		svm->vcpu.arch.hflags |= HF_NMI_MASK;
-		svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+		svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET);
 	} else {
 		svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
-		svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+		svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET);
 	}
 }
 
-- 
1.7.1




Re: [PATCH 4/4] KVM MMU: do not intercept invlpg if 'oos_shadow' is disabled

2010-05-05 Thread Avi Kivity

On 05/05/2010 03:54 PM, Xiao Guangrong wrote:


Avi Kivity wrote:
   

On 04/30/2010 12:05 PM, Xiao Guangrong wrote:
 

If 'oos_shadow' == 0, intercepting the invlpg command is really
unnecessary.

And it's good for us to compare the performance between enabling
'oos_shadow' and disabling 'oos_shadow'.

@@ -74,8 +74,9 @@ static int dbg = 0;
   module_param(dbg, bool, 0644);
   #endif

-static int oos_shadow = 1;
+int __read_mostly oos_shadow = 1;
   module_param(oos_shadow, bool, 0644);
+EXPORT_SYMBOL_GPL(oos_shadow);

   

Please rename to kvm_oos_shadow to reduce potential for conflict with
other global names.

But really, this is a debug option, I don't expect people to run with
oos_shadow=0, so there's not much motivation to optimize it.
 

Agreed, but the 'oos_shadow' option is documented in 
Documentation/kernel-parameters.txt;
if it's just a debug option, I think we'd better not document it.
   


It has to be documented, otherwise people complain :)

Anyway the variable name and the option name don't have to be the same 
(I think).




Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-05 Thread Avi Kivity

On 05/05/2010 05:04 PM, Joerg Roedel wrote:

This patch enables setting of efer bit 13 which is allowed
in all SVM capable processors. This is necessary for the
SLES11 version of Xen 4.0 to boot with nested svm.
   


Interesting, why does it require it?

Obviously it isn't needed since it manages to run on Intel without it.


  /* Intel MSRs. Some also available on other CPUs */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74f7b9d..bc087c7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)

if (nested) {
printk(KERN_INFO kvm: Nested Virtualization enabled\n);
-   kvm_enable_efer_bits(EFER_SVME);
+   kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}

for_each_possible_cpu(cpu) {
   


What if the host doesn't have it?

Why enable it only for the nested case?  It's not svm specific (it's 
useful for running non-hvm Xen in non-nested mode).


Isn't there a cpuid bit for it?  If so, it should be exposed to 
userspace, and the feature should depend on it.




Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-05 Thread Joerg Roedel
On Wed, May 05, 2010 at 05:46:59PM +0300, Avi Kivity wrote:
 On 05/05/2010 05:04 PM, Joerg Roedel wrote:
 This patch enables setting of efer bit 13 which is allowed
 in all SVM capable processors. This is necessary for the
 SLES11 version of Xen 4.0 to boot with nested svm.


 Interesting, why does it require it?

I don't know. I traced the Xen crash down and found that it gets a #GP
because it tries to set this bit.

 Obviously it isn't needed since it manages to run on Intel without it.

I have heard unofficial statements that they set this bit to provide the
functionality to their guests. And Xen sets this bit together with the
SVM bit.

   /* Intel MSRs. Some also available on other CPUs */
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 74f7b9d..bc087c7 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)

  if (nested) {
  printk(KERN_INFO kvm: Nested Virtualization enabled\n);
 -kvm_enable_efer_bits(EFER_SVME);
 +kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
  }

  for_each_possible_cpu(cpu) {


 What if the host doesn't have it?

It is present in all SVM capable AMD processors.

 Why enable it only for the nested case?  It's not svm specific (it's  
 useful for running non-hvm Xen in non-nested mode).

Because there is no cpuid bit for this feature. You can roughly check
for it using the svm cpuid bit.


Joerg



Re: [qemu-kvm tests PATCH v2] qemu-kvm tests: fix linker script problem

2010-05-05 Thread Avi Kivity

On 05/05/2010 04:53 PM, Naphtali Sprei wrote:

commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script
itself (flat.lds) to become part of the linked objects, which messes up
the output file; specifically, the symbol edata is no longer the last
symbol.


changes v1 -> v2
Instead of dropping the dependency, put it on a separate line/rule, so the
lds file will not be considered as one of the dependencies in the linking 
line/rule.



  FLATLIBS = test/lib/libcflat.a $(libgcc)
-%.flat: %.o $(FLATLIBS) flat.lds
+%.flat: flat.lds
+%.flat: %.o $(FLATLIBS)
$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
   


I don't think that works - $^ selects all prerequisites, not just the 
ones in the line for the make rule.


prereq-%:
touch $@

dummy: prereq-1

dummy: prereq-2
echo $^

$ make dummy
touch prereq-2
touch prereq-1
echo prereq-2 prereq-1
prereq-2 prereq-1



Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-05 Thread Avi Kivity

On 05/05/2010 06:04 PM, Joerg Roedel wrote:

Because there is no cpuid bit for this feature.


That is sad.


You can roughly check
for it using the svm cpuid bit.
   


Doesn't it kill cross-vendor migration?



Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm

2010-05-05 Thread Roedel, Joerg
On Wed, May 05, 2010 at 11:06:59AM -0400, Avi Kivity wrote:
 On 05/05/2010 06:04 PM, Joerg Roedel wrote:
  You can roughly check
  for it using the svm cpuid bit.
 
 Doesn't it kill cross-vendor migration?

Enabling Nested SVM kills it anyway, so this is not an issue. AFAIK the
feature is not present on Intel CPUs.

Joerg




Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Glauber Costa
On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:
 
 +	msr_data.info.nmsrs = n;
 +
 +	return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
 +}
 +
 +
 
 How about a different approach?  Query the supported MSRs
 (KVM_GET_MSR_LIST or thereabout) and reset them (with special cases
 for the TSC, and the old clock MSRs when the new ones are present)?
I didn't go that route because I was unsure that every one of them
would be resettable by writing 0 to it.

And if we are going to special-case most of them, then there
is no point in doing it.

If you think it is doable to special-case just the tsc, then I am fine.



Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Avi Kivity

On 05/05/2010 06:24 PM, Glauber Costa wrote:

On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:

   

+	msr_data.info.nmsrs = n;
+
+	return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
+}
+
+
   

How about a different approach?  Query the supported MSRs
(KVM_GET_MSR_LIST or thereabout) and reset them (with special cases
for the TSC, and the old clock MSRs when the new ones are present)?
 

I didn't go that route because I was unsure that every one of them
would be resettable by writing 0 to it.
   


There are probably others.  We should reset them correctly anyway.

It's probably done by generic qemu code so it works.


And if we are going to special case the most part of it, then there
is no point in doing it.

If you think it is doable to special case just the tsc, then I am fine.
   


I think if we have the following sequence

  clear all msrs
  qemu reset
  kvm specific msr reset

Then we'd be fine.

Oh, and tsc needs to be reset to 0 as well - it isn't a special case.



Re: vCPU scalability for linux VMs

2010-05-05 Thread Alec Istomin

On Wednesday, May 5, 2010 at 04:43:32 -0400, Avi Kivity wrote:

 So you have a total of 32 vcpus on 8 cores?  This is known to be 
 problematic.  You may see some improvement by enabling hyperthreading.

exactly, 32 vCPUs on 8 core hardware that doesn't support hyperthreading
(Clovertown E5335)

 There is ongoing work to improve this.

If interested - let me know when it is going to be a good time/build to
redo the regression testing on this.



[qemu-kvm tests PATCH v3] qemu-kvm tests: fix linker script problem

2010-05-05 Thread Naphtali Sprei

commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script
itself (flat.lds) to become part of the linked objects, which messed
the output file, specifically, the symbol edata is not the last symbol
anymore.

changes v2 -> v3

Instead of using a separate rule, which doesn't really add flat.lds to
the prerequisite list, use Avi's suggestion of filtering.


Signed-off-by: Naphtali Sprei nsp...@redhat.com
---
 kvm/user/config-x86-common.mak |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 61cc2f0..cf36aa1 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -20,7 +20,7 @@ libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
 
 FLATLIBS = test/lib/libcflat.a $(libgcc)
 %.flat: %.o $(FLATLIBS) flat.lds
-	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
+	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $(filter %.o, $^) $(FLATLIBS)
 
 tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
$(TEST_DIR)/smptest.flat  $(TEST_DIR)/port80.flat \
-- 
1.6.3.3



Re: [PATCH 2/2] turn off kvmclock when resetting cpu

2010-05-05 Thread Glauber Costa
On Wed, May 05, 2010 at 06:34:22PM +0300, Avi Kivity wrote:
 On 05/05/2010 06:24 PM, Glauber Costa wrote:
 On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:
 
 +	msr_data.info.nmsrs = n;
 +
 +	return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
 +}
 +
 +
 How about a different approach?  Query the supported MSRs
 (KVM_GET_MSR_LIST or thereabout) and reset them (with special cases
 for the TSC, and the old clock MSRs when the new ones are present)?
 I didn't go that route because I was unsure that every one of them
 would be resettable by writing 0 to it.
 
 There are probably others.  We should reset them correctly anyway.
 
 It's probably done by generic qemu code so it works.
 
 And if we are going to special case the most part of it, then there
 is no point in doing it.
 
 If you think it is doable to special case just the tsc, then I am fine.
 
 I think if we have the following sequence
 
   clear all msrs
   qemu reset
   kvm specific msr reset
 
 Then we'd be fine.
 
 Oh, and tsc needs to be reset to 0 as well - it isn't a special case.
This means a guest running on a perfectly synchronized tsc host
will not see a synchronized tsc, simply because we can't possibly
reset all tscs at the same time.


Re: [patch uq/master 0/9] enable smp 1 and related fixes

2010-05-05 Thread Anthony Liguori

On 05/04/2010 07:45 AM, Marcelo Tosatti wrote:



How does this work without an in-kernel apic (or does uq/master already 
have an in-kernel apic)?


Regards,

Anthony Liguori


Re: RFC: Network Plugin Architecture (NPA) for vmxnet3

2010-05-05 Thread Pankaj Thakkar
On Tue, May 04, 2010 at 05:58:52PM -0700, Chris Wright wrote:
 
 * Pankaj Thakkar (pthak...@vmware.com) wrote:
  We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so that
  Linux users can exploit the benefits provided by passthrough devices in a
  seamless manner while retaining the benefits of virtualization. The document
  below tries to answer most of the questions which we anticipated. Please
  let us know your comments and queries.
 
 How does the throughput, latency, and host CPU utilization for normal
 data path compare with say NetQueue?

NetQueue is really for scaling across multiple VMs. NPA allows similar scaling
and also helps improve CPU efficiency for a single VM, since the hypervisor is
bypassed. Throughput-wise, both emulation and passthrough (NPA) can obtain line
rate on 10gig, but passthrough saves up to 40% CPU depending on the workload.
We did a demo at IDF 2009 comparing 8 VMs running on NetQueue vs. 8 VMs running
on NPA (using Niantic), and we observed similar CPU efficiency gains.

 
 And does this obsolete your UPT implementation?

NPA and UPT share a lot of code in the hypervisor. UPT was adopted by only a
very limited set of IHVs, so NPA is our way forward to get all IHVs on board.

 How many cards actually support this NPA interface?  What does it look
 like, i.e. where is the NPA specification?  (AFAIK, we never got the UPT
 one).

We have it working internally with Intel Niantic (10G) and Kawela (1G) SR-IOV
NICs. We are also working with an upcoming Broadcom 10G card and plan to support
other IHVs. Unlike UPT, we don't dictate the register sets or rings. Rather, we
have guidelines, such as that the card should have an embedded switch for
inter-VF switching, or should support programming (rx filters, VLAN, etc.)
through the PF driver rather than the VF driver.

 How do you handle hardware which has a more symmetric view of the
 SR-IOV world (SR-IOV is only a PCI specification, not a network driver
 specification)?  Or hardware which has multiple functions per physical
 port (multiqueue, hw filtering, embedded switch, etc.)?

I am not sure what you mean by a symmetric view of the SR-IOV world.

NPA allows multi-queue VFs and requires an embedded switch currently. As far as
the PF driver is concerned we require IHVs to support all existing and upcoming
features like NetQueue, FCoE, etc. The PF driver is considered special and is
used to drive the traffic for the emulated/paravirtualized VMs and is also used
to program things on behalf of the VFs through the hypervisor. If the hardware
has multiple physical functions they are treated as separate adapters (with
their own set of VFs) and we require the embedded switch to maintain that
distinction as well.


  NPA offers several benefits:
  1. Performance: Critical performance sensitive paths are not trapped and the
  guest can directly drive the hardware without incurring virtualization
  overheads.
 
 Can you demonstrate with data?

The setup is a 2.667 GHz Nehalem server running a SLES11 VM talking to a
2.33 GHz Barcelona client box running RHEL 5.1. We had netperf streams with
16k message size over 64k socket size running between the server VM and the
client, using Intel Niantic 10G cards. In both cases (NPA and regular) the
VM was CPU saturated (used one full core).

TX: regular vmxnet3 = 3085.5 Mbps/GHz; NPA vmxnet3 = 4397.2 Mbps/GHz
RX: regular vmxnet3 = 1379.6 Mbps/GHz; NPA vmxnet3 = 2349.7 Mbps/GHz

We have similar results for other configurations; in general NPA is better in
terms of CPU cost and can save up to 40% of it.

 
  2. Hypervisor control: All control operations from the guest such as 
  programming
  MAC address go through the hypervisor layer and hence can be subjected to
  hypervisor policies. The PF driver can be further used to put policy 
  decisions
  like which VLAN the guest should be on.
 
 This can happen without NPA as well.  VF simply needs to request
 the change via the PF (in fact, hw does that right now).  Also, we
 already have a host side management interface via PF (see, for example,
 RTM_SETLINK IFLA_VF_MAC interface).
 
 What is control plane interface?  Just something like a fixed register set?

All operations other than TX/RX go through the vmxnet3 shell to the vmxnet3
device emulation. So the control plane is really the vmxnet3 device emulation
as far as the guest is concerned.

 
  3. Guest Management: No 
