[COMMIT master] qemu-kvm: event writeback can overwrite interrupts with -no-kvm-irqchip
From: Marcelo Tosatti <mtosa...@redhat.com>

Interrupts that are injected during a vcpu event save/writeback cycle
are lost. Fix by writing back the state before injecting interrupts.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 43d599d..749587a 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -885,11 +885,6 @@ int pre_kvm_run(kvm_context_t kvm, CPUState *env)
 {
     kvm_arch_pre_run(env, env->kvm_run);

-    if (env->kvm_vcpu_dirty) {
-        kvm_arch_load_regs(env, KVM_PUT_RUNTIME_STATE);
-        env->kvm_vcpu_dirty = 0;
-    }
-
     pthread_mutex_unlock(&qemu_mutex);
     return 0;
 }
@@ -907,6 +902,10 @@ int kvm_run(CPUState *env)
     int fd = env->kvm_fd;

 again:
+    if (env->kvm_vcpu_dirty) {
+        kvm_arch_load_regs(env, KVM_PUT_RUNTIME_STATE);
+        env->kvm_vcpu_dirty = 0;
+    }
     push_nmi(kvm);
 #if !defined(__s390__)
     if (!kvm->irqchip_in_kernel)
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] qemu-kvm: Process exit requests in kvm loop
From: Jan Kiszka <jan.kis...@siemens.com>

This unbreaks the monitor "quit" command for qemu-kvm.

Signed-off-by: Jan Kiszka <jan.kis...@siemens.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 91f0222..43d599d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2047,6 +2047,9 @@ int kvm_main_loop(void)
             vm_stop(EXCP_DEBUG);
             kvm_debug_cpu_requested = NULL;
         }
+        if (qemu_exit_requested()) {
+            exit(0);
+        }
     }

     pause_all_threads();
[COMMIT master] replace set_msr_entry with kvm_msr_entry
From: Glauber Costa <glom...@redhat.com>

This is yet another function that upstream qemu implements, so we can
just use its implementation.

Signed-off-by: Glauber Costa <glom...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 748ff69..439c31a 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -693,13 +693,6 @@ int kvm_arch_qemu_create_context(void)
     return 0;
 }

-static void set_msr_entry(struct kvm_msr_entry *entry, uint32_t index,
-                          uint64_t data)
-{
-    entry->index = index;
-    entry->data = data;
-}
-
 /* returns 0 on success, non-0 on failure */
 static int get_msr_entry(struct kvm_msr_entry *entry, CPUState *env)
 {
@@ -960,19 +953,19 @@ void kvm_arch_load_regs(CPUState *env, int level)
     /* msrs */
     n = 0;
     /* Remember to increase msrs size if you add new registers below */
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
-    set_msr_entry(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
+    kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
     if (kvm_has_msr_star)
-        set_msr_entry(&msrs[n++], MSR_STAR, env->star);
+        kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
     if (kvm_has_vm_hsave_pa)
-        set_msr_entry(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
+        kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 #ifdef TARGET_X86_64
     if (lm_capable_kernel) {
-        set_msr_entry(&msrs[n++], MSR_CSTAR, env->cstar);
-        set_msr_entry(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
-        set_msr_entry(&msrs[n++], MSR_FMASK, env->fmask);
-        set_msr_entry(&msrs[n++], MSR_LSTAR, env->lstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
+        kvm_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
+        kvm_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
+        kvm_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
     }
 #endif
     if (level == KVM_PUT_FULL_STATE) {
@@ -983,20 +976,20 @@ void kvm_arch_load_regs(CPUState *env, int level)
          * huge jump-backs that would occur without any writeback at all.
          */
         if (smp_cpus == 1 || env->tsc != 0) {
-            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+            kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
         }
-        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
-        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
+        kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }
 #ifdef KVM_CAP_MCE
     if (env->mcg_cap) {
         if (level == KVM_PUT_RESET_STATE)
-            set_msr_entry(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
         else if (level == KVM_PUT_FULL_STATE) {
-            set_msr_entry(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-            set_msr_entry(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
             for (i = 0; i < (env->mcg_cap & 0xff); i++)
-                set_msr_entry(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
+                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5239eaf..76c1adb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -552,6 +552,8 @@ static int kvm_put_sregs(CPUState *env)
     return kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);
 }

+#endif
+
 static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
                               uint32_t index, uint64_t value)
 {
@@ -559,6 +561,7 @@ static void kvm_msr_entry_set(struct kvm_msr_entry *entry,
     entry->data = value;
 }

+#ifdef KVM_UPSTREAM
 static int kvm_put_msrs(CPUState *env, int level)
 {
     struct {
[COMMIT master] introduce qemu_ram_map
From: Marcelo Tosatti <mtosa...@redhat.com>

This allows drivers to register an mmapped region in the ram block
mappings. To be used by the device assignment driver.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/cpu-common.h b/cpu-common.h
index 29b4ea5..4b0ba60 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -40,6 +40,7 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr,
 }

 ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
+ram_addr_t qemu_ram_map(ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
diff --git a/exec.c b/exec.c
index 9748496..4f94e87 100644
--- a/exec.c
+++ b/exec.c
@@ -2805,6 +2805,34 @@ static void *file_ram_alloc(ram_addr_t memory, const char *path)
 }
 #endif

+ram_addr_t qemu_ram_map(ram_addr_t size, void *host)
+{
+    RAMBlock *new_block;
+
+    size = TARGET_PAGE_ALIGN(size);
+    new_block = qemu_malloc(sizeof(*new_block));
+
+    new_block->host = host;
+
+    new_block->offset = last_ram_offset;
+    new_block->length = size;
+
+    new_block->next = ram_blocks;
+    ram_blocks = new_block;
+
+    phys_ram_dirty = qemu_realloc(phys_ram_dirty,
+                                  (last_ram_offset + size) >> TARGET_PAGE_BITS);
+    memset(phys_ram_dirty + (last_ram_offset >> TARGET_PAGE_BITS),
+           0xff, size >> TARGET_PAGE_BITS);
+
+    last_ram_offset += size;
+
+    if (kvm_enabled())
+        kvm_setup_guest_memory(new_block->host, size);
+
+    return new_block->offset;
+}
+
 ram_addr_t qemu_ram_alloc(ram_addr_t size)
 {
     RAMBlock *new_block;
[COMMIT master] remove unused kvm_dirty_bitmap array
From: Marcelo Tosatti <mtosa...@redhat.com>

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/qemu-kvm.c b/qemu-kvm.c
index ae6570a..779bc5b 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2156,7 +2156,6 @@ void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,

 /* dirty pages logging */
 /* FIXME: use unsigned long pointer instead of unsigned char */
-unsigned char *kvm_dirty_bitmap = NULL;
 int kvm_physical_memory_set_dirty_tracking(int enable)
 {
     int r = 0;
@@ -2165,17 +2164,9 @@ int kvm_physical_memory_set_dirty_tracking(int enable)
         return 0;

     if (enable) {
-        if (!kvm_dirty_bitmap) {
-            unsigned bitmap_size = BITMAP_SIZE(phys_ram_size);
-            kvm_dirty_bitmap = qemu_malloc(bitmap_size);
-            r = kvm_dirty_pages_log_enable_all(kvm_context);
-        }
+        r = kvm_dirty_pages_log_enable_all(kvm_context);
     } else {
-        if (kvm_dirty_bitmap) {
-            r = kvm_dirty_pages_log_reset(kvm_context);
-            qemu_free(kvm_dirty_bitmap);
-            kvm_dirty_bitmap = NULL;
-        }
+        r = kvm_dirty_pages_log_reset(kvm_context);
     }
     return r;
 }
[COMMIT master] use upstream memslot management code
From: Marcelo Tosatti <mtosa...@redhat.com>

Drop qemu-kvm's implementation in favour of qemu's; they are
functionally equivalent.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 8e3cf38..1f13a6d 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -256,10 +256,7 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
     AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
     AssignedDevRegion *region = &r_dev->v_addrs[region_num];
     PCIRegion *real_region = &r_dev->real_device.regions[region_num];
-    pcibus_t old_ephys = region->e_physbase;
-    pcibus_t old_esize = region->e_size;
-    int first_map = (region->e_size == 0);
-    int ret = 0;
+    int ret = 0, flags = 0;

     DEBUG("e_phys=%08" FMT_PCIBUS " r_virt=%p type=%d len=%08" FMT_PCIBUS " region_num=%d \n",
           e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -267,30 +264,22 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
     region->e_physbase = e_phys;
     region->e_size = e_size;

-    if (!first_map)
-        kvm_destroy_phys_mem(kvm_context, old_ephys,
-                             TARGET_PAGE_ALIGN(old_esize));
-
     if (e_size > 0) {
+
+        if (region_num == PCI_ROM_SLOT)
+            flags |= IO_MEM_ROM;
+
+        cpu_register_physical_memory(e_phys, e_size, region->memory_index | flags);
+
         /* deal with MSI-X MMIO page */
         if (real_region->base_addr <= r_dev->msix_table_addr &&
             real_region->base_addr + real_region->size >= r_dev->msix_table_addr) {
             int offset = r_dev->msix_table_addr - real_region->base_addr;
-            ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
-            if (ret == 0)
-                DEBUG("munmap done, virt_base 0x%p\n",
-                      region->u.r_virtbase + offset);
-            else {
-                fprintf(stderr, "%s: fail munmap msix table!\n", __func__);
-                exit(1);
-            }
+            cpu_register_physical_memory(e_phys + offset, TARGET_PAGE_SIZE,
+                                         r_dev->mmio_index);
         }
-        ret = kvm_register_phys_mem(kvm_context, e_phys,
-                                    region->u.r_virtbase,
-                                    TARGET_PAGE_ALIGN(e_size), 0);
     }

     if (ret != 0) {
@@ -539,6 +528,15 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
             pci_dev->v_addrs[i].u.r_virtbase +=
                 (cur_region->base_addr & 0xFFF);

+            if (!slow_map) {
+                void *virtbase = pci_dev->v_addrs[i].u.r_virtbase;
+
+                pci_dev->v_addrs[i].memory_index = qemu_ram_map(cur_region->size,
+                                                                virtbase);
+            } else
+                pci_dev->v_addrs[i].memory_index = 0;
+
             pci_register_bar((PCIDevice *) pci_dev, i,
                              cur_region->size, t,
                              slow_map ? assigned_dev_iomem_map_slow
@@ -726,10 +724,6 @@ static void free_assigned_device(AssignedDevice *dev)
                 kvm_remove_ioperm_data(region->u.r_baseport, region->r_size);
                 continue;
             } else if (pci_region->type & IORESOURCE_MEM) {
-                if (region->e_size > 0)
-                    kvm_destroy_phys_mem(kvm_context, region->e_physbase,
-                                         TARGET_PAGE_ALIGN(region->e_size));
-
                 if (region->u.r_virtbase) {
                     int ret = munmap(region->u.r_virtbase,
                                      (pci_region->size + 0xFFF) & 0xF000);
diff --git a/hw/device-assignment.h b/hw/device-assignment.h
index 1cbfc36..d561112 100644
--- a/hw/device-assignment.h
+++ b/hw/device-assignment.h
@@ -63,7 +63,7 @@ typedef struct {

 typedef struct {
     pcibus_t e_physbase;
-    uint32_t memory_index;
+    ram_addr_t memory_index;
     union {
         void *r_virtbase;    /* mmapped access address for memory regions */
         uint32_t r_baseport; /* the base guest port for I/O regions */
diff --git a/kvm-all.c b/kvm-all.c
index 87b7f1e..9ac35aa 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -30,7 +30,6 @@
 /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
 #define PAGE_SIZE TARGET_PAGE_SIZE

-#ifdef KVM_UPSTREAM
 //#define DEBUG_KVM

@@ -42,6 +41,8 @@
     do { } while (0)
 #endif

+#ifdef KVM_UPSTREAM
+
 typedef struct KVMSlot
 {
     target_phys_addr_t start_addr;
@@ -76,6 +77,8 @@ struct KVMState

 static KVMState *kvm_state;

+#endif
+
 static KVMSlot *kvm_alloc_slot(KVMState *s)
 {
     int i;
@@ -152,6 +155,7 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
     return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
 }

+#ifdef KVM_UPSTREAM
 static void kvm_reset_vcpu(void *opaque)
 {
     CPUState *env = opaque;
[COMMIT master] qemu-kvm tests: enhanced msr test
From: Naphtali Sprei <nsp...@redhat.com>

Changed the code structure and added a few tests for some of the MSRs.

Signed-off-by: Naphtali Sprei <nsp...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/kvm/user/test/x86/msr.c b/kvm/user/test/x86/msr.c
index 0d6f286..662cb4f 100644
--- a/kvm/user/test/x86/msr.c
+++ b/kvm/user/test/x86/msr.c
@@ -2,7 +2,80 @@

 #include "libcflat.h"

-#define MSR_KERNEL_GS_BASE 0xc102 /* SwapGS GS shadow */
+struct msr_info {
+    int index;
+    char *name;
+    struct tc {
+        int valid;
+        unsigned long long value;
+        unsigned long long expected;
+    } val_pairs[20];
+};
+
+
+#define addr_64 0x123456789abcULL
+
+struct msr_info msr_info[] =
+{
+    { .index = 0x001b, .name = "MSR_IA32_APICBASE",
+      .val_pairs = {
+            { .valid = 1, .value = 0x56789900, .expected = 0x56789900},
+            { .valid = 1, .value = 0x56789D01, .expected = 0x56789D01},
+      }
+    },
+    { .index = 0x0174, .name = "IA32_SYSENTER_CS",
+      .val_pairs = {{ .valid = 1, .value = 0x1234, .expected = 0x1234}}
+    },
+    { .index = 0x0175, .name = "MSR_IA32_SYSENTER_ESP",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0x0176, .name = "IA32_SYSENTER_EIP",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0x01a0, .name = "MSR_IA32_MISC_ENABLE",
+      // reserved: 1:2, 4:6, 8:10, 13:15, 17, 19:21, 24:33, 35:63
+      .val_pairs = {{ .valid = 1, .value = 0x400c51889, .expected = 0x400c51889}}
+    },
+    { .index = 0x0277, .name = "MSR_IA32_CR_PAT",
+      .val_pairs = {{ .valid = 1, .value = 0x07070707, .expected = 0x07070707}}
+    },
+    { .index = 0xc100, .name = "MSR_FS_BASE",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc101, .name = "MSR_GS_BASE",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc102, .name = "MSR_KERNEL_GS_BASE",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc080, .name = "MSR_EFER",
+      .val_pairs = {{ .valid = 1, .value = 0xD00, .expected = 0xD00}}
+    },
+    { .index = 0xc082, .name = "MSR_LSTAR",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc083, .name = "MSR_CSTAR",
+      .val_pairs = {{ .valid = 1, .value = addr_64, .expected = addr_64}}
+    },
+    { .index = 0xc084, .name = "MSR_SYSCALL_MASK",
+      .val_pairs = {{ .valid = 1, .value = 0x, .expected = 0x}}
+    },
+
+//MSR_IA32_DEBUGCTLMSR needs svm feature LBRV
+//MSR_VM_HSAVE_PA only AMD host
+};
+
+static int find_msr_info(int msr_index)
+{
+    int i;
+
+    for (i = 0; i < sizeof(msr_info)/sizeof(msr_info[0]); i++) {
+        if (msr_info[i].index == msr_index) {
+            return i;
+        }
+    }
+    return -1;
+}
+

 int nr_passed, nr_tests;
@@ -32,23 +105,42 @@ static unsigned long long rdmsr(unsigned index)

 #endif

-static void test_kernel_gs_base(void)
-{
-#ifdef __x86_64__
-    unsigned long long v1 = 0x123456789abc, v2;
-
-    wrmsr(MSR_KERNEL_GS_BASE, v1);
-    v2 = rdmsr(MSR_KERNEL_GS_BASE);
-    report("MSR_KERNEL_GS_BASE", v1 == v2);
-#endif
+
+static void test_msr_rw(int msr_index, unsigned long long input,
+                        unsigned long long expected)
+{
+    unsigned long long r = 0;
+    int index;
+    char *sptr;
+
+    if ((index = find_msr_info(msr_index)) != -1) {
+        sptr = msr_info[index].name;
+    } else {
+        printf("couldn't find name for msr # 0x%x, skipping\n", msr_index);
+        return;
+    }
+    wrmsr(msr_index, input);
+    r = rdmsr(msr_index);
+    if (expected != r) {
+        printf("testing %s: output = 0x%x:0x%x expected = 0x%x:0x%x\n",
+               sptr, r >> 32, r, expected >> 32, expected);
+    }
+    report(sptr, expected == r);
 }

 int main(int ac, char **av)
 {
-    test_kernel_gs_base();
+    int i, j;
+
+    for (i = 0; i < sizeof(msr_info) / sizeof(msr_info[0]); i++) {
+        for (j = 0; j < sizeof(msr_info[i].val_pairs) / sizeof(msr_info[i].val_pairs[0]); j++) {
+            if (msr_info[i].val_pairs[j].valid) {
+                test_msr_rw(msr_info[i].index, msr_info[i].val_pairs[j].value,
+                            msr_info[i].val_pairs[j].expected);
+            } else {
+                break;
+            }
+        }
+    }

-    printf("%d tests, %d failures\n", nr_tests, nr_tests - nr_passed);
+    printf("%d tests, %d failures\n", nr_tests, nr_tests - nr_passed);

-    return nr_passed == nr_tests ? 0 : 1;
+    return nr_passed == nr_tests ? 0 : 1;
 }
[COMMIT master] KVM: Document KVM_GET_MP_STATE and KVM_SET_MP_STATE
From: Avi Kivity <a...@redhat.com>

Acked-by: Pekka Enberg <penb...@cs.helsinki.fi>
Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index baa8fde..a237518 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -848,6 +848,50 @@ function properly, this is the place to put them.

 	__u8  pad[64];
 };

+4.37 KVM_GET_MP_STATE
+
+Capability: KVM_CAP_MP_STATE
+Architectures: x86, ia64
+Type: vcpu ioctl
+Parameters: struct kvm_mp_state (out)
+Returns: 0 on success; -1 on error
+
+struct kvm_mp_state {
+	__u32 mp_state;
+};
+
+Returns the vcpu's current multiprocessing state (though also valid on
+uniprocessor guests).
+
+Possible values are:
+
+ - KVM_MP_STATE_RUNNABLE:       the vcpu is currently running
+ - KVM_MP_STATE_UNINITIALIZED:  the vcpu is an application processor (AP)
+                                which has not yet received an INIT signal
+ - KVM_MP_STATE_INIT_RECEIVED:  the vcpu has received an INIT signal, and is
+                                now ready for a SIPI
+ - KVM_MP_STATE_HALTED:         the vcpu has executed a HLT instruction and
+                                is waiting for an interrupt
+ - KVM_MP_STATE_SIPI_RECEIVED:  the vcpu has just received a SIPI (vector
+                                accessible via KVM_GET_VCPU_EVENTS)
+
+This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
+irqchip, the multiprocessing state must be maintained by userspace.
+
+4.38 KVM_SET_MP_STATE
+
+Capability: KVM_CAP_MP_STATE
+Architectures: x86, ia64
+Type: vcpu ioctl
+Parameters: struct kvm_mp_state (in)
+Returns: 0 on success; -1 on error
+
+Sets the vcpu's current multiprocessing state; see KVM_GET_MP_STATE for
+arguments.
+
+This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel
+irqchip, the multiprocessing state must be maintained by userspace.
+
 5. The kvm_run structure

 Application code obtains a pointer to the kvm_run structure by
[COMMIT master] KVM: Minor MMU documentation edits
From: Avi Kivity <a...@redhat.com>

Reported by Andrew Jones.

Signed-off-by: Avi Kivity <a...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index da04671..0cc28fb 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -75,8 +75,8 @@ direct mode; otherwise it operates in shadow mode (see below).
 Memory
 ======

-Guest memory (gpa) is part of user address space of the process that is using
-kvm.  Userspace defines the translation between guest addresses and user
+Guest memory (gpa) is part of the user address space of the process that is
+using kvm.  Userspace defines the translation between guest addresses and user
 addresses (gpa->hva); note that two gpas may alias to the same gva, but not
 vice versa.
@@ -111,7 +111,7 @@ is not related to a translation directly.  It points to other shadow pages.

 A leaf spte corresponds to either one or two translations encoded into
 one paging structure entry.  These are always the lowest level of the
-translation stack, with an optional higher level translations left to NPT/EPT.
+translation stack, with optional higher level translations left to NPT/EPT.
 Leaf ptes point at guest pages.

 The following table shows translations encoded by leaf ptes, with higher-level
@@ -167,7 +167,7 @@ Shadow pages contain the following information:
     Either the guest page table containing the translations shadowed by this
     page, or the base page frame for linear translations.  See role.direct.
   spt:
-    A pageful of 64-bit sptes containig the translations for this page.
+    A pageful of 64-bit sptes containing the translations for this page.
     Accessed by both kvm and hardware.
     The page pointed to by spt will have its page->private pointing back
     at the shadow page structure.
@@ -235,7 +235,7 @@ the amount of emulation we have to do when the guest modifies multiple gptes,
 or when the a guest page is no longer used as a page table and is used for
 random guest data.

-As a side effect we have resynchronize all reachable unsynchronized shadow
+As a side effect we have to resynchronize all reachable unsynchronized shadow
 pages on a tlb flush.
[COMMIT master] KVM: MMU: fix hashing for TDP and non-paging modes
From: Eric Northup <digitale...@google.com>

For TDP mode, avoid creating multiple page table roots for the single
guest-to-host physical address map by fixing the inputs used for the
shadow page table hash in mmu_alloc_roots().

Signed-off-by: Eric Northup <digitale...@google.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ddfa865..9696d65 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2059,10 +2059,12 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
		hpa_t root = vcpu->arch.mmu.root_hpa;

		ASSERT(!VALID_PAGE(root));
-		if (tdp_enabled)
-			direct = 1;
		if (mmu_check_root(vcpu, root_gfn))
			return 1;
+		if (tdp_enabled) {
+			direct = 1;
+			root_gfn = 0;
+		}
		sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
				      PT64_ROOT_LEVEL, direct,
				      ACC_ALL, NULL);
@@ -2072,8 +2074,6 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
		return 0;
	}
	direct = !is_paging(vcpu);
-	if (tdp_enabled)
-		direct = 1;
	for (i = 0; i < 4; ++i) {
		hpa_t root = vcpu->arch.mmu.pae_root[i];
@@ -2089,6 +2089,10 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
			root_gfn = 0;
		if (mmu_check_root(vcpu, root_gfn))
			return 1;
+		if (tdp_enabled) {
+			direct = 1;
+			root_gfn = i << 30;
+		}
		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
				      PT32_ROOT_LEVEL, direct,
				      ACC_ALL, NULL);
[COMMIT master] KVM: VMX: Add definition for msr autoload entry
From: Avi Kivity <a...@redhat.com>

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index fb9a080..4497318 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -25,6 +25,8 @@
  *
  */

+#include <linux/types.h>
+
 /*
  * Definitions of Primary Processor-Based VM-Execution Controls.
  */
@@ -394,6 +396,10 @@ enum vmcs_field {
 #define ASM_VMX_INVEPT		  ".byte 0x66, 0x0f, 0x38, 0x80, 0x08"
 #define ASM_VMX_INVVPID		  ".byte 0x66, 0x0f, 0x38, 0x81, 0x08"

-
+struct vmx_msr_entry {
+	u32 index;
+	u32 reserved;
+	u64 value;
+} __aligned(16);

 #endif
[COMMIT master] KVM: Fix mmu shrinker error
From: Gui Jianfeng <guijianf...@cn.fujitsu.com>

kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() reclaims
only one sp, but that's not the case. This causes the mmu shrinker to
return a wrong number. This patch fixes the counting error.

Signed-off-by: Gui Jianfeng <guijianf...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9696d65..18d2f58 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2902,13 +2902,13 @@ restart:
	kvm_flush_remote_tlbs(kvm);
 }

-static void kvm_mmu_remove_one_alloc_mmu_page(struct kvm *kvm)
+static int kvm_mmu_remove_some_alloc_mmu_pages(struct kvm *kvm)
 {
	struct kvm_mmu_page *page;

	page = container_of(kvm->arch.active_mmu_pages.prev,
			    struct kvm_mmu_page, link);
-	kvm_mmu_zap_page(kvm, page);
+	return kvm_mmu_zap_page(kvm, page) + 1;
 }

 static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
@@ -2920,7 +2920,7 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
	spin_lock(&kvm_lock);

	list_for_each_entry(kvm, &vm_list, vm_list) {
-		int npages, idx;
+		int npages, idx, freed_pages;

		idx = srcu_read_lock(&kvm->srcu);
		spin_lock(&kvm->mmu_lock);
@@ -2928,8 +2928,8 @@ static int mmu_shrink(int nr_to_scan, gfp_t gfp_mask)
			 kvm->arch.n_free_mmu_pages;
		cache_count += npages;
		if (!kvm_freed && nr_to_scan > 0 && npages > 0) {
-			kvm_mmu_remove_one_alloc_mmu_page(kvm);
-			cache_count--;
+			freed_pages = kvm_mmu_remove_some_alloc_mmu_pages(kvm);
+			cache_count -= freed_pages;
			kvm_freed = kvm;
		}
		nr_to_scan--;
[COMMIT master] KVM: MMU: move unsync/sync tracepoints to proper place
From: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>

Move the unsync/sync tracepoints to the proper place, so that we can
observe an unsync page's live time.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 18d2f58..51eb6d6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1189,6 +1189,7 @@ static struct kvm_mmu_page *kvm_mmu_lookup_page(struct kvm *kvm, gfn_t gfn)
 static void kvm_unlink_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
	WARN_ON(!sp->unsync);
+	trace_kvm_mmu_sync_page(sp);
	sp->unsync = 0;
	--kvm->stat.mmu_unsync;
 }
@@ -1202,7 +1203,6 @@ static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
		return 1;
	}

-	trace_kvm_mmu_sync_page(sp);
	if (rmap_write_protect(vcpu->kvm, sp->gfn))
		kvm_flush_remote_tlbs(vcpu->kvm);
	kvm_unlink_unsync_page(vcpu->kvm, sp);
@@ -1730,7 +1730,6 @@ static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
	struct kvm_mmu_page *s;
	struct hlist_node *node, *n;

-	trace_kvm_mmu_unsync_page(sp);
	index = kvm_page_table_hashfn(sp->gfn);
	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
	/* don't unsync if pagetable is shadowed with multiple roles */
@@ -1740,6 +1739,7 @@ static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
		if (s->role.word != sp->role.word)
			return 1;
	}
+	trace_kvm_mmu_unsync_page(sp);
	++vcpu->kvm->stat.mmu_unsync;
	sp->unsync = 1;
[COMMIT master] KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)
From: Shane Wang <shane.w...@intel.com>

Per the documentation, for the feature control MSR:

  Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
  of VMXON in SMX operation causes a general-protection exception.
  Bit 2 enables VMXON outside SMX operation. If the bit is clear,
  execution of VMXON outside SMX operation causes a general-protection
  exception.

This patch enables this kind of check with SMX for VMXON in KVM.

Signed-off-by: Shane Wang <shane.w...@intel.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..f932485 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -202,8 +202,9 @@
 #define MSR_IA32_EBL_CR_POWERON		0x002a
 #define MSR_IA32_FEATURE_CONTROL	0x003a

-#define FEATURE_CONTROL_LOCKED		(1<<0)
-#define FEATURE_CONTROL_VMXON_ENABLED	(1<<2)
+#define FEATURE_CONTROL_LOCKED				(1<<0)
+#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX	(1<<1)
+#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX	(1<<2)

 #define MSR_IA32_APICBASE		0x001b
 #define MSR_IA32_APICBASE_BSP		(1<<8)
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 86c9f91..46b8277 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -46,6 +46,7 @@

 /* Global pointer to shared data; NULL means no measured launch. */
 struct tboot *tboot __read_mostly;
+EXPORT_SYMBOL(tboot);

 /* timeout for APs (in secs) to enter wait-for-SIPI state during shutdown */
 #define AP_WAIT_TIMEOUT		1
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c4f3955..d2a47ae 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -27,6 +27,7 @@
 #include <linux/moduleparam.h>
 #include <linux/ftrace_event.h>
 #include <linux/slab.h>
+#include <linux/tboot.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
@@ -1272,9 +1273,16 @@ static __init int vmx_disabled_by_bios(void)
	u64 msr;

	rdmsrl(MSR_IA32_FEATURE_CONTROL, msr);
-	return (msr & (FEATURE_CONTROL_LOCKED |
-		       FEATURE_CONTROL_VMXON_ENABLED))
-	    == FEATURE_CONTROL_LOCKED;
+	if (msr & FEATURE_CONTROL_LOCKED) {
+		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX)
+			&& tboot_enabled())
+			return 1;
+		if (!(msr & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)
+			&& !tboot_enabled())
+			return 1;
+	}
+
+	return 0;	/* locked but not enabled */
 }

@@ -1282,21 +1290,23 @@ static int hardware_enable(void *garbage)
 {
	int cpu = raw_smp_processor_id();
	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
-	u64 old;
+	u64 old, test_bits;

	if (read_cr4() & X86_CR4_VMXE)
		return -EBUSY;

	INIT_LIST_HEAD(&per_cpu(vcpus_on_cpu, cpu));
	rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
-	if ((old & (FEATURE_CONTROL_LOCKED |
-		    FEATURE_CONTROL_VMXON_ENABLED))
-	    != (FEATURE_CONTROL_LOCKED |
-		FEATURE_CONTROL_VMXON_ENABLED))
+
+	test_bits = FEATURE_CONTROL_LOCKED;
+	test_bits |= FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
+
+	if (tboot_enabled())
+		test_bits |= FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX;
+
+	if ((old & test_bits) != test_bits) {
		/* enable and lock */
-		wrmsrl(MSR_IA32_FEATURE_CONTROL, old |
-		       FEATURE_CONTROL_LOCKED |
-		       FEATURE_CONTROL_VMXON_ENABLED);
+		wrmsrl(MSR_IA32_FEATURE_CONTROL, old | test_bits);
+	}
	write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
	asm volatile (ASM_VMX_VMXON_RAX
		      : : "a"(&phys_addr), "m"(phys_addr)
diff --git a/include/linux/tboot.h b/include/linux/tboot.h
index bf2a0c7..1dba6ee 100644
--- a/include/linux/tboot.h
+++ b/include/linux/tboot.h
@@ -150,6 +150,7 @@ extern int tboot_force_iommu(void);

 #else

+#define tboot_enabled()			0
 #define tboot_probe()			do { } while (0)
 #define tboot_shutdown(shutdown_type)	do { } while (0)
 #define tboot_sleep(sleep_state, pm1a_control, pm1b_control) \
[COMMIT master] KVM: VMX: Avoid writing HOST_CR0 every entry
From: Avi Kivity <a...@redhat.com>

cr0.ts may change between entries, so we copy cr0 to HOST_CR0 before each
entry. That is slow, so instead, set HOST_CR0 to have TS set
unconditionally (which is a safe value), and issue a clts() just before
exiting vcpu context if the task indeed owns the fpu. Saves ~50 cycles/exit.

Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ba0fd42..777e00d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -812,6 +812,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 		wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 	}
 #endif
+	if (current_thread_info()->status & TS_USEDFPU)
+		clts();
 }

 static void vmx_load_host_state(struct vcpu_vmx *vmx)
@@ -2510,7 +2512,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
 	vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, !!bypass_guest_pf);
 	vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */

-	vmcs_writel(HOST_CR0, read_cr0());  /* 22.2.3 */
+	vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS);  /* 22.2.3 */
 	vmcs_writel(HOST_CR4, read_cr4());  /* 22.2.3, 22.2.5 */
 	vmcs_writel(HOST_CR3, read_cr3());  /* 22.2.3  FIXME: shadow tables */
@@ -3863,11 +3865,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
 		vmx_set_interrupt_shadow(vcpu, 0);

-	/*
-	 * Loading guest fpu may have cleared host cr0.ts
-	 */
-	vmcs_writel(HOST_CR0, read_cr0());
-
 	asm(
 		/* Store host registers */
 		"push %%"R"dx; push %%"R"bp;"
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f6f8dad..8e267ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1723,8 +1723,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)

 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
-	kvm_put_guest_fpu(vcpu);
 	kvm_x86_ops->vcpu_put(vcpu);
+	kvm_put_guest_fpu(vcpu);
 }

 static int is_efer_nx(void)
[COMMIT master] KVM: Fix wallclock version writing race
From: Avi Kivity <a...@redhat.com>

Wallclock writing uses an unprotected global variable to hold the
version; this can cause one guest to interfere with another if both
write their wallclock at the same time.

Acked-by: Glauber Costa <glom...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e267ab..4d0a968 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -754,14 +754,22 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)

 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 {
-	static int version;
+	int version;
+	int r;
 	struct pvclock_wall_clock wc;
 	struct timespec boot;

 	if (!wall_clock)
 		return;

-	version++;
+	r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+	if (r)
+		return;
+
+	if (version & 1)
+		++version;  /* first time write, random junk */
+
+	++version;

 	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
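The `version & 1` check works because pvclock uses a seqlock-style convention: the version is bumped to an odd value before the payload is written and back to an even value afterwards, so a reader that observes an odd or changed version retries. A minimal single-threaded sketch of that convention (illustrative names and types, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the pvclock wallclock version protocol. */
struct wall_clock {
    uint32_t version;
    uint32_t sec;
    uint32_t nsec;
};

static void write_wall_clock(struct wall_clock *wc, uint32_t sec, uint32_t nsec)
{
    wc->version++;               /* now odd: update in progress */
    wc->sec = sec;
    wc->nsec = nsec;
    wc->version++;               /* even again: consistent snapshot */
}

static void read_wall_clock(const struct wall_clock *wc,
                            uint32_t *sec, uint32_t *nsec)
{
    uint32_t v;
    do {
        v = wc->version;
        *sec = wc->sec;
        *nsec = wc->nsec;
    } while ((v & 1) || v != wc->version);   /* retry on torn read */
}
```

This also explains why the patched code reads the guest's own version back first: each guest keeps its own counter in its own memory, so one guest's update can no longer race with another's.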
[COMMIT master] KVM: x86 emulator: cleanup nop emulation
From: Gleb Natapov <g...@redhat.com>

Make it more explicit what we are checking for.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a99d49c..03a7291 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2799,8 +2799,8 @@ special_insn:
 			goto done;
 		break;
 	case 0x90: /* nop / xchg r8,rax */
-		if (!(c->rex_prefix & 1)) { /* nop */
-			c->dst.type = OP_NONE;
+		if (c->dst.ptr == (unsigned long *)&c->regs[VCPU_REGS_RAX]) {
+			c->dst.type = OP_NONE;  /* nop */
 			break;
 		}
 	case 0x91 ... 0x97: /* xchg reg,rax */
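Both the old `rex_prefix` test and the new pointer comparison encode the same architectural fact: opcode 0x90 is only a true nop when REX.B is clear; with REX.B set the register operand is extended to r8, making it a real `xchg r8, rax`. A tiny sketch of that predicate (hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* 0x90 is a real nop only if REX.B (bit 0 of the REX prefix byte) is
 * clear; otherwise the "rax" operand becomes r8 and the instruction is
 * a genuine exchange. rex_prefix is 0 when no REX prefix is present. */
static int is_true_nop_90(uint8_t rex_prefix)
{
    return !(rex_prefix & 1);
}
```

The patched code reaches the same conclusion by asking whether the decoded destination really is RAX, which reads more directly than inspecting the prefix byte.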
[COMMIT master] KVM: x86 emulator: add get_cached_segment_base() callback to x86_emulate_ops
From: Gleb Natapov <g...@redhat.com>

On VMX it is expensive to call get_cached_descriptor() just to get a
segment base, since multiple vmcs_reads are done instead of only one.
Introduce a new callback, get_cached_segment_base(), for efficiency.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f751657..df53ba2 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -132,6 +132,7 @@ struct x86_emulate_ops {
 				      int seg, struct kvm_vcpu *vcpu);
 	u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
 	void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
+	unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu);
 	void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 	void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7c8ed56..8228778 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2097,17 +2097,6 @@ static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
 	return true;
 }

-static u32 get_cached_descriptor_base(struct x86_emulate_ctxt *ctxt,
-				      struct x86_emulate_ops *ops,
-				      int seg)
-{
-	struct desc_struct desc;
-	if (ops->get_cached_descriptor(&desc, seg, ctxt->vcpu))
-		return get_desc_base(&desc);
-	else
-		return ~0;
-}
-
 static void save_state_to_tss16(struct x86_emulate_ctxt *ctxt,
 				struct x86_emulate_ops *ops,
 				struct tss_segment_16 *tss)
@@ -2383,7 +2372,7 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
 	int ret;
 	u16 old_tss_sel = ops->get_segment_selector(VCPU_SREG_TR, ctxt->vcpu);
 	ulong old_tss_base =
-		get_cached_descriptor_base(ctxt, ops, VCPU_SREG_TR);
+		ops->get_cached_segment_base(VCPU_SREG_TR, ctxt->vcpu);
 	u32 desc_limit;

 	/* FIXME: old_tss_base == ~0 ? */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 673efbe..29cc2b1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3669,6 +3669,12 @@ static void emulator_get_gdt(struct desc_ptr *dt, struct kvm_vcpu *vcpu)
 	kvm_x86_ops->get_gdt(vcpu, dt);
 }

+static unsigned long emulator_get_cached_segment_base(int seg,
+						      struct kvm_vcpu *vcpu)
+{
+	return get_segment_base(vcpu, seg);
+}
+
 static bool emulator_get_cached_descriptor(struct desc_struct *desc, int seg,
 					   struct kvm_vcpu *vcpu)
 {
@@ -3759,6 +3765,7 @@ static struct x86_emulate_ops emulate_ops = {
 	.set_cached_descriptor = emulator_set_cached_descriptor,
 	.get_segment_selector = emulator_get_segment_selector,
 	.set_segment_selector = emulator_set_segment_selector,
+	.get_cached_segment_base = emulator_get_cached_segment_base,
 	.get_gdt = emulator_get_gdt,
 	.get_cr = emulator_get_cr,
 	.set_cr = emulator_set_cr,
[COMMIT master] KVM: x86 emulator: cleanup xchg emulation
From: Gleb Natapov <g...@redhat.com>

The dst operand is already initialized during the decoding stage. No
need to reinitialize it.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a81e6bf..a99d49c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2804,8 +2804,8 @@ special_insn:
 			break;
 		}
 	case 0x91 ... 0x97: /* xchg reg,rax */
-		c->src.type = c->dst.type = OP_REG;
-		c->src.bytes = c->dst.bytes = c->op_bytes;
+		c->src.type = OP_REG;
+		c->src.bytes = c->op_bytes;
 		c->src.ptr = (unsigned long *) &c->regs[VCPU_REGS_RAX];
 		c->src.val = *(c->src.ptr);
 		goto xchg;
[COMMIT master] KVM: handle emulation failure case first
From: Gleb Natapov <g...@redhat.com>

If emulation failed, return immediately.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 01bb1f3..4121a9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3879,22 +3879,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 restart:
 	r = x86_emulate_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);

-	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
-
-	if (r == 0)
-		kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
-
-	if (vcpu->arch.pio.count) {
-		if (!vcpu->arch.pio.in)
-			vcpu->arch.pio.count = 0;
-		return EMULATE_DO_MMIO;
-	}
-
-	if (vcpu->mmio_needed) {
-		if (vcpu->mmio_is_write)
-			vcpu->mmio_needed = 0;
-		return EMULATE_DO_MMIO;
-	}
-
 	if (r) { /* emulation failed */
 		/*
@@ -3910,6 +3894,21 @@ restart:
 		return EMULATE_FAIL;
 	}

+	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
+	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+
+	if (vcpu->arch.pio.count) {
+		if (!vcpu->arch.pio.in)
+			vcpu->arch.pio.count = 0;
+		return EMULATE_DO_MMIO;
+	}
+
+	if (vcpu->mmio_needed) {
+		if (vcpu->mmio_is_write)
+			vcpu->mmio_needed = 0;
+		return EMULATE_DO_MMIO;
+	}
+
 	if (vcpu->arch.exception.pending)
 		vcpu->arch.emulate_ctxt.restart = false;
[COMMIT master] KVM: x86 emulator: handle far address source operand
From: Gleb Natapov <g...@redhat.com>

The ljmp/lcall instruction operand contains an address and a segment.
It can be 10 bytes long. Currently we decode it as two different
operands. Fix it by introducing a new kind of operand that can hold an
entire far address.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 288cbed..69a64a6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -143,7 +143,11 @@ struct x86_emulate_ops {
 struct operand {
 	enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
 	unsigned int bytes;
-	unsigned long val, orig_val, *ptr;
+	unsigned long orig_val, *ptr;
+	union {
+		unsigned long val;
+		char valptr[sizeof(unsigned long) + 2];
+	};
 };

 struct fetch_cache {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 03a7291..687ea09 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -67,6 +67,8 @@
 #define SrcImmUByte (8<<4)      /* 8-bit unsigned immediate operand. */
 #define SrcImmU     (9<<4)      /* Immediate operand, unsigned */
 #define SrcSI       (0xa<<4)	/* Source is in the DS:RSI */
+#define SrcImmFAddr (0xb<<4)	/* Source is immediate far address */
+#define SrcMemFAddr (0xc<<4)	/* Source is far address in memory */
 #define SrcMask     (0xf<<4)
 /* Generic ModRM decode. */
 #define ModRM       (1<<8)
@@ -88,10 +90,6 @@
 #define Src2CL      (1<<29)
 #define Src2ImmByte (2<<29)
 #define Src2One     (3<<29)
-#define Src2Imm16   (4<<29)
-#define Src2Mem16   (5<<29) /* Used for Ep encoding. First argument has to be
-			       in memory and second argument is located
-			       immediately after the first one in memory. */
 #define Src2Mask    (7<<29)

 enum {
@@ -175,7 +173,7 @@ static u32 opcode_table[256] = {
 	/* 0x90 - 0x97 */
 	DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg, DstReg,
 	/* 0x98 - 0x9F */
-	0, 0, SrcImm | Src2Imm16 | No64, 0,
+	0, 0, SrcImmFAddr | No64, 0,
 	ImplicitOps | Stack, ImplicitOps | Stack, 0, 0,
 	/* 0xA0 - 0xA7 */
 	ByteOp | DstReg | SrcMem | Mov | MemAbs, DstReg | SrcMem | Mov | MemAbs,
@@ -215,7 +213,7 @@ static u32 opcode_table[256] = {
 	ByteOp | SrcImmUByte | DstAcc, SrcImmUByte | DstAcc,
 	/* 0xE8 - 0xEF */
 	SrcImm | Stack, SrcImm | ImplicitOps,
-	SrcImmU | Src2Imm16 | No64, SrcImmByte | ImplicitOps,
+	SrcImmFAddr | No64, SrcImmByte | ImplicitOps,
 	SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
 	SrcNone | ByteOp | DstAcc, SrcNone | DstAcc,
 	/* 0xF0 - 0xF7 */
@@ -350,7 +348,7 @@ static u32 group_table[] = {
 	[Group5*8] =
 	DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM,
 	SrcMem | ModRM | Stack, 0,
-	SrcMem | ModRM | Stack, SrcMem | ModRM | Src2Mem16 | ImplicitOps,
+	SrcMem | ModRM | Stack, SrcMemFAddr | ModRM | ImplicitOps,
 	SrcMem | ModRM | Stack, 0,
 	[Group7*8] =
 	0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv,
@@ -576,6 +574,13 @@ static u32 group2_table[] = {
 	(_type)_x;							\
 })

+#define insn_fetch_arr(_arr, _size, _eip)				\
+({	rc = do_insn_fetch(ctxt, ops, (_eip), _arr, (_size));		\
+	if (rc != X86EMUL_CONTINUE)					\
+		goto done;						\
+	(_eip) += (_size);						\
+})
+
 static inline unsigned long ad_mask(struct decode_cache *c)
 {
 	return (1UL << (c->ad_bytes << 3)) - 1;
@@ -1160,6 +1165,17 @@ done_prefixes:
 				 c->regs[VCPU_REGS_RSI]);
 		c->src.val = 0;
 		break;
+	case SrcImmFAddr:
+		c->src.type = OP_IMM;
+		c->src.ptr = (unsigned long *)c->eip;
+		c->src.bytes = c->op_bytes + 2;
+		insn_fetch_arr(c->src.valptr, c->src.bytes, c->eip);
+		break;
+	case SrcMemFAddr:
+		c->src.type = OP_MEM;
+		c->src.ptr = (unsigned long *)c->modrm_ea;
+		c->src.bytes = c->op_bytes + 2;
+		break;
 	}

 	/*
@@ -1179,22 +1195,10 @@ done_prefixes:
 		c->src2.bytes = 1;
 		c->src2.val = insn_fetch(u8, 1, c->eip);
 		break;
-	case Src2Imm16:
-		c->src2.type = OP_IMM;
-		c->src2.ptr = (unsigned long *)c->eip;
-		c->src2.bytes = 2;
-		c->src2.val = insn_fetch(u16, 2, c->eip);
-		break;
 	case Src2One:
 		c->src2.bytes = 1;
 		c->src2.val = 1;
 		break;
-	case Src2Mem16:
-		c->src2.type = OP_MEM;
-		c->src2.bytes = 2;
-
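The "10 bytes" in the description comes from the far-pointer encoding: the offset (2, 4 or 8 bytes, depending on operand size) is immediately followed by a 2-byte segment selector, which is exactly why the patch gives `struct operand` a `valptr[]` large enough for both parts. An illustrative decode of such an immediate, assuming a little-endian host as on x86 (hypothetical helper, not the emulator's code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A far address as encoded in an ljmp/lcall immediate: op_bytes of
 * offset followed by a 2-byte selector, up to 8 + 2 = 10 bytes total. */
struct far_addr {
    uint64_t offset;
    uint16_t selector;
};

static struct far_addr decode_far_addr(const uint8_t *insn, int op_bytes)
{
    struct far_addr fa = {0, 0};
    memcpy(&fa.offset, insn, op_bytes);        /* little-endian offset */
    memcpy(&fa.selector, insn + op_bytes, 2);  /* selector follows it  */
    return fa;
}
```

Decoding the two parts as one operand, as the patch does, keeps them adjacent in memory the way the instruction stream lays them out.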
[COMMIT master] KVM: x86 emulator: set RFLAGS outside x86 emulator code
From: Gleb Natapov <g...@redhat.com>

Removes the need for the set_rflags() callback.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b7e00cb..a87d95f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -142,7 +142,6 @@ struct x86_emulate_ops {
 	ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
 	int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
 	int (*cpl)(struct kvm_vcpu *vcpu);
-	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 	int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
 	int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
 	int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 437f31b..291e220 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3034,7 +3034,6 @@ writeback:
 	/* Commit shadow register state. */
 	memcpy(ctxt->vcpu->arch.regs, c->regs, sizeof c->regs);
 	ctxt->eip = c->eip;
-	ops->set_rflags(ctxt->vcpu, ctxt->eflags);

 done:
 	return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3544ea9..f42be00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3761,11 +3761,6 @@ static void emulator_set_segment_selector(u16 sel, int seg,
 	kvm_set_segment(vcpu, &kvm_seg, seg);
 }

-static void emulator_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
-{
-	kvm_x86_ops->set_rflags(vcpu, rflags);
-}
-
 static struct x86_emulate_ops emulate_ops = {
 	.read_std            = kvm_read_guest_virt_system,
 	.write_std           = kvm_write_guest_virt_system,
@@ -3784,7 +3779,6 @@ static struct x86_emulate_ops emulate_ops = {
 	.get_cr              = emulator_get_cr,
 	.set_cr              = emulator_set_cr,
 	.cpl                 = emulator_get_cpl,
-	.set_rflags          = emulator_set_rflags,
 	.get_dr              = emulator_get_dr,
 	.set_dr              = emulator_set_dr,
 	.set_msr             = kvm_set_msr,
@@ -3896,6 +3890,7 @@ restart:

 	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
 	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);

 	if (vcpu->arch.pio.count) {
[COMMIT master] KVM: fill in run->mmio details in (read|write)_emulated function
From: Gleb Natapov <g...@redhat.com>

Fill in run->mmio details in the (read|write)_emulated functions, just
like pio does. There is no point in filling only the vcpu fields there
just to copy them into vcpu->run a little bit later.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfad042..55496f4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3341,9 +3341,10 @@ mmio:
 	trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, bytes, gpa, 0);

 	vcpu->mmio_needed = 1;
-	vcpu->mmio_phys_addr = gpa;
-	vcpu->mmio_size = bytes;
-	vcpu->mmio_is_write = 0;
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+	vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+	vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+	vcpu->run->mmio.is_write = vcpu->mmio_is_write = 0;

 	return X86EMUL_UNHANDLEABLE;
 }
@@ -3391,10 +3392,11 @@ mmio:
 		return X86EMUL_CONTINUE;

 	vcpu->mmio_needed = 1;
-	vcpu->mmio_phys_addr = gpa;
-	vcpu->mmio_size = bytes;
-	vcpu->mmio_is_write = 1;
-	memcpy(vcpu->mmio_data, val, bytes);
+	vcpu->run->exit_reason = KVM_EXIT_MMIO;
+	vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+	vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+	vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+	memcpy(vcpu->run->mmio.data, val, bytes);

 	return X86EMUL_CONTINUE;
 }
@@ -3805,7 +3807,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 {
 	int r, shadow_mask;
 	struct decode_cache *c;
-	struct kvm_run *run = vcpu->run;

 	kvm_clear_exception_queue(vcpu);
 	vcpu->arch.mmio_fault_cr2 = cr2;
@@ -3892,14 +3893,6 @@ restart:
 		return EMULATE_DO_MMIO;
 	}

-	if (r || vcpu->mmio_is_write) {
-		run->exit_reason = KVM_EXIT_MMIO;
-		run->mmio.phys_addr = vcpu->mmio_phys_addr;
-		memcpy(run->mmio.data, vcpu->mmio_data, 8);
-		run->mmio.len = vcpu->mmio_size;
-		run->mmio.is_write = vcpu->mmio_is_write;
-	}
-
 	if (r) {
 		if (kvm_mmu_unprotect_page_virt(vcpu, cr2))
 			goto done;
[COMMIT master] KVM: x86 emulator: fix X86EMUL_RETRY_INSTR and X86EMUL_CMPXCHG_FAILED values
From: Gleb Natapov <g...@redhat.com>

Currently X86EMUL_PROPAGATE_FAULT, X86EMUL_RETRY_INSTR and
X86EMUL_CMPXCHG_FAILED have the same value, so a caller cannot
distinguish why a function such as emulator_cmpxchg_emulated() (which
can return both X86EMUL_PROPAGATE_FAULT and X86EMUL_CMPXCHG_FAILED)
failed.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 6c4f491..0cf4311 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -51,8 +51,9 @@ struct x86_emulate_ctxt;
 #define X86EMUL_UNHANDLEABLE    1
 /* Terminate emulation but return success to the caller. */
 #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */
-#define X86EMUL_RETRY_INSTR     2 /* retry the instruction for some reason */
-#define X86EMUL_CMPXCHG_FAILED  2 /* cmpxchg did not see expected value */
+#define X86EMUL_RETRY_INSTR     3 /* retry the instruction for some reason */
+#define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
+
 struct x86_emulate_ops {
 	/*
 	 * read_std: Read bytes of standard (non-emulated/special) memory.
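The bug class here is worth spelling out: when several error macros alias to one value, a `switch` on the return code silently collapses distinct failure modes into whichever case appears first. A minimal sketch of a caller that needs the distinct values (the constants mirror the patched header; the dispatch function and actions are illustrative, not KVM code):

```c
#include <assert.h>

/* Return codes as defined after the fix. */
#define X86EMUL_CONTINUE        0
#define X86EMUL_UNHANDLEABLE    1
#define X86EMUL_PROPAGATE_FAULT 2
#define X86EMUL_RETRY_INSTR     3
#define X86EMUL_CMPXCHG_FAILED  4

enum action { ACT_DONE, ACT_INJECT_FAULT, ACT_RETRY, ACT_FALLBACK };

/* With the old header, the fault/retry/cmpxchg cases were all 2 and
 * could not even compile as separate switch labels, let alone be
 * handled differently. */
static enum action dispatch(int rc)
{
    switch (rc) {
    case X86EMUL_CONTINUE:        return ACT_DONE;
    case X86EMUL_PROPAGATE_FAULT: return ACT_INJECT_FAULT;
    case X86EMUL_RETRY_INSTR:     return ACT_RETRY;
    case X86EMUL_CMPXCHG_FAILED:  return ACT_RETRY;
    default:                      return ACT_FALLBACK;
    }
}
```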
[COMMIT master] KVM: x86 emulator: cleanup some direct calls into kvm to use existing callbacks
From: Gleb Natapov <g...@redhat.com>

Use callbacks from x86_emulate_ops to access segments instead of
calling into kvm directly.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8228778..f56ec48 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -622,31 +622,35 @@ static void set_seg_override(struct decode_cache *c, int seg)
 	c->seg_override = seg;
 }

-static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
+static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
+			      struct x86_emulate_ops *ops, int seg)
 {
 	if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS)
 		return 0;

-	return kvm_x86_ops->get_segment_base(ctxt->vcpu, seg);
+	return ops->get_cached_segment_base(seg, ctxt->vcpu);
 }

 static unsigned long seg_override_base(struct x86_emulate_ctxt *ctxt,
+				       struct x86_emulate_ops *ops,
 				       struct decode_cache *c)
 {
 	if (!c->has_seg_override)
 		return 0;

-	return seg_base(ctxt, c->seg_override);
+	return seg_base(ctxt, ops, c->seg_override);
 }

-static unsigned long es_base(struct x86_emulate_ctxt *ctxt)
+static unsigned long es_base(struct x86_emulate_ctxt *ctxt,
+			     struct x86_emulate_ops *ops)
 {
-	return seg_base(ctxt, VCPU_SREG_ES);
+	return seg_base(ctxt, ops, VCPU_SREG_ES);
 }

-static unsigned long ss_base(struct x86_emulate_ctxt *ctxt)
+static unsigned long ss_base(struct x86_emulate_ctxt *ctxt,
+			     struct x86_emulate_ops *ops)
 {
-	return seg_base(ctxt, VCPU_SREG_SS);
+	return seg_base(ctxt, ops, VCPU_SREG_SS);
 }

 static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
@@ -941,7 +945,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	memset(c, 0, sizeof(struct decode_cache));
 	c->eip = ctxt->eip;
 	c->fetch.start = c->fetch.end = c->eip;
-	ctxt->cs_base = seg_base(ctxt, VCPU_SREG_CS);
+	ctxt->cs_base = seg_base(ctxt, ops, VCPU_SREG_CS);
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);

 	switch (mode) {
@@ -1065,7 +1069,7 @@ done_prefixes:
 		set_seg_override(c, VCPU_SREG_DS);

 	if (!(!c->twobyte && c->b == 0x8d))
-		c->modrm_ea += seg_override_base(ctxt, c);
+		c->modrm_ea += seg_override_base(ctxt, ops, c);

 	if (c->ad_bytes != 8)
 		c->modrm_ea = (u32)c->modrm_ea;
@@ -1161,7 +1165,7 @@ done_prefixes:
 		c->src.type = OP_MEM;
 		c->src.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 		c->src.ptr = (unsigned long *)
-			register_address(c, seg_override_base(ctxt, c),
+			register_address(c, seg_override_base(ctxt, ops, c),
 					 c->regs[VCPU_REGS_RSI]);
 		c->src.val = 0;
 		break;
@@ -1257,7 +1261,7 @@ done_prefixes:
 		c->dst.type = OP_MEM;
 		c->dst.bytes = (c->d & ByteOp) ? 1 : c->op_bytes;
 		c->dst.ptr = (unsigned long *)
-			register_address(c, es_base(ctxt),
+			register_address(c, es_base(ctxt, ops),
 					 c->regs[VCPU_REGS_RDI]);
 		c->dst.val = 0;
 		break;
@@ -1516,7 +1520,8 @@ exception:
 	return X86EMUL_PROPAGATE_FAULT;
 }

-static inline void emulate_push(struct x86_emulate_ctxt *ctxt)
+static inline void emulate_push(struct x86_emulate_ctxt *ctxt,
+				struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
@@ -1524,7 +1529,7 @@ static inline void emulate_push(struct x86_emulate_ctxt *ctxt)
 	c->dst.bytes = c->op_bytes;
 	c->dst.val = c->src.val;
 	register_address_increment(c, &c->regs[VCPU_REGS_RSP], -c->op_bytes);
-	c->dst.ptr = (void *) register_address(c, ss_base(ctxt),
+	c->dst.ptr = (void *) register_address(c, ss_base(ctxt, ops),
 					       c->regs[VCPU_REGS_RSP]);
 }

@@ -1535,7 +1540,7 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
 	struct decode_cache *c = &ctxt->decode;
 	int rc;

-	rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt),
+	rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt, ops),
 						       c->regs[VCPU_REGS_RSP]),
 			   dest, len);
 	if (rc != X86EMUL_CONTINUE)
@@ -1588,15 +1593,14 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
 	return rc;
 }

-static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
+static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt,
+			      struct
[COMMIT master] KVM: x86 emulator: add (set|get)_msr callbacks to x86_emulate_ops
From: Gleb Natapov <g...@redhat.com>

Add (set|get)_msr callbacks to x86_emulate_ops instead of calling them
directly.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c37296d..f751657 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -139,6 +139,8 @@ struct x86_emulate_ops {
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 	int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
 	int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
+	int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
 };

 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8a4aa73..7c8ed56 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1875,7 +1875,7 @@ setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
 }

 static int
-emulate_syscall(struct x86_emulate_ctxt *ctxt)
+emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
 	struct kvm_segment cs, ss;
@@ -1890,7 +1890,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)

 	setup_syscalls_segments(ctxt, &cs, &ss);

-	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+	ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
 	msr_data >>= 32;
 	cs.selector = (u16)(msr_data & 0xfffc);
 	ss.selector = (u16)(msr_data + 8);
@@ -1907,17 +1907,17 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 #ifdef CONFIG_X86_64
 		c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;

-		kvm_x86_ops->get_msr(ctxt->vcpu,
-				     ctxt->mode == X86EMUL_MODE_PROT64 ?
-				     MSR_LSTAR : MSR_CSTAR, &msr_data);
+		ops->get_msr(ctxt->vcpu,
+			     ctxt->mode == X86EMUL_MODE_PROT64 ?
+			     MSR_LSTAR : MSR_CSTAR, &msr_data);
 		c->eip = msr_data;

-		kvm_x86_ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
+		ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
 		ctxt->eflags &= ~(msr_data | EFLG_RF);
 #endif
 	} else {
 		/* legacy mode */
-		kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+		ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
 		c->eip = (u32)msr_data;

 		ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
@@ -1927,7 +1927,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 }

 static int
-emulate_sysenter(struct x86_emulate_ctxt *ctxt)
+emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
 	struct kvm_segment cs, ss;
@@ -1949,7 +1949,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)

 	setup_syscalls_segments(ctxt, &cs, &ss);

-	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+	ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
 	switch (ctxt->mode) {
 	case X86EMUL_MODE_PROT32:
 		if ((msr_data & 0xfffc) == 0x0) {
@@ -1979,17 +1979,17 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	kvm_x86_ops->set_segment(ctxt->vcpu, &cs, VCPU_SREG_CS);
 	kvm_x86_ops->set_segment(ctxt->vcpu, &ss, VCPU_SREG_SS);

-	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
+	ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_EIP, &msr_data);
 	c->eip = msr_data;

-	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
+	ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
 	c->regs[VCPU_REGS_RSP] = msr_data;

 	return X86EMUL_CONTINUE;
 }

 static int
-emulate_sysexit(struct x86_emulate_ctxt *ctxt)
+emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
 	struct decode_cache *c = &ctxt->decode;
 	struct kvm_segment cs, ss;
@@ -2012,7 +2012,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	cs.dpl = 3;
 	ss.dpl = 3;
-	kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+	ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
 	switch (usermode) {
 	case X86EMUL_MODE_PROT32:
 		cs.selector = (u16)(msr_data + 16);
@@ -3099,7 +3099,7 @@ twobyte_insn:
 		}
 		break;
 	case 0x05: 	/* syscall */
-		rc = emulate_syscall(ctxt);
+		rc = emulate_syscall(ctxt, ops);
 		if (rc != X86EMUL_CONTINUE)
 			goto done;
 		else
@@ -3155,7 +3155,7 @@ twobyte_insn:
 		/* wrmsr */
 		msr_data = (u32)c->regs[VCPU_REGS_RAX] |
[COMMIT master] KVM: x86 emulator: move interruptibility state tracking out of emulator
From: Gleb Natapov <g...@redhat.com>

The emulator shouldn't access the vcpu directly.

Signed-off-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Avi Kivity <a...@redhat.com>

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 97a42e8..c40b405 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1843,20 +1843,6 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
 	return X86EMUL_CONTINUE;
 }

-static void toggle_interruptibility(struct x86_emulate_ctxt *ctxt, u32 mask)
-{
-	u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(ctxt->vcpu, mask);
-	/*
-	 * an sti; sti; sequence only disable interrupts for the first
-	 * instruction. So, if the last instruction, be it emulated or
-	 * not, left the system with the INT_STI flag enabled, it
-	 * means that the last instruction is an sti. We should not
-	 * leave the flag on in this case. The same goes for mov ss
-	 */
-	if (!(int_shadow & mask))
-		ctxt->interruptibility = mask;
-}
-
 static inline void
 setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
 			struct x86_emulate_ops *ops,
 			struct desc_struct *cs,
@@ -2516,7 +2502,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	int rc = X86EMUL_CONTINUE;
 	int saved_dst_type = c->dst.type;

-	ctxt->interruptibility = 0;
 	ctxt->decode.mem_read.pos = 0;

 	if (ctxt->mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
@@ -2789,7 +2774,7 @@ special_insn:
 		}

 		if (c->modrm_reg == VCPU_SREG_SS)
-			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_MOV_SS);
+			ctxt->interruptibility = KVM_X86_SHADOW_INT_MOV_SS;

 		rc = load_segment_descriptor(ctxt, ops, sel, c->modrm_reg);
@@ -2958,7 +2943,7 @@ special_insn:
 		if (emulator_bad_iopl(ctxt, ops))
 			kvm_inject_gp(ctxt->vcpu, 0);
 		else {
-			toggle_interruptibility(ctxt, KVM_X86_SHADOW_INT_STI);
+			ctxt->interruptibility = KVM_X86_SHADOW_INT_STI;
 			ctxt->eflags |= X86_EFLAGS_IF;
 			c->dst.type = OP_NONE;	/* Disable writeback. */
 		}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d84d531..2b29ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3793,12 +3793,26 @@ static void cache_all_regs(struct kvm_vcpu *vcpu)
 	vcpu->arch.regs_dirty = ~0;
 }

+static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
+{
+	u32 int_shadow = kvm_x86_ops->get_interrupt_shadow(vcpu, mask);
+	/*
+	 * an sti; sti; sequence only disable interrupts for the first
+	 * instruction. So, if the last instruction, be it emulated or
+	 * not, left the system with the INT_STI flag enabled, it
+	 * means that the last instruction is an sti. We should not
+	 * leave the flag on in this case. The same goes for mov ss
+	 */
+	if (!(int_shadow & mask))
+		kvm_x86_ops->set_interrupt_shadow(vcpu, mask);
+}
+
 int emulate_instruction(struct kvm_vcpu *vcpu,
 			unsigned long cr2,
 			u16 error_code,
 			int emulation_type)
 {
-	int r, shadow_mask;
+	int r;
 	struct decode_cache *c = &vcpu->arch.emulate_ctxt.decode;

 	kvm_clear_exception_queue(vcpu);
@@ -3826,6 +3840,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 			? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
 		memset(c, 0, sizeof(struct decode_cache));
 		memcpy(c->regs, vcpu->arch.regs, sizeof c->regs);
+		vcpu->arch.emulate_ctxt.interruptibility = 0;

 		r = x86_decode_insn(&vcpu->arch.emulate_ctxt, &emulate_ops);
 		trace_kvm_emulate_insn_start(vcpu);
@@ -3893,8 +3908,7 @@ restart:
 		return EMULATE_FAIL;
 	}

-	shadow_mask = vcpu->arch.emulate_ctxt.interruptibility;
-	kvm_x86_ops->set_interrupt_shadow(vcpu, shadow_mask);
+	toggle_interruptibility(vcpu, vcpu->arch.emulate_ctxt.interruptibility);
 	kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags);
 	memcpy(vcpu->arch.regs, c->regs, sizeof c->regs);
 	kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.eip);
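The comment in toggle_interruptibility() describes a subtle rule: an `sti; sti` sequence shadows interrupts only for the first instruction, so the shadow is re-armed only when the previous instruction did not already arm it. A simplified, state-free model of that decision (a sketch of the rule, not the kernel's exact function):

```c
#include <assert.h>

#define SHADOW_INT_STI    0x01
#define SHADOW_INT_MOV_SS 0x02

/* Given the shadow state left by the previous instruction and the mask
 * the current instruction wants to arm, return the new shadow state:
 * back-to-back sti (or mov ss) must not leave a lingering shadow. */
static unsigned int next_shadow(unsigned int prev_shadow, unsigned int mask)
{
    return (prev_shadow & mask) ? 0 : mask;
}
```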
[COMMIT master] KVM: x86 emulator: do not inject exception directly into vcpu
From: Gleb Natapov g...@redhat.com Return exception as a result of instruction emulation and handle injection in KVM code. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index a87d95f..51cfd73 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -216,6 +216,12 @@ struct x86_emulate_ctxt { int interruptibility; bool restart; /* restart string instruction after writeback */ + + int exception; /* exception that happens during emulation or -1 */ + u32 error_code; /* error code for exception */ + bool error_code_valid; + unsigned long cr2; /* faulted address in case of #PF */ + /* decode cache */ struct decode_cache decode; }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index c40b405..b43ac98 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -653,6 +653,37 @@ static unsigned long ss_base(struct x86_emulate_ctxt *ctxt, return seg_base(ctxt, ops, VCPU_SREG_SS); } +static void emulate_exception(struct x86_emulate_ctxt *ctxt, int vec, + u32 error, bool valid) +{ + ctxt-exception = vec; + ctxt-error_code = error; + ctxt-error_code_valid = valid; + ctxt-restart = false; +} + +static void emulate_gp(struct x86_emulate_ctxt *ctxt, int err) +{ + emulate_exception(ctxt, GP_VECTOR, err, true); +} + +static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr, + int err) +{ + ctxt-cr2 = addr; + emulate_exception(ctxt, PF_VECTOR, err, true); +} + +static void emulate_ud(struct x86_emulate_ctxt *ctxt) +{ + emulate_exception(ctxt, UD_VECTOR, 0, false); +} + +static void emulate_ts(struct x86_emulate_ctxt *ctxt, int err) +{ + emulate_exception(ctxt, TS_VECTOR, err, true); +} + static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, unsigned long eip, u8 *dest) @@ -1285,7 +1316,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt, rc = 
ops-read_emulated(addr, mc-data + mc-end, n, err, ctxt-vcpu); if (rc == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt-vcpu, addr, err); + emulate_pf(ctxt, addr, err); if (rc != X86EMUL_CONTINUE) return rc; mc-end += n; @@ -1366,13 +1397,13 @@ static int read_segment_descriptor(struct x86_emulate_ctxt *ctxt, get_descriptor_table_ptr(ctxt, ops, selector, dt); if (dt.size index * 8 + 7) { - kvm_inject_gp(ctxt-vcpu, selector 0xfffc); + emulate_gp(ctxt, selector 0xfffc); return X86EMUL_PROPAGATE_FAULT; } addr = dt.address + index * 8; ret = ops-read_std(addr, desc, sizeof *desc, ctxt-vcpu, err); if (ret == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt-vcpu, addr, err); + emulate_pf(ctxt, addr, err); return ret; } @@ -1391,14 +1422,14 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt, get_descriptor_table_ptr(ctxt, ops, selector, dt); if (dt.size index * 8 + 7) { - kvm_inject_gp(ctxt-vcpu, selector 0xfffc); + emulate_gp(ctxt, selector 0xfffc); return X86EMUL_PROPAGATE_FAULT; } addr = dt.address + index * 8; ret = ops-write_std(addr, desc, sizeof *desc, ctxt-vcpu, err); if (ret == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt-vcpu, addr, err); + emulate_pf(ctxt, addr, err); return ret; } @@ -1517,7 +1548,7 @@ load: ops-set_cached_descriptor(seg_desc, seg, ctxt-vcpu); return X86EMUL_CONTINUE; exception: - kvm_queue_exception_e(ctxt-vcpu, err_vec, err_code); + emulate_exception(ctxt, err_vec, err_code, true); return X86EMUL_PROPAGATE_FAULT; } @@ -1578,7 +1609,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt, break; case X86EMUL_MODE_VM86: if (iopl 3) { - kvm_inject_gp(ctxt-vcpu, 0); + emulate_gp(ctxt, 0); return X86EMUL_PROPAGATE_FAULT; } change_mask |= EFLG_IF; @@ -1829,7 +1860,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt, err, ctxt-vcpu); if (rc == X86EMUL_PROPAGATE_FAULT) - kvm_inject_page_fault(ctxt-vcpu, + emulate_pf(ctxt, (unsigned
[COMMIT master] KVM: x86 emulator: make (get|set)_dr() callback return error if it fails
From: Gleb Natapov g...@redhat.com Make (get|set)_dr() callback return error if it fails instead of injecting exception behind emulator's back. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 061f7d3..d5979ec 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3151,9 +3151,14 @@ twobyte_insn: goto done; } - ops-set_dr(c-modrm_reg,c-regs[c-modrm_rm] - ((ctxt-mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U), - ctxt-vcpu); + if (ops-set_dr(c-modrm_reg, c-regs[c-modrm_rm] + ((ctxt-mode == X86EMUL_MODE_PROT64) ? +~0ULL : ~0U), ctxt-vcpu) 0) { + /* #UD condition is already handled by the code above */ + kvm_inject_gp(ctxt-vcpu, 0); + goto done; + } + c-dst.type = OP_NONE; /* no writeback */ break; case 0x30: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f6c799d..dfad042 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -573,7 +573,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_get_cr8); -int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) +static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) { switch (dr) { case 0 ... 
3: @@ -582,29 +582,21 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) vcpu-arch.eff_db[dr] = val; break; case 4: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); - return 1; - } + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) + return 1; /* #UD */ /* fall through */ case 6: - if (val 0xULL) { - kvm_inject_gp(vcpu, 0); - return 1; - } + if (val 0xULL) + return -1; /* #GP */ vcpu-arch.dr6 = (val DR6_VOLATILE) | DR6_FIXED_1; break; case 5: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); - return 1; - } + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) + return 1; /* #UD */ /* fall through */ default: /* 7 */ - if (val 0xULL) { - kvm_inject_gp(vcpu, 0); - return 1; - } + if (val 0xULL) + return -1; /* #GP */ vcpu-arch.dr7 = (val DR7_VOLATILE) | DR7_FIXED_1; if (!(vcpu-guest_debug KVM_GUESTDBG_USE_HW_BP)) { kvm_x86_ops-set_dr7(vcpu, vcpu-arch.dr7); @@ -615,28 +607,37 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) return 0; } + +int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) +{ + int res; + + res = __kvm_set_dr(vcpu, dr, val); + if (res 0) + kvm_queue_exception(vcpu, UD_VECTOR); + else if (res 0) + kvm_inject_gp(vcpu, 0); + + return res; +} EXPORT_SYMBOL_GPL(kvm_set_dr); -int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) +static int _kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) { switch (dr) { case 0 ... 
3: *val = vcpu-arch.db[dr]; break; case 4: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) return 1; - } /* fall through */ case 6: *val = vcpu-arch.dr6; break; case 5: - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) { - kvm_queue_exception(vcpu, UD_VECTOR); + if (kvm_read_cr4_bits(vcpu, X86_CR4_DE)) return 1; - } /* fall through */ default: /* 7 */ *val = vcpu-arch.dr7; @@ -645,6 +646,15 @@ int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) return 0; } + +int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val) +{ + if (_kvm_get_dr(vcpu, dr, val)) { + kvm_queue_exception(vcpu, UD_VECTOR); + return 1; + } + return 0; +} EXPORT_SYMBOL_GPL(kvm_get_dr); static inline u32 bit(int bitno) @@
[COMMIT master] KVM: x86 emulator: handle shadowed registers outside emulator
From: Gleb Natapov g...@redhat.com Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 42cb7d7..97a42e8 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -941,12 +941,9 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) /* we cannot decode insn before we complete previous rep insn */ WARN_ON(ctxt-restart); - /* Shadow copy of register state. Committed on successful emulation. */ - memset(c, 0, sizeof(struct decode_cache)); c-eip = ctxt-eip; c-fetch.start = c-fetch.end = c-eip; ctxt-cs_base = seg_base(ctxt, ops, VCPU_SREG_CS); - memcpy(c-regs, ctxt-vcpu-arch.regs, sizeof c-regs); switch (mode) { case X86EMUL_MODE_REAL: @@ -2486,16 +2483,13 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = ctxt-decode; int rc; - memset(c, 0, sizeof(struct decode_cache)); c-eip = ctxt-eip; - memcpy(c-regs, ctxt-vcpu-arch.regs, sizeof c-regs); c-dst.type = OP_NONE; rc = emulator_do_task_switch(ctxt, ops, tss_selector, reason, has_error_code, error_code); if (rc == X86EMUL_CONTINUE) { - memcpy(ctxt-vcpu-arch.regs, c-regs, sizeof c-regs); rc = writeback(ctxt, ops); if (rc == X86EMUL_CONTINUE) ctxt-eip = c-eip; @@ -2525,13 +2519,6 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) ctxt-interruptibility = 0; ctxt-decode.mem_read.pos = 0; - /* Shadow copy of register state. Committed on successful emulation. -* NOTE: we can copy them from vcpu as x86_decode_insn() doesn't -* modify them. -*/ - - memcpy(c-regs, ctxt-vcpu-arch.regs, sizeof c-regs); - if (ctxt-mode == X86EMUL_MODE_PROT64 (c-d No64)) { kvm_queue_exception(ctxt-vcpu, UD_VECTOR); goto done; @@ -3031,8 +3018,6 @@ writeback: * without decoding */ ctxt-decode.mem_read.end = 0; - /* Commit shadow register state. 
*/ - memcpy(ctxt-vcpu-arch.regs, c-regs, sizeof c-regs); ctxt-eip = c-eip; done: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f42be00..d84d531 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3799,7 +3799,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type) { int r, shadow_mask; - struct decode_cache *c; + struct decode_cache *c = vcpu-arch.emulate_ctxt.decode; kvm_clear_exception_queue(vcpu); vcpu-arch.mmio_fault_cr2 = cr2; @@ -3824,13 +3824,14 @@ int emulate_instruction(struct kvm_vcpu *vcpu, ? X86EMUL_MODE_VM86 : cs_l ? X86EMUL_MODE_PROT64 : cs_db ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16; + memset(c, 0, sizeof(struct decode_cache)); + memcpy(c-regs, vcpu-arch.regs, sizeof c-regs); r = x86_decode_insn(vcpu-arch.emulate_ctxt, emulate_ops); trace_kvm_emulate_insn_start(vcpu); /* Only allow emulation of specific instructions on #UD * (namely VMMCALL, sysenter, sysexit, syscall)*/ - c = vcpu-arch.emulate_ctxt.decode; if (emulation_type EMULTYPE_TRAP_UD) { if (!c-twobyte) return EMULATE_FAIL; @@ -3871,6 +3872,10 @@ int emulate_instruction(struct kvm_vcpu *vcpu, return EMULATE_DONE; } + /* this is needed for vmware backdor interface to work since it + changes registers values during IO operation */ + memcpy(c-regs, vcpu-arch.regs, sizeof c-regs); + restart: r = x86_emulate_insn(vcpu-arch.emulate_ctxt, emulate_ops); @@ -3891,6 +3896,7 @@ restart: shadow_mask = vcpu-arch.emulate_ctxt.interruptibility; kvm_x86_ops-set_interrupt_shadow(vcpu, shadow_mask); kvm_x86_ops-set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags); + memcpy(vcpu-arch.regs, c-regs, sizeof c-regs); kvm_rip_write(vcpu, vcpu-arch.emulate_ctxt.eip); if (vcpu-arch.pio.count) { @@ -4874,6 +4880,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, bool has_error_code, u32 error_code) { + struct decode_cache *c = vcpu-arch.emulate_ctxt.decode; int cs_db, cs_l, ret; 
cache_all_regs(vcpu); @@ -4888,6 +4895,8 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, ? X86EMUL_MODE_VM86 : cs_l ? X86EMUL_MODE_PROT64 : cs_db
[COMMIT master] KVM: x86 emulator: x86_emulate_insn() return -1 only in case of emulation failure
From: Gleb Natapov g...@redhat.com Currently emulator returns -1 when emulation failed or IO is needed. Caller tries to guess whether emulation failed by looking at other variables. Make it easier for caller to recognise error condition by always returning -1 in case of failure. For this new emulator internal return value X86EMUL_IO_NEEDED is introduced. It is used to distinguish between error condition (which returns X86EMUL_UNHANDLEABLE) and condition that requires IO exit to userspace to continue emulation. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0cf4311..777240d 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -53,6 +53,7 @@ struct x86_emulate_ctxt; #define X86EMUL_PROPAGATE_FAULT 2 /* propagate a generated fault to guest */ #define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */ #define X86EMUL_CMPXCHG_FAILED 4 /* cmpxchg did not see expected value */ +#define X86EMUL_IO_NEEDED 5 /* IO is needed to complete emulation */ struct x86_emulate_ops { /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 55496f4..adf82ef 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3230,7 +3230,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, } ret = kvm_read_guest(vcpu-kvm, gpa, data, toread); if (ret 0) { - r = X86EMUL_UNHANDLEABLE; + r = X86EMUL_IO_NEEDED; goto out; } @@ -3286,7 +3286,7 @@ static int kvm_write_guest_virt_system(gva_t addr, void *val, } ret = kvm_write_guest(vcpu-kvm, gpa, data, towrite); if (ret 0) { - r = X86EMUL_UNHANDLEABLE; + r = X86EMUL_IO_NEEDED; goto out; } @@ -3346,7 +3346,7 @@ mmio: vcpu-run-mmio.len = vcpu-mmio_size = bytes; vcpu-run-mmio.is_write = vcpu-mmio_is_write = 0; - return X86EMUL_UNHANDLEABLE; + return X86EMUL_IO_NEEDED; } int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, @@ -3818,8 
+3818,6 @@ int emulate_instruction(struct kvm_vcpu *vcpu, */ cache_all_regs(vcpu); - vcpu-mmio_is_write = 0; - if (!(emulation_type EMULTYPE_NO_DECODE)) { int cs_db, cs_l; kvm_x86_ops-get_cs_db_l_bits(vcpu, cs_db, cs_l); @@ -3893,24 +3891,26 @@ restart: return EMULATE_DO_MMIO; } - if (r) { - if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) - goto done; - if (!vcpu-mmio_needed) { - ++vcpu-stat.insn_emulation_fail; - trace_kvm_emulate_insn_failed(vcpu); - kvm_report_emulation_failure(vcpu, mmio); - return EMULATE_FAIL; - } + if (vcpu-mmio_needed) { + if (vcpu-mmio_is_write) + vcpu-mmio_needed = 0; return EMULATE_DO_MMIO; } - if (vcpu-mmio_is_write) { - vcpu-mmio_needed = 0; - return EMULATE_DO_MMIO; + if (r) { /* emulation failed */ + /* +* if emulation was due to access to shadowed page table +* and it failed try to unshadow page and re-entetr the +* guest to let CPU execute the instruction. +*/ + if (kvm_mmu_unprotect_page_virt(vcpu, cr2)) + return EMULATE_DONE; + + trace_kvm_emulate_insn_failed(vcpu); + kvm_report_emulation_failure(vcpu, mmio); + return EMULATE_FAIL; } -done: if (vcpu-arch.exception.pending) vcpu-arch.emulate_ctxt.restart = false; -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86 emulator: advance RIP outside x86 emulator code
From: Gleb Natapov g...@redhat.com Return new RIP as part of instruction emulation result instead of updating KVM's RIP from x86 emulator code. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d7a18a0..437f31b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2496,8 +2496,9 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, if (rc == X86EMUL_CONTINUE) { memcpy(ctxt-vcpu-arch.regs, c-regs, sizeof c-regs); - kvm_rip_write(ctxt-vcpu, c-eip); rc = writeback(ctxt, ops); + if (rc == X86EMUL_CONTINUE) + ctxt-eip = c-eip; } return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; @@ -2554,7 +2555,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) if (address_mask(c, c-regs[VCPU_REGS_RCX]) == 0) { string_done: ctxt-restart = false; - kvm_rip_write(ctxt-vcpu, c-eip); + ctxt-eip = c-eip; goto done; } /* The second termination condition only applies for REPE @@ -3032,7 +3033,7 @@ writeback: ctxt-decode.mem_read.end = 0; /* Commit shadow register state. 
*/ memcpy(ctxt-vcpu-arch.regs, c-regs, sizeof c-regs); - kvm_rip_write(ctxt-vcpu, c-eip); + ctxt-eip = c-eip; ops-set_rflags(ctxt-vcpu, ctxt-eflags); done: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4121a9f..3544ea9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3896,6 +3896,7 @@ restart: shadow_mask = vcpu-arch.emulate_ctxt.interruptibility; kvm_x86_ops-set_interrupt_shadow(vcpu, shadow_mask); + kvm_rip_write(vcpu, vcpu-arch.emulate_ctxt.eip); if (vcpu-arch.pio.count) { if (!vcpu-arch.pio.in) @@ -4900,6 +4901,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason, if (ret) return EMULATE_FAIL; + kvm_rip_write(vcpu, vcpu-arch.emulate_ctxt.eip); kvm_x86_ops-set_rflags(vcpu, vcpu-arch.emulate_ctxt.eflags); return EMULATE_DONE; } -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86 emulator: make set_cr() callback return error if it fails
From: Gleb Natapov g...@redhat.com Make set_cr() callback return error if it fails instead of injecting #GP behind emulator's back. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index df53ba2..6c4f491 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -135,7 +135,7 @@ struct x86_emulate_ops { unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu); void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu); ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); - void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); + int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index f56ec48..061f7d3 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2272,7 +2272,10 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = ctxt-decode; int ret; - ops-set_cr(3, tss-cr3, ctxt-vcpu); + if (ops-set_cr(3, tss-cr3, ctxt-vcpu)) { + kvm_inject_gp(ctxt-vcpu, 0); + return X86EMUL_PROPAGATE_FAULT; + } c-eip = tss-eip; ctxt-eflags = tss-eflags | 2; c-regs[VCPU_REGS_RAX] = tss-eax; @@ -3135,7 +3138,10 @@ twobyte_insn: c-dst.type = OP_NONE; /* no writeback */ break; case 0x22: /* mov reg, cr */ - ops-set_cr(c-modrm_reg, c-modrm_val, ctxt-vcpu); + if (ops-set_cr(c-modrm_reg, c-modrm_val, ctxt-vcpu)) { + kvm_inject_gp(ctxt-vcpu, 0); + goto done; + } c-dst.type = OP_NONE; break; case 0x23: /* mov from reg to dr */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 29cc2b1..f6c799d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -414,57 +414,49 @@ out: return changed; } -void kvm_set_cr0(struct kvm_vcpu *vcpu, 
unsigned long cr0) +static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { cr0 |= X86_CR0_ET; #ifdef CONFIG_X86_64 - if (cr0 0xUL) { - kvm_inject_gp(vcpu, 0); - return; - } + if (cr0 0xUL) + return 1; #endif cr0 = ~CR0_RESERVED_BITS; - if ((cr0 X86_CR0_NW) !(cr0 X86_CR0_CD)) { - kvm_inject_gp(vcpu, 0); - return; - } + if ((cr0 X86_CR0_NW) !(cr0 X86_CR0_CD)) + return 1; - if ((cr0 X86_CR0_PG) !(cr0 X86_CR0_PE)) { - kvm_inject_gp(vcpu, 0); - return; - } + if ((cr0 X86_CR0_PG) !(cr0 X86_CR0_PE)) + return 1; if (!is_paging(vcpu) (cr0 X86_CR0_PG)) { #ifdef CONFIG_X86_64 if ((vcpu-arch.efer EFER_LME)) { int cs_db, cs_l; - if (!is_pae(vcpu)) { - kvm_inject_gp(vcpu, 0); - return; - } + if (!is_pae(vcpu)) + return 1; kvm_x86_ops-get_cs_db_l_bits(vcpu, cs_db, cs_l); - if (cs_l) { - kvm_inject_gp(vcpu, 0); - return; - - } + if (cs_l) + return 1; } else #endif - if (is_pae(vcpu) !load_pdptrs(vcpu, vcpu-arch.cr3)) { - kvm_inject_gp(vcpu, 0); - return; - } - + if (is_pae(vcpu) !load_pdptrs(vcpu, vcpu-arch.cr3)) + return 1; } kvm_x86_ops-set_cr0(vcpu, cr0); kvm_mmu_reset_context(vcpu); - return; + return 0; +} + +void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) +{ + if (__kvm_set_cr0(vcpu, cr0)) + kvm_inject_gp(vcpu, 0); } EXPORT_SYMBOL_GPL(kvm_set_cr0); @@ -474,61 +466,56 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw) } EXPORT_SYMBOL_GPL(kvm_lmsw); -void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) +int __kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { unsigned long old_cr4 = kvm_read_cr4(vcpu); unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE; - if (cr4 CR4_RESERVED_BITS) { - kvm_inject_gp(vcpu, 0); - return; - } + if (cr4 CR4_RESERVED_BITS) + return 1;
[COMMIT master] KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops
From: Gleb Natapov g...@redhat.com Add (set|get)_dr callbacks to x86_emulate_ops instead of calling them directly. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 69a64a6..c37296d 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -137,6 +137,8 @@ struct x86_emulate_ops { void (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); + int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); + int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu); }; /* Type, address-of, and value of an instruction's operand. */ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3f0007b..74cb6ac 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -590,10 +590,6 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); -int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, - unsigned long *dest); -int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, - unsigned long value); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 687ea09..8a4aa73 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3132,7 +3132,7 @@ twobyte_insn: kvm_queue_exception(ctxt-vcpu, UD_VECTOR); goto done; } - emulator_get_dr(ctxt, c-modrm_reg, c-regs[c-modrm_rm]); + ops-get_dr(c-modrm_reg, c-regs[c-modrm_rm], ctxt-vcpu); c-dst.type = OP_NONE; /* no writeback */ break; case 0x22: /* mov reg, cr */ @@ -3145,7 +3145,10 @@ twobyte_insn: 
kvm_queue_exception(ctxt-vcpu, UD_VECTOR); goto done; } - emulator_set_dr(ctxt, c-modrm_reg, c-regs[c-modrm_rm]); + + ops-set_dr(c-modrm_reg,c-regs[c-modrm_rm] + ((ctxt-mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U), + ctxt-vcpu); c-dst.type = OP_NONE; /* no writeback */ break; case 0x30: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4d0a968..71ff194 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3575,16 +3575,14 @@ int emulate_clts(struct kvm_vcpu *vcpu) return X86EMUL_CONTINUE; } -int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest) +int emulator_get_dr(int dr, unsigned long *dest, struct kvm_vcpu *vcpu) { - return kvm_get_dr(ctxt-vcpu, dr, dest); + return kvm_get_dr(vcpu, dr, dest); } -int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long value) +int emulator_set_dr(int dr, unsigned long value, struct kvm_vcpu *vcpu) { - unsigned long mask = (ctxt-mode == X86EMUL_MODE_PROT64) ? ~0ULL : ~0U; - - return kvm_set_dr(ctxt-vcpu, dr, value mask); + return kvm_set_dr(vcpu, dr, value); } void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context) @@ -3766,6 +3764,8 @@ static struct x86_emulate_ops emulate_ops = { .set_cr = emulator_set_cr, .cpl = emulator_get_cpl, .set_rflags = emulator_set_rflags, + .get_dr = emulator_get_dr, + .set_dr = emulator_set_dr, }; static void cache_all_regs(struct kvm_vcpu *vcpu) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots
From: Avi Kivity a...@redhat.com On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel joerg.roe...@amd.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 51eb6d6..de99638 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2065,11 +2065,13 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) direct = 1; root_gfn = 0; } + spin_lock(vcpu-kvm-mmu_lock); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL, direct, ACC_ALL, NULL); root = __pa(sp-spt); ++sp-root_count; + spin_unlock(vcpu-kvm-mmu_lock); vcpu-arch.mmu.root_hpa = root; return 0; } @@ -2093,11 +2095,14 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu) direct = 1; root_gfn = i 30; } + spin_lock(vcpu-kvm-mmu_lock); sp = kvm_mmu_get_page(vcpu, root_gfn, i 30, PT32_ROOT_LEVEL, direct, ACC_ALL, NULL); root = __pa(sp-spt); ++sp-root_count; + spin_unlock(vcpu-kvm-mmu_lock); + vcpu-arch.mmu.pae_root[i] = root | PT_PRESENT_MASK; } vcpu-arch.mmu.root_hpa = __pa(vcpu-arch.mmu.pae_root); @@ -2466,7 +2471,9 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) goto out; spin_lock(vcpu-kvm-mmu_lock); kvm_mmu_free_some_pages(vcpu); + spin_unlock(vcpu-kvm-mmu_lock); r = mmu_alloc_roots(vcpu); + spin_lock(vcpu-kvm-mmu_lock); mmu_sync_roots(vcpu); spin_unlock(vcpu-kvm-mmu_lock); if (r) -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[COMMIT master] KVM: x86 emulator: introduce read cache
From: Gleb Natapov g...@redhat.com Introduce read cache which is needed for instruction that require more then one exit to userspace. After returning from userspace the instruction will be re-executed with cached read value. Signed-off-by: Gleb Natapov g...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0b2729b..288cbed 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -186,6 +186,7 @@ struct decode_cache { unsigned long modrm_val; struct fetch_cache fetch; struct read_cache io_read; + struct read_cache mem_read; }; struct x86_emulate_ctxt { diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 5ac0bb4..776874b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1263,6 +1263,33 @@ done: return (rc == X86EMUL_UNHANDLEABLE) ? -1 : 0; } +static int read_emulated(struct x86_emulate_ctxt *ctxt, +struct x86_emulate_ops *ops, +unsigned long addr, void *dest, unsigned size) +{ + int rc; + struct read_cache *mc = ctxt-decode.mem_read; + + while (size) { + int n = min(size, 8u); + size -= n; + if (mc-pos mc-end) + goto read_cached; + + rc = ops-read_emulated(addr, mc-data + mc-end, n, ctxt-vcpu); + if (rc != X86EMUL_CONTINUE) + return rc; + mc-end += n; + + read_cached: + memcpy(dest, mc-data + mc-pos, n); + mc-pos += n; + dest += n; + addr += n; + } + return X86EMUL_CONTINUE; +} + static int pio_in_emulated(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops, unsigned int size, unsigned short port, @@ -1504,9 +1531,9 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt, struct decode_cache *c = ctxt-decode; int rc; - rc = ops-read_emulated(register_address(c, ss_base(ctxt), -c-regs[VCPU_REGS_RSP]), - dest, len, ctxt-vcpu); + rc = read_emulated(ctxt, ops, register_address(c, ss_base(ctxt), + c-regs[VCPU_REGS_RSP]), + dest, len); if (rc != X86EMUL_CONTINUE) return rc; @@ -2475,6 +2502,7 @@ 
x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) int saved_dst_type = c-dst.type; ctxt-interruptibility = 0; + ctxt-decode.mem_read.pos = 0; /* Shadow copy of register state. Committed on successful emulation. * NOTE: we can copy them from vcpu as x86_decode_insn() doesn't @@ -2529,20 +2557,16 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) } if (c-src.type == OP_MEM) { - rc = ops-read_emulated((unsigned long)c-src.ptr, - c-src.val, - c-src.bytes, - ctxt-vcpu); + rc = read_emulated(ctxt, ops, (unsigned long)c-src.ptr, + c-src.val, c-src.bytes); if (rc != X86EMUL_CONTINUE) goto done; c-src.orig_val = c-src.val; } if (c-src2.type == OP_MEM) { - rc = ops-read_emulated((unsigned long)c-src2.ptr, - c-src2.val, - c-src2.bytes, - ctxt-vcpu); + rc = read_emulated(ctxt, ops, (unsigned long)c-src2.ptr, + c-src2.val, c-src2.bytes); if (rc != X86EMUL_CONTINUE) goto done; } @@ -2553,8 +2577,8 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) if ((c-dst.type == OP_MEM) !(c-d Mov)) { /* optimisation - avoid slow emulated read if Mov */ - rc = ops-read_emulated((unsigned long)c-dst.ptr, c-dst.val, - c-dst.bytes, ctxt-vcpu); + rc = read_emulated(ctxt, ops, (unsigned long)c-dst.ptr, + c-dst.val, c-dst.bytes); if (rc != X86EMUL_CONTINUE) goto done; } @@ -2981,7 +3005,11 @@ writeback: (rc-end != 0 rc-end == rc-pos)) ctxt-restart = false; } - + /* +* reset read cache here in case string instruction is restared +* without decoding +*/ + ctxt-decode.mem_read.end = 0; /* Commit shadow register state. */
[COMMIT master] KVM: x86: properly update ready_for_interrupt_injection
From: Marcelo Tosatti mtosa...@redhat.com The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt when irqchips are emulated in userspace. Fix by always updating the state before returning to userspace. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6b2ce1d..dff08e5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4653,7 +4653,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) } srcu_read_unlock(kvm->srcu, vcpu->srcu_idx); - post_kvm_run_save(vcpu); vapic_exit(vcpu); @@ -4703,6 +4702,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) r = __vcpu_run(vcpu); out: + post_kvm_run_save(vcpu); if (vcpu->sigset_active) sigprocmask(SIG_SETMASK, &sigsaved, NULL);
Re: [Qemu-devel] Re: [PATCH] virtio-spec: document block CMD and FLUSH
On Wed, 5 May 2010 14:28:41 +0930 Rusty Russell ru...@rustcorp.com.au wrote: On Wed, 5 May 2010 05:47:05 am Jamie Lokier wrote: Jens Axboe wrote: On Tue, May 04 2010, Rusty Russell wrote: ISTR someone mentioning a desire for such an API years ago, so CC'ing the usual I/O suspects... It would be nice to have a fuller API for this, but the reality is that only the flush approach is really workable. Even just strict ordering of requests could only be supported on SCSI, and even there the kernel still lacks proper guarantees on error handling to prevent reordering. There are a few I/O scheduling differences that might be useful: 1. The I/O scheduler could freely move WRITEs before a FLUSH but not before a BARRIER. That might be useful for time-critical WRITEs, and those issued at high I/O priority. This is only because no one actually wants flushes or barriers, though I/O people seem to only offer that. We really want "these writes must occur before this write". That offers maximum choice to the I/O subsystem and potentially to smart (virtual?) disks. 2. The I/O scheduler could move WRITEs after a FLUSH if the FLUSH is only for data belonging to a particular file (e.g. fdatasync with no file size change, even on btrfs if O_DIRECT was used for the writes being committed). That would entail tagging FLUSHes and WRITEs with a fs-specific identifier (such as inode number), opaque to the scheduler, which only checks equality. This is closer. In userspace I'd be happy with "all prior writes to this struct file before all future writes". Even if the original guarantees were stronger (i.e. on an inode basis). We currently implement transactions using 4 fsync/msync pairs:

write_recovery_data(fd); fsync(fd); msync(mmap);
write_recovery_header(fd); fsync(fd); msync(mmap);
overwrite_with_new_data(fd); fsync(fd); msync(mmap);
remove_recovery_header(fd); fsync(fd); msync(mmap);

Seems over-zealous.
If the recovery_header held a strong checksum of the recovery_data you would not need the first fsync, and as long as you have two places to write recovery data, you don't need the 3rd and 4th syncs. Just: write_internally_checksummed_recovery_data_and_header_to_unused_log_space() fsync / msync overwrite_with_new_data() To recover, you choose the most recent log_space and replay the content. That may be a redundant operation, but that is no loss. Also, I cannot see the point of msync if you have already performed an fsync, and if there is a point, I would expect you to call msync before fsync... Maybe there is some subtlety there that I am not aware of. Yet we really only need ordering, not guarantees about it actually hitting disk before returning. In other words, FLUSH can be more relaxed than BARRIER inside the kernel. It's ironic that we think of fsync as stronger than fbarrier outside the kernel :-) It's an implementation detail; barrier has less flexibility because it has less information about what is required. I'm saying I want to give you as much information as I can, even if you don't use it yet. Only we know that approach doesn't work. People will learn that they don't need to give the extra information to still achieve the same result - just like they did with ext3 and fsync. Then when we improve the implementation to only provide the guarantees that you asked for, people will complain that they are getting empty files that they didn't expect. The abstraction I would like to see is a simple 'barrier' that contains no data and has a filesystem-wide effect. If a filesystem wanted a 'full' barrier such as the current BIO_RW_BARRIER, it would send an empty barrier, then the data, then another empty barrier. (However I suspect most filesystems don't really need barriers on both sides.) A low level driver might merge these together if the underlying hardware supported that combined operation (which I believe some do).
I think this merging would be less complex than the current need to split a BIO_RW_BARRIER into the three separate operations when only a flush is possible (I know it would make md code a lot nicer :-). I would probably expose this to user-space as extra flags to sync_file_range: SYNC_FILE_RANGE_BARRIER_BEFORE SYNC_FILE_RANGE_BARRIER_AFTER This would make it clear that a barrier does *not* imply a sync; it only applies to data for which a sync has already been requested. So data that has already been 'synced' is stored strictly before data which has not yet been submitted with write() (or by changing a mmapped area). The barrier would still be filesystem wide in that if you SYNC_FILE_RANGE_WRITE one file, then SYNC_FILE_RANGE_BARRIER_BEFORE another file on the same filesystem, the pages scheduled in the first file would be affected by the barrier request on the second file. Implementing
Re: [PATCH 2/2] turn off kvmclock when resetting cpu
On 05/04/2010 09:35 PM, Glauber Costa wrote: Currently, in the linux kernel, we reset kvmclock if we are rebooting into a crash kernel through kexec. The rationale is that a new kernel won't follow the same memory addresses, and the memory where kvmclock is located in the first kernel will be something else in the second one. We don't do it in normal reboots, because the second kernel ends up registering kvmclock again, which has the effect of turning off the first instance. This is, however, totally wrong. This assumes we're booting into a kernel that also has kvmclock enabled. If for some reason we reboot into something that doesn't do kvmclock, including but not limited to: * rebooting into an older kernel without kvmclock support, * rebooting with no-kvmclock, * rebooting into another OS, we'll simply have the hypervisor writing into a random memory position in the guest. Neat, uh? Moreover, I believe the fix belongs in qemu, since it is the entity more prepared to detect all kinds of reboots (by means of a cpu_reset), not to mention the presence of misbehaving guests that can forget to turn kvmclock off. This patch fixes the issue for me. Signed-off-by: Glauber Costa glom...@redhat.com --- qemu-kvm-x86.c | 19 +++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c index 439c31a..4b94e04 100644 --- a/qemu-kvm-x86.c +++ b/qemu-kvm-x86.c @@ -1417,8 +1417,27 @@ void kvm_arch_push_nmi(void *opaque) } #endif /* KVM_CAP_USER_NMI */ +static int kvm_turn_off_clock(CPUState *env) +{ +struct { +struct kvm_msrs info; +struct kvm_msr_entry entries[100]; +} msr_data; + +struct kvm_msr_entry *msrs = msr_data.entries; +int n = 0; + +kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME, 0); +kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, 0); This fails if the kernel doesn't support those MSRs. Moreover, you need to use the new MSRs as well if we are ever to succeed in deprecating the old ones.
+msr_data.info.nmsrs = n; + +return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data); +} + + How about a different approach? Query the supported MSRs (KVM_GET_MSR_LIST or thereabout) and reset them (with special cases for the TSC, and the old clock MSRs when the new ones are present)? Long term we need a kernel reset function, but this will do for now. -- error compiling committee.c: too many arguments to function
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
On 05/04/2010 09:24 PM, H. Peter Anvin wrote: I would like to request one change, however. I would like to see the alternatives code to be: movb $0,reg movb $1,reg ... instead of using xor (which has to be padded with NOPs, which is of course pointless since the slot is a fixed size.) Right. I would suggest using a byte-sized variable instead of a dword-size variable to save a few bytes, too. I used a bool, and the code already compiles to a byte mov. Though it could be argued that a word instruction is better since it avoids a false dependency, and allows a preceding instruction that modifies %reg to be executed after the mov instruction. Once the jump label framework is integrated and has matured, I think we should consider using it to save the mov/test/jump. IIRC that has an implied unlikely() which isn't suitable here? Perhaps the immediate values patches. -- error compiling committee.c: too many arguments to function
[PATCH v2 0/2] x86 FPU API
Currently all fpu accessors are wedded to task_struct. However kvm also uses the fpu in a different context. Introduce an FPU API, and replace the current uses with the new API. While this patchset is oriented towards deeper changes, as a first step it simplifies xsave for kvm. v2: eliminate useless padding in use_xsave() by using a larger instruction Avi Kivity (2): x86: eliminate TS_XSAVE x86: Introduce 'struct fpu' and related API arch/x86/include/asm/i387.h| 135 +++- arch/x86/include/asm/processor.h |6 ++- arch/x86/include/asm/thread_info.h |1 - arch/x86/include/asm/xsave.h |7 +- arch/x86/kernel/cpu/common.c |5 +- arch/x86/kernel/i387.c | 107 ++--- arch/x86/kernel/process.c | 20 +++--- arch/x86/kernel/process_32.c |2 +- arch/x86/kernel/process_64.c |2 +- arch/x86/kernel/xsave.c|8 +- arch/x86/math-emu/fpu_aux.c|6 +- 11 files changed, 181 insertions(+), 118 deletions(-)
[PATCH v2 1/2] x86: eliminate TS_XSAVE
The fpu code currently uses current_thread_info()->status & TS_XSAVE as a way to distinguish between XSAVE capable processors and older processors. The decision is not really task specific; instead we use the task status to avoid a global memory reference - the value should be the same across all threads. Eliminate this tie-in into the task structure by using an alternative instruction keyed off the XSAVE cpu feature; this results in shorter and faster code, without introducing a global memory reference. Acked-by: Suresh Siddha suresh.b.sid...@intel.com Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/i387.h| 20 arch/x86/include/asm/thread_info.h |1 - arch/x86/kernel/cpu/common.c |5 + arch/x86/kernel/i387.c |5 + arch/x86/kernel/xsave.c|6 +++--- 5 files changed, 21 insertions(+), 16 deletions(-) diff --git a/arch/x86/include/asm/i387.h b/arch/x86/include/asm/i387.h index da29309..301fff5 100644 --- a/arch/x86/include/asm/i387.h +++ b/arch/x86/include/asm/i387.h @@ -56,6 +56,18 @@ extern int restore_i387_xstate_ia32(void __user *buf); #define X87_FSW_ES (1 << 7)/* Exception Summary */ +static inline bool use_xsave(void) +{ + bool has_xsave; + + alternative_io("mov $0, %0", + "mov $1, %0", + X86_FEATURE_XSAVE, + "=g"(has_xsave)); + + return has_xsave; +} + #ifdef CONFIG_X86_64 /* Ignore delayed exceptions from user space */ @@ -99,7 +111,7 @@ static inline void clear_fpu_state(struct task_struct *tsk) /* * xsave header may indicate the init state of the FP.
*/ - if ((task_thread_info(tsk)->status & TS_XSAVE) + if (use_xsave() && !(xstate->xsave_hdr.xstate_bv & XSTATE_FP)) return; @@ -164,7 +176,7 @@ static inline void fxsave(struct task_struct *tsk) static inline void __save_init_fpu(struct task_struct *tsk) { - if (task_thread_info(tsk)->status & TS_XSAVE) + if (use_xsave()) xsave(tsk); else fxsave(tsk); @@ -218,7 +230,7 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx) */ static inline void __save_init_fpu(struct task_struct *tsk) { - if (task_thread_info(tsk)->status & TS_XSAVE) { + if (use_xsave()) { struct xsave_struct *xstate = &tsk->thread.xstate->xsave; struct i387_fxsave_struct *fx = &tsk->thread.xstate->fxsave; @@ -266,7 +278,7 @@ end: static inline int restore_fpu_checking(struct task_struct *tsk) { - if (task_thread_info(tsk)->status & TS_XSAVE) + if (use_xsave()) return xrstor_checking(&tsk->thread.xstate->xsave); else return fxrstor_checking(&tsk->thread.xstate->fxsave); diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index d017ed5..d4092fa 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -242,7 +242,6 @@ static inline struct thread_info *current_thread_info(void) #define TS_POLLING 0x0004 /* true if in idle loop and not sleeping */ #define TS_RESTORE_SIGMASK 0x0008 /* restore signal mask in do_signal() */ -#define TS_XSAVE 0x0010 /* Use xsave/xrstor */ #define tsk_is_polling(t) (task_thread_info(t)->status & TS_POLLING) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 4868e4a..c1c00d0 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1243,10 +1243,7 @@ void __cpuinit cpu_init(void) /* * Force FPU initialization: */ - if (cpu_has_xsave) - current_thread_info()->status = TS_XSAVE; - else - current_thread_info()->status = 0; + current_thread_info()->status = 0; clear_used_math(); mxcsr_feature_mask_init(); diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index
54c31c2..14ca1dc 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -102,10 +102,7 @@ void __cpuinit fpu_init(void) mxcsr_feature_mask_init(); /* clean state in init */ - if (cpu_has_xsave) - current_thread_info()->status = TS_XSAVE; - else - current_thread_info()->status = 0; + current_thread_info()->status = 0; clear_used_math(); } #endif /* CONFIG_X86_64 */ diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c index 782c3a3..c1b0a11 100644 --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -99,7 +99,7 @@ int save_i387_xstate(void __user *buf) if (err) return err; - if (task_thread_info(tsk)->status & TS_XSAVE) + if (use_xsave()) err = xsave_user(buf); else err =
Re: [PATCHv2 00/23] next round of emulator cleanups
On 04/28/2010 07:15 PM, Gleb Natapov wrote: This is the next round of emulator cleanups. It makes the emulator even more detached from kvm. The first patch introduces an IO read cache which is needed to correctly emulate instructions that require more than one IO read exit during emulation. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: [PATCH][RESEND] intel_txt: enable SMX flag for VMXON in KVM
On 05/05/2010 01:38 PM, Shane Wang wrote: Per Intel SDM 3B 20.7, for IA32_FEATURE_CONTROL MSR Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch is to check the correct in/outside-SMX flag when detecting if VMX is disabled by BIOS, and to set in-SMX flag for VMXON after Intel TXT is launched in KVM. Already committed as 9d4b473eeea, I forgot to confirm, sorry. -- error compiling committee.c: too many arguments to function
Re: [PATCH v2 4/7] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
On 05/03/2010 06:52 PM, Glauber Costa wrote: Right now, we were using individual KVM_CAP entities to communicate to userspace which cpuids we support. This is suboptimal, since it generates a delay between the feature arriving in the host and being available at the guest. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide. And if we ever add a new cpuid bit in the future, we have to do that again, which creates some complexity and delay in feature adoption. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h |4 arch/x86/kvm/x86.c | 27 +++ 2 files changed, 31 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 9734808..f019f8c 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -16,6 +16,10 @@ #define KVM_FEATURE_CLOCKSOURCE 0 #define KVM_FEATURE_NOP_IO_DELAY 1 #define KVM_FEATURE_MMU_OP 2 +/* This indicates that the new set of kvmclock msrs + * are available. The use of 0x11 and 0x12 is deprecated + */ +#define KVM_FEATURE_CLOCKSOURCE2 3 Separate patch. #define MSR_KVM_WALL_CLOCK 0x11 #define MSR_KVM_SYSTEM_TIME 0x12 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index eb84947..8a7cdda 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1971,6 +1971,20 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, } break; } + case 0x40000000: { Use symbolic name, please. + char signature[] = "KVMKVMKVM"; + u32 *sigptr = (u32 *)signature; + entry->eax = 1; Where did this come from? + entry->ebx = sigptr[0]; + entry->ecx = sigptr[1]; + entry->edx = sigptr[2]; Overflow, you're reading 12 bytes from a 10-byte variable. + break; + } + case 0x40000001: + entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) | + (1 << KVM_FEATURE_NOP_IO_DELAY) | + (1 << KVM_FEATURE_CLOCKSOURCE2); Indentation...
Also, you have to initialize all fields, since the real cpu won't initialize them for you. Sidenote: the real cpu may be a kvm vcpu, so it may in fact support those features. + break; case 0x80000000: entry->eax = min(entry->eax, 0x8000001a); break; @@ -2017,6 +2031,19 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid, for (func = 0x80000001; func <= limit && nent < cpuid->nent; ++func) do_cpuid_ent(&cpuid_entries[nent], func, 0, &nent, cpuid->nent); + + + + r = -E2BIG; + if (nent >= cpuid->nent) + goto out_free; + + do_cpuid_ent(&cpuid_entries[nent], 0x40000000, 0, &nent, cpuid->nent); + limit = cpuid_entries[nent - 1].eax; The kvm cpuid does not follow the limit thing. + for (func = 0x40000001; func <= limit && nent < cpuid->nent; ++func) + do_cpuid_ent(&cpuid_entries[nent], func, 0, + &nent, cpuid->nent); + r = -E2BIG; To avoid confusion, please write Documentation/kvm/cpuid.txt based on the current qemu-kvm code, and implement this patch according to the documentation. -- error compiling committee.c: too many arguments to function
Re: [PATCH v2 6/7] don't compute pvclock adjustments if we trust the tsc
On 05/03/2010 06:52 PM, Glauber Costa wrote: If the HV told us we can fully trust the TSC, skip any correction. Signed-off-by: Glauber Costa glom...@redhat.com --- arch/x86/include/asm/kvm_para.h|5 + arch/x86/include/asm/pvclock-abi.h |1 + arch/x86/kernel/kvmclock.c |3 +++ arch/x86/kernel/pvclock.c |4 4 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index f019f8c..6f1b878 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -21,6 +21,11 @@ */ #define KVM_FEATURE_CLOCKSOURCE2 3 +/* The last 8 bits are used to indicate how to interpret the flags field + * in pvclock structure. If no bits are set, all flags are ignored. + */ +#define KVM_FEATURE_CLOCKSOURCE_STABLE_TSC 24 This needs documentation (in cpuid.txt). The flag doesn't mean the TSC is stable; rather, it means the pvclock tsc stable bit is valid. -- error compiling committee.c: too many arguments to function
Re: [PATCH v2 1/7] Enable pvclock flags in vcpu_time_info structure
On 05/03/2010 06:52 PM, Glauber Costa wrote: This patch removes one padding byte and transforms it into a flags field. New versions of guests using pvclock will query these flags upon each read. Flags, however, will only be interpreted when the guest decides to. It uses the pvclock_valid_flags function to signal that a specific set of flags should be taken into consideration. Which flags are valid are usually devised via HV negotiation. Signed-off-by: Glauber Costa glom...@redhat.com CC: Jeremy Fitzhardinge jer...@goop.org --- arch/x86/include/asm/pvclock-abi.h |3 ++- arch/x86/include/asm/pvclock.h |1 + arch/x86/kernel/pvclock.c |9 + 3 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 6d93508..ec5c41a 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info { u64 system_time; u32 tsc_to_system_mul; s8 tsc_shift; - u8 pad[3]; + u8 flags; + u8 pad[2]; } __attribute__((__packed__)); /* 32 bytes */ struct pvclock_wall_clock { diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h index 53235fd..cd02f32 100644 --- a/arch/x86/include/asm/pvclock.h +++ b/arch/x86/include/asm/pvclock.h @@ -6,6 +6,7 @@ /* some helper functions for xen and kvm pv clock sources */ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src); +void pvclock_set_flags(u8 flags); unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src); void pvclock_read_wallclock(struct pvclock_wall_clock *wall, struct pvclock_vcpu_time_info *vcpu, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c index 03801f2..aa2262b 100644 --- a/arch/x86/kernel/pvclock.c +++ b/arch/x86/kernel/pvclock.c @@ -31,8 +31,16 @@ struct pvclock_shadow_time { u32 tsc_to_nsec_mul; int tsc_shift; u32 version; + u8 flags; }; +static u8 valid_flags = 0; + Minor optimization: __read_mostly.
-- error compiling committee.c: too many arguments to function
Re: Booting/installing WindowsNT
Avi Kivity wrote: On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with STOP 0x3E error right before displaying that kernel message. MSDN talks about a multiprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected the offline CPUs in the mptable that confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. OK, I tracked this down. It is the max CPUID level that differs. In the AMD CPUID guide leafs 0000_0002 till 0000_0004 are reserved; the CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OSes only read leaf 4 on Intel systems; it seems that NT4 is missing this. I will now think about a proper fix for this. What about disabling ACPI? smp should still work through the mptable. Didn't make a difference. Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12
Re: Booting/installing WindowsNT
On 05/05/2010 11:32 AM, Andre Przywara wrote: Avi Kivity wrote: On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with STOP 0x3E error right before displaying that kernel message. MSDN talks about a multiprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected the offline CPUs in the mptable that confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. OK, I tracked this down. It is the max CPUID level that differs. In the AMD CPUID guide leafs 0000_0002 till 0000_0004 are reserved; the CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OSes only read leaf 4 on Intel systems; it seems that NT4 is missing this. I will now think about a proper fix for this. I don't understand. Shouldn't the values for cpuid leaf 4 be the same for qemu64 whether the cpu is Intel or AMD? The real cpuid shouldn't matter. -- error compiling committee.c: too many arguments to function
Re: [PATCH 1/2] replace set_msr_entry with kvm_msr_entry
On 05/04/2010 09:35 PM, Glauber Costa wrote: this is yet another function that upstream qemu implements, so we can just use its implementation. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: vCPU scalability for linux VMs
On 05/05/2010 04:45 AM, Alec Istomin wrote: Gentlemen, Reaching out with a non-development question, sorry if it's not appropriate here. I'm looking for a way to improve Linux SMP VMs performance under KVM. My preliminary results show that single vCPU Linux VMs perform up to 10 times better than 4vCPU Linux VMs (consolidated performance of 8 VMs on 8 core pre-Nehalem server). I suspect that I'm missing something major and look for any means that can help improve SMP VMs performance. So you have a total of 32 vcpus on 8 cores? This is known to be problematic. You may see some improvement by enabling hyperthreading. There is ongoing work to improve this. -- error compiling committee.c: too many arguments to function
Re: [PATCH] qemu-kvm: Process exit requests in kvm loop
On 05/04/2010 12:28 PM, Jan Kiszka wrote: This unbreaks the monitor quit command for qemu-kvm. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: qemu-kvm: event writeback can overwrite interrupts with -no-kvm-irqchip
On 05/04/2010 05:15 AM, Marcelo Tosatti wrote: Interrupts that are injected during a vcpu event save/writeback cycle are lost. Fix by writing back the state before injecting interrupts. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: [patch 0/6] qemu-kvm: use upstream memslot code
On 05/04/2010 01:48 AM, Marcelo Tosatti wrote: See individual patches for details. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: Booting/installing WindowsNT
05.05.2010 12:32, Andre Przywara wrote: Avi Kivity wrote: On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with STOP 0x3E error right before displaying that kernel message. MSDN talks about a multiprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected the offline CPUs in the mptable that confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. OK, I tracked this down. It is the max CPUID level that differs. In the AMD CPUID guide leafs 0000_0002 till 0000_0004 are reserved; the CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OSes only read leaf 4 on Intel systems; it seems that NT4 is missing this. Confirmed, with -cpu qemu64,level=[123] it works for me as well. Note again that after service pack 6 (I haven't tried other SPs), the problem goes away entirely -- winNT SP6 works with the default kvm cpu just fine. Thanks! /mjt
Re: Fix -mem-path with hugetlbfs
On 05/04/2010 12:12 AM, Marcelo Tosatti wrote: Avi, please apply to both master and uq/master. --- Fallback to qemu_vmalloc in case file_ram_alloc fails. Applied to both, thanks. -- error compiling committee.c: too many arguments to function
Re: [qemu-kvm tests PATCH] qemu-kvm tests: enhanced msr test
On 05/02/2010 06:10 PM, Naphtali Sprei wrote: Changed the code structure and added a few tests for some of the msrs. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: Booting/installing WindowsNT
Avi Kivity wrote: On 05/05/2010 11:32 AM, Andre Przywara wrote: Avi Kivity wrote: On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with STOP 0x3E error right before displaying that kernel message. MSDN talks about a multiprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected the offline CPUs in the mptable that confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. OK, I tracked this down. It is the max CPUID level that differs. In the AMD CPUID guide leafs 0000_0002 till 0000_0004 are reserved; the CPU that Michael and I used (K8RevF) actually has a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OSes only read leaf 4 on Intel systems; it seems that NT4 is missing this. I will now think about a proper fix for this. I don't understand. Shouldn't the values for cpuid leaf 4 be the same for qemu64 whether the cpu is Intel or AMD? The real cpuid shouldn't matter. Yes, but if the max leaf value is smaller than 4, then the guest will not read it. It seems that NT does not like the entries returned by KVM for leaf 4. I am about to find out what exactly is causing that. I have the theory that the stop is intentional, as NT4 workstation does not _want_ to support certain SMP configurations (more than 2 processors?). I have seen a similar issue with WinXPPro and -smp 4 (which went away with -smp 4,cores=4). Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12
Re: [qemu-kvm tests PATCH] qemu-kvm tests: fix linker script problem
On 05/03/2010 02:34 PM, Naphtali Sprei wrote: This is a fix to a previous patch by me. It's on the 'next' branch, as of now. Commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script itself (flat.lds) to become part of the linked objects, which messed up the output file; one such problem is that the symbol edata is not the last symbol anymore. diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak index 61cc2f0..ad7aeac 100644 --- a/kvm/user/config-x86-common.mak +++ b/kvm/user/config-x86-common.mak @@ -19,7 +19,7 @@ CFLAGS += -m$(bits) libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name) FLATLIBS = test/lib/libcflat.a $(libgcc) -%.flat: %.o $(FLATLIBS) flat.lds +%.flat: %.o $(FLATLIBS) $(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS) This drops the dependency, so if flat.lds changes, we don't rebuild. I think you can replace $^ by $(filter %.o, $^) and retain the dependency. -- error compiling committee.c: too many arguments to function
Re: Booting/installing WindowsNT
On 05/05/2010 11:51 AM, Michael Tokarev wrote: In the AMD CPUID guide leafs _0002 till _0004 are reserved, the CPU that Michael and I used (K8RevF) actually have a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OS only read leaf 4 on Intel systems, it seems that NT4 is missing this. OK, I tackled this down. It is the max CPUID level that differs. Confirmed, with -cpu qemu64,level=[123] it works for me as well. Note again that after service pack 6 (I haven't tried other SPs), the problem goes away entirely -- winNT SP6 works with the default kvm cpu just fine. Interesting, may be a guest bug that was fixed later. -- error compiling committee.c: too many arguments to function
Re: [qemu-kvm tests PATCH] qemu-kvm tests: merged stringio into emulator
On 05/03/2010 06:39 PM, Naphtali Sprei wrote: based on 'next' branch. Changed test-case stringio into C code and merged into emulator test-case. Removed traces of stringio test-case. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: KVM hook for code integrity checking
On 04/30/2010 05:53 PM, Suen Chun Hui wrote: Dear KVM developers, I'm currently working on an open source security patch to use KVM to implement code verification on a guest VM in runtime. Thus, it would be very helpful if someone can point to me the right function or place to look at for adding 2 hooks into the KVM paging code to: 1. Detect a new guest page (which I assume will imply a new pte and imply a new spte). Currently, I'm considering putting a hook in the function mmu_set_spte(), but may there is a better place. This hook will be used as the main entry point into the code verification function This is in general not possible. Hosts with npt or ept will not see new guest ptes. It could be done with physical pages, but you'll have no way of knowing if the pages are used in userspace, the kernel, or both. 2. Detect a write fault to a read-only spte (eg. for the case of updating the dirty bit back to the guest pte) Unfortunately, I'm unable to find an appropriate place where this actually takes place after reading the code many times. This hook will be used to prevent a secondary peek page from modifying an existing verified code page. set_spte() or mmu_set_spte() may work. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [PATCH 7/9] KVM test: Introduce the local_login()
On 04/29/2010 02:44 AM, Amos Kong wrote: On Wed, Apr 28, 2010 at 03:01:40PM +0300, Michael Goldish wrote: On 04/26/2010 01:04 PM, Jason Wang wrote: This patch introduces a new method which is used to log into the guest through the guest serial console. The serial_mode must be set to session in order to make use of this patch. In what cases would we want to use this feature? The serial console is not supported by all guests and I'm not sure it supports multiple concurrent sessions (does it?), so it's probably not possible to use it reliably as a replacement for the regular remote shell servers, or even as an alternative variant. We could not get system log by ssh session when network doesn't work(haven't launched, down, unstable, ...) Using serial console can get more useful info. Control guest by ssh in some network related testcases isn't credible. It should be independent. Can you provide a usage example? Which test is going to use this and how? Do you think it should be used in existing tests or in new tests only? Signed-off-by: Jason Wang jasow...@redhat.com --- client/tests/kvm/kvm_vm.py | 25 + 1 files changed, 25 insertions(+), 0 deletions(-) diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py index 0cdf925..a22893b 100755 --- a/client/tests/kvm/kvm_vm.py +++ b/client/tests/kvm/kvm_vm.py @@ -814,7 +814,32 @@ class VM: command, )) return session +def local_login(self, timeout=240): + +Log into the guest via serial console +If timeout expires while waiting for output from the guest (e.g. a +password prompt or a shell prompt) -- fail. 
+ + +serial_mode = self.params.get(serial_mode) +username = self.params.get(username, ) +password = self.params.get(password, ) +prompt = self.params.get(shell_prompt, [\#\$]) +linesep = eval('%s' % self.params.get(shell_linesep, r\n)) +if serial_mode != session: +logging.debug(serial_mode is not session) +return None +else: +command = nc -U %s % self.serial_file_name +assist = self.params.get(prompt_assist) +session = kvm_utils.remote_login(command, password, prompt, linesep, + timeout, , username) ^ You probably meant to pass the prompt assist string to remote_login() but instead you're passing . +if session: + session.set_status_test_command(self.params.get(status_test_ +command, )) +return session + def copy_files_to(self, local_path, remote_path, nic_index=0, timeout=300): Transfer files to the guest. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Autotest mailing list autot...@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What changed since kvm-72 resulting in winNT to fail to boot (STOP 0x0000001E) ?
02.05.2010 10:06, Avi Kivity wrote: On 04/30/2010 11:06 PM, Michael Tokarev wrote: I've a bugreport handy, see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=575439 about the apparent problem booting winNT 4 in kvm 0.12. At least 2 people were hit by this issue. In short, when booting winNT 4.0, it BSODs with error code 0x001E, which means inaccessible boot device. Note that it is when upgrading from -72 to 0.12 [...] What about 0.11? Does it work? After finding the cause of the other problem (in the thread Booting/Installing Windows NT, all thanks going to Andre Przywara), I can proceed with this issue finally. I tried installing winNT here on old kvm and upgrading kvm. So far I can say that if winNT was installed with kvm-72 or later, it boots just fine in kvm-0.12. So I don't know what the problem is in this case. Maybe it is because the OP installed his winNT guest before kvm-72 and now in 0.12 the guest is not able to find its filesystem anymore, or maybe it's because there was some bug fixed in service pack 1 (which I used here) that makes the problem go away - I dunno. But having in mind how picky winNT was about hardware changes, I don't think it's worth the effort to debug this problem further - winNT is a really ancient system, and having an upgrade path for kvm from some ancient development snapshot to the current version isn't that important, IMHO. Yes, a few people will be hit by this issue, which is a sad thing, but seriously, we've more interesting things to do ;) Thanks! /mjt
Re: Booting/installing WindowsNT
Michael Tokarev wrote: 05.05.2010 12:32, Andre Przywara wrote: Avi Kivity wrote: On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with STOP 0x3E error right before displaying that kernel message. MSDN talks about a mulitprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected the offline CPUs in the mptable that confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. OK, I tackled this down. It is the max CPUID level that differs. In the AMD CPUID guide leafs _0002 till _0004 are reserved, the CPU that Michael and I used (K8RevF) actually have a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OS only read leaf 4 on Intel systems, it seems that NT4 is missing this. Confirmed, with -cpu qemu64,level=[123] it works for me as well. The strange thing is that NT4 never reads leaf 4: kvm-2341 [003] 228.527874: kvm_cpuid: func 4000 rax 0 rbx 4b4d564b rcx 564b4d56 rdx 4d kvm-2341 [003] 228.530033: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [003] 228.530081: kvm_cpuid: func 8000 rax 800a rbx 68747541 rcx 444d4163 rdx 69746e65 kvm-2341 [003] 228.530084: kvm_cpuid: func 8008 rax 3028 rbx 0 rcx 0 rdx 0 kvm-2341 [003] 228.530147: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 228.538254: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 228.539902: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 236.273370: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 236.273381: kvm_cpuid: func 0 rax 4 rbx 68747541 rcx 444d4163 rdx 69746e65 With level=4 it BSODs afterwards, with level=1 it beyond that: kvm-2472 [002] 871.379192: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2472 [002] 871.379235: kvm_cpuid: func 0 rax 1 rbx 68747541 rcx 444d4163 rdx 
69746e65 kvm-2472 [002] 871.379238: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd Interestingly it also accesses leaf 8000_0008, I thought that that leaf wasn't around in 1996. Note again that after service pack 6 (I haven't tried other SPs), the problem goes away entirely -- winNT SP6 works with the default kvm cpu just fine. I agree with Avi that it looks like a bug to me. I will see if I can learn more about it. Regards, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany Tel: +49 351 448-3567-12 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM test: Add new subtest iozone_windows
On Wed, 2010-05-05 at 13:17 +0300, Michael Goldish wrote: On 05/04/2010 01:03 AM, Lucas Meneghel Rodrigues wrote: Following the new IOzone postprocessing changes, add a new KVM subtest iozone_windows, which takes advantage of the fact that there's a windows build for the test, so we can ship it on winutils.iso and run it, providing this way the ability to track IO performance for windows guests also. The new test imports the postprocessing library directly from iozone, so it can postprocess the results right after the benchmark is finished on the windows guest. I'll update winutils.iso on the download page soon. Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/tests/iozone_windows.py | 40 ++ client/tests/kvm/tests_base.cfg.sample |7 - 2 files changed, 46 insertions(+), 1 deletions(-) create mode 100644 client/tests/kvm/tests/iozone_windows.py diff --git a/client/tests/kvm/tests/iozone_windows.py b/client/tests/kvm/tests/iozone_windows.py new file mode 100644 index 000..86ec2c4 --- /dev/null +++ b/client/tests/kvm/tests/iozone_windows.py @@ -0,0 +1,40 @@ +import logging, time, os +from autotest_lib.client.common_lib import error +from autotest_lib.client.bin import utils +from autotest_lib.client.tests.iozone import postprocessing +import kvm_subprocess, kvm_test_utils, kvm_utils + + +def run_iozone_windows(test, params, env): + +Run IOzone for windows on a windows guest: +1) Log into a guest +2) Execute the IOzone test contained in the winutils.iso +3) Get results +4) Postprocess it with the IOzone postprocessing module + +@param test: kvm test object +@param params: Dictionary with the test parameters +@param env: Dictionary with test environment. 
+ +vm = kvm_test_utils.get_living_vm(env, params.get(main_vm)) +session = kvm_test_utils.wait_for_login(vm) +results_path = os.path.join(test.resultsdir, +'raw_output_%s' % test.iteration) +analysisdir = os.path.join(test.resultsdir, 'analysis_%s' % test.iteration) + +# Run IOzone and record its results +c = command=params.get(iozone_cmd) 'command=' looks unnecessary here. Funny, only realized that I left this variable now that you've mentioned :) Will fix it +t = int(params.get(iozone_timeout)) +logging.info(Running IOzone command on guest, timeout %ss, t) +results = session.get_command_output(command=c, timeout=t) Does IOzone produce any output while it's running or only when it's done? If the former is true, we might want to print that output as it's being produced: results = session.get_command_output(command=c, timeout=t, print_func=logging.debug) Good point, it generates output while it's running, that just haven't occurred to me. Fixed all in r4467 http://autotest.kernel.org/changeset/4467 Thanks! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
question on virtio
Hi! I see this in virtio_ring.c: /* Put entry in available array (but don't update avail->idx * until they do sync). */ Why is it done this way? It seems that updating the index straight away would be simpler, while this might allow the host to speculatively look up the buffer and handle it, without waiting for the kick. -- MST
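The comment MST quotes describes a two-step publish: descriptors are written into the available ring immediately, but the index the host reads is only bumped at sync (kick) time, so the host never sees a half-built batch. Here is a minimal sketch of that split; the structure and field names are simplified stand-ins, not the real virtio ring layout (which also has flags and memory barriers between the two steps).

```c
#include <assert.h>
#include <stdint.h>

#define QSZ 8

/* Toy available ring: shadow_idx is guest-private, idx is what the
 * host polls.  Hypothetical layout, for illustration only. */
struct avail_ring {
    uint16_t idx;        /* published to the host */
    uint16_t shadow_idx; /* guest-private count of staged entries */
    uint16_t ring[QSZ];
};

/* Stage a descriptor: write the slot but do NOT update idx yet. */
static void avail_add(struct avail_ring *a, uint16_t desc)
{
    a->ring[a->shadow_idx % QSZ] = desc;
    a->shadow_idx++;
}

/* "Sync" (kick time): publish all staged entries at once.  A host that
 * speculatively reads ring[] slightly past idx may have already looked
 * up the buffers, saving latency once idx lands. */
static void avail_sync(struct avail_ring *a)
{
    /* a write barrier would go here in real code */
    a->idx = a->shadow_idx;
}
```

Until `avail_sync()` runs, the host's view (`idx`) is unchanged even though the ring slots are filled, which is exactly the window the speculative lookup exploits.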
Re: Booting/installing WindowsNT
On 05/05/2010 01:18 PM, Andre Przywara wrote: Michael Tokarev wrote: 05.05.2010 12:32, Andre Przywara wrote: Avi Kivity wrote: On 05/04/2010 06:27 PM, Andre Przywara wrote: 3. In all other cases so far it BSoDs with STOP 0x3E error right before displaying that kernel message. MSDN talks about a mulitprocessor configuration error: http://msdn.microsoft.com/en-us/library/ms819006.aspx I suspected the offline CPUs in the mptable that confuse NT. But -smp 1,maxcpus=1 does not make a difference. I will try to dig deeper in this area. OK, I tackled this down. It is the max CPUID level that differs. In the AMD CPUID guide leafs _0002 till _0004 are reserved, the CPU that Michael and I used (K8RevF) actually have a max leaf of 1 here. Default qemu64 has a max leaf of 4. So by saying -cpu qemu64,level=1 (or 2 or 3) it works for me. Modern OS only read leaf 4 on Intel systems, it seems that NT4 is missing this. Confirmed, with -cpu qemu64,level=[123] it works for me as well. The strange thing is that NT4 never reads leaf 4: kvm-2341 [003] 228.527874: kvm_cpuid: func 4000 rax 0 rbx 4b4d564b rcx 564b4d56 rdx 4d kvm-2341 [003] 228.530033: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [003] 228.530081: kvm_cpuid: func 8000 rax 800a rbx 68747541 rcx 444d4163 rdx 69746e65 kvm-2341 [003] 228.530084: kvm_cpuid: func 8008 rax 3028 rbx 0 rcx 0 rdx 0 kvm-2341 [003] 228.530147: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 228.538254: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 228.539902: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 236.273370: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2341 [002] 236.273381: kvm_cpuid: func 0 rax 4 rbx 68747541 rcx 444d4163 rdx 69746e65 So maybe it's just a simple guest bug that was never encountered in real life because no processors had that leaf. 
With level=4 it BSODs afterwards, with level=1 it beyond that: kvm-2472 [002] 871.379192: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd kvm-2472 [002] 871.379235: kvm_cpuid: func 0 rax 1 rbx 68747541 rcx 444d4163 rdx 69746e65 kvm-2472 [002] 871.379238: kvm_cpuid: func 1 rax 623 rbx 800 rcx 80002001 rdx 78bfbfd Interestingly it also accesses leaf 8000_0008, I thought that that leaf wasn't around in 1996. It's the bios: src/mtrr.c:cpuid(0x8008u, eax, ebx, ecx, edx); -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
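The `level=` workaround works because leaf 0's EAX advertises the maximum standard CPUID leaf, and a well-behaved guest clamps its queries to that value. A toy model of that clamping; the table values are illustrative, not real CPUID data (and, as the trace shows, NT4 apparently trips over the leaf-0 EAX value itself rather than the contents of leaf 4):

```c
#include <assert.h>
#include <stdint.h>

/* Maximum supported standard leaf, as reported in leaf 0 EAX.
 * 1 corresponds to -cpu qemu64,level=1; 4 to the qemu64 default. */
static uint32_t max_std_leaf = 1;

/* Toy CPUID: leaf 0 reports the max leaf; leaves past it are treated
 * as reserved here (real CPUs return leaf-specific values). */
static uint32_t cpuid_eax(uint32_t leaf)
{
    if (leaf == 0)
        return max_std_leaf;
    if (leaf > max_std_leaf)
        return 0;
    return 0x623; /* family/model/stepping, as in the trace above */
}

/* A guest that follows the rules reads leaf 0 first and never
 * queries a standard leaf beyond what it advertises. */
static int guest_would_read_leaf(uint32_t leaf)
{
    return leaf <= cpuid_eax(0);
}
```

With `max_std_leaf = 1` the guest never touches leaves 2-4, which matches the observation that NT4 boots once the advertised level is lowered.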
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
Your code is functionally equivalent to the immediate values patch; neither uses a direct branch which would be more efficient. Avi Kivity a...@redhat.com wrote: On 05/04/2010 09:24 PM, H. Peter Anvin wrote: I would like to request one change, however. I would like to see the alternatives code to be: movb $0,reg movb $1,reg ... instead of using xor (which has to be padded with NOPs, which is of course pointless since the slot is a fixed size.) Right. I would suggest using a byte-sized variable instead of a dword-size variable to save a few bytes, too. I used a bool, and the code already compiles to a byte mov. Though it could be argued that a word instruction is better since it avoids a false dependency, and allows a preceding instruction that modifies %reg to be executed after the mov instruction. Once the jump label framework is integrated and has matured, I think we should consider using it to save the mov/test/jump. IIRC that has an implied unlikely() which isn't suitable here? Perhaps the immediate values patches. -- error compiling committee.c: too many arguments to function -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [PATCH 1/2] x86: eliminate TS_XSAVE
You don't want to use bool since some gcc versions don't handle bool in asm well; use a u8 instead. Avi Kivity a...@redhat.com wrote: On 05/04/2010 09:24 PM, H. Peter Anvin wrote: I would like to request one change, however. I would like to see the alternatives code to be: movb $0,reg movb $1,reg ... instead of using xor (which has to be padded with NOPs, which is of course pointless since the slot is a fixed size.) Right. I would suggest using a byte-sized variable instead of a dword-size variable to save a few bytes, too. I used a bool, and the code already compiles to a byte mov. Though it could be argued that a word instruction is better since it avoids a false dependency, and allows a preceding instruction that modifies %reg to be executed after the mov instruction. Once the jump label framework is integrated and has matured, I think we should consider using it to save the mov/test/jump. IIRC that has an implied unlikely() which isn't suitable here? Perhaps the immediate values patches. -- error compiling committee.c: too many arguments to function -- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [PATCH v3 7/10] KVM MMU: allow more page become unsync at gfn mapping time
Marcelo Tosatti wrote: On Wed, Apr 28, 2010 at 11:55:49AM +0800, Xiao Guangrong wrote: In the current code, a shadow page can become asynchronous only if there is one shadow page for a gfn; this rule is too strict. In fact, we can let all last-level mapping pages (i.e., the pte pages) become unsync, and sync them at invlpg or TLB-flush time. This patch allows more pages to become asynchronous at gfn mapping time. Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com Xiao, This patch breaks Fedora 8 32 install. Reverted patches 5-10. Hi Marcelo, Sorry for the delayed reply, I'm on holiday. I have found the reason for this issue; two fix patches will be sent soon, could you please try them? Thanks, Xiao
[PATCH 1/2] KVM MMU: fix for forgot mark parent-unsync_children bit
When mapping a new parent to unsync shadow page, we should mark parent's unsync_children bit

Reported-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 97f2ea0..bf35a2f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1374,7 +1374,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	if (sp->unsync_children) {
 		set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
 		kvm_mmu_mark_parents_unsync(sp);
-	}
+	} else if (sp->unsync)
+		kvm_mmu_mark_parents_unsync(sp);
+
 	trace_kvm_mmu_get_page(sp, false);
 	return sp;
 }
-- 
1.6.1.2
[PATCH 2/2] KVM MMU: fix race in invlpg code
It has a race in the invlpg code, like the sequence below:
A: hold mmu_lock and get 'sp'
B: release mmu_lock and do other things
C: hold mmu_lock and continue using 'sp'
If another path freed 'sp' in stage B, the kernel will crash. This patch checks whether 'sp' is still live before using it in stage C.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h | 22 --
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 624b38f..13ea675 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -462,11 +462,16 @@ out_unlock:
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
-	struct kvm_mmu_page *sp = NULL;
+	struct kvm_mmu_page *sp = NULL, *s;
 	struct kvm_shadow_walk_iterator iterator;
+	struct hlist_head *bucket;
+	struct hlist_node *node, *tmp;
 	gfn_t gfn = -1;
 	u64 *sptep = NULL, gentry;
 	int invlpg_counter, level, offset = 0, need_flush = 0;
+	unsigned index;
+	bool live = false;
+	union kvm_mmu_page_role role;
 
 	spin_lock(&vcpu->kvm->mmu_lock);
@@ -480,7 +485,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 		if (!sp->unsync)
 			break;
-
+		role = sp->role;
 		WARN_ON(level != PT_PAGE_TABLE_LEVEL);
 		shift = PAGE_SHIFT - (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
@@ -519,10 +524,23 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 	mmu_guess_page_from_pte_write(vcpu, gfn_to_gpa(gfn) + offset, gentry);
 	spin_lock(&vcpu->kvm->mmu_lock);
+	index = kvm_page_table_hashfn(gfn);
+	bucket = &vcpu->kvm->arch.mmu_page_hash[index];
+	hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link)
+		if (s == sp) {
+			if (s->gfn == gfn && s->role.word == role.word)
+				live = true;
+			break;
+		}
+
+	if (!live)
+		goto unlock_exit;
+
 	if (atomic_read(&vcpu->kvm->arch.invlpg_counter) == invlpg_counter) {
 		++vcpu->kvm->stat.mmu_pte_updated;
 		FNAME(update_pte)(vcpu, sp, sptep, gentry);
 	}
+unlock_exit:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	mmu_release_page_from_pte_write(vcpu);
 }
-- 
1.6.1.2
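The revalidation pattern this patch uses — drop the lock, then look the object up again before trusting the saved pointer — can be shown in miniature. The types below are hypothetical stand-ins for the mmu_page_hash lookup, not the real KVM structures:

```c
#include <assert.h>
#include <stddef.h>

/* Toy shadow page with an identity tag (gfn). */
struct page { int gfn; struct page *next; };

/* Toy hash bucket: a singly linked list of live pages. */
static struct page *bucket;

/* After re-taking the lock, confirm 'sp' is still on the live list
 * (and still has the same identity) before dereferencing it further;
 * another path may have freed or recycled it while the lock was
 * dropped. */
static int still_live(struct page *sp, int gfn)
{
    for (struct page *s = bucket; s; s = s->next)
        if (s == sp)
            return s->gfn == gfn; /* same page, not a recycled slot */
    return 0; /* freed while unlocked: caller must bail out */
}
```

The gfn/role comparison matters because the allocator may hand the same memory back for a different page; pointer equality alone is not enough.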
Re: [PATCH 1/2] KVM MMU: fix for forgot mark parent-unsync_children bit
On 05/05/2010 03:19 PM, Xiao Guangrong wrote: When mapping a new parent to unsync shadow page, we should mark parent's unsync_children bit Reported-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 97f2ea0..bf35a2f 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1374,7 +1374,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, if (sp->unsync_children) { set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests); kvm_mmu_mark_parents_unsync(sp); - } + } else if (sp->unsync) + kvm_mmu_mark_parents_unsync(sp); + trace_kvm_mmu_get_page(sp, false); return sp; } Which patch does this fix? If it wasn't merged yet, please repost with the fix included.
Re: [PATCH 2/2] KVM MMU: fix race in invlpg code
On 05/05/2010 03:21 PM, Xiao Guangrong wrote: It has race in invlpg code, like below sequences: A: hold mmu_lock and get 'sp' B: release mmu_lock and do other things C: hold mmu_lock and continue use 'sp' if other path freed 'sp' in stage B, then kernel will crash This patch checks 'sp' whether lived before use 'sp' in stage C Signed-off-by: Xiao Guangrongxiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/paging_tmpl.h | 22 -- 1 files changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 624b38f..13ea675 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -462,11 +462,16 @@ out_unlock: static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) { - struct kvm_mmu_page *sp = NULL; + struct kvm_mmu_page *sp = NULL, *s; struct kvm_shadow_walk_iterator iterator; + struct hlist_head *bucket; + struct hlist_node *node, *tmp; gfn_t gfn = -1; u64 *sptep = NULL, gentry; int invlpg_counter, level, offset = 0, need_flush = 0; + unsigned index; + bool live = false; + union kvm_mmu_page_role role; spin_lock(vcpu-kvm-mmu_lock); @@ -480,7 +485,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) if (!sp-unsync) break; - + role = sp-role; WARN_ON(level != PT_PAGE_TABLE_LEVEL); shift = PAGE_SHIFT - (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level; @@ -519,10 +524,23 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva) mmu_guess_page_from_pte_write(vcpu, gfn_to_gpa(gfn) + offset, gentry); spin_lock(vcpu-kvm-mmu_lock); + index = kvm_page_table_hashfn(gfn); + bucket =vcpu-kvm-arch.mmu_page_hash[index]; + hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link) + if (s == sp) { + if (s-gfn == gfn s-role.word == role.word) + live = true; + break; + } + + if (!live) + goto unlock_exit; + Did you try the root_count method? I think it's cleaner. 
Re: [PATCH 1/2] KVM MMU: fix for forgot mark parent-unsync_children bit
Avi Kivity wrote: On 05/05/2010 03:19 PM, Xiao Guangrong wrote: When mapping a new parent to unsync shadow page, we should mark parent's unsync_children bit Reported-by: Marcelo Tosatti mtosa...@redhat.com Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 97f2ea0..bf35a2f 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1374,7 +1374,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, if (sp->unsync_children) { set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests); kvm_mmu_mark_parents_unsync(sp); - } + } else if (sp->unsync) + kvm_mmu_mark_parents_unsync(sp); + trace_kvm_mmu_get_page(sp, false); return sp; } Which patch does this fix? If it wasn't merged yet, please repost with the fix included. Oh, OK, I'll resend the previous patchset that was reverted by Marcelo. Thanks, Xiao
Re: [PATCH 2/2] KVM MMU: fix race in invlpg code
Avi Kivity wrote: spin_lock(&vcpu->kvm->mmu_lock); + index = kvm_page_table_hashfn(gfn); + bucket = &vcpu->kvm->arch.mmu_page_hash[index]; + hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link) + if (s == sp) { + if (s->gfn == gfn && s->role.word == role.word) + live = true; + break; + } + + if (!live) + goto unlock_exit; + Did you try the root_count method? I think it's cleaner. Avi, Thanks for your idea. I have considered this method, but I'm not sure when is the right time to really free this page, and I think we also need a way to synchronize the real free path and this path. Do you have any comments on it :-( Xiao
Re: [PATCH 2/2] KVM MMU: fix race in invlpg code
On 05/05/2010 03:45 PM, Xiao Guangrong wrote: Avi Kivity wrote: spin_lock(&vcpu->kvm->mmu_lock); + index = kvm_page_table_hashfn(gfn); + bucket = &vcpu->kvm->arch.mmu_page_hash[index]; + hlist_for_each_entry_safe(s, node, tmp, bucket, hash_link) + if (s == sp) { + if (s->gfn == gfn && s->role.word == role.word) + live = true; + break; + } + + if (!live) + goto unlock_exit; + Did you try the root_count method? I think it's cleaner. Avi, Thanks for your idea. I have considered this method, but I'm not sure when is the right time to really free this page, and I think we also need a way to synchronize the real free path and this path. Do you have any comment for it :-( Same as mmu_free_roots(): --sp->root_count; if (!sp->root_count && sp->role.invalid) { kvm_mmu_zap_page(vcpu->kvm, sp); goto unlock_exit; }
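The root_count alternative Avi sketches pins the page with a reference instead of re-finding it: bump root_count before dropping the lock, and on re-acquire drop the reference, zapping the page yourself if you held the last reference to a page that was invalidated meanwhile (the mmu_free_roots() pattern quoted above). A reduced model of that lifetime rule, with hypothetical names standing in for the KVM structures:

```c
#include <assert.h>

struct shadow_page { int root_count; int invalid; int zapped; };

/* Called with the lock held, before releasing it: pin the page so a
 * concurrent zap marks it invalid instead of freeing it outright. */
static void pin(struct shadow_page *sp)
{
    sp->root_count++;
}

/* Called after re-taking the lock: drop the pin; if the page was
 * invalidated meanwhile and this was the last reference, free it now.
 * Returns 1 if the caller must not touch sp any further. */
static int unpin(struct shadow_page *sp)
{
    --sp->root_count;
    if (!sp->root_count && sp->invalid) {
        sp->zapped = 1; /* stand-in for kvm_mmu_zap_page() */
        return 1;
    }
    return 0;
}
```

Compared with the hash re-lookup in the patch, this shifts the bookkeeping from the re-acquire path to the zap path, which only has to mark pinned pages invalid rather than never freeing them.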
Re: [PATCH 4/4] KVM MMU: do not intercept invlpg if 'oos_shadow' is disabled
Avi Kivity wrote: On 04/30/2010 12:05 PM, Xiao Guangrong wrote: If 'oos_shadow' == 0, intercepting the invlpg command is really unnecessary. And it's good for us to compare the performance between enabled and disabled 'oos_shadow'. @@ -74,8 +74,9 @@ static int dbg = 0; module_param(dbg, bool, 0644); #endif -static int oos_shadow = 1; +int __read_mostly oos_shadow = 1; module_param(oos_shadow, bool, 0644); +EXPORT_SYMBOL_GPL(oos_shadow); Please rename to kvm_oos_shadow to reduce potential for conflict with other global names. But really, this is a debug option, I don't expect people to run with oos_shadow=0, so there's not much motivation to optimize it. Agreed, but the 'oos_shadow' option is documented in Documentation/kernel-parameters.txt; if it's just a debug option, I think we'd better not document it. Thanks, Xiao
[qemu-kvm tests PATCH v2] qemu-kvm tests: fix linker script problem
commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script itself (flat.lds) to become part of the linked objects, which messed up the output file; specifically, the symbol edata is not the last symbol anymore.

change v1 -> v2: Instead of dropping the dependency, put it on a separate line/rule, so the lds file will not be considered as one of the dependencies in the linking line/rule.

Signed-off-by: Naphtali Sprei nsp...@redhat.com
---
 kvm/user/config-x86-common.mak | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 61cc2f0..241c422 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -19,7 +19,8 @@ CFLAGS += -m$(bits)
 libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
 FLATLIBS = test/lib/libcflat.a $(libgcc)
 
-%.flat: %.o $(FLATLIBS) flat.lds
+%.flat: flat.lds
+%.flat: %.o $(FLATLIBS)
 	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
 
 tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
-- 
1.6.3.3
[PATCH 5/5] KVM: SVM: Don't allow nested guest to VMMCALL into host
This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/svm.c | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bc087c7..2e9b57a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2036,6 +2036,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 		svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR8_MASK;
 	}
 
+	/* We don't want to see VMMCALLs from a nested guest */
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_VMMCALL);
+
 	/*
 	 * We don't want a nested guest to be more powerful than the guest, so
 	 * all intercepts are ORed
-- 
1.7.1
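The fix relies on how nested intercepts are combined: as the patch's comment says, the host's intercept bits are ORed with the l1-hypervisor's, so clearing VMMCALL on the host side leaves the decision entirely to l1. A schematic of that merge; the bit number is made up for illustration and is not the real VMCB encoding:

```c
#include <assert.h>
#include <stdint.h>

#define INTERCEPT_VMMCALL_BIT 5 /* illustrative, not the real encoding */

/* Combine intercepts for a nested guest.  Host-side intercepts are
 * ORed with l1's, so a bit the host clears can still be set by l1,
 * but the host can never force it off for l1. */
static uint64_t merged_intercepts(uint64_t host, uint64_t l1)
{
    /* don't let an l2-guest VMMCALL land directly in the host */
    host &= ~(1ULL << INTERCEPT_VMMCALL_BIT);
    return host | l1;
}
```

If l1 does not intercept VMMCALL, the merged bitmap no longer traps it to the host either, closing the breakout path described in the commit message.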
[PATCH 0/5] Important fixes for KVM-AMD
Hi Avi, Marcelo, here is a set of patches which fix problems in kvm-amd. Patch 1 fixes a stupid problem with the event-reinjection introduced by me in my previous patchset. Patch 2 was a helper to find the bug patch 3 fixes. I kept it in the patchset because it may be helpful in the future to debug other problems too. Patch 3 is the most important fix because it makes kvm-amd on 32 bit hosts work again. Without this patch the first vmrun fails with exit-reason VMEXIT_INVALID. Patch 4 fixes the Xen 4.0 shipped with SLES11 in nested svm. The last patch in this series fixes a potential l2-guest breakout scenario because it may be possible for the l2-guest to issue hypercalls directly to the host if the l1-hypervisor does not intercept VMMCALL. Thanks, Joerg Diffstat: arch/x86/include/asm/msr-index.h | 2 + arch/x86/kvm/svm.c | 108 -- arch/x86/kvm/x86.c | 2 +- 3 files changed, 106 insertions(+), 6 deletions(-) Shortlog: Joerg Roedel (5): KVM: X86: Fix stupid bug in exception reinjection path KVM: SVM: Dump vmcb contents on failed vmrun KVM: SVM: Fix wrong intercept masks on 32 bit KVM: SVM: Allow EFER.LMSLE to be set with nested svm KVM: SVM: Don't allow nested guest to VMMCALL into host
[PATCH 1/5] KVM: X86: Fix stupid bug in exception reinjection path
The recently merged patch which allowed marking an exception as reinjected has a bug: it always marks the exception as reinjected. This breaks the nested-svm shadow-on-shadow implementation.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
---
 arch/x86/kvm/x86.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..c83528e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -277,7 +277,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		vcpu->arch.exception.has_error_code = has_error;
 		vcpu->arch.exception.nr = nr;
 		vcpu->arch.exception.error_code = error_code;
-		vcpu->arch.exception.reinject = true;
+		vcpu->arch.exception.reinject = reinject;
 		return;
 	}
--
1.7.1
[PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm
This patch enables setting of EFER bit 13 (LMSLE), which is allowed on all SVM capable processors. This is necessary for the SLES11 version of Xen 4.0 to boot with nested svm.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
---
 arch/x86/include/asm/msr-index.h |    2 ++
 arch/x86/kvm/svm.c               |    2 +-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..352767d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -20,6 +20,7 @@
 #define _EFER_LMA		10 /* Long mode active (read-only) */
 #define _EFER_NX		11 /* No execute enable */
 #define _EFER_SVME		12 /* Enable virtualization */
+#define _EFER_LMSLE		13 /* Long Mode Segment Limit Enable */
 #define _EFER_FFXSR		14 /* Enable Fast FXSAVE/FXRSTOR */
 
 #define EFER_SCE		(1<<_EFER_SCE)
@@ -27,6 +28,7 @@
 #define EFER_LMA		(1<<_EFER_LMA)
 #define EFER_NX			(1<<_EFER_NX)
 #define EFER_SVME		(1<<_EFER_SVME)
+#define EFER_LMSLE		(1<<_EFER_LMSLE)
 #define EFER_FFXSR		(1<<_EFER_FFXSR)
 
 /* Intel MSRs. Some also available on other CPUs */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 74f7b9d..bc087c7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)
 
 	if (nested) {
 		printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
-		kvm_enable_efer_bits(EFER_SVME);
+		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
 	}
 
 	for_each_possible_cpu(cpu) {
--
1.7.1
[PATCH 2/5] KVM: SVM: Dump vmcb contents on failed vmrun
This patch adds a function to dump the vmcb into the kernel log and calls it after a failed vmrun to ease debugging.

Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
---
 arch/x86/kvm/svm.c |   95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 889f660..0201b06 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2637,6 +2637,99 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_NPF]				= pf_interception,
 };
 
+void dump_vmcb(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb_control_area *control = &svm->vmcb->control;
+	struct vmcb_save_area *save = &svm->vmcb->save;
+
+	pr_err("VMCB Control Area:\n");
+	pr_err("cr_read:            %04x\n", control->intercept_cr_read);
+	pr_err("cr_write:           %04x\n", control->intercept_cr_write);
+	pr_err("dr_read:            %04x\n", control->intercept_dr_read);
+	pr_err("dr_write:           %04x\n", control->intercept_dr_write);
+	pr_err("exceptions:         %08x\n", control->intercept_exceptions);
+	pr_err("intercepts:         %016llx\n", control->intercept);
+	pr_err("pause filter count: %d\n", control->pause_filter_count);
+	pr_err("iopm_base_pa:       %016llx\n", control->iopm_base_pa);
+	pr_err("msrpm_base_pa:      %016llx\n", control->msrpm_base_pa);
+	pr_err("tsc_offset:         %016llx\n", control->tsc_offset);
+	pr_err("asid:               %d\n", control->asid);
+	pr_err("tlb_ctl:            %d\n", control->tlb_ctl);
+	pr_err("int_ctl:            %08x\n", control->int_ctl);
+	pr_err("int_vector:         %08x\n", control->int_vector);
+	pr_err("int_state:          %08x\n", control->int_state);
+	pr_err("exit_code:          %08x\n", control->exit_code);
+	pr_err("exit_info1:         %016llx\n", control->exit_info_1);
+	pr_err("exit_info2:         %016llx\n", control->exit_info_2);
+	pr_err("exit_int_info:      %08x\n", control->exit_int_info);
+	pr_err("exit_int_info_err:  %08x\n", control->exit_int_info_err);
+	pr_err("nested_ctl:         %lld\n", control->nested_ctl);
+	pr_err("nested_cr3:         %016llx\n", control->nested_cr3);
+	pr_err("event_inj:          %08x\n", control->event_inj);
+	pr_err("event_inj_err:      %08x\n", control->event_inj_err);
+	pr_err("lbr_ctl:            %lld\n", control->lbr_ctl);
+	pr_err("next_rip:           %016llx\n", control->next_rip);
+	pr_err("VMCB State Save Area:\n");
+	pr_err("es:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->es.selector, save->es.attrib,
+	       save->es.limit, save->es.base);
+	pr_err("cs:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->cs.selector, save->cs.attrib,
+	       save->cs.limit, save->cs.base);
+	pr_err("ss:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->ss.selector, save->ss.attrib,
+	       save->ss.limit, save->ss.base);
+	pr_err("ds:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->ds.selector, save->ds.attrib,
+	       save->ds.limit, save->ds.base);
+	pr_err("fs:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->fs.selector, save->fs.attrib,
+	       save->fs.limit, save->fs.base);
+	pr_err("gs:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->gs.selector, save->gs.attrib,
+	       save->gs.limit, save->gs.base);
+	pr_err("gdtr: s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->gdtr.selector, save->gdtr.attrib,
+	       save->gdtr.limit, save->gdtr.base);
+	pr_err("ldtr: s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->ldtr.selector, save->ldtr.attrib,
+	       save->ldtr.limit, save->ldtr.base);
+	pr_err("idtr: s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->idtr.selector, save->idtr.attrib,
+	       save->idtr.limit, save->idtr.base);
+	pr_err("tr:   s: %04x a: %04x l: %08x b: %016llx\n",
+	       save->tr.selector, save->tr.attrib,
+	       save->tr.limit, save->tr.base);
+	pr_err("cpl:            %d                efer:         %016llx\n",
+	       save->cpl, save->efer);
+	pr_err("cr0:            %016llx cr2:          %016llx\n",
+	       save->cr0, save->cr2);
+	pr_err("cr3:            %016llx cr4:          %016llx\n",
+	       save->cr3, save->cr4);
+	pr_err("dr6:            %016llx dr7:          %016llx\n",
+	       save->dr6, save->dr7);
+	pr_err("rip:            %016llx rflags:       %016llx\n",
+	       save->rip, save->rflags);
+	pr_err("rsp:            %016llx rax:          %016llx\n",
+	       save->rsp, save->rax);
+	pr_err("star:           %016llx lstar:        %016llx\n",
+	       save->star, save->lstar);
+	pr_err("cstar:          %016llx sfmask:       %016llx\n",
+	       save->cstar,
[PATCH 3/5] KVM: SVM: Fix wrong intercept masks on 32 bit
This patch makes KVM with SVM on 32 bit hosts work again by correcting the masks used for iret interception. With the wrong masks the upper 32 bits of the intercepts are masked out, which leaves vmrun unintercepted. This is not legal on svm and the vmrun fails. The bug was introduced by commits 95ba827313 and 3cfc3092.

Cc: Jan Kiszka <jan.kis...@siemens.com>
Cc: Gleb Natapov <g...@redhat.com>
Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel <joerg.roe...@amd.com>
---
 arch/x86/kvm/svm.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0201b06..74f7b9d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2290,7 +2290,7 @@ static int cpuid_interception(struct vcpu_svm *svm)
 static int iret_interception(struct vcpu_svm *svm)
 {
 	++svm->vcpu.stat.nmi_window_exits;
-	svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+	svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET);
 	svm->vcpu.arch.hflags |= HF_IRET_MASK;
 	return 1;
 }
@@ -2824,7 +2824,7 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
 
 	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
 	vcpu->arch.hflags |= HF_NMI_MASK;
-	svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+	svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET);
 	++vcpu->stat.nmi_injections;
 }
 
@@ -2891,10 +2891,10 @@ static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
 
 	if (masked) {
 		svm->vcpu.arch.hflags |= HF_NMI_MASK;
-		svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+		svm->vmcb->control.intercept |= (1ULL << INTERCEPT_IRET);
 	} else {
 		svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
-		svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+		svm->vmcb->control.intercept &= ~(1ULL << INTERCEPT_IRET);
 	}
 }
--
1.7.1
Re: [PATCH 4/4] KVM MMU: do not intercept invlpg if 'oos_shadow' is disabled
On 05/05/2010 03:54 PM, Xiao Guangrong wrote:
> Avi Kivity wrote:
>> On 04/30/2010 12:05 PM, Xiao Guangrong wrote:
>>> If 'oos_shadow' == 0, intercepting the invlpg command is really
>>> unnecessary. And it's good for us to compare the performance between
>>> enabled and disabled 'oos_shadow'.
>>>
>>> @@ -74,8 +74,9 @@ static int dbg = 0;
>>>  module_param(dbg, bool, 0644);
>>>  #endif
>>>
>>> -static int oos_shadow = 1;
>>> +int __read_mostly oos_shadow = 1;
>>>  module_param(oos_shadow, bool, 0644);
>>> +EXPORT_SYMBOL_GPL(oos_shadow);
>>
>> Please rename to kvm_oos_shadow to reduce the potential for conflict
>> with other global names. But really, this is a debug option, I don't
>> expect people to run with oos_shadow=0, so there's not much motivation
>> to optimize it.
>
> Agreed, but the 'oos_shadow' option is documented in
> Documentation/kernel-parameters.txt. If it's just a debug option, I
> think we'd better not document it.

It has to be documented, otherwise people complain :)  Anyway, the variable name and the option name don't have to be the same (I think).

--
error compiling committee.c: too many arguments to function
Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm
On 05/05/2010 05:04 PM, Joerg Roedel wrote:
> This patch enables setting of efer bit 13 which is allowed in all SVM
> capable processors. This is necessary for the SLES11 version of Xen 4.0
> to boot with nested svm.

Interesting, why does it require it?  Obviously it isn't needed, since it manages to run on Intel without it.

>  /* Intel MSRs. Some also available on other CPUs */
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 74f7b9d..bc087c7 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)
>
>  	if (nested) {
>  		printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
> -		kvm_enable_efer_bits(EFER_SVME);
> +		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
>  	}
>
>  	for_each_possible_cpu(cpu) {

What if the host doesn't have it?

Why enable it only for the nested case?  It's not svm specific (it's useful for running non-hvm Xen in non-nested mode).

Isn't there a cpuid bit for it?  If so, it should be exposed to userspace, and the feature should depend on it.
Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm
On Wed, May 05, 2010 at 05:46:59PM +0300, Avi Kivity wrote:
> On 05/05/2010 05:04 PM, Joerg Roedel wrote:
>> This patch enables setting of efer bit 13 which is allowed in all SVM
>> capable processors. This is necessary for the SLES11 version of Xen 4.0
>> to boot with nested svm.
>
> Interesting, why does it require it?

I don't know. I traced the Xen crash down and found that it gets a #GP because it tries to set this bit.

> Obviously it isn't needed since it manages to run on Intel without it.

I have heard unofficial statements that they set this bit to provide the functionality to their guests. And Xen sets this bit together with the SVM bit.

>>  /* Intel MSRs. Some also available on other CPUs */
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 74f7b9d..bc087c7 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -610,7 +610,7 @@ static __init int svm_hardware_setup(void)
>>
>>  	if (nested) {
>>  		printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
>> -		kvm_enable_efer_bits(EFER_SVME);
>> +		kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
>>  	}
>>
>>  	for_each_possible_cpu(cpu) {
>
> What if the host doesn't have it?

It is present in all SVM capable AMD processors.

> Why enable it only for the nested case?  It's not svm specific (it's
> useful for running non-hvm Xen in non-nested mode).

Because there is no cpuid bit for this feature. You can roughly check for it using the svm cpuid bit.

	Joerg
Re: [qemu-kvm tests PATCH v2] qemu-kvm tests: fix linker script problem
On 05/05/2010 04:53 PM, Naphtali Sprei wrote:
> commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker
> script itself (flat.lds) to become part of the linked objects, which
> messed up the output file; specifically, the symbol edata is not the
> last symbol anymore.
>
> changes v1 -> v2:
> Instead of dropping the dependency, put it on a separate line/rule, so
> the lds file will not be considered as one of the dependencies in the
> linking line/rule.
>
>  FLATLIBS = test/lib/libcflat.a $(libgcc)
> -%.flat: %.o $(FLATLIBS) flat.lds
> +%.flat: flat.lds
> +%.flat: %.o $(FLATLIBS)
>  	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)

I don't think that works - $^ selects all prerequisites, not just the ones in the line for the make rule:

  prereq-%:
  	touch $@

  dummy: prereq-1
  dummy: prereq-2
  	echo $^

  $ make dummy
  touch prereq-2
  touch prereq-1
  echo prereq-2 prereq-1
  prereq-2 prereq-1
Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm
On 05/05/2010 06:04 PM, Joerg Roedel wrote:
> Because there is no cpuid bit for this feature.

That is sad.

> You can roughly check for it using the svm cpuid bit.

Doesn't it kill cross-vendor migration?
Re: [PATCH 4/5] KVM: SVM: Allow EFER.LMSLE to be set with nested svm
On Wed, May 05, 2010 at 11:06:59AM -0400, Avi Kivity wrote:
> On 05/05/2010 06:04 PM, Joerg Roedel wrote:
>> You can roughly check for it using the svm cpuid bit.
>
> Doesn't it kill cross-vendor migration?

Enabling Nested SVM kills it anyway, so this is not an issue. AFAIK the feature is not present on Intel CPUs.

	Joerg
Re: [PATCH 2/2] turn off kvmclock when resetting cpu
On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:
>> +	msr_data.info.nmsrs = n;
>> +
>> +	return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
>> +}
>> +
>
> How about a different approach?  Query the supported MSRs
> (KVM_GET_MSR_LIST or thereabouts) and reset them (with special cases
> for the TSC, and the old clock MSRs when the new ones are present)?

I didn't go down that route because I was unsure that every one of them would be resettable by writing 0 to it. And if we are going to special-case most of them, then there is no point in doing it. If you think it is doable to special-case just the tsc, then I am fine.
Re: [PATCH 2/2] turn off kvmclock when resetting cpu
On 05/05/2010 06:24 PM, Glauber Costa wrote:
> On Wed, May 05, 2010 at 10:26:43AM +0300, Avi Kivity wrote:
>>> +	msr_data.info.nmsrs = n;
>>> +
>>> +	return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
>>> +}
>>> +
>>
>> How about a different approach?  Query the supported MSRs
>> (KVM_GET_MSR_LIST or thereabouts) and reset them (with special cases
>> for the TSC, and the old clock MSRs when the new ones are present)?
>
> I didn't go down that route because I was unsure that every one of them
> would be resettable by writing 0 to it.

There are probably others. We should reset them correctly anyway. It's probably done by generic qemu code, so it works.

> And if we are going to special-case most of them, then there is no
> point in doing it. If you think it is doable to special-case just the
> tsc, then I am fine.

I think if we have the following sequence:

  clear all msrs
  qemu reset
  kvm specific msr reset

Then we'd be fine. Oh, and tsc needs to be reset to 0 as well - it isn't a special case.
Re: vCPU scalability for linux VMs
On Wednesday, May 5, 2010 at 04:43:32 -0400, Avi Kivity wrote:
> So you have a total of 32 vcpus on 8 cores?  This is known to be
> problematic. You may see some improvement by enabling hyperthreading.

Exactly, 32 vCPUs on 8 core hardware that doesn't support hyperthreading (Clovertown E5335).

> There is ongoing work to improve this.

If interested - let me know when there will be a good time/build to redo the regression test on this.
[qemu-kvm tests PATCH v3] qemu-kvm tests: fix linker script problem
commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script itself (flat.lds) to become part of the linked objects, which messed up the output file; specifically, the symbol edata is not the last symbol anymore.

changes v2 -> v3:
Instead of using a separate rule, which doesn't really add the flat.lds to the prerequisite list, use Avi's suggestion of filtering.

Signed-off-by: Naphtali Sprei <nsp...@redhat.com>
---
 kvm/user/config-x86-common.mak |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kvm/user/config-x86-common.mak b/kvm/user/config-x86-common.mak
index 61cc2f0..cf36aa1 100644
--- a/kvm/user/config-x86-common.mak
+++ b/kvm/user/config-x86-common.mak
@@ -20,7 +20,7 @@ libgcc := $(shell $(CC) -m$(bits) --print-libgcc-file-name)
 FLATLIBS = test/lib/libcflat.a $(libgcc)
 
 %.flat: %.o $(FLATLIBS) flat.lds
-	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
+	$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $(filter %.o, $^) $(FLATLIBS)
 
 tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \
                $(TEST_DIR)/smptest.flat $(TEST_DIR)/port80.flat \
--
1.6.3.3
Re: [PATCH 2/2] turn off kvmclock when resetting cpu
On Wed, May 05, 2010 at 06:34:22PM +0300, Avi Kivity wrote:
> I think if we have the following sequence:
>
>   clear all msrs
>   qemu reset
>   kvm specific msr reset
>
> Then we'd be fine. Oh, and tsc needs to be reset to 0 as well - it
> isn't a special case.

This means a guest running on a perfectly synchronized tsc host will not see a synced tsc, simply because we can't possibly reset all tscs at the same time.
Re: [patch uq/master 0/9] enable smp > 1 and related fixes
On 05/04/2010 07:45 AM, Marcelo Tosatti wrote:

How does this work without an in-kernel apic (or does uq/master already have an in-kernel apic)?

Regards,

Anthony Liguori
Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
On Tue, May 04, 2010 at 05:58:52PM -0700, Chris Wright wrote:
> Date: Tue, 4 May 2010 17:58:52 -0700
> From: Chris Wright <chr...@sous-sol.org>
> To: Pankaj Thakkar <pthak...@vmware.com>
> CC: "linux-ker...@vger.kernel.org" <linux-ker...@vger.kernel.org>,
>     "net...@vger.kernel.org" <net...@vger.kernel.org>,
>     "virtualizat...@lists.linux-foundation.org" <virtualizat...@lists.linux-foundation.org>,
>     "pv-driv...@vmware.com" <pv-driv...@vmware.com>,
>     Shreyas Bhatewara <sbhatew...@vmware.com>,
>     "kvm@vger.kernel.org" <kvm@vger.kernel.org>
> Subject: Re: RFC: Network Plugin Architecture (NPA) for vmxnet3
>
> * Pankaj Thakkar (pthak...@vmware.com) wrote:
>> We intend to upgrade the upstreamed vmxnet3 driver to implement NPA so
>> that Linux users can exploit the benefits provided by passthrough
>> devices in a seamless manner while retaining the benefits of
>> virtualization. The document below tries to answer most of the
>> questions which we anticipated. Please let us know your comments and
>> queries.
>
> How does the throughput, latency, and host CPU utilization for normal
> data path compare with say NetQueue?

NetQueue is really for scaling across multiple VMs. NPA allows similar scaling and also helps in improving the CPU efficiency for a single VM since the hypervisor is bypassed. Throughput wise both emulation and passthrough (NPA) can obtain line rates on 10gig, but passthrough saves up to 40% cpu depending on the workload. We did a demo at IDF 2009 where we compared 8 VMs running on NetQueue v/s 8 VMs running on NPA (using Niantic) and we obtained similar CPU efficiency gains.

> And does this obsolete your UPT implementation?

NPA and UPT share a lot of code in the hypervisor. UPT was adopted only by very limited IHVs and hence NPA is our way forward to have all IHVs onboard.

> How many cards actually support this NPA interface?  What does it look
> like, i.e. where is the NPA specification?  (AFAIK, we never got the
> UPT one).

We have it working internally with Intel Niantic (10G) and Kawela (1G) SR-IOV NICs.
We are also working with an upcoming Broadcom 10G card and plan to support other IHVs. This is unlike UPT, so we don't dictate the register sets or rings like we did in UPT. Rather, we have guidelines, e.g. that the card should have an embedded switch for inter-VF switching, or should support programming (rx filters, VLAN, etc.) through the PF driver rather than the VF driver.

> How do you handle hardware which has a more symmetric view of the
> SR-IOV world (SR-IOV is only a PCI specification, not a network driver
> specification)?  Or hardware which has multiple functions per physical
> port (multiqueue, hw filtering, embedded switch, etc.)?

I am not sure what you mean by a symmetric view of the SR-IOV world. NPA allows multi-queue VFs and currently requires an embedded switch. As far as the PF driver is concerned we require IHVs to support all existing and upcoming features like NetQueue, FCoE, etc. The PF driver is considered special and is used to drive the traffic for the emulated/paravirtualized VMs, and is also used to program things on behalf of the VFs through the hypervisor. If the hardware has multiple physical functions they are treated as separate adapters (with their own set of VFs) and we require the embedded switch to maintain that distinction as well.

>> NPA offers several benefits:
>> 1. Performance: Critical performance sensitive paths are not trapped
>>    and the guest can directly drive the hardware without incurring
>>    virtualization overheads.
>
> Can you demonstrate with data?

The setup is a 2.667GHz Nehalem server running a SLES11 VM talking to a 2.33GHz Barcelona client box running RHEL 5.1. We had netperf streams with 16k msg size over 64k socket size running between the server VM and the client, using Intel Niantic 10G cards. In both cases (NPA and regular) the VM was CPU saturated (used one full core).
TX: regular vmxnet3 = 3085.5 Mbps/GHz; NPA vmxnet3 = 4397.2 Mbps/GHz
RX: regular vmxnet3 = 1379.6 Mbps/GHz; NPA vmxnet3 = 2349.7 Mbps/GHz

We have similar results for other configurations, and in general we have seen that NPA is better in terms of CPU cost and can save up to 40% of CPU cost.

>> 2. Hypervisor control: All control operations from the guest such as
>>    programming MAC address go through the hypervisor layer and hence
>>    can be subjected to hypervisor policies. The PF driver can be
>>    further used to put policy decisions like which VLAN the guest
>>    should be on.
>
> This can happen without NPA as well. The VF simply needs to request
> the change via the PF (in fact, hw does that right now). Also, we
> already have a host side management interface via the PF (see, for
> example, the RTM_SETLINK IFLA_VF_MAC interface).
>
> What is the control plane interface?  Just something like a fixed
> register set?

All operations other than TX/RX go through the vmxnet3 shell to the vmxnet3 device emulation. So the control plane is really the vmxnet3 device emulation as far as the guest is concerned.

>> 3. Guest Management: No