Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 06.05.14 02:41, Paul Mackerras wrote: On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; How about PA6T and G5s? G5 sets DAR on an alignment interrupt. As for PA6T, I don't know for sure, but if it doesn't, ordinary alignment interrupts wouldn't be handled properly, since the code in arch/powerpc/kernel/align.c assumes DAR contains the address being accessed on all PowerPC CPUs. Now that's a good point. If we simply behave like Linux, I'm fine. This definitely deserves a comment on the #ifdef in the code. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On 06.05.14 06:26, Gavin Shan wrote: On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote: On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote: On 05/05/2014 03:27 AM, Gavin Shan wrote: The series of patches intends to support EEH for PCI devices which have been passed through to a PowerKVM based guest via VFIO. The implementation is straightforward, based on the issues or problems we have to resolve to support EEH for PowerKVM based guests. - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure to emulate XICS. Without introducing a new mechanism, we just extend that existing infrastructure to support EEH RTAS emulation. EEH RTAS requests initiated from the guest are posted to the host, where the requests get handled or delivered to the underlying firmware for further handling. For that, the host kernel has to maintain the PCI address (host domain/bus/slot/function to guest's PHB BUID/bus/slot/function) mapping via the KVM VFIO device. The address mapping will be built when initializing the VFIO device in QEMU and destroyed when the VFIO device in QEMU goes offline, or the VM is destroyed. Do you also expose all those interfaces to user space? VFIO is as much about user space device drivers as it is about device assignment. Yep, all the interfaces are exported to user space. I would like to first see an implementation that doesn't touch KVM emulation code at all but instead routes everything through QEMU. As a second step we can then accelerate performance critical paths inside of KVM. Ok. I'll change the implementation. However, QEMU still has to poll/push information from/to the host kernel. So the best place for that would be tce_iommu_driver_ops::ioctl, as EEH is a Power specific feature. For the error injection, I guess I have to put the logic token management into QEMU, and the error injection request will be handled by QEMU and then routed to the host kernel via an additional syscall, as we did for pSeries.
Yes, start off without in-kernel XICS so everything simply lives in QEMU. Then add callbacks into the in-kernel XICS to inject these interrupts if we don't have wide enough interfaces already. Alex
[PATCH v2 1/4] KVM: nVMX: rearrange get_vmx_mem_address
Our common function for vmptr checks (in 2/4) needs to fetch the memory address Signed-off-by: Bandan Das --- arch/x86/kvm/vmx.c | 106 ++--- 1 file changed, 53 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f68c58..c18fe9a4 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5775,6 +5775,59 @@ static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer) } /* + * Decode the memory-address operand of a vmx instruction, as recorded on an + * exit caused by such an instruction (run by a guest hypervisor). + * On success, returns 0. When the operand is invalid, returns 1 and throws + * #UD or #GP. + */ +static int get_vmx_mem_address(struct kvm_vcpu *vcpu, +unsigned long exit_qualification, +u32 vmx_instruction_info, gva_t *ret) +{ + /* +* According to Vol. 3B, "Information for VM Exits Due to Instruction +* Execution", on an exit, vmx_instruction_info holds most of the +* addressing components of the operand. Only the displacement part +* is put in exit_qualification (see 3B, "Basic VM-Exit Information"). +* For how an actual address is calculated from all these components, +* refer to Vol. 1, "Operand Addressing". 
+*/ + int scaling = vmx_instruction_info & 3; + int addr_size = (vmx_instruction_info >> 7) & 7; + bool is_reg = vmx_instruction_info & (1u << 10); + int seg_reg = (vmx_instruction_info >> 15) & 7; + int index_reg = (vmx_instruction_info >> 18) & 0xf; + bool index_is_valid = !(vmx_instruction_info & (1u << 22)); + int base_reg = (vmx_instruction_info >> 23) & 0xf; + bool base_is_valid = !(vmx_instruction_info & (1u << 27)); + + if (is_reg) { + kvm_queue_exception(vcpu, UD_VECTOR); + return 1; + } + + /* Addr = segment_base + offset */ + /* offset = base + [index * scale] + displacement */ + *ret = vmx_get_segment_base(vcpu, seg_reg); + if (base_is_valid) + *ret += kvm_register_read(vcpu, base_reg); + if (index_is_valid) + *ret += kvm_register_read(vcpu, index_reg) << scaling; [...] - int scaling = vmx_instruction_info & 3; - int addr_size = (vmx_instruction_info >> 7) & 7; - bool is_reg = vmx_instruction_info & (1u << 10); - int seg_reg = (vmx_instruction_info >> 15) & 7; - int index_reg = (vmx_instruction_info >> 18) & 0xf; - bool index_is_valid = !(vmx_instruction_info & (1u << 22)); - int base_reg = (vmx_instruction_info >> 23) & 0xf; - bool base_is_valid = !(vmx_instruction_info & (1u << 27)); - - if (is_reg) { - kvm_queue_exception(vcpu, UD_VECTOR); - return 1; - } - - /* Addr = segment_base + offset */ - /* offset = base + [index * scale] + displacement */ - *ret = vmx_get_segment_base(vcpu, seg_reg); - if (base_is_valid) - *ret += kvm_register_read(vcpu, base_reg); - if (index_is_valid) - *ret += kvm_register_read(vcpu, index_reg)
[PATCH v2 3/4] KVM: nVMX: fail on invalid vmclear/vmptrld pointer
The spec mandates that if the vmptrld or vmclear address is equal to the vmxon region pointer, the instruction should fail with error "VMPTRLD with VMXON pointer" or "VMCLEAR with VMXON pointer". Signed-off-by: Bandan Das --- arch/x86/kvm/vmx.c | 12 1 file changed, 12 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 059906a..6c125ff 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6083,6 +6083,12 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) return 1; } + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + if (vmptr == vmx->nested.current_vmptr) { nested_release_vmcs12(vmx); vmx->nested.current_vmptr = -1ull; @@ -6426,6 +6432,12 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu) return 1; } + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, VMXERR_VMPTRLD_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + if (vmx->nested.current_vmptr != vmptr) { struct vmcs12 *new_vmcs12; struct page *page; -- 1.8.3.1
[PATCH v2 0/4] Emulate VMXON region correctly
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=54521 The vmxon region is unused by nvmx, but adding these checks is probably harmless and may detect buggy L1 hypervisors in the future! v2: 1/4 - Commit message change to reflect addition of new function 2/4 - Use cpuid_maxphyaddr() - Fix a leak with kunmap() - Remove unnecessary braces around comparisons - Move all checks into a common function, this will be later used by handle_vmptrld and handle_vmclear in 4/4 4/4 - New patch - use common function to perform checks on vmptr Bandan Das (4): KVM: nVMX: rearrange get_vmx_mem_address KVM: nVMX: additional checks on vmxon region KVM: nVMX: fail on invalid vmclear/vmptrld pointer KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr arch/x86/kvm/cpuid.c | 1 + arch/x86/kvm/vmx.c | 240 +-- 2 files changed, 156 insertions(+), 85 deletions(-) -- 1.8.3.1
[PATCH v2 4/4] KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr
Some checks are common to all, and moreover, according to the spec, the check for whether any bits beyond the physical address width are set is also applicable to all of them. Signed-off-by: Bandan Das --- arch/x86/kvm/vmx.c | 83 -- 1 file changed, 37 insertions(+), 46 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6c125ff..9b36057 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5833,8 +5833,10 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu, * - if it's 4KB aligned * - No bits beyond the physical address width are set * - Returns 0 on success or else 1 + * (Intel SDM Section 30.3) */ -static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason) +static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason, + gpa_t *vmpointer) { gva_t gva; gpa_t vmptr; @@ -5882,11 +5884,42 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason) kunmap(page); vmx->nested.vmxon_ptr = vmptr; break; + case EXIT_REASON_VMCLEAR: + if (!IS_ALIGNED(vmptr, PAGE_SIZE) || (vmptr >> maxphyaddr)) { + nested_vmx_failValid(vcpu, +VMXERR_VMCLEAR_INVALID_ADDRESS); + skip_emulated_instruction(vcpu); + return 1; + } + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, +VMXERR_VMCLEAR_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + break; + case EXIT_REASON_VMPTRLD: + if (!IS_ALIGNED(vmptr, PAGE_SIZE) || (vmptr >> maxphyaddr)) { + nested_vmx_failValid(vcpu, +VMXERR_VMPTRLD_INVALID_ADDRESS); + skip_emulated_instruction(vcpu); + return 1; + } + + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, +VMXERR_VMPTRLD_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + break; default: return 1; /* shouldn't happen */ } + if (vmpointer) + *vmpointer = vmptr; return 0; } @@ -5929,7 +5962,7 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; } - if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMON)) + if (nested_vmx_check_vmptr(vcpu, 
EXIT_REASON_VMON, NULL)) return 1; if (vmx->nested.vmxon) { @@ -6058,37 +6091,16 @@ static int handle_vmoff(struct kvm_vcpu *vcpu) static int handle_vmclear(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - gva_t gva; gpa_t vmptr; struct vmcs12 *vmcs12; struct page *page; - struct x86_exception e; if (!nested_vmx_check_permission(vcpu)) return 1; - if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), - vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) + if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMCLEAR, &vmptr)) return 1; - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, - sizeof(vmptr), &e)) { - kvm_inject_page_fault(vcpu, &e); - return 1; - } - - if (!IS_ALIGNED(vmptr, PAGE_SIZE)) { - nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_INVALID_ADDRESS); - skip_emulated_instruction(vcpu); - return 1; - } - - if (vmptr == vmx->nested.vmxon_ptr) { - nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_VMXON_POINTER); - skip_emulated_instruction(vcpu); - return 1; - } - if (vmptr == vmx->nested.current_vmptr) { nested_release_vmcs12(vmx); vmx->nested.current_vmptr = -1ull; @@ -6408,35 +6420,14 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu) static int handle_vmptrld(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - gva_t gva; gpa_t vmptr; - struct x86_exception e; u32 exec_control; if (!nested_vmx_check_permission(vcpu)) return 1; - if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), - vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) - return 1; - - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, - sizeof(vmptr), &e)) { - kvm_inject_page_fault(vcpu, &e);
[PATCH v2 2/4] KVM: nVMX: additional checks on vmxon region
Currently, the vmxon region isn't used in the nested case. However, according to the spec, the vmxon instruction performs additional sanity checks on this region and the associated pointer. Modify emulated vmxon to better adhere to the spec requirements Signed-off-by: Bandan Das --- arch/x86/kvm/cpuid.c | 1 + arch/x86/kvm/vmx.c | 67 2 files changed, 68 insertions(+) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index f47a104..da9894b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -726,6 +726,7 @@ int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) not_found: return 36; } +EXPORT_SYMBOL_GPL(cpuid_maxphyaddr); /* * If no match is found, check whether we exceed the vCPU's limit diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c18fe9a4..059906a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -354,6 +354,7 @@ struct vmcs02_list { struct nested_vmx { /* Has the level1 guest done vmxon? */ bool vmxon; + gpa_t vmxon_ptr; /* The guest-physical address of the current VMCS L1 keeps for L2 */ gpa_t current_vmptr; @@ -5828,6 +5829,68 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu, } /* + * This function performs the various checks including + * - if it's 4KB aligned + * - No bits beyond the physical address width are set + * - Returns 0 on success or else 1 + */ +static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason) +{ + gva_t gva; + gpa_t vmptr; + struct x86_exception e; + struct page *page; + struct vcpu_vmx *vmx = to_vmx(vcpu); + int maxphyaddr = cpuid_maxphyaddr(vcpu); + + if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), + vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) + return 1; + + if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, + sizeof(vmptr), &e)) { + kvm_inject_page_fault(vcpu, &e); + return 1; + } + + switch (exit_reason) { + case EXIT_REASON_VMON: + /* +* SDM 3: 24.11.5 +* The first 4 bytes of VMXON region contain the supported +* VMCS revision identifier +* +* Note - 
IA32_VMX_BASIC[48] will never be 1 +* for the nested case; +* which replaces physical address width with 32 +* +*/ + if (!IS_ALIGNED(vmptr, PAGE_SIZE) || (vmptr >> maxphyaddr)) { + nested_vmx_failInvalid(vcpu); + skip_emulated_instruction(vcpu); + return 1; + } + + page = nested_get_page(vcpu, vmptr); + if (page == NULL) { + nested_vmx_failInvalid(vcpu); + skip_emulated_instruction(vcpu); + return 1; + } + if (*(u32 *)kmap(page) != VMCS12_REVISION) { + nested_vmx_failInvalid(vcpu); + kunmap(page); + skip_emulated_instruction(vcpu); + return 1; + } + kunmap(page); + vmx->nested.vmxon_ptr = vmptr; + break; + + default: + return 1; /* shouldn't happen */ + } + + return 0; +} + +/* * Emulate the VMXON instruction. * Currently, we just remember that VMX is active, and do not save or even * inspect the argument to VMXON (the so-called "VMXON pointer") because we @@ -5865,6 +5928,10 @@ static int handle_vmon(struct kvm_vcpu *vcpu) kvm_inject_gp(vcpu, 0); return 1; } + + if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMON)) + return 1; + if (vmx->nested.vmxon) { nested_vmx_failValid(vcpu, VMXERR_VMXON_IN_VMX_ROOT_OPERATION); skip_emulated_instruction(vcpu); -- 1.8.3.1
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote: >On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote: >> On 05/05/2014 03:27 AM, Gavin Shan wrote: >> > The series of patches intends to support EEH for PCI devices, which have >> > been >> > passed through to PowerKVM based guest via VFIO. The implementation is >> > straightforward based on the issues or problems we have to resolve to >> > support >> > EEH for PowerKVM based guest. >> > >> > - Emulation for EEH RTAS requests. Thankfully, we already have >> > infrastructure >> >to emulate XICS. Without introducing new mechanism, we just extend that >> >existing infrastructure to support EEH RTAS emulation. EEH RTAS requests >> >initiated from guest are posted to host where the requests get handled >> > or >> >delivered to underlying firmware for further handling. For that, the host >> > kernel >> >has to maintain the PCI address (host domain/bus/slot/function to >> > guest's >> >PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address >> > mapping >> >will be built when initializing VFIO device in QEMU and destroyed when >> > the >> >VFIO device in QEMU is going to offline, or VM is destroyed. >> >> Do you also expose all those interfaces to user space? VFIO is as much >> about user space device drivers as it is about device assignment. >> Yep, all the interfaces are exported to user space. >> I would like to first see an implementation that doesn't touch KVM >> emulation code at all but instead routes everything through QEMU. As a >> second step we can then accelerate performance critical paths inside of KVM. >> Ok. I'll change the implementation. However, QEMU still has to poll/push information from/to the host kernel. So the best place for that would be tce_iommu_driver_ops::ioctl as EEH is Power specific feature. 
For the error injection, I guess I have to put the logic token management into QEMU and error injection request will be handled by QEMU and then routed to host kernel via additional syscall as we did for pSeries. >> That way we ensure that user space device drivers have all the power >> over a device they need to drive it. > >+1 > Thanks, Gavin
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
On Mon, May 05, 2014 at 08:17:00PM +0530, Aneesh Kumar K.V wrote: > Alexander Graf writes: > > > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote: > >> Signed-off-by: Aneesh Kumar K.V > > > > No patch description, no proper explanations anywhere why you're doing > > what. All of that in a pretty sensitive piece of code. There's no way > > this patch can go upstream in its current form. > > > > Sorry about being vague. Will add a better commit message. The goal is > to export MPSS support to guest if the host support the same. MPSS > support is exported via penc encoding in "ibm,segment-page-sizes". The > actual format can be found at htab_dt_scan_page_sizes. When the guest > memory is backed by hugetlbfs we expose the penc encoding the host > support to guest via kvmppc_add_seg_page_size. In a case like this it's good to assume the reader doesn't know very much about Power CPUs, and probably isn't familiar with acronyms such as MPSS. The patch needs an introductory paragraph explaining that on recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS). With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. Note that the actual page size is always >= base page size. > Now the challenge to THP support is to make sure that our henter, > hremove etc decode base page size and actual page size correctly > from the hash table entry values. Most of the changes is to do that. > Rest of the stuff is already handled by kvm. 
> > NOTE: It is much easier to read the code after applying the patch rather than reading the diff. I have added comments around each step in the code. Paul.
Re: [PATCH v4 5/5] change update_range to handle > 4GB 2nd stage range for ARMv7
Hi Gavin, thanks, didn't catch that, I'll remove these calls. - Mario On 05/05/2014 04:34 PM, Gavin Guo wrote: > Hi Mario, > > On Tue, Apr 29, 2014 at 9:06 AM, Mario Smarduch > wrote: >> >> This patch adds support for unmapping 2nd stage page tables for addresses >> >4GB >> on ARMv7. >> >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/kvm/mmu.c | 20 >> 1 file changed, 12 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 88f5503..afbf8ba 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -176,21 +176,25 @@ static void clear_pte_entry(struct kvm *kvm, pte_t >> *pte, phys_addr_t addr) >> } >> } >> >> +/* Function shared between identity and 2nd stage mappings. For 2nd stage >> + * the IPA may be > 4GB on ARMv7, and page table range functions >> + * will fail. kvm_xxx_addr_end() is used to handle both cases. >> + */ >> static void unmap_range(struct kvm *kvm, pgd_t *pgdp, >> - unsigned long long start, u64 size) >> + phys_addr_t start, u64 size) >> { >> pgd_t *pgd; >> pud_t *pud; >> pmd_t *pmd; >> pte_t *pte; >> - unsigned long long addr = start, end = start + size; >> - u64 next; >> + phys_addr_t addr = start, end = start + size; >> + phys_addr_t next; >> >> while (addr < end) { >> pgd = pgdp + pgd_index(addr); >> pud = pud_offset(pgd, addr); >> if (pud_none(*pud)) { >> - addr = pud_addr_end(addr, end); >> + addr = kvm_pud_addr_end(addr, end); >> continue; >> } >> >> @@ -200,13 +204,13 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, >> * move on. 
>> */ >> clear_pud_entry(kvm, pud, addr); >> - addr = pud_addr_end(addr, end); >> + addr = kvm_pud_addr_end(addr, end); >> continue; >> } >> >> pmd = pmd_offset(pud, addr); >> if (pmd_none(*pmd)) { >> - addr = pmd_addr_end(addr, end); >> + addr = kvm_pmd_addr_end(addr, end); >> continue; >> } >> >> @@ -221,10 +225,10 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, >> */ >> if (kvm_pmd_huge(*pmd) || page_empty(pte)) { >> clear_pmd_entry(kvm, pmd, addr); >> - next = pmd_addr_end(addr, end); >> + next = kvm_pmd_addr_end(addr, end); >> if (page_empty(pmd) && !page_empty(pud)) { >> clear_pud_entry(kvm, pud, addr); >> - next = pud_addr_end(addr, end); >> + next = kvm_pud_addr_end(addr, end); >> } >> } >> >> -- >> 1.7.9.5 >> >> >> > > It seems that the kvm_pmd_addr_end(addr, end) you are adding already exists > in the following patch, and you may need to remove these parts from your > patch. > > commit a3c8bd31af260a17d626514f636849ee1cd1f63e > Author: Marc Zyngier > Date: Tue Feb 18 14:29:03 2014 + > > ARM: KVM: introduce kvm_p*d_addr_end > > The use of p*d_addr_end with stage-2 translation is slightly dodgy, > as the IPA is 40bits, while all the p*d_addr_end helpers are > taking an unsigned long (arm64 is fine with that as unsigned long > is 64bit). > > The fix is to introduce 64bit clean versions of the same helpers, > and use them in the stage-2 page table code. > > Signed-off-by: Marc Zyngier > Acked-by: Catalin Marinas > Reviewed-by: Christoffer Dall > > Gavin >
Re: KVM exit on UD interception
Thank you Jun! Now I understand that there is a strong need to support this scenario, where the host might run into trouble executing binaries with instructions unknown to it. I am still wondering if there is a way to actually exit KVM on UD from a syscall instruction without modifying the KVM kernel module? Best regards, Alex On Mon, May 5, 2014 at 7:07 PM, Nakajima, Jun wrote: > On Mon, May 5, 2014 at 11:48 AM, Alexandru Duţu wrote: >> Thank you Jun! I see that, in the case of VMX, KVM does not emulate the >> instruction that produced a UD exception; it just queues the exception >> and returns 1. After that KVM will still try to enter virtualized >> execution and so forth, the execution probably finishing with a #DF and >> a shutdown. It does not seem that KVM, in the case of VMX, will exit >> immediately on UD. >> >> I am not sure what you meant with MOVBE emulation. > > I meant: > > commit 84cffe499b9418d6c3b4de2ad9599cc2ec50c607 > Author: Borislav Petkov > Date: Tue Oct 29 12:54:56 2013 +0100 > > kvm: Emulate MOVBE > > This basically came from the need to be able to boot 32-bit Atom SMP > guests on an AMD host, i.e. a host which doesn't support MOVBE. As a > matter of fact, qemu has since recently received MOVBE support but we > cannot share that with kvm emulation and thus we have to do this in the > host. We're waay faster in kvm anyway. :-) > > So, we piggyback on the #UD path and emulate the MOVBE functionality. > With it, an 8-core SMP guest boots in under 6 seconds. > > Also, requesting MOVBE emulation needs to happen explicitly to work, > i.e. qemu -cpu n270,+movbe... > > Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+ > kernel in kvm executes MOVBE ~60K times. 
> > Signed-off-by: Andre Przywara > Signed-off-by: Borislav Petkov > Signed-off-by: Paolo Bonzini > > > -- > Jun > Intel Open Source Technology Center -- Alex
[RFC PATCH 0/3] Emulator Speedups - Optimize Instruction fetches
My initial attempt at caching gva->gpa->hva translations. Pretty straightforward, with details in the individual patches. I haven't yet looked into whether there are other possibilities to speed things up, just thought of sending these out since the numbers are better:

567 cycles/emulated jump instruction
718 cycles/emulated move instruction
730 cycles/emulated arithmetic instruction
946 cycles/emulated memory load instruction
956 cycles/emulated memory store instruction
921 cycles/emulated memory RMW instruction

Old realmode.flat numbers from init ctxt changes - https://lkml.org/lkml/2014/4/16/848

639 cycles/emulated jump instruction (4.3%)
776 cycles/emulated move instruction (7.5%)
791 cycles/emulated arithmetic instruction (11%)
943 cycles/emulated memory load instruction (5.2%)
948 cycles/emulated memory store instruction (7.6%)
929 cycles/emulated memory RMW instruction (9.0%)

Bandan Das (3): KVM: x86: pass ctxt to fetch helper function KVM: x86: use memory_prepare in fetch helper function KVM: x86: cache userspace address for faster fetches arch/x86/include/asm/kvm_emulate.h | 7 +- arch/x86/kvm/x86.c | 46 -- 2 files changed, 40 insertions(+), 13 deletions(-) -- 1.8.3.1
[RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches
On every instruction fetch, kvm_read_guest_virt_helper does the gva to gpa translation followed by searching for the memslot. Store the gva hva mapping so that if there's a match we can directly call __copy_from_user() Suggested-by: Paolo Bonzini Signed-off-by: Bandan Das --- arch/x86/include/asm/kvm_emulate.h | 7 ++- arch/x86/kvm/x86.c | 33 +++-- 2 files changed, 29 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 085d688..20ccde4 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -323,10 +323,11 @@ struct x86_emulate_ctxt { int (*execute)(struct x86_emulate_ctxt *ctxt); int (*check_perm)(struct x86_emulate_ctxt *ctxt); /* -* The following five fields are cleared together, +* The following six fields are cleared together, * the rest are initialized unconditionally in x86_decode_insn * or elsewhere */ + bool addr_cache_valid; u8 rex_prefix; u8 lock_prefix; u8 rep_prefix; @@ -348,6 +349,10 @@ struct x86_emulate_ctxt { struct fetch_cache fetch; struct read_cache io_read; struct read_cache mem_read; + struct { + gfn_t gfn; + unsigned long uaddr; + } addr_cache; }; /* Repeat String Operation Prefix */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cf69e3b..7afcfc7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); int ret; unsigned long uaddr; + gfn_t gfn = addr >> PAGE_SHIFT; - ret = ctxt->ops->memory_prepare(ctxt, addr, toread, - exception, false, - NULL, &uaddr); - if (ret != X86EMUL_CONTINUE) - return ret; + if (ctxt->addr_cache_valid && + (ctxt->addr_cache.gfn == gfn)) + uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) + + offset_in_page(addr); + else { + ret = ctxt->ops->memory_prepare(ctxt, addr, toread, + exception, false, + NULL, &uaddr); + if (ret != X86EMUL_CONTINUE) + 
return ret; + + if (unlikely(kvm_is_error_hva(uaddr))) { + r = X86EMUL_PROPAGATE_FAULT; + return r; + } - if (unlikely(kvm_is_error_hva(uaddr))) { - r = X86EMUL_PROPAGATE_FAULT; - return r + /* Cache gfn and hva */ + ctxt->addr_cache.gfn = addr >> PAGE_SHIFT; + ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT; + ctxt->addr_cache_valid = true; } ret = __copy_from_user(data, (void __user *)uaddr, toread); if (ret < 0) { r = X86EMUL_IO_NEEDED; + /* Where else should we invalidate cache ? */ + ctxt->ops->memory_finish(ctxt, NULL, uaddr); return r; } - ctxt->ops->memory_finish(ctxt, NULL, uaddr); - bytes -= toread; data += toread; addr += toread; @@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt, struct kvm_memory_slot *memslot; gfn_t gfn; + ctxt->addr_cache_valid = false; if (!opaque) return; -- 1.8.3.1
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote: > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > >+#ifdef CONFIG_PPC_BOOK3S_64 > >+return vcpu->arch.fault_dar; > > How about PA6T and G5s? G5 sets DAR on an alignment interrupt. As for PA6T, I don't know for sure, but if it doesn't, ordinary alignment interrupts wouldn't be handled properly, since the code in arch/powerpc/kernel/align.c assumes DAR contains the address being accessed on all PowerPC CPUs. Did PA Semi ever publish a user manual for the PA6T, I wonder? Paul.
[RFC PATCH 2/3] KVM: x86: use memory_prepare in fetch helper function
Insn fetch fastpath function. It's not that arch.walk_mmu->gva_to_gpa can't be used, but let's piggyback on top of the interface meant for our purpose. Signed-off-by: Bandan Das --- arch/x86/kvm/x86.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 17e3d661..cf69e3b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4065,29 +4065,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, struct x86_exception *exception) { void *data = val; - struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); int r = X86EMUL_CONTINUE; while (bytes) { - gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access, - exception); unsigned offset = addr & (PAGE_SIZE-1); unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); int ret; + unsigned long uaddr; - if (gpa == UNMAPPED_GVA) - return X86EMUL_PROPAGATE_FAULT; - ret = kvm_read_guest(vcpu->kvm, gpa, data, toread); + ret = ctxt->ops->memory_prepare(ctxt, addr, toread, + exception, false, + NULL, &uaddr); + if (ret != X86EMUL_CONTINUE) + return ret; + + if (unlikely(kvm_is_error_hva(uaddr))) { + r = X86EMUL_PROPAGATE_FAULT; + return r; + } + + ret = __copy_from_user(data, (void __user *)uaddr, toread); if (ret < 0) { r = X86EMUL_IO_NEEDED; - goto out; + return r; } + ctxt->ops->memory_finish(ctxt, NULL, uaddr); + bytes -= toread; data += toread; addr += toread; } -out: + return r; } -- 1.8.3.1
[RFC PATCH 1/3] KVM: x86: pass ctxt to fetch helper function
In the following patches, our address caching struct that's embedded within struct x86_emulate_ctxt will need to be accessed.

Signed-off-by: Bandan Das
---
 arch/x86/kvm/x86.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 122410d..17e3d661 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4061,10 +4061,11 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 }
 
 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
-				      struct kvm_vcpu *vcpu, u32 access,
+				      struct x86_emulate_ctxt *ctxt, u32 access,
 				      struct x86_exception *exception)
 {
 	void *data = val;
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
@@ -4098,7 +4099,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt,
 					  access | PFERR_FETCH_MASK,
 					  exception);
 }
@@ -4110,7 +4111,7 @@ int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt, access,
 					  exception);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
 
@@ -4119,8 +4120,7 @@ static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 				      gva_t addr, void *val, unsigned int bytes,
 				      struct x86_exception *exception)
 {
-	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt, 0, exception);
 }
 
 int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
--
1.8.3.1
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, 2014-05-05 at 16:43 +0200, Alexander Graf wrote: > > Paul mentioned that BOOK3S always had DAR value set on alignment > > interrupt. And the patch is to enable/collect correct DAR value when > > running with Little Endian PR guest. Now to limit the impact and to > > enable Little Endian PR guest, I ended up doing the conditional code > > only for book3s 64 for which we know for sure that we set DAR value. > > Yes, and I'm asking whether we know that this statement holds true for > PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is > at least developed by IBM, I'd assume its semantics here are similar to > POWER4, but for PA6T I wouldn't be so sure. I am not aware of any PowerPC processor that does not set DAR on alignment interrupts. Paul, are you ? Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM exit on UD interception
On Mon, May 5, 2014 at 11:48 AM, Alexandru Duţu wrote: > Thank you Jun! I see that in case of VMX does not emulated the > instruction that produced a UD exception, it just queues the exception > and returns 1. After that KVM will still try to enter virtualized > execution and so forth, the execution probably finishing with a DF and > shut down. It does not seem that KVM, in case of VMX, will exit > immediately on UD. > > I am not sure what you meant with MOVBE emulation. I meant: commit 84cffe499b9418d6c3b4de2ad9599cc2ec50c607 Author: Borislav Petkov Date: Tue Oct 29 12:54:56 2013 +0100 kvm: Emulate MOVBE This basically came from the need to be able to boot 32-bit Atom SMP guests on an AMD host, i.e. a host which doesn't support MOVBE. As a matter of fact, qemu has since recently received MOVBE support but we cannot share that with kvm emulation and thus we have to do this in the host. We're waay faster in kvm anyway. :-) So, we piggyback on the #UD path and emulate the MOVBE functionality. With it, an 8-core SMP guest boots in under 6 seconds. Also, requesting MOVBE emulation needs to happen explicitly to work, i.e. qemu -cpu n270,+movbe... Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+ kernel in kvm executes MOVBE ~60K times. Signed-off-by: Andre Przywara Signed-off-by: Borislav Petkov Signed-off-by: Paolo Bonzini -- Jun Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote: > Isn't this a greater problem? We should start swapping before we hit > the point where non movable kernel allocation fails, no? Possibly but the fact remains, this can be avoided by making sure that if we create a CMA reserve for KVM, then it uses it rather than using the rest of main memory for hash tables. > The fact that KVM uses a good number of normal kernel pages is maybe > suboptimal, but shouldn't be a critical problem. The point is that we explicitly reserve those pages in CMA for use by KVM for that specific purpose, but the current code tries first to get them out of the normal pool. This is not an optimal behaviour and is what Aneesh patches are trying to fix. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, 2014-05-05 at 19:56 +0530, Aneesh Kumar K.V wrote: > > Paul mentioned that BOOK3S always had DAR value set on alignment > interrupt. And the patch is to enable/collect correct DAR value when > running with Little Endian PR guest. Now to limit the impact and to > enable Little Endian PR guest, I ended up doing the conditional code > only for book3s 64 for which we know for sure that we set DAR value. Only BookS ? Afaik, the kernel align.c unconditionally uses DAR on every processor type. It's DSISR that may or may not be populated but afaik DAR always is. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes for 2014-04-29
On Wed, Apr 30, 2014 at 1:20 AM, Juan Quintela wrote:
>
> 2014-04-29
> ----------
>
> - security (CVE)
>   New group to handle these issues responsibly.
>   Mail is still not encrypted, but would be.
>   mst is writing a wiki page about it.
>   What are the criteria for requesting (or not) a CVE number?
>   Look at http://wiki.qemu.org/SecurityProcess
>
> - hot [un]plug for passthrough devices for platform devices
>
>   Lots of discussion about how to do it internally/externally from
>   qemu, both with its [dis]advantages. Basically how to do things
>   there.

I've had a play with QOMifying both memory regions and GPIOs and attaching them via QOM links. Looks viable as a unified solution. Can we discuss it at the next call?

Regards,
Peter

> Later, Juan.
Re: [PATCH v4 5/5] change update_range to handle > 4GB 2nd stage range for ARMv7
Hi Mario,

On Tue, Apr 29, 2014 at 9:06 AM, Mario Smarduch wrote:
>
> This patch adds support for unmapping 2nd stage page tables for addresses >4GB
> on ARMv7.
>
> Signed-off-by: Mario Smarduch
> ---
>  arch/arm/kvm/mmu.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 88f5503..afbf8ba 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -176,21 +176,25 @@ static void clear_pte_entry(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
>  	}
>  }
>
> +/* Function shared between identity and 2nd stage mappings. For 2nd stage
> + * the IPA may be > 4GB on ARMv7, and page table range functions
> + * will fail. kvm_xxx_addr_end() is used to handle both cases.
> + */
>  static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> -			unsigned long long start, u64 size)
> +			phys_addr_t start, u64 size)
>  {
>  	pgd_t *pgd;
>  	pud_t *pud;
>  	pmd_t *pmd;
>  	pte_t *pte;
> -	unsigned long long addr = start, end = start + size;
> -	u64 next;
> +	phys_addr_t addr = start, end = start + size;
> +	phys_addr_t next;
>
>  	while (addr < end) {
>  		pgd = pgdp + pgd_index(addr);
>  		pud = pud_offset(pgd, addr);
>  		if (pud_none(*pud)) {
> -			addr = pud_addr_end(addr, end);
> +			addr = kvm_pud_addr_end(addr, end);
>  			continue;
>  		}
>
> @@ -200,13 +204,13 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
>  		 * move on.
>  		 */
>  		clear_pud_entry(kvm, pud, addr);
> -		addr = pud_addr_end(addr, end);
> +		addr = kvm_pud_addr_end(addr, end);
>  		continue;
>  	}
>
>  	pmd = pmd_offset(pud, addr);
>  	if (pmd_none(*pmd)) {
> -		addr = pmd_addr_end(addr, end);
> +		addr = kvm_pmd_addr_end(addr, end);
>  		continue;
>  	}
>
> @@ -221,10 +225,10 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
>  	 */
>  	if (kvm_pmd_huge(*pmd) || page_empty(pte)) {
>  		clear_pmd_entry(kvm, pmd, addr);
> -		next = pmd_addr_end(addr, end);
> +		next = kvm_pmd_addr_end(addr, end);
>  		if (page_empty(pmd) && !page_empty(pud)) {
>  			clear_pud_entry(kvm, pud, addr);
> -			next = pud_addr_end(addr, end);
> +			next = kvm_pud_addr_end(addr, end);
>  		}
>  	}
>
> --
> 1.7.9.5

It seems that the kvm_pmd_addr_end(addr, end) you are adding already exists in the following patch, so you may need to remove these parts from your patch:

commit a3c8bd31af260a17d626514f636849ee1cd1f63e
Author: Marc Zyngier
Date:   Tue Feb 18 14:29:03 2014 +

    ARM: KVM: introduce kvm_p*d_addr_end

    The use of p*d_addr_end with stage-2 translation is slightly dodgy,
    as the IPA is 40 bits, while all the p*d_addr_end helpers are taking
    an unsigned long (arm64 is fine with that as unsigned long is 64-bit).

    The fix is to introduce 64-bit clean versions of the same helpers,
    and use them in the stage-2 page table code.

    Signed-off-by: Marc Zyngier
    Acked-by: Catalin Marinas
    Reviewed-by: Christoffer Dall

Gavin
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
Marcin, Can you provide detailed instructions on how to reproduce the problem? Thanks On Mon, May 05, 2014 at 08:27:10PM -0300, Marcelo Tosatti wrote: > On Mon, May 05, 2014 at 08:26:04PM +0200, Marcin Gibuła wrote: > > >>is it possible to have kvmclock jumping forward? > > >> > > >>Because I've reproducible case when at about 1 per 20 vm restores, VM > > >>freezes for couple of hours and then resumes with date few hundreds years > > >>ahead. Happens only with kvmclock. > > >> > > >>And this patch seems to fix very similar issue so maybe it's all the same > > >>bug. > > > > > >I'm fairly sure it is the exact same bug. Jumping backward is like jumping > > >forward by a big amount :) > > > > Hi, > > > > I've tested your path on my test VM... don't know if it's pure luck > > or not, but it didn't hang with over 70 restores. > > > > The message "KVM Clock migrated backwards, using later time" fires > > every time, but VM is healthy after resume. > > What is the host clocksource? (cat > /sys/devices/system/clocksource/clocksource0/current_clocksource). > > And kernel version? > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmclock: Ensure time in migration never goes backward
On Mon, May 05, 2014 at 08:23:43PM -0300, Marcelo Tosatti wrote: > Hi Alexander, > > On Mon, May 05, 2014 at 03:51:22PM +0200, Alexander Graf wrote: > > When we migrate we ask the kernel about its current belief on what the guest > > time would be. > > KVM_GET_CLOCK which returns the time in "struct kvm_clock_data". > > > However, I've seen cases where the kvmclock guest structure > > indicates a time more recent than the kvm returned time. This should not happen because the value returned by KVM_GET_CLOCK (get_kernel_ns() + kvmclock_offset) should be relatively in sync with what is seen in the guest via kvmclock read. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
On Mon, May 05, 2014 at 08:26:04PM +0200, Marcin Gibuła wrote: > >>is it possible to have kvmclock jumping forward? > >> > >>Because I've reproducible case when at about 1 per 20 vm restores, VM > >>freezes for couple of hours and then resumes with date few hundreds years > >>ahead. Happens only with kvmclock. > >> > >>And this patch seems to fix very similar issue so maybe it's all the same > >>bug. > > > >I'm fairly sure it is the exact same bug. Jumping backward is like jumping > >forward by a big amount :) > > Hi, > > I've tested your path on my test VM... don't know if it's pure luck > or not, but it didn't hang with over 70 restores. > > The message "KVM Clock migrated backwards, using later time" fires > every time, but VM is healthy after resume. What is the host clocksource? (cat /sys/devices/system/clocksource/clocksource0/current_clocksource). And kernel version? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmclock: Ensure time in migration never goes backward
Hi Alexander, On Mon, May 05, 2014 at 03:51:22PM +0200, Alexander Graf wrote: > When we migrate we ask the kernel about its current belief on what the guest > time would be. KVM_GET_CLOCK which returns the time in "struct kvm_clock_data". > However, I've seen cases where the kvmclock guest structure > indicates a time more recent than the kvm returned time. More details please: 1) By what algorithm you retrieve and compare time in kvmclock guest structure and KVM_GET_CLOCK. What are the results of the comparison. And whether and backwards time was visible in the guest. 2) What is the host clocksource. The test below is not a good one because: T1) KVM_GET_CLOCK (save s->clock). T2) save env->tsc. The difference in scaled time between T1 and T2 is larger than 1 nanosecond, so the (time_at_migration > s->clock) check is almost always positive (what matters though is whether time backwards event can be seen reading kvmclock in the guest). > To make sure we never go backwards, calculate what the guest would have seen > as time at the point of migration and use that value instead of the kernel > returned one when it's more recent. > > While this doesn't fix the underlying issue that the kernel's view of time > is skewed, it allows us to safely migrate guests even from sources that are > known broken. 
>
> Signed-off-by: Alexander Graf
> ---
>  hw/i386/kvm/clock.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 892aa02..c6521cf 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -14,6 +14,7 @@
>   */
>
>  #include "qemu-common.h"
> +#include "qemu/host-utils.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "hw/sysbus.h"
> @@ -34,6 +35,47 @@ typedef struct KVMClockState {
>      bool clock_valid;
>  } KVMClockState;
>
> +struct pvclock_vcpu_time_info {
> +    uint32_t version;
> +    uint32_t pad0;
> +    uint64_t tsc_timestamp;
> +    uint64_t system_time;
> +    uint32_t tsc_to_system_mul;
> +    int8_t   tsc_shift;
> +    uint8_t  flags;
> +    uint8_t  pad[2];
> +} __attribute__((__packed__)); /* 32 bytes */
> +
> +static uint64_t kvmclock_current_nsec(KVMClockState *s)
> +{
> +    CPUState *cpu = first_cpu;
> +    CPUX86State *env = cpu->env_ptr;
> +    hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
> +    uint64_t migration_tsc = env->tsc;
> +    struct pvclock_vcpu_time_info time;
> +    uint64_t delta;
> +    uint64_t nsec_lo;
> +    uint64_t nsec_hi;
> +    uint64_t nsec;
> +
> +    if (!(env->system_time_msr & 1ULL)) {
> +        /* KVM clock not active */
> +        return 0;
> +    }
> +
> +    cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> +
> +    delta = migration_tsc - time.tsc_timestamp;
> +    if (time.tsc_shift < 0) {
> +        delta >>= -time.tsc_shift;
> +    } else {
> +        delta <<= time.tsc_shift;
> +    }
> +
> +    mulu64(&nsec_lo, &nsec_hi, delta, time.tsc_to_system_mul);
> +    nsec = (nsec_lo >> 32) | (nsec_hi << 32);
> +    return nsec + time.system_time;
> +}
>
>  static void kvmclock_vm_state_change(void *opaque, int running,
>                                       RunState state)
> @@ -45,9 +87,15 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>
>      if (running) {
>          struct kvm_clock_data data;
> +        uint64_t time_at_migration = kvmclock_current_nsec(s);
>
>          s->clock_valid = false;
>
> +        if (time_at_migration > s->clock) {
> +            fprintf(stderr, "KVM Clock migrated backwards, using later time\n");
> +            s->clock = time_at_migration;
> +        }
> +
>          data.clock = s->clock;
>          data.flags = 0;
>          ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> --
> 1.7.12.4
Nested EPT page fault
Hi,

I have one question related to nested EPT page faults. At the very start, the L0 hypervisor launches L2 with an empty EPT0->2 table, building the table on the fly. When an L2 physical page is accessed, ept_page_fault() (paging_tmpl.h) is called in L0 to handle the fault. It first calls ept_walk_addr to get the guest EPT entry from EPT1->2. If there is no such entry, a guest page fault is injected into L1 to handle the fault.

The next time the same L2 physical page is accessed, ept_page_fault is triggered again in L0, which again calls ept_walk_addr and gets the previously filled EPT entry in EPT1->2; then try_async_pf is called to translate the L1 physical page to an L0 physical page. Finally, an entry is created in EPT0->2 to resolve the page fault. Please correct me if I am wrong.

My question is: when is EPT0->1 accessed while the EPT0->2 entry is being created? According to the Turtles paper, both EPT0->1 and EPT1->2 are accessed to populate an entry in EPT0->2.

Thanks for your time!

Best Wishes,
Yaohui
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Am 05.05.14 16:57, schrieb Olof Johansson: [Now without HTML email -- it's what you get for cc:ing me at work instead of my upstream email :)] 2014-05-05 7:43 GMT-07:00 Alexander Graf : On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; How about PA6T and G5s? Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure. Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago). In the end, since there's been no work to enable KVM on PA6T, I'm not too worried. I guess it's one more thing to sort out (and check for) whenever someone does that. I definitely don't have cycles to deal with that myself at this time. I can help find hardware for someone who wants to, but even then I'm guessing the interest is pretty limited. 
-Olof -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Just for info: "PR" KVM works great on my PA6T machine. I booted the Lubuntu 14.04 PowerPC live DVD on a QEMU virtual machine with "PR" KVM successfully. But Mac OS X Jaguar, Panther, and Tiger don't boot with KVM on Mac-on-Linux and QEMU. See http://forum.hyperion-entertainment.biz/viewtopic.php?f=35&t=1747. -- Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: x86: improve the usability of the 'kvm_pio' tracepoint
On 02/05/2014 17:57, Ulrich Obergfell wrote:
> This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit, 16-bit, or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output.

A difference is that the tracepoint will now be reported after an exit to userspace in the case of "in", rather than before. The improvement however is noticeable; especially for "out" it allows much more information about the state of a device to be obtained from a long trace.

Applying to kvm/queue, thanks.

Paolo
Re: x86_64 allyesconfig has screwed up voffset and blows up KVM
On 05/05/2014 11:41 AM, Andy Lutomirski wrote: > I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9. I'm not sure > what's going on here. > > voffset.h contains: > > #define VO__end 0x8111c7a0 > #define VO__end 0x8db9a000 > #define VO__text 0x8100 > > because > > $ nm vmlinux|grep ' _end' > 8111c7a0 t _end > 8db9a000 B _end > The "t _end" implies there is a local symbol _end which I guess the scripts are incorrectly picking up. Taking a look now. -hpa -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 05/20] machine: Replace QEMUMachine by MachineClass in accelerator configuration
From: Marcel Apfelbaum This minimizes QEMUMachine usage, as part of machine QOM-ification. Signed-off-by: Marcel Apfelbaum Signed-off-by: Andreas Färber --- include/hw/boards.h | 3 +-- include/hw/xen/xen.h| 2 +- include/qemu/typedefs.h | 1 + include/sysemu/kvm.h| 2 +- include/sysemu/qtest.h | 2 +- kvm-all.c | 6 +++--- kvm-stub.c | 2 +- qtest.c | 2 +- vl.c| 10 +- xen-all.c | 2 +- xen-stub.c | 2 +- 11 files changed, 17 insertions(+), 17 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index be2e432..8f53334 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -3,12 +3,11 @@ #ifndef HW_BOARDS_H #define HW_BOARDS_H +#include "qemu/typedefs.h" #include "sysemu/blockdev.h" #include "hw/qdev.h" #include "qom/object.h" -typedef struct MachineClass MachineClass; - typedef struct QEMUMachineInitArgs { const MachineClass *machine; ram_addr_t ram_size; diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index 9d549fc..85fda3d 100644 --- a/include/hw/xen/xen.h +++ b/include/hw/xen/xen.h @@ -36,7 +36,7 @@ void xen_cmos_set_s3_resume(void *opaque, int irq, int level); qemu_irq *xen_interrupt_controller_init(void); -int xen_init(QEMUMachine *machine); +int xen_init(MachineClass *mc); int xen_hvm_init(MemoryRegion **ram_memory); void xenstore_store_pv_console_info(int i, struct CharDriverState *chr); diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h index bf8daac..86bab12 100644 --- a/include/qemu/typedefs.h +++ b/include/qemu/typedefs.h @@ -31,6 +31,7 @@ typedef struct MemoryListener MemoryListener; typedef struct MemoryMappingList MemoryMappingList; typedef struct QEMUMachine QEMUMachine; +typedef struct MachineClass MachineClass; typedef struct NICInfo NICInfo; typedef struct HCIInfo HCIInfo; typedef struct AudioState AudioState; diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 192fe89..5ad4e0e 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -152,7 +152,7 @@ extern KVMState *kvm_state; /* external API */ 
-int kvm_init(QEMUMachine *machine); +int kvm_init(MachineClass *mc); int kvm_has_sync_mmu(void); int kvm_has_vcpu_events(void); diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h index 224131f..95c9ade 100644 --- a/include/sysemu/qtest.h +++ b/include/sysemu/qtest.h @@ -26,7 +26,7 @@ static inline bool qtest_enabled(void) bool qtest_driver(void); -int qtest_init_accel(QEMUMachine *machine); +int qtest_init_accel(MachineClass *mc); void qtest_init(const char *qtest_chrdev, const char *qtest_log, Error **errp); static inline int qtest_available(void) diff --git a/kvm-all.c b/kvm-all.c index 82a9119..5cb7f26 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1341,7 +1341,7 @@ static int kvm_max_vcpus(KVMState *s) return (ret) ? ret : kvm_recommended_vcpus(s); } -int kvm_init(QEMUMachine *machine) +int kvm_init(MachineClass *mc) { static const char upgrade_note[] = "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n" @@ -1433,8 +1433,8 @@ int kvm_init(QEMUMachine *machine) } kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type"); -if (machine->kvm_type) { -type = machine->kvm_type(kvm_type); +if (mc->kvm_type) { +type = mc->kvm_type(kvm_type); } else if (kvm_type) { fprintf(stderr, "Invalid argument kvm-type=%s\n", kvm_type); goto err; diff --git a/kvm-stub.c b/kvm-stub.c index ccdba62..8acda86 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -34,7 +34,7 @@ int kvm_init_vcpu(CPUState *cpu) return -ENOSYS; } -int kvm_init(QEMUMachine *machine) +int kvm_init(MachineClass *mc) { return -ENOSYS; } diff --git a/qtest.c b/qtest.c index 0ac9f42..2aba20d 100644 --- a/qtest.c +++ b/qtest.c @@ -500,7 +500,7 @@ static void qtest_event(void *opaque, int event) } } -int qtest_init_accel(QEMUMachine *machine) +int qtest_init_accel(MachineClass *mc) { configure_icount("0"); diff --git a/vl.c b/vl.c index 2c2b625..f423b2e 100644 --- a/vl.c +++ b/vl.c @@ -2725,7 +2725,7 @@ static MachineClass *machine_parse(const char *name) exit(!name || !is_help_option(name)); } 
-static int tcg_init(QEMUMachine *machine) +static int tcg_init(MachineClass *mc) { tcg_exec_init(tcg_tb_size * 1024 * 1024); return 0; @@ -2735,7 +2735,7 @@ static struct { const char *opt_name; const char *name; int (*available)(void); -int (*init)(QEMUMachine *); +int (*init)(MachineClass *mc); bool *allowed; } accel_list[] = { { "tcg", "tcg", tcg_available, tcg_init, &tcg_allowed }, @@ -2744,7 +2744,7 @@ static struct { { "qtest", "QTest", qtest_available, qtest_init_accel, &qtest_allowed }, }; -static int configure_accelerator(QEMUMachine *machine) +static int configure_accelerator(MachineClass *mc) { const char *p;
Re: KVM exit on UD interception
Thank you Jun! I see that in the case of VMX, KVM does not emulate the instruction that produced the UD exception; it just queues the exception and returns 1. After that, KVM will still try to enter virtualized execution and so forth, with execution probably finishing in a double fault and shutdown. It does not seem that KVM, in the case of VMX, exits immediately on UD.

I am not sure what you meant by MOVBE emulation.

Thanks,
Alex

On Mon, May 5, 2014 at 12:34 PM, Nakajima, Jun wrote:
> On Mon, May 5, 2014 at 8:56 AM, Alexandru Duţu wrote:
>> Dear all,
>>
>> It seems that currently, on UD interception KVM does not exit completely. Virtualized execution finishes, KVM executes ud_intercept(), after which it enters virtualized execution again.
>
> Maybe you might want to take a look at the VMX side (to port it to SVM). The MOVBE emulation, for example, should be helpful.
>
>> I am working on accelerating, with virtualized execution, a simulator that emulates system calls. Essentially this is doing virtualized execution without an OS kernel. In order to make this work, I had to modify the KVM kernel module such that ud_intercept() returns 0 and not 1, which breaks KVM's __vcpu_run loop. This is necessary as I need to trap syscall instructions, exit virtualized execution with a UD exception, emulate the system call in the simulator, and after the system call is done enter back into virtualized mode and start execution with the help of KVM.
>>
>> So by modifying ud_intercept() to return 0, I got all this to work. Is it possible to achieve the same effect (exit on undefined opcode) without modifying ud_intercept()?
>>
>> It seems that re-entering virtualized execution on UD interception gives the user the flexibility of running binaries with newer instructions on older hardware, if kvm is able to emulate the newer instructions.
I do not fully understand the details of this scenario, >> is there such a scenario or is it likely that ud_interception() will >> change? >> >> Thank you in advance! >> >> Best regards, >> Alex >> -- > > -- > Jun > Intel Open Source Technology Center -- Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
x86_64 allyesconfig has screwed up voffset and blows up KVM
I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9. I'm not sure what's going on here. voffset.h contains: #define VO__end 0x8111c7a0 #define VO__end 0x8db9a000 #define VO__text 0x8100 because $ nm vmlinux|grep ' _end' 8111c7a0 t _end 8db9a000 B _end Booting the resulting image says: KVM internal error. Suberror: 1 emulation failure EAX=8001 EBX= ECX=c080 EDX= ESI=00014630 EDI=0b08f000 EBP=0010 ESP=038f14b8 EIP=00100119 EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0018 00c09300 DPL=0 DS [-WA] CS =0010 00c09b00 DPL=0 CS32 [-RA] SS =0018 00c09300 DPL=0 DS [-WA] DS =0018 00c09300 DPL=0 DS [-WA] FS =0018 00c09300 DPL=0 DS [-WA] GS =0018 00c09300 DPL=0 DS [-WA] LDT= 00c0 TR =0020 0fff 00808b00 DPL=0 TSS64-busy GDT= 038e5320 0030 IDT= CR0=8011 CR2= CR3=0b089000 CR4=0020 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=0500 Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? Linus's tree from today doesn't seem any better. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
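The duplicate define comes from `nm` reporting both a local (`t`) and a global (`B`) `_end`; filtering on an uppercase (global) symbol type disambiguates. A sketch against faked `nm` output — this is not the actual kernel build script, just the filtering idea:

```shell
# Fake the ambiguous "nm vmlinux" output seen above, then keep only the
# global definition (uppercase symbol type), which is the one the
# voffset.h generation should pick.
printf '%s\n' \
    '8111c7a0 t _end' \
    '8db9a000 B _end' \
    '8100 T _text' |
awk '$2 ~ /^[A-Z]$/ && $3 == "_end" { print $1 }'
```

Run as-is, the pipeline prints only the global address (8db9a000), dropping the stray local symbol that confused the script.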
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
is it possible to have kvmclock jumping forward? Because I have a reproducible case where, in about 1 in 20 VM restores, the VM freezes for a couple of hours and then resumes with the date a few hundred years ahead. Happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug. I'm fairly sure it is the exact same bug. Jumping backward is like jumping forward by a big amount :) Hi, I've tested your patch on my test VM... don't know if it's pure luck or not, but it didn't hang in over 70 restores. The message "KVM Clock migrated backwards, using later time" fires every time, but the VM is healthy after resume. -- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
> On 05.05.2014 at 19:46, Marcin Gibuła wrote: > > On 2014-05-05 15:51, Alexander Graf wrote: >> When we migrate we ask the kernel about its current belief on what the guest >> time would be. However, I've seen cases where the kvmclock guest structure >> indicates a time more recent than the kvm returned time. > > Hi, > > is it possible to have kvmclock jumping forward? > > Because I have a reproducible case where, in about 1 in 20 VM restores, the VM freezes > for a couple of hours and then resumes with the date a few hundred years ahead. > Happens only with kvmclock. > > And this patch seems to fix a very similar issue, so maybe it's all the same bug. I'm fairly sure it is the exact same bug. Jumping backward is like jumping forward by a big amount :) Alex > > -- > mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
On 2014-05-05 15:51, Alexander Graf wrote: When we migrate we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm returned time. Hi, is it possible to have kvmclock jumping forward? Because I have a reproducible case where, in about 1 in 20 VM restores, the VM freezes for a couple of hours and then resumes with the date a few hundred years ahead. Happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug. -- mg
Re: KVM exit on UD interception
On Mon, May 5, 2014 at 8:56 AM, Alexandru Duţu wrote: > Dear all, > > It seems that currently, on UD interception, KVM does not exit > completely. Virtualized execution finishes, KVM executes > ud_interception(), after which it enters virtualized execution again. Maybe you might want to take a look at the VMX side (to port it to SVM). The MOVBE emulation, for example, should be helpful. > > I am working on accelerating a simulator with virtualized execution; > the simulator emulates system calls. Essentially, this is virtualized execution > without an OS kernel. In order to make this work, I had to modify > the KVM kernel module such that ud_interception() returns 0 and not 1, > which breaks out of the KVM __vcpu_run loop. This is necessary as I need to trap > syscall instructions, exit virtualized execution with a UD exception, > emulate the system call in the simulator and, after the system call is > done, re-enter virtualized mode and resume execution with the help > of KVM. > > So by modifying ud_interception() to return 0, I got all this to work. Is > it possible to achieve the same effect (exit on undefined opcode) > without modifying ud_interception()? > > It seems that re-entering virtualized execution on UD interception > gives the user the flexibility of running binaries with newer > instructions on older hardware, if KVM is able to emulate the newer > instructions. I do not fully understand the details of this scenario, > is there such a scenario or is it likely that ud_interception() will > change? > > Thank you in advance! > > Best regards, > Alex > -- -- Jun Intel Open Source Technology Center
Re: [PATCH 0/1] KVM: x86: improve the usability of the 'kvm_pio' tracepoint
> From: "Xiao Guangrong" > To: "Ulrich Obergfell" , kvm@vger.kernel.org > Cc: pbonz...@redhat.com > Sent: Monday, May 5, 2014 9:10:19 AM > Subject: Re: [PATCH 0/1] KVM: x86: improve the usability of the 'kvm_pio' > tracepoint > > On 05/02/2014 11:57 PM, Ulrich Obergfell wrote: >> The current implementation of the 'kvm_pio' tracepoint in >> emulator_pio_in_out() >> only tells us that 'something' has been read from or written to an I/O port. >> To >> improve the usability of the tracepoint, I propose to include the >> value/content >> that has been read or written in the trace output. The proposed patch aims at >> the more common case where a single 8-bit or 16-bit or 32-bit value has been >> read or written -- it does not fully cover the case where 'count' is greater >> than one. >> >> This is an example of what the patch can do (trace of PCI config space >> access). >> >> - on the host >> >># trace-cmd record -e kvm:kvm_pio -f "(port >= 0xcf8) && (port <= 0xcff)" >>/sys/kernel/debug/tracing/events/kvm/kvm_pio/filter >>Hit Ctrl^C to stop recording >> >> - in a Linux guest >> >># dd if=/sys/bus/pci/devices/:00:06.0/config bs=2 count=4 | hexdump >>4+0 records in >>4+0 records out >>8 bytes (8 B) copied, 0.000114056 s, 70.1 kB/s >>000 1af4 1001 0507 0010 >>008 >> >> - on the host >> >># trace-cmd report >>... 
>> qemu-kvm-23216 [001] 15211.994089: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>> qemu-kvm-23216 [001] 15211.994108: kvm_pio: pio_read at 0xcfc size 2 count 1 val 0x1af4
>> qemu-kvm-23216 [001] 15211.994129: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>> qemu-kvm-23216 [001] 15211.994136: kvm_pio: pio_read at 0xcfe size 2 count 1 val 0x1001
>> qemu-kvm-23216 [001] 15211.994143: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>> qemu-kvm-23216 [001] 15211.994150: kvm_pio: pio_read at 0xcfc size 2 count 1 val 0x507
>> qemu-kvm-23216 [001] 15211.994155: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>> qemu-kvm-23216 [001] 15211.994161: kvm_pio: pio_read at 0xcfe size 2 count 1 val 0x10

> Nice.
>
> Could you please check "perf kvm stat" to see if "--event=ioport"
> can work after your patch?
>
> Reviewed-by: Xiao Guangrong

I've run a quick test with a local kernel - built from 3.15.0-rc1 source, including the proposed patch - in combination with the 'perf' package that is installed on my test machine. I didn't build a new 'perf' binary from 3.15.0-rc1 source. The following output of the 'perf kvm stat live --event=ioport -d 10' command looks reasonable.
17:10:29.036811

Analyze events for all VMs, all VCPUs:

IO Port Access    Samples  Samples%   Time%  Min Time  Max Time  Avg time
0x177:PIN              35    20.00%  15.40%       1us       3us  1.68us ( +-  8.63% )
0x376:PIN              30    17.14%  16.37%       1us       6us  2.08us ( +- 17.15% )
0x170:POUT             15     8.57%  18.99%       2us       9us  4.83us ( +- 14.34% )
0xc0ea:POUT            10     5.71%   6.57%       2us       2us  2.51us ( +-  5.06% )
0xc0ea:PIN             10     5.71%   6.21%       1us       6us  2.37us ( +- 23.18% )
0x176:POUT             10     5.71%   6.69%       1us       3us  2.55us ( +-  7.59% )
0x170:PIN               5     2.86%   3.36%       2us       2us  2.56us ( +-  1.17% )
0x171:PIN               5     2.86%   1.47%       1us       1us  1.12us ( +-  0.37% )
0x171:POUT              5     2.86%   3.26%       2us       2us  2.49us ( +-  2.25% )
0x172:PIN               5     2.86%   1.45%       1us       1us  1.11us ( +-  0.24% )
0x172:POUT              5     2.86%   2.67%       1us       2us  2.04us ( +-  3.00% )
0x173:PIN               5     2.86%   1.46%       1us       1us  1.11us ( +-  0.29% )
0x173:POUT              5     2.86%   2.60%       1us       2us  1.99us ( +-  2.96% )
0x174:PIN               5     2.86%   1.45%       1us       1us  1.11us ( +-  0.16% )
0x174:POUT              5     2.86%   2.60%       1us       2us  1.99us ( +-  3.13% )
0x175:PIN               5     2.86%   1.46%       1us       1us  1.12us ( +-  0.15% )
0x175:POUT              5     2.86%   2.60%       1us       2us  1.98us ( +-  3.04% )
0x176:PIN               5     2.86%   1.45%       1us       1us  1.11us ( +-  0.23% )
0x177:POUT              5     2.86%   3.94%       2us       3us  3.01us ( +-  2.06% )

Total Samples:
KVM exit on UD interception
Dear all, It seems that currently, on UD interception, KVM does not exit completely. Virtualized execution finishes, KVM executes ud_interception(), after which it enters virtualized execution again. I am working on accelerating a simulator with virtualized execution; the simulator emulates system calls. Essentially, this is virtualized execution without an OS kernel. In order to make this work, I had to modify the KVM kernel module such that ud_interception() returns 0 and not 1, which breaks out of the KVM __vcpu_run loop. This is necessary as I need to trap syscall instructions, exit virtualized execution with a UD exception, emulate the system call in the simulator and, after the system call is done, re-enter virtualized mode and resume execution with the help of KVM. So by modifying ud_interception() to return 0, I got all this to work. Is it possible to achieve the same effect (exit on undefined opcode) without modifying ud_interception()? It seems that re-entering virtualized execution on UD interception gives the user the flexibility of running binaries with newer instructions on older hardware, if KVM is able to emulate the newer instructions. I do not fully understand the details of this scenario; is there such a scenario, or is it likely that ud_interception() will change? Thank you in advance! Best regards, Alex
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Alexander Graf writes: >> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" wrote: >> >> Alexander Graf writes: >> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table. >>> >>> I don't understand the problem. Can you please elaborate? >> >> Let's take a system with 100GB RAM. We reserve around 5GB for htab >> allocation. Now if we use the rest of available memory for hugetlbfs >> (because we want all the guests to be backed by huge pages), we would >> end up in a situation where we have a few GB of free RAM and 5GB of CMA >> reserve area. Now if we allow hash page table allocation to consume the >> free space, we would end up hitting page allocation failure for other >> non movable kernel allocation even though we still have 5GB CMA reserve >> space free. > > Isn't this a greater problem? We should start swapping before we hit > the point where non movable kernel allocation fails, no? But there is nothing much to swap, because most of the memory is reserved for guest RAM via hugetlbfs. > > The fact that KVM uses a good number of normal kernel pages is maybe > suboptimal, but shouldn't be a critical problem. Yes. But then, in this case, we could do better, couldn't we? We already have a large part of guest RAM kept aside for htab allocation which cannot be used for non movable allocation. And we ignore that reserve space and use other areas for hash page table allocation with the current code. We actually hit this case on one of the test boxes.
KVM guest htab at c01e5000 (order 30), LPID 1
libvirtd invoked oom-killer: gfp_mask=0x2000d0, order=0, oom_score_adj=0
libvirtd cpuset=/ mems_allowed=0,16
CPU: 72 PID: 20044 Comm: libvirtd Not tainted 3.10.23-1401.pkvm2_1.4.ppc64 #1
Call Trace:
[c01e3b63f150] [c0017330] .show_stack+0x130/0x200 (unreliable)
[c01e3b63f220] [c087a888] .dump_stack+0x28/0x3c
[c01e3b63f290] [c0876a4c] .dump_header+0xbc/0x228
[c01e3b63f360] [c01dd838] .oom_kill_process+0x318/0x4c0
[c01e3b63f440] [c01de258] .out_of_memory+0x518/0x550
[c01e3b63f520] [c01e5aac] .__alloc_pages_nodemask+0xb3c/0xbf0
[c01e3b63f700] [c0243580] .new_slab+0x440/0x490
[c01e3b63f7a0] [c08781fc] .__slab_alloc+0x17c/0x618
[c01e3b63f8d0] [c02467fc] .kmem_cache_alloc_node_trace+0xcc/0x300
[c01e3b63f990] [c010f62c] .alloc_fair_sched_group+0xfc/0x200
[c01e3b63fa60] [c0104f00] .sched_create_group+0x50/0xe0
[c01e3b63fae0] [c0104fc0] .cpu_cgroup_css_alloc+0x30/0x80
[c01e3b63fb60] [c01513ec] .cgroup_mkdir+0x2bc/0x6e0
[c01e3b63fc50] [c0275aec] .vfs_mkdir+0x14c/0x220
[c01e3b63fcf0] [c027a734] .SyS_mkdirat+0x94/0x110
[c01e3b63fdb0] [c027a7e4] .SyS_mkdir+0x34/0x50
[c01e3b63fe30] [c0009f54] syscall_exit+0x0/0x98
Node 0 DMA free:23424kB min:23424kB low:29248kB high:35136kB active_anon:0kB inactive_anon:128kB active_file:256kB inactive_file:384kB unevictable:9536kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:65931776kB mlocked:9536kB dirty:64kB writeback:0kB mapped:5376kB shmem:0kB slab_reclaimable:23616kB slab_unreclaimable:1237056kB kernel_stack:18256kB pagetables:1088kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:78 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 16 DMA free:5787008kB min:21376kB low:26688kB high:32064kB active_anon:1984kB inactive_anon:2112kB active_file:896kB inactive_file:64kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:60060032kB mlocked:0kB dirty:128kB writeback:3712kB mapped:0kB shmem:0kB slab_reclaimable:23424kB slab_unreclaimable:826048kB kernel_stack:576kB pagetables:1408kB unstable:0kB bounce:0kB free_cma:5767040kB writeback_tmp:0kB pages_scanned:756 all_unreclaimable? yes
Re: [PATCH 08/11] perf kvm: allow for variable string sizes
On 5/5/14, 4:27 AM, Christian Borntraeger wrote:

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 922706c..806c0e4 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -75,7 +75,7 @@ struct kvm_events_ops {
 	bool (*is_end_event)(struct perf_evsel *evsel,
 			     struct perf_sample *sample, struct event_key *key);
 	void (*decode_key)(struct perf_kvm_stat *kvm, struct event_key *key,
-			   char decode[20]);
+			   char *decode);
 	const char *name;
 };

@@ -84,6 +84,8 @@ struct exit_reasons_table {
 	const char *reason;
 };

+#define DECODE_STR_LEN_MAX 80
+
 #define EVENTS_BITS 12
 #define EVENTS_CACHE_SIZE (1UL << EVENTS_BITS)

@@ -101,6 +103,8 @@ struct perf_kvm_stat {
 	struct exit_reasons_table *exit_reasons;
 	const char *exit_reasons_isa;

+	int decode_str_len;
+

This should not be a part of the perf_kvm_stat struct. Just leave it as a macro and use DECODE_STR_LEN_MAX in place of 20. Which means DECODE_STR_LEN_MAX needs to be 20 in this patch, and arch-specific in the follow-up patch.

 	struct kvm_events_ops *events_ops;
 	key_cmp_fun compare;
 	struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];

@@ -182,12 +186,12 @@ static const char *get_exit_reason(struct perf_kvm_stat *kvm,
 static void exit_event_decode_key(struct perf_kvm_stat *kvm,
 				  struct event_key *key,
-				  char decode[20])
+				  char *decode)
 {
 	const char *exit_reason = get_exit_reason(kvm, kvm->exit_reasons,
 						  key->key);

-	scnprintf(decode, 20, "%s", exit_reason);
+	scnprintf(decode, kvm->decode_str_len, "%s", exit_reason);
 }

 static struct kvm_events_ops exit_events = {

@@ -249,10 +253,11 @@ static bool mmio_event_end(struct perf_evsel *evsel, struct perf_sample *sample,
 static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
 				  struct event_key *key,
-				  char decode[20])
+				  char *decode)
 {
-	scnprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
-		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
+	scnprintf(decode, kvm->decode_str_len, "%#lx:%s",
+		  (unsigned long)key->key,
+		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
 }

 static struct kvm_events_ops mmio_events = {

@@ -292,10 +297,11 @@ static bool ioport_event_end(struct perf_evsel *evsel,
 static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
 				    struct event_key *key,
-				    char decode[20])
+				    char *decode)
 {
-	scnprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
-		  key->info ? "POUT" : "PIN");
+	scnprintf(decode, kvm->decode_str_len, "%#llx:%s",
+		  (unsigned long long)key->key,
+		  key->info ? "POUT" : "PIN");
 }

 static struct kvm_events_ops ioport_events = {

@@ -523,13 +529,13 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
 	time_diff = sample->time - time_begin;

 	if (kvm->duration && time_diff > kvm->duration) {
-		char decode[32];
+		char decode[DECODE_STR_LEN_MAX];

 		kvm->events_ops->decode_key(kvm, &event->key, decode);
 		if (strcmp(decode, "HLT")) {
-			pr_info("%" PRIu64 " VM %d, vcpu %d: %s event took %" PRIu64 "usec\n",
+			pr_info("%" PRIu64 " VM %d, vcpu %d: %*s event took %" PRIu64 "usec\n",
 				sample->time, sample->pid, vcpu_record->vcpu_id,
-				decode, time_diff/1000);
+				32, decode, time_diff/1000);

This pr_info does not need the length.

David
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" wrote: > > Alexander Graf writes: > >>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: >>> We reserve 5% of total ram for CMA allocation and not using that can >>> result in us running out of numa node memory with specific >>> configuration. One caveat is we may not have node local hpt with pinned >>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset >>> after creating hash page table. >> >> I don't understand the problem. Can you please elaborate? > > Let's take a system with 100GB RAM. We reserve around 5GB for htab > allocation. Now if we use rest of available memory for hugetlbfs > (because we want all the guest to be backed by huge pages), we would > end up in a situation where we have a few GB of free RAM and 5GB of CMA > reserve area. Now if we allow hash page table allocation to consume the > free space, we would end up hitting page allocation failure for other > non movable kernel allocation even though we still have 5GB CMA reserve > space free. Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails, no? The fact that KVM uses a good number of normal kernel pages is maybe suboptimal, but shouldn't be a critical problem. Alex > > -aneesh >
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
> On 05.05.2014 at 16:50, "Aneesh Kumar K.V" wrote: > > Alexander Graf writes: > >>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >>> Alexander Graf writes: >>> > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > Although it's optional IBM POWER cpus always had DAR value set on > alignment interrupt. So don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > Changes from V3: > * Use make_dsisr instead of checking feature flag to decide whether to use >saved dsisr or not >>> >>> > ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) > { > +#ifdef CONFIG_PPC_BOOK3S_64 > +return vcpu->arch.fault_dar; How about PA6T and G5s? >>> Paul mentioned that BOOK3S always had DAR value set on alignment >>> interrupt. And the patch is to enable/collect correct DAR value when >>> running with Little Endian PR guest. Now to limit the impact and to >>> enable Little Endian PR guest, I ended up doing the conditional code >>> only for book3s 64 for which we know for sure that we set DAR value. >> >> Yes, and I'm asking whether we know that this statement holds true for >> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is >> at least developed by IBM, I'd assume its semantics here are similar to >> POWER4, but for PA6T I wouldn't be so sure. > > I will have to defer to Paul on that question. But that should not > prevent this patch from going upstream, right? Regressions are big no-gos. Alex > > -aneesh >
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
> On 05.05.2014 at 16:57, Olof Johansson wrote: > > [Now without HTML email -- it's what you get for cc:ing me at work > instead of my upstream email :)] > > 2014-05-05 7:43 GMT-07:00 Alexander Graf : >> >>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >>> >>> Alexander Graf writes: >>> > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > > Although it's optional IBM POWER cpus always had DAR value set on > alignment interrupt. So don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > Changes from V3: > * Use make_dsisr instead of checking feature flag to decide whether to use >saved dsisr or not >>> >>> > ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) > { > +#ifdef CONFIG_PPC_BOOK3S_64 > + return vcpu->arch.fault_dar; How about PA6T and G5s? >>> Paul mentioned that BOOK3S always had DAR value set on alignment >>> interrupt. And the patch is to enable/collect correct DAR value when >>> running with Little Endian PR guest. Now to limit the impact and to >>> enable Little Endian PR guest, I ended up doing the conditional code >>> only for book3s 64 for which we know for sure that we set DAR value. >> >> Yes, and I'm asking whether we know that this statement holds true for PA6T >> and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least >> developed by IBM, I'd assume its semantics here are similar to POWER4, but >> for PA6T I wouldn't be so sure. > > Thanks for looking out for us, obviously IBM doesn't (based on the > reply a minute ago). > > In the end, since there's been no work to enable KVM on PA6T, I'm not > too worried. I guess it's one more thing to sort out (and check for) > whenever someone does that. > > I definitely don't have cycles to deal with that myself at this time. > I can help find hardware for someone who wants to, but even then I'm > guessing the interest is pretty limited. 
I know of at least one person who successfully runs PR KVM on a PA6T, so it's neither neglected nor non-working. If you can get me access to a PA6T system I can easily check whether alignment interrupts generate DAR and DSISR properly :). Alex
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
2014-05-05 8:03 GMT-07:00 Aneesh Kumar K.V : > Olof Johansson writes: > >> 2014-05-05 7:43 GMT-07:00 Alexander Graf : >> >>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >>> Alexander Graf writes: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > >> Although it's optional IBM POWER cpus always had DAR value set on >> alignment interrupt. So don't try to compute these values. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> Changes from V3: >> * Use make_dsisr instead of checking feature flag to decide whether to >> use >> saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) >>{ >> +#ifdef CONFIG_PPC_BOOK3S_64 >> + return vcpu->arch.fault_dar; >> > How about PA6T and G5s? > > > Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. >>> >>> Yes, and I'm asking whether we know that this statement holds true for >>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at >>> least developed by IBM, I'd assume its semantics here are similar to >>> POWER4, but for PA6T I wouldn't be so sure. >>> >>> >> Thanks for looking out for us, obviously IBM doesn't (based on the reply a >> minute ago). > > The reason I deferred the question to Paul is really because I don't > know enough about PA6T and G5 to comment. I intentionally restricted the > changes to BOOK3S_64 because I wanted to make sure I don't break > anything else. It is in no way to hint that others don't care. Ah, I see -- the disconnect is that you don't think PA6T and 970 are 64-bit book3s CPUs. They are. 
-Olof
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Olof Johansson writes: > 2014-05-05 7:43 GMT-07:00 Alexander Graf : > >> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >> >>> Alexander Graf writes: >>> >>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > Although it's optional IBM POWER cpus always had DAR value set on > alignment interrupt. So don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > Changes from V3: > * Use make_dsisr instead of checking feature flag to decide whether to > use > saved dsisr or not > > >>> >>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) >{ > +#ifdef CONFIG_PPC_BOOK3S_64 > + return vcpu->arch.fault_dar; > How about PA6T and G5s? Paul mentioned that BOOK3S always had DAR value set on alignment >>> interrupt. And the patch is to enable/collect correct DAR value when >>> running with Little Endian PR guest. Now to limit the impact and to >>> enable Little Endian PR guest, I ended up doing the conditional code >>> only for book3s 64 for which we know for sure that we set DAR value. >>> >> >> Yes, and I'm asking whether we know that this statement holds true for >> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at >> least developed by IBM, I'd assume its semantics here are similar to >> POWER4, but for PA6T I wouldn't be so sure. >> >> > Thanks for looking out for us, obviously IBM doesn't (based on the reply a > minute ago). The reason I deferred the question to Paul is really because I don't know enough about PA6T and G5 to comment. I intentionally restricted the changes to BOOK3S_64 because I wanted to make sure I don't break anything else. It is in no way to hint that others don't care. -aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
[Now without HTML email -- it's what you get for cc:ing me at work instead of my upstream email :)] 2014-05-05 7:43 GMT-07:00 Alexander Graf : > > On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >> >> Alexander Graf writes: >> >>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; >>> >>> How about PA6T and G5s? >>> >>> >> Paul mentioned that BOOK3S always had DAR value set on alignment >> interrupt. And the patch is to enable/collect correct DAR value when >> running with Little Endian PR guest. Now to limit the impact and to >> enable Little Endian PR guest, I ended up doing the conditional code >> only for book3s 64 for which we know for sure that we set DAR value. > > > Yes, and I'm asking whether we know that this statement holds true for PA6T > and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least > developed by IBM, I'd assume its semantics here are similar to POWER4, but > for PA6T I wouldn't be so sure. > Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago). In the end, since there's been no work to enable KVM on PA6T, I'm not too worried. I guess it's one more thing to sort out (and check for) whenever someone does that. I definitely don't have cycles to deal with that myself at this time. I can help find hardware for someone who wants to, but even then I'm guessing the interest is pretty limited. -Olof
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Alexander Graf writes: > On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >> Alexander Graf writes: >> >>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; >>> How about PA6T and G5s? >>> >>> >> Paul mentioned that BOOK3S always had DAR value set on alignment >> interrupt. And the patch is to enable/collect correct DAR value when >> running with Little Endian PR guest. Now to limit the impact and to >> enable Little Endian PR guest, I ended up doing the conditional code >> only for book3s 64 for which we know for sure that we set DAR value. > > Yes, and I'm asking whether we know that this statement holds true for > PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is > at least developed by IBM, I'd assume its semantics here are similar to > POWER4, but for PA6T I wouldn't be so sure. I will have to defer to Paul on that question. But that should not prevent this patch from going upstream, right? -aneesh
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
Alexander Graf writes: > On 05/05/2014 04:38 PM, Aneesh Kumar K.V wrote: >> Alexander Graf writes: >> >>> On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: > When running on a POWER8 host, we get away with running the guest as > POWER7 > and nothing falls apart. > > However, when we start exposing POWER8 as guest CPU, guests will start > using > new abilities on POWER8 which we need to handle. > > This patch set does a minimalistic approach to implementing those bits to > make guests happy enough to run. > > > Alex > > Alexander Graf (6): > KVM: PPC: Book3S PR: Ignore PMU SPRs > KVM: PPC: Book3S PR: Emulate TIR register > KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR > KVM: PPC: Book3S PR: Expose TAR facility to guest > KVM: PPC: Book3S PR: Expose EBB registers > KVM: PPC: Book3S PR: Expose TM registers > >arch/powerpc/include/asm/kvm_asm.h| 18 --- >arch/powerpc/include/asm/kvm_book3s_asm.h | 2 + >arch/powerpc/include/asm/kvm_host.h | 3 ++ >arch/powerpc/kernel/asm-offsets.c | 3 ++ >arch/powerpc/kvm/book3s.c | 34 + >arch/powerpc/kvm/book3s_emulate.c | 53 >arch/powerpc/kvm/book3s_hv.c | 30 --- >arch/powerpc/kvm/book3s_pr.c | 82 > +++ >arch/powerpc/kvm/book3s_segment.S | 25 ++ >9 files changed, 212 insertions(+), 38 deletions(-) > I did most of this as part of [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com Any reason why that is not picked up ? TM was the reason I didn't push the patchset again. I was not sure how to get all the TM details to work. >>> Ugh, I guess I mostly discarded it as brainstorm patches because they >>> were marked RFC :( >>> >> Do you want me to rework them ?. I guess facility unavailable part and >> TM part in this series are better than what I had. Rest all are more or >> less similar. Or you could cherry pick the SPR handling you haven't >> added yet from this series ? 
> > I personally refuse to apply patches that are marked RFC, since IMHO on > those the author himself isn't sure he wants them applied yet :). > > I'd say I'll just apply mine after another autotest run and then you > rebase your things on top and fill the gaps with a real, non-RFC patch set. Will do -aneesh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
On 05/05/2014 04:38 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: When running on a POWER8 host, we get away with running the guest as POWER7 and nothing falls apart. However, when we start exposing POWER8 as guest CPU, guests will start using new abilities on POWER8 which we need to handle. This patch set does a minimalistic approach to implementing those bits to make guests happy enough to run. Alex Alexander Graf (6): KVM: PPC: Book3S PR: Ignore PMU SPRs KVM: PPC: Book3S PR: Emulate TIR register KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR KVM: PPC: Book3S PR: Expose TAR facility to guest KVM: PPC: Book3S PR: Expose EBB registers KVM: PPC: Book3S PR: Expose TM registers arch/powerpc/include/asm/kvm_asm.h| 18 --- arch/powerpc/include/asm/kvm_book3s_asm.h | 2 + arch/powerpc/include/asm/kvm_host.h | 3 ++ arch/powerpc/kernel/asm-offsets.c | 3 ++ arch/powerpc/kvm/book3s.c | 34 + arch/powerpc/kvm/book3s_emulate.c | 53 arch/powerpc/kvm/book3s_hv.c | 30 --- arch/powerpc/kvm/book3s_pr.c | 82 +++ arch/powerpc/kvm/book3s_segment.S | 25 ++ 9 files changed, 212 insertions(+), 38 deletions(-) I did most of this as part of [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com Any reason why that is not picked up ? TM was the reason I didn't push the patchset again. I was not sure how to get all the TM details to work. Ugh, I guess I mostly discarded it as brainstorm patches because they were marked RFC :( Do you want me to rework them ?. I guess facility unavailable part and TM part in this series are better than what I had. Rest all are more or less similar. Or you could cherry pick the SPR handling you haven't added yet from this series ? I personally refuse to apply patches that are marked RFC, since IMHO on those the author himself isn't sure he wants them applied yet :). 
I'd say I'll just apply mine after another autotest run and then you rebase your things on top and fill the gaps with a real, non-RFC patch set. Alex
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
Alexander Graf writes: > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote: >> Signed-off-by: Aneesh Kumar K.V > > No patch description, no proper explanations anywhere why you're doing > what. All of that in a pretty sensitive piece of code. There's no way > this patch can go upstream in its current form. > Sorry about being vague. Will add a better commit message. The goal is to export MPSS support to the guest if the host supports the same. MPSS support is exported via the penc encodings in "ibm,segment-page-sizes". The actual format can be found at htab_dt_scan_page_sizes. When the guest memory is backed by hugetlbfs we expose the penc encodings the host supports to the guest via kvmppc_add_seg_page_size. Now the challenge for THP support is to make sure that our H_ENTER, H_REMOVE etc. handlers decode the base page size and actual page size correctly from the hash table entry values. Most of the changes are to do that. The rest is already handled by KVM. NOTE: It is much easier to read the code after applying the patch rather than reading the diff. I have added comments around each step in the code. -aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; How about PA6T and G5s? Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
Alexander Graf writes: > On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: >> Alexander Graf writes: >> >>> When running on a POWER8 host, we get away with running the guest as POWER7 >>> and nothing falls apart. >>> >>> However, when we start exposing POWER8 as guest CPU, guests will start using >>> new abilities on POWER8 which we need to handle. >>> >>> This patch set does a minimalistic approach to implementing those bits to >>> make guests happy enough to run. >>> >>> >>> Alex >>> >>> Alexander Graf (6): >>>KVM: PPC: Book3S PR: Ignore PMU SPRs >>>KVM: PPC: Book3S PR: Emulate TIR register >>>KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR >>>KVM: PPC: Book3S PR: Expose TAR facility to guest >>>KVM: PPC: Book3S PR: Expose EBB registers >>>KVM: PPC: Book3S PR: Expose TM registers >>> >>> arch/powerpc/include/asm/kvm_asm.h| 18 --- >>> arch/powerpc/include/asm/kvm_book3s_asm.h | 2 + >>> arch/powerpc/include/asm/kvm_host.h | 3 ++ >>> arch/powerpc/kernel/asm-offsets.c | 3 ++ >>> arch/powerpc/kvm/book3s.c | 34 + >>> arch/powerpc/kvm/book3s_emulate.c | 53 >>> arch/powerpc/kvm/book3s_hv.c | 30 --- >>> arch/powerpc/kvm/book3s_pr.c | 82 >>> +++ >>> arch/powerpc/kvm/book3s_segment.S | 25 ++ >>> 9 files changed, 212 insertions(+), 38 deletions(-) >>> >> I did most of this as part of >> >> [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support >> http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com >> >> Any reason why that is not picked up ? TM was the reason I didn't push the >> patchset again. I was not sure how to get all the TM details to >> work. > > Ugh, I guess I mostly discarded it as brainstorm patches because they > were marked RFC :( > Do you want me to rework them ?. I guess facility unavailable part and TM part in this series are better than what I had. Rest all are more or less similar. Or you could cherry pick the SPR handling you haven't added yet from this series ? 
-aneesh
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Alexander Graf writes: > On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: >> We reserve 5% of total ram for CMA allocation and not using that can >> result in us running out of numa node memory with specific >> configuration. One caveat is we may not have node local hpt with pinned >> vcpu configuration. But currently libvirt also pins the vcpu to cpuset >> after creating hash page table. > > I don't understand the problem. Can you please elaborate? > > Let's take a system with 100GB RAM. We reserve around 5GB for htab allocation. Now if we use the rest of the available memory for hugetlbfs (because we want all the guests to be backed by huge pages), we would end up in a situation where we have a few GB of free RAM and 5GB of CMA reserve area. Now if we allow hash page table allocation to consume the free space, we would end up hitting page allocation failures for other non-movable kernel allocations even though we still have 5GB of CMA reserve space free. -aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Alexander Graf writes: > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: >> Although it's optional IBM POWER cpus always had DAR value set on >> alignment interrupt. So don't try to compute these values. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> Changes from V3: >> * Use make_dsisr instead of checking feature flag to decide whether to use >>saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) >> { >> +#ifdef CONFIG_PPC_BOOK3S_64 >> +return vcpu->arch.fault_dar; > > How about PA6T and G5s? > > Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. -aneesh
Re: [PATCH v4] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING
Il 05/05/2014 16:21, Christian Borntraeger ha scritto: On 28/04/14 18:39, Paolo Bonzini wrote: From: Christian Borntraeger Given all your work, What about From: Paolo Bonzini plus "Based on an inital patch from Christian Borntraeger" No big deal, I don't care about authorship that much. @@ -221,17 +225,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) unsigned long flags = (unsigned long)key; struct kvm_kernel_irq_routing_entry *irq; struct kvm *kvm = irqfd->kvm; + int idx; if (flags & POLLIN) { - rcu_read_lock(); - irq = rcu_dereference(irqfd->irq_entry); + idx = srcu_read_lock(&kvm->irq_srcu); + irq = srcu_dereference(irqfd->irq_entry, &kvm->irq_srcu); /* An event has been signaled, inject an interrupt */ if (irq) kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false); else schedule_work(&irqfd->inject); - rcu_read_unlock(); + srcu_read_unlock(&kvm->irq_srcu, idx); } if (flags & POLLHUP) { @@ -363,7 +368,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) } list_add_rcu(&irqfd->resampler_link, &irqfd->resampler->list); - synchronize_rcu(); + synchronize_srcu(&kvm->irq_srcu); No idea what resampler is, can this become time critical as well - iow do we need expedited here? It's for level-triggered interrupts. I decided that if synchronize_rcu was good enough before, synchronize_srcu will do after the patch. @@ -85,7 +86,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm, mutex_lock(&kvm->irq_lock); hlist_del_init_rcu(&kian->link); mutex_unlock(&kvm->irq_lock); - synchronize_rcu(); + synchronize_srcu_expedited(&kvm->irq_srcu); Hmm, looks like all callers are slow path (shutdown, deregister assigned dev). Couldnt we use the non expedited variant? ... but I have screwed up this one. Thanks, I'll change it. 
r = kvm_arch_init_vm(kvm, type); if (r) - goto out_err_nodisable; + goto out_err_no_disable; r = hardware_enable_all(); if (r) - goto out_err_nodisable; + goto out_err_no_disable; #ifdef CONFIG_HAVE_KVM_IRQCHIP INIT_HLIST_HEAD(&kvm->mask_notifier_list); @@ -473,10 +473,12 @@ static struct kvm *kvm_create_vm(unsigned long type) r = -ENOMEM; kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); if (!kvm->memslots) - goto out_err_nosrcu; + goto out_err_no_srcu; kvm_init_memslots_id(kvm); if (init_srcu_struct(&kvm->srcu)) - goto out_err_nosrcu; + goto out_err_no_srcu; + if (init_srcu_struct(&kvm->irq_srcu)) + goto out_err_no_irq_srcu; for (i = 0; i < KVM_NR_BUSES; i++) { kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL); @@ -505,10 +507,12 @@ static struct kvm *kvm_create_vm(unsigned long type) return kvm; out_err: + cleanup_srcu_struct(&kvm->irq_srcu); +out_err_no_irq_srcu: cleanup_srcu_struct(&kvm->srcu); -out_err_nosrcu: +out_err_no_srcu: hardware_disable_all(); -out_err_nodisable: +out_err_no_disable: the patch would be smaller without this change, but it makes the naming more consistent, so ok. Yeah, out_err_noirq_srcu or out_err_noirqsrcu are both very ugly. Thanks for the review, I'm making the small change to remove expedited and applying to kvm/queue. Paolo for (i = 0; i < KVM_NR_BUSES; i++) kfree(kvm->buses[i]); kfree(kvm->memslots); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING
On 28/04/14 18:39, Paolo Bonzini wrote: > From: Christian Borntraeger Given all your work, What about From: Paolo Bonzini plus "Based on an inital patch from Christian Borntraeger" > > When starting lots of dataplane devices the bootup takes very long on > Christian's s390 with irqfd patches. With larger setups he is even > able to trigger some timeouts in some components. Turns out that the > KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec) > when having multiple CPUs. This is caused by the synchronize_rcu and > the HZ=100 of s390. By changing the code to use a private srcu we can > speed things up. This patch reduces the boot time till mounting root > from 8 to 2 seconds on my s390 guest with 100 disks. > > Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu > are fine because they do not have lockdep checks (hlist_for_each_entry_rcu > uses rcu_dereference_raw rather than rcu_dereference, and write-sides > do not do rcu lockdep at all). > > Note that we're hardly relying on the "sleepable" part of srcu. We just > want SRCU's faster detection of grace periods. > > Testing was done by Andrew Theurer using NETPERF. The difference between > results "before" and "after" the patch has mean -0.2% and standard deviation > 0.6%. Using a paired t-test on the data points says that there is a 2.5% > probability that the patch is the cause of the performance difference > (rather than a random fluctuation). > > Cc: Marcelo Tosatti > Cc: Michael S. Tsirkin > Signed-off-by: Christian Borntraeger > Signed-off-by: Paolo Bonzini Some questions regarding expedided vs. non expedited and a comment without a necessary action. 
Otherwise Reviewed-by: Christian Borntraeger Tested-by: Christian Borntraeger # on s390 > --- > include/linux/kvm_host.h | 1 + > virt/kvm/eventfd.c | 25 +++-- > virt/kvm/irq_comm.c | 17 + > virt/kvm/irqchip.c | 31 --- > virt/kvm/kvm_main.c | 16 ++-- > 5 files changed, 51 insertions(+), 39 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 820fc2e1d9df..cd0df9a9352d 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -368,6 +368,7 @@ struct kvm { > struct mm_struct *mm; /* userspace tied to this vm */ > struct kvm_memslots *memslots; > struct srcu_struct srcu; > + struct srcu_struct irq_srcu; > #ifdef CONFIG_KVM_APIC_ARCHITECTURE > u32 bsp_vcpu_id; > #endif > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 912ec5a95e2c..20c3af7692c5 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > #include > > #include "iodev.h" > @@ -118,19 +119,22 @@ static void > irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian) > { > struct _irqfd_resampler *resampler; > + struct kvm *kvm; > struct _irqfd *irqfd; > + int idx; > > resampler = container_of(kian, struct _irqfd_resampler, notifier); > + kvm = resampler->kvm; > > - kvm_set_irq(resampler->kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID, > + kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID, > resampler->notifier.gsi, 0, false); > > - rcu_read_lock(); > + idx = srcu_read_lock(&kvm->irq_srcu); > > list_for_each_entry_rcu(irqfd, &resampler->list, resampler_link) > eventfd_signal(irqfd->resamplefd, 1); > > - rcu_read_unlock(); > + srcu_read_unlock(&kvm->irq_srcu, idx); > } > > static void > @@ -142,7 +146,7 @@ irqfd_resampler_shutdown(struct _irqfd *irqfd) > mutex_lock(&kvm->irqfds.resampler_lock); > > list_del_rcu(&irqfd->resampler_link); > - synchronize_rcu(); > + synchronize_srcu(&kvm->irq_srcu); > > if (list_empty(&resampler->list)) { > list_del(&resampler->link); > @@ -221,17 +225,18 
@@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int > sync, void *key) > unsigned long flags = (unsigned long)key; > struct kvm_kernel_irq_routing_entry *irq; > struct kvm *kvm = irqfd->kvm; > + int idx; > > if (flags & POLLIN) { > - rcu_read_lock(); > - irq = rcu_dereference(irqfd->irq_entry); > + idx = srcu_read_lock(&kvm->irq_srcu); > + irq = srcu_dereference(irqfd->irq_entry, &kvm->irq_srcu); > /* An event has been signaled, inject an interrupt */ > if (irq) > kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, > false); > else > schedule_work(&irqfd->inject); > - rcu_read_unlock(); > + srcu_read_unlock(&kvm->irq_srcu, idx); > } > > if (flags & POLLHUP) { > @@ -363,7 +368,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) >
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote: > On 05/05/2014 03:27 AM, Gavin Shan wrote: > > The series of patches intends to support EEH for PCI devices, which have > > been > > passed through to PowerKVM based guest via VFIO. The implementation is > > straightforward based on the issues or problems we have to resolve to > > support > > EEH for PowerKVM based guest. > > > > - Emulation for EEH RTAS requests. Thanksfully, we already have > > infrastructure > >to emulate XICS. Without introducing new mechanism, we just extend that > >existing infrastructure to support EEH RTAS emulation. EEH RTAS requests > >initiated from guest are posted to host where the requests get handled or > >delivered to underly firmware for further handling. For that, the host > > kerenl > >has to maintain the PCI address (host domain/bus/slot/function to guest's > >PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address > > mapping > >will be built when initializing VFIO device in QEMU and destroied when > > the > >VFIO device in QEMU is going to offline, or VM is destroy. > > Do you also expose all those interfaces to user space? VFIO is as much > about user space device drivers as it is about device assignment. > > I would like to first see an implementation that doesn't touch KVM > emulation code at all but instead routes everything through QEMU. As a > second step we can then accelerate performance critical paths inside of KVM. > > That way we ensure that user space device drivers have all the power > over a device they need to drive it. +1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvmclock: Ensure time in migration never goes backward
When we migrate we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm returned time. To make sure we never go backwards, calculate what the guest would have seen as time at the point of migration and use that value instead of the kernel returned one when it's more recent. While this doesn't fix the underlying issue that the kernel's view of time is skewed, it allows us to safely migrate guests even from sources that are known broken. Signed-off-by: Alexander Graf --- hw/i386/kvm/clock.c | 48 1 file changed, 48 insertions(+) diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c index 892aa02..c6521cf 100644 --- a/hw/i386/kvm/clock.c +++ b/hw/i386/kvm/clock.c @@ -14,6 +14,7 @@ */ #include "qemu-common.h" +#include "qemu/host-utils.h" #include "sysemu/sysemu.h" #include "sysemu/kvm.h" #include "hw/sysbus.h" @@ -34,6 +35,47 @@ typedef struct KVMClockState { bool clock_valid; } KVMClockState; +struct pvclock_vcpu_time_info { +uint32_t version; +uint32_t pad0; +uint64_t tsc_timestamp; +uint64_t system_time; +uint32_t tsc_to_system_mul; +int8_t tsc_shift; +uint8_t flags; +uint8_t pad[2]; +} __attribute__((__packed__)); /* 32 bytes */ + +static uint64_t kvmclock_current_nsec(KVMClockState *s) +{ +CPUState *cpu = first_cpu; +CPUX86State *env = cpu->env_ptr; +hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL; +uint64_t migration_tsc = env->tsc; +struct pvclock_vcpu_time_info time; +uint64_t delta; +uint64_t nsec_lo; +uint64_t nsec_hi; +uint64_t nsec; + +if (!(env->system_time_msr & 1ULL)) { +/* KVM clock not active */ +return 0; +} + +cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time)); + +delta = migration_tsc - time.tsc_timestamp; +if (time.tsc_shift < 0) { +delta >>= -time.tsc_shift; +} else { +delta <<= time.tsc_shift; +} + +mulu64(&nsec_lo, &nsec_hi, delta, time.tsc_to_system_mul); +nsec = (nsec_lo >> 32) | (nsec_hi << 32); 
+return nsec + time.system_time; +} static void kvmclock_vm_state_change(void *opaque, int running, RunState state) @@ -45,9 +87,15 @@ static void kvmclock_vm_state_change(void *opaque, int running, if (running) { struct kvm_clock_data data; +uint64_t time_at_migration = kvmclock_current_nsec(s); s->clock_valid = false; +if (time_at_migration > s->clock) { +fprintf(stderr, "KVM Clock migrated backwards, using later time\n"); +s->clock = time_at_migration; +} + data.clock = s->clock; data.flags = 0; ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data); -- 1.7.12.4
Re: [PATCH 09/11] perf kvm: use defines of kvm events
Il 25/04/2014 11:12, Christian Borntraeger ha scritto: From: Alexander Yarygin Currently perf-kvm uses string literals for kvm event names, but it works only for x86, because other architectures may have other names for those events. This patch introduces defines for kvm_entry and kvm_exit events and lets perf-kvm replace literals. Signed-off-by: Alexander Yarygin Reviewed-by: Cornelia Huck Signed-off-by: Christian Borntraeger --- arch/x86/include/uapi/asm/kvm.h | 8 tools/perf/builtin-kvm.c| 10 -- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index d3a8778..88c0099 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -8,6 +8,8 @@ #include #include +#include +#include #define DE_VECTOR 0 #define DB_VECTOR 1 @@ -342,4 +344,10 @@ struct kvm_xcrs { struct kvm_sync_regs { }; +#define VCPU_ID "vcpu_id" + +#define KVM_ENTRY "kvm:kvm_entry" +#define KVM_EXIT "kvm:kvm_exit" +#define KVM_EXIT_REASON "exit_reason" + #endif /* _ASM_X86_KVM_H */ What about adding a new asm/kvm-perf.h header instead? 1) I don't like very much the namespace pollution that the first hunk causes (and the second one isn't really pretty either). 2) perf doesn't need most of uapi/asm/kvm.h, in fact it only needs a couple of #defines because it is a dependency of uapi/asm/svm.h. So it is uapi/asm/svm.h that should include uapi/asm/kvm.h, not perf. 
Paolo diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 806c0e4..9a162ae 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -30,8 +30,6 @@ #include #ifdef HAVE_KVM_STAT_SUPPORT -#include -#include #include struct event_key { @@ -130,12 +128,12 @@ static void exit_event_get_key(struct perf_evsel *evsel, struct event_key *key) { key->info = 0; - key->key = perf_evsel__intval(evsel, sample, "exit_reason"); + key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON); } static bool kvm_exit_event(struct perf_evsel *evsel) { - return !strcmp(evsel->name, "kvm:kvm_exit"); + return !strcmp(evsel->name, KVM_EXIT); } static bool exit_event_begin(struct perf_evsel *evsel, @@ -151,7 +149,7 @@ static bool exit_event_begin(struct perf_evsel *evsel, static bool kvm_entry_event(struct perf_evsel *evsel) { - return !strcmp(evsel->name, "kvm:kvm_entry"); + return !strcmp(evsel->name, KVM_ENTRY); } static bool exit_event_end(struct perf_evsel *evsel, @@ -557,7 +555,7 @@ struct vcpu_event_record *per_vcpu_record(struct thread *thread, return NULL; } - vcpu_record->vcpu_id = perf_evsel__intval(evsel, sample, "vcpu_id"); + vcpu_record->vcpu_id = perf_evsel__intval(evsel, sample, VCPU_ID); thread->priv = vcpu_record; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On 05/05/2014 03:27 AM, Gavin Shan wrote: The series of patches intends to support EEH for PCI devices, which have been passed through to PowerKVM based guest via VFIO. The implementation is straightforward based on the issues or problems we have to resolve to support EEH for PowerKVM based guest. - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure to emulate XICS. Without introducing a new mechanism, we just extend that existing infrastructure to support EEH RTAS emulation. EEH RTAS requests initiated from the guest are posted to the host where the requests get handled or delivered to the underlying firmware for further handling. For that, the host kernel has to maintain the PCI address (host domain/bus/slot/function to guest's PHB BUID/bus/slot/function) mapping via the KVM VFIO device. The address mapping will be built when initializing the VFIO device in QEMU and destroyed when the VFIO device in QEMU is going offline, or the VM is destroyed. Do you also expose all those interfaces to user space? VFIO is as much about user space device drivers as it is about device assignment. I would like to first see an implementation that doesn't touch KVM emulation code at all but instead routes everything through QEMU. As a second step we can then accelerate performance critical paths inside of KVM. That way we ensure that user space device drivers have all the power over a device they need to drive it. Alex
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote: Signed-off-by: Aneesh Kumar K.V No patch description, no proper explanations anywhere why you're doing what. All of that in a pretty sensitive piece of code. There's no way this patch can go upstream in its current form. Alex
Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
On 05/04/2014 07:26 PM, Aneesh Kumar K.V wrote: With debug option "sleep inside atomic section checking" enabled we get the below WARN_ON during a PR KVM boot. This is because upstream now have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the warning by adding preempt_disable/enable around floating point and altivec enable. WARNING: at arch/powerpc/kernel/process.c:156 Modules linked in: kvm_pr kvm CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: GW 3.15.0-rc1+ #4 task: c000eb85b3a0 ti: c000ec59c000 task.ti: c000ec59c000 NIP: c0015c84 LR: d3334644 CTR: c0015c00 REGS: c000ec59f140 TRAP: 0700 Tainted: GW (3.15.0-rc1+) MSR: 80029032 CR: 4224 XER: 2000 CFAR: c0015c24 SOFTE: 1 GPR00: d3334644 c000ec59f3c0 c0e2fa40 c000e2f8 GPR04: 0800 2000 0001 8000 GPR08: 0001 0001 2000 c0015c00 GPR12: d333da18 cfb80900 GPR16: 3fffce4e0fa1 GPR20: 0010 0001 0002 100b9a38 GPR24: 0002 0013 GPR28: c000eb85b3a0 2000 c000e2f8 NIP [c0015c84] .enable_kernel_fp+0x84/0x90 LR [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] Call Trace: [c000ec59f3c0] [0010] 0x10 (unreliable) [c000ec59f430] [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] [c000ec59f4c0] [d324b380] .kvmppc_set_msr+0x30/0x50 [kvm] [c000ec59f530] [d3337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr] [c000ec59f5f0] [d324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm] [c000ec59f6c0] [d3336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr] [c000ec59f790] [d3338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr] [c000ec59f960] [d3336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr] [c000ec59f9f0] [d324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm] [c000ec59fa60] [d3249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm] [c000ec59faf0] [d3244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm] [c000ec59fcb0] [c0224e34] .do_vfs_ioctl+0x4d4/0x790 [c000ec59fd90] [c0225148] .SyS_ioctl+0x58/0xb0 [c000ec59fe30] [c000a1e4] syscall_exit+0x0/0x98 Signed-off-by: Aneesh Kumar K.V Thanks, applied to kvm-ppc-queue. 
Alex
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table. I don't understand the problem. Can you please elaborate? Alex Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index fb25ebc0af0c..f32896ffd784 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm); long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) { - unsigned long hpt; + unsigned long hpt = 0; struct revmap_entry *rev; struct page *page = NULL; long order = KVM_DEFAULT_HPT_ORDER; @@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) } kvm->arch.hpt_cma_alloc = 0; - /* -* try first to allocate it from the kernel page allocator. -* We keep the CMA reserved for failed allocation. 
-*/ - hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT | - __GFP_NOWARN, order - PAGE_SHIFT); - - /* Next try to allocate from the preallocated pool */ - if (!hpt) { - VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER); - page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); - if (page) { - hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); - kvm->arch.hpt_cma_alloc = 1; - } else - --order; + VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER); + page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); + if (page) { + hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); + kvm->arch.hpt_cma_alloc = 1; } /* Lastly try successively smaller sizes from the page allocator */
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:

Although it's optional, IBM POWER CPUs have always had the DAR value set on alignment interrupts, so don't try to compute these values.

Signed-off-by: Aneesh Kumar K.V
---
Changes from V3:
* Use make_dsisr instead of checking a feature flag to decide whether to use the saved dsisr or not

 arch/powerpc/include/asm/disassemble.h | 34 +++
 arch/powerpc/kernel/align.c            | 34 +--
 arch/powerpc/kvm/book3s_emulate.c      | 43 --
 3 files changed, 40 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 856f8deb557a..6330a61b875a 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -81,4 +81,38 @@ static inline unsigned int get_oc(u32 inst)
 {
 	return (inst >> 11) & 0x7fff;
 }
+
+#define IS_XFORM(inst)	(get_op(inst) == 31)
+#define IS_DSFORM(inst)	(get_op(inst) >= 56)
+
+/*
+ * Create a DSISR value from the instruction
+ */
+static inline unsigned make_dsisr(unsigned instr)
+{
+	unsigned dsisr;
+
+	/* bits 6:15 --> 22:31 */
+	dsisr = (instr & 0x03ff0000) >> 16;
+
+	if (IS_XFORM(instr)) {
+		/* bits 29:30 --> 15:16 */
+		dsisr |= (instr & 0x00000006) << 14;
+		/* bit 25 --> 17 */
+		dsisr |= (instr & 0x00000040) << 8;
+		/* bits 21:24 --> 18:21 */
+		dsisr |= (instr & 0x00000780) << 3;
+	} else {
+		/* bit 5 --> 17 */
+		dsisr |= (instr & 0x04000000) >> 12;
+		/* bits 1: 4 --> 18:21 */
+		dsisr |= (instr & 0x78000000) >> 17;
+		/* bits 30:31 --> 12:13 */
+		if (IS_DSFORM(instr))
+			dsisr |= (instr & 0x00000003) << 18;
+	}
+
+	return dsisr;
+}
 #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index 94908af308d8..34f55524d456 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -25,14 +25,13 @@
 #include
 #include
 #include
+#include

 struct aligninfo {
 	unsigned char len;
 	unsigned char flags;
 };

-#define IS_XFORM(inst)	(((inst) >> 26) == 31)
-#define IS_DSFORM(inst)	(((inst) >> 26) >= 56)

 #define INVALID	{ 0, 0 }

@@ -192,37 +191,6 @@ static struct aligninfo aligninfo[128] = {
 };

 /*
- * Create a DSISR value from the instruction
- */
-static inline unsigned make_dsisr(unsigned instr)
-{
-	unsigned dsisr;
-
-	/* bits 6:15 --> 22:31 */
-	dsisr = (instr & 0x03ff0000) >> 16;
-
-	if (IS_XFORM(instr)) {
-		/* bits 29:30 --> 15:16 */
-		dsisr |= (instr & 0x00000006) << 14;
-		/* bit 25 --> 17 */
-		dsisr |= (instr & 0x00000040) << 8;
-		/* bits 21:24 --> 18:21 */
-		dsisr |= (instr & 0x00000780) << 3;
-	} else {
-		/* bit 5 --> 17 */
-		dsisr |= (instr & 0x04000000) >> 12;
-		/* bits 1: 4 --> 18:21 */
-		dsisr |= (instr & 0x78000000) >> 17;
-		/* bits 30:31 --> 12:13 */
-		if (IS_DSFORM(instr))
-			dsisr |= (instr & 0x00000003) << 18;
-	}
-
-	return dsisr;
-}
-
-/*
  * The dcbz (data cache block zero) instruction
  * gives an alignment fault if used on non-cacheable
  * memory. We handle the fault mainly for the
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 99d40f8977e8..04c38f049dfd 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -569,48 +569,14 @@ unprivileged:

 u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
 {
-	u32 dsisr = 0;
-
-	/*
-	 * This is what the spec says about DSISR bits (not mentioned = 0):
-	 *
-	 * 12:13 [DS]	Set to bits 30:31
-	 * 15:16 [X]	Set to bits 29:30
-	 * 17    [X]	Set to bit 25
-	 *       [D/DS]	Set to bit 5
-	 * 18:21 [X]	Set to bits 21:24
-	 *       [D/DS]	Set to bits 1:4
-	 * 22:26	Set to bits 6:10 (RT/RS/FRT/FRS)
-	 * 27:31	Set to bits 11:15 (RA)
-	 */
-
-	switch (get_op(inst)) {
-	/* D-form */
-	case OP_LFS:
-	case OP_LFD:
-	case OP_STFD:
-	case OP_STFS:
-		dsisr |= (inst >> 12) & 0x4000;	/* bit 17 */
-		dsisr |= (inst >> 17) & 0x3c00;	/* bits 18:21 */
-		break;
-	/* X-form */
-	case 31:
-		dsisr |= (inst << 14) & 0x18000;	/* bits 15:16 */
-		dsisr |= (inst <
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: Alexander Graf writes:

When running on a POWER8 host, we get away with running the guest as POWER7 and nothing falls apart. However, when we start exposing POWER8 as the guest CPU, guests will start using new POWER8 abilities which we need to handle. This patch set takes a minimalistic approach to implementing those bits, to make guests happy enough to run.

Alex

Alexander Graf (6):
  KVM: PPC: Book3S PR: Ignore PMU SPRs
  KVM: PPC: Book3S PR: Emulate TIR register
  KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR
  KVM: PPC: Book3S PR: Expose TAR facility to guest
  KVM: PPC: Book3S PR: Expose EBB registers
  KVM: PPC: Book3S PR: Expose TM registers

 arch/powerpc/include/asm/kvm_asm.h        | 18 ---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  2 +
 arch/powerpc/include/asm/kvm_host.h       |  3 ++
 arch/powerpc/kernel/asm-offsets.c         |  3 ++
 arch/powerpc/kvm/book3s.c                 | 34 +
 arch/powerpc/kvm/book3s_emulate.c         | 53
 arch/powerpc/kvm/book3s_hv.c              | 30 ---
 arch/powerpc/kvm/book3s_pr.c              | 82 +++
 arch/powerpc/kvm/book3s_segment.S         | 25 ++
 9 files changed, 212 insertions(+), 38 deletions(-)

I did most of this as part of
[RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support
http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

Any reason why that is not picked up? TM was the reason I didn't push the patch set again; I was not sure how to get all the TM details to work.

Ugh, I guess I mostly discarded it as brainstorm patches because they were marked RFC :(

Alex
Re: [PATCH V5] KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest
On 05/05/2014 05:09 AM, Aneesh Kumar K.V wrote:

This patch makes sure we inherit the LE bit correctly in the different cases, so that we can run a little-endian distro in PR mode.

Signed-off-by: Aneesh Kumar K.V

Thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH 11/11] perf kvm: add stat support on s390
On 25/04/14 11:12, Christian Borntraeger wrote:
> +#if defined(__i386__) || defined(__x86_64__)
>  	else if (!strcmp(kvm->report_event, "mmio"))
>  		kvm->events_ops = &mmio_events;
>  	else if (!strcmp(kvm->report_event, "ioport"))
>  		kvm->events_ops = &ioport_events;
> +#endif

To address David's review, the next version will have this hunk as well:

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index 52276a6..e974749 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -103,8 +103,8 @@ STAT REPORT OPTIONS
 	analyze events which occures on this vcpu. (default: all vcpus)

 --event=<value>::
-	event to be analyzed. Possible values: vmexit, mmio, ioport.
-	(default: vmexit)
+	event to be analyzed. Possible values: vmexit, mmio (x86 only),
+	ioport (x86 only). (default: vmexit)

 -k::
 --key=<value>::
 	Sorting key. Possible values: sample (default, sort by samples
Re: [PATCH/RFC 00/11] perf/s390/kvm: trace events, perf kvm stat
On 02/05/14 20:14, David Ahern wrote:
> On 5/2/14, 3:16 AM, Jiri Olsa wrote:
[...]
>> CC-ing David Ahern
>
> I don't have the original emails, but looking at
> https://lkml.org/lkml/2014/4/25/331
>
> [PATCH 01/11] s390: add sie exit reasons tables
> [PATCH 02/11] KVM: s390: Use trace tables from sie.h
> [PATCH 03/11] KVM: s390: decoder of SIE intercepted instructions
> [PATCH 04/11] KVM: s390: Use intercept_insn decoder in trace event
> - not perf related
>
> [PATCH 05/11] perf kvm: Intoduce HAVE_KVM_STAT_SUPPORT flag
> [PATCH 06/11] perf kvm: simplify of exit reasons tables definitions
> [PATCH 07/11] perf kvm: Refactoring of cpu_isa_config()
> [PATCH 10/11] perf: allow to use cpuinfo on s390
> Reviewed-by: David Ahern
>
> [PATCH 09/11] perf kvm: use defines of kvm events
> - KVM team should ack kvm.h change

Paolo, any chance to ack these changes?

> - perf side looks fine to me
>
> [PATCH 11/11] perf kvm: add stat support on s390
> - like to see the arch bits moved to arch/x86 and arch/s390 rather than
>   adding #ifdefs
> - disabling ioport and mmio options is ok, but if you are going to compile
>   it out, update the documentation accordingly.
>
> David

Thanks. The question now is how to proceed:

Patches 1-4 are s390/kvm specific. I am the s390/kvm maintainer, so I can hereby ack them.
Patches 5-10 are perf specific.
Patch 11 is s390/kvm/perf specific and needs both patch series as a base.

I see several variants for the next submission:
a: all patches via Paolo's KVM tree
b: all patches via the perf tree (e.g. via Jiri)
c: via both trees (e.g. I prepare a git branch based on 3.15-rc1 so that during the next merge window the common history should make most things work out fine)
d: patches 1-4 via KVM, patches 5-10 via perf, patch 11 after both trees are merged

Christian
Re: [PATCH 08/11] perf kvm: allow for variable string sizes
David, thanks for the review. Are you ok with this change as well? The alternative is to shorten our descriptions (in 1/11 "s390: add sie exit reasons tables"), which would make the trace output less comprehensible, though.

Christian

On 25/04/14 11:12, Christian Borntraeger wrote:
> From: Alexander Yarygin
>
> This makes it possible for other architectures to decode to different
> string lengths.
>
> Needed by follow-up patch "perf kvm: add stat support on s390".
>
> Signed-off-by: Alexander Yarygin
> Signed-off-by: Christian Borntraeger
> ---
>  tools/perf/builtin-kvm.c | 38 +++++++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 922706c..806c0e4 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -75,7 +75,7 @@ struct kvm_events_ops {
>  	bool (*is_end_event)(struct perf_evsel *evsel,
>  			     struct perf_sample *sample, struct event_key *key);
>  	void (*decode_key)(struct perf_kvm_stat *kvm, struct event_key *key,
> -			   char decode[20]);
> +			   char *decode);
>  	const char *name;
>  };
>
> @@ -84,6 +84,8 @@ struct exit_reasons_table {
>  	const char *reason;
>  };
>
> +#define DECODE_STR_LEN_MAX 80
> +
>  #define EVENTS_BITS		12
>  #define EVENTS_CACHE_SIZE	(1UL << EVENTS_BITS)
>
> @@ -101,6 +103,8 @@ struct perf_kvm_stat {
>  	struct exit_reasons_table *exit_reasons;
>  	const char *exit_reasons_isa;
>
> +	int decode_str_len;
> +
>  	struct kvm_events_ops *events_ops;
>  	key_cmp_fun compare;
>  	struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
> @@ -182,12 +186,12 @@ static const char *get_exit_reason(struct perf_kvm_stat *kvm,
>
>  static void exit_event_decode_key(struct perf_kvm_stat *kvm,
>  				  struct event_key *key,
> -				  char decode[20])
> +				  char *decode)
>  {
>  	const char *exit_reason = get_exit_reason(kvm, kvm->exit_reasons,
>  						  key->key);
>
> -	scnprintf(decode, 20, "%s", exit_reason);
> +	scnprintf(decode, kvm->decode_str_len, "%s", exit_reason);
>  }
>
>  static struct kvm_events_ops exit_events = {
> @@ -249,10 +253,11 @@ static bool mmio_event_end(struct perf_evsel *evsel, struct perf_sample *sample,
>
>  static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
>  				  struct event_key *key,
> -				  char decode[20])
> +				  char *decode)
>  {
> -	scnprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
> -		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
> +	scnprintf(decode, kvm->decode_str_len, "%#lx:%s",
> +		  (unsigned long)key->key,
> +		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
>  }
>
>  static struct kvm_events_ops mmio_events = {
> @@ -292,10 +297,11 @@ static bool ioport_event_end(struct perf_evsel *evsel,
>
>  static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
>  				    struct event_key *key,
> -				    char decode[20])
> +				    char *decode)
>  {
> -	scnprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
> -		  key->info ? "POUT" : "PIN");
> +	scnprintf(decode, kvm->decode_str_len, "%#llx:%s",
> +		  (unsigned long long)key->key,
> +		  key->info ? "POUT" : "PIN");
>  }
>
>  static struct kvm_events_ops ioport_events = {
> @@ -523,13 +529,13 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
>  	time_diff = sample->time - time_begin;
>
>  	if (kvm->duration && time_diff > kvm->duration) {
> -		char decode[32];
> +		char decode[DECODE_STR_LEN_MAX];
>
>  		kvm->events_ops->decode_key(kvm, &event->key, decode);
>  		if (strcmp(decode, "HLT")) {
> -			pr_info("%" PRIu64 " VM %d, vcpu %d: %s event took %" PRIu64 "usec\n",
> +			pr_info("%" PRIu64 " VM %d, vcpu %d: %*s event took %" PRIu64 "usec\n",
>  				sample->time, sample->pid, vcpu_record->vcpu_id,
> -				decode, time_diff/1000);
> +				32, decode, time_diff/1000);
>  		}
>  	}
>
> @@ -738,7 +744,7 @@ static void show_timeofday(void)
>
>  static void print_result(struct perf_kvm_stat *kvm)
>  {
> -	char decode[20];
> +	char decode[DECODE_STR_LEN_MAX];
>  	struct kvm_event *event;
>  	int vcpu = kvm->trace_vcpu;
>
> @@ -749,7 +755,7 @@ static void print_result(struct perf_kvm_stat *kvm)
>
>  	pr_info("\n\n");
>  	print_vcpu_info(kvm);
> -	pr_info("%2
Re: [patch] KVM: s390: return -EFAULT if copy_from_user() fails
On 03/05/14 22:18, Dan Carpenter wrote:
> When copy_from_user() fails, this code returns the number of bytes
> remaining instead of a negative error code. The positive number is
> returned to the user but otherwise it is harmless.
>
> Signed-off-by: Dan Carpenter

Thanks. Applied to the KVM/s390 fix queue.

> ---
> I am not able to compile this.
>
> diff --git a/arch/s390/kvm/guestdbg.c b/arch/s390/kvm/guestdbg.c
> index 757ccef..3e8d409 100644
> --- a/arch/s390/kvm/guestdbg.c
> +++ b/arch/s390/kvm/guestdbg.c
> @@ -223,9 +223,10 @@ int kvm_s390_import_bp_data(struct kvm_vcpu *vcpu,
>  		goto error;
>  	}
>
> -	ret = copy_from_user(bp_data, dbg->arch.hw_bp, size);
> -	if (ret)
> +	if (copy_from_user(bp_data, dbg->arch.hw_bp, size)) {
> +		ret = -EFAULT;
>  		goto error;
> +	}
>
>  	for (i = 0; i < dbg->arch.nr_hw_bp; i++) {
>  		switch (bp_data[i].type) {
Re: [PATCH 0/1] KVM: x86: improve the usability of the 'kvm_pio' tracepoint
On 05/02/2014 11:57 PM, Ulrich Obergfell wrote:
> The current implementation of the 'kvm_pio' tracepoint in emulator_pio_in_out()
> only tells us that 'something' has been read from or written to an I/O port. To
> improve the usability of the tracepoint, I propose to include the value/content
> that has been read or written in the trace output. The proposed patch aims at
> the more common case where a single 8-bit or 16-bit or 32-bit value has been
> read or written -- it does not fully cover the case where 'count' is greater
> than one.
>
> This is an example of what the patch can do (trace of PCI config space access).
>
> - on the host
>
>    # trace-cmd record -e kvm:kvm_pio -f "(port >= 0xcf8) && (port <= 0xcff)"
>    /sys/kernel/debug/tracing/events/kvm/kvm_pio/filter
>    Hit Ctrl^C to stop recording
>
> - in a Linux guest
>
>    # dd if=/sys/bus/pci/devices/:00:06.0/config bs=2 count=4 | hexdump
>    4+0 records in
>    4+0 records out
>    8 bytes (8 B) copied, 0.000114056 s, 70.1 kB/s
>    0000000 1af4 1001 0507 0010
>    0000008
>
> - on the host
>
>    # trace-cmd report
>    ...
>    qemu-kvm-23216 [001] 15211.994089: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>    qemu-kvm-23216 [001] 15211.994108: kvm_pio: pio_read  at 0xcfc size 2 count 1 val 0x1af4
>    qemu-kvm-23216 [001] 15211.994129: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>    qemu-kvm-23216 [001] 15211.994136: kvm_pio: pio_read  at 0xcfe size 2 count 1 val 0x1001
>    qemu-kvm-23216 [001] 15211.994143: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>    qemu-kvm-23216 [001] 15211.994150: kvm_pio: pio_read  at 0xcfc size 2 count 1 val 0x507
>    qemu-kvm-23216 [001] 15211.994155: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>    qemu-kvm-23216 [001] 15211.994161: kvm_pio: pio_read  at 0xcfe size 2 count 1 val 0x10

Nice. Could you please check "perf kvm stat" to see if "--event=ioport" can work after your patch?

Reviewed-by: Xiao Guangrong