Re: [PATCH v4 4/6] kvm, mem-hotplug: Reload L1' apic access page on migration in vcpu_enter_guest().
Hi Gleb, On 09/03/2014 11:04 PM, Gleb Natapov wrote: On Wed, Sep 03, 2014 at 09:42:30AM +0800, tangchen wrote: Hi Gleb, On 09/03/2014 12:00 AM, Gleb Natapov wrote: .. +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) +{ + /* +* apic access page could be migrated. When the page is being migrated, +* GUP will wait till the migrate entry is replaced with the new pte +* entry pointing to the new page. +*/ + vcpu->kvm->arch.apic_access_page = gfn_to_page(vcpu->kvm, + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); + kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm, + page_to_phys(vcpu->kvm->arch.apic_access_page)); I am a little bit worried that here all vcpus write to vcpu->kvm->arch.apic_access_page without any locking. It is probably benign since pointer write is atomic on x86. Paolo? Do we even need apic_access_page? Why not call gfn_to_page(APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT) put_page() on rare occasions we need to know its address? Isn't it a necessary item defined in hardware spec ? vcpu->kvm->arch.apic_access_page? No. This is internal kvm data structure. I didn't read intel spec deeply, but according to the code, the page's address is written into vmcs. And it made me think that we cannnot remove it. We cannot remove writing of apic page address into vmcs, but this is not done by assigning to vcpu->kvm->arch.apic_access_page, but by vmwrite in set_apic_access_page_addr(). OK, I'll try to remove kvm->arch.apic_access_page and send a patch for it soon. BTW, if you don't have objection to the first two patches, would you please help to commit them first ? Thanks. -- Gleb. . -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v5] powerpc/kvm: support to handle sw breakpoint
On Monday 08 September 2014 06:35 PM, Alexander Graf wrote: > > > On 07.09.14 18:31, Madhavan Srinivasan wrote: >> This patch adds kernel side support for software breakpoint. >> Design is that, by using an illegal instruction, we trap to hypervisor >> via Emulation Assistance interrupt, where we check for the illegal >> instruction >> and accordingly we return to Host or Guest. Patch also adds support for >> software breakpoint in PR KVM. >> >> Signed-off-by: Madhavan Srinivasan >> --- >> arch/powerpc/include/asm/kvm_ppc.h | 6 ++ >> arch/powerpc/kvm/book3s.c | 3 ++- >> arch/powerpc/kvm/book3s_hv.c | 41 >> ++ >> arch/powerpc/kvm/book3s_pr.c | 3 +++ >> arch/powerpc/kvm/emulate.c | 18 + >> 5 files changed, 66 insertions(+), 5 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/kvm_ppc.h >> b/arch/powerpc/include/asm/kvm_ppc.h >> index fb86a22..dd83c9a 100644 >> --- a/arch/powerpc/include/asm/kvm_ppc.h >> +++ b/arch/powerpc/include/asm/kvm_ppc.h >> @@ -38,6 +38,12 @@ >> #include >> #endif >> >> +/* >> + * KVMPPC_INST_SW_BREAKPOINT is debug Instruction >> + * for supporting software breakpoint. >> + */ >> +#define KVMPPC_INST_SW_BREAKPOINT 0x0000 >> + >> enum emulation_result { >> EMULATE_DONE, /* no further processing */ >> EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */ >> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c >> index dd03f6b..00e9c9f 100644 >> --- a/arch/powerpc/kvm/book3s.c >> +++ b/arch/powerpc/kvm/book3s.c >> @@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, >> int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, >> struct kvm_guest_debug *dbg) >> { >> -return -EINVAL; >> +vcpu->guest_debug = dbg->control; >> +return 0; >> } >> >> void kvmppc_decrementer_func(unsigned long data) >> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c >> index 27cced9..3a2414c 100644 >> --- a/arch/powerpc/kvm/book3s_hv.c >> +++ b/arch/powerpc/kvm/book3s_hv.c >> @@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd) >> return kvmppc_hcall_impl_hv_realmode(cmd); >> } >> >> +static int kvmppc_emulate_debug_inst(struct kvm_run *run, >> +struct kvm_vcpu *vcpu) >> +{ >> +u32 last_inst; >> + >> +if(kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) != >> +EMULATE_DONE) { >> +/* >> + * Fetch failed, so return to guest and >> + * try executing it again. >> + */ >> +return RESUME_GUEST; > > Please end the if() here. In the kernel we usually treat abort > situations as separate. > OK. Will change it. >> +} else { >> +if (last_inst == KVMPPC_INST_SW_BREAKPOINT) { >> +run->exit_reason = KVM_EXIT_DEBUG; >> +run->debug.arch.address = kvmppc_get_pc(vcpu); >> +return RESUME_HOST; >> +} else { >> +kvmppc_core_queue_program(vcpu, SRR1_PROGILL); >> +return RESUME_GUEST; >> +} >> +} >> +} >> + >> static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, >> struct task_struct *tsk) >> { >> @@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, >> struct kvm_vcpu *vcpu, >> break; >> /* >> * This occurs if the guest executes an illegal instruction. >> - * We just generate a program interrupt to the guest, since >> - * we don't emulate any guest instructions at this stage. >> + * If the guest debug is disabled, generate a program interrupt >> + * to the guest. If guest debug is enabled, we need to check >> + * whether the instruction is a software breakpoint instruction. >> + * Accordingly return to Guest or Host. >> */ >> case BOOK3S_INTERRUPT_H_EMUL_ASSIST: >> -kvmppc_core_queue_program(vcpu, SRR1_PROGILL); >> -r = RESUME_GUEST; >> +if(vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) { >> +r = kvmppc_emulate_debug_inst(run, vcpu); >> +} else { >> +kvmppc_core_queue_program(vcpu, SRR1_PROGILL); >> +r = RESUME_GUEST; >> +} >> break; >> /* >> * This occurs if the guest (kernel or userspace), does something that >> @@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, >> u64 id, >> long int i; >> >> switch (id) { >> +case KVM_REG_PPC_DEBUG_INST: >> +*val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT); >> +break; >> case KVM_REG_PPC_HIOR: >> *val = get_reg_val(id, 0); >> break; >> diff --git a/arch/p
[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
https://bugzilla.kernel.org/show_bug.cgi?id=81841 --- Comment #19 from Marti Raudsepp --- (In reply to Joel Schopp from comment #15) > It's not clear to me which devices were being put in the same group. Hi Joel, any updates on this? I posted my IOMMU groups in comment #17 in case you missed it. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2 v5] powerpc/kvm: common sw breakpoint instr across ppc
On Monday 08 September 2014 06:39 PM, Alexander Graf wrote: > > > On 07.09.14 18:31, Madhavan Srinivasan wrote: >> This patch extends the use of illegal instruction as software >> breakpoint instruction across the ppc platform. Patch extends >> booke program interrupt code to support software breakpoint. >> >> Signed-off-by: Madhavan Srinivasan >> --- >> >> Patch is only compile tested. Will really help if >> someone can try it out and let me know comments. >> >> arch/powerpc/kvm/booke.c | 18 -- >> 1 file changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c >> index b4c89fa..1b84853 100644 >> --- a/arch/powerpc/kvm/booke.c >> +++ b/arch/powerpc/kvm/booke.c >> @@ -870,6 +870,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct >> kvm_vcpu *vcpu, >> case BOOKE_INTERRUPT_HV_PRIV: >> emulated = kvmppc_get_last_inst(vcpu, false, &last_inst); >> break; >> +case BOOKE_INTERRUPT_PROGRAM: >> +/*SW breakpoints arrive as illegal instructions on HV */ > > Is it my email client or is there a space missing again? ;) > Facepalm. Will fix it. > Also, please only fetch the last instruction if debugging is active. > Will change it. >> +emulated = kvmppc_get_last_inst(vcpu, false, &last_inst); >> +break; >> default: >> break; >> } >> @@ -947,7 +951,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct >> kvm_vcpu *vcpu, >> break; >> >> case BOOKE_INTERRUPT_PROGRAM: >> -if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) { >> +if ((vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) && >> +(last_inst == KVMPPC_INST_SW_BREAKPOINT)) { > > I think this is changing the logic from "if the guest is in user mode or > we're in HV, deflect" to "if the guest is in user mode or an HV guest > and the instruction is a breakpoint, treat it as debug. Otherwise > deflect". So you're essentially breaking PR KVM here from what I can tell. > > Why don't you just split the whole thing out to the beginning of > BOOKE_INTERRUPT_PROGRAM and check for > > a) debug is enabled > b) instruction is sw breakpoint > This is what we pretty much do for the server side. Will changes it. > instead? > >> +/* >> + * We are here because of an SW breakpoint instr, >> + * so lets return to host to handle. >> + */ >> +r = kvmppc_handle_debug(run, vcpu); >> +run->exit_reason = KVM_EXIT_DEBUG; >> +kvmppc_account_exit(vcpu, DEBUG_EXITS); >> +break; >> +} else { >> /* >> * Program traps generated by user-level software must >> * be handled by the guest kernel. >> @@ -1505,7 +1519,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, >> struct kvm_one_reg *reg) >> val = get_reg_val(reg->id, vcpu->arch.tsr); >> break; >> case KVM_REG_PPC_DEBUG_INST: >> -val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV_DEBUG); > > Please also remove the definition of EHPRIV_DEBUG. > OK. Will do. Thanks for review Maddy > > Alex > >> +val = get_reg_val(reg->id, KVMPPC_INST_SW_BREAKPOINT); >> break; >> case KVM_REG_PPC_VRSAVE: >> val = get_reg_val(reg->id, vcpu->arch.vrsave); >> > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
On 9 September 2014 11:35, Marc Zyngier wrote: > Hi Ard, > > > On 2014-09-08 21:29, Ard Biesheuvel wrote: >> >> The ISS encoding for an exception from a Data Abort has a WnR >> bit[6] that indicates whether the Data Abort was caused by a >> read or a write instruction. While there are several fields >> in the encoding that are only valid if the ISV bit[24] is set, >> WnR is not one of them, so we can read it unconditionally. >> >> Signed-off-by: Ard Biesheuvel >> --- >> >> This fixes an issue I observed with UEFI running under QEMU/KVM using >> NOR flash emulation and the upcoming KVM_CAP_READONLY_MEM support, where >> NOR flash reads were mistaken for NOR flash writes, resulting in all read >> accesses to go through the MMIO emulation layer. >> >> arch/arm/include/asm/kvm_mmu.h | 5 + >> arch/arm64/include/asm/kvm_mmu.h | 5 + >> 2 files changed, 2 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h >> b/arch/arm/include/asm/kvm_mmu.h >> index 5cc0b0f5f72f..fad5648980ad 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -83,10 +83,7 @@ static inline bool kvm_is_write_fault(unsigned long >> hsr) >> unsigned long hsr_ec = hsr >> HSR_EC_SHIFT; >> if (hsr_ec == HSR_EC_IABT) >> return false; >> - else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR)) >> - return false; >> - else >> - return true; >> + return hsr & HSR_WNR; >> } >> >> static inline void kvm_clean_pgd(pgd_t *pgd) >> diff --git a/arch/arm64/include/asm/kvm_mmu.h >> b/arch/arm64/include/asm/kvm_mmu.h >> index 8e138c7c53ac..09fd9e4c13d8 100644 >> --- a/arch/arm64/include/asm/kvm_mmu.h >> +++ b/arch/arm64/include/asm/kvm_mmu.h >> @@ -100,10 +100,7 @@ static inline bool kvm_is_write_fault(unsigned long >> esr) >> if (esr_ec == ESR_EL2_EC_IABT) >> return false; >> >> - if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR)) >> - return false; >> - >> - return true; >> + return esr & ESR_EL2_WNR; >> } >> >> static inline void kvm_clean_pgd(pgd_t *pgd) {} > > > Nice catch. One thing though. > > This is a case where code duplication has led to this glaring bug: > On both arm and arm64, kvm_emulate.h has code that implements this > correctly, just that we failed to use it. Blame me. > > I think this should be rewritten entierely in mmu.c, with something like > this (fully untested, of course): > > static bool kvm_is_write_fault(struct kvm_vcpu *vcpu) > { > if (kvm_vcpu_trap_is_iabt(vcpu)) > return false; > > return kvm_vcpu_dabt_iswrite(vcpu); > } > > Care to respin it? > Will do. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
Hi Ard, On 2014-09-08 21:29, Ard Biesheuvel wrote: The ISS encoding for an exception from a Data Abort has a WnR bit[6] that indicates whether the Data Abort was caused by a read or a write instruction. While there are several fields in the encoding that are only valid if the ISV bit[24] is set, WnR is not one of them, so we can read it unconditionally. Signed-off-by: Ard Biesheuvel --- This fixes an issue I observed with UEFI running under QEMU/KVM using NOR flash emulation and the upcoming KVM_CAP_READONLY_MEM support, where NOR flash reads were mistaken for NOR flash writes, resulting in all read accesses to go through the MMIO emulation layer. arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 5cc0b0f5f72f..fad5648980ad 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -83,10 +83,7 @@ static inline bool kvm_is_write_fault(unsigned long hsr) unsigned long hsr_ec = hsr >> HSR_EC_SHIFT; if (hsr_ec == HSR_EC_IABT) return false; - else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR)) - return false; - else - return true; + return hsr & HSR_WNR; } static inline void kvm_clean_pgd(pgd_t *pgd) diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 8e138c7c53ac..09fd9e4c13d8 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -100,10 +100,7 @@ static inline bool kvm_is_write_fault(unsigned long esr) if (esr_ec == ESR_EL2_EC_IABT) return false; - if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR)) - return false; - - return true; + return esr & ESR_EL2_WNR; } static inline void kvm_clean_pgd(pgd_t *pgd) {} Nice catch. One thing though. This is a case where code duplication has led to this glaring bug: On both arm and arm64, kvm_emulate.h has code that implements this correctly, just that we failed to use it. Blame me. I think this should be rewritten entierely in mmu.c, with something like this (fully untested, of course): static bool kvm_is_write_fault(struct kvm_vcpu *vcpu) { if (kvm_vcpu_trap_is_iabt(vcpu)) return false; return kvm_vcpu_dabt_iswrite(vcpu); } Care to respin it? Thanks, M. -- Fast, cheap, reliable. Pick two. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: arm64: add gic-400 compatible
On Mon, Sep 08, 2014 at 11:21:44PM +0100, Joel Schopp wrote: > Add a one liner to identify the gic-400. It's gicv2 with optional MSI > extensions. > > Cc: Christoffer Dall > Signed-off-by: Joel Schopp > --- > virt/kvm/arm/vgic.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c > index 73eba79..e81444e 100644 > --- a/virt/kvm/arm/vgic.c > +++ b/virt/kvm/arm/vgic.c > @@ -1550,6 +1550,7 @@ static struct notifier_block vgic_cpu_nb = { > > static const struct of_device_id vgic_ids[] = { > { .compatible = "arm,cortex-a15-gic", .data = vgic_v2_probe, }, > + { .compatible = "arm,gic-400", .data = vgic_v2_probe, }, Given this is documented and is supported by the GIC driver: Acked-by: Mark Rutland Mark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
The ISS encoding for an exception from a Data Abort has a WnR bit[6] that indicates whether the Data Abort was caused by a read or a write instruction. While there are several fields in the encoding that are only valid if the ISV bit[24] is set, WnR is not one of them, so we can read it unconditionally. Instead of fixing both implementations of kvm_is_write_fault() in place, reimplement it just once using kvm_vcpu_dabt_iswrite(), which already does the right thing with respect to the WnR bit. Also fix up the callers to pass 'vcpu' Acked-by: Laszlo Ersek Signed-off-by: Ard Biesheuvel --- v2: reimplement locally in mmu.c and drop the 2 arch specific version add ack arch/arm/include/asm/kvm_mmu.h | 11 --- arch/arm/kvm/mmu.c | 12 ++-- arch/arm64/include/asm/kvm_mmu.h | 13 - 3 files changed, 10 insertions(+), 26 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 5cc0b0f5f72f..3f688b458143 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -78,17 +78,6 @@ static inline void kvm_set_pte(pte_t *pte, pte_t new_pte) flush_pmd_entry(pte); } -static inline bool kvm_is_write_fault(unsigned long hsr) -{ - unsigned long hsr_ec = hsr >> HSR_EC_SHIFT; - if (hsr_ec == HSR_EC_IABT) - return false; - else if ((hsr & HSR_ISV) && !(hsr & HSR_WNR)) - return false; - else - return true; -} - static inline void kvm_clean_pgd(pgd_t *pgd) { clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t)); diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 747ef4c97d98..c68ec28f17c3 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -746,6 +746,14 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap) return false; } +static bool kvm_is_write_fault(struct kvm_vcpu *vcpu) +{ + if (kvm_vcpu_trap_is_iabt(vcpu)) + return false; + + return kvm_vcpu_dabt_iswrite(vcpu); +} + static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) @@ -760,7 +768,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, pfn_t pfn; pgprot_t mem_type = PAGE_S2; - write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu)); + write_fault = kvm_is_write_fault(vcpu); if (fault_status == FSC_PERM && !write_fault) { kvm_err("Unexpected L2 read permission error\n"); return -EFAULT; @@ -886,7 +894,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) gfn = fault_ipa >> PAGE_SHIFT; memslot = gfn_to_memslot(vcpu->kvm, gfn); hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable); - write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu)); + write_fault = kvm_is_write_fault(vcpu); if (kvm_is_error_hva(hva) || (write_fault && !writable)) { if (is_iabt) { /* Prefetch Abort on I/O address */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 8e138c7c53ac..737da742b293 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -93,19 +93,6 @@ void kvm_clear_hyp_idmap(void); #definekvm_set_pte(ptep, pte) set_pte(ptep, pte) #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) -static inline bool kvm_is_write_fault(unsigned long esr) -{ - unsigned long esr_ec = esr >> ESR_EL2_EC_SHIFT; - - if (esr_ec == ESR_EL2_EC_IABT) - return false; - - if ((esr & ESR_EL2_ISV) && !(esr & ESR_EL2_WNR)) - return false; - - return true; -} - static inline void kvm_clean_pgd(pgd_t *pgd) {} static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} static inline void kvm_clean_pte(pte_t *pte) {} -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-rng: complete have_data completion in removing device
On Mon, Sep 08, 2014 at 11:29:51PM +0800, Amos Kong wrote: > On Wed, Aug 06, 2014 at 01:55:29PM +0530, Amit Shah wrote: > > On (Wed) 06 Aug 2014 [16:05:41], Amos Kong wrote: > > > On Wed, Aug 06, 2014 at 01:35:15AM +0800, Amos Kong wrote: > > > > When we try to hot-remove a busy virtio-rng device from QEMU monitor, > > > > the device can't be hot-removed. Because virtio-rng driver hangs at > > > > wait_for_completion_killable(). > > > > > > > > This patch fixed the hang by completing have_data completion before > > > > unregistering a virtio-rng device. > > > > > > Hi Amit, > > > > > > Before applying this patch, it's blocking insider > > > wait_for_completion_killable() > > > Applied this patch, wait_for_completion_killable() returns 0, > > > and vi->data_avail becomes 0, then rng_get_date() will return 0. > > > > Thanks for checking this. > > > > > Is it expected result? > > Hi Amit > > I think what will happen is vi->data_avail will be set to whatever it > > was set last. In case of a previous successful read request, the > > data_avail will be set to whatever number of bytes the host gave. On > > doing a hot-unplug on the succeeding wait, the value in data_avail > > will be re-used, and the hwrng core will wrongly take some bytes in > > the buffer as input from the host. > > > > So, I think we need to set vi->data_avail = 0; before calling > > wait_event_completion_killable(). I finally understand content, we need to set vi->data_avail to 0 before returning virtio_read(), it might enter wait_event_completion_killable() when we try to remove the device. We call complete() in remove_common(), then wait_event_completion_killable() will exit, but virtio_read() will be re-entered if the device still isn't unregistered, then re-stuck inside wait_event_completion_killable(). I tested some complex condition (both quick/slow backend), I found some problem in below two patches. I will post another fix later. The test result is expected. 1. Hotplug remove virtio-rng0, dd process will exit with an error: "dd: error reading ‘/dev/hwrng’: Operation not permitted" virtio-rng0 disappear from 'info pci' 2. Re-read by dd, hotplug virtio-rng1, dd process exit with same error, virtio-rng1 disappear Thanks, Amos > > > > Amit > > In my latest debugging, I found the hang is caused by unexpected reading > when we started to remove the device. > > I have two draft fix, 1) is skip unexpected reading by checking a > remove flag. 2) is unregistering device at the beginning of > remove_common(). I think second patch is better if it won't cause > new problem. > > The original patch (complete in remove_common()) is still necessary. > > Test results: hotplug issue disappeared (dd process will quit). > > > diff --git a/drivers/char/hw_random/virtio-rng.c > b/drivers/char/hw_random/virtio-rng.c > index 2e3139e..028797c 100644 > --- a/drivers/char/hw_random/virtio-rng.c > +++ b/drivers/char/hw_random/virtio-rng.c > @@ -35,6 +35,7 @@ struct virtrng_info { > unsigned int data_avail; > int index; > bool busy; > +bool remove; > bool hwrng_register_done; > }; > > @@ -68,6 +69,9 @@ static int virtio_read(struct hwrng *rng, void *buf, > size_t size, bool wait) > int ret; > struct virtrng_info *vi = (struct virtrng_info *)rng->priv; > > +if (vi->remove) > + return 0; > + > if (!vi->busy) { > vi->busy = true; > init_completion(&vi->have_data); > @@ -137,6 +141,8 @@ static void remove_common(struct virtio_device > *vdev) > { > struct virtrng_info *vi = vdev->priv; > > + vi->remove = true; > + complete(&vi->have_data); > vdev->config->reset(vdev); > vi->busy = false; > if (vi->hwrng_register_done) > > > diff --git a/drivers/char/hw_random/virtio-rng.c > b/drivers/char/hw_random/virtio-rng.c > index 2e3139e..9b8c2ce 100644 > --- a/drivers/char/hw_random/virtio-rng.c > +++ b/drivers/char/hw_random/virtio-rng.c > @@ -137,10 +137,11 @@ static void remove_common(struct virtio_device > *vdev) > { > struct virtrng_info *vi = vdev->priv; > > - vdev->config->reset(vdev); > - vi->busy = false; > if (vi->hwrng_register_done) > hwrng_unregister(&vi->hwrng); > + complete(&vi->have_data); > + vdev->config->reset(vdev); > + vi->busy = false; > vdev->config->del_vqs(vdev); > ida_simple_remove(&rng_index_ida, vi->index); > kfree(vi); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] ARM/arm64: KVM: fix use of WnR bit in kvm_is_write_fault()
[resending, as ARM email server seems to be in some mood] On 09/09/14 11:27, Ard Biesheuvel wrote: > The ISS encoding for an exception from a Data Abort has a WnR > bit[6] that indicates whether the Data Abort was caused by a > read or a write instruction. While there are several fields > in the encoding that are only valid if the ISV bit[24] is set, > WnR is not one of them, so we can read it unconditionally. > > Instead of fixing both implementations of kvm_is_write_fault() > in place, reimplement it just once using kvm_vcpu_dabt_iswrite(), > which already does the right thing with respect to the WnR bit. > Also fix up the callers to pass 'vcpu' > > Acked-by: Laszlo Ersek > Signed-off-by: Ard Biesheuvel Because I like that kind of diffstat: Acked-by: Marc Zyngier Christoffer, if you too are happy with that, I'll queue it right away. Thanks, M. -- Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device
When we try to hot-remove a busy virtio-rng device from QEMU monitor, the device can't be hot-removed. Because virtio-rng driver hangs at wait_for_completion_killable(). This patch exits the waiting by completing have_data completion before unregistering, resets data_avail to avoid the hwrng core use wrong buffer bytes. Before real unregistering we should return -ENODEV for reading. When we hot-unplug the device, dd process in guest will exit. $ dd if=/dev/hwrng of=/dev/null dd: error reading ‘/dev/hwrng’: No such device Signed-off-by: Amos Kong Cc: sta...@vger.kernel.org --- V2: reset data_avail (Amit) adjust unregister order --- drivers/char/hw_random/virtio-rng.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c index 2e3139e..e76433b 100644 --- a/drivers/char/hw_random/virtio-rng.c +++ b/drivers/char/hw_random/virtio-rng.c @@ -68,6 +68,10 @@ static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait) int ret; struct virtrng_info *vi = (struct virtrng_info *)rng->priv; + if (!vi->hwrng_register_done) { + return -ENODEV; + } + if (!vi->busy) { vi->busy = true; init_completion(&vi->have_data); @@ -137,10 +141,14 @@ static void remove_common(struct virtio_device *vdev) { struct virtrng_info *vi = vdev->priv; + vi->data_avail = 0; + complete(&vi->have_data); vdev->config->reset(vdev); vi->busy = false; - if (vi->hwrng_register_done) + if (vi->hwrng_register_done) { + vi->hwrng_register_done = false; hwrng_unregister(&vi->hwrng); + } vdev->config->del_vqs(vdev); ida_simple_remove(&rng_index_ida, vi->index); kfree(vi); -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz. The Freebsd VM hangs at the "booting... " prompt. If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I know Xeon V2 processors are having the smep feature. Any ideas/solutions on how to boot Freebsd VM with smep option enabled in Host Kernel. Thanks, Venkatesh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
https://bugzilla.kernel.org/show_bug.cgi?id=81841 --- Comment #20 from Joel Schopp --- What updates are you looking for? Joerg's fix is now upstream. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
On Tue, Sep 9, 2014 at 4:12 AM, Venkateswara Rao Nandigam wrote: > I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor > Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz. > > The Freebsd VM hangs at the "booting... " prompt. > > If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I > know Xeon V2 processors are having the smep feature. > > Any ideas/solutions on how to boot Freebsd VM with smep option enabled in > Host Kernel. > Does it boot on bare metal? -- Jun Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mutex
Hi Amit, Rusty RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062 steps: - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest - check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*' Result: cat process will get stuck, it will return if we kill dd process. We have some static variables (eg, current_rng, data_avail, etc) in hw_random/core.c, they are protected by rng_mutex. I try to workaround this issue by undelay(100) after mutex_unlock() in rng_dev_read(). This gives chance for hwrng_attr_*_show() to get mutex. This patch also contains some cleanup, moving some code out of mutex protection. Do you have some suggestion? Thanks. diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..fa69020 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } mutex_unlock(&rng_mutex); + udelay(100); if (need_resched()) schedule_timeout_interruptible(1); @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device *dev, int err; struct hwrng *rng; + err = -ENODEV; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - err = -ENODEV; list_for_each_entry(rng, &rng_list, list) { if (strcmp(rng->name, buf) == 0) { if (rng == current_rng) { @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, return -ERESTARTSYS; if (current_rng) name = current_rng->name; - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); mutex_unlock(&rng_mutex); + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); return ret; } @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, ssize_t ret = 0; struct hwrng *rng; + buf[0] = '\0'; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - buf[0] = '\0'; list_for_each_entry(rng, &rng_list, list) { strncat(buf, rng->name, PAGE_SIZE - ret - 1); ret += strlen(rng->name); strncat(buf, " ", PAGE_SIZE - ret - 1); ret++; } + mutex_unlock(&rng_mutex); strncat(buf, "\n", PAGE_SIZE - ret - 1); ret++; - mutex_unlock(&rng_mutex); return ret; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC virtio-rng: fail to read sysfs of a busy device
(Resend to fix the subject) Hi Amit, Rusty RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062 steps: - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest - check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*' Result: cat process will get stuck, it will return if we kill dd process. We have some static variables (eg, current_rng, data_avail, etc) in hw_random/core.c, they are protected by rng_mutex. I try to workaround this issue by undelay(100) after mutex_unlock() in rng_dev_read(). This gives chance for hwrng_attr_*_show() to get mutex. This patch also contains some cleanup, moving some code out of mutex protection. Do you have some suggestion? Thanks. diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..fa69020 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, } mutex_unlock(&rng_mutex); + udelay(100); if (need_resched()) schedule_timeout_interruptible(1); @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device *dev, int err; struct hwrng *rng; + err = -ENODEV; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - err = -ENODEV; list_for_each_entry(rng, &rng_list, list) { if (strcmp(rng->name, buf) == 0) { if (rng == current_rng) { @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, return -ERESTARTSYS; if (current_rng) name = current_rng->name; - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); mutex_unlock(&rng_mutex); + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); return ret; } @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev, ssize_t ret = 0; struct hwrng *rng; + buf[0] = '\0'; err = mutex_lock_interruptible(&rng_mutex); if (err) return -ERESTARTSYS; - buf[0] = '\0'; list_for_each_entry(rng, &rng_list, list) { strncat(buf, rng->name, PAGE_SIZE - ret - 1); ret += strlen(rng->name); strncat(buf, " ", PAGE_SIZE - ret - 1); ret++; } + mutex_unlock(&rng_mutex); strncat(buf, "\n", PAGE_SIZE - ret - 1); ret++; - mutex_unlock(&rng_mutex); return ret; } -- Amos. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 81841] amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
https://bugzilla.kernel.org/show_bug.cgi?id=81841 --- Comment #21 from Marti Raudsepp --- (In reply to Joel Schopp from comment #20) > What updates are you looking for? Joerg's fix is now upstream. Yes, but there's still the issue with southbridge component isolation. You requested more information from me in comment #15 that I provided in comment #17. For background see comment #9 from Alex Williamson: > AMD would need to confirm it. IOMMU groups are based on hardware advertised > isolation via the PCIe ACS capability. Without this, or a device specific > quirk to take its place, IOMMU groups must assume that peer-to-peer between > functions of a multi-function device is possible and therefore that the > devices are not isolated. [...] > I think the path forward is to get confirmation from AMD that these function > are isolated from each other and add quirks to the kernel. Then you won't > have the device dependencies in vfio-pci. The override patch allows you to > do that with just a kernel boot parameter. There's no gurantee that > pci-assign will ever be fixed since it's being phased out. -- You are receiving this mail because: You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
I haven't tried on bare metal, but have seen some other list, where in some one tried on bare metal and that was running well, as their VM on KVM was still failing. -Original Message- From: Nakajima, Jun [mailto:jun.nakaj...@intel.com] Sent: Tuesday, September 09, 2014 8:37 PM To: Venkateswara Rao Nandigam Cc: KVM Subject: Re: Freebsd VM Hang while Bootup on KVM, processor Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz On Tue, Sep 9, 2014 at 4:12 AM, Venkateswara Rao Nandigam wrote: > I have tried Freebsd10.0 64bit VM on the KVM Host running RHEL 6.4, processor > Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz. > > The Freebsd VM hangs at the "booting... " prompt. > > If I boot the host kernel with "nosmep", then Freebsd VM boots up fine. I > know Xeon V2 processors are having the smep feature. > > Any ideas/solutions on how to boot Freebsd VM with smep option enabled in > Host Kernel. > Does it boot on bare metal? -- Jun Intel Open Source Technology Center N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�&j:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: > On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: > > Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before > > deleting a pinned spte. > > > > Signed-off-by: Marcelo Tosatti > > > > --- > > arch/x86/kvm/mmu.c | 29 +++-- > > 1 file changed, 23 insertions(+), 6 deletions(-) > > > > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c > > === > > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 > > 11:23:59.290744490 -0300 > > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 > > -0300 > > @@ -1208,7 +1208,8 @@ > > * > > * Return true if tlb need be flushed. > > */ > > -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool > > pt_protect) > > +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool > > pt_protect, > > + bool skip_pinned) > > { > > u64 spte = *sptep; > > > > @@ -1218,6 +1219,22 @@ > > > > rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep); > > > > + if (is_pinned_spte(spte)) { > > + /* keep pinned spte intact, mark page dirty again */ > > + if (skip_pinned) { > > + struct kvm_mmu_page *sp; > > + gfn_t gfn; > > + > > + sp = page_header(__pa(sptep)); > > + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt); > > + > > + mark_page_dirty(kvm, gfn); > > + return false; > Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while > populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). > > + } else > > + mmu_reload_pinned_vcpus(kvm); > Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 4/4] KVM: MMU: pinned sps are not candidates for deletion.
On Tue, Jul 22, 2014 at 05:59:42AM +0800, Xiao Guangrong wrote: > > On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote: > > > Skip pinned shadow pages when selecting pages to zap. > > It seems there is no way to prevent changing pinned spte on > zap-all path? Xiao, The way would be to reload remote mmus, forcing the vcpu to exit, zapping a page, then vcpu will pagefault any necessary page via kvm_mmu_pin_pages. kvm_mmu_invalidate_zap_all_pages does: - spin_lock(mmu_lock) - kvm_reload_remote_mmus ... - spin_unlock(mmu_lock) So its OK to change pinned spte on zap all path. > I am thing if we could move pinned spte to another list (eg. > pinned_shadow_pages) > instead of active list so that it can not be touched by any other free paths. > Your idea? As mentioned it above, it is ok to zap pinned sptes as long w reload remote mmus request is performed. That said, you still consider a separate list? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Tue, Jul 22, 2014 at 05:55:20AM +0800, Xiao Guangrong wrote: > > On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote: > > > Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before > > deleting a pinned spte. > > > > Signed-off-by: Marcelo Tosatti > > > > --- > > arch/x86/kvm/mmu.c | 29 +++-- > > 1 file changed, 23 insertions(+), 6 deletions(-) > > > > Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c > > === > > --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 > > 11:23:59.290744490 -0300 > > +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 > > -0300 > > @@ -1208,7 +1208,8 @@ > > * > > * Return true if tlb need be flushed. > > */ > > -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool > > pt_protect) > > +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool > > pt_protect, > > + bool skip_pinned) > > { > > u64 spte = *sptep; > > > > @@ -1218,6 +1219,22 @@ > > > > rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep); > > > > + if (is_pinned_spte(spte)) { > > + /* keep pinned spte intact, mark page dirty again */ > > + if (skip_pinned) { > > + struct kvm_mmu_page *sp; > > + gfn_t gfn; > > + > > + sp = page_header(__pa(sptep)); > > + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt); > > + > > + mark_page_dirty(kvm, gfn); > > + return false; > > + } else > > + mmu_reload_pinned_vcpus(kvm); > > + } > > + > > + > > if (pt_protect) > > spte &= ~SPTE_MMU_WRITEABLE; > > spte = spte & ~PT_WRITABLE_MASK; > > This is also a window between marking spte readonly and re-ping… > IIUC, I think all spte spte can not be zapped and write-protected at any time It is safe because mmu_lock is held by kvm_mmu_slot_remove_write_access ? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] KVM: fix api documentation of KVM_GET_EMULATED_CPUID
It looks like when this was initially merged it got accidentally included in the following section. I've just moved it back in the correct section and re-numbered it as other ioctls have been added since. Signed-off-by: Alex Bennée diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 723d3f3..60c1582 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2620,6 +2620,76 @@ When debug events exit the main run loop with the reason KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run structure containing architecture specific debug information. +4.88 KVM_GET_EMULATED_CPUID + +Capability: KVM_CAP_EXT_EMUL_CPUID +Architectures: x86 +Type: system ioctl +Parameters: struct kvm_cpuid2 (in/out) +Returns: 0 on success, -1 on error + +struct kvm_cpuid2 { + __u32 nent; + __u32 flags; + struct kvm_cpuid_entry2 entries[0]; +}; + +The member 'flags' is used for passing flags from userspace. + +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEXBIT(0) +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) + +struct kvm_cpuid_entry2 { + __u32 function; + __u32 index; + __u32 flags; + __u32 eax; + __u32 ebx; + __u32 ecx; + __u32 edx; + __u32 padding[3]; +}; + +This ioctl returns x86 cpuid features which are emulated by +kvm.Userspace can use the information returned by this ioctl to query +which features are emulated by kvm instead of being present natively. + +Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2 +structure with the 'nent' field indicating the number of entries in +the variable-size array 'entries'. If the number of entries is too low +to describe the cpu capabilities, an error (E2BIG) is returned. If the +number is too high, the 'nent' field is adjusted and an error (ENOMEM) +is returned. If the number is just right, the 'nent' field is adjusted +to the number of valid entries in the 'entries' array, which is then +filled. + +The entries returned are the set CPUID bits of the respective features +which kvm emulates, as returned by the CPUID instruction, with unknown +or unsupported feature bits cleared. + +Features like x2apic, for example, may not be present in the host cpu +but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be +emulated efficiently and thus not included here. + +The fields in each entry are defined as follows: + + function: the eax value used to obtain the entry + index: the ecx value used to obtain the entry (for entries that are + affected by ecx) + flags: an OR of zero or more of the following: +KVM_CPUID_FLAG_SIGNIFCANT_INDEX: + if the index field is valid +KVM_CPUID_FLAG_STATEFUL_FUNC: + if cpuid for this function returns different values for successive + invocations; there will be several entries with the same function, + all with this flag set +KVM_CPUID_FLAG_STATE_READ_NEXT: + for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is + the first entry to be read by a cpu + eax, ebx, ecx, edx: the values returned by the cpuid instruction for + this function/index combination + 5. The kvm_run structure @@ -2918,76 +2988,6 @@ and usually define the validity of a groups of registers. (e.g. one bit }; -4.81 KVM_GET_EMULATED_CPUID - -Capability: KVM_CAP_EXT_EMUL_CPUID -Architectures: x86 -Type: system ioctl -Parameters: struct kvm_cpuid2 (in/out) -Returns: 0 on success, -1 on error - -struct kvm_cpuid2 { - __u32 nent; - __u32 flags; - struct kvm_cpuid_entry2 entries[0]; -}; - -The member 'flags' is used for passing flags from userspace. - -#define KVM_CPUID_FLAG_SIGNIFCANT_INDEXBIT(0) -#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) -#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) - -struct kvm_cpuid_entry2 { - __u32 function; - __u32 index; - __u32 flags; - __u32 eax; - __u32 ebx; - __u32 ecx; - __u32 edx; - __u32 padding[3]; -}; - -This ioctl returns x86 cpuid features which are emulated by -kvm.Userspace can use the information returned by this ioctl to query -which features are emulated by kvm instead of being present natively. - -Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2 -structure with the 'nent' field indicating the number of entries in -the variable-size array 'entries'. If the number of entries is too low -to describe the cpu capabilities, an error (E2BIG) is returned. If the -number is too high, the 'nent' field is adjusted and an error (ENOMEM) -is returned. If the number is just right, the 'nent' field is adjusted -to the number of valid entries in the 'entries' array, which is then -filled. - -The entries returned are the set CPUID bits of the respective features -which kvm em
[PATCH 0/2] KVM API documentation patches
Hi, I'm preparing to add ARM KVM GDB support and I went to read the API documentation and found it surprisingly mute on the subject ;-) The first patch documents the "new" KVM_SET_GUEST_DEBUG ioctl based on reviewing the code. I've included a long CC list of people who've actually done the various implementations who I hope can sanity check the write-up. The second is a trivial formatting fix for what looks like a minor merge trip-up. Alex Bennée (2): KVM: document KVM_SET_GUEST_DEBUG api KVM: fix api documentation of KVM_GET_EMULATED_CPUID Documentation/virtual/kvm/api.txt | 184 +++--- 1 file changed, 114 insertions(+), 70 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] KVM: document KVM_SET_GUEST_DEBUG api
In preparation for working on the ARM implementation I noticed the debug interface was missing from the API document. I've pieced together the expected behaviour from the code and commit messages written it up as best I can. Signed-off-by: Alex Bennée diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index d3dde61..723d3f3 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2575,6 +2575,50 @@ associated with the service will be forgotten, and subsequent RTAS calls by the guest for that service will be passed to userspace to be handled. +4.87 KVM_SET_GUEST_DEBUG + +Capability: KVM_CAP_SET_GUEST_DEBUG +Architectures: x86, s390, ppc +Type: vcpu ioctl +Parameters: struct kvm_guest_debug (in) +Returns: 0 on success; -1 on error + +struct kvm_guest_debug { + __u32 control; + __u32 pad; + struct kvm_guest_debug_arch arch; +}; + +Set up the processor specific debug registers and configure vcpu for +handling guest debug events. There are two parts to the structure, the +first a control bitfield indicates the type of debug events to handle +when running. Common control bits are: + + - KVM_GUESTDBG_ENABLE:guest debugging is enabled + - KVM_GUESTDBG_SINGLESTEP:the next run should single-step + +The top 16 bits of the control field are architecture specific control +flags which can include the following: + + - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86] + - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] + - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] + - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] + - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] + +For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints +are enabled in memory so we need to ensure breakpoint exceptions are +correctly trapped and the KVM run loop exits at the breakpoint and not +running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP +we need to ensure the guest vCPUs architecture specific registers are +updated to the correct (supplied) values. + +The second part of the structure is architecture specific and +typically contains a set of debug registers. + +When debug events exit the main run loop with the reason +KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run +structure containing architecture specific debug information. 5. The kvm_run structure -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2 v6] powerpc/kvm: common sw breakpoint instr across ppc
This patch extends the use of illegal instruction as software breakpoint instruction across the ppc platform. Patch extends booke program interrupt code to support software breakpoint. Signed-off-by: Madhavan Srinivasan --- Patch is only compile tested. Will really help if someone can try it out and let me know the comments. arch/powerpc/include/asm/kvm_booke.h | 2 -- arch/powerpc/kvm/booke.c | 19 ++- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index f7aa5cc..ab7123a 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -30,8 +30,6 @@ #define EHPRIV_OC_SHIFT11 /* "ehpriv 1" : ehpriv with OC = 1 is used for debug emulation */ #define EHPRIV_OC_DEBUG1 -#define KVMPPC_INST_EHPRIV_DEBUG (KVMPPC_INST_EHPRIV | \ -(EHPRIV_OC_DEBUG << EHPRIV_OC_SHIFT)) static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val) { diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index b4c89fa..365e85d 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -870,6 +870,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOKE_INTERRUPT_HV_PRIV: emulated = kvmppc_get_last_inst(vcpu, false, &last_inst); break; + case BOOKE_INTERRUPT_PROGRAM: + /* SW breakpoints arrive as illegal instructions on HV */ + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) + emulated = kvmppc_get_last_inst(vcpu, false, &last_inst); + break; default: break; } @@ -947,6 +952,18 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, break; case BOOKE_INTERRUPT_PROGRAM: + if ((vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) && + (last_inst == KVMPPC_INST_SW_BREAKPOINT)) { + /* +* We are here because of an SW breakpoint instr, +* so lets return to host to handle. +*/ + r = kvmppc_handle_debug(run, vcpu); + run->exit_reason = KVM_EXIT_DEBUG; + kvmppc_account_exit(vcpu, DEBUG_EXITS); + break; + } + if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) { /* * Program traps generated by user-level software must @@ -1505,7 +1522,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) val = get_reg_val(reg->id, vcpu->arch.tsr); break; case KVM_REG_PPC_DEBUG_INST: - val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV_DEBUG); + val = get_reg_val(reg->id, KVMPPC_INST_SW_BREAKPOINT); break; case KVM_REG_PPC_VRSAVE: val = get_reg_val(reg->id, vcpu->arch.vrsave); -- 1.7.11.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2 v6] powerpc/kvm: support to handle sw breakpoint
This patchset adds ppc64 server side support for software breakpoint and extends the use of illegal instruction as software breakpoint across ppc platform. Patch 1, adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch also adds support for software breakpoint in PR KVM. Patch 2,extends the use of illegal instruction as software breakpoint instruction across the ppc platform. Patch extends booke program interrupt code to support software breakpoint. Patch 2 is only compile tested. Will really help if someone can try it out and let me know comments. Changes v5->v6: Fixed checkpatch.pl errors Added abort case as a seperate condition block in emulation function in book3s_hv Added debug active check for instruction emulation Modified return value to emulate_fail in else part of illegal instruction case in book3s_pr.c Removed KVMPPC_INST_EHPRIV_DEBUG instruction Moved the debug instruction as a separate block in program interrupt code Changes v4->v5: Made changes to code comments and commit messages Added debugging active checks for illegal instr comparison Added debug instruction check in emulate code Extended SW breakpoint to booke Changes v3->v4: Made changes to code comments and removed #define of zero opcode Added a new function to handle the debug instruction emulation in book3s_hv Rebased the code to latest upstream source. Changes v2->v3: Changed the debug instructions. Using the all zero opcode in the instruction word as illegal instruction as mentioned in Power ISA instead of ABS Removed reg updated in emulation assist and added a call to kvmppc_emulate_instruction for reg update. Changes v1->v2: Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM can also share it. Added code to use KVM get one reg infrastructure to get debug opcode. Updated emulate.c to include emulation of debug instruction incase of PR_KVM. Made changes to commit message. Madhavan Srinivasan (2): powerpc/kvm: support to handle sw breakpoint powerpc/kvm: common sw breakpoint instr across ppc arch/powerpc/include/asm/kvm_booke.h | 2 -- arch/powerpc/include/asm/kvm_ppc.h | 6 ++ arch/powerpc/kvm/book3s.c| 3 ++- arch/powerpc/kvm/book3s_hv.c | 41 arch/powerpc/kvm/book3s_pr.c | 3 +++ arch/powerpc/kvm/booke.c | 19 - arch/powerpc/kvm/emulate.c | 15 + 7 files changed, 81 insertions(+), 8 deletions(-) -- 1.7.11.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2 v6] powerpc/kvm: support to handle sw breakpoint
This patch adds kernel side support for software breakpoint. Design is that, by using an illegal instruction, we trap to hypervisor via Emulation Assistance interrupt, where we check for the illegal instruction and accordingly we return to Host or Guest. Patch also adds support for software breakpoint in PR KVM. Signed-off-by: Madhavan Srinivasan --- arch/powerpc/include/asm/kvm_ppc.h | 6 ++ arch/powerpc/kvm/book3s.c | 3 ++- arch/powerpc/kvm/book3s_hv.c | 41 ++ arch/powerpc/kvm/book3s_pr.c | 3 +++ arch/powerpc/kvm/emulate.c | 15 ++ 5 files changed, 63 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index fb86a22..dd83c9a 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -38,6 +38,12 @@ #include #endif +/* + * KVMPPC_INST_SW_BREAKPOINT is debug Instruction + * for supporting software breakpoint. + */ +#define KVMPPC_INST_SW_BREAKPOINT 0x0000 + enum emulation_result { EMULATE_DONE, /* no further processing */ EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index dd03f6b..00e9c9f 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg) { - return -EINVAL; + vcpu->guest_debug = dbg->control; + return 0; } void kvmppc_decrementer_func(unsigned long data) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 27cced9..000fbec 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd) return kvmppc_hcall_impl_hv_realmode(cmd); } +static int kvmppc_emulate_debug_inst(struct kvm_run *run, + struct kvm_vcpu *vcpu) +{ + u32 last_inst; + + if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) != + EMULATE_DONE) { + /* +* Fetch failed, so return to guest and +* try executing it again. +*/ + return RESUME_GUEST; + } + + if (last_inst == KVMPPC_INST_SW_BREAKPOINT) { + run->exit_reason = KVM_EXIT_DEBUG; + run->debug.arch.address = kvmppc_get_pc(vcpu); + return RESUME_HOST; + } else { + kvmppc_core_queue_program(vcpu, SRR1_PROGILL); + return RESUME_GUEST; + } +} + static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, struct task_struct *tsk) { @@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, break; /* * This occurs if the guest executes an illegal instruction. -* We just generate a program interrupt to the guest, since -* we don't emulate any guest instructions at this stage. +* If the guest debug is disabled, generate a program interrupt +* to the guest. If guest debug is enabled, we need to check +* whether the instruction is a software breakpoint instruction. +* Accordingly return to Guest or Host. */ case BOOK3S_INTERRUPT_H_EMUL_ASSIST: - kvmppc_core_queue_program(vcpu, SRR1_PROGILL); - r = RESUME_GUEST; + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) { + r = kvmppc_emulate_debug_inst(run, vcpu); + } else { + kvmppc_core_queue_program(vcpu, SRR1_PROGILL); + r = RESUME_GUEST; + } break; /* * This occurs if the guest (kernel or userspace), does something that @@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id, long int i; switch (id) { + case KVM_REG_PPC_DEBUG_INST: + *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT); + break; case KVM_REG_PPC_HIOR: *val = get_reg_val(id, 0); break; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index faffb27..6d73708 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -1319,6 +1319,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id, int r = 0; switch (id) { + case KVM_REG_PPC_DEBUG_INST: + *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT); + break; case KVM_REG_PPC_HIOR: *val = get_reg_val(id, to_book3s(vcpu)->h
Re: [Qemu-devel] [PATCH v3 2/2] docs: update ivshmem device spec
On 09/01/2014 03:52 AM, David Marchand wrote: >>> What about upgrading QEMU and ivshmem-server while you have existing >>> guests? You cannot restart ivshmem-server, and the new QEMU would have >>> to talk to the old ivshmem-server. >> >> Version negotiation also helps avoid confusion if someone combines >> ivshmem-server and QEMU from different origins (e.g. built from source >> and distro packaged). Don't underestimate the likelihood of this happening. Any long-running process (which an ivshmem-server will be) continues running at the old version, even when a package upgrade installs a new qemu binary; the new binary should still be able to manage connections to the already-running server. Even neater would be a solution where an existing ivshmem-server could re-exec an updated ivshmem-server binary that resulted from a distro upgrade, hand over all state required for the new server to take over from the point managed by the old server, so that you aren't stuck running the old binary forever. But that's a lot trickier to write, so it is not necessary for a first implementation; and if you do that, then you have the reverse situation to worry about (the new server must still accept communication from existing old qemu binaries). Note that the goal here is to support upgrades; it is probably okay if downgrading from a new binary back to an old doesn't work correctly (because the new software was using a feature not present in the old). >> >> It's a safeguard to prevent hard-to-diagnose failures when the system is >> misconfigured. >> > > Hum, so you want the code to be defensive against mis-use, why not. > > I wanted to keep modifications on ivshmem as little as possible in a > first phase (all the more so as there are potential ivshmem users out > there that I think will be impacted by a protocol change). Existing ivshmem users MUST be aware that they are using something that is not yet polished, and be prepared to make the upgrade to the polished version. It's best to minimize the hassle by making them upgrade exactly once to a fully-robust version, rather than to have them upgrade to a slightly-more robust version only to find out we didn't plan ahead well enough to make further extensions in a back-compatible manner. > > Sending the version as the first "vm_id" with an associated fd to -1 > before sending the real client id should work with existing QEMU client > code (hw/misc/ivshmem.c). > > Do you have a better idea ? > Is there a best practice in QEMU for "version negotiation" that could > work with ivshmem protocol ? QMP starts off with a mandatory "qmp_capabilities" handshake, although we haven't yet had to define any capabilities where cross-versioned communication differs as a result. Migration is somewhat of an example, except that it is one-directional (we don't have a feedback path), so it is somewhat best effort. The qcow2 v3 file format is an example of declaring features, rather than version numbers, and making decisions about whether a feature is compatible (older clients can safely ignore the bit, without corrupting the image but possibly having worse performance) vs. incompatible (older clients must reject the image, because not handling the feature correctly would corrupt the image). The best handshakes are bi-directional - both sides advertise their version (or better, their features), and a well-defined algorithm for settling on the common subset of advertised features then ensures that the two sides know how to talk to each other, or give a reason for either side to disconnect early because of a missing feature. > > I have a v4 ready with this (and all the pending comments), I will send > it later unless a better idea is exposed. > > > Thanks. > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [PATCH 2/2] KVM: fix api documentation of KVM_GET_EMULATED_CPUID
On Tue, Sep 09, 2014 at 05:27:19PM +0100, Alex Bennée wrote: > It looks like when this was initially merged it got accidentally included > in the following section. Whoops. > I've just moved it back in the correct section > and re-numbered it as other ioctls have been added since. > > Signed-off-by: Alex Bennée Acked-by: Borislav Petkov -- Regards/Gruss, Boris. -- -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v6] powerpc/kvm: support to handle sw breakpoint
On 09.09.14 19:07, Madhavan Srinivasan wrote: > This patch adds kernel side support for software breakpoint. > Design is that, by using an illegal instruction, we trap to hypervisor > via Emulation Assistance interrupt, where we check for the illegal instruction > and accordingly we return to Host or Guest. Patch also adds support for > software breakpoint in PR KVM. > > Signed-off-by: Madhavan Srinivasan > --- > arch/powerpc/include/asm/kvm_ppc.h | 6 ++ > arch/powerpc/kvm/book3s.c | 3 ++- > arch/powerpc/kvm/book3s_hv.c | 41 > ++ > arch/powerpc/kvm/book3s_pr.c | 3 +++ > arch/powerpc/kvm/emulate.c | 15 ++ > 5 files changed, 63 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_ppc.h > b/arch/powerpc/include/asm/kvm_ppc.h > index fb86a22..dd83c9a 100644 > --- a/arch/powerpc/include/asm/kvm_ppc.h > +++ b/arch/powerpc/include/asm/kvm_ppc.h > @@ -38,6 +38,12 @@ > #include > #endif > > +/* > + * KVMPPC_INST_SW_BREAKPOINT is debug Instruction > + * for supporting software breakpoint. > + */ > +#define KVMPPC_INST_SW_BREAKPOINT0x0000 > + > enum emulation_result { > EMULATE_DONE, /* no further processing */ > EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */ > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c > index dd03f6b..00e9c9f 100644 > --- a/arch/powerpc/kvm/book3s.c > +++ b/arch/powerpc/kvm/book3s.c > @@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, > int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, > struct kvm_guest_debug *dbg) > { > - return -EINVAL; > + vcpu->guest_debug = dbg->control; > + return 0; > } > > void kvmppc_decrementer_func(unsigned long data) > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 27cced9..000fbec 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd) > return kvmppc_hcall_impl_hv_realmode(cmd); > } > > +static int kvmppc_emulate_debug_inst(struct kvm_run *run, > + struct kvm_vcpu *vcpu) > +{ > + u32 last_inst; > + > + if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) != > + EMULATE_DONE) { > + /* > + * Fetch failed, so return to guest and > + * try executing it again. > + */ > + return RESUME_GUEST; > + } > + > + if (last_inst == KVMPPC_INST_SW_BREAKPOINT) { > + run->exit_reason = KVM_EXIT_DEBUG; > + run->debug.arch.address = kvmppc_get_pc(vcpu); > + return RESUME_HOST; > + } else { > + kvmppc_core_queue_program(vcpu, SRR1_PROGILL); > + return RESUME_GUEST; > + } > +} > + > static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, >struct task_struct *tsk) > { > @@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, > struct kvm_vcpu *vcpu, > break; > /* >* This occurs if the guest executes an illegal instruction. > - * We just generate a program interrupt to the guest, since > - * we don't emulate any guest instructions at this stage. > + * If the guest debug is disabled, generate a program interrupt > + * to the guest. If guest debug is enabled, we need to check > + * whether the instruction is a software breakpoint instruction. > + * Accordingly return to Guest or Host. >*/ > case BOOK3S_INTERRUPT_H_EMUL_ASSIST: > - kvmppc_core_queue_program(vcpu, SRR1_PROGILL); > - r = RESUME_GUEST; > + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) { > + r = kvmppc_emulate_debug_inst(run, vcpu); > + } else { > + kvmppc_core_queue_program(vcpu, SRR1_PROGILL); > + r = RESUME_GUEST; > + } > break; > /* >* This occurs if the guest (kernel or userspace), does something that > @@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, > u64 id, > long int i; > > switch (id) { > + case KVM_REG_PPC_DEBUG_INST: > + *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT); > + break; > case KVM_REG_PPC_HIOR: > *val = get_reg_val(id, 0); > break; > diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c > index faffb27..6d73708 100644 > --- a/arch/powerpc/kvm/book3s_pr.c > +++ b/arch/powerpc/kvm/book3s_pr.c Very nice patch set :). The only thing we're missing now is that Book3S PR does not allow sw breakpoints to arrive in user mode (MSR.PR == 1), because there we're ne
Re: [PATCH 0/2 v6] powerpc/kvm: support to handle sw breakpoint
On 09.09.14 19:07, Madhavan Srinivasan wrote: > This patchset adds ppc64 server side support for software breakpoint > and extends the use of illegal instruction as software > breakpoint across ppc platform. > > Patch 1, adds kernel side support for software breakpoint. > Design is that, by using an illegal instruction, we trap to > hypervisor via Emulation Assistance interrupt, where we check > for the illegal instruction and accordingly we return to Host > or Guest. Patch also adds support for software breakpoint > in PR KVM. > > Patch 2,extends the use of illegal instruction as software > breakpoint instruction across the ppc platform. Patch extends > booke program interrupt code to support software breakpoint. Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Question about using "qemu-kvm" to create vm including "-mem-path"
Hi everyone, I was wondering if anyone could answer my confusion here. I am using qemu-kvm 0.12.1 installed with centos-6.5. I would like to use large pages to back my guest vm, so I did it as this: http://www.linux-kvm.org/page/UsingLargePages, and added "-mem-path /hugetlbfs" in the command to create VM. When things went well, after creating vm, I type "tail /proc/meminfo", it would show like this, which means "HugePages" was used: [root@mind domain]# tail /proc/meminfo VmallocChunk: 34359481380 kB HardwareCorrupted: 0 kB AnonHugePages: 1368064 kB HugePages_Total: 512 HugePages_Free: 384 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k:6144 kB DirectMap2M:12552192 kB However if I add "root=/dev/vda1" in the command to create vm, which looks like "-append root=/dev/vda1 console=ttyS0" in command line, then "-mem-path /hugetlbfs" doesn't work: [root@mind domain]# tail /proc/meminfo VmallocChunk: 34359481380 kB HardwareCorrupted: 0 kB AnonHugePages: 1390592 kB HugePages_Total: 512 HugePages_Free: 512 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k:6144 kB DirectMap2M:12552192 kB Does anyone know how does "root=/dev/vda1" conflict with hugepages and how do I use both of them to create a VM? I am actually using an xml file to create VM, and those commands I mentioned above are shown in "ps aux | grep qemu". Thanks, Min-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device
Hi Amos, On (Tue) 09 Sep 2014 [19:14:02], Amos Kong wrote: > When we try to hot-remove a busy virtio-rng device from QEMU monitor, > the device can't be hot-removed. Because virtio-rng driver hangs at > wait_for_completion_killable(). > > This patch exits the waiting by completing have_data completion before > unregistering, resets data_avail to avoid the hwrng core use wrong > buffer bytes. Before real unregistering we should return -ENODEV for > reading. > > When we hot-unplug the device, dd process in guest will exit. > $ dd if=/dev/hwrng of=/dev/null > dd: error reading ‘/dev/hwrng’: No such device > > Signed-off-by: Amos Kong > Cc: sta...@vger.kernel.org > --- > V2: reset data_avail (Amit) > adjust unregister order Thanks, this looks good. Can you please split into two parts, the complete() in one, and the hwrng_register_done stuff into another? Thanks, Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC virtio-rng: fail to read sysfs of a busy device
On (Tue) 09 Sep 2014 [23:23:07], Amos Kong wrote: > (Resend to fix the subject) > > Hi Amit, Rusty > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062 > steps: > - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest > - check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*' > > Result: cat process will get stuck, it will return if we kill dd process. How common is it going to be to have a long-running 'dd' process on /dev/hwrng? Also, with the new khwrng thread, reading from /dev/hwrng isn't required -- just use /dev/random? (This doesn't mean we shouldn't fix the issue here...) > We have some static variables (eg, current_rng, data_avail, etc) in > hw_random/core.c, > they are protected by rng_mutex. I try to workaround this issue by > undelay(100) > after mutex_unlock() in rng_dev_read(). This gives chance for > hwrng_attr_*_show() > to get mutex. > > This patch also contains some cleanup, moving some code out of mutex > protection. > > Do you have some suggestion? Thanks. > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index aa30a25..fa69020 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char > __user *buf, > } > > mutex_unlock(&rng_mutex); > + udelay(100); We have a need_resched() right below. Why doesn't that work? > if (need_resched()) > schedule_timeout_interruptible(1); > @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device > *dev, > int err; > struct hwrng *rng; The following hunk doesn't work: > + err = -ENODEV; > err = mutex_lock_interruptible(&rng_mutex); err is being set to another value in the next line! > if (err) > return -ERESTARTSYS; > - err = -ENODEV; And all usage of err below now won't have -ENODEV but some other value. > list_for_each_entry(rng, &rng_list, list) { > if (strcmp(rng->name, buf) == 0) { > if (rng == current_rng) { > @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev, > return -ERESTARTSYS; > if (current_rng) > name = current_rng->name; > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > mutex_unlock(&rng_mutex); > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); This looks OK... > > return ret; > } > @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct device > *dev, > ssize_t ret = 0; > struct hwrng *rng; > > + buf[0] = '\0'; > err = mutex_lock_interruptible(&rng_mutex); > if (err) > return -ERESTARTSYS; > > - buf[0] = '\0'; > list_for_each_entry(rng, &rng_list, list) { > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > ret += strlen(rng->name); > strncat(buf, " ", PAGE_SIZE - ret - 1); > ret++; > } > + mutex_unlock(&rng_mutex); > strncat(buf, "\n", PAGE_SIZE - ret - 1); > ret++; > - mutex_unlock(&rng_mutex); But this isn't resulting in savings; the majority of the time is being spent in the for loop, and that writes to the buffer. BTW I don't expect strcat'ing to the buf in each of these scenarios is a long operation, so this reworking doesn't strike to me as something we should pursue. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/2] virtio-rng: fix hotunplug
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1121540 When we try to hot-unplugging a busy virtio-rng device, the device can't be removed. And the reading process in guest gets stuck. Those two patches fixed this issue by completing have_data completion and preventing invalid reading. Thanks for the help of Amit. Cc: sta...@vger.kernel.org V2: reset data_avail (Amit) adjust unregister order V3: split patch, update commitlog Amos Kong (2): virtio-rng: fix stuck of hot-unplugging busy device virtio-rng: skip reading when we start to remove the device drivers/char/hw_random/virtio-rng.c | 7 +++ 1 file changed, 7 insertions(+) -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/2] virtio-rng: fix stuck of hot-unplugging busy device
When we try to hot-remove a busy virtio-rng device from QEMU monitor, the device can't be hot-removed. Because virtio-rng driver hangs at wait_for_completion_killable(). This patch exits the waiting by completing have_data completion before unregistering, resets data_avail to avoid the hwrng core use wrong buffer bytes. Signed-off-by: Amos Kong Cc: sta...@vger.kernel.org --- drivers/char/hw_random/virtio-rng.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c index 2e3139e..849b228 100644 --- a/drivers/char/hw_random/virtio-rng.c +++ b/drivers/char/hw_random/virtio-rng.c @@ -137,6 +137,8 @@ static void remove_common(struct virtio_device *vdev) { struct virtrng_info *vi = vdev->priv; + vi->data_avail = 0; + complete(&vi->have_data); vdev->config->reset(vdev); vi->busy = false; if (vi->hwrng_register_done) -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/2] virtio-rng: skip reading when we start to remove the device
Before we really unregister the hwrng device, reading will get stuck if the virtio device is reset. We should return error for reading when we start to remove the device. Signed-off-by: Amos Kong Cc: sta...@vger.kernel.org --- drivers/char/hw_random/virtio-rng.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c index 849b228..132c9cc 100644 --- a/drivers/char/hw_random/virtio-rng.c +++ b/drivers/char/hw_random/virtio-rng.c @@ -36,6 +36,7 @@ struct virtrng_info { int index; bool busy; bool hwrng_register_done; + bool hwrng_removed; }; @@ -68,6 +69,9 @@ static int virtio_read(struct hwrng *rng, void *buf, size_t size, bool wait) int ret; struct virtrng_info *vi = (struct virtrng_info *)rng->priv; + if (vi->hwrng_removed) + return -ENODEV; + if (!vi->busy) { vi->busy = true; init_completion(&vi->have_data); @@ -137,6 +141,7 @@ static void remove_common(struct virtio_device *vdev) { struct virtrng_info *vi = vdev->priv; + vi->hwrng_removed = true; vi->data_avail = 0; complete(&vi->have_data); vdev->config->reset(vdev); -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] virtio-rng: fix stuck of hot-unplugging busy device
On Wed, Sep 10, 2014 at 11:04:54AM +0530, Amit Shah wrote: > Hi Amos, > > On (Tue) 09 Sep 2014 [19:14:02], Amos Kong wrote: > > When we try to hot-remove a busy virtio-rng device from QEMU monitor, > > the device can't be hot-removed. Because virtio-rng driver hangs at > > wait_for_completion_killable(). > > > > This patch exits the waiting by completing have_data completion before > > unregistering, resets data_avail to avoid the hwrng core use wrong > > buffer bytes. Before real unregistering we should return -ENODEV for > > reading. > > > > When we hot-unplug the device, dd process in guest will exit. > > $ dd if=/dev/hwrng of=/dev/null > > dd: error reading ‘/dev/hwrng’: No such device > > > > Signed-off-by: Amos Kong > > Cc: sta...@vger.kernel.org > > --- > > V2: reset data_avail (Amit) > > adjust unregister order > > Thanks, this looks good. > > Can you please split into two parts, the complete() in one, and the > hwrng_register_done stuff into another? I just posted the V3, split to two patches, and updated the commitlog to describe why we need to return ENODEV for reading. > Thanks, > > Amit -- Amos. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC virtio-rng: fail to read sysfs of a busy device
On Wed, Sep 10, 2014 at 11:22:12AM +0530, Amit Shah wrote: > On (Tue) 09 Sep 2014 [23:23:07], Amos Kong wrote: > > (Resend to fix the subject) > > > > Hi Amit, Rusty > > > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1127062 > > steps: > > - Read random data by 'dd if=/dev/hwrng of=/dev/null' in guest > > - check sysfs files in the same time, 'cat /sys/class/misc/hw_random/rng_*' > > > > Result: cat process will get stuck, it will return if we kill dd process. > > How common is it going to be to have a long-running 'dd' process on > /dev/hwrng? Not a common usage, but we have this strict testing. > Also, with the new khwrng thread, reading from /dev/hwrng isn't > required -- just use /dev/random? Yes. > (This doesn't mean we shouldn't fix the issue here...) Completely agree :-) > > We have some static variables (eg, current_rng, data_avail, etc) in > > hw_random/core.c, > > they are protected by rng_mutex. I try to workaround this issue by > > undelay(100) > > after mutex_unlock() in rng_dev_read(). This gives chance for > > hwrng_attr_*_show() > > to get mutex. > > > > This patch also contains some cleanup, moving some code out of mutex > > protection. > > > > Do you have some suggestion? Thanks. > > > > > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > > index aa30a25..fa69020 100644 > > --- a/drivers/char/hw_random/core.c > > +++ b/drivers/char/hw_random/core.c > > @@ -194,6 +194,7 @@ static ssize_t rng_dev_read(struct file *filp, char > > __user *buf, > > } > > > > mutex_unlock(&rng_mutex); > > + udelay(100); > > We have a need_resched() right below. Why doesn't that work? need_resched() is giving chance for userspace to > > if (need_resched()) It never success in my debugging. If we remove this check and always call schedule_timeout_interruptible(1), problem also disappears. diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index aa30a25..263a370 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf, mutex_unlock(&rng_mutex); - if (need_resched()) - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(1); if (signal_pending(current)) { err = -ERESTARTSYS; > > schedule_timeout_interruptible(1); > > @@ -233,10 +234,10 @@ static ssize_t hwrng_attr_current_store(struct device > > *dev, > > int err; > > struct hwrng *rng; > The following hunk doesn't work: > > > + err = -ENODEV; > > err = mutex_lock_interruptible(&rng_mutex); > > err is being set to another value in the next line! > > > if (err) > > return -ERESTARTSYS; > > - err = -ENODEV; > > And all usage of err below now won't have -ENODEV but some other value. Oops! > > list_for_each_entry(rng, &rng_list, list) { > > if (strcmp(rng->name, buf) == 0) { > > if (rng == current_rng) { > > @@ -270,8 +271,8 @@ static ssize_t hwrng_attr_current_show(struct device > > *dev, > > return -ERESTARTSYS; > > if (current_rng) > > name = current_rng->name; > > - ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > mutex_unlock(&rng_mutex); > > + ret = snprintf(buf, PAGE_SIZE, "%s\n", name); > > This looks OK... > > > > > return ret; > > } > > @@ -284,19 +285,19 @@ static ssize_t hwrng_attr_available_show(struct > > device *dev, > > ssize_t ret = 0; > > struct hwrng *rng; > > > > + buf[0] = '\0'; > > err = mutex_lock_interruptible(&rng_mutex); > > if (err) > > return -ERESTARTSYS; > > > > - buf[0] = '\0'; > > list_for_each_entry(rng, &rng_list, list) { > > strncat(buf, rng->name, PAGE_SIZE - ret - 1); > > ret += strlen(rng->name); > > strncat(buf, " ", PAGE_SIZE - ret - 1); > > ret++; > > } > > + mutex_unlock(&rng_mutex); > > strncat(buf, "\n", PAGE_SIZE - ret - 1); > > ret++; > > - mutex_unlock(&rng_mutex); > > But this isn't resulting in savings; the majority of the time is being > spent in the for loop, and that writes to the buffer. Right > BTW I don't expect strcat'ing to the buf in each of these scenarios is > a long operation, so this reworking doesn't strike to me as something > we should pursue. > > Amit -- Amos. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Live migration locks up 3.2 guests in do_timer(ticks ~ 500000)
On Mon, Sep 08, 2014 at 06:18:46PM +0200, Paolo Bonzini wrote: > Il 08/09/2014 17:56, Matt Mullins ha scritto: > >> > What host are you running? > > What information do you want that I missed in my first email? > > What version of QEMU? Can you try the 12.04 qemu (which IIRC is 1.0) on > top of the newer kernel? I did reproduce this on qemu 1.0.1. What would you like me to test next? I've got both VM hosts currently running 3.17-rc4, so I'll know tomorrow if that works. I've been looking into this off-and-on for a while now, and this time around I may have found other folks experiencing the same issue: https://issues.apache.org/jira/browse/CLOUDSTACK-6788 That one's a little empty on the details (do y'all know more about to whom that bug was "known" than I do?), but https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1297218 sees similar symptoms due to running NTP on the host. I may try disabling that after the current round of testing-3.17-rc4 finishes up. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html