Re: [v5][PATCH] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
On 10/23/2013 09:26 AM, Tiejun Chen wrote: We enter with interrupts disabled in hardware, but we need to call RECONCILE_IRQ_STATE anyway to ensure that the software state is kept in sync instead of calling hard_irq_disable() directly. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- v5: Fix one typo in the comment. Alex, I already addressed Scott's comment, any further feedback? Tiejun v4: Fix one typo in the patch description. v3: Base on the latest tree, now we can use RECONCILE_IRQ_STATE instead of SOFT_DISABLE_INTS. v2: Move SOFT_DISABLE_INTS[1] earlier to avoid clobbering the arguments we want to pass to kvmppc_handle_exit. arch/powerpc/kvm/booke.c | 11 --- arch/powerpc/kvm/bookehv_interrupts.S | 11 +++ 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 15d0149..0d211ff 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -899,17 +899,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, int s; int idx; -#ifdef CONFIG_PPC64 - WARN_ON(local_paca-irq_happened != 0); -#endif - - /* -* We enter with interrupts disabled in hardware, but -* we need to call hard_irq_disable anyway to ensure that -* the software state is kept in sync. -*/ - hard_irq_disable(); - /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..191c32b 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -33,6 +33,8 @@ #ifdef CONFIG_64BIT #include asm/exception-64e.h +#include asm/hw_irq.h +#include asm/irqflags.h #else #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */ #endif @@ -465,6 +467,15 @@ _GLOBAL(kvmppc_resume_host) mtspr SPRN_EPCR, r3 isync +#ifdef CONFIG_64BIT + /* +* We enter with interrupts disabled in hardware, but +* we need to call RECONCILE_IRQ_STATE anyway to ensure +* that the software state is kept in sync. +*/ + RECONCILE_IRQ_STATE(r3,r5) +#endif + /* Switch to kernel stack and jump to handler. */ PPC_LL r3, HOST_RUN(r1) mr r5, r14 /* intno */ -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v3] KVM: PPC: Book3E HV: call RECONCILE_IRQ_STATE to sync the software state
On 10/21/2013 05:49 PM, Tiejun Chen wrote: We enter with interrupts disabled in hardware, but we need to call SOFT_DISABLE_INTS anyway to ensure that the software state OOPS! Here I should change this with RECONCILE_IRQ_STATE :( Please ignore this to see next version directly. Tiejun is kept in sync instead of calling hard_irq_disable() directly. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- v3: Base on the latest tree, now we can use RECONCILE_IRQ_STATE instead of SOFT_DISABLE_INTS. v2: Move SOFT_DISABLE_INTS[1] earlier to avoid clobbering the arguments we want to pass to kvmppc_handle_exit. arch/powerpc/kvm/booke.c | 11 --- arch/powerpc/kvm/bookehv_interrupts.S | 11 +++ 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 15d0149..0d211ff 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -899,17 +899,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, int s; int idx; -#ifdef CONFIG_PPC64 - WARN_ON(local_paca-irq_happened != 0); -#endif - - /* -* We enter with interrupts disabled in hardware, but -* we need to call hard_irq_disable anyway to ensure that -* the software state is kept in sync. -*/ - hard_irq_disable(); - /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..4e867b1 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -33,6 +33,8 @@ #ifdef CONFIG_64BIT #include asm/exception-64e.h +#include asm/hw_irq.h +#include asm/irqflags.h #else #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */ #endif @@ -465,6 +467,15 @@ _GLOBAL(kvmppc_resume_host) mtspr SPRN_EPCR, r3 isync +#ifdef CONFIG_64BIT + /* +* We enter with interrupts disabled in hardware, but +* we need to call SOFT_DISABLE_INTS anyway to ensure +* that the software state is kept in sync. +*/ + RECONCILE_IRQ_STATE(r3,r5) +#endif + /* Switch to kernel stack and jump to handler. */ PPC_LL r3, HOST_RUN(r1) mr r5, r14 /* intno */ -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v2][PATCH 1/1] KVM: PPC: Book3E HV: call SOFT_DISABLE_INTS to sync the software state
On 08/15/2013 05:01 AM, Benjamin Herrenschmidt wrote: On Wed, 2013-08-14 at 14:34 -0500, Scott Wood wrote: On Wed, 2013-08-14 at 13:56 +0200, Alexander Graf wrote: On 07.08.2013, at 04:05, Tiejun Chen wrote: We enter with interrupts disabled in hardware, but we need to call SOFT_DISABLE_INTS anyway to ensure that the software state is kept in sync instead of calling hard_irq_disable() directly. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com Scott, Ben, could you please ack? ACK, but it needs to be updated due to this patch that renames SOFT_DISABLE_INTS: http://patchwork.ozlabs.org/patch/259337/ Note that the latter was applied to powerpc -next already. Thanks Ben and Scott. Alex, Should I resend an updated patch after you sync powerpc -next? Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Revert kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
On 08/07/2013 04:50 AM, Scott Wood wrote: On Tue, 2013-08-06 at 17:31 +0800, Tiejun Chen wrote: We should revert this commit to rework. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com This breaks bisect. Hmm... Maybe I can squash these two patches into one. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
On 08/01/2013 07:12 PM, Bharat Bhushan wrote: KVM uses same WIM tlb attributes as the corresponding qemu pte. For this we now search the linux pte for the requested page and get these cache caching/coherency attributes from pte. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1-v2 - Use Linux pte for wimge rather than RAM/no-RAM mechanism arch/powerpc/include/asm/kvm_host.h |2 +- arch/powerpc/kvm/booke.c|2 +- arch/powerpc/kvm/e500.h |8 +--- arch/powerpc/kvm/e500_mmu_host.c| 31 ++- 4 files changed, 25 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 3328353..583d405 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -535,6 +535,7 @@ struct kvm_vcpu_arch { #endif gpa_t paddr_accessed; gva_t vaddr_accessed; + pgd_t *pgdir; u8 io_gpr; /* GPR used as IO source/target */ u8 mmio_is_bigendian; @@ -592,7 +593,6 @@ struct kvm_vcpu_arch { struct list_head run_list; struct task_struct *run_task; struct kvm_run *kvm_run; - pgd_t *pgdir; spinlock_t vpa_update_lock; struct kvmppc_vpa vpa; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 17722d8..eb2 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) #endif kvmppc_fix_ee_before_entry(); - + vcpu-arch.pgdir = current-mm-pgd; ret = __kvmppc_vcpu_run(kvm_run, vcpu); /* No need for kvm_guest_exit. It's done in handle_exit. diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h index 4fd9650..fc4b2f6 100644 --- a/arch/powerpc/kvm/e500.h +++ b/arch/powerpc/kvm/e500.h @@ -31,11 +31,13 @@ enum vcpu_ftr { #define E500_TLB_NUM 2 /* entry is mapped somewhere in host TLB */ -#define E500_TLB_VALID (1 0) +#define E500_TLB_VALID (1 31) /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */ -#define E500_TLB_BITMAP(1 1) +#define E500_TLB_BITMAP(1 30) /* TLB1 entry is mapped by host TLB0 */ -#define E500_TLB_TLB0 (1 2) +#define E500_TLB_TLB0 (1 29) +/* Lower 5 bits have WIMGE value */ +#define E500_TLB_WIMGE_MASK(0x1f) struct tlbe_ref { pfn_t pfn; /* valid only for TLB0, except briefly */ diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..9b10b0b 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,15 +64,6 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) -{ -#ifdef CONFIG_SMP - return (mas2 MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 MAS2_ATTRIB_MASK; -#endif -} - /* * writing shadow tlb entry to host TLB */ @@ -248,10 +239,12 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe) static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref, struct kvm_book3e_206_tlb_entry *gtlbe, -pfn_t pfn) +pfn_t pfn, int wimg) { ref-pfn = pfn; ref-flags |= E500_TLB_VALID; + /* Use guest supplied MAS2_G and MAS2_E */ + ref-flags |= (gtlbe-mas2 MAS2_ATTRIB_MASK) | wimg; if (tlbe_is_writable(gtlbe)) kvm_set_pfn_dirty(pfn); @@ -312,8 +305,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; - stlbe-mas2 = (gvaddr MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe-mas2, pr); + stlbe-mas2 = (gvaddr MAS2_EPN) | (ref-flags E500_TLB_WIMGE_MASK); stlbe-mas7_3 = ((u64)pfn PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe-mas7_3, pr); @@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, unsigned long hva; int pfnmap = 0; int tsize = BOOK3E_PAGESZ_4K; + pte_t pte; + int wimg = 0; /* * Translate guest physical to true physical, acquiring @@ -437,6 +431,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, if (likely(!pfnmap)) { unsigned long tsize_pages = 1 (tsize + 10 - PAGE_SHIFT); + pgd_t *pgdir; + pfn = gfn_to_pfn_memslot(slot, gfn); if (is_error_noslot_pfn(pfn)) { printk(KERN_ERR Couldn't get real page for gfn %lx!\n, @@ -447,9 +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500
Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
On 08/01/2013 07:12 PM, Bharat Bhushan wrote: KVM need to lookup linux pte for getting TLB attributes (WIMGE). This is similar to how book3s does. This will be used in follow-up patches. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1-v2 - This is a new change in this version arch/powerpc/include/asm/kvm_booke.h | 73 ++ 1 files changed, 73 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h index d3c1eb3..903624d 100644 --- a/arch/powerpc/include/asm/kvm_booke.h +++ b/arch/powerpc/include/asm/kvm_booke.h @@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu) { return vcpu-arch.shared-msr; } + +/* + * Lock and read a linux PTE. If it's present and writable, atomically + * set dirty and referenced bits and return the PTE, otherwise return 0. + */ +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing) +{ + pte_t pte; + +#ifdef PTE_ATOMIC_UPDATES + pte_t tmp; +/* wait until _PAGE_BUSY is clear then set it atomically */ +#ifdef CONFIG_PPC64 + __asm__ __volatile__ ( + 1:ldarx %0,0,%3\n + andi. %1,%0,%4\n + bne-1b\n + ori %1,%0,%4\n + stdcx. %1,0,%3\n + bne-1b + : =r (pte), =r (tmp), =m (*p) + : r (p), i (_PAGE_BUSY) + : cc); +#else +__asm__ __volatile__ ( +1: lwarx %0,0,%3\n + andi. %1,%0,%4\n + bne-1b\n + ori %1,%0,%4\n + stwcx. %1,0,%3\n + bne-1b +: =r (pte), =r (tmp), =m (*p) +: r (p), i (_PAGE_BUSY) +: cc); +#endif +#else + pte = pte_val(*p); +#endif + + if (pte_present(pte)) { + pte = pte_mkyoung(pte); + if (writing pte_write(pte)) + pte = pte_mkdirty(pte); + } + + *p = pte; /* clears _PAGE_BUSY */ + + return pte; +} + +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva, + int writing, unsigned long *pte_sizep) Looks this function is as same as book3s, so why not improve that as common :) Tiejun +{ + pte_t *ptep; + unsigned long ps = *pte_sizep; + unsigned int shift; + + ptep = find_linux_pte_or_hugepte(pgdir, hva, shift); + if (!ptep) + return __pte(0); + if (shift) + *pte_sizep = 1ul shift; + else + *pte_sizep = PAGE_SIZE; + + if (ps *pte_sizep) + return __pte(0); + if (!pte_present(*ptep)) + return __pte(0); + + return kvmppc_read_update_linux_pte(ptep, writing); +} + #endif /* __ASM_KVM_BOOKE_H__ */ -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6 v2] kvm: powerpc: allow guest control G attribute in mas2
On 08/01/2013 07:12 PM, Bharat Bhushan wrote: G bit in MAS2 indicates whether the page is Guarded. There is no reason to stop guest setting E, so allow him. Could we merge patch 2 and 3 into only one. Tiejun Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1-v2 - no change arch/powerpc/kvm/e500.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h index 277cb18..4fd9650 100644 --- a/arch/powerpc/kvm/e500.h +++ b/arch/powerpc/kvm/e500.h @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu) #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW) #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW) #define MAS2_ATTRIB_MASK \ - (MAS2_X0 | MAS2_X1 | MAS2_E) + (MAS2_X0 | MAS2_X1 | MAS2_E | MAS2_G) #define MAS3_ATTRIB_MASK \ (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \ | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK) -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 06:27 PM, Alexander Graf wrote: On 18.07.2013, at 12:19, “tiejun.chen” wrote: On 07/18/2013 06:12 PM, Alexander Graf wrote: On 18.07.2013, at 12:08, “tiejun.chen” wrote: On 07/18/2013 05:48 PM, Alexander Graf wrote: On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Thursday, July 18, 2013 1:53 PM To: '�tiejun.chen�' Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen� Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those some cases and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of if (pfn_valid()) is to check for pages that are declared reserved on the host. This happens in 2 cases: 1) Non cache coherent DMA 2) Memory hot remove The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by: depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON default n if PPC_47x default y so we never hit it with any core we care about ;). Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either. Thanks for this good information :) So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss. If I'm wrong please correct me :) You're perfectly right, but this is generic KVM code. So it gets run across all
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? Tiejun + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe-mas2, pr); + e500_shadow_mas2_attrib(gtlbe-mas2, pfn); stlbe-mas7_3 = ((u64)pfn PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe-mas7_3, pr); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: “tiejun.chen” [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Tiejun -Bharat Tiejun + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe-mas2, pr); + e500_shadow_mas2_attrib(gtlbe-mas2, pfn); stlbe-mas7_3 = ((u64)pfn PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe-mas7_3, pr); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of “tiejun.chen” Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: “tiejun.chen” [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those some cases and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Tiejun -Bharat Tiejun -Bharat Tiejun + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe-mas2, pr); + e500_shadow_mas2_attrib(gtlbe-mas2, pfn); stlbe-mas7_3 = ((u64)pfn PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe-mas7_3, pr); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } Additionally, in UP case this little chunk of code is equivalent to if (1) { mas2_attr |= MAS2_I | MAS2_G; } else { } So you'd better wrapper MAS2_m in advance like, #ifdef CONFIG_SMP #define M_IF_SMPMAS2_M #else #define M_IF_SMP0 #endif Then if (1) mas2_attr |= MAS2_I | MAS2_G; else mas2_attr |= M_IF_SMP; Tiejun + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe-mas2, pr); + e500_shadow_mas2_attrib(gtlbe-mas2, pfn); stlbe-mas7_3 = ((u64)pfn PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe-mas7_3, pr); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Thursday, July 18, 2013 1:53 PM To: '“tiejun.chen”' Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages -Original Message- From: “tiejun.chen” [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of “tiejun.chen” Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: “tiejun.chen” [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those some cases and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS? I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices. So maybe we can introduce another helper to fixup that TLB entry in instead of this path. Tiejun -Bharat + mas2_attr |= MAS2_I | MAS2_G; + } else { #ifdef CONFIG_SMP - return (mas2 MAS2_ATTRIB_MASK) | MAS2_M; -#else - return mas2 MAS2_ATTRIB_MASK; + mas2_attr |= MAS2_M; #endif + } + return mas2_attr; } /* @@ -313,7 +320,7 @@ static void kvmppc_e500_setup_stlbe( /* Force IPROT=0 for all guest mappings. */ stlbe-mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) | - e500_shadow_mas2_attrib(gtlbe-mas2, pr); + e500_shadow_mas2_attrib(gtlbe-mas2, pfn); stlbe-mas7_3 = ((u64)pfn PAGE_SHIFT) | e500_shadow_mas3_attrib(gtlbe-mas7_3, pr); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 05:44 PM, Alexander Graf wrote: On 18.07.2013, at 10:55, �tiejun.chen� wrote: On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Thursday, July 18, 2013 1:53 PM To: '�tiejun.chen�' Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen� Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those some cases and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS? Because other devices wouldn't be available to the guest through memory slots. Yes. I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices. I'm not sure I understand. The shadow TLB code only knows here is a host virtual address. It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache) Sorry, looks I'm misleading you :-P So maybe we can introduce another helper to fixup that TLB entry in instead of this path. This path does fix up the shadow (host) TLB entry :). I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :) Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 06:00 PM, Alexander Graf wrote: On 18.07.2013, at 11:56, “tiejun.chen” wrote: On 07/18/2013 05:44 PM, Alexander Graf wrote: On 18.07.2013, at 10:55, �tiejun.chen� wrote: On 07/18/2013 04:25 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Thursday, July 18, 2013 1:53 PM To: '�tiejun.chen�' Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen� Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those some cases and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. Furthermore, how to distinguish we're creating TLB entry for the device assigned directly to the GS? Because other devices wouldn't be available to the guest through memory slots. Yes. I think its unnecessary to always check if that is mmio's pfn since we have more non direct assigned devices. I'm not sure I understand. The shadow TLB code only knows here is a host virtual address. It needs to figure out whether the host physical address behind that is RAM (can access with cache enabled) or not (has to disable cache) Sorry, looks I'm misleading you :-P So maybe we can introduce another helper to fixup that TLB entry in instead of this path. This path does fix up the shadow (host) TLB entry :). I just mean whether we can have a separate path dedicated to those direct assigned devices, not go this common path :) I don't think it's possible to have a separate path without a certain level of trust. In the current flow we don't trust anyone. We just check every translated page whether we should enable caching or not. We could take that information from 2 other side though: 1) Memory Slot 2) Guest TLB Flags If we take it from the memory
Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages
On 07/18/2013 06:12 PM, Alexander Graf wrote: On 18.07.2013, at 12:08, “tiejun.chen” wrote: On 07/18/2013 05:48 PM, Alexander Graf wrote: On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Thursday, July 18, 2013 1:53 PM To: '�tiejun.chen�' Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 1:52 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of �tiejun.chen� Sent: Thursday, July 18, 2013 1:01 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: �tiejun.chen� [mailto:tiejun.c...@windriver.com] Sent: Thursday, July 18, 2013 11:56 AM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages On 07/18/2013 02:04 PM, Bharat Bhushan wrote: If there is a struct page for the requested mapping then it's normal DDR and the mapping sets M bit (coherent, cacheable) else this is treated as I/O and we set I + G (cache inhibited, guarded) This helps setting proper TLB mapping for direct assigned device Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_mmu_host.c | 17 - 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..089c227 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -64,13 +64,20 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode) return mas3; } -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode) +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn) { + u32 mas2_attr; + + mas2_attr = mas2 MAS2_ATTRIB_MASK; + + if (!pfn_valid(pfn)) { Why not directly use kvm_is_mmio_pfn()? What I understand from this function (someone can correct me) is that it returns false when the page is managed by kernel and is not marked as RESERVED (for some reason). For us it does not matter whether the page is reserved or not, if it is kernel visible page then it is DDR. I think you are setting I|G by addressing all mmio pages, right? If so, KVM: direct mmio pfn check Userspace may specify memory slots that are backed by mmio pages rather than normal RAM. In some cases it is not enough to identify these mmio pages by pfn_valid(). This patch adds checking the PageReserved as well. Do you know what are those some cases and how checking PageReserved helps in those cases? No, myself didn't see these actual cases in qemu,too. But this should be chronically persistent as I understand ;-) Then I will wait till someone educate me :) The reason is , kvm_is_mmio_pfn() function looks pretty heavy and I do not want to call this for all tlbwe operation unless it is necessary. It certainly does more than we need and potentially slows down the fast path (RAM mapping). The only thing it does on top of if (pfn_valid()) is to check for pages that are declared reserved on the host. This happens in 2 cases: 1) Non cache coherent DMA 2) Memory hot remove The non coherent DMA case would be interesting, as with the mechanism as it is in place in Linux today, we could potentially break normal guest operation if we don't take it into account. However, it's Kconfig guarded by: depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON default n if PPC_47x default y so we never hit it with any core we care about ;). Memory hot remove does not exist on e500 FWIW, so we don't have to worry about that one either. Thanks for this good information :) So why not limit those codes with CONFIG_MEMORY_HOTPLUG inside kvm_is_mmio_pfn() to make sure that check is only valid when that is really needed? This can decrease those unnecessary performance loss. If I'm wrong please correct me :) You're perfectly right, but this is generic KVM code. So it gets run across all architectures. What if someone has the great idea to add a new case here for x86, but doesn't tell us
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/13/2013 07:05 AM, Benjamin Herrenschmidt wrote: On Fri, 2013-07-12 at 12:50 -0500, Scott Wood wrote: [1] SOFT_DISABLE_INTS seems an odd name for something that updates the software state to be consistent with interrupts being *hard* disabled. I can sort of see the logic in it, but it's confusing when first encountered. From the name it looks like all it would do is set soft_enabled to 1. It's indeed odd. Also worse when we use DISABLE_INTS which is just a macro on top of SOFT_DISABLE_INTS :-) I've been wanting to change the macro name for a while now and never got to it. Patch welcome :-) What about SOFT_IRQ_DISABLE? This is close to name hard_irq_disable() :) And then remove all DISABLE_INTS as well? Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/13/2013 01:50 AM, Scott Wood wrote: On 07/11/2013 10:22:28 PM, tiejun.chen wrote: If so, why not to remove directly hard_irq_disable() inside kvmppc_handle_exit() by reverting that commit, kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()? Then we can use SOFT_DISABLE_INTS() explicitly before call kvmppc_handle_exit() like this: KVM: PPC: Book3E HV: call SOFT_DISABLE_INTS to sync the software state We enter with interrupts disabled in hardware, but we need to call SOFT_DISABLE_INTS anyway to ensure that the software state is kept in sync. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..b521d21 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -33,6 +33,8 @@ #ifdef CONFIG_64BIT #include asm/exception-64e.h +#include asm/hw_irq.h +#include asm/irqflags.h #else #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */ #endif @@ -469,6 +471,14 @@ _GLOBAL(kvmppc_resume_host) PPC_LL r3, HOST_RUN(r1) mr r5, r14 /* intno */ mr r14, r4 /* Save vcpu pointer. */ +#ifdef CONFIG_64BIT + /* +* We enter with interrupts disabled in hardware, but +* we need to call SOFT_DISABLE_INTS anyway to ensure +* that the software state is kept in sync. +*/ + SOFT_DISABLE_INTS(r7,r8) +#endif bl kvmppc_handle_exit This will clobber the arguments we want to pass to kvmppc_handle_exit. That can be fixed by moving SOFT_DISABLE_INTS[1] earlier, and maybe it's more idiomatic Okay. Once we have a final name to replace SOFT_DISABLE_INTS, I can regenerate this as you comment. to use SOFT_DISABLE_INTS rather than what we currently do, but we still want to fix hard_irq_disable(). There are other places where we call hard_irq_disable() where interrupts (and I believe preemption) were previously enabled. Yes, I had a preliminary change ACKed by Ben, and I guess you also saw :) so I'll send that firstly. Thanks, Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/14/2013 12:13 PM, Benjamin Herrenschmidt wrote: On Fri, 2013-07-12 at 12:54 +0800, tiejun.chen wrote: Is the following fine? powerpc: to access local paca after hard irq disabled We can access paca directly after hard interrupt disabled, and this can avoid accessing wrong paca when using get_paca() in preempt case. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com Ack. We still have an unresolved problem where gcc decides to copy r13 to another register and then index from that, or even store and reload it, and this possibly accross preempt sections. It's unclear to me in what circumstances it will do it and whether there's a case of us getting completely screwed over, I need to investigate. This is the reason why we originally made the accesses to soft_enabled be inline asm. Understood. We might need to do a bulk conversion of all PACA accesses to either such inline asm or hide r13 behind asm (forcing essentially a copy to another register on each use) or a combination of both. IE. inline asm for direct access of things like soft_enabled, and a get_paca/put_paca style interface that copies r13 and includes a preempt_disable/enable for the rest. I'd like to check this possibility later. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/15/2013 10:47 AM, Benjamin Herrenschmidt wrote: On Mon, 2013-07-15 at 10:20 +0800, tiejun.chen wrote: What about SOFT_IRQ_DISABLE? This is close to name hard_irq_disable() :) And then remove all DISABLE_INTS as well? Or RECONCILE_IRQ_STATE... But sounds this doesn't imply this key point that the soft-irq is always disabled here :) And as I understand, the irq state is always needed to be reconciled when we disable soft irq, right? Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/12/2013 08:19 AM, Benjamin Herrenschmidt wrote: On Thu, 2013-07-11 at 15:07 +0200, Alexander Graf wrote: Ok, let me quickly explain the problem. We are leaving host context, switching slowly into guest context. During that transition we call get_paca() indirectly (apparently by another call to hard_disable() which sounds bogus, but that's another story). get_paca() warns when we're preemptible. We're only not preemptible when either preempt is disabled or irqs are disabled. Irqs are disabled, but arch_irqs_disabled() doesn't know, because it only checks for soft disabled IRQs. So we can fix this either by setting IRQs as soft disabled as well or by disabling preemption until we enter the guest for real. Any preferences? Well, if you hard disable first (ie, direct transition from full enabled to hard disabled), you know you have nothing lazy pending in irq_pending, then it's ok to mess around with local_paca-soft_enabled to make it look disabled. IE. Call hard_irq_disable(), then only turn local_paca-soft_enabled back on late in the process, some time before the final rfi(d). That works as long as you had a transition from full enabled to full disabled and don't hard re-enable in the process. IE, You are certain that there is nothing pending in irq_happened. HOWEVER ! If you do that, you *ALSO* need to clear irq_happened. You must *NEVER* leave PACA_IRQ_HARD_DIS set in irq_happened if you are soft-enabled, and since the above means that you *will* be seen as soft enabled on the way out of the guest, you can kaboom. BTW. I'm fine with a patch that does: #define hard_irq_disable() do {\ u8 _was_enabled = get_paca()-soft_enabled; \ Current problem I met is issued from the above line. __hard_irq_disable(); \ - get_paca()-soft_enabled = 0;\ Not here. If I'm misunderstanding what you guys means, please correct me since this is a long discussion thread. I have to reread that carefully. Tiejun + local_paca-soft_enabled = 0;\ In fact we should probably do it anyway. Cheers, Ben. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/12/2013 12:36 AM, Scott Wood wrote: On 07/11/2013 11:30:41 AM, Alexander Graf wrote: On 11.07.2013, at 18:18, Scott Wood wrote: On 07/11/2013 08:07:30 AM, Alexander Graf wrote: get_paca() warns when we're preemptible. We're only not preemptible when either preempt is disabled or irqs are disabled. Irqs are disabled, but arch_irqs_disabled() doesn't know, because it only checks for soft disabled IRQs. So we can fix this either by setting IRQs as soft disabled as well If we set IRQs as soft-disabled prior to calling hard_irq_disable(), then hard_irq_disable() will fail to call trace_hardirqs_off(). Right... Plus we'd have the same problem trying to set soft_enabled to 0. Any preferences? Use arch_local_save_flags() in hard_irq_disable() instead of reading soft_enabled with C code. That only operates on the soft_enabled bit. We also need to access irq_happened. OK, so we'll need more inline asm. If so, why not to remove directly hard_irq_disable() inside kvmppc_handle_exit() by reverting that commit, kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()? Then we can use SOFT_DISABLE_INTS() explicitly before call kvmppc_handle_exit() like this: KVM: PPC: Book3E HV: call SOFT_DISABLE_INTS to sync the software state We enter with interrupts disabled in hardware, but we need to call SOFT_DISABLE_INTS anyway to ensure that the software state is kept in sync. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..b521d21 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -33,6 +33,8 @@ #ifdef CONFIG_64BIT #include asm/exception-64e.h +#include asm/hw_irq.h +#include asm/irqflags.h #else #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */ #endif @@ -469,6 +471,14 @@ _GLOBAL(kvmppc_resume_host) PPC_LL r3, HOST_RUN(r1) mr r5, r14 /* intno */ mr r14, r4 /* Save vcpu pointer. */ +#ifdef CONFIG_64BIT + /* +* We enter with interrupts disabled in hardware, but +* we need to call SOFT_DISABLE_INTS anyway to ensure +* that the software state is kept in sync. +*/ + SOFT_DISABLE_INTS(r7,r8) +#endif bl kvmppc_handle_exit /* Restore vcpu pointer and the nonvolatiles we used. */ Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/12/2013 11:57 AM, Benjamin Herrenschmidt wrote: On Fri, 2013-07-12 at 10:13 +0800, tiejun.chen wrote: #define hard_irq_disable()do {\ u8 _was_enabled = get_paca()-soft_enabled; \ Current problem I met is issued from the above line. __hard_irq_disable(); \ - get_paca()-soft_enabled = 0; \ Not here. If I'm misunderstanding what you guys means, please correct me since this is a long discussion thread. I have to reread that carefully. Then make it u8 _was_enabled; __hard_irq_disable(); was_enabled = local_paca- Once you have hard disabled, using local_paca directly *should* be safe (minus that gcc problem I mentioned). Is the following fine? powerpc: to access local paca after hard irq disabled We can access paca directly after hard interrupt disabled, and this can avoid accessing wrong paca when using get_paca() in preempt case. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/include/asm/hw_irq.h |7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index ba713f1..10be1dd 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -96,10 +96,11 @@ static inline bool arch_irqs_disabled(void) #endif #define hard_irq_disable() do {\ - u8 _was_enabled = get_paca()-soft_enabled; \ + u8 _was_enabled;\ __hard_irq_disable(); \ - get_paca()-soft_enabled = 0; \ - get_paca()-irq_happened |= PACA_IRQ_HARD_DIS; \ + _was_enabled = local_paca-soft_enabled;\ + local_paca-soft_enabled = 0; \ + local_paca-irq_happened |= PACA_IRQ_HARD_DIS; \ if (_was_enabled) \ trace_hardirqs_off(); \ } while(0) -- 1.7.9.5 Or what about that change to call SOFT_DISABLE_INTS only in KVM scenario? Which better? Then I can send to review? Thanks, Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8] kvm/ppc: fixes for 3.10
On 07/10/2013 05:33 AM, Alexander Graf wrote: On 09.07.2013, at 11:09, tiejun.chen wrote: On 07/09/2013 04:53 PM, Alexander Graf wrote: Am 09.07.2013 um 09:21 schrieb tiejun.chen tiejun.c...@windriver.com: On 06/07/2013 08:16 AM, Scott Wood wrote: Most of these have been posted before, but I grouped them together as there are some contextual dependencies between them. Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these if there's no objection over the next few days? Scott and Alex, Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as well? Which commits exactly are you missing there? Alex, Four patches, kvm/ppc/booke: Delay kvmppc_lazy_ee_enable kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit() kvm/ppc/booke: Hold srcu lock when calling gfn functions kvm/ppc/booke64: Disable e6500 support are already in Linus's tree, https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=authorq=Scott+Wood So should we sync these commits into your tree? They should already be in there? Yes, I already see this. Thanks, Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/10/2013 05:49 PM, Alexander Graf wrote: On 10.07.2013, at 08:02, Tiejun Chen wrote: We should ensure the preemption cannot occur while calling get_paca() insdide hard_irq_disable(), otherwise the paca_struct may be the wrong one just after. And btw, we may update timing stats in this case. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/kvm/booke.c |2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index dcc94f0..9dae25d 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -839,6 +839,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, WARN_ON(local_paca-irq_happened != 0); #endif + preempt_disable(); /* * We enter with interrupts disabled in hardware, but * we need to call hard_irq_disable anyway to ensure that @@ -848,6 +849,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); + preempt_enable(); All of the code here is already called with interrupts disabled. I don't see how we could preempt then? But the kernel still check this in preempt case, #define get_paca() ((void) debug_smp_processor_id(), local_paca) then trigger that known call trace as I observed :) BUG: using smp_processor_id() in preemptible [] code: qemu-system-ppc/2065 caller is .kvmppc_handle_exit+0x48/0x810 CPU: 0 PID: 2065 Comm: qemu-system-ppc Not tainted 3.10.0-172036-ge2daa28-dirty #116 Call Trace: [c001fc637570] [c000835c] .show_stack+0x7c/0x1f0 (unreliable) [c001fc637640] [c0673a0c] .dump_stack+0x28/0x3c [c001fc6376b0] [c02f04d8] .debug_smp_processor_id+0x108/0x120 [c001fc637740] [c00444e8] .kvmppc_handle_exit+0x48/0x810 [c001fc6377f0] [c0047c80] .kvmppc_resume_host+0xa4/0xf8 Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][PATCH 1/1] KVM: PPC: disable preemption when using hard_irq_disable()
On 07/11/2013 03:15 AM, Scott Wood wrote: On 07/10/2013 01:02:19 AM, Tiejun Chen wrote: We should ensure the preemption cannot occur while calling get_paca() insdide hard_irq_disable(), otherwise the paca_struct may be the wrong one just after. And btw, we may update timing stats in this case. The soft-ee mechanism depends on accessing the PACA directly via r13 to avoid this. We probably should be using inline asm to read was_enabled rather than Yes. hoping the compiler doesn't do anything silly. Do you recommend I should directly replace get_paca() with local_paca inside hard_irq_disable()? #define hard_irq_disable() do {\ u8 _was_enabled = get_paca()-soft_enabled; \ - u8 _was_enabled = local_paca-soft_enabled; But is this safe for all scenarios? Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8] kvm/ppc: fixes for 3.10
On 06/07/2013 08:16 AM, Scott Wood wrote: Most of these have been posted before, but I grouped them together as there are some contextual dependencies between them. Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these if there's no objection over the next few days? Scott and Alex, Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as well? Or instead, where can we proceed to book3e KVM? Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8] kvm/ppc: fixes for 3.10
On 07/09/2013 04:53 PM, Alexander Graf wrote: Am 09.07.2013 um 09:21 schrieb tiejun.chen tiejun.c...@windriver.com: On 06/07/2013 08:16 AM, Scott Wood wrote: Most of these have been posted before, but I grouped them together as there are some contextual dependencies between them. Gleb/Paolo: As Alex doesn't appear to be back yet, can you apply these if there's no objection over the next few days? Scott and Alex, Any plan to merge these commits into git://github.com/agraf/linux-2.6.git as well? Which commits exactly are you missing there? Alex, Four patches, kvm/ppc/booke: Delay kvmppc_lazy_ee_enable kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit() kvm/ppc/booke: Hold srcu lock when calling gfn functions kvm/ppc/booke64: Disable e6500 support are already in Linus's tree, https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=authorq=Scott+Wood So should we sync these commits into your tree? Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction
On 06/26/2013 01:42 PM, Bharat Bhushan wrote: ehpriv instruction is used for setting software breakpoints by user space. This patch adds support to exit to user space with run-debug have relevant information. As this is the first point we are using run-debug, also defined the run-debug structure. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/disassemble.h |4 arch/powerpc/include/uapi/asm/kvm.h| 21 + arch/powerpc/kvm/e500_emulate.c| 27 +++ 3 files changed, 48 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h index 9b198d1..856f8de 100644 --- a/arch/powerpc/include/asm/disassemble.h +++ b/arch/powerpc/include/asm/disassemble.h @@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst) return inst 0x; } +static inline unsigned int get_oc(u32 inst) +{ + return (inst 11) 0x7fff; +} #endif /* __ASM_PPC_DISASSEMBLE_H__ */ diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..ded0607 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -269,7 +269,24 @@ struct kvm_fpu { __u64 fpr[32]; }; +/* + * Defines for h/w breakpoint, watchpoint (read, write or both) and + * software breakpoint. + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status + * for KVM_DEBUG_EXIT. + */ +#define KVMPPC_DEBUG_NONE 0x0 +#define KVMPPC_DEBUG_BREAKPOINT(1UL 1) +#define KVMPPC_DEBUG_WATCH_WRITE (1UL 2) +#define KVMPPC_DEBUG_WATCH_READ(1UL 3) struct kvm_debug_exit_arch { + __u64 address; + /* +* exiting to userspace because of h/w breakpoint, watchpoint +* (read, write or both) and software breakpoint. +*/ + __u32 status; + __u32 reserved; }; /* for KVM_SET_GUEST_DEBUG */ @@ -281,10 +298,6 @@ struct kvm_guest_debug_arch { * Type denotes h/w breakpoint, read watchpoint, write * watchpoint or watchpoint (both read and write). */ -#define KVMPPC_DEBUG_NONE 0x0 -#define KVMPPC_DEBUG_BREAKPOINT(1UL 1) -#define KVMPPC_DEBUG_WATCH_WRITE (1UL 2) -#define KVMPPC_DEBUG_WATCH_READ(1UL 3) __u32 type; __u32 reserved; } bp[16]; diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index b10a012..dab9d07 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -26,6 +26,8 @@ #define XOP_TLBRE 946 #define XOP_TLBWE 978 #define XOP_TLBILX 18 +#define XOP_EHPRIV 270 +#define EHPRIV_OC_DEBUG 0 As I think the case, OC = 0, is a bit specific since IIRC, if the OC operand is omitted, its equal 0 by default. So I think we should start this OC value from 1 or other magic number. And if possible, we'd better add some comments to describe this to make the OC definition readable. Tiejun #ifdef CONFIG_KVM_E500MC static int dbell2prio(ulong param) @@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, int rb) } #endif +static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int inst, int *advance) +{ + int emulated = EMULATE_DONE; + + switch (get_oc(inst)) { + case EHPRIV_OC_DEBUG: + run-exit_reason = KVM_EXIT_DEBUG; + run-debug.arch.address = vcpu-arch.pc; + run-debug.arch.status = 0; + kvmppc_account_exit(vcpu, DEBUG_EXITS); + emulated = EMULATE_EXIT_USER; + *advance = 0; + break; + default: + emulated = EMULATE_FAIL; + } + return emulated; +} + int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance) { @@ -130,6 +152,11 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, emulated = kvmppc_e500_emul_tlbivax(vcpu, ea); break; + case XOP_EHPRIV: + emulated = kvmppc_e500_emul_ehpriv(run, vcpu, inst, + advance); + break; + default: emulated = EMULATE_FAIL; } -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction
On 06/26/2013 04:44 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Wednesday, June 26, 2013 12:25 PM To: Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; b...@kernel.crashing.org; linuxppc-...@lists.ozlabs.org; linux- ker...@vger.kernel.org; mi...@neuling.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction On 06/26/2013 01:42 PM, Bharat Bhushan wrote: ehpriv instruction is used for setting software breakpoints by user space. This patch adds support to exit to user space with run-debug have relevant information. As this is the first point we are using run-debug, also defined the run-debug structure. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/disassemble.h |4 arch/powerpc/include/uapi/asm/kvm.h| 21 + arch/powerpc/kvm/e500_emulate.c| 27 +++ 3 files changed, 48 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h index 9b198d1..856f8de 100644 --- a/arch/powerpc/include/asm/disassemble.h +++ b/arch/powerpc/include/asm/disassemble.h @@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst) return inst 0x; } +static inline unsigned int get_oc(u32 inst) +{ + return (inst 11) 0x7fff; +} #endif /* __ASM_PPC_DISASSEMBLE_H__ */ diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..ded0607 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -269,7 +269,24 @@ struct kvm_fpu { __u64 fpr[32]; }; +/* + * Defines for h/w breakpoint, watchpoint (read, write or both) and + * software breakpoint. + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status + * for KVM_DEBUG_EXIT. + */ +#define KVMPPC_DEBUG_NONE 0x0 +#define KVMPPC_DEBUG_BREAKPOINT(1UL 1) +#define KVMPPC_DEBUG_WATCH_WRITE (1UL 2) +#define KVMPPC_DEBUG_WATCH_READ(1UL 3) struct kvm_debug_exit_arch { + __u64 address; + /* +* exiting to userspace because of h/w breakpoint, watchpoint +* (read, write or both) and software breakpoint. +*/ + __u32 status; + __u32 reserved; }; /* for KVM_SET_GUEST_DEBUG */ @@ -281,10 +298,6 @@ struct kvm_guest_debug_arch { * Type denotes h/w breakpoint, read watchpoint, write * watchpoint or watchpoint (both read and write). */ -#define KVMPPC_DEBUG_NONE 0x0 -#define KVMPPC_DEBUG_BREAKPOINT(1UL 1) -#define KVMPPC_DEBUG_WATCH_WRITE (1UL 2) -#define KVMPPC_DEBUG_WATCH_READ(1UL 3) __u32 type; __u32 reserved; } bp[16]; diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index b10a012..dab9d07 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -26,6 +26,8 @@ #define XOP_TLBRE 946 #define XOP_TLBWE 978 #define XOP_TLBILX 18 +#define XOP_EHPRIV 270 +#define EHPRIV_OC_DEBUG 0 As I think the case, OC = 0, is a bit specific since IIRC, if the OC operand is omitted, its equal 0 by default. So I think we should start this OC value from 1 or other magic number. ehpriv instruction is defined to be used as: ehpriv OC // where OC can be 0,1, ... n and in extended for it can be used as ehpriv // With no OC, and here it assumes OC = 0 So OC = 0 is not specific but ehpriv is same as ehpriv 0. Yes, this is just what I mean. I do not think of any special reason to reserve ehpriv and ehpriv 0. So I still prefer we can reserve the 'ehpriv' without OC operand as one simple approach to test or develop something for KVM quickly because its really convenient to trap into the hypervisor only with one 'ehpriv' instruction easily. But I have no further objection if you guys are fine to this ;-) Tiejun Thanks -Bharat And if possible, we'd better add some comments to describe this to make the OC definition readable. Tiejun #ifdef CONFIG_KVM_E500MC static int dbell2prio(ulong param) @@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, int rb) } #endif +static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int inst, int *advance) +{ + int emulated = EMULATE_DONE; + + switch (get_oc(inst)) { + case EHPRIV_OC_DEBUG: + run-exit_reason = KVM_EXIT_DEBUG; + run-debug.arch.address = vcpu-arch.pc; + run-debug.arch.status = 0; + kvmppc_account_exit(vcpu, DEBUG_EXITS); + emulated
Re: Performance Analysis - EMUL_MTSPR: Increased counts
On 05/13/2013 04:32 PM, nitesh narayan lal wrote: Hi, I am doing the virtio performance analysis on PowerPC,with the updated QEMU I had observed that the EMUL_MTSPR instruction counts in KVM Exit Sounds you are saying this behaviour is only reproduced after you update QEMU. If so, you can use 'git bisect' easily to figure out this issued commit. count is going really high when I send packets from Guest with respect to without sending network packets from Guest. I had added few prints and found that special register number 336 which corresponds to TSR:Timer Status is been called number of times Which platform? Which CPU core? which I think could result in a higher CPU Utilization. How many processes else are running except for KVM guest? Tiejun I would be thankful if I could get some opinions about this observation. Regards Nitesh Narayan Lal -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
On 05/11/2013 03:39 AM, Alexander Graf wrote: Am 10.05.2013 um 21:22 schrieb Scott Wood scottw...@freescale.com: On 05/10/2013 12:57:33 PM, Alexander Graf wrote: Could you guys please collect performance data during the next weeks on both guest-directed ISIs as well as VF MMIOs (preferably with in-kernel MMIO), so that we can decide on the direction that's worth going towards? Collecting data on VF MMIO would require implementing it (or at least salvaging and fixing some old code), which is not a high priority at the moment. If we do implement VF in the future we could always undo the direct ISI change, but it would still be nice to know if there's any real benefit in the first place. Mike sounded like he had an almost working poc, which is good enough to collect rough numbers. Which can the test case be adopted? Mike, If you already have a good case for your poc, please share that with me. Then I'd like to run that. Tiejun And yes, changes like these should always get at least basic performance numbers along with them, regardless of drawbacks. Alex FWIW, I doubt that the more stress on HW TLB will be significant. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca-irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Embedded Perfmon interrupt is also asynchronous, Why that is not in the list of masked interruts. Are you saying perfmon? If so, its also in that list: START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE) Tiejun -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/09/2013 03:51 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 1:18 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of bounces+Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca- irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Embedded Perfmon interrupt is also asynchronous, Why that is not in the list of masked interruts. Are you saying perfmon? If so, its also in that list: START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE) Where it is recorded in paca-irq_happned to be replayed later ? In stead of PROLOG_ADDITION_MASKABLE, actually we have nothing to do with PROLOG_ADDITION_NONE for perfmon, so perfmon can be executed without lazy state. Tiejun Tiejun -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/09/2013 04:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Kevin Hao [mailto:haoke...@gmail.com] Sent: Thursday, May 09, 2013 1:38 PM To: Bhushan Bharat-R65777 Cc: tiejun.chen; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood Scott- B07421; ag...@suse.de; kvm-ppc@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On Thu, May 09, 2013 at 07:51:09AM +, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 1:18 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf bounces+Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca- irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Embedded Perfmon interrupt is also asynchronous, Why that is not in the list of masked interruts. Are you saying perfmon? If so, its also in that list: START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE) Where it is recorded in paca-irq_happned to be replayed later ? Actually we don't want replay the perfmon interrupt later. We would run it even soft irq is disabled and just treat it as NMI. Please see the following function quoted from arch/powerpc/perf/core-fsl-emb.c: /* * If interrupts were soft-disabled when a PMU interrupt occurs, treat * it as an NMI. */ static inline int perf_intr_is_nmi(struct pt_regs *regs) { #ifdef __powerpc64__ return !regs-softe; #else return 0; #endif } Is it because that we cannot afford to lose perfmon interrupt for more accurate capturing of data ? powerpc/perf: e500 support This implements perf_event support for the Freescale embedded performance monitor, based on the existing perf_event.c that supports server/classic chips. Some limitations: - Performance monitor interrupts are regular EE interrupts, and thus you can't profile places with interrupts disabled. We may want to implement soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs. Tiejun -Bharat Thanks, Kevin Tiejun -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike ___ Linuxppc-dev mailing list linuxppc-...@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca-irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. Another Question: The case is: Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur as I recall. Case 1) - Local_irq_disable() will set soft_enabled = 0 - Now Externel interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. No more other interrupt gated by MSR.EE can happen. Looks like the idea here is to not let a device keep on inserting interrupt till the interrupt condition on device is cleared, right? I don't understand the interrupt condition on device is cleared here. I think regardless if you clear the device interrupt status, the system still receive a pending interrupt once EE or GS = 1. - local_irq_enable() - This checks that irq_happened is set, and replays ret_from_except also check to replay. Now the case 2) Case 2) - Local_irq_disable() will set soft_enabled = 0 - Now DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened, But do not clear EE in SRR1 and rfi. So interrupts are not hard disabled. - Now say EE interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. - local_irq_enable() - This checks that irq_happened is set. IIUC, it replays only one interrupt? is not it? After anyone is replayed in arch_local_irq_restore(), we will set soft/hard interrupt there: set_soft_enabled(1); __hard_irq_enable(); Then any pending interrupt can be executed now. Additionally, ret_from_except probably check to replay all. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
On 05/08/2013 05:28 PM, tiejun.chen wrote: On 05/08/2013 05:20 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of tiejun.chen Sent: Wednesday, May 08, 2013 4:54 AM To: Wood Scott-B07421 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest On 05/08/2013 07:40 AM, Scott Wood wrote: On 05/07/2013 06:06:30 AM, Tiejun Chen wrote: We also can direct ISI exception to Guest like DSI. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/kvm/booke_emulate.c |3 +++ arch/powerpc/kvm/e500mc.c|3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) Are you seeing a real performance improvement from this? This will interfere No. But after we reduce the exit to host, shouldn't this improve performance? We lose some flexibility for this so it make sense only if we gain measurable improvements. Sounds we have much more works to do. somewhat with using the VF bit, if we were to ever do so, since VF only affects Sorry, what is the VF you said? VF stands for virtualization fault see MAS8[VF] and we may use it for virtualized I almost forget this point :) Looks KVM PPC have no this mechanism currently since I don't find MAS8_VF is used in kernel, right? If I'm missing something please correct me. Tiejun MMIO. The hypervisor should deny execute access on pages marked with VF. Accordingly in this case guest ISI exceptions should be handled by the hypervisor. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/09/2013 07:21 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 3:48 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 06:00 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 3:15 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of bounces+Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca- irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. Another Question: The case is: Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur as I recall. Case 1) - Local_irq_disable() will set soft_enabled = 0 - Now Externel interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. No more other interrupt gated by MSR.EE can happen. Looks like the idea here is to not let a device keep on inserting interrupt till the interrupt condition on device is cleared, right? I don't understand the interrupt condition on device is cleared here. I think regardless if you clear the device interrupt status, the system still receive a pending interrupt once EE or GS = 1. Once yes, but I think to avoid flood of device interrupt we disable MSR.EE when soft-disabled. But we neither ACK nor send EOI to that irq in the interrupt controller, so that should be in pending state. - local_irq_enable() - This checks that irq_happened is set, and replays ret_from_except also check to replay. Now the case 2) Case 2) - Local_irq_disable() will set soft_enabled = 0 - Now DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened, But do not clear EE in SRR1 and rfi. So interrupts are not hard disabled. - Now say EE interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. - local_irq_enable() - This checks that irq_happened is set. IIUC, it replays only one interrupt? is not it? After anyone is replayed in arch_local_irq_restore(), we will set soft/hard interrupt there: set_soft_enabled(1); __hard_irq_enable(); Then any pending interrupt can be executed now. Do you mean that the interrupt should fire again? I means the pending exception including external interrupt, the decrementer exception and the doorbell exception, can trap CPU once EE=1 with __hard_irq_enable() here. Then the kernel can handle those exception since soft enable is also 1 now. Additionally, ret_from_except probably check to replay all. Local_irq_enable() will not take us to ret_from_except. Yes. I just say ret_from_except can provide an approach to replay all :) __replay_interrupt() from arch_local_irq_enable() will take us to ret_from_except/lite :) There all pending interrupts
Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
On 05/09/2013 07:34 PM, Caraman Mihai Claudiu-B02008 wrote: VF stands for virtualization fault see MAS8[VF] and we may use it for virtualized Looks KVM PPC have no this mechanism currently since I don't find MAS8_VF is used in kernel, right? Yes but 'we may use it' in the feature, I have a functional POC with VF. Any IO performance to be improved with this POC? Now we capture virtualized MMIO accesses as TLB misses (we don't write into HW TLB guest translation outside visible memslots). Yes. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
On 05/10/2013 11:34 AM, Bhushan Bharat-R65777 wrote: -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Scott Wood Sent: Friday, May 10, 2013 8:40 AM To: Alexander Graf; Benjamin Herrenschmidt Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org; Wood Scott-B07421 Subject: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/include/asm/kvm_ppc.h | 11 --- arch/powerpc/kvm/book3s_pr.c |4 ++-- arch/powerpc/kvm/booke.c |4 ++-- arch/powerpc/kvm/powerpc.c |2 -- 4 files changed, 12 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index a5287fe..6885846 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn) } } -/* Please call after prepare_to_enter. This function puts the lazy ee state - back to normal mode, without actually enabling interrupts. */ -static inline void kvmppc_lazy_ee_enable(void) +/* + * Please call after prepare_to_enter. This function puts the lazy ee and irq + * disabled tracking state back to normal mode, without actually enabling + * interrupts. + */ +static inline void kvmppc_fix_ee_before_entry(void) { + trace_hardirqs_on(); + #ifdef CONFIG_PPC64 /* Only need to enable IRQs by hard enabling them after this */ local_paca-irq_happened = 0; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index bdc40b8..0b97ce4 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -890,7 +890,7 @@ program_interrupt: local_irq_enable(); r = s; } else { - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); } } @@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) if (vcpu-arch.shared-msr MSR_FP) kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP); - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); ret = __kvmppc_vcpu_run(kvm_run, vcpu); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 705fc5c..eb89b83 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) ret = s; goto out; } - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); local_irq_disable() is called before kvmppc_prepare_to_enter(). In patch 4, we call hard_irq_disable() once enter kvmppc_prepare_to_enter(). Tiejun Now we put the irq_happend and soft_enabled back to previous state without checking for any interrupt happened in between. If any interrupt happens in between, will not that be lost? -Bharat kvm_guest_enter(); @@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, local_irq_enable(); r = (s 2) | RESUME_HOST | (r RESUME_FLAG_NV); } else { - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); } } diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 6316ee3..4e05f8c 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) kvm_guest_exit(); continue; } - - trace_hardirqs_on(); #endif kvm_guest_enter(); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Linuxppc-dev mailing list linuxppc-...@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest
On 05/08/2013 05:20 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of tiejun.chen Sent: Wednesday, May 08, 2013 4:54 AM To: Wood Scott-B07421 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [v1][KVM][PATCH 1/1] kvm:ppc:booehv: direct ISI exception to Guest On 05/08/2013 07:40 AM, Scott Wood wrote: On 05/07/2013 06:06:30 AM, Tiejun Chen wrote: We also can direct ISI exception to Guest like DSI. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/kvm/booke_emulate.c |3 +++ arch/powerpc/kvm/e500mc.c|3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) Are you seeing a real performance improvement from this? This will interfere No. But after we reduce the exit to host, shouldn't this improve performance? We lose some flexibility for this so it make sense only if we gain measurable improvements. Sounds we have much more works to do. somewhat with using the VF bit, if we were to ever do so, since VF only affects Sorry, what is the VF you said? VF stands for virtualization fault see MAS8[VF] and we may use it for virtualized I almost forget this point :) MMIO. The hypervisor should deny execute access on pages marked with VF. Accordingly in this case guest ISI exceptions should be handled by the hypervisor. Thanks for your information. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc: Make hard_irq_disable() do the right thing vs. irq tracing
On 05/07/2013 03:04 PM, Benjamin Herrenschmidt wrote: If hard_irq_disable() is called while interrupts are already soft-disabled (which is the most common case) all is already well. However you can (and in some cases want) to call it while everything is enabled (to make sure you don't get a lazy even, for example before entry into KVM guests) and in this case we need to inform the irq tracer that the irqs are going off. We have to change the inline into a macro to avoid an include circular dependency hell hole. Signed-off-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- Tested on pseries, Scott, I don't expect a problem with that patch especially since most callers already are soft disabled, so I'll merge it now along with my other pending stuff and you can simplify your KVM one. diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h index e45c494..d615b28 100644 --- a/arch/powerpc/include/asm/hw_irq.h +++ b/arch/powerpc/include/asm/hw_irq.h @@ -95,15 +95,13 @@ static inline bool arch_irqs_disabled(void) #define __hard_irq_disable() __mtmsrd(local_paca-kernel_msr, 1) #endif -static inline void hard_irq_disable(void) -{ - __hard_irq_disable(); - get_paca()-soft_enabled = 0; - get_paca()-irq_happened |= PACA_IRQ_HARD_DIS; -} - -/* include/linux/interrupt.h needs hard_irq_disable to be a macro */ -#define hard_irq_disable hard_irq_disable +#define hard_irq_disable() do {\ + __hard_irq_disable(); \ + if (local_paca-soft_enabled)\ + trace_hardirqs_off(); \ + get_paca()-soft_enabled = 0;\ Could we simplify this as follows: +#define hard_irq_disable() do {\ + __hard_irq_disable(); \ + if (get_paca()-soft_enabled) { \ + trace_hardirqs_off(); \ + get_paca()-soft_enabled = 0;\ + } \ + get_paca()-irq_happened |= PACA_IRQ_HARD_DIS; \ +} while(0) Tiejun + get_paca()-irq_happened |= PACA_IRQ_HARD_DIS; \ +} while(0) static inline bool lazy_irq_pending(void) { -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with E500MC
On 05/07/2013 09:33 AM, tiejun.chen wrote: On 05/06/2013 10:58 PM, Alexander Graf wrote: On 05/06/2013 04:53 AM, Tiejun Chen wrote: Actually E500MC also support doorbell exception, and CONFIG_PPC_E500MC can cover BOOK3E/BOOK3E_64 as well. Signed-off-by: Tiejun Chentiejun.c...@windriver.com --- arch/powerpc/kvm/booke.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 1020119..dc1f590 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu, kvmppc_fill_pt_regs(regs); timer_interrupt(regs); break; -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64) +#if defined(CONFIG_PPC_E500MC) I suppose you mean CONFIG_KVM_E500MC here? Why didn't this work for you before? This works for me. Here I just mean currently CONFIG_PPC_E500MC is always selected no matter what CONFIG_PPC_FSL_BOOK3E or CONFIG_PPC_BOOK3E_64 is enabled. And especially, this is already in the arch/powerpc/kvm/booke.c file, so I think one #ifdef (CONFIG_PPC_E500MC) is enough and also makes sense. The ifdef above should cover the same range of CPUs. Or furthermore, the #ifdef CONFIG_PPC_DOORBELL is reasonable to cover this. I think this may be better so I send next version with this. So please take a look at. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [v1][KVM][PATCH 1/1] kvm:ppc: enable doorbell exception with E500MC
On 05/06/2013 10:58 PM, Alexander Graf wrote: On 05/06/2013 04:53 AM, Tiejun Chen wrote: Actually E500MC also support doorbell exception, and CONFIG_PPC_E500MC can cover BOOK3E/BOOK3E_64 as well. Signed-off-by: Tiejun Chentiejun.c...@windriver.com --- arch/powerpc/kvm/booke.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 1020119..dc1f590 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -795,7 +795,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu, kvmppc_fill_pt_regs(regs); timer_interrupt(regs); break; -#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3E_64) +#if defined(CONFIG_PPC_E500MC) I suppose you mean CONFIG_KVM_E500MC here? Why didn't this work for you before? This works for me. Here I just mean currently CONFIG_PPC_E500MC is always selected no matter what CONFIG_PPC_FSL_BOOK3E or CONFIG_PPC_BOOK3E_64 is enabled. And especially, this is already in the arch/powerpc/kvm/booke.c file, so I think one #ifdef (CONFIG_PPC_E500MC) is enough and also makes sense. The ifdef above should cover the same range of CPUs. Or furthermore, the #ifdef CONFIG_PPC_DOORBELL is reasonable to cover this. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/07/2013 07:50 AM, Scott Wood wrote: On 05/05/2013 10:13:17 PM, tiejun.chen wrote: On 05/06/2013 11:10 AM, Tiejun Chen wrote: For the external interrupt, the decrementer exception and the doorbell excpetion, we also need to soft-disable interrupts while doing as host interrupt handlers since the DO_KVM hook is always performed to skip EXCEPTION_COMMON then miss this original chance with the 'ints' (INTS_DISABLE). http://patchwork.ozlabs.org/patch/241344/ http://patchwork.ozlabs.org/patch/241412/ :-) I'm observing the same behaviour as well: WARN_ON_ONCE(!irqs_disabled()); Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/kvm/bookehv_interrupts.S |9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..2fd62bf 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -33,6 +33,8 @@ #ifdef CONFIG_64BIT #include asm/exception-64e.h +#include asm/hw_irq.h +#include asm/irqflags.h #else #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */ #endif @@ -469,6 +471,13 @@ _GLOBAL(kvmppc_resume_host) PPC_LLr3, HOST_RUN(r1) mrr5, r14 /* intno */ mrr14, r4 /* Save vcpu pointer. */ +#ifdef CONFIG_64BIT +/* Should we soft-disable interrupts? */ +andi.r6, r5, BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL +beqskip_soft_dis +SOFT_DISABLE_INTS(r7,r8) +skip_soft_dis: +#endif Why wouldn't we always disable them? kvmppc_handle_exit() will enable interrupts when it's ready. This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL b. bl kvmppc_handle_exit c. kvmppc_handle_exit() { int r = RESUME_HOST; int s; /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr); local_irq_enable(); == Enable again. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... So I think this should be reasonable :) Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/07/2013 10:06 AM, Scott Wood wrote: On 05/06/2013 08:56:25 PM, tiejun.chen wrote: On 05/07/2013 07:50 AM, Scott Wood wrote: On 05/05/2013 10:13:17 PM, tiejun.chen wrote: On 05/06/2013 11:10 AM, Tiejun Chen wrote: For the external interrupt, the decrementer exception and the doorbell excpetion, we also need to soft-disable interrupts while doing as host interrupt handlers since the DO_KVM hook is always performed to skip EXCEPTION_COMMON then miss this original chance with the 'ints' (INTS_DISABLE). http://patchwork.ozlabs.org/patch/241344/ http://patchwork.ozlabs.org/patch/241412/ :-) I'm observing the same behaviour as well: WARN_ON_ONCE(!irqs_disabled()); So, could you explain the benefits of your approach over what's being discussed in those threads? They're a long thread so I think I need to take time to see :) Why wouldn't we always disable them? kvmppc_handle_exit() will enable interrupts when it's ready. This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. This is like host handler, so I'm just disabling soft interrupt during kvmppc_restart_interrupt() for Doorbell interrupt/Decrementer Interrupt/External Input Interrupt. I don't see anything should be disabled for any TLB exception in host handler. And I'd rather see any fix for this problem stay out of the asm code. We already have an appropriate SOFT_DISABLE_INTS so I think we can take this easily :) b. bl kvmppc_handle_exit c. kvmppc_handle_exit() { int r = RESUME_HOST; int s; /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); /* restart interrupts if they were meant for the host */ kvmppc_restart_interrupt(vcpu, exit_nr); local_irq_enable();== Enable again. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? In host handler, we always use MASKABLE_EXCEPTION() to define-to-handle some exceptions: Doorbell interrupt/Decrementer Interrupt/External Input Interrupt: #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ This would call INTS_DISABLE, which is equal to SOFT_DISABLE_INTS(), to disable soft interrupt before call all associated handlers: do_IRQ()/timer_interrupt()/doorbell_exception(). But DO_KVM hook always skips INTS_DISABLE. So I think we also need to do INTS_DISABLE for kvmppc_restart_interrupt() since actually that restarts interrupts for the host with a similar way as they are called by host. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
On 05/06/2013 11:10 AM, Tiejun Chen wrote: For the external interrupt, the decrementer exception and the doorbell excpetion, we also need to soft-disable interrupts while doing as host interrupt handlers since the DO_KVM hook is always performed to skip EXCEPTION_COMMON then miss this original chance with the 'ints' (INTS_DISABLE). Sorry, miss to send Ben. Tiejun Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/kvm/bookehv_interrupts.S |9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..2fd62bf 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -33,6 +33,8 @@ #ifdef CONFIG_64BIT #include asm/exception-64e.h +#include asm/hw_irq.h +#include asm/irqflags.h #else #include ../kernel/head_booke.h /* for THREAD_NORMSAVE() */ #endif @@ -469,6 +471,13 @@ _GLOBAL(kvmppc_resume_host) PPC_LL r3, HOST_RUN(r1) mr r5, r14 /* intno */ mr r14, r4 /* Save vcpu pointer. */ +#ifdef CONFIG_64BIT + /* Should we soft-disable interrupts? */ + andi. r6, r5, BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL + beq skip_soft_dis + SOFT_DISABLE_INTS(r7,r8) +skip_soft_dis: +#endif bl kvmppc_handle_exit /* Restore vcpu pointer and the nonvolatiles we used. */ -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description
On 04/26/2013 11:11 AM, tiejun.chen wrote: On 04/25/2013 07:32 PM, Caraman Mihai Claudiu-B02008 wrote: Is the flowing is fine with that generic machine, ppce500, to boot P5040DS with 64bit, ./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel uImage -initrd ramdisk.gz -L . -append root=/dev/ram rw console=ttyS0,115200 -cpu e5500 -dtb p5040ds.dtb Thanks, Tiejun There is no need for -dtb. With your comment, I use kvm-ppc-queue which top commit is be28a27c, kvm/ppc: don't call complete_mmio_load when it's a store, in plus that patch you pointedto build one uImage based on corenet64_smp_defconfig, but we need to enable CONFIG_PPC_QEMU_E500 manually, and select CONFIG_TICK_CPU_ACCOUNTING since the default CONFIG_VIRT_CPU_ACCOUNTING_NATIVE would introduce some trace when boot VM. And perform as follows: ./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel uImage -initrd ramdisk.gz -L . -append root=/dev/ram rw console=ttyS0,115200 -cpu e5500 But I can't see anything in the serial port. Please ignore this since e5500 is okay now. Thanks for your reply. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7 v3] KVM: PPC: exit to user space on ehpriv instruction
On 04/26/2013 06:45 PM, Alexander Graf wrote: On 08.04.2013, at 12:32, Bharat Bhushan wrote: From: Bharat Bhushan bharat.bhus...@freescale.com ehpriv instruction is used for setting software breakpoints by user space. This patch adds support to exit to user space with run-debug have relevant information. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_emulate.c | 10 ++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index e78f353..cefdd38 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -26,6 +26,7 @@ #define XOP_TLBRE 946 #define XOP_TLBWE 978 #define XOP_TLBILX 18 +#define XOP_EHPRIV 270 #ifdef CONFIG_KVM_E500MC static int dbell2prio(ulong param) @@ -130,6 +131,15 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, emulated = kvmppc_e500_emul_tlbivax(vcpu, ea); break; + case XOP_EHPRIV: This is supposed to check for oc, no? The other day I already sent one patch only to check OC, KVM/PPC: emulate ehpriv. But Bharat said he's waiting for other debug patches to be reviewed. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description
On 04/25/2013 05:09 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Friday, April 19, 2013 1:03 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description On 04/11/2013 06:03 PM, Mihai Caraman wrote: Add e6500 core to Kconfig description. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v3: - No change arch/powerpc/kvm/Kconfig |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 63c67ec..4489520 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -136,15 +136,15 @@ config KVM_E500V2 If unsure, say N. config KVM_E500MC - bool KVM support for PowerPC E500MC/E5500 processors + bool KVM support for PowerPC E500MC/E5500/E6500 processors depends on PPC_E500MC select KVM select KVM_MMIO select KVM_BOOKE_HV select MMU_NOTIFIER ---help--- - Support running unmodified E500MC/E5500 (32-bit) guest kernels in I ever tried p5040ds but failed with 64-bit, but looks are you saying this patch set can make e5500/e6500 work well with 64-bit? If so, will we need to upgrade qemu or something else like dtb? KVM should work on p5040ds with and without this patchset. The latest qemu requires this patch: powerpc: Add paravirt idle loop for 64-bit Book-E, you will not pass guest udev without it. Which should qemu tree be used here? My tree is cloned from: git://repo.or.cz/qemu/agraf.git ppc-next But I can't find this commit. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description
On 04/25/2013 05:32 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, April 25, 2013 12:17 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description On 04/25/2013 05:09 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Friday, April 19, 2013 1:03 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description On 04/11/2013 06:03 PM, Mihai Caraman wrote: Add e6500 core to Kconfig description. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v3: - No change arch/powerpc/kvm/Kconfig |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 63c67ec..4489520 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -136,15 +136,15 @@ config KVM_E500V2 If unsure, say N. config KVM_E500MC - bool KVM support for PowerPC E500MC/E5500 processors + bool KVM support for PowerPC E500MC/E5500/E6500 processors depends on PPC_E500MC select KVM select KVM_MMIO select KVM_BOOKE_HV select MMU_NOTIFIER ---help--- - Support running unmodified E500MC/E5500 (32-bit) guest kernels in I ever tried p5040ds but failed with 64-bit, but looks are you saying this patch set can make e5500/e6500 work well with 64-bit? If so, will we need to upgrade qemu or something else like dtb? KVM should work on p5040ds with and without this patchset. The latest qemu requires this patch: powerpc: Add paravirt idle loop for 64-bit Book-E, you will not pass guest udev without it. This is a kernel patch required by latest qemu. Looks this commit is applied only into galak/powerpc.git, next, but still not merged into agraf/linux-2.6.git, so I'm confused which tree can support 64bit Book3E KVM as you point. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] kvm:book3e: Fix a build error
On 04/25/2013 08:11 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc- ow...@vger.kernel.org] On Behalf Of Tiejun Chen Sent: Thursday, April 25, 2013 2:46 PM To: ga...@kernel.crashing.org Cc: linuxppc-...@lists.ozlabs.org; kvm-ppc@vger.kernel.org; k...@vger.kernel.org Subject: [PATCH 1/1] kvm:book3e: Fix a build error Commit cd66cc2e, powerpc/85xx: Add AltiVec support for e6500, adds support for AltiVec on a Book-E class processor, but while compiling in the CONFIG_PPC_BOOK3E_64 and CONFIG_VIRTUALIZATION case, this introduce the following error: arch/powerpc/kernel/exceptions-64e.S:402: undefined reference to `kvmppc_handler_42_0x01B' arch/powerpc/kernel/built-in.o: In function `exc_altivec_assist_book3e': arch/powerpc/kernel/exceptions-64e.S:424: undefined reference to `kvmppc_handler_43_0x01B' make: *** [vmlinux] Error 1 Looks we should add these altivec kvm handlers. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/kvm/bookehv_interrupts.S |5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S index e8ed7d6..fa9c78a 100644 --- a/arch/powerpc/kvm/bookehv_interrupts.S +++ b/arch/powerpc/kvm/bookehv_interrupts.S @@ -319,6 +319,11 @@ kvm_handler BOOKE_INTERRUPT_DEBUG, EX_PARAMS(DBG), \ SPRN_DSRR0, SPRN_DSRR1, 0 kvm_handler BOOKE_INTERRUPT_DEBUG, EX_PARAMS(CRIT), \ SPRN_CSRR0, SPRN_CSRR1, 0 +/* altivec */ +kvm_handler BOOKE_INTERRUPT_ALTIVEC_UNAVAIL, EX_PARAMS(GEN), \ + SPRN_SRR0, SPRN_SRR1, 0 +kvm_handler BOOKE_INTERRUPT_ALTIVEC_ASSIST, EX_PARAMS(GEN), \ + SPRN_SRR0, SPRN_SRR1, 0 #else /* * For input register values, see arch/powerpc/include/asm/kvm_booke_hv_asm.h -- It seems that you are not using kvm-ppc-queue branch. This is just used to fix a build error in powerpc.git when introduce commit cd66cc2e, powerpc/85xx: Add AltiVec support for e6500, in *powerpc.git* as I mentioned in this patch head. I already have a patch ready for this (and AltiVec support is work This change don't block your AltiVec support for kvm unless you think this change is wrong. And especially, we always can reproduce this error with/without enabling AltiVec, so I also don't think this should be suspended until support e6500 in kvm since kvm based on e5500 should work. Tiejun in progress) but we need first to pull e6500 kernel patches from Linux tree into agraf.git. -Mike -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description
On 04/25/2013 07:32 PM, Caraman Mihai Claudiu-B02008 wrote: Is the flowing is fine with that generic machine, ppce500, to boot P5040DS with 64bit, ./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel uImage -initrd ramdisk.gz -L . -append root=/dev/ram rw console=ttyS0,115200 -cpu e5500 -dtb p5040ds.dtb Thanks, Tiejun There is no need for -dtb. With your comment, I use kvm-ppc-queue which top commit is be28a27c, kvm/ppc: don't call complete_mmio_load when it's a store, in plus that patch you pointed to build one uImage based on corenet64_smp_defconfig, but we need to enable CONFIG_PPC_QEMU_E500 manually, and select CONFIG_TICK_CPU_ACCOUNTING since the default CONFIG_VIRT_CPU_ACCOUNTING_NATIVE would introduce some trace when boot VM. And perform as follows: ./qemu-system-ppc64 -enable-kvm -m 1048 -nographic -M ppce500 -kernel uImage -initrd ramdisk.gz -L . -append root=/dev/ram rw console=ttyS0,115200 -cpu e5500 But I can't see anything in the serial port. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 8/8 v3] KVM: PPC: e500: Add e6500 core to Kconfig description
On 04/11/2013 06:03 PM, Mihai Caraman wrote: Add e6500 core to Kconfig description. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v3: - No change arch/powerpc/kvm/Kconfig |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index 63c67ec..4489520 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -136,15 +136,15 @@ config KVM_E500V2 If unsure, say N. config KVM_E500MC - bool KVM support for PowerPC E500MC/E5500 processors + bool KVM support for PowerPC E500MC/E5500/E6500 processors depends on PPC_E500MC select KVM select KVM_MMIO select KVM_BOOKE_HV select MMU_NOTIFIER ---help--- - Support running unmodified E500MC/E5500 (32-bit) guest kernels in I ever tried p5040ds but failed with 64-bit, but looks are you saying this patch set can make e5500/e6500 work well with 64-bit? If so, will we need to upgrade qemu or something else like dtb? Tiejun - virtual machines on E500MC/E5500 host processors. + Support running unmodified E500MC/E5500/E6500 guest kernels in + virtual machines on E500MC/E5500/E6500 host processors. This module provides access to the hardware capabilities through a character device node named /dev/kvm. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/7 v3] KVM: PPC: exit to user space on ehpriv instruction
On 04/08/2013 06:32 PM, Bharat Bhushan wrote: From: Bharat Bhushan bharat.bhus...@freescale.com ehpriv instruction is used for setting software breakpoints by user space. This patch adds support to exit to user space with run-debug have relevant information. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/e500_emulate.c | 10 ++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index e78f353..cefdd38 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -26,6 +26,7 @@ #define XOP_TLBRE 946 #define XOP_TLBWE 978 #define XOP_TLBILX 18 +#define XOP_EHPRIV 270 #ifdef CONFIG_KVM_E500MC static int dbell2prio(ulong param) @@ -130,6 +131,15 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, emulated = kvmppc_e500_emul_tlbivax(vcpu, ea); break; + case XOP_EHPRIV: + run-exit_reason = KVM_EXIT_DEBUG; IIRC, the ehpriv instruction should generate a Hypervisor Privilege Exception to trap into the Hypervisor proactive. And we can use this ability to design something conveniently. And so, that is not only for the debug mechanism like you did. So here if 'run-exit_reason' is fixed to KVM_EXIT_DEBUG, how to distinguish other scenarios? So as I understand, we should use 'ehpriv oc' exactly then resolve 'oc' further to go different cases, right? Tiejun + run-debug.arch.address = vcpu-arch.pc; + run-debug.arch.status = 0; + kvmppc_account_exit(vcpu, DEBUG_EXITS); + emulated = EMULATE_EXIT_USER; + *advance = 0; + break; + default: emulated = EMULATE_FAIL; } -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 1/6] kvm: add device control API
On 04/02/2013 06:47 AM, Scott Wood wrote: Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. Both device types and individual attributes can be tested without having to create the device or get/set the attribute, without the need for separately managing enumerated capabilities. Signed-off-by: Scott Wood scottw...@freescale.com --- Documentation/virtual/kvm/api.txt| 70 ++ Documentation/virtual/kvm/devices/README |1 + arch/powerpc/include/asm/kvm_host.h |6 +++ arch/powerpc/include/asm/kvm_ppc.h |2 + arch/powerpc/kvm/powerpc.c |7 +++ include/uapi/linux/kvm.h | 27 virt/kvm/kvm_main.c | 31 + 7 files changed, 144 insertions(+) create mode 100644 Documentation/virtual/kvm/devices/README diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 976eb65..77328aa 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2173,6 +2173,76 @@ header; first `n_valid' valid entries with contents from the data written, then `n_invalid' invalid entries, invalidating any previously valid entries found. +4.79 KVM_CREATE_DEVICE + +Capability: KVM_CAP_DEVICE_CTRL +Type: vm ioctl +Parameters: struct kvm_create_device (in/out) +Returns: 0 on success, -1 on error +Errors: + ENODEV: The device type is unknown or unsupported + EEXIST: Device already created, and this type of device may not + be instantiated multiple times + ENOSPC: Too many devices have been created + + Other error conditions may be defined by individual device types. + +Creates an emulated device in the kernel. The file descriptor returned +in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR. + +If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the +device type is supported (not necessarily whether it can be created +in the current vm). + +Individual devices should not define flags. Attributes should be used +for specifying any behavior that is not implied by the device type +number. + +struct kvm_create_device { + __u32 type; /* in: KVM_DEV_TYPE_xxx */ + __u32 fd; /* out: device handle */ + __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */ +}; + +4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR + +Capability: KVM_CAP_DEVICE_CTRL +Type: device ioctl +Parameters: struct kvm_device_attr +Returns: 0 on success, -1 on error +Errors: + ENXIO: The group or attribute is unknown/unsupported for this device + EPERM: The attribute cannot (currently) be accessed this way + (e.g. read-only attribute, or attribute that only makes + sense when the device is in a different state) + + Other error conditions may be defined by individual device types. + +Gets/sets a specified piece of device configuration and/or state. The +semantics are device-specific. See individual device documentation in +the devices directory. As with ONE_REG, the size of the data +transferred is defined by the particular attribute. + +struct kvm_device_attr { + __u32 flags; /* no flags currently defined */ + __u32 group; /* device-defined */ + __u64 attr; /* group-defined */ + __u64 addr; /* userspace address of attr data */ +}; + +4.81 KVM_HAS_DEVICE_ATTR + +Capability: KVM_CAP_DEVICE_CTRL +Type: device ioctl +Parameters: struct kvm_device_attr +Returns: 0 on success, -1 on error +Errors: + ENXIO: The group or attribute is unknown/unsupported for this device + +Tests whether a device supports a particular attribute. A successful +return indicates the attribute is implemented. It does not necessarily +indicate that the attribute can be read or written in the device's +current state. addr is ignored. 4.77 KVM_ARM_VCPU_INIT diff --git a/Documentation/virtual/kvm/devices/README b/Documentation/virtual/kvm/devices/README new file mode 100644 index 000..34a6983 --- /dev/null +++ b/Documentation/virtual/kvm/devices/README @@ -0,0 +1 @@ +This directory contains specific device bindings for KVM_CAP_DEVICE_CTRL. diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e34f8fe..e0caae2 100644 ---
Re: [RFC PATCH v2 1/6] kvm: add device control API
On 04/03/2013 01:30 AM, Scott Wood wrote: On 04/02/2013 01:59:57 AM, tiejun.chen wrote: On 04/02/2013 06:47 AM, Scott Wood wrote: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index ff71541..ed033c0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2158,6 +2158,17 @@ out: } #endif +static int kvm_ioctl_create_device(struct kvm *kvm, + struct kvm_create_device *cd) +{ +bool test = cd-flags KVM_CREATE_DEVICE_TEST; + +switch (cd-type) { +default: +return -ENODEV; +} Even after apply patch 5, looks here still misses something like: if (test) WARN_ON_ONCE(!cd-type); Why? How does userspace passing in a bad type value mean the kernel needs to report internal badness, why is a value of zero worse than any other bad value, and why only when the test flag is set? I just mean we need do something here since looks the 'test' variable is defined but unused, right? But please correct this as you expect :) And if the userspace can't guarantee cd-type is never zero, we should return -ENODEV as well after that switch(). Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 1/6] kvm: add device control API
On 04/03/2013 09:34 AM, Scott Wood wrote: On 04/02/2013 08:28:01 PM, tiejun.chen wrote: On 04/03/2013 01:30 AM, Scott Wood wrote: On 04/02/2013 01:59:57 AM, tiejun.chen wrote: On 04/02/2013 06:47 AM, Scott Wood wrote: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index ff71541..ed033c0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2158,6 +2158,17 @@ out: } #endif +static int kvm_ioctl_create_device(struct kvm *kvm, + struct kvm_create_device *cd) +{ +bool test = cd-flags KVM_CREATE_DEVICE_TEST; + +switch (cd-type) { +default: +return -ENODEV; +} Even after apply patch 5, looks here still misses something like: if (test) WARN_ON_ONCE(!cd-type); Why? How does userspace passing in a bad type value mean the kernel needs to report internal badness, why is a value of zero worse than any other bad value, and why only when the test flag is set? I just mean we need do something here since looks the 'test' variable is defined but unused, right? But please correct this as you expect :) Yes, it's unused in this patch, but is used after patch 5 is applied. I didn't think it was worth adding a temporary unused annotation, since this part of the kernel doesn't use -Werror. Yes, its accepted in !-Werror case if we shouldn't warn something as you said. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unexpected data TLB miss happens when guest OS executing a bl instruction
On 07/03/2012 09:44 PM, Fei K Chen wrote: On Jul 2, 2012, at 10:34 AM, Fei K Chen wrote: We are debuging kvm on IBM poweren chip by RSICWatch tool. So for those of you who don't know, the RISCWatch tool (RW) is a JTAG based HW probe debugger. An unexpected data TLB miss happened and we can not explain why. Any one have met this before? 1. Guest OS executes a bl instruction with PC=0xC05A49CC. According to the guest linux kernel objdump file, the next instruction will be mflr r0 with PC=0xC0599CC0. Normally, if you were debugging the host, you could strip off the 0xC and read the location from memory to verify this. However, since you are debugging the guest, you probably can't figure out the machine physical address. 2. By single-step execution in RISCWatch, guest OS does jump to an instruction with PC=0xC0599CC0. At this time, RISCWatch tool can not display what the instruction is. We guess this is because there is no instruction TLB entry in hardware TLB for PC=0xC0599CC0. Thus an instruction TLB miss is expected if we press the Asmstep to execute the next instruction. 3. Unfortunately, poweren jumps an instruction with PC=0xC0051FF4 which is the beginning of data TLB miss entry in kvm. We read the values in spr SRR0 and DEAR. Both of them are 0xC0599CC0. We even can not imagine why this happens. So when RW decides to look at memory it uses instruction stuffing/ramming where it is able to insert instructions into the thread's instruction port, this way it can perform loads and stores using translation in the same way that software does. Since you are using the ASM window, it is trying to read the instruction (before it is executed) and this causes your data fault. So this is normal and completely expect. Instead of using the ASM window use the command line window to step and read iar then you will get the correct instruction fault you are looking for. Yes, as what you said, by using the command line window, the data TLB miss does not happen now. Note: It may also be the case that you can uncheck the track IAR box in the asm windows if you insist on doing that. 4. As external interrupt will happen during single-step debugging, we set a hardware breakpoint at PC=0xC0599CC0, and let poweren directly run to that point. Yes, hardware probes become difficult with any interrupts active and, IMNSHO, should really only be used to debug bootstrap or exception level code. It would be way easier to instrument the host fault handlers to help you debug this case. 5. When poweren stops at PC=0xC0599CC0, from the output of RISCWatch, a trap instruction is placed at PC=0xC0599CC0. It is different with what should be according to the kernel objdump file. The only explanation we can imagine is that our kvm code set a wrong TLB entry for PC=0xC0599CC0 (it may be brought by that unexpected data TLB miss). As explained above, I'm pretty sure you did not hit the data fault in the same way as before. Does the rest of the instruction stream match? If not then you likely have a translation error. The reset of the instructions before and after 0xC0599CC0 match the vmlinux objdump file. So it seems that someone changed the memory content at 0xC0599CC0. However, there are only a handful of static trap instructions in vmlinux, so you should be able to track them down. My bet is that some software (or RW) has inserted the trap instruction to facilitate some form of break point? The instruction in 0xC0599CC0 should not be covered by anyone because the instruction mflr r0 is saving the lr register. Without the value in lr, function call can not return to the caller. I think you can set a data write breakpoint at another address to check if that is replaced with trap instruction as well. If not, at least this means this shouldn't be inserted by RW :) When that stops, you also can dump 0xC0599CC0 to take a look at what happened. I am tracing the event that someone writes to address 0xC0599CC0 with the help of RW. Strange, poweren dose not stop before it stops at that unexpected trap instruction. It looks like that the trap exists before I set up the data address write breakpoint. Did you enable kprobe to do probe something? Since Kprobe just use trap instruction to replace the krpobed address to make CPU hit. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 13/38] KVM: PPC: booke: category E.HV (GS-mode) support
+/* + * Host interrupt handlers may have clobbered these guest-readable + * SPRGs, so we need to reload them here with the guest's values. + */ +lwz r3, VCPU_VRSAVE(r4) +lwz r5, VCPU_SHARED_SPRG4(r11) +mtspr SPRN_VRSAVE, r3 +lwz r6, VCPU_SHARED_SPRG5(r11) +mtspr SPRN_SPRG4W, r5 +lwz r7, VCPU_SHARED_SPRG6(r11) +mtspr SPRN_SPRG5W, r6 +lwz r8, VCPU_SHARED_SPRG7(r11) +mtspr SPRN_SPRG6W, r7 +mtspr SPRN_SPRG7W, r8 + That should be here. +/* Load some guest volatiles. */ +PPC_LL r3, VCPU_LR(r4) +PPC_LL r5, VCPU_XER(r4) +PPC_LL r6, VCPU_CTR(r4) +PPC_LL r7, VCPU_CR(r4) +PPC_LL r8, VCPU_PC(r4) +#ifndef CONFIG_64BIT +lwz r9, (VCPU_SHARED_MSR + 4)(r11) +#else +ld r9, (VCPU_SHARED_MSR)(r11) +#endif +PPC_LL r0, VCPU_GPR(r0)(r4) +PPC_LL r1, VCPU_GPR(r1)(r4) +PPC_LL r2, VCPU_GPR(r2)(r4) +PPC_LL r10, VCPU_GPR(r10)(r4) +PPC_LL r11, VCPU_GPR(r11)(r4) +PPC_LL r12, VCPU_GPR(r12)(r4) +PPC_LL r13, VCPU_GPR(r13)(r4) +mtlrr3 +mtxer r5 +mtctr r6 +mtcrr7 +mtsrr0 r8 +mtsrr1 r9 + +#ifdef CONFIG_KVM_EXIT_TIMING +/* save enter time */ +1: +mfspr r6, SPRN_TBRU +mfspr r7, SPRN_TBRL +mfspr r8, SPRN_TBRU +cmpwr8, r6 Is not we should save guest CR after this otherwise this can corrupt it? I think this should be a typo since in our previous kvm implementation, we always did collect kvm exit timing at the above location :) Tiejun Thanks -Bharat +PPC_STL r7, VCPU_TIMING_LAST_ENTER_TBL(r4) +bne 1b +PPC_STL r8, VCPU_TIMING_LAST_ENTER_TBU(r4) +#endif + +/* Finish loading guest volatiles and jump to guest. */ +PPC_LL r5, VCPU_GPR(r5)(r4) +PPC_LL r6, VCPU_GPR(r6)(r4) +PPC_LL r7, VCPU_GPR(r7)(r4) +PPC_LL r8, VCPU_GPR(r8)(r4) +PPC_LL r9, VCPU_GPR(r9)(r4) + +PPC_LL r3, VCPU_GPR(r3)(r4) +PPC_LL r4, VCPU_GPR(r4)(r4) +rfi -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 3/4] KVM: PPC: epapr: install ev_idle hcall for e500 guest
Liu Yu wrote: If the guest hypervisor node contains has-idle property. Signed-off-by: Liu Yu yu@freescale.com --- v5: no change arch/powerpc/kernel/epapr_hcalls.S | 29 + arch/powerpc/kernel/epapr_paravirt.c | 11 ++- 2 files changed, 39 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kernel/epapr_hcalls.S b/arch/powerpc/kernel/epapr_hcalls.S index 697b390..72fa234 100644 --- a/arch/powerpc/kernel/epapr_hcalls.S +++ b/arch/powerpc/kernel/epapr_hcalls.S @@ -15,6 +15,35 @@ #include asm/ppc_asm.h #include asm/asm-offsets.h +#define HC_VENDOR_EPAPR (1 16) +#define HC_EV_IDLE 16 Why not use 'EV_IDLE' directly? + +_GLOBAL(epapr_ev_idle) +epapr_ev_idle: + rlwinm r3,r1,0,0,31-THREAD_SHIFT /* current thread_info */ + lwz r4,TI_LOCAL_FLAGS(r3) /* set napping bit */ + ori r4,r4,_TLF_NAPPING /* so when we take an exception */ + stw r4,TI_LOCAL_FLAGS(r3) /* it will return to our caller */ + + wrteei 1 + +idle_loop: + LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE) And could this line be simplified as something like this: LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE)) If so, even we can remove the previous HC_VENDOR_EPAPR definition as well. Tiejun + +.global epapr_ev_idle_start +epapr_ev_idle_start: + li r3, -1 + nop + nop + nop + + /* + * Guard against spurious wakeups from a hypervisor -- + * only interrupt will cause us to return to LR due to + * _TLF_NAPPING. + */ + b idle_loop + /* Hypercall entry point. Will be patched with device tree instructions. */ .global epapr_hypercall_start epapr_hypercall_start: diff --git a/arch/powerpc/kernel/epapr_paravirt.c b/arch/powerpc/kernel/epapr_paravirt.c index e601da7..43d875e 100644 --- a/arch/powerpc/kernel/epapr_paravirt.c +++ b/arch/powerpc/kernel/epapr_paravirt.c @@ -21,6 +21,10 @@ #include asm/epapr_hcalls.h #include asm/cacheflush.h #include asm/code-patching.h +#include asm/machdep.h + +extern void epapr_ev_idle(void); +extern u32 epapr_ev_idle_start[]; bool epapr_para_enabled = false; @@ -39,8 +43,13 @@ static int __init epapr_para_init(void) insts = of_get_property(hyper_node, hcall-instructions, len); if (insts !(len % 4) len = (4 * 4)) { - for (i = 0; i (len / 4); i++) + for (i = 0; i (len / 4); i++) { patch_instruction(epapr_hypercall_start + i, insts[i]); + patch_instruction(epapr_ev_idle_start + i, insts[i]); + } + + if (of_get_property(hyper_node, has-idle, NULL)) + ppc_md.power_save = epapr_ev_idle; epapr_para_enabled = true; } else { -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 3/4] KVM: PPC: epapr: install ev_idle hcall for e500 guest
Liu Yu-B13201 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Tuesday, February 21, 2012 6:54 PM To: Liu Yu-B13201 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-...@ozlabs.org; Wood Scott-B07421 Subject: Re: [PATCH v5 3/4] KVM: PPC: epapr: install ev_idle hcall for e500 guest Liu Yu wrote: If the guest hypervisor node contains has-idle property. Signed-off-by: Liu Yu yu@freescale.com --- v5: no change arch/powerpc/kernel/epapr_hcalls.S | 29 + arch/powerpc/kernel/epapr_paravirt.c | 11 ++- 2 files changed, 39 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kernel/epapr_hcalls.S b/arch/powerpc/kernel/epapr_hcalls.S index 697b390..72fa234 100644 --- a/arch/powerpc/kernel/epapr_hcalls.S +++ b/arch/powerpc/kernel/epapr_hcalls.S @@ -15,6 +15,35 @@ #include asm/ppc_asm.h #include asm/asm-offsets.h +#define HC_VENDOR_EPAPR(1 16) +#define HC_EV_IDLE 16 Why not use 'EV_IDLE' directly? + +_GLOBAL(epapr_ev_idle) +epapr_ev_idle: + rlwinm r3,r1,0,0,31-THREAD_SHIFT /* current thread_info */ + lwz r4,TI_LOCAL_FLAGS(r3) /* set napping bit */ + ori r4,r4,_TLF_NAPPING /* so when we take an exception */ + stw r4,TI_LOCAL_FLAGS(r3) /* it will return to our caller */ + + wrteei 1 + +idle_loop: + LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE) And could this line be simplified as something like this: LOAD_REG_IMMEDIATE(r11, EV_HCALL_TOKEN(EV_IDLE)) If so, even we can remove the previous HC_VENDOR_EPAPR definition as well. Because the epapr_hcalls.h contains C functions, so it cannot be included by assembly code. These common definitions are already covered in epapr_hcalls.h, but looks you redefine the same items many times, in kvm_para.h/epapr_hcalls.S. And I think maybe we'll also reuse these generics elsewhere lately. So can we limit that with __ASSEMBLY__ in epapr_hcalls.h? Or other way. If so it makes our life easy in the future. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html