Re: [git pull] Please pull powerpc.git next branch
On Wed, Jan 29, Alistair Popple wrote: Looks like I missed the dart iommu code when changing the iommu table initialisation. The patch below should fix it, would you mind testing it Ben? +++ b/arch/powerpc/sysdev/dart_iommu.c + iommu_table_dart.it_page_shift = IOMMU_PAGE_SHIFT_4K; Yes, that fixes it for me. Thanks! Olaf ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Please pull 'next' branch of 5xxx tree
On Wed, Jan 29, 2014 at 18:46 +1100, Benjamin Herrenschmidt wrote: On Tue, 2014-01-28 at 17:00 +1100, Benjamin Herrenschmidt wrote: On Tue, 2014-01-28 at 06:46 +0100, Anatolij Gustschin wrote: Hi Ben ! On Wed, 15 Jan 2014 22:18:59 +0100 Anatolij Gustschin ag...@denx.de wrote: Hi Ben ! please pull mpc5xxx patches for v3.14: Ping. Oops, you sent that while I was on vacation and I missed it. Next time, try to send your pull request earlier if possible; I'd like to have most stuff together before -rc5. I'll try to send this one to Linus after he has pulled my current one. Hrm, I get a merge conflict with spi-mpc512x-psc.c; please check that I fixed it up properly in powerpc-next and let me know. I read the merge commit (git show e9a371100dfd) and did a build and run test of f878f84373ae "powerpc: Wire up sched_setattr and sched_getattr syscalls", and everything looks good. Thank you! virtually yours Gerhard Sittig -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr. 5, D-82194 Groebenzell, Germany Phone: +49-8142-66989-0 Fax: +49-8142-66989-80 Email: off...@denx.de
[PATCH] powerpc/ppc32: fix the bug in the init of non-base exception stack for UP
We allocate a dedicated exception stack for each kind of non-base exception on every CPU. On ppc32, the CPU hard ID is used as the subscript to find a given CPU's exception stack. But in a UP kernel, each of these exception stack arrays has only one element, so we would get stuck if the CPU hard ID is not 0. In that case we should use subscript 0 regardless of the CPU hard ID. Signed-off-by: Kevin Hao haoke...@gmail.com --- arch/powerpc/kernel/irq.c | 5 + arch/powerpc/kernel/setup_32.c | 5 + 2 files changed, 10 insertions(+) diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c index 9729b23bfb0a..1d0848bba049 100644 --- a/arch/powerpc/kernel/irq.c +++ b/arch/powerpc/kernel/irq.c @@ -559,8 +559,13 @@ void exc_lvl_ctx_init(void) #ifdef CONFIG_PPC64 cpu_nr = i; #else +#ifdef CONFIG_SMP cpu_nr = get_hard_smp_processor_id(i); +#else + cpu_nr = 0; #endif +#endif + memset((void *)critirq_ctx[cpu_nr], 0, THREAD_SIZE); tp = critirq_ctx[cpu_nr]; tp->cpu = cpu_nr; diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c index 2b0da27eaee4..04cc4fcca78b 100644 --- a/arch/powerpc/kernel/setup_32.c +++ b/arch/powerpc/kernel/setup_32.c @@ -247,7 +247,12 @@ static void __init exc_lvl_early_init(void) /* interrupt stacks must be in lowmem, we get that for free on ppc32 * as the memblock is limited to lowmem by MEMBLOCK_REAL_LIMIT */ for_each_possible_cpu(i) { +#ifdef CONFIG_SMP hw_cpu = get_hard_smp_processor_id(i); +#else + hw_cpu = 0; +#endif + critirq_ctx[hw_cpu] = (struct thread_info *) __va(memblock_alloc(THREAD_SIZE, THREAD_SIZE)); #ifdef CONFIG_BOOKE -- 1.8.5.3
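A minimal userspace model of the indexing rule in this patch: on SMP the per-CPU exception-stack arrays are indexed by hardware CPU ID, while a UP kernel has a single slot and must always use index 0. MODEL_SMP and MODEL_NR_CPUS are hypothetical stand-ins for CONFIG_SMP and NR_CPUS; this is a sketch of the logic, not the kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for CONFIG_SMP and NR_CPUS. */
#ifdef MODEL_SMP
#define MODEL_NR_CPUS 4
#else
#define MODEL_NR_CPUS 1
#endif

/* One slot per possible CPU; a UP build has exactly one. */
static void *critirq_ctx_model[MODEL_NR_CPUS];

/* Same decision as the #ifdef CONFIG_SMP ... #else cpu_nr = 0 in the patch. */
static int exc_stack_index(int hard_cpu_id)
{
#ifdef MODEL_SMP
	return hard_cpu_id;
#else
	(void)hard_cpu_id;	/* UP: only slot 0 exists */
	return 0;
#endif
}

/* Returns the slot that would be used, checking that it is in bounds. */
static void **exc_stack_slot(int hard_cpu_id)
{
	int idx = exc_stack_index(hard_cpu_id);

	assert(idx >= 0 && idx < MODEL_NR_CPUS);
	return &critirq_ctx_model[idx];
}
```

Built without MODEL_SMP, exc_stack_index(3) returns 0, which is the case the patch addresses: a UP kernel running on a core whose hard ID is non-zero.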
Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
On Tue, 28 Jan 2014, Nishanth Aravamudan wrote: This helps about the same as David's patch -- but I found the reason why! ppc64 doesn't set CONFIG_HAVE_MEMORYLESS_NODES :) Expect a patch shortly for that and one other case I found. Oww...
Re: [PATCH] powerpc: enable CONFIG_HAVE_MEMORYLESS_NODES
On Tue, 28 Jan 2014, Nishanth Aravamudan wrote: Anton Blanchard found an issue with an LPAR that had no memory in Node 0. Christoph Lameter recommended, as one possible solution, to use numa_mem_id() for locality of the nearest memory node-wise. However, numa_mem_id() [and the other related APIs] are only useful if CONFIG_HAVE_MEMORYLESS_NODES is set. This is only the case for ia64 currently, but clearly we can have memoryless nodes on ppc64. Add the Kconfig option and define it to be the same value as CONFIG_NUMA. Well this is trivial but if you need encouragement: Reviewed-by: Christoph Lameter c...@linux.com
Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote: We definitely don't need to emulate mtspr, because both registers are hypervisor resources. This patch description doesn't cover what the patch actually does. It changes the implementation from always telling the guest it uses 100% to giving the guest an accurate amount of CPU time spent inside guest context. Also, I think we either go with full hyp semantics, which means we also emulate the offset, or we go with no hyp awareness in the guest at all, which means we also don't emulate SPURR, which is a hyp-privileged register. Otherwise I like the patch :). Alex Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_book3s.h | 2 -- arch/powerpc/include/asm/kvm_host.h | 4 ++-- arch/powerpc/kvm/book3s_emulate.c | 16 arch/powerpc/kvm/book3s_pr.c | 10 ++ 4 files changed, 20 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index bc23b1ba7980..396448afa38b 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -83,8 +83,6 @@ struct kvmppc_vcpu_book3s { u64 sdr1; u64 hior; u64 msr_mask; - u64 purr_offset; - u64 spurr_offset; #ifdef CONFIG_PPC_BOOK3S_32 u32 vsid_pool[VSID_POOL_SIZE]; u32 vsid_next; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 9a0cdb2c9d58..0a3785271f34 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -506,8 +506,8 @@ struct kvm_vcpu_arch { #ifdef CONFIG_BOOKE u32 decar; #endif - u32 tbl; - u32 tbu; + /* Time base value when we entered the guest */ + u64 entry_tb; u32 tcr; ulong tsr; /* we need to perform set/clr_bits() which requires ulong */ u32 ivor[64]; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index a7d54aa203d0..e1f1e5e16449 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -422,12 +422,6 @@ int 
kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) (mfmsr() & MSR_HV)) vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32; break; - case SPRN_PURR: - to_book3s(vcpu)->purr_offset = spr_val - get_tb(); - break; - case SPRN_SPURR: - to_book3s(vcpu)->spurr_offset = spr_val - get_tb(); - break; case SPRN_GQR0: case SPRN_GQR1: case SPRN_GQR2: @@ -523,10 +517,16 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val *spr_val = 0; break; case SPRN_PURR: - *spr_val = get_tb() + to_book3s(vcpu)->purr_offset; + /* +* On exit we would have updated purr +*/ + *spr_val = vcpu->arch.purr; break; case SPRN_SPURR: - *spr_val = get_tb() + to_book3s(vcpu)->purr_offset; + /* +* On exit we would have updated spurr +*/ + *spr_val = vcpu->arch.spurr; break; case SPRN_GQR0: case SPRN_GQR1: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index fdcbabdfb709..02231f5193c2 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -115,6 +115,11 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, svcpu->lr = vcpu->arch.lr; svcpu->pc = vcpu->arch.pc; svcpu->in_use = true; + /* +* Now also save the current time base value. We use this +* to find the guest purr and spurr value. +*/ + vcpu->arch.entry_tb = get_tb(); } /* Copy data touched by real-mode code from shadow vcpu back to vcpu */ @@ -161,6 +166,11 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, out: preempt_enable(); + /* +* Update purr and spurr using time base +*/ + vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb; + vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb; } static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu)
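The accounting introduced by this patch can be modelled in a few lines: snapshot the timebase on guest entry and, on exit, add the elapsed ticks to the emulated PURR and SPURR. The struct and function names below are hypothetical, and the timebase is passed in rather than read with get_tb().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-vcpu fields the patch uses. */
struct vcpu_time_model {
	uint64_t entry_tb;	/* timebase at the last guest entry */
	uint64_t purr;		/* value returned for mfspr(PURR) */
	uint64_t spurr;		/* value returned for mfspr(SPURR) */
};

/* Corresponds to where kvmppc_copy_to_svcpu() records entry_tb. */
static void model_guest_entry(struct vcpu_time_model *v, uint64_t tb_now)
{
	v->entry_tb = tb_now;
}

/* Corresponds to where kvmppc_copy_from_svcpu() updates purr and spurr. */
static void model_guest_exit(struct vcpu_time_model *v, uint64_t tb_now)
{
	uint64_t delta;

	assert(tb_now >= v->entry_tb);	/* the timebase only moves forward */
	delta = tb_now - v->entry_tb;	/* ticks spent in guest context */
	v->purr += delta;
	v->spurr += delta;
}
```

One difference from the patch: the patch calls get_tb() separately for PURR and SPURR on exit, so the two values can drift apart by a few ticks; this sketch takes a single reading so both advance by the same delta.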
Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote: The virtual time base register is a per-VM register and needs to be saved and restored on VM exit and entry. Writing to VTB is not allowed in privileged mode. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/include/asm/reg.h | 7 +++ arch/powerpc/include/asm/time.h | 12 arch/powerpc/kvm/book3s_emulate.c | 3 +++ arch/powerpc/kvm/book3s_pr.c | 3 +++ 5 files changed, 26 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0a3785271f34..9ebdd12e50a9 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -508,6 +508,7 @@ struct kvm_vcpu_arch { #endif /* Time base value when we entered the guest */ u64 entry_tb; + u64 entry_vtb; u32 tcr; ulong tsr; /* we need to perform set/clr_bits() which requires ulong */ u32 ivor[64]; diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index e789f76c9bc2..6c649355b1e9 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1161,6 +1161,13 @@ #define mtspr(rn, v) asm volatile("mtspr " __stringify(rn) ",%0" : \ : "r" ((unsigned long)(v)) \ : "memory") +#ifdef CONFIG_PPC_BOOK3S_64 +#define mfvtb() ({unsigned long rval; \ + asm volatile("mfspr %0, %1" : \ + "=r" (rval) : "i" (SPRN_VTB)); rval;}) +#else +#define mfvtb() BUG() +#endif static inline mfvtb(unsigned long) { #ifdef CONFIG_PPC_BOOK3S_64 return mfspr(SPRN_VTB); #else BUG(); #endif } is a lot easier to read and get right. But reg.h is Ben's call. Also, could you please give me a pointer to the specification for it? I tried to look up VTB in the 2.06 ISA and couldn't find it. Is it a CPU-specific register?
#ifdef __powerpc64__ #if defined(CONFIG_PPC_CELL) || defined(CONFIG_PPC_FSL_BOOK3E) diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h index c1f267694acb..1e89dbc665d9 100644 --- a/arch/powerpc/include/asm/time.h +++ b/arch/powerpc/include/asm/time.h @@ -101,6 +101,18 @@ static inline u64 get_rtc(void) return (u64)hi * 10 + lo; } +#ifdef CONFIG_PPC_BOOK3S_64 +static inline u64 get_vtb(void) +{ + return mfvtb(); +} +#else +static inline u64 get_vtb(void) +{ + return 0; +} +#endif Just put the #ifdef inside the function body. + #ifdef CONFIG_PPC64 static inline u64 get_tb(void) { diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index e1f1e5e16449..4b58d8a90cb5 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -528,6 +528,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val */ *spr_val = vcpu->arch.spurr; break; + case SPRN_VTB: + *spr_val = vcpu->arch.vtb; + break; case SPRN_GQR0: case SPRN_GQR1: case SPRN_GQR2: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 02231f5193c2..b5598e9cdd09 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -120,6 +120,8 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, * to find the guest purr and spurr value. */ vcpu->arch.entry_tb = get_tb(); + vcpu->arch.entry_vtb = get_vtb(); + } /* Copy data touched by real-mode code from shadow vcpu back to vcpu */ @@ -171,6 +173,7 @@ out: */ vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb; vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb; + vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb; I thought it's per VM? That would contradict the per-vcpu logic you're implementing here. This way VTB skews with world switches on SMP guests. Alex
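The review asks for the #ifdef to live inside a single function body rather than duplicating get_vtb(). Here is a userspace sketch of that shape, assuming MODEL_BOOK3S_64 in place of CONFIG_PPC_BOOK3S_64 and a hypothetical read_vtb_spr() in place of mfspr(SPRN_VTB):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mfspr(SPRN_VTB); only built when VTB exists. */
#ifdef MODEL_BOOK3S_64
static uint64_t read_vtb_spr(void)
{
	return 12345;	/* placeholder value for the model */
}
#endif

/* One definition; the configuration test lives inside the body. */
static inline uint64_t get_vtb(void)
{
#ifdef MODEL_BOOK3S_64
	return read_vtb_spr();
#else
	return 0;	/* no VTB on this configuration */
#endif
}
```

With a single definition, the signature and the fallback value sit next to each other, so the two variants cannot drift apart.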
Re: [RFC PATCH 03/10] KVM: PPC: BOOK3S: PR: Emulate instruction counter
On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote: Writing to IC is not allowed in the privileged mode. This is not a patch description. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s_emulate.c | 3 +++ arch/powerpc/kvm/book3s_pr.c | 2 ++ 3 files changed, 6 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 9ebdd12e50a9..e0b13aca98e6 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -509,6 +509,7 @@ struct kvm_vcpu_arch { /* Time base value when we entered the guest */ u64 entry_tb; u64 entry_vtb; + u64 entry_ic; u32 tcr; ulong tsr; /* we need to perform set/clr_bits() which requires ulong */ u32 ivor[64]; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 4b58d8a90cb5..abe6f3057e5b 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -531,6 +531,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val case SPRN_VTB: *spr_val = vcpu->arch.vtb; break; + case SPRN_IC: + *spr_val = vcpu->arch.ic; + break; case SPRN_GQR0: case SPRN_GQR1: case SPRN_GQR2: diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index b5598e9cdd09..51d469f8c9fd 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -121,6 +121,7 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, */ vcpu->arch.entry_tb = get_tb(); vcpu->arch.entry_vtb = get_vtb(); + vcpu->arch.entry_ic = mfspr(SPRN_IC); Is this implemented on all systems? } @@ -174,6 +175,7 @@ out: vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb; vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb; vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb; + vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic; This is getting quite convoluted.
How about we act slightly more fuzzy and put all of this into vcpu_load/put? Alex
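One possible reading of the vcpu_load/put suggestion, sketched under assumptions: every guest-elapsed counter (the TB-derived PURR/SPURR, VTB, IC) is snapshotted in one place at load time and accumulated in one place at put time, instead of being open-coded in the shadow-vcpu copy paths. All names are hypothetical, and the hardware registers are modelled as plain variables.

```c
#include <assert.h>
#include <stdint.h>

/* Counters whose guest-visible value advances only while the vcpu is loaded. */
enum { CTR_TB, CTR_VTB, CTR_IC, CTR_NR };

/* Stand-in for the host SPRs; a real version would read them. */
struct hw_ctr_model {
	uint64_t hw[CTR_NR];
};

struct vcpu_ctr_model {
	uint64_t entry[CTR_NR];	/* snapshot taken at load */
	uint64_t guest[CTR_NR];	/* accumulated guest-visible value */
};

/* Hypothetical vcpu_load() hook: snapshot every counter in one place. */
static void model_vcpu_load(struct vcpu_ctr_model *v, const struct hw_ctr_model *c)
{
	for (int i = 0; i < CTR_NR; i++)
		v->entry[i] = c->hw[i];
}

/* Hypothetical vcpu_put() hook: accumulate the elapsed deltas. */
static void model_vcpu_put(struct vcpu_ctr_model *v, const struct hw_ctr_model *c)
{
	for (int i = 0; i < CTR_NR; i++)
		v->guest[i] += c->hw[i] - v->entry[i];
}
```

With this layout, adding another counter means adding an enum entry rather than touching both copy paths. The trade-off is presumably the "fuzziness" the review mentions: host-side exit handling that happens while the vcpu is still loaded would also be counted.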
[PATCH] Handle vmalloc addresses
The nx-842 compression driver does not currently handle getting a physical address for vmalloc addresses. The current driver uses __pa() for all addresses, which does not properly handle vmalloc addresses and thus causes a failure, since we do not pass a proper physical address to phyp. This patch adds a routine to convert an address to a physical address by checking for vmalloc addresses and handling them properly. Signed-off-by: Nathan Fontenot nf...@linux.vnet.ibm.com --- drivers/crypto/nx/nx-842.c | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) Index: linux/drivers/crypto/nx/nx-842.c === --- linux.orig/drivers/crypto/nx/nx-842.c 2014-01-22 08:52:55.0 -0600 +++ linux/drivers/crypto/nx/nx-842.c 2014-01-29 08:25:33.0 -0600 @@ -158,6 +158,15 @@ return sl->entry_nr * sizeof(struct nx842_slentry); } +static inline unsigned long nx842_get_pa(void *addr) +{ + if (is_vmalloc_addr(addr)) + return page_to_phys(vmalloc_to_page(addr)) + + offset_in_page(addr); + else + return __pa(addr); +} + static int nx842_build_scatterlist(unsigned long buf, int len, struct nx842_scatterlist *sl) { @@ -168,7 +177,7 @@ entry = sl->entries; while (len) { - entry->ptr = __pa(buf); + entry->ptr = nx842_get_pa((void *)buf); nextpage = ALIGN(buf + 1, NX842_HW_PAGE_SIZE); if (nextpage < buf + len) { /* we aren't at the end yet */ @@ -370,8 +379,8 @@ op.flags = NX842_OP_COMPRESS; csbcpb = &workmem->csbcpb; memset(csbcpb, 0, sizeof(*csbcpb)); - op.csbcpb = __pa(csbcpb); - op.out = __pa(slout.entries); + op.csbcpb = nx842_get_pa(csbcpb); + op.out = nx842_get_pa(slout.entries); for (i = 0; i < hdr->blocks_nr; i++) { /* @@ -401,13 +410,13 @@ */ if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) { /* Create direct DDE */ - op.in = __pa(inbuf); + op.in = nx842_get_pa((void *)inbuf); op.inlen = max_sync_size; } else { /* Create indirect DDE (scatterlist) */ nx842_build_scatterlist(inbuf, max_sync_size, &slin); - op.in = __pa(slin.entries); + op.in = nx842_get_pa(slin.entries); op.inlen = 
-nx842_get_scatterlist_size(&slin); } @@ -565,7 +574,7 @@ op.flags = NX842_OP_DECOMPRESS; csbcpb = &workmem->csbcpb; memset(csbcpb, 0, sizeof(*csbcpb)); - op.csbcpb = __pa(csbcpb); + op.csbcpb = nx842_get_pa(csbcpb); /* * max_sync_size may have changed since compression, @@ -597,12 +606,12 @@ if (likely((inbuf & NX842_HW_PAGE_MASK) == ((inbuf + hdr->sizes[i] - 1) & NX842_HW_PAGE_MASK))) { /* Create direct DDE */ - op.in = __pa(inbuf); + op.in = nx842_get_pa((void *)inbuf); op.inlen = hdr->sizes[i]; } else { /* Create indirect DDE (scatterlist) */ nx842_build_scatterlist(inbuf, hdr->sizes[i] , &slin); - op.in = __pa(slin.entries); + op.in = nx842_get_pa(slin.entries); op.inlen = -nx842_get_scatterlist_size(&slin); } @@ -613,12 +622,12 @@ */ if (likely(max_sync_size == NX842_HW_PAGE_SIZE)) { /* Create direct DDE */ - op.out = __pa(outbuf); + op.out = nx842_get_pa((void *)outbuf); op.outlen = max_sync_size; } else { /* Create indirect DDE (scatterlist) */ nx842_build_scatterlist(outbuf, max_sync_size, &slout); - op.out = __pa(slout.entries); + op.out = nx842_get_pa(slout.entries); op.outlen = -nx842_get_scatterlist_size(&slout); }
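To illustrate why __pa() is insufficient here: a linear-map address translates by subtracting a fixed offset, but consecutive vmalloc pages can be backed by unrelated physical pages, so each page must be looked up separately. The userspace model below mirrors the shape of nx842_get_pa() and of the page-splitting loop in nx842_build_scatterlist(). The address constants and the page table are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All constants and the page table are invented for this model. */
#define MODEL_PAGE_SIZE     4096UL
#define MODEL_LINEAR_BASE   0x100000UL	/* plays the role of PAGE_OFFSET */
#define MODEL_VMALLOC_BASE  0x900000UL
#define MODEL_VMALLOC_PAGES 4

/* Physical page frame behind each vmalloc page: deliberately not contiguous. */
static const uintptr_t vmalloc_pfn[MODEL_VMALLOC_PAGES] = { 7, 3, 12, 5 };

static int model_is_vmalloc_addr(uintptr_t va)
{
	return va >= MODEL_VMALLOC_BASE &&
	       va < MODEL_VMALLOC_BASE + MODEL_VMALLOC_PAGES * MODEL_PAGE_SIZE;
}

/* Same shape as nx842_get_pa(): per-page lookup for vmalloc, fixed offset otherwise. */
static uintptr_t model_get_pa(uintptr_t va)
{
	if (model_is_vmalloc_addr(va)) {
		uintptr_t idx = (va - MODEL_VMALLOC_BASE) / MODEL_PAGE_SIZE;

		return vmalloc_pfn[idx] * MODEL_PAGE_SIZE + va % MODEL_PAGE_SIZE;
	}
	return va - MODEL_LINEAR_BASE;	/* what __pa() does for the linear map */
}

struct model_sg_entry {
	uintptr_t ptr;	/* physical address of this chunk */
	size_t len;
};

/* Split [buf, buf + len) at page boundaries, translating each chunk. */
static size_t model_build_sg(uintptr_t buf, size_t len,
			     struct model_sg_entry *out, size_t max)
{
	size_t n = 0;

	while (len) {
		/* next boundary strictly after buf, like ALIGN(buf + 1, PAGE) */
		uintptr_t nextpage = (buf / MODEL_PAGE_SIZE + 1) * MODEL_PAGE_SIZE;
		size_t chunk = nextpage - buf < len ? nextpage - buf : len;

		assert(n < max);	/* caller must supply enough entries */
		out[n].ptr = model_get_pa(buf);
		out[n].len = chunk;
		n++;
		buf += chunk;
		len -= chunk;
	}
	return n;
}
```

A buffer that straddles a vmalloc page boundary yields two entries whose physical addresses are not contiguous (here page frames 7 and 3), which is why every chunk is translated on its own.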
Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote: We allow priv-mode update of this. The guest value is saved in fscr, and the value actually used is saved in shadow_fscr. shadow_fscr only contains values that are allowed by the host. On a facility unavailable interrupt, if the facility is allowed by fscr but disabled in shadow_fscr, we need to emulate the support. Currently all but EBB is disabled. We still don't support performance monitoring in PR guests. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_book3s_asm.h | 1 + arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kernel/asm-offsets.c | 2 ++ arch/powerpc/kvm/book3s_emulate.c | 16 arch/powerpc/kvm/book3s_interrupts.S | 25 ++--- 5 files changed, 42 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 192917d2239c..abd42523ad93 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -103,6 +103,7 @@ struct kvmppc_host_state { #ifdef CONFIG_PPC_BOOK3S_64 u64 cfar; u64 ppr; + u64 host_fscr; #endif }; diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index e0b13aca98e6..f4be7be14330 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -478,6 +478,7 @@ struct kvm_vcpu_arch { ulong ppr; ulong pspb; ulong fscr; + ulong shadow_fscr; ulong tfhar; ulong tfiar; ulong texasr; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 2c2227da6917..7484676b8f25 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -525,6 +525,7 @@ int main(void) DEFINE(VCPU_CFAR, offsetof(struct kvm_vcpu, arch.cfar)); DEFINE(VCPU_PPR, offsetof(struct kvm_vcpu, arch.ppr)); DEFINE(VCPU_FSCR, offsetof(struct kvm_vcpu, arch.fscr)); + DEFINE(VCPU_SHADOW_FSCR, offsetof(struct kvm_vcpu, arch.shadow_fscr)); DEFINE(VCPU_PSPB, offsetof(struct 
kvm_vcpu, arch.pspb)); DEFINE(VCPU_TFHAR, offsetof(struct kvm_vcpu, arch.tfhar)); DEFINE(VCPU_TFIAR, offsetof(struct kvm_vcpu, arch.tfiar)); @@ -626,6 +627,7 @@ int main(void) #ifdef CONFIG_PPC_BOOK3S_64 HSTATE_FIELD(HSTATE_CFAR, cfar); HSTATE_FIELD(HSTATE_PPR, ppr); + HSTATE_FIELD(HSTATE_FSCR, host_fscr); #endif /* CONFIG_PPC_BOOK3S_64 */ #else /* CONFIG_PPC_BOOK3S */ diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 7f25adbd2590..60d0b6b745e7 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -468,6 +468,19 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) case SPRN_MSSSR0: case SPRN_DABR: break; + case SPRN_FSCR: + { + ulong host_fscr = mfspr(SPRN_FSCR); + /* +* We disable FSCR_EBB for pr guest. TAR and DSCR are always +* enabled. +*/ + if (spr_val & ~(FSCR_TAR|FSCR_DSCR|FSCR_EBB)) + pr_info("KVM: invalud FSCR value 0x%lx", spr_val); Is this worth printing at all? If it is, it's probably more of a pr_debug(). Also s/invalud/invalid/.
Alex + vcpu->arch.fscr = spr_val & (FSCR_TAR|FSCR_DSCR); + vcpu->arch.shadow_fscr = vcpu->arch.fscr & host_fscr; + break; + } unprivileged: default: printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn); @@ -591,6 +604,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val */ *spr_val = 0; break; + case SPRN_FSCR: + *spr_val = vcpu->arch.fscr; + break; default: unprivileged: printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn); diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index f779450cb07c..fcbdf4817301 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -107,6 +107,14 @@ kvm_start_lightweight: ld r3, VCPU_SHARED(r4) ld r3, VCPU_SHARED_SPRG3(r3) mtspr SPRN_SPRG3, r3 + +BEGIN_FTR_SECTION + mfspr r3,SPRN_FSCR + PPC_STL r3, HSTATE_FSCR(r13) + + PPC_LL r3, VCPU_SHADOW_FSCR(r4) + mtspr SPRN_FSCR, r3 +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) #endif /* CONFIG_PPC_BOOK3S_64 */ PPC_LL r4, VCPU_SHADOW_MSR(r4) /* get shadow_msr */ @@ -148,6 +156,9 @@ kvm_start_lightweight: bl FUNC(kvmppc_copy_from_svcpu) nop + /* R7 = vcpu */ + PPC_LL r7, GPR4(r1) + #ifdef CONFIG_PPC_BOOK3S_64 /*
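The mtspr(FSCR) handling in this patch can be summarised as two masks: the guest-visible value keeps only TAR and DSCR, and the value loaded into hardware is further limited to what the host enables. This sketch assumes bit positions 2, 7 and 8 for DSCR, EBB and TAR purely for illustration; check them against reg.h before relying on them.

```c
#include <assert.h>

/* Bit positions are assumptions for this model; check them against reg.h. */
#define M_FSCR_DSCR_LG 2
#define M_FSCR_EBB_LG  7
#define M_FSCR_TAR_LG  8
#define M_FSCR_DSCR (1UL << M_FSCR_DSCR_LG)
#define M_FSCR_EBB  (1UL << M_FSCR_EBB_LG)
#define M_FSCR_TAR  (1UL << M_FSCR_TAR_LG)

struct fscr_model {
	unsigned long fscr;		/* value the guest reads back */
	unsigned long shadow_fscr;	/* value loaded into the real SPR */
};

/*
 * Mirrors the mtspr(FSCR) case in the patch. Returns nonzero when the
 * guest set bits outside TAR|DSCR|EBB, leaving any logging to the caller.
 */
static int model_mtspr_fscr(struct fscr_model *v, unsigned long spr_val,
			    unsigned long host_fscr)
{
	int invalid = (spr_val & ~(M_FSCR_TAR | M_FSCR_DSCR | M_FSCR_EBB)) != 0;

	v->fscr = spr_val & (M_FSCR_TAR | M_FSCR_DSCR);	/* EBB withheld */
	v->shadow_fscr = v->fscr & host_fscr;		/* host may allow less */
	return invalid;
}
```

Returning the "invalid" condition instead of printing it is one way to act on the review comment: the caller can choose pr_debug() or nothing at all.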
Re: [RFC PATCH 08/10] KVM: PPC: BOOK3S: PR: Add support for facility unavailable interrupt
On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote: At this point we allow all the supported facilities except EBB, so forward the interrupt to the guest as an illegal instruction. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com --- arch/powerpc/include/asm/kvm_asm.h | 4 +++- arch/powerpc/kvm/book3s.c | 4 arch/powerpc/kvm/book3s_emulate.c | 18 ++ arch/powerpc/kvm/book3s_pr.c | 17 + 4 files changed, 42 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 1bd92fd43cfb..799244face51 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -99,6 +99,7 @@ #define BOOK3S_INTERRUPT_PERFMON 0xf00 #define BOOK3S_INTERRUPT_ALTIVEC 0xf20 #define BOOK3S_INTERRUPT_VSX 0xf40 +#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60 #define BOOK3S_IRQPRIO_SYSTEM_RESET 0 #define BOOK3S_IRQPRIO_DATA_SEGMENT 1 @@ -117,7 +118,8 @@ #define BOOK3S_IRQPRIO_DECREMENTER 14 #define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 15 #define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 16 -#define BOOK3S_IRQPRIO_MAX 17 +#define BOOK3S_IRQPRIO_FAC_UNAVAIL 17 +#define BOOK3S_IRQPRIO_MAX 18 #define BOOK3S_HFLAG_DCBZ32 0x1 #define BOOK3S_HFLAG_SLB 0x2 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 8912608b7e1b..a9aea28c2677 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -143,6 +143,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec) case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG; break; case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC; break; case 0xf40: prio = BOOK3S_IRQPRIO_VSX; break; + case 0xf60: prio = BOOK3S_IRQPRIO_FAC_UNAVAIL; break; default: prio = BOOK3S_IRQPRIO_MAX; break; } @@ -273,6 +274,9 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR: vec = BOOK3S_INTERRUPT_PERFMON; break; + case BOOK3S_IRQPRIO_FAC_UNAVAIL: + vec = BOOK3S_INTERRUPT_FAC_UNAVAIL; + break; default: deliver = 0; printk(KERN_ERR "KVM: Unknown interrupt: 
0x%x\n", priority); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 60d0b6b745e7..bf6b11021250 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -481,6 +481,15 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) vcpu->arch.shadow_fscr = vcpu->arch.fscr & host_fscr; break; } + case SPRN_EBBHR: + vcpu->arch.ebbhr = spr_val; + break; + case SPRN_EBBRR: + vcpu->arch.ebbrr = spr_val; + break; + case SPRN_BESCR: + vcpu->arch.bescr = spr_val; + break; unprivileged: default: printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn); @@ -607,6 +616,15 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val case SPRN_FSCR: *spr_val = vcpu->arch.fscr; break; + case SPRN_EBBHR: + *spr_val = vcpu->arch.ebbhr; + break; + case SPRN_EBBRR: + *spr_val = vcpu->arch.ebbrr; + break; + case SPRN_BESCR: + *spr_val = vcpu->arch.bescr; + break; default: unprivileged: printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn); diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 51d469f8c9fd..828056ec208f 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -900,6 +900,23 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOK3S_INTERRUPT_PERFMON: r = RESUME_GUEST; break; + case BOOK3S_INTERRUPT_FAC_UNAVAIL: + { + /* +* Check for the facility that need to be emulated +*/ + ulong fscr_ic = vcpu->arch.shadow_fscr >> 56; + if (fscr_ic != FSCR_EBB_LG) { + /* +* We only disable EBB facility. +* So only emulate that. I don't understand the comment. We emulate nothing at all here. We either - hit an EBB unavailable, in which case we send the guest an illegal instruction interrupt, or we - hit another facility interrupt, in which case we forward the interrupt to the guest, but not the interrupt cause (fscr_ic). I think the EBB case should be explicit: /* We don't allow
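A sketch of the explicit dispatch suggested in the review: read the interrupt cause from the most significant byte of FSCR (the patch uses shadow_fscr >> 56), turn an EBB cause into an illegal-instruction program interrupt for the guest, and forward any other cause. The enum and the EBB cause value of 7 are assumptions made for this model.

```c
#include <assert.h>
#include <stdint.h>

#define M_FSCR_EBB_LG 7	/* assumed EBB cause value for this model */

enum fac_action { FAC_PROGRAM_ILLEGAL, FAC_FORWARD };

/* The interrupt cause (IC) sits in the most significant byte of FSCR. */
static unsigned int fscr_interrupt_cause(uint64_t fscr)
{
	return (unsigned int)(fscr >> 56);
}

static enum fac_action model_handle_fac_unavail(uint64_t fscr)
{
	if (fscr_interrupt_cause(fscr) == M_FSCR_EBB_LG)
		return FAC_PROGRAM_ILLEGAL;	/* EBB is never offered to PR guests */
	return FAC_FORWARD;		/* reflect the interrupt to the guest */
}
```

Forwarding with the cause preserved would address the review point that the patch drops fscr_ic. How the cause is then written into the guest's FSCR is outside this sketch.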
Re: [RFC PATCH 10/10] PPC: BOOK3S: Disable/Enable TM looking at the ibm,pa-features device tree entry
On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote: Disable the transactional memory feature at runtime based on the ibm,pa-features device tree entry. We need this so that a kernel built with TM support can run in PR mode. For a PR guest we provide a device tree entry with the TM feature disabled in pa-features. Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com We need to be able to run kernels without this patch, so better fix TM for good - worst case by always aborting transactions. Alex --- arch/powerpc/kernel/prom.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index fa0ad8aafbcc..de8c2caf1024 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -160,6 +160,11 @@ static struct ibm_pa_feature { {CPU_FTR_NODSISRALIGN, 0, 0, 1, 1, 1}, {0, MMU_FTR_CI_LARGE_PAGE, 0, 1, 2, 0}, {CPU_FTR_REAL_LE, PPC_FEATURE_TRUE_LE, 5, 0, 0}, + /* +* We should use CPU_FTR_TM_COMP so that if we disable TM, it won't get +* enabled via device tree +*/ + {CPU_FTR_TM_COMP, 0, 0, 22, 0, 0}, }; static void __init scan_features(unsigned long node, unsigned char *ftrs,
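How a table entry such as {CPU_FTR_TM_COMP, 0, 0, 22, 0, 0} selects a feature can be sketched as a bit test on the pa-features bytes: byte 22, bit 0, counting bits from the most significant end, optionally inverted. This is a simplified, hypothetical stand-in for the bit test in scan_features(); it omits parsing of the device-tree property itself and the update of the feature masks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of one ibm_pa_features entry. */
struct pa_feature_model {
	unsigned int pabyte;	/* byte index into the pa-features data */
	unsigned int pabit;	/* bit within that byte, 0 = most significant */
	int invert;		/* nonzero: feature present when the bit is clear */
};

/* 1: enable the feature, 0: clear it, -1: table too short, leave untouched. */
static int pa_feature_enabled(const uint8_t *ftrs, size_t len,
			      const struct pa_feature_model *fp)
{
	int bit;

	if (fp->pabyte >= len)
		return -1;
	bit = (ftrs[fp->pabyte] >> (7 - fp->pabit)) & 1;
	return bit ^ (fp->invert ? 1 : 0);
}
```

Under this model, a PR guest whose device tree leaves byte 22, bit 0 clear would see the TM feature cleared, which is the effect the patch relies on.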
[PATCH v2 0/6] setting the table for integration of cpuidle with the scheduler
As everyone should know by now, we want to integrate the cpuidle governor with the scheduler for more efficient CPU idling. In order to help the transition, this small patch series moves the existing interaction with cpuidle from architecture code to generic core code. The ARM, PPC, SH and X86 architectures are affected. No functional change should have occurred yet. @peterz: Are you willing to pick up those patches? Change from v1: - dropped removal of arch_cpu_idle_prepare() arch/arm/kernel/process.c | 16 +++-- arch/powerpc/platforms/pseries/processor_idle.c | 5 +++ arch/powerpc/platforms/pseries/setup.c | 34 --- arch/sh/kernel/idle.c | 4 +-- arch/x86/kernel/process.c | 5 +-- kernel/Makefile | 1 - kernel/cpu/Makefile | 1 - kernel/sched/Makefile | 2 +- kernel/{cpu => sched}/idle.c | 4 ++- 9 files changed, 30 insertions(+), 42 deletions(-) Nicolas
[PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, the core code needs to be closer to what cpuidle is doing, rather than delegating such interaction to arch code. Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver. Signed-off-by: Nicolas Pitre n...@linaro.org Acked-by: Daniel Lezcano daniel.lezc...@linaro.org --- kernel/cpu/idle.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c index 988573a9a3..ffcd3ee9af 100644 --- a/kernel/cpu/idle.c +++ b/kernel/cpu/idle.c @@ -3,6 +3,7 @@ */ #include <linux/sched.h> #include <linux/cpu.h> +#include <linux/cpuidle.h> #include <linux/tick.h> #include <linux/mm.h> #include <linux/stackprotector.h> @@ -95,7 +96,8 @@ static void cpu_idle_loop(void) if (!current_clr_polling_and_test()) { stop_critical_timings(); rcu_idle_enter(); - arch_cpu_idle(); + if (cpuidle_idle_call()) + arch_cpu_idle(); WARN_ON_ONCE(irqs_disabled()); rcu_idle_exit(); start_critical_timings(); -- 1.8.4.108.g55ea5f6
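The control flow this patch moves into the generic loop is "try cpuidle, fall back to the architecture's cheap idle on failure". Below is a userspace model with hypothetical function pointers standing in for cpuidle_idle_call() and arch_cpu_idle().

```c
#include <assert.h>

struct idle_model {
	int (*cpuidle_call)(void);	/* stands in for cpuidle_idle_call() */
	void (*arch_idle)(void);	/* stands in for arch_cpu_idle() */
};

/* One pass of the patched loop body: cpuidle first, arch idle on failure. */
static void model_idle_once(const struct idle_model *m)
{
	if (m->cpuidle_call())
		m->arch_idle();
}

/* Example back-ends for exercising the model. */
static int arch_idle_calls;

static int cpuidle_unavailable(void) { return -1; }	/* e.g. no driver */
static int cpuidle_ok(void) { return 0; }
static void count_arch_idle(void) { arch_idle_calls++; }
```

This is also why the ARM, x86 and pseries patches in the series can drop their own cpuidle_idle_call() checks: after this change, their arch_cpu_idle() runs only in the fallback case.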
[PATCH v2 2/6] ARM: remove redundant cpuidle_idle_call()
The core idle loop now takes care of it. Signed-off-by: Nicolas Pitre n...@linaro.org Acked-by: Daniel Lezcano daniel.lezc...@linaro.org --- arch/arm/kernel/process.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index 92f7b15dd2..adabeababe 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -30,7 +30,6 @@ #include <linux/uaccess.h> #include <linux/random.h> #include <linux/hw_breakpoint.h> -#include <linux/cpuidle.h> #include <linux/leds.h> #include <linux/reboot.h> @@ -133,7 +132,11 @@ EXPORT_SYMBOL_GPL(arm_pm_restart); void (*arm_pm_idle)(void); -static void default_idle(void) +/* + * Called from the core idle loop. + */ + +void arch_cpu_idle(void) { if (arm_pm_idle) arm_pm_idle(); @@ -168,15 +171,6 @@ void arch_cpu_idle_dead(void) #endif /* - * Called from the core idle loop. - */ -void arch_cpu_idle(void) -{ - if (cpuidle_idle_call()) - default_idle(); -} - -/* * Called by kexec, immediately prior to machine_kexec(). * * This must completely disable all secondary CPUs; simply causing those CPUs -- 1.8.4.108.g55ea5f6
[PATCH v2 5/6] X86: remove redundant cpuidle_idle_call()
The core idle loop now takes care of it. Signed-off-by: Nicolas Pitre n...@linaro.org Acked-by: Daniel Lezcano daniel.lezc...@linaro.org --- arch/x86/kernel/process.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 3fb8d95ab8..4505e2a950 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -298,10 +298,7 @@ void arch_cpu_idle_dead(void) */ void arch_cpu_idle(void) { - if (cpuidle_idle_call()) - x86_idle(); - else - local_irq_enable(); + x86_idle(); } /* -- 1.8.4.108.g55ea5f6
[PATCH v2 3/6] PPC: remove redundant cpuidle_idle_call()
The core idle loop now takes care of it. However, a few things need checking:

- Invocation of cpuidle_idle_call() in pseries_lpar_idle() happened through arch_cpu_idle() and was therefore always preceded by a call to ppc64_runlatch_off(). To preserve this property now that cpuidle_idle_call() is invoked directly from core code, a call to ppc64_runlatch_off() has been added to idle_loop_prolog() in platforms/pseries/processor_idle.c.

- Similarly, cpuidle_idle_call() was followed by ppc64_runlatch_on(), so a call to the latter has been added to idle_loop_epilog().

- And since arch_cpu_idle() always made sure to re-enable IRQs if they were not enabled, this is now done in idle_loop_epilog() as well.

These changes were made in order to keep the execution flow close to the original. I don't know if that was strictly necessary. Someone well acquainted with the platform details might find some room for possible optimizations.

Signed-off-by: Nicolas Pitre n...@linaro.org
Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/pseries/processor_idle.c | 5
 arch/powerpc/platforms/pseries/setup.c | 34 ++---
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index a166e38bd6..72ddfe3d2f 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -33,6 +33,7 @@ static struct cpuidle_state *cpuidle_state_table;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
+	ppc64_runlatch_off();
 	*in_purr = mfspr(SPRN_PURR);
 	/*
 	 * Indicate to the HV that we are idle. Now would be
@@ -49,6 +50,10 @@ static inline void idle_loop_epilog(unsigned long in_purr)
 	wait_cycles += mfspr(SPRN_PURR) - in_purr;
 	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
 	get_lppaca()->idle = 0;
+
+	if (irqs_disabled())
+		local_irq_enable();
+	ppc64_runlatch_on();
 }
 
 static int snooze_loop(struct cpuidle_device *dev,
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index c1f1908587..7604c19d54 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -39,7 +39,6 @@
 #include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/root_dev.h>
-#include <linux/cpuidle.h>
 #include <linux/of.h>
 #include <linux/kexec.h>
 
@@ -356,29 +355,24 @@ early_initcall(alloc_dispatch_log_kmem_cache);
 
 static void pseries_lpar_idle(void)
 {
-	/* This would call on the cpuidle framework, and the back-end pseries
-	 * driver to  go to idle states
+	/*
+	 * Default handler to go into low thread priority and possibly
+	 * low power mode by cedeing processor to hypervisor
 	 */
-	if (cpuidle_idle_call()) {
-		/* On error, execute default handler
-		 * to go into low thread priority and possibly
-		 * low power mode by cedeing processor to hypervisor
-		 */
 
-		/* Indicate to hypervisor that we are idle. */
-		get_lppaca()->idle = 1;
+	/* Indicate to hypervisor that we are idle. */
+	get_lppaca()->idle = 1;
 
-		/*
-		 * Yield the processor to the hypervisor.  We return if
-		 * an external interrupt occurs (which are driven prior
-		 * to returning here) or if a prod occurs from another
-		 * processor. When returning here, external interrupts
-		 * are enabled.
-		 */
-		cede_processor();
+	/*
+	 * Yield the processor to the hypervisor.  We return if
+	 * an external interrupt occurs (which are driven prior
+	 * to returning here) or if a prod occurs from another
+	 * processor. When returning here, external interrupts
+	 * are enabled.
+	 */
+	cede_processor();
 
-		get_lppaca()->idle = 0;
-	}
+	get_lppaca()->idle = 0;
 }
 
 /*
-- 
1.8.4.108.g55ea5f6
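The ordering that this patch establishes can be modelled in user space. The run latch goes off in idle_loop_prolog(); then idle_loop_epilog() re-enables IRQs and turns the run latch back on.

The sketch below is minimal and rests on some assumptions. mock_runlatch_off(), mock_runlatch_on() and the `irqs_off` flag are hypothetical stand-ins for ppc64_runlatch_off(), ppc64_runlatch_on() and the real IRQ state. It is not kernel code; it only records the order of operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical user-space stand-ins for kernel state (not kernel APIs). */
static bool runlatch_on = true;
static bool irqs_off = true;	/* idle states are entered with IRQs off */
static char trace[64];		/* records the order of operations */

static void mock_runlatch_off(void) { runlatch_on = false; strcat(trace, "L0 "); }
static void mock_runlatch_on(void)  { runlatch_on = true;  strcat(trace, "L1 "); }

/* Mirrors the ppc64_runlatch_off() call added to idle_loop_prolog(). */
static void idle_prolog(void)
{
	mock_runlatch_off();
}

/* Mirrors the additions to idle_loop_epilog(): IRQs first, then run latch. */
static void idle_epilog(void)
{
	if (irqs_off) {
		irqs_off = false;	/* stands in for local_irq_enable() */
		strcat(trace, "IRQ ");
	}
	mock_runlatch_on();
}

/* One pass through a cpuidle state callback such as snooze_loop(). */
static void enter_idle_state(void)
{
	idle_prolog();
	assert(!runlatch_on);		/* the run latch must be off while idle */
	strcat(trace, "IDLE ");
	idle_epilog();
}
```

Because the prolog and epilog now bracket every cpuidle state, the run latch is cleared while the CPU is idle and set again before the idle loop returns. It no longer matters whether the state was entered from arch code or from the core loop.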
[PATCH v2 4/6] SH: remove redundant cpuidle_idle_call()
The core idle loop now takes care of it.

Signed-off-by: Nicolas Pitre n...@linaro.org
Acked-by: Daniel Lezcano daniel.lezc...@linaro.org
---
 arch/sh/kernel/idle.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 2ea4483fd7..be616ee0cf 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -16,7 +16,6 @@
 #include <linux/thread_info.h>
 #include <linux/irqflags.h>
 #include <linux/smp.h>
-#include <linux/cpuidle.h>
 #include <linux/atomic.h>
 #include <asm/pgalloc.h>
 #include <asm/smp.h>
@@ -40,8 +39,7 @@ void arch_cpu_idle_dead(void)
 
 void arch_cpu_idle(void)
 {
-	if (cpuidle_idle_call())
-		sh_idle();
+	sh_idle();
 }
 
 void __init select_idle_routine(void)
-- 
1.8.4.108.g55ea5f6
[PATCH v2 6/6] cpu/idle.c: move to sched/idle.c
Integration of cpuidle with the scheduler requires that the idle loop be closely integrated with the scheduler proper. Moving cpu/idle.c into the sched directory will allow for a smoother integration, and eliminate a subdirectory which contained only one source file.

Signed-off-by: Nicolas Pitre n...@linaro.org
---
 kernel/Makefile              | 1 -
 kernel/cpu/Makefile          | 1 -
 kernel/sched/Makefile        | 2 +-
 kernel/{cpu => sched}/idle.c | 0
 4 files changed, 1 insertion(+), 3 deletions(-)
 delete mode 100644 kernel/cpu/Makefile
 rename kernel/{cpu => sched}/idle.c (100%)

diff --git a/kernel/Makefile b/kernel/Makefile
index bc010ee272..6f1c7e5cfc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -22,7 +22,6 @@ obj-y += sched/
 obj-y += locking/
 obj-y += power/
 obj-y += printk/
-obj-y += cpu/
 obj-y += irq/
 obj-y += rcu/
 
diff --git a/kernel/cpu/Makefile b/kernel/cpu/Makefile
deleted file mode 100644
index 59ab052ef7..00
--- a/kernel/cpu/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-y	= idle.o
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 7b621409cf..ac3e0ea68f 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
 obj-y += wait.o completion.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
diff --git a/kernel/cpu/idle.c b/kernel/sched/idle.c
similarity index 100%
rename from kernel/cpu/idle.c
rename to kernel/sched/idle.c
-- 
1.8.4.108.g55ea5f6
Re: [PATCH v2 5/6] X86: remove redundant cpuidle_idle_call()
Hi,

On Wed, Jan 29, 2014 at 9:45 AM, Nicolas Pitre nicolas.pi...@linaro.org wrote:
The core idle loop now takes care of it.
[...]
@@ -298,10 +298,7 @@ void arch_cpu_idle_dead(void)
  */
 void arch_cpu_idle(void)
 {
-	if (cpuidle_idle_call())
-		x86_idle();
-	else
-		local_irq_enable();
+	x86_idle();

You're taking out the local_irq_enable() here, but I don't see the equivalent of adding it back in the 1/6 patch that moves the cpuidle_idle_call() up to common code. It seems that one of the call paths through cpuidle_idle_call() doesn't re-enable it on its own.

Even if this is the right thing to do, why it's OK to do so should probably be documented in the patch description.

-Olof
Re: [PATCH v2 5/6] X86: remove redundant cpuidle_idle_call()
On Wed, 29 Jan 2014, Olof Johansson wrote:

Hi,

On Wed, Jan 29, 2014 at 9:45 AM, Nicolas Pitre nicolas.pi...@linaro.org wrote:
The core idle loop now takes care of it.
[...]
-	if (cpuidle_idle_call())
-		x86_idle();
-	else
-		local_irq_enable();
+	x86_idle();

You're taking out the local_irq_enable() here, but I don't see the equivalent of adding it back in the 1/6 patch that moves the cpuidle_idle_call() up to common code. It seems that one of the call paths through cpuidle_idle_call() doesn't re-enable it on its own.

When cpuidle_idle_call() returns non-zero, IRQs are left disabled. When it returns zero, IRQs should be enabled. The same goes for cpuidle drivers. That's the theory, at least. Looking into some cpuidle drivers for x86, I found at least one that doesn't respect this convention. Damn.

Even if this is the right thing to do, why it's OK to do so should probably be documented in the patch description.

Better yet, I'm going to amend patch 1/6 with the below to make things more reliable while still identifying misbehaving drivers.

diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index ffcd3ee9af..14ca43430a 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -98,7 +98,8 @@ static void cpu_idle_loop(void)
 					rcu_idle_enter();
 					if (cpuidle_idle_call())
 						arch_cpu_idle();
-					WARN_ON_ONCE(irqs_disabled());
+					if (WARN_ON_ONCE(irqs_disabled()))
+						local_irq_enable();
 					rcu_idle_exit();
 					start_critical_timings();
 				} else {
Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
On Wed, 29 Jan 2014, Nicolas Pitre wrote:

In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.

Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.

Signed-off-by: Nicolas Pitre n...@linaro.org
Acked-by: Daniel Lezcano daniel.lezc...@linaro.org

As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.

----- >8 -----
From: Nicolas Pitre nicolas.pi...@linaro.org
Subject: idle: move the cpuidle entry point to the generic idle loop

In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.

Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.

In both cases, i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already, but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable that some implementations do locally, and for allowing the warning to trigger.
Signed-off-by: Nicolas Pitre n...@linaro.org

diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index 988573a9a3..14ca43430a 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -3,6 +3,7 @@
  */
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/cpuidle.h>
 #include <linux/tick.h>
 #include <linux/mm.h>
 #include <linux/stackprotector.h>
@@ -95,8 +96,10 @@ static void cpu_idle_loop(void)
 				if (!current_clr_polling_and_test()) {
 					stop_critical_timings();
 					rcu_idle_enter();
-					arch_cpu_idle();
-					WARN_ON_ONCE(irqs_disabled());
+					if (cpuidle_idle_call())
+						arch_cpu_idle();
+					if (WARN_ON_ONCE(irqs_disabled()))
+						local_irq_enable();
 					rcu_idle_exit();
 					start_critical_timings();
 				} else {
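The calling convention in the commit message has three parts:

- IRQs are disabled on entry to the idle routine and enabled on exit.
- If they are still off afterwards, a one-time warning fires.
- A forced enable then acts as a safety net.

This can be illustrated with a small user-space model. All names below are hypothetical stand-ins, not the kernel's WARN_ON_ONCE() or cpuidle API: warn_on_once(), good_driver(), buggy_driver(), no_driver() and arch_idle(). The sketch only assumes that a non-zero return from the cpuidle call means "fall back to arch idle".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space model of the IRQ contract in cpu_idle_loop() (not kernel code). */
static bool irqs_disabled_flag;
static int warn_count;

/* Like WARN_ON_ONCE(): reports only the first time, but always returns
 * the condition so the caller can still react to it. */
static bool warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: IRQs still disabled after idle\n");
	}
	return cond;
}

/* A well-behaved driver re-enables IRQs before returning success. */
static int good_driver(void)  { irqs_disabled_flag = false; return 0; }
/* A misbehaving driver returns success but leaves IRQs disabled. */
static int buggy_driver(void) { return 0; }
/* Non-zero means "no usable cpuidle driver": use the arch default. */
static int no_driver(void)    { return -1; }

/* The cheap arch default idle, which exits with IRQs enabled. */
static void arch_idle(void)   { irqs_disabled_flag = false; }

static void idle_iteration(int (*cpuidle_call)(void))
{
	assert(cpuidle_call != NULL);
	irqs_disabled_flag = true;		/* idle is entered with IRQs off */
	if (cpuidle_call())
		arch_idle();
	if (warn_on_once(irqs_disabled_flag))
		irqs_disabled_flag = false;	/* safety net: force IRQs on */
}
```

As in the patch, the warning fires only once. The forced enable, however, runs on every iteration that leaves IRQs off, so a buggy driver gets reported while the system keeps running.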
Re: Please pull 'next' branch of 5xxx tree
Hi Ben !

On Wed, 29 Jan 2014 18:46:09 +1100
Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
...
Hrm, I get a merge conflicts with spi-mpc512x-psc.c, please check that I fixed it up properly in powerpc-next and let me know.

Your fix is correct.

Thanks,
Anatolij
Re: [PATCH 2/2] Fix compile error of pgtable-ppc64.h
On Tue, Jan 28, 2014 at 05:52:42PM +0530, Aneesh Kumar K.V wrote:
From: Li Zhong zh...@linux.vnet.ibm.com

It seems that forward declaration couldn't work well with typedef, use struct spinlock directly to avoid the following build errors:

In file included from include/linux/spinlock.h:81,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here

build fix for upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
for 3.13 stable series

I don't understand, why is this needed? Is there a corresponding patch upstream that already does this? What went wrong with a normal backport of the patch to 3.13?

confused,

greg k-h
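The build error comes from declaring the same typedef twice. Compilers enforcing pre-C11 rules reject that, whereas a bare struct tag can be forward-declared any number of times.

The sketch below shows the pattern the patch switches to. The lock_is_null() helper and the toy `struct spinlock` layout are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A header that only needs a pointer can forward-declare the tag.
 * Repeating a bare struct declaration is always legal in C. */
struct spinlock;
struct spinlock;

/* A prototype written with the tag needs no typedef, so it cannot clash
 * with a typedef defined later (hypothetical helper for illustration). */
static int lock_is_null(const struct spinlock *ptl)
{
	return ptl == NULL;
}

/* The "real" header later completes the type and defines the typedef once.
 * Writing "typedef struct spinlock spinlock_t;" in the first header as well
 * would be a second typedef declaration, which pre-C11 rules reject with
 * "redefinition of typedef 'spinlock_t'". */
typedef struct spinlock {
	int locked;	/* toy field; the kernel's layout differs */
} spinlock_t;
```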
Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
On 28.01.2014 [10:29:47 -0800], Nishanth Aravamudan wrote:
On 27.01.2014 [14:58:05 +0900], Joonsoo Kim wrote:
On Fri, Jan 24, 2014 at 05:10:42PM -0800, Nishanth Aravamudan wrote:
On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:

Thank you for clarifying and providing a test patch. I ran with this on the system showing the original problem, configured to have 15GB of memory.

With your patch after boot:

MemTotal:       15604736 kB
MemFree:         8768192 kB
Slab:            3882560 kB
SReclaimable:     105408 kB
SUnreclaim:      3777152 kB

With Anton's patch after boot:

MemTotal:       15604736 kB
MemFree:        11195008 kB
Slab:            1427968 kB
SReclaimable:     109184 kB
SUnreclaim:      1318784 kB

I know that's fairly unscientific, but the numbers are reproducible.

Hello,

I think that there is one mistake in David's patch, although I'm not sure that it is the reason for this result.

With David's patch, get_partial() in new_slab_objects() doesn't work properly, because we only change the node id in the !node_match() case. If we hit just the !freelist case, we pass the node id directly to new_slab_objects(), so we always try to allocate a new slab page regardless of whether partial pages exist. We should solve it.

Could you try this one?

This helps about the same as David's patch -- but I found the reason why! ppc64 doesn't set CONFIG_HAVE_MEMORYLESS_NODES :) Expect a patch shortly for that and one other case I found.

This patch on its own seems to help on our test system by saving around 1.5GB of slab.

Tested-by: Nishanth Aravamudan n...@linux.vnet.ibm.com
Acked-by: Nishanth Aravamudan n...@linux.vnet.ibm.com

with the caveat below.

Thanks,
Nish

Thanks.

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1698,8 +1698,10 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 		struct kmem_cache_cpu *c)
 {
 	void *object;
-	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
+	int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
+	if (node != NUMA_NO_NODE && !node_present_pages(node))
+		searchnode = numa_mem_id();

This might be clearer as:

	int searchnode = node;
	if (node == NUMA_NO_NODE || !node_present_pages(node))
		searchnode = numa_mem_id();

Cody Schafer mentioned to me on IRC that this may not always reflect exactly what the caller intends:

	int searchnode = node;
	if (node == NUMA_NO_NODE)
		searchnode = numa_mem_id();
	else if (!node_present_pages(node))
		searchnode = local_memory_node(node);

The difference in semantics from the previous version is that here, if we have a memoryless node, rather than using the CPU's nearest NUMA node, we use the NUMA node closest to the requested one?

 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
 		return object;
@@ -2278,10 +2280,14 @@ redo:
 
 	if (unlikely(!node_match(page, node))) {
 		stat(s, ALLOC_NODE_MISMATCH);
-		deactivate_slab(s, page, c->freelist);
-		c->page = NULL;
-		c->freelist = NULL;
-		goto new_slab;
+		if (unlikely(!node_present_pages(node)))
+			node = numa_mem_id();

Similarly here?

-Nish

+		if (!node_match(page, node)) {
+			deactivate_slab(s, page, c->freelist);
+			c->page = NULL;
+			c->freelist = NULL;
+			goto new_slab;
+		}
 	}
 
 	/*
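The two node-selection variants discussed in this thread differ only in which fallback node they choose for a memoryless node.

The user-space sketch below assumes a hypothetical three-node topology: node 1 has no memory, and its nearest node with memory is node 2. numa_mem_id(), node_present_pages() and local_memory_node() are stubbed here and only mimic the kernel helpers of the same names.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical topology: node 1 is memoryless. These stubs only mimic
 * the kernel helpers of the same names. */
static const bool has_memory[] = { true, false, true };
static int cur_mem_node = 0;	/* what numa_mem_id() returns on this CPU */

static bool node_present_pages(int node) { return has_memory[node]; }
static int numa_mem_id(void) { return cur_mem_node; }
/* Assume node 1's nearest node with memory is node 2. */
static int local_memory_node(int node) { return node == 1 ? 2 : node; }

/* Variant 1: fall back to the memory node nearest the running CPU.
 * The || short-circuits, so node_present_pages(-1) is never called. */
static int searchnode_cpu_local(int node)
{
	int searchnode = node;

	if (node == NUMA_NO_NODE || !node_present_pages(node))
		searchnode = numa_mem_id();
	return searchnode;
}

/* Variant 2: fall back to the memory node nearest the requested node.
 * node_present_pages() is only consulted for a real node id. */
static int searchnode_requested_local(int node)
{
	if (node == NUMA_NO_NODE)
		return numa_mem_id();
	if (!node_present_pages(node))
		return local_memory_node(node);
	return node;
}
```

For a request on memoryless node 1 from a CPU whose local memory node is 0, the first variant searches node 0 and the second searches node 2.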
Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
On Wed, 2014-01-29 at 17:39 +0100, Alexander Graf wrote:

static inline mfvtb(unsigned long)
{
#ifdef CONFIG_PPC_BOOK3S_64
	return mfspr(SPRN_VTB);
#else
	BUG();
#endif
}

is a lot easier to read and get right. But reg.h is Ben's call.

Agreed. Also, could you please give me a pointer to the specification for it? I tried to look up vtb in the 2.06 ISA and couldn't find it. Is it a CPU-specific register?
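As quoted, the helper has no return type and an unnamed parameter, so it would not compile as written. A compilable form of the same idea might look like the sketch below.

Some substitutions are needed to make it runnable in user space:

- The CONFIG_PPC_BOOK3S_64 define is hard-coded for illustration.
- read_spr_vtb() is a hypothetical stand-in for mfspr(SPRN_VTB), which user-space code cannot execute.
- abort() plays the role of BUG().

```c
#include <assert.h>
#include <stdlib.h>

/* For illustration only: pretend we are building for Book3S-64. */
#define CONFIG_PPC_BOOK3S_64 1

/* Hypothetical stand-in for mfspr(SPRN_VTB). */
static unsigned long fake_vtb = 0x1234;
static unsigned long read_spr_vtb(void) { return fake_vtb; }

/* The quoted helper with a return type and an empty parameter list. */
static inline unsigned long mfvtb(void)
{
#ifdef CONFIG_PPC_BOOK3S_64
	return read_spr_vtb();
#else
	abort();	/* plays the role of BUG() on other configurations */
#endif
}
```

A compile-time guard like this only stops non-Book3S-64 builds from using the register. It does not check whether the running CPU actually implements VTB.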
Re: [PATCH 2/2] Fix compile error of pgtable-ppc64.h
On Wed, 2014-01-29 at 10:45 -0800, Greg KH wrote:
On Tue, Jan 28, 2014 at 05:52:42PM +0530, Aneesh Kumar K.V wrote:
From: Li Zhong zh...@linux.vnet.ibm.com

It seems that forward declaration couldn't work well with typedef, use struct spinlock directly to avoid the following build errors:
[...]
build fix for upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
for 3.13 stable series

I don't understand, why is this needed? Is there a corresponding patch upstream that already does this? What went wrong with a normal backport of the patch to 3.13?

There's a corresponding patch in powerpc-next that I'm about to send to Linus today, but for the backport, the fix could be folded into the original offending patch.

Cheers,
Ben.
Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
On Thu, 2014-01-30 at 09:54 +1100, Benjamin Herrenschmidt wrote:
On Wed, 2014-01-29 at 17:39 +0100, Alexander Graf wrote:

static inline mfvtb(unsigned long)
{
#ifdef CONFIG_PPC_BOOK3S_64
	return mfspr(SPRN_VTB);
#else
	BUG();
#endif
}

is a lot easier to read and get right. But reg.h is Ben's call.

Agreed.

I mean I agree with Alex, his version is nicer :-)

Cheers,
Ben.
[PATCH] powerpc: Add cpu family documentation
This patch adds some documentation on the different cpu families supported by arch/powerpc.

Signed-off-by: Michael Ellerman m...@ellerman.id.au
---
 Documentation/powerpc/cpu_families.txt | 76 ++
 1 file changed, 76 insertions(+)
 create mode 100644 Documentation/powerpc/cpu_families.txt

diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
new file mode 100644
index 000..df72657
--- /dev/null
+++ b/Documentation/powerpc/cpu_families.txt
@@ -0,0 +1,76 @@
+CPU Families
+============
+
+This doco tries to summarise some of the different cpu families that exist and
+are supported by arch/powerpc.
+
+Book3S (aka sPAPR)
+------------------
+
+ - Hash MMU
+ - Mix of 32 & 64 bit
+
+  Old
+  POWER ---> 601 ---> 603
+    |         |        |
+    |         |        *-> 740
+    |         |        |
+    |         |        *-> 750 (G3) ---> 750CX ---> 750CL ---> 750FX
+    |         |             |
+    |         |             |
+    |        604            *---> 7400 ---> 7410 ---> 7450 ---> 7455 ---> 7447 ---> 7448
+    |         |
+    |         |
+    |         *---> [620] ---> POWER3/630 ---> POWER3+ ---> POWER4 ---> POWER4+ ---> POWER5 ---> POWER5+ ---> POWER5++ ---> POWER6 ---> POWER7 ---> POWER7+ ---> POWER8
+    |               (64bit)                                   |            .
+    |                                                         |            .
+    |                                                         |            *---> Cell
+    |                                                         |
+    |                                                         *---> 970 ---> 970FX ---> 970MP
+    |
+    *---> RS64 (threads)
+
+
+  PA6T (64bit) ...
+
+
+IBM BookE
+---------
+
+ - Software loaded TLB.
+ - All 32 bit
+
+  401 ---> 403 ---> 405 ---> 440 ---> 450 ---> 460 ---> 476
+                                       |
+                                       *---> BG/P
+
+
+Motorola/Freescale 8xx
+----------------------
+
+ - Software loaded with hardware assist.
+ - All 32 bit
+
+  8xx ---> 850
+
+
+Freescale BookE
+---------------
+
+ - Software loaded TLB.
+ - e6500 adds HW loaded indirect TLB entries.
+ - Mix of 32 & 64 bit
+
+  e200 ---> e500 ---> e500v2 ---> e500mc ---> e5500 ---> e6500
+                                            (Book3E)   (HW TLB)
+                                             (64bit)
+
+IBM A2 core
+-----------
+
+ - Book3E, software loaded TLB + HW loaded indirect TLB entries.
+ - 64 bit
+
+  A2 core ---> BG/Q
+    |
+    *---> WSP
-- 
1.8.3.2
[git pull] Please pull powerpc.git next branch
Hi Linus !

Here are a few more powerpc bits for this merge window. The bulk is made of two pull requests from Scott and Anatolij that I had missed previously (they arrived while I was away). Since both their branches are in -next independently, and the content has been around for a little while, they can still go in.

The rest is mostly bug and regression fixes, a small series of cleanups to our pseries cpuidle code (including moving it to the right place), and one new cpuidle backend for the powernv platform. I also wired up the new sched_attr syscalls.

Cheers,
Ben.

The following changes since commit d891ea23d5203e5c47439b2a174f86a00b356a6c:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (2014-01-28 11:02:23 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git next

for you to fetch changes up to f878f84373aefda7f041a74b24a83b8b7dec1cf0:

  powerpc: Wire up sched_setattr and sched_getattr syscalls (2014-01-29 17:13:05 +1100)

Alistair Popple (1):
      powerpc/iommu: Fix initialisation of DART iommu table

Andreas Schwab (1):
      powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations

Benjamin Herrenschmidt (3):
      Merge remote-tracking branch 'agust/next' into next
      Merge remote-tracking branch 'scott/next' into next
      powerpc: Wire up sched_setattr and sched_getattr syscalls

Deepthi Dharwar (6):
      powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.
      powerpc/pseries/cpuidle: Use cpuidle_register() for initialisation.
      powerpc/pseries/cpuidle: Make cpuidle-pseries backend driver a non-module.
      powerpc/pseries/cpuidle: Remove MAX_IDLE_STATE macro.
      powerpc/pseries/cpuidle: smt-snooze-delay cleanup.
      powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform.

Gerhard Sittig (20):
      dts: mpc512x: introduce dt-bindings/clock/ header
      dts: mpc512x: add clock related device tree specs
      clk: mpc512x: introduce COMMON_CLK for MPC512x (disabled)
      clk: mpc512x: add backwards compat to the CCF code
      dts: mpc512x: add clock specs for client lookups
      clk: mpc5xxx: switch to COMMON_CLK, retire PPC_CLOCK
      spi: mpc512x: adjust to OF based clock lookup
      serial: mpc512x: adjust for OF based clock lookup
      serial: mpc512x: setup the PSC FIFO clock as well
      USB: fsl-mph-dr-of: adjust for OF based clock lookup
      mtd: mpc5121_nfc: adjust for OF based clock lookup
      fsl-viu: adjust for OF based clock lookup
      net: can: mscan: adjust to common clock support for mpc512x
      net: can: mscan: remove non-CCF code for MPC512x
      powerpc/mpc512x: improve DIU related clock setup
      clk: mpc512x: remove migration support workarounds
      powerpc/512x: clk: minor comment updates
      powerpc/512x: clk: enforce even SDHC divider values
      powerpc/512x: clk: support MPC5121/5123/5125 SoC variants
      powerpc/512x: dts: add MPC5125 clock specs

Joe Perches (1):
      powerpc/numa: Fix decimal permissions

Li Zhong (1):
      powerpc/mm: Fix compile error of pgtable-ppc64.h

Paul Mackerras (2):
      powerpc: Fix 32-bit frames for signals delivered when transactional
      powerpc: Make sure cache directory is removed when offlining cpu

Scott Wood (1):
      powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E

Tang Yuantian (1):
      clk: corenet: Adds the clock binding

Tiejun Chen (1):
      powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var

jmarc...@redhat.com (1):
      powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space

 .../devicetree/bindings/clock/corenet-clock.txt | 134 +++
 arch/powerpc/Kconfig | 6 +-
 arch/powerpc/boot/dts/ac14xx.dts | 7 +
 arch/powerpc/boot/dts/mpc5121.dtsi | 113 +-
 arch/powerpc/boot/dts/mpc5125twr.dts | 53 +-
 arch/powerpc/include/asm/clk_interface.h | 20 -
 arch/powerpc/include/asm/mpc5121.h | 7 +-
 arch/powerpc/include/asm/pgtable-ppc64.h | 6 +-
 arch/powerpc/include/asm/processor.h | 7 -
 arch/powerpc/include/asm/systbl.h | 2 +
 arch/powerpc/include/asm/unistd.h | 2 +-
 arch/powerpc/include/uapi/asm/unistd.h | 3 +-
 arch/powerpc/kernel/Makefile | 1 -
 arch/powerpc/kernel/cacheinfo.c | 3 +
 arch/powerpc/kernel/clock.c | 82 --
 arch/powerpc/kernel/process.c | 2 +-
 arch/powerpc/kernel/signal_32.c | 19 +-
 arch/powerpc/kernel/sysfs.c | 2 -
 arch/powerpc/mm/hugetlbpage.c | 4 +-
 arch/powerpc/mm/numa.c | 2 +-
Re: [PATCH] powerpc: Add cpu family documentation
Hi Michael,

Nice.

On Thu, 30 Jan 2014 13:38:00 +1100 Michael Ellerman m...@ellerman.id.au wrote:

+++ b/Documentation/powerpc/cpu_families.txt
@@ -0,0 +1,76 @@
+CPU Families
+============
+
+This doco tries to summarise some of the different cpu families that exist and

document

+    |         |
+    |         *---> [620] ---> POWER3/630 ---> POWER3+ ---> POWER4 ---> POWER4+ ---> POWER5 ---> POWER5+ ---> POWER5++ ---> POWER6 ---> POWER7 ---> POWER7+ ---> POWER8

It's a pity that this wraps ...

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au
Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
Hi Nicolas,

On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
[...]
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and

Why would this patch allow for removing the forced IRQ enables that are being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from the idle state but are expected to have done so, isn't it?

Thanks
Regards
Preeti U Murthy
Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
On Thu, 30 Jan 2014, Preeti U Murthy wrote:

Hi Nicolas,

On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
[...]
This will allow for removing the forced IRQ enable some implementations do locally and

Why would this patch allow for removing the forced IRQ enables that are being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from the idle state but are expected to have done so, isn't it?

Exactly.
However, x86 currently does this:

	if (cpuidle_idle_call())
		x86_idle();
	else
		local_irq_enable();

So whenever cpuidle_idle_call() is successful, IRQs are unconditionally enabled, whether or not the underlying cpuidle driver has properly done it. And the reason is that some of the x86 cpuidle drivers do fail to enable IRQs before returning.

So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (as well as enabling IRQs to allow the system to run).

Nicolas
Re: [PATCH v2 1/6] idle: move the cpuidle entry point to the generic idle loop
Hi Nicolas, On 01/30/2014 10:58 AM, Nicolas Pitre wrote: On Thu, 30 Jan 2014, Preeti U Murthy wrote: Hi Nicolas, On 01/30/2014 02:01 AM, Nicolas Pitre wrote: On Wed, 29 Jan 2014, Nicolas Pitre wrote: In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code. Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver. Signed-off-by: Nicolas Pitre n...@linaro.org Acked-by: Daniel Lezcano daniel.lezc...@linaro.org As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code. - 8 From: Nicolas Pitre nicolas.pi...@linaro.org Subject: idle: move the cpuidle entry point to the generic idle loop In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code. Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver. In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and Why would this patch allow for removing the forced IRQ enable that are being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? Its supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from idle state but are expected to have done so, isn't it? Exact. 
However, x86 currently does this: if (cpuidle_idle_call()) x86_idle(); else local_irq_enable(); So whenever cpuidle_idle_call() succeeds, IRQs are unconditionally enabled, whether or not the underlying cpuidle driver has already done so. The reason is that some of the x86 cpuidle drivers fail to enable IRQs before returning. So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (while still enabling IRQs so the system can keep running). Oh ok, thank you for clarifying this :) Regards Preeti U Murthy Nicolas
Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote: We allow priv-mode update of this. The guest value is saved in fscr, and the value actually used is saved in shadow_fscr. shadow_fscr only contains values that are allowed by the host. On a facility unavailable interrupt, if the facility is allowed by fscr but disabled in shadow_fscr, we need to emulate the support. Currently all but EBB is disabled. We still don't support performance monitoring in PR guest. ... + /* + * Save the current fscr in shadow fscr + */ + mfspr r3,SPRN_FSCR + PPC_STL r3, VCPU_SHADOW_FSCR(r7) I don't think you need to do this. What could possibly have changed FSCR since we loaded it on the way into the guest? Paul.
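The fscr/shadow_fscr split described in the patch can be modelled as a small decision function. This is a sketch, not the patch's code: the struct, the function names and the facility bit numbers are all hypothetical, and the "reflect to guest" branch for every other case is an assumption about what happens when the guest itself disabled the facility.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical facility bit numbers, for illustration only. */
enum { FAC_A = 0, FAC_B = 1, FAC_EBB = 7 };

struct vcpu_sim {
    uint64_t fscr;         /* value the guest wrote */
    uint64_t shadow_fscr;  /* value actually loaded: guest & host-allowed */
};

/* Privileged guest write: remember the guest value, filter the shadow. */
static void guest_write_fscr(struct vcpu_sim *v, uint64_t val,
                             uint64_t host_allowed)
{
    v->fscr = val;
    v->shadow_fscr = val & host_allowed;
}

enum fac_action { FAC_EMULATE, FAC_REFLECT_TO_GUEST };

/* Facility unavailable interrupt for facility `bit`: the guest enabled it
 * but the host filtered it out, so the host must emulate it; otherwise
 * (assumption) the interrupt is delivered to the guest. */
static enum fac_action on_facility_unavailable(const struct vcpu_sim *v,
                                               unsigned bit)
{
    assert(bit < 64);
    uint64_t mask = UINT64_C(1) << bit;

    if ((v->fscr & mask) && !(v->shadow_fscr & mask))
        return FAC_EMULATE;
    return FAC_REFLECT_TO_GUEST;
}
```

Because shadow_fscr is derived from fscr at write time, nothing inside the guest can change it behind the host's back, which is the point behind Paul's question about re-reading FSCR on exit.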
Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote: The virtual time base register is a per-VM register and needs to be saved and restored on VM exit and entry. Writing to VTB is not allowed in the privileged mode. ... +#ifdef CONFIG_PPC_BOOK3S_64 +#define mfvtb() ({unsigned long rval; \ + asm volatile("mfspr %0, %1" : \ + "=r" (rval) : "i" (SPRN_VTB)); rval;}) The mfspr will be a no-op on anything before POWER8, meaning the result will be whatever value was in the destination GPR before the mfspr. I suppose that may not matter if the result is only ever used when we're running on a POWER8 host, but I would feel more comfortable if we had explicit feature tests to make sure of that, rather than possibly doing computations with unpredictable values. With your patch, a guest on a POWER7 or a PPC970 could do a read from VTB and get garbage -- first, there is nothing to stop userspace from requesting POWER8 emulation on an older machine, and secondly, even if the virtual machine is a PPC970 (say) you don't implement unimplemented SPR semantics for VTB (no-op if PR=0, illegal instruction interrupt if PR=1). On the whole I think it is reasonable to reject an attempt to set the virtual PVR to a POWER8 PVR value if we are not running on a POWER8 host, because emulating all the new POWER8 features in software (particularly transactional memory) would not be feasible. Alex may disagree. :) Paul.
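Paul's two suggestions (an explicit feature check before touching VTB, and unimplemented-SPR semantics when the virtual CPU predates POWER8) can be sketched as an emulation decision. Everything here is hypothetical: the structs replace the real vcpu and cpu-feature checks, the host VTB value is a plain field rather than an mfspr, and "illegal instruction interrupt" is just an enum value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum spr_result { SPR_OK, SPR_NOP, SPR_ILLEGAL_INSN };

struct host_sim  { bool is_power8; uint64_t vtb; };
struct guest_sim { bool models_power8; bool problem_state; /* MSR[PR] */ };

/* Refuse a POWER8 virtual PVR unless the host really is a POWER8. */
static bool may_set_power8_pvr(const struct host_sim *h)
{
    return h->is_power8;
}

/* Emulated "mfspr rX, VTB" for a PR guest; *gpr models rX. */
static enum spr_result emulate_mfvtb(const struct host_sim *h,
                                     const struct guest_sim *g,
                                     uint64_t *gpr)
{
    if (!g->models_power8) {
        /* Unimplemented SPR: no-op if PR=0, illegal instruction if PR=1. */
        return g->problem_state ? SPR_ILLEGAL_INSN : SPR_NOP;
    }
    /* The PVR check should make a POWER8 guest on older hardware impossible,
     * so the host register is only ever read when it actually exists. */
    assert(h->is_power8);
    *gpr = h->vtb;
    return SPR_OK;
}
```

With the PVR check in place, the assert documents an invariant rather than a runtime case, and an older virtual CPU never sees a garbage VTB value: its destination register is either left untouched or it takes an interrupt.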