[PATCH v3 0/3] Split default display handling out from VGA arbiter

2017-09-01 Thread Daniel Axtens
This patch set:

 - splits the default display handling out from VGA arbiter, into its
   own file and behind its own Kconfig option (and gives the functions
   better names).

 - adds extra detection of default devices. A card is only nominated
   this way if neither the VGA arbiter nor a platform hook has already
   nominated a default, and only if it has a driver attached and has
   IO or memory decoding enabled.

 - adds relevant documentation.

The practical impact of this is improved X autoconfiguration on some
arm64 systems.

Changes in v3:

 - Add documentation - thanks Daniel Vetter for pointing it out.

 - Clarify explanations. Thanks to everyone for continuing to bear
   with my incomplete understanding of PCI and provide some clarity.

 - Split refactoring and adding functionality.

Changes in v2: https://www.spinics.net/lists/linux-pci/msg64007.html

Drop all the powerpc patches. [explanation snipped]

v1: https://www.spinics.net/lists/linux-pci/msg63581.html

Regards,
Daniel

Daniel Axtens (3):
  drm: split default display handler out of VGA arbiter
  drm: add fallback default device detection
  drm: documentation for default display device

 Documentation/gpu/default_display.rst |  93 +++
 Documentation/gpu/index.rst   |   1 +
 arch/ia64/pci/fixup.c |   6 +-
 arch/powerpc/kernel/pci-common.c  |   6 +-
 arch/x86/pci/fixup.c  |   6 +-
 arch/x86/video/fbdev.c|   4 +-
 drivers/gpu/vga/Kconfig   |  12 +++
 drivers/gpu/vga/Makefile  |   1 +
 drivers/gpu/vga/default_display.c | 163 ++
 drivers/gpu/vga/vga_switcheroo.c  |   8 +-
 drivers/gpu/vga/vgaarb.c  |  61 +++--
 drivers/pci/pci-sysfs.c   |   4 +-
 include/linux/default_display.h   |  44 +
 include/linux/vgaarb.h|  15 
 14 files changed, 344 insertions(+), 80 deletions(-)
 create mode 100644 Documentation/gpu/default_display.rst
 create mode 100644 drivers/gpu/vga/default_display.c
 create mode 100644 include/linux/default_display.h

-- 
2.11.0



[PATCH v3 1/3] drm: split default display handler out of VGA arbiter

2017-09-01 Thread Daniel Axtens
Split the small bit of code that does default VGA handling out from
the arbiter. Add a Kconfig option to allow the kernel to be built
with just the default handling, or the arbiter and default handling.

While doing this, rename the functions from vga_(set_)default_device
to pci_(set_)default_display. This makes it clear that these functions
do not rely on legacy VGA access: they're about the default PCI display.
(The device still needs to be a member of PCI_CLASS_DISPLAY_VGA.)

This should not introduce any functional change.

Signed-off-by: Daniel Axtens 

---

I know this adds another config option and that's a bit sad, but
we can't include it unconditionally as it depends on PCI.
Suggestions welcome.
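
For reviewers skimming the diffstat, the new public interface boils down
to something like the sketch below. This is only a summary, not the actual
header hunk, and the Kconfig symbol name here is a placeholder:

#include <linux/pci.h>

#ifdef CONFIG_PCI_DEFAULT_DISPLAY	/* placeholder name for the new option */
struct pci_dev *pci_default_display(void);
void pci_set_default_display(struct pci_dev *pdev);
#else
static inline struct pci_dev *pci_default_display(void)
{
	return NULL;
}
static inline void pci_set_default_display(struct pci_dev *pdev)
{
}
#endif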

v3: split from the part where we add functionality
rename functions
---
 arch/ia64/pci/fixup.c |  6 +--
 arch/powerpc/kernel/pci-common.c  |  6 +--
 arch/x86/pci/fixup.c  |  6 +--
 arch/x86/video/fbdev.c|  4 +-
 drivers/gpu/vga/Kconfig   | 12 ++
 drivers/gpu/vga/Makefile  |  1 +
 drivers/gpu/vga/default_display.c | 86 +++
 drivers/gpu/vga/vga_switcheroo.c  |  8 ++--
 drivers/gpu/vga/vgaarb.c  | 61 ++-
 drivers/pci/pci-sysfs.c   |  4 +-
 include/linux/default_display.h   | 44 
 include/linux/vgaarb.h| 15 ---
 12 files changed, 173 insertions(+), 80 deletions(-)
 create mode 100644 drivers/gpu/vga/default_display.c
 create mode 100644 include/linux/default_display.h

diff --git a/arch/ia64/pci/fixup.c b/arch/ia64/pci/fixup.c
index 41caa99add51..a5f35e767c84 100644
--- a/arch/ia64/pci/fixup.c
+++ b/arch/ia64/pci/fixup.c
@@ -5,7 +5,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include 
@@ -22,7 +22,7 @@
  * card with this copy. On laptops this copy has to be used since
  * the main ROM may be compressed or combined with another image.
  * See pci_map_rom() for use of this flag. Before marking the device
- * with IORESOURCE_ROM_SHADOW check if a vga_default_device is already set
+ * with IORESOURCE_ROM_SHADOW check if a pci_default_display is already set
  * by either arch code or vga-arbitration; if so only apply the fixup to this
  * already-determined primary video card.
  */
@@ -59,7 +59,7 @@ static void pci_fixup_video(struct pci_dev *pdev)
}
bus = bus->parent;
}
-   if (!vga_default_device() || pdev == vga_default_device()) {
+   if (!pci_default_display() || pdev == pci_default_display()) {
pci_read_config_word(pdev, PCI_COMMAND, &config);
if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
res = &pdev->resource[PCI_ROM_RESOURCE];
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 341a7469cab8..9f82f13ac531 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -31,7 +31,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -1747,8 +1747,8 @@ static void fixup_vga(struct pci_dev *pdev)
u16 cmd;
 
pci_read_config_word(pdev, PCI_COMMAND, &cmd);
-   if ((cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) || !vga_default_device())
-   vga_set_default_device(pdev);
+   if ((cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) || !pci_default_display())
+   pci_set_default_display(pdev);
 
 }
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 11e407489db0..7e32a2a80383 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -5,7 +5,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 
@@ -302,7 +302,7 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_r
  * card with this copy. On laptops this copy has to be used since
  * the main ROM may be compressed or combined with another image.
  * See pci_map_rom() for use of this flag. Before marking the device
- * with IORESOURCE_ROM_SHADOW check if a vga_default_device is already set
+ * with IORESOURCE_ROM_SHADOW check if a pci_default_display is already set
  * by either arch code or vga-arbitration; if so only apply the fixup to this
  * already-determined primary video card.
  */
@@ -334,7 +334,7 @@ static void pci_fixup_video(struct pci_dev *pdev)
}
bus = bus->parent;
}
-   if (!vga_default_device() || pdev == vga_default_device()) {
+   if (!pci_default_display() || pdev == pci_default_display()) {
pci_read_config_word(pdev, PCI_COMMAND, &config);
if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
res = &pdev->resource[PCI_ROM_RESOURCE];
diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c
index 9fd24846d094..114bd9ac95d0 100644
--- a/arch/x86/video/fbdev.c
+++ b/arch/x86/vi

[PATCH v3 2/3] drm: add fallback default device detection

2017-09-01 Thread Daniel Axtens
The VGA arbiter selects a default VGA device that is enabled and
reachable via the legacy VGA resources (mem 0xa0000-0xbffff, io
0x3b0-0x3bb, io 0x3c0-0x3df, etc).

(As a special case for x86 and IA64, this can be overridden by
EFI.)

If there is no such device, e.g., because there's no enabled VGA
device, the host bridge doesn't support access to those legacy
resources, or a PCI-PCI bridge doesn't have VGA Enable set, a
platform may select an arbitrary device by calling
pci_set_default_display(). powerpc does this, for example.

If there is also no platform hook, there will be no default
device nominated. This is not necessarily what we want.

Add handling for devices that aren't handled by the vga arbiter or
platform by adding a late initcall and a class enable hook. If there
is no default from vgaarb or the platform then the first VGA card
that is enabled, has a driver bound, and can decode memory or I/O
will be marked as default.

This means single-card setups on systems without access to legacy
areas and without arch hooks will work. Multi-card setups on these
systems will nominate an arbitrary device, rather than no devices.

Signed-off-by: Daniel Axtens 

---

v3:

Split out from re-organisation for simplicity.
Add better description and better documentation.

Thanks to (in no particular order) Daniel Vetter, Lorenzo Pieralisi,
Ard Biesheuvel and Dave Airlie. Special thanks to Ben Herrenschmidt
and Bjorn Helgaas, whose prose I have borrowed.

v1:

Tested on:
 - x86_64 laptop
 - arm64 D05 board with hibmc card
 - qemu powerpc with tcg and bochs std-vga

I know this adds another config option and that's a bit sad, but
we can't include it unconditionally as it depends on PCI.
Suggestions welcome.
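
For context, user space ultimately sees the nominated device through the
per-device boot_vga sysfs attribute (this is what X reads via libpciaccess).
A minimal sketch of how that attribute is derived, adapted from
drivers/pci/pci-sysfs.c and assuming the rename from patch 1 - not a hunk
from this series:

#include <linux/device.h>
#include <linux/pci.h>

static ssize_t boot_vga_show(struct device *dev, struct device_attribute *attr,
			     char *buf)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct pci_dev *vga_dev = pci_default_display();

	/* 1 if this device is the nominated default display, 0 otherwise */
	if (vga_dev)
		return sprintf(buf, "%u\n", (pdev == vga_dev));

	/* no nomination yet: fall back to the shadow-ROM heuristic */
	return sprintf(buf, "%u\n",
		       !!(pdev->resource[PCI_ROM_RESOURCE].flags &
			  IORESOURCE_ROM_SHADOW));
}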
---
 drivers/gpu/vga/default_display.c | 77 +++
 1 file changed, 77 insertions(+)

diff --git a/drivers/gpu/vga/default_display.c b/drivers/gpu/vga/default_display.c
index 99e4723360da..b8e4a5af38e8 100644
--- a/drivers/gpu/vga/default_display.c
+++ b/drivers/gpu/vga/default_display.c
@@ -42,6 +42,10 @@
  *
  *  2) Anything specified by an arch hook,
  * e.g. arch/powerpc/kernel/pci-common.c::fixup_vga()
+ *
+ *  3) If neither of those, then we still want to pick something. For
+ * now, pick the first PCI VGA device with a driver bound and with
+ * memory or I/O control on.
  */
 
 #include 
@@ -53,6 +57,12 @@
 
 static struct pci_dev *vga_default;
 
+/*
+ * only go active after the late initcall so as not to interfere with
+ * the arbiter
+ */
+static bool vga_default_active = false;
+
 /**
  * pci_default_display - return the default display device
  *
@@ -84,3 +94,70 @@ void pci_set_default_display(struct pci_dev *pdev)
pci_dev_put(vga_default);
vga_default = pci_dev_get(pdev);
 }
+
+static bool vga_default_try_device(struct pci_dev *pdev)
+{
+   u16 cmd;
+
+   /* Only deal with VGA class devices */
+   if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+   return false;
+
+   /* Only deal with devices with drivers bound */
+   if (!pdev->driver)
+   return false;
+
+   /* Require I/O or memory control */
+   pci_read_config_word(pdev, PCI_COMMAND, &cmd);
+   if (!(cmd & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)))
+   return false;
+
+   dev_info(&pdev->dev, "vga_default: setting as default device\n");
+   pci_set_default_display(pdev);
+   return true;
+}
+
+static int __init vga_default_init(void)
+{
+   struct pci_dev *pdev;
+
+   vga_default_active = true;
+
+   if (pci_default_display())
+   return 0;
+
+   pdev = NULL;
+   while ((pdev =
+   pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
+  PCI_ANY_ID, pdev)) != NULL) {
+   if (vga_default_try_device(pdev))
+   return 0;
+   }
+
+   return 0;
+}
+late_initcall(vga_default_init);
+
+/*
+ * A driver could be loaded much later than late_initcall, for example
+ * if it's in a module.
+ *
+ * We want to pick that up. However, we want to make sure this does
+ * not interfere with the arbiter - it should only activate if the
+ * arbiter has already had a chance to operate. To ensure this, we set
+ * vga_default_active in the late_initcall: as the vga arbiter is a
+ * subsys initcall, it is guaranteed to fire first.
+ */
+static void vga_default_enable_hook(struct pci_dev *pdev)
+{
+   if (!vga_default_active)
+  return;
+
+   if (pci_default_display())
+   return;
+
+   vga_default_try_device(pdev);
+}
+DECLARE_PCI_FIXUP_CLASS_ENABLE(PCI_ANY_ID, PCI_ANY_ID,
+  PCI_CLASS_DISPLAY_VGA, 8,
+  vga_default_enable_hook)
-- 
2.11.0



[PATCH v3 3/3] drm: documentation for default display device

2017-09-01 Thread Daniel Axtens
We have refactored and extended this - document it.

Signed-off-by: Daniel Axtens 
---
 Documentation/gpu/default_display.rst | 93 +++
 Documentation/gpu/index.rst   |  1 +
 2 files changed, 94 insertions(+)
 create mode 100644 Documentation/gpu/default_display.rst

diff --git a/Documentation/gpu/default_display.rst b/Documentation/gpu/default_display.rst
new file mode 100644
index ..3c190611564e
--- /dev/null
+++ b/Documentation/gpu/default_display.rst
@@ -0,0 +1,93 @@
+=======================
+Default Display Devices
+=======================
+
+Overview
+========
+
+.. kernel-doc:: drivers/gpu/vga/default_display.c
+   :doc: overview
+
+
+Why do we need this?
+====================
+
+The default device is used to set the ``boot_vga`` per-device sysfs
+file, which is used by user-space. Most notably, Xorg reads this file
+via libpciaccess in order to facilitate auto-configuration.
+
+
+History
+=======
+
+When the VGA arbiter was introduced, it would pick a default device on
+boot. As the arbiter exists to arbitrate access to legacy resources,
+it would only pick a card that could be accessed through legacy areas.
+(See the :doc:`vgaarbiter` documentation for more.)
+
+The arbiter was later extended on x86 and IA64 to consider the EFI
+framebuffer.
+
+This is all well and good if you have legacy resources or
+EFI. However, some systems do not have either of those. For example,
+on POWER8: [0]_
+
+ - There is no IO space at all
+
+ - We configure the 32-bit MMIO window to be around 3..4G (to avoid
+   overlapping with DMA space below it).
+
+So effectively, there is no path to the legacy areas.
+
+This means the VGA arbiter will not pick a default device on these
+platforms. So, on powerpc, a class hook was added to mark a default
+device (``arch/powerpc/kernel/pci-common.c``), independent of the
+arbiter.
+
+When this issue arose on an arm64 system, it was decided that a generic
+approach would be better than more special cases. Therefore, the
+default device selection was factored out, and it now operates using
+the priority list described in the Overview.
+
+API
+===
+
+Public
+------
+
+.. kernel-doc:: drivers/gpu/vga/default_display.c
+   :export:
+
+Private
+-------
+
+.. kernel-doc:: drivers/gpu/vga/default_display.c
+   :internal:
+
+Future Work
+===========
+
+There is no support for non-PCI VGA devices being marked as default.
+The following comment, extracted from an earlier version of
+:c:func:`pci_default_display()` might help:
+
+  If your VGA default device is not PCI, you'll have to override this
+  and return NULL here. In this case, I assume it will not conflict
+  with any PCI card. If this is not true, I'll have to define two
+  archs hooks for enabling/disabling the VGA default device if that
+  is possible. 
+
+  This may be a problem with real _ISA_ VGA cards, in addition to a
+  PCI one. I don't know at this point how to deal with that card. Can
+  theirs IOs be disabled at all ? If not, then I suppose it's a matter
+  of having the proper arch hook telling us about it, so we basically
+  never allow anybody to succeed a ``vga_get()``...
+
+Currently there is also no way to support non-VGA-class PCI devices as
+default display devices.
+
+
+References
+==========
+  
+.. [0] https://www.spinics.net/lists/linux-pci/msg64142.html
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index 35d673bf9b56..8083d84f2334 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -16,6 +16,7 @@ Linux GPU Driver Developer's Guide
tegra
tinydrm
vc4
+   default_display
vga-switcheroo
vgaarbiter
bridge/dw-hdmi
-- 
2.11.0



Re: [PATCH RFC] Interface to set SPRN_TIDR

2017-09-01 Thread Philippe Bergheaud

On 31/08/2017 20:06, Sukadev Bhattiprolu wrote:

felix [fe...@linux.vnet.ibm.com] wrote:

On 31/08/2017 01:32, Sukadev Bhattiprolu wrote:

Michael Neuling [mi...@neuling.org] wrote:

Suka,

Please CC Christophe, who has an alternative way of doing this. We need to get
agreement across all users of TIDR/AS_notify...

Mikey,

Thanks. There is overlap between the two patches. I will send a patch on
top of Christophe's for the interfaces to assign/clear the TIDR value and
clear the thread->tidr during arch_dup_task_struct(). I will also drop the
CONFIG_VAS check since it's not only for VAS.

Christophe, can you let me know of any other comments on this patch?

Suka

Suka,

I am seconding Christophe on this matter. I think that your patch now
fulfills the CAPI use case requirements, with one exception: CAPI does not
restrict assigning a thread id to the current task. Please find a few minor
questions below.

Philippe


His patch is here:

https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.ozlabs.org_pipermail_linuxppc-2Ddev_2017-2DAugust_161582.html&d=DwIFAw&c=jf_iaSHvJObTbx-siA1ZOg&r=KC0fX9VGJYXlSiH9qN2ZONDbh8vUCZFX8GUhF3rHAvg&m=XQenBfWewOBjWopgf1Fh2UAVGnlzq766MNuzx7jYfuA&s=07WOVTh9f_4IBZfCJes4lvc7LWenBlqVfAXIXxL2QH4&e=

Mikey

On Tue, 2017-08-29 at 19:38 -0700, Sukadev Bhattiprolu wrote:

We need the SPRN_TIDR to be set for use with fast thread-wakeup
(core-to-core wakeup) in VAS. Each user thread that has a receive
window setup and expects to be notified when a sender issues a paste
needs to have a unique SPRN_TIDR value.

The SPRN_TIDR value only needs to be unique within the process, but for
now we use a globally unique thread id as described below.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v2]
- Michael Ellerman: Use an interface to assign TIDR so it is
  assigned to only threads that need it; move assignment to
  restore_sprs(). Drop lint from rebase;


   arch/powerpc/include/asm/processor.h |  4 ++
   arch/powerpc/include/asm/switch_to.h |  3 ++
   arch/powerpc/kernel/process.c| 97

   3 files changed, 104 insertions(+)

diff --git a/arch/powerpc/include/asm/processor.h
b/arch/powerpc/include/asm/processor.h
index fab7ff8..bf6ba63 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -232,6 +232,10 @@ struct debug_reg {
   struct thread_struct {
unsigned long   ksp;/* Kernel stack pointer */

+#ifdef CONFIG_PPC_VAS
+   unsigned long   tidr;
+#endif
+
   #ifdef CONFIG_PPC64
unsigned long   ksp_vsid;
   #endif
diff --git a/arch/powerpc/include/asm/switch_to.h
b/arch/powerpc/include/asm/switch_to.h
index 17c8380..4962455 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -91,4 +91,7 @@ static inline void clear_task_ebb(struct task_struct *t)
   #endif
   }

+extern void set_thread_tidr(struct task_struct *t);
+extern void clear_thread_tidr(struct task_struct *t);
+
   #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1f0fd36..13abb22 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1132,6 +1132,10 @@ static inline void restore_sprs(struct thread_struct
*old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
   #endif
+#ifdef CONFIG_PPC_VAS
+   if (old_thread->tidr != new_thread->tidr)
+   mtspr(SPRN_TIDR, new_thread->tidr);
+#endif
   }

   #ifdef CONFIG_PPC_BOOK3S_64
@@ -1446,9 +1450,97 @@ void flush_thread(void)
   #endif /* CONFIG_HAVE_HW_BREAKPOINT */
   }

+#ifdef CONFIG_PPC_VAS
+static DEFINE_SPINLOCK(vas_thread_id_lock);
+static DEFINE_IDA(vas_thread_ida);
+
+/*
+ * We need to assign a unique thread id to each thread in a process. This
+ * thread id is intended to be used with the Fast Thread-wakeup (aka Core-
+ * to-core wakeup) mechanism being implemented on top of Virtual Accelerator
+ * Switchboard (VAS).
+ *
+ * To get a unique thread-id per process we could simply use task_pid_nr()
+ * but the problem is that task_pid_nr() is not yet available for the thread
+ * when copy_thread() is called. Fixing that would require more intrusive
+ * changes to arch-neutral code in the copy_process() code path.
+ *
+ * Further, to assign unique thread ids within each process, we need an
+ * atomic field (or an IDR) in task_struct, which again intrudes into the
+ * arch-neutral code.
+ *
+ * So try to assign globally unique thread ids for now.
+ *
+ * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
+ *  For now, only threads that expect to be notified by the VAS
+ *  hardware need a TIDR value and we assign values > 0 for those.
+ */
+#define MAX_THREAD_CONTEXT ((1 << 15) - 2)

Why are you excluding ((1 << 15) - 1)?

You are right. I don't need to exclude that. Also, TIDR is a 16-bit (0:15 in
VAS's Local Notify TID) value right? So, will ch

Re: [PATCH v3 0/3] Split default display handling out from VGA arbiter

2017-09-01 Thread Ard Biesheuvel
On 1 September 2017 at 08:27, Daniel Axtens  wrote:
> This patch set:
>
>  - splits the default display handling out from VGA arbiter, into its
>own file and behind its own Kconfig option (and gives the functions
>better names).
>
>  - adds extra detection of default devices. A card is only nominated
>this way if neither the VGA arbiter nor a platform hook has already
>nominated a default, and only if it has a driver attached and has
>IO or memory decoding enabled.
>
>  - adds relevant documentation.
>
> The practical impact of this is improved X autoconfiguration on some
> arm64 systems.
>
> Changes in v3:
>
>  - Add documentation - thanks Daniel Vetter for pointing it out.
>
>  - Clarify explanations. Thanks to everyone for continuing to bear
>with my incomplete understanding of PCI and provide some clarity.
>
>  - Split refactoring and adding functionality.
>
> Changes in v2: https://www.spinics.net/lists/linux-pci/msg64007.html
>
> Drop all the powerpc patches. [explanation snipped]
>
> v1: https://www.spinics.net/lists/linux-pci/msg63581.html
>

If we are all in agreement that fixing X is not an option, I think
this is a reasonable approach

Acked-by: Ard Biesheuvel 


Re: [PATCH v3 4/4] powerpc/64s: idle ESL=0 stop can avoid MSR and save/restore overhead

2017-09-01 Thread Michael Ellerman
Nicholas Piggin  writes:

> On Wed, 30 Aug 2017 21:25:59 +1000
> Michael Ellerman  wrote:
>
>> Nicholas Piggin  writes:
>> 
>> > When stop is executed with EC=ESL=0, it appears to execute like a
>> > normal instruction (resuming from NIP when woken by interrupt).
>> > So all the save/restore handling can be avoided completely. In
>> > particular NV GPRs do not have to be saved, and MSR does not have
>> > to be switched back to kernel MSR.
>> >
>> > So move the test for "lite" sleep states out to power9_idle_stop.
>> >
>> > Reviewed-by: Gautham R. Shenoy 
>> > Signed-off-by: Nicholas Piggin 
>> > ---
>> >  arch/powerpc/kernel/idle_book3s.S | 35 ---
>> >  1 file changed, 24 insertions(+), 11 deletions(-)  
>> 
>> This is blowing up for me on mambo:
>
> Oh this is a known bug in mambo that does not match the hardware.
>
> You need >= Mambo.7.8.21, or this firmware patch to work around
> the issue for old mambos.

As discussed elsewhere this still breaks on new mambo with more than one
CPU. So I've dropped it for now.

cheers


Re: [PATCH] powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offline

2017-09-01 Thread Akshay Adiga


On 08/31/2017 05:37 PM, Nicholas Piggin wrote:

On Thu, 31 Aug 2017 17:17:41 +0530
"Gautham R. Shenoy"  wrote:

> From: "Gautham R. Shenoy" 
>
> commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
> stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
> an offlined CPU which is in a deep stop state.
>
> In the case where the stop-api support is found to be lacking, the
> commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
> states when stop-api fails") disables deep states that lose hypervisor
> context. Thus in this case, the offlined CPU will be put to some
> shallow idle state.
>
> However, we currently unconditionally clear the PECE1 in LPCR via
> stop-api during CPU-Hotplug even when deep states are disabled due to
> stop-api failure.
>
> Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
> *only* when the offlined CPU will be put to a deep state that loses
> hypervisor context.

This looks okay to me. The bug is due to calling opal_slw_set_reg when
firmware has not enabled that feature, right?

Yes,

In the case where the stop-api support is found to be lacking, the 
commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT 
states when stop-api fails") disables deep states that lose hypervisor 
context. Thus in this case, the offlined CPU will be put to some shallow 
idle state.


If a shallow state (< stop4) is being chosen for cpu hotplug, then:
1) this opal call is not required.
2) it may not be supported.

Hence we should call opal_slw_set_reg() only if a deep state is chosen for
cpu hotplug.


>
> Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> stop-api only on Hotplug")
>
> Reported-by: Pavithra Prakash 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/platforms/powernv/idle.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/idle.c
b/arch/powerpc/platforms/powernv/idle.c
> index 9f59041..23f8fba 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -393,7 +393,13 @@ static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
>  u64 pir = get_hard_smp_processor_id(cpu);
>
>  mtspr(SPRN_LPCR, lpcr_val);
> -opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> +
> +/*
> + * Program the LPCR via stop-api only for deepest stop state
> + * can lose hypervisor context.
> + */
> +if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
> +opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
>  }
>
>  /*





Re: [PATCH V12] powerpc/vphn: Update CPU topology when VPHN enabled

2017-09-01 Thread Michael Ellerman
Hi Michael,

Thanks for trying to reduce this down to the minimum required.

But ...

Michael Bringmann  writes:
> powerpc/numa: On Power systems with shared configurations of CPUs
> and memory, there are some issues with the association of additional
> CPUs and memory to nodes when hot-adding resources.  This patch
> addresses some of those problems.
>
> First, it corrects the currently broken capability to set the
> topology for shared CPUs in LPARs.  At boot time for shared CPU
> lpars, the topology for each CPU was being set to node zero.  Now
> when numa_update_cpu_topology() is called appropriately, the Virtual
> Processor Home Node (VPHN) capabilities information provided by the
> pHyp allows the appropriate node in the shared configuration to be
> selected for the CPU.

That should be one patch.

> Next, it updates the initialization checks to independently recognize
> PRRN or VPHN support.

And another.

> Next, during hotplug CPU operations, it resets the timer on topology
> update work function to a small value to better ensure that the CPU
> topology is detected and configured sooner.

Another.

> Also, fix an end-of-updates processing problem observed occasionally
> in numa_update_cpu_topology().

And another.


I would have let you get away with lumping it all in together, except
that it doesn't compile when CONFIG_PPC_SPLPAR=n:

  In file included from ../include/linux/topology.h:35:0,
   from ../include/linux/gfp.h:8,
   from ../include/linux/irq.h:17,
   from ../arch/powerpc/include/asm/hardirq.h:5,
   from ../include/linux/hardirq.h:8,
   from ../include/linux/interrupt.h:12,
   from ../arch/powerpc/platforms/pseries/hotplug-cpu.c:24:
  ../arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 
‘dlpar_online_cpu’:
  ../arch/powerpc/include/asm/topology.h:103:38: error: statement with no 
effect [-Werror=unused-value]
   #define timed_topology_update(nsecs) 0
^
  ../arch/powerpc/platforms/pseries/hotplug-cpu.c:366:4: note: in expansion of 
macro ‘timed_topology_update’
  timed_topology_update(1);
  ^
  ../arch/powerpc/platforms/pseries/hotplug-cpu.c: In function 
‘dlpar_offline_cpu’:
  ../arch/powerpc/include/asm/topology.h:103:38: error: statement with no 
effect [-Werror=unused-value]
   #define timed_topology_update(nsecs) 0
^
  ../arch/powerpc/platforms/pseries/hotplug-cpu.c:533:5: note: in expansion of 
macro ‘timed_topology_update’
   timed_topology_update(1);
   ^
  cc1: all warnings being treated as errors


So please fix that up, split it, and resend.

I know it would have been fairly easy for me to fix that compiler error,
but it tells me you haven't tested that configuration at all, which is
the bigger problem. Please at least give it a smoke test.
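
One conventional way to make that stub safe to use as a plain statement is
a static inline rather than a bare constant, along these lines (a sketch
only; adjust it to whatever prototype the CONFIG_PPC_SPLPAR version of the
function actually has):

#ifdef CONFIG_PPC_SPLPAR
extern int timed_topology_update(int nsecs);	/* or the real prototype */
#else
static inline int timed_topology_update(int nsecs)
{
	return 0;
}
#endif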

cheers


Re: [PATCH RFC] KVM: PPC: Book3S: Add MMIO emulation for VMX instructions

2017-09-01 Thread Paul Mackerras
On Tue, Aug 29, 2017 at 07:18:02PM -0300, Jose Ricardo Ziviani wrote:
> This patch provides the MMIO load/store vector indexed
> X-Form emulation.
> 
> Instructions implemented: lvx, stvx
> 
> Signed-off-by: Jose Ricardo Ziviani 

Thanks for the patch.  The basic outline of it looks fine.  The parts
where you are accessing the VMX register image in memory are a bit
hard to follow, and I'll have to check them in detail.

Longer term, I want to switch over to using the instruction emulation
infrastructure in arch/powerpc/lib/sstep.c that I have recently
expanded to handle almost all load and store instructions.  That will
reduce duplication and should simplify all of this code enormously.

Regards,
Paul.


Re: [PATCH] powerpc/powernv: Clear LPCR[PECE1] via stop-api only for deep state offline

2017-09-01 Thread Nicholas Piggin
On Fri, 1 Sep 2017 15:38:59 +0530
Akshay Adiga  wrote:

> On 08/31/2017 05:37 PM, Nicholas Piggin wrote:
> > On Thu, 31 Aug 2017 17:17:41 +0530
> > "Gautham R. Shenoy"  wrote:
> >  
> > > From: "Gautham R. Shenoy" 
> > >
> > > commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> > > stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
> > > stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
> > > an offlined CPU which is in a deep stop state.
> > >
> > > In the case where the stop-api support is found to be lacking, the
> > > commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
> > > states when stop-api fails") disables deep states that lose hypervisor
> > > context. Thus in this case, the offlined CPU will be put to some
> > > shallow idle state.
> > >
> > > However, we currently unconditionally clear the PECE1 in LPCR via
> > > stop-api during CPU-Hotplug even when deep states are disabled due to
> > > stop-api failure.
> > >
> > > Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
> > > *only* when the offlined CPU will be put to a deep state that loses
> > > hypervisor context.  
> >
> > This looks okay to me. The bug is due to calling opal_slw_set_reg when
> > firmware has not enabled that feature, right?  
> Yes,
> 
> In the case where the stop-api support is found to be lacking, the 
> commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT 
> states when stop-api fails") disables deep states that lose hypervisor 
> context. Thus in this case, the offlined CPU will be put to some shallow 
> idle state.
> 
> If a shallow state ( < stop4 ) is being chosen for cpu hotplug, then :
> 1) this opal call is not required.
> 2) may not be supported.
> 
> Hence  should call opal_slw_set_reg() only if a deep state chosen for 
> cpu hotplug.

That makes sense, thanks for confirming.

> >  
> > >
> > > Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
> > > stop-api only on Hotplug")
> > >
> > > Reported-by: Pavithra Prakash 
> > > Signed-off-by: Gautham R. Shenoy 


Reviewed-by: Nicholas Piggin 


> > > ---
> > >  arch/powerpc/platforms/powernv/idle.c | 8 +++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/powerpc/platforms/powernv/idle.c  
> > b/arch/powerpc/platforms/powernv/idle.c  
> > > index 9f59041..23f8fba 100644
> > > --- a/arch/powerpc/platforms/powernv/idle.c
> > > +++ b/arch/powerpc/platforms/powernv/idle.c
> > > @@ -393,7 +393,13 @@ static void  
> > pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)  
> > >  u64 pir = get_hard_smp_processor_id(cpu);
> > >
> > >  mtspr(SPRN_LPCR, lpcr_val);
> > > -opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> > > +
> > > +/*
> > > + * Program the LPCR via stop-api only for deepest stop state
> > > + * can lose hypervisor context.
> > > + */
> > > +if (supported_cpuidle_states & OPAL_PM_LOSE_FULL_CONTEXT)
> > > +opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> > >  }
> > >
> > >  /*  
> >  
> 



Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-09-01 Thread Michael Ellerman
Hi Dan,

Thanks for reviewing this series.

Dan Streetman  writes:
> On Tue, Aug 29, 2017 at 5:54 PM, Haren Myneni  
> wrote:
>> On 08/29/2017 02:23 PM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2017-08-29 at 09:58 -0400, Dan Streetman wrote:
> +
> +   ret = -EINVAL;
> +   if (coproc && coproc->vas.rxwin) {
> +   wmem->txwin = nx842_alloc_txwin(coproc);

 this is wrong.  the workmem is scratch memory that's valid only for
 the duration of a single operation.
>>
>> Correct, workmem is used until crypto_free is called.
>
> that's not a 'single operation'.  a single operation is compress() or
> decompress().
>

 do you actually need a txwin per crypto transform?  or do you need a
 txwin per coprocessor?  or txwin per processor?  either per-coproc or
 per-cpu should be created at driver init and held separately
 (globally) instead of a per-transform txwin.  I really don't see why
 you would need a txwin per transform, because the coproc should not
 care how many different transforms there are.
>>>
>>> We should only need a single window for the whole kernel really, plus
>>> one per user process who wants direct access but that's not relevant
>>> here.
>>
>> Opening send window for each crypto transform (crypto_alloc,
>> compression/decompression, ..., crypto_free) so that it does not
>> have to wait for the previous copy/paste to complete.
>> VAS will map send and receive windows, and can cache in send
>> windows (up to 128). So I thought using the same send window
>> (per chip) for more requests (say 1000) may be adding overhead.
>>
>> I will make changes if you prefer using 1 send window per chip.
>
> i don't have the spec, so i shouldn't be making the decision on it,
> but i do know putting a persistent field into the workmem is the wrong
> location.  If it's valid for the life of the transform, put it into
> the transform context.  The workmem buffer is intended to be used only
> during a single operation - it's "working memory" to perform each
> individual crypto transformation.

I agree workmem isn't the right place for the txwin. But I don't believe
it actually breaks anything to put txwin there.

So for now I'm going to merge this series as-is and I've asked Haren to
send fixes as soon as he can to clean it up.

cheers


Machine Check in P2010(e500v2)

2017-09-01 Thread Joakim Tjernlund
I am trying to debug a Machine Check for a P2010 (e500v2) CPU:

[   28.111816] Caused by (from MCSR=10008): Bus - Read Data Bus Error
[   28.117998] Oops: Machine check, sig: 7 [#1]
[   28.122263] P1010 RDB
[   28.124529] Modules linked in: linux_bcm_knet(PO) linux_user_bde(PO) 
linux_kernel_bde(PO)
[   28.132718] CPU: 0 PID: 470 Comm: emxp2_hw_bl Tainted: P   O
4.1.38+ #49
[   28.140376] task: db16cd10 ti: df128000 task.ti: df128000
[   28.145770] NIP:  LR: 10a4e404 CTR: 10046c38
[   28.150730] REGS: df129f10 TRAP: 0204   Tainted: P   O (4.1.38+)
[   28.157776] MSR: 0002d000   CR: 44002428  XER: 
[   28.164140] DEAR: b7187000 ESR: 
GPR00: 10a4e404 bf86ea30 b7ca94a0 132f9fa8 07006000 0700  132f9fd8
GPR08: b7149000 b7159000 0003e000 bf86ea20 24004424 11d6cf7c  
GPR16: 10f6e29c 10f6c872 10f6db01 b541 b541 11d92fcc 0011 0001
GPR24: 01a4d12d 132ffbf0 11d6  07006000  132f9fa8 
[   28.196375] NIP []   (null)
[   28.199859] LR [10a4e404] 0x10a4e404
[   28.203426] Call Trace:
[   28.205866] ---[ end trace f456255ddf9bee83 ]---

I cannot figure out why NIP is NULL ? It LOOKs like NIP is set to
MCSRR0 early on but maybe it is lost somehow?

Anyhow, looking at entry_32.S:
.globl  mcheck_transfer_to_handler
mcheck_transfer_to_handler:
mfspr   r0,SPRN_DSRR0
stw r0,_DSRR0(r11)
mfspr   r0,SPRN_DSRR1
stw r0,_DSRR1(r11)
/* fall through */

.globl  debug_transfer_to_handler
debug_transfer_to_handler:
mfspr   r0,SPRN_CSRR0
stw r0,_CSRR0(r11)
mfspr   r0,SPRN_CSRR1
stw r0,_CSRR1(r11)
/* fall through */

.globl  crit_transfer_to_handler
crit_transfer_to_handler:

It looks odd that DSRRx is assigned in mcheck and CSRRx in debug and
crit has none. Should not this assignment be shifted down one level?

  Jocke

Re: [PATCH v3 0/3] Split default display handling out from VGA arbiter

2017-09-01 Thread Daniel Axtens
Hi Ard,

> If we are all in agreement that fixing X is not an option, I think
> this is a reasonable approach

This did come up in discussion at some earlier point in one of the many
spins we've done of this - I don't remember if you brought it up or
someone else did - but my concern was this: If it were just X we would
be fine, but if we go down that path I'm worried about also having to
fix Mir/Wayland/whatever-the-new-exciting-display-server-is-this-week,
ad nauseam.

> Acked-by: Ard Biesheuvel 

Thanks!

Regards,
Daniel


Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-09-01 Thread Michael Ellerman
Haren Myneni  writes:
>> On Mon, Aug 28, 2017 at 7:25 PM, Michael Ellerman  
>> wrote:
>>> Hi Haren,
>>>
>>> Some comments inline ...
>>>
>>> Haren Myneni  writes:
>>>
 diff --git a/drivers/crypto/nx/nx-842-powernv.c 
 b/drivers/crypto/nx/nx-842-powernv.c
 index c0dd4c7e17d3..13089a0b9dfa 100644
 --- a/drivers/crypto/nx/nx-842-powernv.c
 +++ b/drivers/crypto/nx/nx-842-powernv.c
 @@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");

  #define WORKMEM_ALIGN(CRB_ALIGN)
  #define CSB_WAIT_MAX (5000) /* ms */
 +#define VAS_RETRIES  (10)
>>>
>>> Where does that number come from?
>
> Sometimes HW returns copy/paste failures, so we should retry the
> request. With 10 retries, a test running for 12 hours was successful
> for repeated compression/decompression requests with 1024 threads.

But why 10. Why not 5, or 100, or 1, or 10,000?

Presumably when we have to retry it means the NX is too busy to service the
request?

Do we have any way to find out how long it might be busy for?

Should we try an NX on another chip?

We should also take into account the size of our request, ie. are we
asking the NX to compress one page, or 1GB ?

If it's just one page maybe we should fall back to software immediately.

cheers


Re: [08/15] powerpc/iommu: use permission-specific DEVICE_ATTR variants

2017-09-01 Thread Michael Ellerman
On Sat, 2016-10-29 at 19:37:02 UTC, Julia Lawall wrote:
> Use DEVICE_ATTR_RW for read-write attributes.  This simplifies the
> source code, improves readbility, and reduces the chance of
> inconsistencies.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // 
> @rw@
> declarer name DEVICE_ATTR;
> identifier x,x_show,x_store;
> @@
> 
> DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
> 
> @script:ocaml@
> x << rw.x;
> x_show << rw.x_show;
> x_store << rw.x_store;
> @@
> 
> if not (x^"_show" = x_show && x^"_store" = x_store)
> then Coccilib.include_match false
> 
> @@
> declarer name DEVICE_ATTR_RW;
> identifier rw.x,rw.x_show,rw.x_store;
> @@
> 
> - DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
> + DEVICE_ATTR_RW(x);
> // 
> 
> Signed-off-by: Julia Lawall 
> Acked-by: Michael Ellerman  (powerpc)
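
For anyone unfamiliar with the macro, the transformation amounts to the
following (hypothetical attribute name and stub handlers, not taken from
the powerpc patch):

#include <linux/device.h>

static ssize_t enable_show(struct device *dev, struct device_attribute *attr,
			   char *buf)
{
	return sprintf(buf, "%d\n", 1);		/* stub */
}

static ssize_t enable_store(struct device *dev, struct device_attribute *attr,
			    const char *buf, size_t count)
{
	return count;				/* stub */
}

/* before: permissions and handlers wired up by hand */
/* static DEVICE_ATTR(enable, 0644, enable_show, enable_store); */

/* after: DEVICE_ATTR_RW() implies 0644 and derives the handler names
 * enable_show()/enable_store() from the attribute name */
static DEVICE_ATTR_RW(enable);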

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8a7aef2cb3dafd2e8560750f4e5ad2

cheers


Re: powerpc/mm: Use seq_putc() in two functions

2017-09-01 Thread Michael Ellerman
On Sun, 2017-05-07 at 14:38:36 UTC, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Sun, 7 May 2017 16:32:04 +0200
> 
> Two single characters (line breaks) should be put into a sequence.
> Thus use the corresponding function "seq_putc".
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
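
A minimal illustration of the pattern (hypothetical call site, not the
actual powerpc hunk):

#include <linux/seq_file.h>

static void example_show_tail(struct seq_file *m)
{
	seq_puts(m, "some field");
	seq_putc(m, '\n');	/* was: seq_puts(m, "\n"); */
}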

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aae85e3c20ec3baa90aa901a517c8f

cheers


Re: [1/4] powerpc/smp: Use cpu_to_chip_id() to find core siblings

2017-09-01 Thread Michael Ellerman
On Thu, 2017-06-29 at 07:12:53 UTC, Oliver O'Halloran wrote:
> When building the CPU scheduler topology the kernel uses the ibm,chipid
> property from the devicetree to group logical CPUs. Currently the DT
> search for this property is open-coded in smp.c and this functionality
> is a duplication of what's in cpu_to_chip_id() already. This patch
> removes the existing search in favor of that.
> 
> It's worth mentioning that the semantics of the search are different
> in cpu_to_chip_id(). When there is no ibm,chipid in the CPUs node it
> will also search /cpus and / for the property, but this should not
> effect the output topology.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e3d8b67e2c60dcd35661d34df249f2

cheers


Re: ide: pmac: Convert to using %pOF instead of full_name

2017-09-01 Thread Michael Ellerman
On Tue, 2017-07-18 at 21:43:07 UTC, Rob Herring wrote:
> Now that we have a custom printf format specifier, convert users of
> full_name to use %pOF instead. This is preparation to remove storing
> of the full path string for each node.
> 
> Signed-off-by: Rob Herring 
> Cc: "David S. Miller" 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Acked-by: David S. Miller 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/859420e3155d8192b31a93cd92d32c

cheers


Re: macintosh: Convert to using %pOF instead of full_name

2017-09-01 Thread Michael Ellerman
On Tue, 2017-07-18 at 21:43:12 UTC, Rob Herring wrote:
> Now that we have a custom printf format specifier, convert users of
> full_name to use %pOF instead. This is preparation to remove storing
> of the full path string for each node.
> 
> Signed-off-by: Rob Herring 
> Cc: Benjamin Herrenschmidt 
> Cc: linuxppc-dev@lists.ozlabs.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b6a945ae03fd3962b51b27ecedf4f1

cheers


Re: powerpc/macintosh: constify wf_sensor_ops structures

2017-09-01 Thread Michael Ellerman
On Wed, 2017-08-02 at 21:01:45 UTC, Julia Lawall wrote:
> The wf_sensor_ops structures are only stored in the ops field of a
> wf_sensor structure, which is declared as const.  Thus the
> wf_sensor_ops structures themselves can be const.
> 
> Done with the help of Coccinelle.
> 
> // 
> @r disable optional_qualifier@
> identifier i;
> position p;
> @@
> static struct wf_sensor_ops i@p = { ... };
> 
> @ok1@
> identifier r.i;
> struct wf_sensor s;
> position p;
> @@
> s.ops = &i@p
> 
> @ok2@
> identifier r.i;
> struct wf_sat_sensor s;
> position p;
> @@
> s.sens.ops = &i@p
> 
> @bad@
> position p != {r.p,ok1.p,ok2.p};
> identifier r.i;
> struct wf_sensor_ops e;
> @@
> e@i@p
> 
> @depends on !bad disable optional_qualifier@
> identifier r.i;
> @@
> static
> +const
>  struct wf_sensor_ops i = { ... };
> // 
> 
> Signed-off-by: Julia Lawall 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/de854e54d79bc0ad5c45c5be50821b

cheers


Re: [1/4] axonram: Delete an error message for a failed memory allocation in axon_ram_probe()

2017-09-01 Thread Michael Ellerman
On Thu, 2017-08-03 at 19:12:50 UTC, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Thu, 3 Aug 2017 19:49:18 +0200
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> This issue was detected by using the Coccinelle software.
> 
> Link: 
> http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
> Signed-off-by: Markus Elfring 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/a1bddf3991573f3dff2762bfca5e11

cheers


Re: powerpc/eeh: Delete an error message for a failed memory allocation in two functions

2017-09-01 Thread Michael Ellerman
On Fri, 2017-08-04 at 14:46:51 UTC, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Fri, 4 Aug 2017 16:37:56 +0200
> 
> Omit an extra message for a memory allocation failure in these functions.
> 
> This issue was detected by using the Coccinelle software.
> 
> Link: 
> http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
> Signed-off-by: Markus Elfring 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6ab41161b44a3b4d504ac29c9dd997

cheers


Re: [1/2] powerpc/powernv/npu: Move tlb flush before launching ATSD

2017-09-01 Thread Michael Ellerman
On Fri, 2017-08-11 at 06:22:56 UTC, Alistair Popple wrote:
> The nest mmu tlb flush needs to happen before the GPU translation shootdown
> is launched to avoid the GPU refilling its tlb with stale nmmu translations
> prior to the nmmu flush completing.
> 
> Signed-off-by: Alistair Popple 
> Cc: sta...@vger.kernel.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bab9f954aaf352127725a9b7920226

cheers


Re: [1/4] powerpc/32: add memset16()

2017-09-01 Thread Michael Ellerman
On Wed, 2017-08-23 at 14:54:32 UTC, Christophe Leroy wrote:
> Commit 694fc88ce271f ("powerpc/string: Implement optimized
> memset variants") added memset16(), memset32() and memset64()
> for the 64 bits PPC.
> 
> On 32 bits, memset64() is not relevant, and as shown below,
> the generic version of memset32() gives a good code, so only
> memset16() is candidate for an optimised version.
> 
> 09c0 :
>  9c0:   2c 05 00 00 cmpwi   r5,0
>  9c4:   39 23 ff fc addir9,r3,-4
>  9c8:   4d 82 00 20 beqlr
>  9cc:   7c a9 03 a6 mtctr   r5
>  9d0:   94 89 00 04 stwur4,4(r9)
>  9d4:   42 00 ff fc bdnz9d0 
>  9d8:   4e 80 00 20 blr
> 
> The last part of memset() handling the not 4-bytes multiples
> operates on bytes, making it unsuitable for handling word without
> modification. As it would increase memset() complexity, it is
> better to implement memset16() from scratch. In addition it
> has the advantage of allowing a more optimised memset16() than what
> we would have by using the memset() function.
> 
> Signed-off-by: Christophe Leroy 
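
As a reminder of the interface being optimised here, memset16() takes a
count in 16-bit elements rather than bytes. A minimal usage sketch
(hypothetical caller, not from the series):

#include <linux/string.h>
#include <linux/types.h>

/* Fill a 16bpp scanline with a single pixel value. */
static void fill_scanline(u16 *line, u16 pixel, size_t npixels)
{
	memset16(line, pixel, npixels);	/* count is in u16 elements, not bytes */
}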

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/da74f659205ea08cb0fd0b3050637b

cheers


Re: powerpc/pseries: Don't attempt to acquire drc during memory hot add for assigned lmbs

2017-09-01 Thread Michael Ellerman
On Wed, 2017-08-23 at 17:18:43 UTC, John Allen wrote:
> Check if an LMB is assigned before attempting to call dlpar_acquire_drc in
> order to avoid any unnecessary rtas calls. This substantially reduces the
> running time of memory hot add on lpars with large amounts of memory.
> 
> Signed-off-by: John Allen 
> Reviewed-by: Nathan Fontenot 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/afb5519fdb346201728040cab4e08c

cheers


Re: [v3, 01/17] powerpc: Correct instruction code for xxlor instruction

2017-09-01 Thread Michael Ellerman
On Wed, 2017-08-30 at 04:12:24 UTC, Paul Mackerras wrote:
> The instruction code for xxlor that commit 0016a4cf5582 ("powerpc:
> Emulate most Book I instructions in emulate_step()", 2010-06-15)
> added is actually the code for xxlnor.  It is used in get_vsr()
> and put_vsr() and the effect of the error is that if emulate_step
> is used to emulate a VSX load or store from any register other
> than vsr0, the bitwise complement of the correct value will be
> loaded or stored.  This corrects the error.
> 
> Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in 
> emulate_step()")
> Signed-off-by: Paul Mackerras 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/93b2d3cf3733b4060d3623161551f5

cheers


Re: powerpc: Fix DAR reporting when alignment handler faults

2017-09-01 Thread Michael Ellerman
On Thu, 2017-08-24 at 10:49:57 UTC, Michael Ellerman wrote:
> Anton noticed that if we fault part way through emulating an unaligned
> instruction, we don't update the DAR to reflect that.
> 
> The DAR value is eventually reported back to userspace as the address
> in the SEGV signal, and if userspace is using that value to demand
> fault then it can be confused by us not setting the value correctly.
> 
> This patch is ugly as hell, but is intended to be the minimal fix and
> back ports easily.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/f9effe925039cf54489b5c04e0d400

cheers


Re: powerpc: 4xx: constify platform_suspend_ops

2017-09-01 Thread Michael Ellerman
On Wed, 2017-08-30 at 16:48:20 UTC, Arvind Yadav wrote:
> platform_suspend_ops are not supposed to change at runtime.
> The function suspend_set_ops() works with const platform_suspend_ops.
> So mark the non-const structs as const.
> 
> Signed-off-by: Arvind Yadav 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7def9a2418430a8af2a56023769ccd

cheers


Re: [V4, 1/7] crypto/nx: Rename nx842_powernv_function as icswx function

2017-09-01 Thread Michael Ellerman
On Thu, 2017-08-31 at 07:11:29 UTC, Haren Myneni wrote:
> Rename nx842_powernv_function to nx842_powernv_exec.
> nx842_powernv_exec points to nx842_exec_icswx and
> will point to the VAS exec function which will be added later
> for P9 NX support.
> 
> Signed-off-by: Haren Myneni 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c97f8169fb227cae5adeac56cafa98

cheers


Re: [FIX PATCH v0] powerpc: Fix memory unplug failure on radix guest

2017-09-01 Thread Nathan Fontenot
On 09/01/2017 01:53 AM, Bharata B Rao wrote:
> On Thu, Aug 10, 2017 at 02:53:48PM +0530, Bharata B Rao wrote:
>> For a PowerKVM guest, it is possible to specify a DIMM device in
>> addition to the system RAM at boot time. When such a cold plugged DIMM
>> device is removed from a radix guest, we hit the following warning in the
>> guest kernel resulting in the eventual failure of memory unplug:
>>
>> remove_pud_table: unaligned range
>> WARNING: CPU: 3 PID: 164 at arch/powerpc/mm/pgtable-radix.c:597 
>> remove_pagetable+0x468/0xca0
>> Call Trace:
>> remove_pagetable+0x464/0xca0 (unreliable)
>> radix__remove_section_mapping+0x24/0x40
>> remove_section_mapping+0x28/0x60
>> arch_remove_memory+0xcc/0x120
>> remove_memory+0x1ac/0x270
>> dlpar_remove_lmb+0x1ac/0x210
>> dlpar_memory+0xbc4/0xeb0
>> pseries_hp_work_fn+0x1a4/0x230
>> process_one_work+0x1cc/0x660
>> worker_thread+0xac/0x6d0
>> kthread+0x16c/0x1b0
>> ret_from_kernel_thread+0x5c/0x74
>>
>> The DIMM memory that is cold plugged gets merged to the same memblock
>> region as RAM and hence gets mapped at 1G alignment. However since the
>> removal is done for one LMB (lmb size 256MB) at a time, the address
>> of the LMB (which is 256MB aligned) would get flagged as unaligned
>> in remove_pud_table() resulting in the above failure.
>>
>> This problem is not seen for hot plugged memory because for the
>> hot plugged memory, the mappings are created separately for each
>> LMB and hence they all get aligned at 256MB.
>>
>> To fix this problem for the cold plugged memory, let us mark the
>> cold plugged memblock region explicitly as HOTPLUGGED so that the
>> region doesn't get merged with RAM. All the memory that is discovered
>> via ibm,dynamic-memory-configuration is marked so (1). Next identify
>> such regions in radix_init_pgtable() and create separate mappings
>> within that region for each LMB so that they don't get aligned
>> like the RAM region at 1G (2).
>>
>> (1) For PowerKVM guests, all boot time memory is represented via
>> memory@ nodes and hot plugged/pluggable memory is represented via
>> ibm,dynamic-memory-reconfiguration property. We are marking all
>> hotplugged memory that is in ASSIGNED state during boot as HOTPLUGGED.
>> With this only cold plugged memory gets marked for PowerKVM but
>> need to check how this will affect PowerVM guests.
>>
>> (2) To create separate mappings for every LMB in the hot plugged
>> region, we need lmb-size. I am currently using memory_block_size_bytes()
>> API to get the lmb-size. Since this is early init time code, the
>> machine type isn't probed yet and hence memory_block_size_bytes()
>> would return the default LMB size as 16MB. Hence we end up creating
>> separate mappings at much lower granularity than what we can ideally
>> do for pseries machine.
>>
>> Signed-off-by: Bharata B Rao 
>> ---
>>  arch/powerpc/kernel/prom.c  |  1 +
>>  arch/powerpc/mm/pgtable-radix.c | 17 ++---
>>  2 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index f830562..24ecf53 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -524,6 +524,7 @@ static int __init 
>> early_init_dt_scan_drconf_memory(unsigned long node)
>>  size = 0x8000ul - base;
>>  }
>>  memblock_add(base, size);
>> +memblock_mark_hotplug(base, size);
> 
> One of the suggestions was to make the above conditional to radix so
> that PowerVM doesn't get affected by this. However early_radix_enabled()
> check isn't usable yet at this point and MMU_FTR_TYPE_RADIX will get set
> only a bit later in early_init_devtree().

We do walk the dynamic reconfiguration memory again in the numa code (see
parse_drconf_memory() in numa.c); would it be far enough along in boot to use
early_radix_enabled() and mark the memory as hotplug at this point?

This may not be the ideal place to mark hotplug memory for radix but it may
be nicer than adding another walk of the device tree property somewhere else.

-Nathan

> 
> Regards,
> Bharata.
> 



Re: axonram: Delete an error message for a failed memory allocation in axon_ram_probe()

2017-09-01 Thread SF Markus Elfring
> Series applied to powerpc next, thanks.

Thanks for another round of positive feedback.

But I wonder how you can refer to the “series” when the fourth update step
“Delete an unnecessary variable initialisation” contained a broken suggestion.


> https://git.kernel.org/powerpc/c/a1bddf3991573f3dff2762bfca5e11

This link seems to refer (only?) to the update step “Improve a size
determination”.
What is the status of the other two proposed changes for this software
module?

Regards,
Markus


[PATCH V13 0/4] powerpc/vphn: Update CPU topology when VPHN enabled

2017-09-01 Thread Michael Bringmann
powerpc/numa: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources.  This patch
addresses some of those problems.

First, it corrects the currently broken capability to set the
topology for shared CPUs in LPARs.  At boot time for shared CPU
lpars, the topology for each CPU was being set to node zero.  Now
when numa_update_cpu_topology() is called appropriately, the Virtual
Processor Home Node (VPHN) capabilities information provided by the
pHyp allows the appropriate node in the shared configuration to be
selected for the CPU.

Next, it updates the initialization checks to independently recognize
PRRN or VPHN support.

Next, during hotplug CPU operations, it resets the timer on topology
update work function to a small value to better ensure that the CPU
topology is detected and configured sooner.

Also, fix an end-of-updates processing problem observed occasionally
in numa_update_cpu_topology().

Signed-off-by: Michael Bringmann 

Michael Bringmann (4):
  powerpc/vphn: Update CPU topology when VPHN enabled
  powerpc/vphn: Improve recognition of PRRN/VPHN
  powerpc/hotplug: Improve responsiveness of hotplug change
  powerpc/vphn:
---
Changes in V13:
  -- Split into multiple smaller patches



[PATCH V13 1/4] powerpc/vphn: Update CPU topology when VPHN enabled

2017-09-01 Thread Michael Bringmann
powerpc/vphn: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources.  This patch
corrects the currently broken capability to set the topology for
shared CPUs in LPARs.  At boot time for shared CPU lpars, the
topology for each CPU was being set to node zero.  Now when
numa_update_cpu_topology() is called appropriately, the Virtual
Processor Home Node (VPHN) capabilities information provided by the
pHyp allows the appropriate node in the shared configuration to be
selected for the CPU.

Signed-off-by: Michael Bringmann 
---
Changes in V13:
  -- Split patch for improved review
---
 arch/powerpc/mm/numa.c |   31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b95c584..312f6ee 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1153,6 +1153,8 @@ struct topology_update_data {
 static int vphn_enabled;
 static int prrn_enabled;
 static void reset_topology_timer(void);
+static int topology_inited;
+static int topology_update_needed;
 
 /*
  * Store the current values of the associativity change counters in the
@@ -1246,6 +1248,11 @@ static long vphn_get_associativity(unsigned long cpu,
"hcall_vphn() experienced a hardware fault "
"preventing VPHN. Disabling polling...\n");
stop_topology_update();
+   break;
+   case H_SUCCESS:
+   dbg("VPHN hcall succeeded. Reset polling...\n");
+   timed_topology_update(0);
+   break;
}
 
return rc;
@@ -1323,8 +1330,11 @@ int numa_update_cpu_topology(bool cpus_locked)
struct device *dev;
int weight, new_nid, i = 0;
 
-   if (!prrn_enabled && !vphn_enabled)
+   if (!prrn_enabled && !vphn_enabled) {
+   if (!topology_inited)
+   topology_update_needed = 1;
return 0;
+   }
 
weight = cpumask_weight(&cpu_associativity_changes_mask);
if (!weight)
@@ -1363,6 +1373,8 @@ int numa_update_cpu_topology(bool cpus_locked)
cpumask_andnot(&cpu_associativity_changes_mask,
&cpu_associativity_changes_mask,
cpu_sibling_mask(cpu));
+   pr_info("Assoc chg gives same node %d for cpu%d\n",
+   new_nid, cpu);
cpu = cpu_last_thread_sibling(cpu);
continue;
}
@@ -1433,6 +1445,7 @@ int numa_update_cpu_topology(bool cpus_locked)
 
 out:
kfree(updates);
+   topology_update_needed = 0;
return changed;
 }
 
@@ -1453,6 +1466,13 @@ static void topology_schedule_update(void)
schedule_work(&topology_work);
 }
 
+void shared_topology_update(void)
+{
+   if (firmware_has_feature(FW_FEATURE_VPHN) &&
+  lppaca_shared_proc(get_lppaca()))
+   topology_schedule_update();
+}
+
 static void topology_timer_fn(unsigned long ignored)
 {
if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
@@ -1519,7 +1539,6 @@ int start_topology_update(void)
if (firmware_has_feature(FW_FEATURE_PRRN)) {
if (!prrn_enabled) {
prrn_enabled = 1;
-   vphn_enabled = 0;
 #ifdef CONFIG_SMP
rc = of_reconfig_notifier_register(&dt_update_nb);
 #endif
@@ -1527,7 +1546,6 @@ int start_topology_update(void)
} else if (firmware_has_feature(FW_FEATURE_VPHN) &&
   lppaca_shared_proc(get_lppaca())) {
if (!vphn_enabled) {
-   prrn_enabled = 0;
vphn_enabled = 1;
setup_cpu_associativity_change_counters();
init_timer_deferrable(&topology_timer);
@@ -1613,9 +1631,16 @@ static int topology_update_init(void)
if (topology_updates_enabled)
start_topology_update();
 
+   shared_topology_update();
+
if (!proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops))
return -ENOMEM;
 
+   topology_inited = 1;
+   if (topology_update_needed)
+   bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
+   nr_cpumask_bits);
+
return 0;
 }
 device_initcall(topology_update_init);



[PATCH V13 2/4] powerpc/vphn: Improve recognition of PRRN/VPHN

2017-09-01 Thread Michael Bringmann
powerpc/vphn: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources.  This patch
updates the initialization checks to independently recognize PRRN
or VPHN support.

Signed-off-by: Michael Bringmann 
---
Changes in V13:
  -- Split patch to improve review
---
 arch/powerpc/mm/numa.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 312f6ee..c08d736 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1543,7 +1543,8 @@ int start_topology_update(void)
rc = of_reconfig_notifier_register(&dt_update_nb);
 #endif
}
-   } else if (firmware_has_feature(FW_FEATURE_VPHN) &&
+   }
+   if (firmware_has_feature(FW_FEATURE_VPHN) &&
   lppaca_shared_proc(get_lppaca())) {
if (!vphn_enabled) {
vphn_enabled = 1;
@@ -1568,7 +1569,8 @@ int stop_topology_update(void)
 #ifdef CONFIG_SMP
rc = of_reconfig_notifier_unregister(&dt_update_nb);
 #endif
-   } else if (vphn_enabled) {
+   }
+   if (vphn_enabled) {
vphn_enabled = 0;
rc = del_timer_sync(&topology_timer);
}



[PATCH V13 3/4] powerpc/hotplug: Improve responsiveness of hotplug change

2017-09-01 Thread Michael Bringmann
powerpc/hotplug: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources.  During hotplug
CPU operations, this patch resets the timer on topology update work
function to a small value to better ensure that the CPU topology is
detected and configured sooner.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/include/asm/topology.h  |8 
 arch/powerpc/mm/numa.c   |   21 -
 arch/powerpc/platforms/pseries/hotplug-cpu.c |2 ++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index dc4e159..beb9bca 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,6 +98,14 @@ static inline int prrn_is_enabled(void)
 }
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_NEED_MULTIPLE_NODES)
+#if defined(CONFIG_PPC_SPLPAR)
+extern int timed_topology_update(int nsecs);
+#else
+#define timed_topology_update(nsecs)
+#endif /* CONFIG_PPC_SPLPAR */
+#endif /* CONFIG_HOTPLUG_CPU || CONFIG_NEED_MULTIPLE_NODES */
+
 #include 
 
 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index c08d736..3a5b334 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1148,15 +1148,34 @@ struct topology_update_data {
int new_nid;
 };
 
+#define TOPOLOGY_DEF_TIMER_SECS	60
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
 static int prrn_enabled;
 static void reset_topology_timer(void);
+static int topology_timer_secs = 1;
 static int topology_inited;
 static int topology_update_needed;
 
 /*
+ * Change polling interval for associativity changes.
+ */
+int timed_topology_update(int nsecs)
+{
+   if (nsecs > 0)
+   topology_timer_secs = nsecs;
+   else
+   topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
+
+   if (vphn_enabled)
+   reset_topology_timer();
+
+   return 0;
+}
+
+/*
  * Store the current values of the associativity change counters in the
  * hypervisor.
  */
@@ -1489,7 +1508,7 @@ static void topology_timer_fn(unsigned long ignored)
 static void reset_topology_timer(void)
 {
topology_timer.data = 0;
-   topology_timer.expires = jiffies + 60 * HZ;
+   topology_timer.expires = jiffies + topology_timer_secs * HZ;
mod_timer(&topology_timer, topology_timer.expires);
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 6afd1ef..5a7fb1e 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -356,6 +356,7 @@ static int dlpar_online_cpu(struct device_node *dn)
BUG_ON(get_cpu_current_state(cpu)
!= CPU_STATE_OFFLINE);
cpu_maps_update_done();
+   timed_topology_update(1);
rc = device_online(get_cpu_device(cpu));
if (rc)
goto out;
@@ -522,6 +523,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
set_preferred_offline_state(cpu,
CPU_STATE_OFFLINE);
cpu_maps_update_done();
+   timed_topology_update(1);
rc = device_offline(get_cpu_device(cpu));
if (rc)
goto out;



[PATCH V13 4/4] powerpc/vphn: Fix numa update end-loop bug

2017-09-01 Thread Michael Bringmann
powerpc/vphn: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources.  This patch
fixes an end-of-updates processing problem observed occasionally
in numa_update_cpu_topology().

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3a5b334..fccf23f 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1410,6 +1410,13 @@ int numa_update_cpu_topology(bool cpus_locked)
cpu = cpu_last_thread_sibling(cpu);
}
 
+   /*
+* Prevent processing of 'updates' from overflowing array
+* in cases where last entry filled in a 'next' pointer.
+*/
+   if (i)
+   updates[i-1].next = NULL;
+
pr_debug("Topology update for the following CPUs:\n");
if (cpumask_weight(&updated_cpus)) {
for (ud = &updates[0]; ud; ud = ud->next) {



[PATCH] mm/mmu_notifier: avoid double notification when it is useless

2017-09-01 Thread jglisse
From: Jérôme Glisse 

(Note that this is 4.15 material or 4.14 if people are extra confident. I
 am posting now to get people to test. To that effect maybe it would be a
 good idea to have that patch sit in linux-next for a while for testing.

 Another motivation is that the problem is fresh in everyone's memory.

 Thanks to Andrea for thinking of a problematic scenario for COW)

When clearing a pte/pmd we are given a choice to notify the event under the
page table lock (the notify version of *_clear_flush calls
mmu_notifier_invalidate_range()). But that notification is not necessary in
all cases.

This patch removes almost all the cases where it is useless to have a call to
mmu_notifier_invalidate_range() before mmu_notifier_invalidate_range_end().
It also adds documentation to all those cases explaining why.

Below is a more in-depth analysis of why this is fine:

Consider a secondary TLB (non-CPU TLB) such as an IOMMU TLB or a device TLB
(where the device uses something like ATS/PASID to have the IOMMU walk the CPU
page tables and access a process virtual address space). There are only two
cases where you need to notify such secondary TLBs while holding the page
table lock when clearing a pte/pmd:

  A) the page backing the address is freed before mmu_notifier_invalidate_range_end()
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious: you do not want to take the risk of the device writing to
a page that might now be in use by some completely different task.

Case B is more subtle. For correctness it requires the following sequence to
happen:
  - take page table lock
  - clear page table entry and notify ([pmd/pte]p_huge_clear_flush_notify())
  - set page table entry to point to new page

If clearing the page table entry is not followed by a notify before setting
the new pte/pmd value, then you can break a memory model like C11 or C++11
for the device.
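
For illustration, here is a minimal sketch of the ordering case B requires
(fault-handling details are omitted and the variable names are illustrative;
only the clear-and-notify-before-set ordering matters):

	spin_lock(ptl);			/* take the page table lock */
	/* clear the old entry and notify secondary TLBs under the lock */
	orig_pte = ptep_clear_flush_notify(vma, address, ptep);
	/* only now is it safe to install the new page */
	set_pte_at(mm, address, ptep, mk_pte(new_page, vma->vm_page_prot));
	spin_unlock(ptl);
	/* mmu_notifier_invalidate_range_end() runs later, outside the lock */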

Consider the following scenario (the device uses a feature similar to ATS/PASID):

Take two addresses addrA and addrB such that |addrA - addrB| >= PAGE_SIZE; we
assume they are write protected for COW (the other cases of B apply too).

[Time N] 
CPU-thread-0  {try to write to addrA}
CPU-thread-1  {try to write to addrB}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA and populate device TLB}
DEV-thread-2  {read addrB and populate device TLB}
[Time N+1] --
CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+2] --
CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] --
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {write to addrA which is a write to new page}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+3] --
CPU-thread-0  {preempted}
CPU-thread-1  {preempted}
CPU-thread-2  {}
CPU-thread-3  {write to addrB which is a write to new page}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+4] --
CPU-thread-0  {preempted}
CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {}
DEV-thread-2  {}
[Time N+5] --
CPU-thread-0  {preempted}
CPU-thread-1  {}
CPU-thread-2  {}
CPU-thread-3  {}
DEV-thread-0  {read addrA from old page}
DEV-thread-2  {read addrB from new page}

So here, because at time N+2 the page table entry clear was not paired with a
notification to invalidate the secondary TLB, the device sees the new value for
addrB before seeing the new value for addrA. This breaks total memory ordering
for the device.

When changing a pte to write protect, or to point to a new write-protected page
with the same content (KSM), it is fine to delay the mmu_notifier_invalidate_range
call to mmu_notifier_invalidate_range_end() outside the page table lock. This
is true even if the thread doing the page table update is preempted right after
releasing the page table lock but before calling mmu_notifier_invalidate_range_end().
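
As a hedged sketch of that write-protect case (names are illustrative; the
point is that only the CPU TLB flush is needed under the lock, and the
secondary TLB is caught by mmu_notifier_invalidate_range_end() afterwards):

	spin_lock(ptl);
	entry = ptep_clear_flush(vma, address, ptep);	/* CPU TLB only */
	entry = pte_wrprotect(entry);
	set_pte_at(mm, address, ptep, entry);
	spin_unlock(ptl);
	/* ...later, outside the lock: */
	mmu_notifier_invalidate_range_end(mm, start, end);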

Signed-off-by: Jérôme Glisse 
Cc: Andrea Arcangeli 
Cc: Nadav Amit 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Cc: David Woodhouse 
Cc: Alistair Popple 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Stephen Rothwell 

Cc: io...@lists.linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-n...@vger.ker

[PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Sandipan Das
Take advantage of stack_depth tracking, originally introduced for
x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
to make sure it stays aligned for functions called from JITed bpf
program.
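
For example (illustrative numbers only): a program with a BPF stack depth of
40 bytes gets round_up(40, 16) = 48 bytes of stack space, which keeps the
frame quadword aligned, while a program using the full 512 bytes is unchanged.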

Signed-off-by: Sandipan Das 
---
 arch/powerpc/net/bpf_jit64.h  |  7 ---
 arch/powerpc/net/bpf_jit_comp64.c | 16 ++--
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
index 62fa7589db2b..8bdef7ed28a8 100644
--- a/arch/powerpc/net/bpf_jit64.h
+++ b/arch/powerpc/net/bpf_jit64.h
@@ -23,7 +23,7 @@
  * [   nv gpr save area] 8*8   |
  * [tail_call_cnt  ] 8 |
  * [local_tmp_var  ] 8 |
- * fp (r31) -->[   ebpf stack space] 512   |
+ * fp (r31) -->[   ebpf stack space] upto 512  |
  * [ frame header  ] 32/112|
  * sp (r1) --->[stack pointer  ] --
  */
@@ -32,8 +32,8 @@
 #define BPF_PPC_STACK_SAVE (8*8)
 /* for bpf JIT code internal usage */
 #define BPF_PPC_STACK_LOCALS   16
-/* Ensure this is quadword aligned */
-#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + MAX_BPF_STACK + \
+/* stack frame excluding BPF stack, ensure this is quadword aligned */
+#define BPF_PPC_STACKFRAME (STACK_FRAME_MIN_SIZE + \
 BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
 
 #ifndef __ASSEMBLY__
@@ -103,6 +103,7 @@ struct codegen_context {
 */
unsigned int seen;
unsigned int idx;
+   unsigned int stack_size;
 };
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 6ba5d253e857..a01362c88f6a 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -69,7 +69,7 @@ static inline bool bpf_has_stack_frame(struct codegen_context 
*ctx)
 static int bpf_jit_stack_local(struct codegen_context *ctx)
 {
if (bpf_has_stack_frame(ctx))
-   return STACK_FRAME_MIN_SIZE + MAX_BPF_STACK;
+   return STACK_FRAME_MIN_SIZE + ctx->stack_size;
else
return -(BPF_PPC_STACK_SAVE + 16);
 }
@@ -82,8 +82,9 @@ static int bpf_jit_stack_tailcallcnt(struct codegen_context 
*ctx)
 static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 {
if (reg >= BPF_PPC_NVR_MIN && reg < 32)
-   return (bpf_has_stack_frame(ctx) ? BPF_PPC_STACKFRAME : 0)
-   - (8 * (32 - reg));
+   return (bpf_has_stack_frame(ctx) ?
+   (BPF_PPC_STACKFRAME + ctx->stack_size) : 0)
+   - (8 * (32 - reg));
 
pr_err("BPF JIT is asking about unknown registers");
BUG();
@@ -134,7 +135,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
}
 
-   PPC_BPF_STLU(1, 1, -BPF_PPC_STACKFRAME);
+   PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
}
 
/*
@@ -161,7 +162,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
codegen_context *ctx)
/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, BPF_REG_FP))
PPC_ADDI(b2p[BPF_REG_FP], 1,
-   STACK_FRAME_MIN_SIZE + MAX_BPF_STACK);
+   STACK_FRAME_MIN_SIZE + ctx->stack_size);
 }
 
 static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context 
*ctx)
@@ -183,7 +184,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct 
codegen_context *ctx
 
/* Tear down our stack frame */
if (bpf_has_stack_frame(ctx)) {
-   PPC_ADDI(1, 1, BPF_PPC_STACKFRAME);
+   PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
if (ctx->seen & SEEN_FUNC) {
PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
PPC_MTLR(0);
@@ -993,6 +994,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
memset(&cgctx, 0, sizeof(struct codegen_context));
 
+   /* Make sure that the stack is quadword aligned. */
+   cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
+
/* Scouting faux-generate pass 0 */
if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
/* We hit something illegal or unsupported. */
-- 
2.13.5



Re: [PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Naveen N. Rao
On 2017/09/02 12:23AM, Sandipan Das wrote:
> Take advantage of stack_depth tracking, originally introduced for
> x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
> to make sure it stays aligned for functions called from JITed bpf
> program.
> 
> Signed-off-by: Sandipan Das 
> ---

LGTM, thanks!
Reviewed-by: Naveen N. Rao 

Michael,
Seeing as this is powerpc specific, can you please take this through 
your tree?


Thanks,
Naveen

>  arch/powerpc/net/bpf_jit64.h  |  7 ---
>  arch/powerpc/net/bpf_jit_comp64.c | 16 ++--
>  2 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit64.h b/arch/powerpc/net/bpf_jit64.h
> index 62fa7589db2b..8bdef7ed28a8 100644
> --- a/arch/powerpc/net/bpf_jit64.h
> +++ b/arch/powerpc/net/bpf_jit64.h
> @@ -23,7 +23,7 @@
>   *   [   nv gpr save area] 8*8   |
>   *   [tail_call_cnt  ] 8 |
>   *   [local_tmp_var  ] 8 |
> - * fp (r31) -->  [   ebpf stack space] 512   |
> + * fp (r31) -->  [   ebpf stack space] upto 512  |
>   *   [ frame header  ] 32/112|
>   * sp (r1) --->  [stack pointer  ] --
>   */
> @@ -32,8 +32,8 @@
>  #define BPF_PPC_STACK_SAVE   (8*8)
>  /* for bpf JIT code internal usage */
>  #define BPF_PPC_STACK_LOCALS 16
> -/* Ensure this is quadword aligned */
> -#define BPF_PPC_STACKFRAME   (STACK_FRAME_MIN_SIZE + MAX_BPF_STACK + \
> +/* stack frame excluding BPF stack, ensure this is quadword aligned */
> +#define BPF_PPC_STACKFRAME   (STACK_FRAME_MIN_SIZE + \
>BPF_PPC_STACK_LOCALS + BPF_PPC_STACK_SAVE)
> 
>  #ifndef __ASSEMBLY__
> @@ -103,6 +103,7 @@ struct codegen_context {
>*/
>   unsigned int seen;
>   unsigned int idx;
> + unsigned int stack_size;
>  };
> 
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
> b/arch/powerpc/net/bpf_jit_comp64.c
> index 6ba5d253e857..a01362c88f6a 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -69,7 +69,7 @@ static inline bool bpf_has_stack_frame(struct 
> codegen_context *ctx)
>  static int bpf_jit_stack_local(struct codegen_context *ctx)
>  {
>   if (bpf_has_stack_frame(ctx))
> - return STACK_FRAME_MIN_SIZE + MAX_BPF_STACK;
> + return STACK_FRAME_MIN_SIZE + ctx->stack_size;
>   else
>   return -(BPF_PPC_STACK_SAVE + 16);
>  }
> @@ -82,8 +82,9 @@ static int bpf_jit_stack_tailcallcnt(struct codegen_context 
> *ctx)
>  static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
>  {
>   if (reg >= BPF_PPC_NVR_MIN && reg < 32)
> - return (bpf_has_stack_frame(ctx) ? BPF_PPC_STACKFRAME : 0)
> - - (8 * (32 - reg));
> + return (bpf_has_stack_frame(ctx) ?
> + (BPF_PPC_STACKFRAME + ctx->stack_size) : 0)
> + - (8 * (32 - reg));
> 
>   pr_err("BPF JIT is asking about unknown registers");
>   BUG();
> @@ -134,7 +135,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
> codegen_context *ctx)
>   PPC_BPF_STL(0, 1, PPC_LR_STKOFF);
>   }
> 
> - PPC_BPF_STLU(1, 1, -BPF_PPC_STACKFRAME);
> + PPC_BPF_STLU(1, 1, -(BPF_PPC_STACKFRAME + ctx->stack_size));
>   }
> 
>   /*
> @@ -161,7 +162,7 @@ static void bpf_jit_build_prologue(u32 *image, struct 
> codegen_context *ctx)
>   /* Setup frame pointer to point to the bpf stack area */
>   if (bpf_is_seen_register(ctx, BPF_REG_FP))
>   PPC_ADDI(b2p[BPF_REG_FP], 1,
> - STACK_FRAME_MIN_SIZE + MAX_BPF_STACK);
> + STACK_FRAME_MIN_SIZE + ctx->stack_size);
>  }
> 
>  static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context 
> *ctx)
> @@ -183,7 +184,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, 
> struct codegen_context *ctx
> 
>   /* Tear down our stack frame */
>   if (bpf_has_stack_frame(ctx)) {
> - PPC_ADDI(1, 1, BPF_PPC_STACKFRAME);
> + PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
>   if (ctx->seen & SEEN_FUNC) {
>   PPC_BPF_LL(0, 1, PPC_LR_STKOFF);
>   PPC_MTLR(0);
> @@ -993,6 +994,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
> 
>   memset(&cgctx, 0, sizeof(struct codegen_context));
> 
> + /* Make sure that the stack is quadword aligned. */
> + cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
> +
>   /* Scouting faux-generate pass 0 */
>   if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
>   /* We hit something illegal or unsupported. */
> -- 
> 2.13.5
> 



Re: [PATCH 1/1] bpf: take advantage of stack_depth tracking in powerpc JIT

2017-09-01 Thread Daniel Borkmann

On 09/01/2017 08:53 PM, Sandipan Das wrote:

Take advantage of stack_depth tracking, originally introduced for
x64, in powerpc JIT as well. Round up allocated stack by 16 bytes
to make sure it stays aligned for functions called from JITed bpf
program.

Signed-off-by: Sandipan Das 


Awesome, thanks for following up! :)


[PATCH] powerpc/sstep: Avoid used uninitialized error

2017-09-01 Thread Michael Ellerman
Older compilers think val may be used uninitialized:

  arch/powerpc/lib/sstep.c: In function 'emulate_loadstore':
  arch/powerpc/lib/sstep.c:2758:23: error: 'val' may be used uninitialized in 
this function

We know better, but initialise val to 0 to avoid breaking the build.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/sstep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 06dd61d8d48b..fb9f58b868e7 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2727,6 +2727,7 @@ int emulate_loadstore(struct pt_regs *regs, struct 
instruction_op *op)
if (!address_ok(regs, ea, size))
return -EFAULT;
err = 0;
+   val = 0;
switch (size) {
 #ifdef __powerpc64__
case 1:
-- 
2.7.4



Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-01 Thread Brijesh Singh

Hi Boris,

On 08/30/2017 12:46 PM, Borislav Petkov wrote:

On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:

I was trying to avoid mixing early and no-early set_memory_decrypted() but if
feedback is: use early_set_memory_decrypted() only if its required otherwise
use set_memory_decrypted() then I can improve the logic in next rev. thanks


Yes, I think you should use the early versions when you're, well,
*early* :-) But get rid of that for_each_possible_cpu() and do it only
on the current CPU, as this is a per-CPU path anyway. If you need to
do it on *every* CPU and very early, then you need a separate function
which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.



I am trying to implement your feedback and now remember why I chose to
use early_set_memory_decrypted() and the for_each_possible_cpu loop. These
percpu variables are static, hence before clearing the C-bit we must
perform the in-place decryption so that the original assignment is preserved
after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt(),
which can be used to perform the in-place decryption, but we do not have a
similar routine for the non-early cases. In order to address your feedback,
we would have to add similar functions. So far, we have not seen the need
for such functions except in this case. The approach we have right now
works just fine and I am not sure it is worth adding new functions.
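
As a rough sketch of the approach being described (the per-CPU variable name
is illustrative, and the early_set_memory_decrypted(vaddr, size) signature is
assumed from the SME series):

	/* illustrative only: a static per-CPU variable shared with the HV */
	static DEFINE_PER_CPU(u64, example_shared_var) = 1;

	static void __init decrypt_shared_percpu(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			/* in-place decryption first, so the static initial
			 * value stays intact once the C-bit is cleared
			 */
			early_set_memory_decrypted(
				(unsigned long)&per_cpu(example_shared_var, cpu),
				sizeof(u64));
		}
	}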

Thoughts ?

[1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of 
memory


-Brijesh


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-01 Thread Andy Lutomirski
On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh  wrote:
> Hi Boris,
>
> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>
>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>
>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>> but if
>>> feedback is: use early_set_memory_decrypted() only if its required
>>> otherwise
>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>> thanks
>>
>>
>> Yes, I think you should use the early versions when you're, well,
>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>> on the current CPU, as this is a per-CPU path anyway. If you need to
>> do it on *every* CPU and very early, then you need a separate function
>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>
>
> I am trying to implement your feedback and now remember why I choose to
> use early_set_memory_decrypted() and for_each_possible_cpu loop. These
> percpu variables are static. Hence before clearing the C-bit we must
> perform the in-place decryption so that original assignment is preserved
> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
> -- which can be used to perform the in-place decryption but we do not have
> similar routine for non-early cases. In order to address your feedback,
> we have to add similar functions. So far, we have not seen the need for
> having such functions except this cases. The approach we have right now
> works just fine and not sure if its worth adding new functions.
>
> Thoughts ?
>
> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
> memory

Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED?  ISTM the "HV
shared" bit is incidental.


Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-09-01 Thread Haren Myneni
On 09/01/2017 04:29 AM, Michael Ellerman wrote:
> Hi Dan,
> 
> Thanks for reviewing this series.
> 
> Dan Streetman  writes:
>> On Tue, Aug 29, 2017 at 5:54 PM, Haren Myneni  
>> wrote:
>>> On 08/29/2017 02:23 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2017-08-29 at 09:58 -0400, Dan Streetman wrote:
>> +
>> +   ret = -EINVAL;
>> +   if (coproc && coproc->vas.rxwin) {
>> +   wmem->txwin = nx842_alloc_txwin(coproc);
>
> this is wrong.  the workmem is scratch memory that's valid only for
> the duration of a single operation.
>>>
>>> Correct, workmem is used until crypto_free is called.
>>
>> that's not a 'single operation'.  a single operation is compress() or
>> decompress().
>>
>
> do you actually need a txwin per crypto transform?  or do you need a
> txwin per coprocessor?  or txwin per processor?  either per-coproc or
> per-cpu should be created at driver init and held separately
> (globally) instead of a per-transform txwin.  I really don't see why
> you would need a txwin per transform, because the coproc should not
> care how many different transforms there are.

 We should only need a single window for the whole kernel really, plus
 one per user process who wants direct access but that's not relevant
 here.
>>>
>>> Opening send window for each crypto transform (crypto_alloc,
>>> compression/decompression, ..., crypto_free) so that does not
>>> have to wait for the previous copy/paste complete.
>>> VAS will map send and receive windows, and can cache in send
>>> windows (up to 128). So I thought using the same send window
>>> (per chip) for more requests (say 1000) may be adding overhead.
>>>
>>> I will make changes if you prefer using 1 send window per chip.
>>
>> i don't have the spec, so i shouldn't be making the decision on it,
>> but i do know putting a persistent field into the workmem is the wrong
>> location.  If it's valid for the life of the transform, put it into
>> the transform context.  The workmem buffer is intended to be used only
>> during a single operation - it's "working memory" to perform each
>> individual crypto transformation.
> 
> I agree workmem isn't the right place for the txwin. But I don't believe
> it actually breaks anything to put txwin there.
> 
> So for now I'm going to merge this series as-is and I've asked Haren to
> send fixes as soon as he can to clean it up.

I will move txwin to nx842_crypto_ctx and send a patch soon. Even though
nx842_crypto_ctx is also used for pseries, we can ignore txwin there.

Thanks
Haren


> 
> cheers
> 



Re: [PATCH 03/19] powerpc: Create context switch helpers save_sprs() and restore_sprs()

2017-09-01 Thread Benjamin Herrenschmidt
On Thu, 2015-10-29 at 11:43 +1100, Anton Blanchard wrote:
> Move all our context switch SPR save and restore code into two
> helpers. We do a few optimisations:

To avoid confusion with other places where we might save and restore
SPRs for things like power management etc... can you name these
save_task_sprs and restore_task_sprs or something similar ? At least
for me it makes things a bit clearer :)

> - Group all mfsprs and all mtsprs. In many cases an mtspr sets a
> scoreboarding bit that an mfspr waits on, so the current practise of
> mfspr A; mtspr A; mfpsr B; mtspr B is the worst scheduling we can
> do.
> 
> - SPR writes are slow, so check that the value is changing before
> writing it.
> 
> A context switch microbenchmark using yield():
> 
> http://ozlabs.org/~anton/junkcode/context_switch2.c
> 
> ./context_switch2 --test=yield 0 0
> 
> shows an improvement of almost 10% on POWER8.
> 
> Signed-off-by: Anton Blanchard 
> ---
>  arch/powerpc/include/asm/processor.h |  1 +
>  arch/powerpc/include/asm/switch_to.h | 11 -
>  arch/powerpc/kernel/entry_64.S   | 60 +--
>  arch/powerpc/kernel/process.c| 92 
> +++-
>  4 files changed, 82 insertions(+), 82 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index 5afea36..c273f3e 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -294,6 +294,7 @@ struct thread_struct {
>  #endif
>  #ifdef CONFIG_PPC64
>   unsigned long   dscr;
> + unsigned long   fscr;
>   /*
>* This member element dscr_inherit indicates that the process
>* has explicitly attempted and changed the DSCR register value
> diff --git a/arch/powerpc/include/asm/switch_to.h 
> b/arch/powerpc/include/asm/switch_to.h
> index 15cca17..33a071d 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -15,17 +15,6 @@ extern struct task_struct *__switch_to(struct task_struct 
> *,
>  struct thread_struct;
>  extern struct task_struct *_switch(struct thread_struct *prev,
>  struct thread_struct *next);
> -#ifdef CONFIG_PPC_BOOK3S_64
> -static inline void save_early_sprs(struct thread_struct *prev)
> -{
> - if (cpu_has_feature(CPU_FTR_ARCH_207S))
> - prev->tar = mfspr(SPRN_TAR);
> - if (cpu_has_feature(CPU_FTR_DSCR))
> - prev->dscr = mfspr(SPRN_DSCR);
> -}
> -#else
> -static inline void save_early_sprs(struct thread_struct *prev) {}
> -#endif
>  
>  extern void enable_kernel_fp(void);
>  extern void enable_kernel_altivec(void);
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 93bb284..e84e5bc 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -453,29 +453,12 @@ _GLOBAL(_switch)
>   SAVE_8GPRS(14, r1)
>   SAVE_10GPRS(22, r1)
>   mflrr20 /* Return to switch caller */
> -#ifdef CONFIG_ALTIVEC
> -BEGIN_FTR_SECTION
> - mfspr   r24,SPRN_VRSAVE /* save vrsave register value */
> - std r24,THREAD_VRSAVE(r3)
> -END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> -#endif /* CONFIG_ALTIVEC */
> +
>   std r20,_NIP(r1)
>   mfcrr23
>   std r23,_CCR(r1)
>   std r1,KSP(r3)  /* Set old stack pointer */
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> -BEGIN_FTR_SECTION
> - /* Event based branch registers */
> - mfspr   r0, SPRN_BESCR
> - std r0, THREAD_BESCR(r3)
> - mfspr   r0, SPRN_EBBHR
> - std r0, THREAD_EBBHR(r3)
> - mfspr   r0, SPRN_EBBRR
> - std r0, THREAD_EBBRR(r3)
> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> -#endif
> -
>  #ifdef CONFIG_SMP
>   /* We need a sync somewhere here to make sure that if the
>* previous task gets rescheduled on another CPU, it sees all
> @@ -563,47 +546,6 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>   mr  r1,r8   /* start using new stack pointer */
>   std r7,PACAKSAVE(r13)
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> -BEGIN_FTR_SECTION
> - /* Event based branch registers */
> - ld  r0, THREAD_BESCR(r4)
> - mtspr   SPRN_BESCR, r0
> - ld  r0, THREAD_EBBHR(r4)
> - mtspr   SPRN_EBBHR, r0
> - ld  r0, THREAD_EBBRR(r4)
> - mtspr   SPRN_EBBRR, r0
> -
> - ld  r0,THREAD_TAR(r4)
> - mtspr   SPRN_TAR,r0
> -END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> -#endif
> -
> -#ifdef CONFIG_ALTIVEC
> -BEGIN_FTR_SECTION
> - ld  r0,THREAD_VRSAVE(r4)
> - mtspr   SPRN_VRSAVE,r0  /* if G4, restore VRSAVE reg */
> -END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
> -#endif /* CONFIG_ALTIVEC */
> -#ifdef CONFIG_PPC64
> -BEGIN_FTR_SECTION
> - lwz r6,THREAD_DSCR_INHERIT(r4)
> - ld  r0,THREAD_DSCR(r4)
> - cmpwi   r6,0
> - bne 1f
> - ld  r0,PACA_DSCR_DEFAULT(r13)
> -1:
> -BEGIN_FTR_SECTION_NESTED(70)
> - mfspr   r8,

Re: [PATCH 07/19] powerpc: Create mtmsrd_isync()

2017-09-01 Thread Benjamin Herrenschmidt
On Thu, 2015-10-29 at 11:43 +1100, Anton Blanchard wrote:
> mtmsrd_isync() will do an mtmsrd followed by an isync on older
> processors. On newer processors we avoid the isync via a feature fixup.

The isync is needed specifically when enabling/disabling FP etc., right?

I'd like to make the name a bit clearer. Maybe something like
set_msr_fpvec(), or maybe you can come up with something even better, i.e.
use a name that represents what it's for rather than what it does.

> Signed-off-by: Anton Blanchard 
> ---
>  arch/powerpc/include/asm/reg.h |  8 
>  arch/powerpc/kernel/process.c  | 30 ++
>  2 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index a908ada..987dac0 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -1193,12 +1193,20 @@
>  #define __mtmsrd(v, l)   asm volatile("mtmsrd %0," __stringify(l) \
>: : "r" (v) : "memory")
>  #define mtmsr(v) __mtmsrd((v), 0)
> +#define __MTMSR  "mtmsrd"
>  #else
>  #define mtmsr(v) asm volatile("mtmsr %0" : \
>: "r" ((unsigned long)(v)) \
>: "memory")
> +#define __MTMSR  "mtmsr"
>  #endif
>  
> +static inline void mtmsr_isync(unsigned long val)
> +{
> + asm volatile(__MTMSR " %0; " ASM_FTR_IFCLR("isync", "nop", %1) : :
> + "r" (val), "i" (CPU_FTR_ARCH_206) : "memory");
> +}
> +
>  #define mfspr(rn)({unsigned long rval; \
>   asm volatile("mfspr %0," __stringify(rn) \
>   : "=r" (rval)); rval;})
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index ef64219..5bf8ec2 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -130,7 +130,10 @@ void enable_kernel_fp(void)
>   check_if_tm_restore_required(current);
>   giveup_fpu(current);
>   } else {
> - giveup_fpu(NULL);   /* just enables FP for kernel */
> + u64 oldmsr = mfmsr();
> +
> + if (!(oldmsr & MSR_FP))
> + mtmsr_isync(oldmsr | MSR_FP);
>   }
>  }
>  EXPORT_SYMBOL(enable_kernel_fp);
> @@ -144,7 +147,10 @@ void enable_kernel_altivec(void)
>   check_if_tm_restore_required(current);
>   giveup_altivec(current);
>   } else {
> - giveup_altivec_notask();
> + u64 oldmsr = mfmsr();
> +
> + if (!(oldmsr & MSR_VEC))
> + mtmsr_isync(oldmsr | MSR_VEC);
>   }
>  }
>  EXPORT_SYMBOL(enable_kernel_altivec);
> @@ -173,10 +179,14 @@ void enable_kernel_vsx(void)
>  {
>   WARN_ON(preemptible());
>  
> - if (current->thread.regs && (current->thread.regs->msr & MSR_VSX))
> + if (current->thread.regs && (current->thread.regs->msr & MSR_VSX)) {
>   giveup_vsx(current);
> - else
> - giveup_vsx(NULL);   /* just enable vsx for kernel - force */
> + } else {
> + u64 oldmsr = mfmsr();
> +
> + if (!(oldmsr & MSR_VSX))
> + mtmsr_isync(oldmsr | MSR_VSX);
> + }
>  }
>  EXPORT_SYMBOL(enable_kernel_vsx);
>  
> @@ -209,10 +219,14 @@ void enable_kernel_spe(void)
>  {
>   WARN_ON(preemptible());
>  
> - if (current->thread.regs && (current->thread.regs->msr & MSR_SPE))
> + if (current->thread.regs && (current->thread.regs->msr & MSR_SPE)) {
>   giveup_spe(current);
> - else
> - giveup_spe(NULL);   /* just enable SPE for kernel - force */
> + } else {
> + u64 oldmsr = mfmsr();
> +
> + if (!(oldmsr & MSR_SPE))
> + mtmsr_isync(oldmsr | MSR_SPE);
> + }
>  }
>  EXPORT_SYMBOL(enable_kernel_spe);
>  


Re: [PATCH 14/19] powerpc: Add ppc_strict_facility_enable boot option

2017-09-01 Thread Benjamin Herrenschmidt
On Thu, 2015-10-29 at 11:44 +1100, Anton Blanchard wrote:
>  
> +extern void msr_check_and_set(unsigned long bits);
> +extern bool strict_msr_control;
> +extern void __msr_check_and_clear(unsigned long bits);
> +static inline void msr_check_and_clear(unsigned long bits)
> +{
> +   if (strict_msr_control)
> +   __msr_check_and_clear(bits);
> +}
> +
>  static inline unsigned long mfvtb (void)

Sounds like a good candidate for a static key...
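
For reference, a minimal sketch of what that could look like (assuming the
generic jump-label API; the key name is illustrative):

	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_FALSE(strict_msr_key);	/* flipped once at boot */

	static inline void msr_check_and_clear(unsigned long bits)
	{
		/* patched to a nop when strict mode is off, so the fast
		 * path avoids the load and conditional branch entirely
		 */
		if (static_branch_unlikely(&strict_msr_key))
			__msr_check_and_clear(bits);
	}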

Cheers,
Ben.



Re: [PATCH V3 6/6] crypto/nx: Add P9 NX support for 842 compression engine

2017-09-01 Thread Haren Myneni
On 09/01/2017 04:34 AM, Michael Ellerman wrote:
> Haren Myneni  writes:
>>> On Mon, Aug 28, 2017 at 7:25 PM, Michael Ellerman  
>>> wrote:
 Hi Haren,

 Some comments inline ...

 Haren Myneni  writes:

> diff --git a/drivers/crypto/nx/nx-842-powernv.c 
> b/drivers/crypto/nx/nx-842-powernv.c
> index c0dd4c7e17d3..13089a0b9dfa 100644
> --- a/drivers/crypto/nx/nx-842-powernv.c
> +++ b/drivers/crypto/nx/nx-842-powernv.c
> @@ -32,6 +33,9 @@ MODULE_ALIAS_CRYPTO("842-nx");
>
>  #define WORKMEM_ALIGN(CRB_ALIGN)
>  #define CSB_WAIT_MAX (5000) /* ms */
> +#define VAS_RETRIES  (10)

 Where does that number come from?
>>
>> Sometimes HW returns copy/paste failures. So we should retry the
>> request again. With 10 retries, Test running 12 hours was successful
>> for repeated compression/decompression requests with 1024 threads.
> 
> But why 10. Why not 5, or 100, or 1, or 10,000?

The VAS spec says to use a small number of retries. During my 12-hour test with
1024 threads doing continuous compression/decompression requests, I noticed that
around 6 or 7 retries were needed, hence the choice of 10.

> 
> Presumably when we have to retry it means the NX is too busy to service the
> request?

That is one possible case. We can also see failures when receive/send credits
are exhausted or the cached-windows limit is reached.

> 
> Do we have any way to find out how long it might be busy for?
It is hard to know the reason for a failure. VAS simply returns success or
failure without providing the actual reason.

> 
> Should we try an NX on another chip?
It is hard to decide whether to fall back to another NX engine, since there is
no way to know the actual failure reason on the current NX, nor whether another
NX is free.

> 
> We should also take into account the size of our request, ie. are we
> asking the NX to compress one page, or 1GB ?

The 842 compression/decompression request size for NX is always fixed, so a
large buffer is divided into smaller requests. The NX gzip engine is different -
its request size is configurable. We can look at this optimization when gzip
support is added.
   
> 
> If it's just one page maybe we should fall back to software immediately.

Right now it falls back to SW decompression after 10 retries, and the user can
use SW 842 compression upon failures.

We are planning to do performance analysis as part of the VAS/NX optimization
work and make the necessary changes.

Thanks
Haren
> 
> cheers
> 



[PATCH] arch/powerpc: Standardize DTS status assignments from "ok" to "okay"

2017-09-01 Thread Robert P. J. Day

While the current kernel drivers/of/ code allows developers to be
sloppy and use a DTS status value of "ok", the current DTSpec 0.1
makes it clear that the proper spelling is "okay", so fix the small
number of PowerPC .dts files that do this.

Signed-off-by: Robert P. J. Day 

---

diff --git a/arch/powerpc/boot/dts/akebono.dts 
b/arch/powerpc/boot/dts/akebono.dts
index e61d5dc..660ee21 100644
--- a/arch/powerpc/boot/dts/akebono.dts
+++ b/arch/powerpc/boot/dts/akebono.dts
@@ -40,7 +40,7 @@
d-cache-size = <32768>;
dcr-controller;
dcr-access-method = "native";
-   status = "ok";
+   status = "okay";
};
cpu@1 {
device_type = "cpu";
diff --git a/arch/powerpc/boot/dts/bluestone.dts 
b/arch/powerpc/boot/dts/bluestone.dts
index b0b26d8..64eaf7e 100644
--- a/arch/powerpc/boot/dts/bluestone.dts
+++ b/arch/powerpc/boot/dts/bluestone.dts
@@ -109,7 +109,7 @@

OCM: ocm@40004 {
compatible = "ibm,ocm";
-   status = "ok";
+   status = "okay";
cell-index = <1>;
/* configured in U-Boot */
reg = <4 0x0004 0x8000>; /* 32K */
diff --git a/arch/powerpc/boot/dts/currituck.dts 
b/arch/powerpc/boot/dts/currituck.dts
index 4191e18..093c4bc 100644
--- a/arch/powerpc/boot/dts/currituck.dts
+++ b/arch/powerpc/boot/dts/currituck.dts
@@ -39,7 +39,7 @@
d-cache-size = <32768>;
dcr-controller;
dcr-access-method = "native";
-   status = "ok";
+   status = "okay";
};
cpu@1 {
device_type = "cpu";
diff --git a/arch/powerpc/boot/dts/iss4xx-mpic.dts 
b/arch/powerpc/boot/dts/iss4xx-mpic.dts
index 23e9d9b..9804ada 100644
--- a/arch/powerpc/boot/dts/iss4xx-mpic.dts
+++ b/arch/powerpc/boot/dts/iss4xx-mpic.dts
@@ -43,7 +43,7 @@
d-cache-size = <32768>;
dcr-controller;
dcr-access-method = "native";
-   status = "ok";
+   status = "okay";
};
cpu@1 {
device_type = "cpu";

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday