Re: [RFC v2 07/12] powerpc: Macro the mask used for checking DSI exception

2017-06-20 Thread Ram Pai
On Tue, Jun 20, 2017 at 01:44:25PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Replace the magic number used to check for DSI exception
> > with a meaningful value.
> > 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/reg.h   | 9 -
> >  arch/powerpc/kernel/exceptions-64s.S | 2 +-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> > index 7e50e47..2dcb8a1 100644
> > --- a/arch/powerpc/include/asm/reg.h
> > +++ b/arch/powerpc/include/asm/reg.h
> > @@ -272,16 +272,23 @@
> >  #define SPRN_DAR   0x013   /* Data Address Register */
> >  #define SPRN_DBCR  0x136   /* e300 Data Breakpoint Control Reg */
> >  #define SPRN_DSISR 0x012   /* Data Storage Interrupt Status Register */
> > +#define   DSISR_BIT32  0x8000  /* not defined */
> >  #define   DSISR_NOHPTE 0x4000  /* no translation found 
> > */
> > +#define   DSISR_PAGEATTR_CONFLT0x2000  /* page attribute 
> > conflict */
> > +#define   DSISR_BIT35  0x1000  /* not defined */
> >  #define   DSISR_PROTFAULT  0x0800  /* protection fault */
> >  #define   DSISR_BADACCESS  0x0400  /* bad access to CI or G */
> >  #define   DSISR_ISSTORE0x0200  /* access was a store */
> >  #define   DSISR_DABRMATCH  0x0040  /* hit data breakpoint */
> > -#define   DSISR_NOSEGMENT  0x0020  /* SLB miss */
> >  #define   DSISR_KEYFAULT   0x0020  /* Key fault */
> > +#define   DSISR_BIT43  0x0010  /* not defined */
> >  #define   DSISR_UNSUPP_MMU 0x0008  /* Unsupported MMU config */
> >  #define   DSISR_SET_RC 0x0004  /* Failed setting of 
> > R/C bits */
> >  #define   DSISR_PGDIRFAULT  0x0002  /* Fault on page directory 
> > */
> > +#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> > +   DSISR_PAGEATTR_CONFLT | \
> > +   DSISR_BADACCESS |   \
> > +   DSISR_BIT43)
> 
> Sorry, missed this one. Seems like there are a couple of unnecessary
> line additions in the subsequent patch which adds the new PKEY
> reason code.
> 
> -#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> - DSISR_PAGEATTR_CONFLT | \
> - DSISR_BADACCESS |   \
> +#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 |   \
> + DSISR_PAGEATTR_CONFLT | \
> + DSISR_BADACCESS |   \
> + DSISR_KEYFAULT |\
>   DSISR_BIT43)

I like to see them separately, one per line. But then again, you are right:
that is not the convention in this file, so I will change it accordingly.
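For illustration, the mask under discussion can be exercised standalone. The constants below reuse the values as quoted in the archive (which truncates the full-width kernel constants), so treat them as illustrative rather than the kernel's actual DSISR definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values, as quoted in the patch context above. */
#define DSISR_BIT32             0x8000u  /* not defined in the ISA */
#define DSISR_PAGEATTR_CONFLT   0x2000u  /* page attribute conflict */
#define DSISR_BADACCESS         0x0400u  /* bad access to CI or G */
#define DSISR_BIT43             0x0010u  /* not defined in the ISA */

/* One reason code per OR-ed term: the mask replaces the magic number
 * previously used to route DSI exceptions to the page-fault handler. */
#define DSISR_PAGE_FAULT_MASK   (DSISR_BIT32 | \
                                 DSISR_PAGEATTR_CONFLT | \
                                 DSISR_BADACCESS | \
                                 DSISR_BIT43)

/* A fault is treated as a page fault when any mask bit is set in DSISR. */
static int is_page_fault(uint32_t dsisr)
{
	return (dsisr & DSISR_PAGE_FAULT_MASK) != 0;
}
```

The point of the macro is exactly this kind of readable test in the exception path, instead of comparing against an opaque hex literal.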

thanks,
RP
> 
> 

-- 
Ram Pai



Re: [RFC v2 06/12] powerpc: Program HPTE key protection bits.

2017-06-20 Thread Ram Pai
On Tue, Jun 20, 2017 at 01:51:45PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Map the PTE protection key bits to the HPTE key protection bits,
> > while creating HPTE entries.
> > 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +
> >  arch/powerpc/include/asm/pkeys.h  | 7 +++
> >  arch/powerpc/mm/hash_utils_64.c   | 5 +
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
> > b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> > index cfb8169..3d7872c 100644
> > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> > @@ -90,6 +90,8 @@
> >  #define HPTE_R_PP0 ASM_CONST(0x8000)
> >  #define HPTE_R_TS  ASM_CONST(0x4000)
> >  #define HPTE_R_KEY_HI  ASM_CONST(0x3000)
> > +#define HPTE_R_KEY_BIT0ASM_CONST(0x2000)
> > +#define HPTE_R_KEY_BIT1ASM_CONST(0x1000)
> >  #define HPTE_R_RPN_SHIFT   12
> >  #define HPTE_R_RPN ASM_CONST(0x0000)
> >  #define HPTE_R_RPN_3_0 ASM_CONST(0x01fff000)
> > @@ -104,6 +106,9 @@
> >  #define HPTE_R_C   ASM_CONST(0x0080)
> >  #define HPTE_R_R   ASM_CONST(0x0100)
> >  #define HPTE_R_KEY_LO  ASM_CONST(0x0e00)
> > +#define HPTE_R_KEY_BIT2ASM_CONST(0x0800)
> > +#define HPTE_R_KEY_BIT3ASM_CONST(0x0400)
> > +#define HPTE_R_KEY_BIT4ASM_CONST(0x0200)
> > 
> 
> Should we indicate/document how these 5 bits are not contiguous
> in the HPTE format for any given real page ?

I can, but it's all well documented in the ISA. In fact, all the bits and
the macros are a one-to-one translation from the ISA.

> 
> >  #define HPTE_V_1TB_SEG ASM_CONST(0x4000)
> >  #define HPTE_V_VRMA_MASK   ASM_CONST(0x4001ff00)
> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> > b/arch/powerpc/include/asm/pkeys.h
> > index 0f3dca8..9b6820d 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -27,6 +27,13 @@
> > ((vm_flags & VM_PKEY_BIT3) ? H_PAGE_PKEY_BIT1 : 0x0UL) | \
> > ((vm_flags & VM_PKEY_BIT4) ? H_PAGE_PKEY_BIT0 : 0x0UL))
> > 
> > +#define calc_pte_to_hpte_pkey_bits(pteflags)   \
> > +   (((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) |\
> > +   ((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) | \
> > +   ((pteflags & H_PAGE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) | \
> > +   ((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) | \
> > +   ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL))
> > +
> 
> We can drop calc_ in here. pte_to_hpte_pkey_bits should be
> sufficient.

ok. will do.
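The rename suggested above could look like the sketch below. The HPTE key-bit constants follow the (archive-truncated) values quoted in the diff as they appear in the kernel tree; the H_PAGE_PKEY_BIT* values are hypothetical placeholders chosen only so the mapping can be exercised:

```c
#include <assert.h>

/* Five HPTE key bits, split across the second doubleword: two high bits
 * (under HPTE_R_KEY_HI) and three low bits (under HPTE_R_KEY_LO).
 * Assumed expansions of the truncated constants in the quoted diff. */
#define HPTE_R_KEY_BIT0 0x2000000000000000UL
#define HPTE_R_KEY_BIT1 0x1000000000000000UL
#define HPTE_R_KEY_BIT2 0x0000000000000800UL
#define HPTE_R_KEY_BIT3 0x0000000000000400UL
#define HPTE_R_KEY_BIT4 0x0000000000000200UL

/* Hypothetical PTE-side pkey bit positions, for illustration only. */
#define H_PAGE_PKEY_BIT0 (1UL << 4)
#define H_PAGE_PKEY_BIT1 (1UL << 3)
#define H_PAGE_PKEY_BIT2 (1UL << 2)
#define H_PAGE_PKEY_BIT3 (1UL << 1)
#define H_PAGE_PKEY_BIT4 (1UL << 0)

/* Renamed per the review comment: pte_to_hpte_pkey_bits, written as a
 * function so it can be tested; the macro in the patch is equivalent. */
static unsigned long pte_to_hpte_pkey_bits(unsigned long pteflags)
{
	return ((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0UL) |
	       ((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0UL) |
	       ((pteflags & H_PAGE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0UL) |
	       ((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0UL) |
	       ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0UL);
}
```

This also makes Anshuman's earlier point visible: the five destination bits are not contiguous in the HPTE.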

thanks for your comments,
RP



Re: [RFC v2 02/12] powerpc: Free up four 64K PTE bits in 64K backed hpte pages.

2017-06-20 Thread Ram Pai
On Tue, Jun 20, 2017 at 04:21:45PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
> > in the 64K backed hpte pages. This along with the earlier
> > patch will entirely free up the four bits from 64K PTE.
> > 
> > This patch does the following change to 64K PTE that is
> > backed by 64K hpte.
> > 
> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> > 
> > Since bit 7 is now freed up, we move H_PAGE_BUSY from bit 9
> > to bit 7, trying to minimize gaps so that contiguous bits
> > can be allocated if needed in the future.
> > 
> > The second part of the PTE will hold
> > (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
> 
> I still don't understand how we freed up the 5th bit which is
> used in the 5th patch. Was that bit never used for anything
> on 64K page size (64K and 4K mappings)?

Yes, it was not used. So I gladly used it :-)


RP



Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Ram Pai
On Tue, Jun 20, 2017 at 03:50:25PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
> > in the 4K backed hpte pages. These bits continue to be used
> > for 64K backed hpte pages in this patch, but will be freed
> > up in the next patch.
> 
> The counting 3, 4, 5 and 6 are in BE format I believe, I was
> initially trying to see that from right to left as we normally
> do in the kernel and was getting confused. So basically these
> bits (which are only applicable for 64K mapping IIUC) are going
> to be freed up from the PTE format.
> 
> #define _RPAGE_RSV1   0x1000UL
> #define _RPAGE_RSV2   0x0800UL
> #define _RPAGE_RSV3   0x0400UL
> #define _RPAGE_RSV4   0x0200UL
> 
> As you have mentioned before this feature is available for 64K
> page size only and not for 4K mappings. So I assume we support
> both the combinations.
> 
> * 64K mapping on 64K
> * 64K mapping on 4K

yes.

> 
> These are the current users of the above bits
> 
> #define H_PAGE_BUSY   _RPAGE_RSV1 /* software: PTE & hash are busy */
> #define H_PAGE_F_SECOND   _RPAGE_RSV2 /* HPTE is in 2ndary 
> HPTEG */
> #define H_PAGE_F_GIX  (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> #define H_PAGE_HASHPTE_RPAGE_RPN43/* PTE has associated 
> HPTE */
> 
> > 
> > The patch does the following change to the 64K PTE format
> > 
> > H_PAGE_BUSY moves from bit 3 to bit 9
> 
> and what is in there on bit 9 now ? This ?
> 
> #define _RPAGE_SW20x00400
> 
> which is used as 
> 
> #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
> 
> which will not be required any more ?

I think you are reading bit 9 from right to left. The bit 9 I refer to
is counted from left to right, using the same numbering convention that
ISA 3.0 uses. I know it is confusing; I will mention in the comment of
this patch that the bits are to be read the big-endian way.

BTW: bit 9 is not used currently, so this patch uses it. But this is
a temporary move; H_PAGE_BUSY will move to bit 7 in the next patch.

It had to stay at bit 9 for now because bit 7 is not yet entirely freed
up: it is still used by 64K PTEs backed by 64K hptes.
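The big-endian numbering convention above can be made concrete with the conversion the kernel itself uses (its `PPC_BIT()` macro); a local copy is shown here for illustration:

```c
#include <assert.h>

/* ISA (big-endian) bit numbering: bit 0 is the most significant bit of
 * a 64-bit doubleword, bit 63 the least significant.  This mirrors the
 * kernel's PPC_BIT() macro. */
#define MY_PPC_BIT(b)	(1UL << (63 - (b)))
```

So "bit 9" in the discussion above is the mask `1UL << 54`, not `1UL << 9` as a right-to-left reading would suggest.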

> 
> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> > 
> > The four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > are initialized to 0xF, indicating an invalid slot. If a hpte
> > gets cached in a 0xF slot (i.e. the 7th slot of secondary), it is
> > released immediately. In other words, even though 0xF is a
> 
> Release immediately means we attempt again for a new hash slot ?

yes.

> 
> > valid slot, we discard it and consider it an invalid
> > slot, i.e. hpte_soft_invalid(). This gives us an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> 
> So we have to see the slot number in the second half for each PTE to
> figure out if it has got a valid slot in the hash page table.

yes.

> 
> > 
> > When we release a hpte in the 0xF slot, we also release a
> > legitimate primary slot and unmap that entry. This is to
> > ensure that we do get a legitimate non-0xF slot the next time we
> > retry for a slot.
> 
> Okay.
> 
> > 
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots and may have an effect on performance, the probability
> > of hitting a 0xF slot is extremely low.
> 
> Why do you say that? I thought every slot number has the same probability
> of a hit from the hash function.

Every hash bucket has the same probability, but every slot within a
hash bucket is filled sequentially. So it takes 15 hptes hashing to
the same bucket before we get to the 15th slot in the secondary.

> 
> > 
> > Compared  to the current scheme, the above described scheme reduces
> > the number of false hash table updates  significantly  and  has the
> 
> How does it reduce false hash table updates ?

Earlier, we had 1 bit allocated in the first part of the 64K PTE
for four consecutive 4K hptes. If any one 4K hpte got hashed in,
the bit got set, which means that any time we faulted on the remaining
three 4K hptes, we saw the bit already set and tried to erroneously
update that hpte. So we had a 75% update error rate. Functionally
not bad, but bad from a performance point of view.

With the current scheme, we decide if a 4K slot is valid by looking
at its value rather than depending on a bit in the main PTE. So
there is no chance of being misled, and hence no chance of trying
to update an invalid hpte. This should improve performance and at the
same time give us four valuable PTE bits.
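The validity test described above boils down to one helper; the name matches the one mentioned in the commit message, and this is a minimal sketch of it:

```c
#include <assert.h>
#include <stdbool.h>

/* A 4-bit slot value of 0xF is treated as "soft invalid": slot validity
 * is read from the stored value itself rather than from a separate
 * valid bit in the primary PTE. */
static bool hpte_soft_invalid(unsigned long hidx)
{
	return (hidx & 0xfUL) == 0xfUL;
}
```

Since the second half of the PTE holds the slot value per 4K sub-page, each fault checks only its own slot value, which is why the 75% false-update case disappears.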


> 
> > added  advantage  of  releasing  four  valuable  PTE bits for other
> > purpose.
> > 
> > This idea was jointly 

Re: [RFC v2 03/12] powerpc: Implement sys_pkey_alloc and sys_pkey_free system call.

2017-06-20 Thread Ram Pai
On Mon, Jun 19, 2017 at 10:18:01PM +1000, Michael Ellerman wrote:
> Hi Ram,
> 
> Ram Pai  writes:
> > Sys_pkey_alloc() allocates and returns available pkey
> > Sys_pkey_free()  frees up the pkey.
> >
> > Total 32 keys are supported on powerpc. However pkey 0,1 and 31
> > are reserved. So effectively we have 29 pkeys.
> >
> > Signed-off-by: Ram Pai 
> > ---
> >  include/linux/mm.h   |  31 ---
> >  include/uapi/asm-generic/mman-common.h   |   2 +-
> 
> Those changes need to be split out and acked by mm folks.
> 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 7cb17c6..34ddac7 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -204,26 +204,35 @@ extern int overcommit_kbytes_handler(struct ctl_table 
> > *, int, void __user *,
> >  #define VM_MERGEABLE   0x8000  /* KSM may merge identical 
> > pages */
> >  
> >  #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> > -#define VM_HIGH_ARCH_BIT_0 32  /* bit only usable on 64-bit 
> > architectures */
> > -#define VM_HIGH_ARCH_BIT_1 33  /* bit only usable on 64-bit 
> > architectures */
> > -#define VM_HIGH_ARCH_BIT_2 34  /* bit only usable on 64-bit 
> > architectures */
> > -#define VM_HIGH_ARCH_BIT_3 35  /* bit only usable on 64-bit 
> > architectures */
> > +#define VM_HIGH_ARCH_BIT_0 32  /* bit only usable on 64-bit arch */
> > +#define VM_HIGH_ARCH_BIT_1 33  /* bit only usable on 64-bit arch */
> > +#define VM_HIGH_ARCH_BIT_2 34  /* bit only usable on 64-bit arch */
> > +#define VM_HIGH_ARCH_BIT_3 35  /* bit only usable on 64-bit arch */
> 
> Please don't change the comments, it makes the diff harder to read.

The lines were surpassing 80 columns; I tried to compress the comments
without losing meaning. Will restore.

> 
> You're actually just adding this AFAICS:
> 
> > +#define VM_HIGH_ARCH_BIT_4 36  /* bit only usable on 64-bit arch */
> 
> >  #define VM_HIGH_ARCH_0 BIT(VM_HIGH_ARCH_BIT_0)
> >  #define VM_HIGH_ARCH_1 BIT(VM_HIGH_ARCH_BIT_1)
> >  #define VM_HIGH_ARCH_2 BIT(VM_HIGH_ARCH_BIT_2)
> >  #define VM_HIGH_ARCH_3 BIT(VM_HIGH_ARCH_BIT_3)
> > +#define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
> >  #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
> >  
> >  #if defined(CONFIG_X86)
>^
> >  # define VM_PATVM_ARCH_1   /* PAT reserves whole VMA at 
> > once (x86) */
> > -#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
> > -# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
> > -# define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit 
> > value */
> > -# define VM_PKEY_BIT1  VM_HIGH_ARCH_1
> > -# define VM_PKEY_BIT2  VM_HIGH_ARCH_2
> > -# define VM_PKEY_BIT3  VM_HIGH_ARCH_3
> > -#endif
> > +#if defined(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) \
> > +   || defined(CONFIG_PPC64_MEMORY_PROTECTION_KEYS)
> > +#define VM_PKEY_SHIFT  VM_HIGH_ARCH_BIT_0
> > +#define VM_PKEY_BIT0   VM_HIGH_ARCH_0  /* A protection key is a 5-bit 
> > value */
>  ^ 4?
> > +#define VM_PKEY_BIT1   VM_HIGH_ARCH_1
> > +#define VM_PKEY_BIT2   VM_HIGH_ARCH_2
> > +#define VM_PKEY_BIT3   VM_HIGH_ARCH_3
> > +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> 
> That appears to be inside an #if defined(CONFIG_X86) ?
> 
> >  #elif defined(CONFIG_PPC)
>  ^
> Should be CONFIG_PPC64_MEMORY_PROTECTION_KEYS no?

It's a little garbled. Will fix it.
> 
> > +#define VM_PKEY_BIT0   VM_HIGH_ARCH_0  /* A protection key is a 5-bit 
> > value */
> > +#define VM_PKEY_BIT1   VM_HIGH_ARCH_1
> > +#define VM_PKEY_BIT2   VM_HIGH_ARCH_2
> > +#define VM_PKEY_BIT3   VM_HIGH_ARCH_3
> > +#define VM_PKEY_BIT4   VM_HIGH_ARCH_4  /* intel does not use this bit 
> > */
> > +   /* but reserved for future expansion */
> 
> But this hunk is for PPC ?
> 
> Is it OK for the other arches & generic code to add another VM_PKEY_BIT4 ?

No, it has to be PPC specific.

> 
> Do you need to update show_smap_vma_flags() ?
> 
> >  # define VM_SAOVM_ARCH_1   /* Strong Access Ordering 
> > (powerpc) */
> >  #elif defined(CONFIG_PARISC)
> >  # define VM_GROWSUPVM_ARCH_1
> 
> > diff --git a/include/uapi/asm-generic/mman-common.h 
> > b/include/uapi/asm-generic/mman-common.h
> > index 8c27db0..b13ecc6 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -76,5 +76,5 @@
> >  #define PKEY_DISABLE_WRITE 0x2
> >  #define PKEY_ACCESS_MASK   (PKEY_DISABLE_ACCESS |\
> >  PKEY_DISABLE_WRITE)
> > -
> > +#define PKEY_DISABLE_EXECUTE   0x4
> 
> How you can set that if it's not in PKEY_ACCESS_MASK?

I was wondering how to handle this. x86 does not support this flag.
However, powerpc has the ability to enable/disable execute permission
on a key. It cannot be done from userspace, but can be done through
the sys_mprotec
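Michael's question above (how can the flag be set if it is not in PKEY_ACCESS_MASK?) points at the generic mask validation. One possible resolution, sketched here as an assumption rather than what was merged, is to fold the new flag into the mask:

```c
#include <assert.h>

/* Existing uapi values, plus the proposed execute-disable flag folded
 * into the access mask (an assumption for illustration). */
#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2
#define PKEY_DISABLE_EXECUTE	0x4
#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS | \
				 PKEY_DISABLE_WRITE  | \
				 PKEY_DISABLE_EXECUTE)

/* Generic validation of pkey_alloc() init flags: anything outside the
 * mask is rejected, so the mask must include every settable flag. */
static int pkey_flags_valid(unsigned long init_val)
{
	return (init_val & ~PKEY_ACCESS_MASK) == 0UL;
}
```

Without extending the mask, a caller passing PKEY_DISABLE_EXECUTE would be rejected before the arch code ever sees it.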

[PATCH] Documentation: remove overlay-notes reference to non-existent file

2017-06-20 Thread frowand . list
From: Frank Rowand 

File dt-object-internal.txt does not exist.  Remove a reference to it
and fix up tags for references to other files.

Reported-by: afaer...@suse.de
Signed-off-by: Frank Rowand 
---
 Documentation/devicetree/overlay-notes.txt | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/overlay-notes.txt 
b/Documentation/devicetree/overlay-notes.txt
index d418a6ce9812..eb7f2685fda1 100644
--- a/Documentation/devicetree/overlay-notes.txt
+++ b/Documentation/devicetree/overlay-notes.txt
@@ -3,8 +3,7 @@ Device Tree Overlay Notes
 
 This document describes the implementation of the in-kernel
 device tree overlay functionality residing in drivers/of/overlay.c and is a
-companion document to Documentation/devicetree/dt-object-internal.txt[1] &
-Documentation/devicetree/dynamic-resolution-notes.txt[2]
+companion document to Documentation/devicetree/dynamic-resolution-notes.txt[1]
 
 How overlays work
 -
@@ -16,8 +15,7 @@ Since the kernel mainly deals with devices, any new device 
node that result
 in an active device should have it created while if the device node is either
 disabled or removed all together, the affected device should be deregistered.
 
-Lets take an example where we have a foo board with the following base tree
-which is taken from [1].
+Lets take an example where we have a foo board with the following base tree:
 
  foo.dts -
/* FOO platform */
@@ -36,7 +34,7 @@ which is taken from [1].
};
  foo.dts -
 
-The overlay bar.dts, when loaded (and resolved as described in [2]) should
+The overlay bar.dts, when loaded (and resolved as described in [1]) should
 
  bar.dts -
 /plugin/;  /* allow undefined label references and record them */
-- 
Frank Rowand 



[PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

2017-06-20 Thread Thiago Jung Bauermann
Calling arch_update_cpu_topology from a CPU hotplug state machine callback
hits a deadlock because the function tries to get a read lock on
cpu_hotplug_lock while the state machine still holds a write lock on it.

Since all callers of arch_update_cpu_topology except rtasd already hold
cpu_hotplug_lock, this patch changes the function to use
stop_machine_cpuslocked and creates a separate function for rtasd which
still tries to obtain the lock.

Michael Bringmann investigated the bug and provided a detailed analysis
of the deadlock on this previous RFC for an alternate solution:

https://patchwork.ozlabs.org/patch/771293/

Signed-off-by: Thiago Jung Bauermann 
---

Notes:
This patch applies on tip/smp/hotplug, it should probably be carried there.

 arch/powerpc/include/asm/topology.h |  6 ++
 arch/powerpc/kernel/rtasd.c |  2 +-
 arch/powerpc/mm/numa.c  | 22 +++---
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 8b3b46b7b0f2..a2d36b7703ae 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -43,6 +43,7 @@ extern void __init dump_numa_cpu_topology(void);
 
 extern int sysfs_add_device_to_node(struct device *dev, int nid);
 extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+extern int numa_update_cpu_topology(bool cpus_locked);
 
 #else
 
@@ -57,6 +58,11 @@ static inline void sysfs_remove_device_from_node(struct 
device *dev,
int nid)
 {
 }
+
+static inline int numa_update_cpu_topology(bool cpus_locked)
+{
+   return 0;
+}
 #endif /* CONFIG_NUMA */
 
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 3650732639ed..0f0b1b2f3b60 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -283,7 +283,7 @@ static void prrn_work_fn(struct work_struct *work)
 * the RTAS event.
 */
pseries_devicetree_update(-prrn_update_scope);
-   arch_update_cpu_topology();
+   numa_update_cpu_topology(false);
 }
 
 static DECLARE_WORK(prrn_work, prrn_work_fn);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 371792e4418f..b95c584ce19d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1311,8 +1311,10 @@ static int update_lookup_table(void *data)
 /*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
+ *
+ * cpus_locked says whether we already hold cpu_hotplug_lock.
  */
-int arch_update_cpu_topology(void)
+int numa_update_cpu_topology(bool cpus_locked)
 {
unsigned int cpu, sibling, changed = 0;
struct topology_update_data *updates, *ud;
@@ -1400,15 +1402,23 @@ int arch_update_cpu_topology(void)
if (!cpumask_weight(&updated_cpus))
goto out;
 
-   stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
+   if (cpus_locked)
+   stop_machine_cpuslocked(update_cpu_topology, &updates[0],
+   &updated_cpus);
+   else
+   stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
 
/*
 * Update the numa-cpu lookup table with the new mappings, even for
 * offline CPUs. It is best to perform this update from the stop-
 * machine context.
 */
-   stop_machine(update_lookup_table, &updates[0],
+   if (cpus_locked)
+   stop_machine_cpuslocked(update_lookup_table, &updates[0],
cpumask_of(raw_smp_processor_id()));
+   else
+   stop_machine(update_lookup_table, &updates[0],
+cpumask_of(raw_smp_processor_id()));
 
for (ud = &updates[0]; ud; ud = ud->next) {
unregister_cpu_under_node(ud->cpu, ud->old_nid);
@@ -1426,6 +1436,12 @@ int arch_update_cpu_topology(void)
return changed;
 }
 
+int arch_update_cpu_topology(void)
+{
+   lockdep_assert_cpus_held();
+   return numa_update_cpu_topology(true);
+}
+
 static void topology_work_fn(struct work_struct *work)
 {
rebuild_sched_domains();
-- 
2.7.4



[PATCH v6 4/4] of: detect invalid phandle in overlay

2017-06-20 Thread frowand . list
From: Frank Rowand 

Overlays are not allowed to modify phandle values of previously existing
nodes because there is no information available to allow fixing up
properties that use the previously existing phandle.

Signed-off-by: Frank Rowand 
---
 drivers/of/overlay.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index ca0b85f5deb1..20ab49d2f7a4 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -130,6 +130,10 @@ static int of_overlay_apply_single_device_node(struct 
of_overlay *ov,
/* NOTE: Multiple mods of created nodes not supported */
tchild = of_get_child_by_name(target, cname);
if (tchild != NULL) {
+   /* new overlay phandle value conflicts with existing value */
+   if (child->phandle)
+   return -EINVAL;
+
/* apply overlay recursively */
ret = of_overlay_apply_one(ov, tchild, child);
of_node_put(tchild);
-- 
Frank Rowand 



[PATCH v6 3/4] of: be consistent in form of file mode

2017-06-20 Thread frowand . list
From: Frank Rowand 

checkpatch complained about using S_IRUGO instead of the octal equivalent
when the phandle sysfs code was added, so that patch used octal.
Change the other instances of the S_* constants in the same file to
the octal form.

Signed-off-by: Frank Rowand 
---
 drivers/of/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 941c9a03471d..a4e2159c8671 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -168,7 +168,7 @@ int __of_add_property_sysfs(struct device_node *np, struct 
property *pp)
 
sysfs_bin_attr_init(&pp->attr);
pp->attr.attr.name = safe_name(&np->kobj, pp->name);
-   pp->attr.attr.mode = secure ? S_IRUSR : S_IRUGO;
+   pp->attr.attr.mode = secure ? 0400 : 0444;
pp->attr.size = secure ? 0 : pp->length;
pp->attr.read = of_node_property_read;
 
-- 
Frank Rowand 



[PATCH v6 2/4] of: make __of_attach_node() static

2017-06-20 Thread frowand . list
From: Frank Rowand 

__of_attach_node() is not used outside of drivers/of/dynamic.c.  Make
it static and remove it from drivers/of/of_private.h.

Signed-off-by: Frank Rowand 
---
 drivers/of/dynamic.c| 2 +-
 drivers/of/of_private.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index be320082178f..3367ed2da9ad 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -216,7 +216,7 @@ int of_property_notify(int action, struct device_node *np,
return of_reconfig_notify(action, &pr);
 }
 
-void __of_attach_node(struct device_node *np)
+static void __of_attach_node(struct device_node *np)
 {
np->child = NULL;
np->sibling = np->parent->child;
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 1a041411b219..73da291a51cd 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -91,7 +91,6 @@ extern int __of_update_property(struct device_node *np,
 extern void __of_update_property_sysfs(struct device_node *np,
struct property *newprop, struct property *oldprop);
 
-extern void __of_attach_node(struct device_node *np);
 extern int __of_attach_node_sysfs(struct device_node *np);
 extern void __of_detach_node(struct device_node *np);
 extern void __of_detach_node_sysfs(struct device_node *np);
-- 
Frank Rowand 



[PATCH v6 1/4] of: remove *phandle properties from expanded device tree

2017-06-20 Thread frowand . list
From: Frank Rowand 

Remove "phandle", "linux,phandle", and "ibm,phandle" properties from
the internal device tree.  The phandle will still be in the struct
device_node phandle field and will still be displayed as if it is
a property in /proc/device_tree.

This is to resolve the issue found by Stephen Boyd [1] when he changed
the type of struct property.value from void * to const void *.  As
a result of the type change, the overlay code had compile errors
where the resolver updates phandle values.

  [1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html

- Add sysfs infrastructure to report np->phandle, as if it was a property.
- Do not create "phandle" "ibm,phandle", and "linux,phandle" properties
  in the expanded device tree.
- Remove phandle properties in of_attach_node(), for nodes dynamically
  attached to the live tree.  Add the phandle sysfs entry for these nodes.
- When creating an overlay changeset, duplicate the node phandle in
  __of_node_dup().
- Remove no longer needed checks to exclude "phandle" and "linux,phandle"
  properties in several locations.
- A side effect of these changes is that the obsolete "linux,phandle" and
  "ibm,phandle" properties will no longer appear in /proc/device-tree (they
  will appear as "phandle").
- A side effect is that the value of property "ibm,phandle" will no longer
  override the value of properties "phandle" and "linux,phandle".

Signed-off-by: Frank Rowand 
---
 drivers/of/base.c   | 48 +++---
 drivers/of/dynamic.c| 55 +
 drivers/of/fdt.c| 43 +++---
 drivers/of/of_private.h |  1 +
 drivers/of/overlay.c|  4 +---
 drivers/of/resolver.c   | 23 +
 include/linux/of.h  |  1 +
 7 files changed, 112 insertions(+), 63 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 28d5f53bc631..941c9a03471d 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -116,6 +116,19 @@ static ssize_t of_node_property_read(struct file *filp, 
struct kobject *kobj,
return memory_read_from_buffer(buf, count, &offset, pp->value, 
pp->length);
 }
 
+static ssize_t of_node_phandle_read(struct file *filp, struct kobject *kobj,
+   struct bin_attribute *bin_attr, char *buf,
+   loff_t offset, size_t count)
+{
+   phandle phandle;
+   struct device_node *np;
+
+   np = container_of(bin_attr, struct device_node, attr_phandle);
+   phandle = cpu_to_be32(np->phandle);
+   return memory_read_from_buffer(buf, count, &offset, &phandle,
+  sizeof(phandle));
+}
+
 /* always return newly allocated name, caller must free after use */
 static const char *safe_name(struct kobject *kobj, const char *orig_name)
 {
@@ -164,6 +177,35 @@ int __of_add_property_sysfs(struct device_node *np, struct 
property *pp)
return rc;
 }
 
+/*
+ * In the imported device tree (fdt), phandle is a property.  In the
+ * internal data structure it is instead stored in the struct device_node.
+ * Make phandle visible in sysfs as if it was a property.
+ */
+int __of_add_phandle_sysfs(struct device_node *np)
+{
+   int rc;
+
+   if (!IS_ENABLED(CONFIG_SYSFS))
+   return 0;
+
+   if (!of_kset || !of_node_is_attached(np))
+   return 0;
+
+   if (!np->phandle || np->phandle == 0x)
+   return 0;
+
+   sysfs_bin_attr_init(&np->attr_phandle);
+   np->attr_phandle.attr.name = "phandle";
+   np->attr_phandle.attr.mode = 0444;
+   np->attr_phandle.size = sizeof(np->phandle);
+   np->attr_phandle.read = of_node_phandle_read;
+
+   rc = sysfs_create_bin_file(&np->kobj, &np->attr_phandle);
+   WARN(rc, "error adding attribute phandle to node %s\n", np->full_name);
+   return rc;
+}
+
 int __of_attach_node_sysfs(struct device_node *np)
 {
const char *name;
@@ -193,6 +235,8 @@ int __of_attach_node_sysfs(struct device_node *np)
if (rc)
return rc;
 
+   __of_add_phandle_sysfs(np);
+
for_each_property_of_node(np, pp)
__of_add_property_sysfs(np, pp);
 
@@ -2128,9 +2172,7 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
align))
int id, len;
 
/* Skip those we do not want to proceed */
-   if (!strcmp(pp->name, "name") ||
-   !strcmp(pp->name, "phandle") ||
-   !strcmp(pp->name, "linux,phandle"))
+   if (!strcmp(pp->name, "name"))
continue;
 
np = of_find_node_by_path(pp->value);
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 888fdbc09992..be320082178f 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -218,19 +218,6 @@ int of_property_notify(int action, struct device_node *np,
 
 void __of_attach_node(struct device_node *np)

[PATCH v6 0/4] of: remove *phandle properties from expanded device tree

2017-06-20 Thread frowand . list
From: Frank Rowand 

Remove "phandle" and "linux,phandle" properties from the internal
device tree.  The phandle will still be in the struct device_node
phandle field and will still be displayed as if it is a property
in /proc/device_tree.

This is to resolve the issue found by Stephen Boyd [1] when he changed
the type of struct property.value from void * to const void *.  As
a result of the type change, the overlay code had compile errors
where the resolver updates phandle values.

  [1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html

Patch 1 is the phandle related changes.

Patches 2 - 4 are minor fixups for issues that became visible
while implementing patch 1.

Changes from v5:
  - patch 1: populate_properties(), prop_is_phandle was declared
    at the wrong scope and thus was initialized before the for
    loop instead of each time through the loop.  This resulted
    in any property in a node after the phandle property not
    being unflattened.

Changes from v4:
  - rebase on 4.12-rc1
  - Add reason for "" in of_attach_node()
  - Simplify and consolidate phandle detection logic in
    populate_properties().  This results in a change of behaviour:
    the value of property "ibm,phandle" will no longer override the
    value of properties "phandle" and "linux,phandle".

Changes from v3:
  - patch 1: fix incorrect variable name in __of_add_phandle_sysfs().
    Problem was reported by the kbuild test robot.

Changes from v2:
  - patch 1: Remove check in __of_add_phandle_sysfs() that would not
    add a sysfs entry if IS_ENABLED(CONFIG_PPC_PSERIES)

Changes from v1:
  - Remove phandle properties in of_attach_node(), before attaching
    the node to the live tree.
  - Add the phandle sysfs entry for the node in of_attach_node().
  - When creating an overlay changeset, duplicate the node phandle in
    __of_node_dup().


*** BLURB HERE ***

Frank Rowand (4):
  of: remove *phandle properties from expanded device tree
  of: make __of_attach_node() static
  of: be consistent in form of file mode
  of: detect invalid phandle in overlay

 drivers/of/base.c   | 50 +++
 drivers/of/dynamic.c| 57 +
 drivers/of/fdt.c| 43 ++---
 drivers/of/of_private.h |  2 +-
 drivers/of/overlay.c|  8 ---
 drivers/of/resolver.c   | 23 +---
 include/linux/of.h  |  1 +
 7 files changed, 118 insertions(+), 66 deletions(-)

-- 
Frank Rowand 



Re: [PATCH v2] perf: libdw support for powerpc

2017-06-20 Thread Arnaldo Carvalho de Melo
Em Thu, Jun 01, 2017 at 12:24:41PM +0200, Paolo Bonzini escreveu:
> Porting PPC to libdw only needs an architecture-specific hook to move
> the register state from perf to libdw.
> 
> The ARM and x86 architectures already use libdw, and it is useful to
> have as much common code for the unwinder as possible.  Mark Wielaard
> has contributed a frame-based unwinder to libdw, so that unwinding works
> even for binaries that do not have CFI information.  In addition,
> libunwind is always preferred to libdw by the build machinery so this
> cannot introduce regressions on machines that have both libunwind and
> libdw installed.
> 
> Cc: a...@kernel.org
> Cc: Naveen N. Rao 
> Cc: Ravi Bangoria 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Paolo Bonzini 
> ---
>   v1->v2: fix for 4.11->4.12 changes

Thanks, I'll test it and collect the Acked-by provided, will go into
perf/core.

- Arnaldo
 
>  tools/perf/Makefile.config  |  2 +-
>  tools/perf/arch/powerpc/util/Build  |  2 +
>  tools/perf/arch/powerpc/util/unwind-libdw.c | 73 
> +
>  3 files changed, 76 insertions(+), 1 deletion(-)
>  create mode 100644 tools/perf/arch/powerpc/util/unwind-libdw.c
> 
> diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
> index 8354d04b392f..e7b04a729417 100644
> --- a/tools/perf/Makefile.config
> +++ b/tools/perf/Makefile.config
> @@ -61,7 +61,7 @@ endif
>  # Disable it on all other architectures in case libdw unwind
>  # support is detected in system. Add supported architectures
>  # to the check.
> -ifneq ($(ARCH),$(filter $(ARCH),x86 arm))
> +ifneq ($(ARCH),$(filter $(ARCH),x86 arm powerpc))
>NO_LIBDW_DWARF_UNWIND := 1
>  endif
>  
> diff --git a/tools/perf/arch/powerpc/util/Build 
> b/tools/perf/arch/powerpc/util/Build
> index 90ad64b231cd..2e6595310420 100644
> --- a/tools/perf/arch/powerpc/util/Build
> +++ b/tools/perf/arch/powerpc/util/Build
> @@ -5,4 +5,6 @@ libperf-y += perf_regs.o
>  
>  libperf-$(CONFIG_DWARF) += dwarf-regs.o
>  libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
> +
>  libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
> +libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
> diff --git a/tools/perf/arch/powerpc/util/unwind-libdw.c 
> b/tools/perf/arch/powerpc/util/unwind-libdw.c
> new file mode 100644
> index ..3a24b3c43273
> --- /dev/null
> +++ b/tools/perf/arch/powerpc/util/unwind-libdw.c
> @@ -0,0 +1,73 @@
> +#include 
> +#include "../../util/unwind-libdw.h"
> +#include "../../util/perf_regs.h"
> +#include "../../util/event.h"
> +
> +/* See backends/ppc_initreg.c and backends/ppc_regs.c in elfutils.  */
> +static const int special_regs[3][2] = {
> + { 65, PERF_REG_POWERPC_LINK },
> + { 101, PERF_REG_POWERPC_XER },
> + { 109, PERF_REG_POWERPC_CTR },
> +};
> +
> +bool libdw__arch_set_initial_registers(Dwfl_Thread *thread, void *arg)
> +{
> + struct unwind_info *ui = arg;
> + struct regs_dump *user_regs = &ui->sample->user_regs;
> + Dwarf_Word dwarf_regs[32], dwarf_nip;
> + size_t i;
> +
> +#define REG(r) ({\
> + Dwarf_Word val = 0; \
> + perf_reg_value(&val, user_regs, PERF_REG_POWERPC_##r);  \
> + val;\
> +})
> +
> + dwarf_regs[0]  = REG(R0);
> + dwarf_regs[1]  = REG(R1);
> + dwarf_regs[2]  = REG(R2);
> + dwarf_regs[3]  = REG(R3);
> + dwarf_regs[4]  = REG(R4);
> + dwarf_regs[5]  = REG(R5);
> + dwarf_regs[6]  = REG(R6);
> + dwarf_regs[7]  = REG(R7);
> + dwarf_regs[8]  = REG(R8);
> + dwarf_regs[9]  = REG(R9);
> + dwarf_regs[10] = REG(R10);
> + dwarf_regs[11] = REG(R11);
> + dwarf_regs[12] = REG(R12);
> + dwarf_regs[13] = REG(R13);
> + dwarf_regs[14] = REG(R14);
> + dwarf_regs[15] = REG(R15);
> + dwarf_regs[16] = REG(R16);
> + dwarf_regs[17] = REG(R17);
> + dwarf_regs[18] = REG(R18);
> + dwarf_regs[19] = REG(R19);
> + dwarf_regs[20] = REG(R20);
> + dwarf_regs[21] = REG(R21);
> + dwarf_regs[22] = REG(R22);
> + dwarf_regs[23] = REG(R23);
> + dwarf_regs[24] = REG(R24);
> + dwarf_regs[25] = REG(R25);
> + dwarf_regs[26] = REG(R26);
> + dwarf_regs[27] = REG(R27);
> + dwarf_regs[28] = REG(R28);
> + dwarf_regs[29] = REG(R29);
> + dwarf_regs[30] = REG(R30);
> + dwarf_regs[31] = REG(R31);
> + if (!dwfl_thread_state_registers(thread, 0, 32, dwarf_regs))
> + return false;
> +
> + dwarf_nip = REG(NIP);
> + dwfl_thread_state_register_pc(thread, dwarf_nip);
> + for (i = 0; i < ARRAY_SIZE(special_regs); i++) {
> + Dwarf_Word val = 0;
> + perf_reg_value(&val, user_regs, special_regs[i][1]);
> + if (!dwfl_thread_state_registers(thread,
> +  special_regs[i][0], 1,
> +  &val))
> +  

Network TX Stall on 440EP Processor

2017-06-20 Thread Thomas Besemer
I'm working on a project that is derived from the Yosemite
PPC 440EP board.  It's a legacy project that was running the
2.6.24 Kernel, and network traffic was stalling due to transmission
halting without an understandable error (in this error condition, the
various
status registers of network interface showed no issues), other
than TX stalling due to Buffer Descriptor Ring becoming full.

In order to see if the problem has been resolved, the Kernel
has been updated to 4.9.13, compiled with gcc version 5.4.0
(Buildroot 2017.02.2).  Although the frequency of the
problem is decreased, it still does show up.

The test case is the Linux Target running idle, no application
code.  From a Linux host on a directly connected network, 30
flood pings are started.  After a period of several minutes to
perhaps hours, the transmit aspect of the network controller
ceases to transmit packets (Buffer Descriptor ring becomes full).
RX still works.  In the 2.6.24 Kernel, the problem happens
within seconds, so it has improved with the new Kernel.

Below is the output from the Kernel when this happens.

Has anybody seen this problem before?  I can't find any
errata on it, nor can I find any reports of it.

The orginal problem is rooted in the Embedded Application
running, and after a period of time of heavy network
traffic, the TX side of network stalls.  The flood ping
test is used simply to force the problem to happen.

[ 3127.143572] NETDEV WATCHDOG: eth0 (emac): transmit queue 0 timed out
[ 3127.150172] [ cut here ]
[ 3127.154778] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316
dev_watchdog+0x23c/0x244
[ 3127.162965] Modules linked in:
[ 3127.166013] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.13 #9
[ 3127.171707] task: c0e67300 task.stack: c0f0
[ 3127.176192] NIP: c068e734 LR: c068e734 CTR: c04672f4
[ 3127.181107] REGS: c0f01c90 TRAP: 0700   Not tainted  (4.9.13)
[ 3127.186793] MSR: 00029000 [ 3127.190241]   CR: 2812  XER:

[ 3127.194210]
GPR00: c068e734 c0f01d40 c0e67300 0038 d1006301 00df c04683e4
00df
GPR08: 00df c0eff4b0 c0eff4b0 0004 24122424 00b960f0 
c0e8
GPR16: 000ac8c1 c07b8618 c098bddc c0e69000 000a c0ee c0e73f20
c0f0
GPR24: c100e4e8 c0ee c0e77d60 c3128000 c068e4f8 c0e8 
c3128000
NIP [c068e734] dev_watchdog+0x23c/0x244
[ 3127.227680] LR [c068e734] dev_watchdog+0x23c/0x244
[ 3127.232427] Call Trace:
[ 3127.234857] [c0f01d40] [c068e734] dev_watchdog+0x23c/0x244 (unreliable)
[ 3127.241447] [c0f01d60] [c00805e8] call_timer_fn+0x40/0x118
[ 3127.246889] [c0f01d80] [c00808e8] expire_timers.isra.13+0xbc/0x114
[ 3127.253032] [c0f01db0] [c0080a94] run_timer_softirq+0x90/0xf0
[ 3127.258753] [c0f01e00] [c07b31b4] __do_softirq+0x114/0x2b0
[ 3127.264202] [c0f01e60] [c002a158] irq_exit+0xe8/0xec
[ 3127.269144] [c0f01e70] [c0008c98] timer_interrupt+0x34/0x4c
[ 3127.274684] [c0f01e80] [c000ec94] ret_from_except+0x0/0x18
[ 3127.280151] --- interrupt: 901 at cpm_idle+0x3c/0x70
[ 3127.280151] LR = arch_cpu_idle+0x30/0x68
[ 3127.289300] [c0f01f40] [c0f058e4] cpu_idle_force_poll+0x0/0x4
(unreliable)
[ 3127.296146] [c0f01f50] [c00073e4] arch_cpu_idle+0x30/0x68
[ 3127.301509] [c0f01f60] [c005bce8] cpu_startup_entry+0x184/0x1bc
[ 3127.307392] [c0f01fb0] [c0a76a1c] start_kernel+0x3d4/0x3e8
[ 3127.312843] [c0f01ff0] [c0b4] _start+0xb4/0xf8
[ 3127.317599] Instruction dump:
[ 3127.320557] 811f0284 4b78 3921 7fe3fb78 99281966 4bfd9cd5
7c651b78 3c60c0a1
[ 3127.328359] 7fc6f378 7fe4fb78 3863357c 48125319 <0fe0> 4bb8
7c0802a6 90010004
[ 3127.336327] ---[ end trace c31dfe4772ff0e8f ]---


PPC 266MHz 8347E slow kernel decompress

2017-06-20 Thread Christian Melki
Hi.


I recently upgraded one of Ericssons platforms from a old 3.6.x to the latest 
3.16.x LTS kernel.
It was a smooth upgrade, besides kernel decompression taking much longer to 
complete. From like 2 seconds to 10-12 seconds.
I also tried a 4.10.z kernel with the same result.
The decompression code doesn't have any significant changes as far as I can 
see. Maybe I am looking in the wrong place?
Did the ppc arch init change between 3.6.x and 3.16.x? I am thinking of cache 
init, prefetch copy functions etc?
The old Redboot loader starts the kernel with caches off. Maybe the older init 
re-initialized the caches before decompression?
Bootloader is identical, so is the compiler. Nothing has changed besides the 
kernel itself.
It is 1.5M or so, no external modules. Runtime speed after decompression seems 
absolutely normal.
I couldn't find anything significant anywhere regarding this.


Hints would be much appreciated.


Regards,

Christian


1M hugepage size being registered on Linux

2017-06-20 Thread victora

Hi Alistair/Jeremy,

I am working on a bug related to 1M hugepage size being registered on 
Linux (Power 8 Baremetal - Garrison).


I was checking dmesg and it seems that 1M page size is coming from 
firmware to Linux.


[0.00] base_shift=20: shift=20, sllp=0x0130, avpnm=0x, 
tlbiel=0, penc=2

[1.528867] HugeTLB registered 1 MB page size, pre-allocated 0 pages

Should Linux support this page size? As afar as I know, this was an 
unsupported page size in the past isn't it? If this should be supported 
now, is there any specific reason for that?


Thanks,

Victor Aoqui
Software Engineer: Linux Kernel Backports
Linux Technology Center
IBM Systems



Re: [PATCH V2 0/2] hwmon: (ibmpowernv) Add support for current(A) sensors

2017-06-20 Thread Guenter Roeck
On Tue, Jun 20, 2017 at 10:38:11AM +0530, Shilpasri G Bhat wrote:
> The first patch from Cedric in the patchset cleans up the driver to
> provide a neater way to define new sensor types. The second patch adds
> current sensor.
> 
> Cédric Le Goater (1):
>   hwmon: (ibmpowernv) introduce a legacy_compatibles array
> 
> Shilpasri G Bhat (1):
>   hwmon: (ibmpowernv) Add current(A) sensor
> 

Series applied to hwmon-next.

Thanks,
Guenter


Re: [PATCH 0/2] fix loadable module for DPAA Ethernet

2017-06-20 Thread David Miller
From: Madalin Bucur 
Date: Mon, 19 Jun 2017 18:04:15 +0300

> The DPAA Ethernet makes use of a symbol that is not exported.
> Address the issue by propagating the dma_ops rather than calling
> arch_setup_dma_ops().

Series applied, thanks.


Re: [PATCH v5 2/2] powerpc/fadump: update documentation about 'fadump_append=' parameter

2017-06-20 Thread Hari Bathini



On Friday 09 June 2017 05:34 PM, Michal Suchánek wrote:

On Thu, 8 Jun 2017 23:30:37 +0530
Hari Bathini  wrote:


Hi Michal,

Sorry for taking this long to respond. I was working on a few other
things.

On Monday 15 May 2017 02:59 PM, Michal Suchánek wrote:

Hello,

On Mon, 15 May 2017 12:59:46 +0530
Hari Bathini  wrote:
  

On Friday 12 May 2017 09:12 PM, Michal Suchánek wrote:

On Fri, 12 May 2017 15:15:33 +0530
Hari Bathini  wrote:
 

On Thursday 11 May 2017 06:46 PM, Michal Suchánek wrote:

On Thu, 11 May 2017 02:00:11 +0530
Hari Bathini  wrote:


Hello Michal,

On Wednesday 10 May 2017 09:31 PM, Michal Suchánek wrote:

Hello,

On Wed, 03 May 2017 23:52:52 +0530
Hari Bathini  wrote:
   

With the introduction of 'fadump_append=' parameter to pass
additional parameters to fadump (capture) kernel, update
documentation about it.

Signed-off-by: Hari Bathini 
---

Changes from v4:
* Based on top of patchset that includes
   
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=akpm&id=05f383cdfba8793240e73f9a9fbff4e25d66003f


  Documentation/powerpc/firmware-assisted-dump.txt |   10
+- 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt
b/Documentation/powerpc/firmware-assisted-dump.txt index
8394bc8..6327193 100644 ---
a/Documentation/powerpc/firmware-assisted-dump.txt +++
b/Documentation/powerpc/firmware-assisted-dump.txt @@ -162,7
+162,15 @@ How to enable firmware-assisted dump (fadump):
  1. Set config option CONFIG_FA_DUMP=y and build kernel.
  2. Boot into linux kernel with 'fadump=on' kernel
cmdline option. -3. Optionally, user can also set
'crashkernel=' kernel cmdline +3. A user can pass additional
command line parameters as a comma
+   separated list through 'fadump_append=' parameter, to be
enforced
+   when fadump is active. For example, if parameters like
nr_cpus=1,
+   numa=off & udev.children-max=2 are to be enforced when
fadump is
+   active,
'fadump_append=nr_cpus=1,numa=off,udev.children-max=2'
+   can be passed in command line, which will be replaced
with
+   "nr_cpus=1 numa=off udev.children-max=2" when fadump is
active.
+   This helps in reducing memory consumption during dump
capture. +4. Optionally, user can also set 'crashkernel='
kernel cmdline to specify size of the memory to reserve for
boot memory dump preservation.
  
   

Writing your own deficient parser for comma separated
arguments when perfectly fine parser for space separated
quoted arguments exists in the kernel and the bootloader does
not seem like a good idea to me.

Couple of things that prompted me for v5 are:
   1. Using parse_early_options() limits the kind of
parameters that can be passed to fadump capture kernel. Passing
parameters like systemd.unit= & udev.childern.max= has no
effect with v4. Updating
  boot_command_line parameter, when fadump is active,
seems a better alternative.

   2. Passing space-separated quoted arguments is not
working as intended with lilo. Updating bootloader with the
below entry in /etc/lilo.conf file results in a missing append
  entry in /etc/yaboot.conf file.

append = "quiet sysrq=1 insmod=sym53c8xx insmod=ipr
crashkernel=512M-:256M fadump_append=\"nr_cpus=1 numa=off
udev.children-max=2\""

Meaning that a script that emulates LILO semantics on top of
yaboot which is completely capable of passing qouted space
separated arguments fails. IMHO it is more reasonable to fix the
script or whatever adaptation layer or use yaboot directly than
working around bug in said script by introducing a new argument
parser in the kernel.



Hmmm.. while trying to implement space-separated parameter list
with quotes as syntax for fadump_append parameter, noticed that
it can make implemenation
more vulnerable. Here are some problems I am facing while
implementing this..

How so?

presumably you can reuse parse_args even if you do not register
with early_param and call it yourself. Then your parsing of
fadump_append is

I wasn't aware of that. Thanks for pointing it out, Michal.
Will try to use parse_args and get back.
  

I was thinking a bit more about the uses of the commandline and how
fadump_append potentially breaks it.

The first thing that should be addressed and is the special --
argument which denotes the start of init arguments that are not to
be parsed by the kernel. Incidentally the initial implementation
using early_param happened to handles that without issue.
parse_args surely handles that so adding a hook somewhere should
give you location of that argument (if any). And interesting thing
that can happen is passing an -- inside the fadump_append argument.
It should be handled (or not) in some way or other and the handling
documented.


The intention with this patch is to replace

"root=/dev/sda2 ro fadump_append=nr_cpus=1,numa=off
crashkernel=1024M"

with

"root=/dev/sda2 ro nr_cpus=1 numa=off c

[PATCH V6 2/2] powerpc/numa: Update CPU topology when VPHN enabled

2017-06-20 Thread Michael Bringmann

powerpc/numa: Correct the currently broken capability to set the
topology for shared CPUs in LPARs.  At boot time for shared CPU
lpars, the topology for each shared CPU is set to node zero, however,
this is now updated correctly using the Virtual Processor Home Node
(VPHN) capabilities information provided by the pHyp.

Also, update initialization checks for device-tree attributes to
independently recognize PRRN or VPHN usage.

Signed-off-by: Michael Bringmann 
---
Changes in V6:
  -- Place extern of timed_topology_update() proto under additional #ifdef
 for hotplug-cpu.
---
 arch/powerpc/include/asm/topology.h  |   16 +++
 arch/powerpc/mm/numa.c   |   64 +++---
 arch/powerpc/platforms/pseries/dlpar.c   |2 +
 arch/powerpc/platforms/pseries/hotplug-cpu.c |2 +
 4 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 9cc6ec9..ae3cdd0 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -79,6 +79,22 @@ static inline int prrn_is_enabled(void)
 }
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
+#if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR) && \
+   defined(CONFIG_HOTPLUG_CPU)
+extern int timed_topology_update(int nsecs);
+#else
+static int timed_topology_update(int nsecs)
+{
+   return 0;
+}
+#endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR && CONFIG_HOTPLUG_CPU */
+
+#if defined(CONFIG_PPC_SPLPAR)
+extern void shared_topology_update(void);
+#else
+#defineshared_topology_update()0
+#endif /* CONFIG_PPC_SPLPAR */
+
 #include 
 
 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0746d93..cf5992d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -935,7 +936,7 @@ void __init initmem_init(void)
 
/*
 * Reduce the possible NUMA nodes to the online NUMA nodes,
-* since we do not support node hotplug. This ensures that  we
+* since we do not support node hotplug. This ensures that we
 * lower the maximum NUMA node ID to what is actually present.
 */
nodes_and(node_possible_map, node_possible_map, node_online_map);
@@ -1179,11 +1180,32 @@ struct topology_update_data {
int new_nid;
 };
 
+#defineTOPOLOGY_DEF_TIMER_SECS 60
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
 static int prrn_enabled;
 static void reset_topology_timer(void);
+static int topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
+static int topology_inited;
+static int topology_update_needed;
+
+/*
+ * Change polling interval for associativity changes.
+ */
+int timed_topology_update(int nsecs)
+{
+   if (nsecs > 0)
+   topology_timer_secs = nsecs;
+   else
+   topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
+
+   if (vphn_enabled)
+   reset_topology_timer();
+
+   return 0;
+}
 
 /*
  * Store the current values of the associativity change counters in the
@@ -1277,6 +1299,12 @@ static long vphn_get_associativity(unsigned long cpu,
"hcall_vphn() experienced a hardware fault "
"preventing VPHN. Disabling polling...\n");
stop_topology_update();
+   break;
+   case H_SUCCESS:
+   printk(KERN_INFO
+   "VPHN hcall succeeded. Reset polling...\n");
+   timed_topology_update(0);
+   break;
}
 
return rc;
@@ -1354,8 +1382,11 @@ int numa_update_cpu_topology(bool cpus_locked)
struct device *dev;
int weight, new_nid, i = 0;
 
-   if (!prrn_enabled && !vphn_enabled)
+   if (!prrn_enabled && !vphn_enabled) {
+   if (!topology_inited)
+   topology_update_needed = 1;
return 0;
+   }
 
weight = cpumask_weight(&cpu_associativity_changes_mask);
if (!weight)
@@ -1394,6 +1425,8 @@ int numa_update_cpu_topology(bool cpus_locked)
cpumask_andnot(&cpu_associativity_changes_mask,
&cpu_associativity_changes_mask,
cpu_sibling_mask(cpu));
+   pr_info("Assoc chg gives same node %d for cpu%d\n",
+   new_nid, cpu);
cpu = cpu_last_thread_sibling(cpu);
continue;
}
@@ -1410,6 +1443,9 @@ int numa_update_cpu_topology(bool cpus_locked)
cpu = cpu_last_thread_sibling(cpu);
}
 
+   if (i)
+   updates[i-1].next = NULL;
+
pr_debug("Topology update for the following CPUs:\n");
if (cpumask_weight(

[PATCH V6 1/2] powerpc/hotplug: Ensure enough nodes avail for operations

2017-06-20 Thread Michael Bringmann

powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU
or memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized
at boot.  In order to meet both needs, this patch adds a new kernel
command line option (numnodes=) for use by the PowerPC architecture-
specific code that defines the maximum number of nodes that the kernel
will ever need in its current hardware environment.  The boot code that
initializes nodes for PowerPC will read this value and use it to ensure
that all of the desired nodes are setup in the 'node_possible_map', and
elsewhere.

Signed-off-by: Michael Bringmann 
---
---
 arch/powerpc/mm/numa.c |   31 +++
 1 file changed, 31 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index e6f742d..0746d93 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -60,10 +60,27 @@
 static int n_mem_addr_cells, n_mem_size_cells;
 static int form1_affinity;
 
+#define TOPOLOGY_DEF_NUM_NODES 0
 #define MAX_DISTANCE_REF_POINTS 4
 static int distance_ref_points_depth;
 static const __be32 *distance_ref_points;
 static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
+static int topology_num_nodes = TOPOLOGY_DEF_NUM_NODES;
+
+/*
+ * Topology-related early parameters
+ */
+static int __init early_num_nodes(char *p)
+{
+   if (!p)
+   return 1;
+
+   topology_num_nodes = memparse(p, &p);
+   dbg("topology num nodes = 0x%d\n", topology_num_nodes);
+
+   return 0;
+}
+early_param("numnodes", early_num_nodes);
 
 /*
  * Allocate node_to_cpumask_map based on number of available nodes
@@ -892,6 +909,18 @@ static void __init setup_node_data(int nid, u64 start_pfn, 
u64 end_pfn)
NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init setup_min_nodes(void)
+{
+   int i, l = topology_num_nodes;
+
+   for (i = 0; i < l; i++) {
+   if (!node_possible(i)) {
+   setup_node_data(i, 0, 0);
+   node_set(i, node_possible_map);
+   }
+   }
+}
+
 void __init initmem_init(void)
 {
int nid, cpu;
@@ -911,6 +940,8 @@ void __init initmem_init(void)
 */
nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+   setup_min_nodes();
+
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 



[PATCH V6 0/2] powerpc/dlpar: Correct display of hot-add/hot-remove CPUs and memory

2017-06-20 Thread Michael Bringmann

On Power systems with shared configurations of CPUs and memory, there
are some issues with association of additional CPUs and memory to nodes
when hot-adding resources.  These patches address some of those problems.

powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU
or memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized
at boot.  In order to meet both needs, this patch adds a new kernel
command line option (numnodes=) for use by the PowerPC
architecture-specific code that defines the maximum number of nodes
that the kernel will ever need in its current hardware environment.
The boot code that initializes nodes for PowerPC will read this value
and use it to ensure that all of the desired nodes are setup in the
'node_possible_map', and elsewhere.

powerpc/numa: Correct the currently broken capability to set the
topology for shared CPUs in LPARs.  At boot time for shared CPU
lpars, the topology for each shared CPU is set to node zero, however,
this is now updated correctly using the Virtual Processor Home Node
(VPHN) capabilities information provided by the pHyp. The VPHN handling
in Linux is disabled, if PRRN handling is present.

Signed-off-by: Michael Bringmann 

Michael Bringmann (2):
  powerpc/hotplug: Add option to define max nodes allowing dynamic
  growth of resources.
  powerpc/numa: Update CPU topology when VPHN enabled
---
Changes in V6:
  -- Reorder some code to better eliminate unused functions in
   conditional builds.



[PATCH] powerpc/64: Initialise thread_info for emergency stacks

2017-06-20 Thread Nicholas Piggin
Emergency stacks have their thread_info mostly uninitialised, which in
particular means garbage preempt_count values.

Emergency stack code runs with interrupts disabled entirely, and is
used very rarely, so this has been unnoticed so far. It was found by a
proposed new powerpc watchdog that takes a soft-NMI directly from the
masked_interrupt handler and using the emergency stack. That crashed at
BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
garbage.

Reported-by: Abdul Haleem 
Signed-off-by: Nicholas Piggin 
---

FYI, this bug looks to be breaking linux-next on some powerpc
boxes due to interaction with a proposed new powerpc watchdog
driver Andrew has in his tree:

http://marc.info/?l=linuxppc-embedded&m=149794320519941&w=2

 arch/powerpc/include/asm/thread_info.h | 19 +++
 arch/powerpc/kernel/setup_64.c |  6 +++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index a941cc6fc3e9..5995e4b2996d 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -54,6 +54,7 @@ struct thread_info {
.task = &tsk,   \
.cpu =  0,  \
.preempt_count = INIT_PREEMPT_COUNT,\
+   .local_flags =  0,  \
.flags =0,  \
 }
 
@@ -62,6 +63,24 @@ struct thread_info {
 
 #define THREAD_SIZE_ORDER  (THREAD_SHIFT - PAGE_SHIFT)
 
+/*
+ * Emergency stacks are used for a range of things, from asynchronous
+ * NMIs (system reset, machine check) to synchronous, process context.
+ * Set HARDIRQ_OFFSET because we don't know exactly what context we
+ * come from or if it had a valid stack, which is about the best we
+ * can do.
+ * TODO: what to do with accounting?
+ */
+#define emstack_init_thread_info(ti, c)\
+do {   \
+   (ti)->task = NULL;  \
+   (ti)->cpu = (c);\
+   (ti)->preempt_count = HARDIRQ_OFFSET;   \
+   (ti)->local_flags = 0;  \
+   (ti)->flags = 0;\
+   klp_init_thread_info(ti);   \
+} while (0)
+
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
 {
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f35ff9dea4fb..54c4336655f8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -639,18 +639,18 @@ void __init emergency_stack_init(void)
for_each_possible_cpu(i) {
struct thread_info *ti;
ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
-   klp_init_thread_info(ti);
+   emstack_init_thread_info(ti, i);
paca[i].emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
-   klp_init_thread_info(ti);
+   emstack_init_thread_info(ti, i);
paca[i].nmi_emergency_sp = (void *)ti + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
-   klp_init_thread_info(ti);
+   emstack_init_thread_info(ti, i);
paca[i].mc_emergency_sp = (void *)ti + THREAD_SIZE;
 #endif
}
-- 
2.11.0



Re: clean up and modularize arch dma_mapping interface

2017-06-20 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 11:19:02AM +0200, Daniel Vetter wrote:
> Ack for the 2 drm patches, but I can also pick them up through drm-misc if
> you prefer that (but then it'll be 4.14).

Nah, I'll plan to set up a dma-mapping tree so that we'll have common
place for dma-mapping work.


Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 11:04:00PM +1000, Stephen Rothwell wrote:
> git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next
> 
> Contacts: Marek Szyprowski and Kyungmin Park (cc'd)
> 
> I have called your tree dma-mapping-hch for now.  The other tree has
> not been updated since 4.9-rc1 and I am not sure how general it is.
> Marek, Kyungmin, any comments?

I'd be happy to join efforts - co-maintainers and reviers are always
welcome.


Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Christoph Hellwig
On Tue, Jun 20, 2017 at 02:14:36PM +0100, Robin Murphy wrote:
> Hi Christoph,
> 
> On 20/06/17 13:41, Christoph Hellwig wrote:
> > On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
> >> I plan to create a new dma-mapping tree to collect all this work.
> >> Any volunteers for co-maintainers, especially from the iommu gang?
> > 
> > Ok, I've created the new tree:
> > 
> >git://git.infradead.org/users/hch/dma-mapping.git for-next
> > 
> > Gitweb:
> > 
> >
> > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> > 
> > And below is the patch to add the MAINTAINERS entry, additions welcome.
> 
> I'm happy to be a reviewer, since I've been working in this area for
> some time, particularly with the dma-iommu code and arm64 DMA ops.

Great, I'll add you!


Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Robin Murphy
Hi Christoph,

On 20/06/17 13:41, Christoph Hellwig wrote:
> On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
>> I plan to create a new dma-mapping tree to collect all this work.
>> Any volunteers for co-maintainers, especially from the iommu gang?
> 
> Ok, I've created the new tree:
> 
>git://git.infradead.org/users/hch/dma-mapping.git for-next
> 
> Gitweb:
> 
>
> http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> 
> And below is the patch to add the MAINTAINERS entry, additions welcome.

I'm happy to be a reviewer, since I've been working in this area for
some time, particularly with the dma-iommu code and arm64 DMA ops.

Robin.

> Stephen, can you add this to linux-next?
> 
> ---
> From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Tue, 20 Jun 2017 11:17:30 +0200
> Subject: MAINTAINERS: add entry for dma mapping helpers
> 
> This code has been spread between getting in through arch trees, the iommu
> tree, -mm and the drivers tree.  There will be a lot of work in this area,
> including consolidating various arch implementations into more common
> code, so ensure we have a proper git tree that facilitates cooperation with
> the architecture maintainers.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  MAINTAINERS | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 09b5ab6a8a5c..56859d53a424 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2595,6 +2595,19 @@ S: Maintained
>  F:   net/bluetooth/
>  F:   include/net/bluetooth/
>  
> +DMA MAPPING HELPERS
> +M:   Christoph Hellwig 
> +L:   linux-ker...@vger.kernel.org
> +T:   git git://git.infradead.org/users/hch/dma-mapping.git
> +W:   http://git.infradead.org/users/hch/dma-mapping.git
> +S:   Supported
> +F:   lib/dma-debug.c
> +F:   lib/dma-noop.c
> +F:   lib/dma-virt.c
> +F:   drivers/base/dma-mapping.c
> +F:   drivers/base/dma-coherent.c
> +F:   include/linux/dma-mapping.h
> +
>  BONDING DRIVER
>  M:   Jay Vosburgh 
>  M:   Veaceslav Falico 
> 



Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Stephen Rothwell
Hi Christoph,

On Tue, 20 Jun 2017 14:41:40 +0200 Christoph Hellwig  wrote:
>
> On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
> > I plan to create a new dma-mapping tree to collect all this work.
> > Any volunteers for co-maintainers, especially from the iommu gang?  
> 
> Ok, I've created the new tree:
> 
>git://git.infradead.org/users/hch/dma-mapping.git for-next
> 
> Gitweb:
> 
>
> http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next
> 
> And below is the patch to add the MAINTAINERS entry, additions welcome.
> 
> Stephen, can you add this to linux-next?

Added from tomorrow.

I have another tree called dma-mapping:

git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next

Contacts: Marek Szyprowski and Kyungmin Park (cc'd)

I have called your tree dma-mapping-hch for now.  The other tree has
not been updated since 4.9-rc1 and I am not sure how general it is.
Marek, Kyungmin, any comments?

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgement of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au


Re: [PATCH 1/3] powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmode

2017-06-20 Thread Nicholas Piggin
On Tue, 20 Jun 2017 22:34:55 +1000
Michael Ellerman  wrote:

> All the callers of slb_miss_realmode currently open code the #ifndef
> CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case.
> We have a macro to do this, BRANCH_TO_COMMON(), so use it.
> 
> Signed-off-by: Michael Ellerman 

These 3 all look good to me.

Reviewed-by: Nicholas Piggin 


new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2

2017-06-20 Thread Christoph Hellwig
On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote:
> I plan to create a new dma-mapping tree to collect all this work.
> Any volunteers for co-maintainers, especially from the iommu gang?

Ok, I've created the new tree:

   git://git.infradead.org/users/hch/dma-mapping.git for-next

Gitweb:

   
http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next

And below is the patch to add the MAINTAINERS entry, additions welcome.

Stephen, can you add this to linux-next?

---
From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Tue, 20 Jun 2017 11:17:30 +0200
Subject: MAINTAINERS: add entry for dma mapping helpers

This code has been spread between getting in through arch trees, the iommu
tree, -mm and the drivers tree.  There will be a lot of work in this area,
including consolidating various arch implementations into more common
code, so ensure we have a proper git tree that facilitates cooperation with
the architecture maintainers.

Signed-off-by: Christoph Hellwig 
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 09b5ab6a8a5c..56859d53a424 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2595,6 +2595,19 @@ S:   Maintained
 F: net/bluetooth/
 F: include/net/bluetooth/
 
+DMA MAPPING HELPERS
+M: Christoph Hellwig 
+L: linux-ker...@vger.kernel.org
+T: git git://git.infradead.org/users/hch/dma-mapping.git
+W: http://git.infradead.org/users/hch/dma-mapping.git
+S: Supported
+F: lib/dma-debug.c
+F: lib/dma-noop.c
+F: lib/dma-virt.c
+F: drivers/base/dma-mapping.c
+F: drivers/base/dma-coherent.c
+F: include/linux/dma-mapping.h
+
 BONDING DRIVER
 M: Jay Vosburgh 
 M: Veaceslav Falico 
-- 
2.11.0



[PATCH 3/3] powerpc/64s: Rename slb_allocate_realmode() to slb_allocate()

2017-06-20 Thread Michael Ellerman
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid
confusion over whether it runs in real or virtual mode - it runs in
both.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S |  2 +-
 arch/powerpc/mm/slb.c| 10 +-
 arch/powerpc/mm/slb_low.S|  6 +++---
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 6ad755e0cb29..07b79c2c70f8 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -605,7 +605,7 @@ EXC_COMMON_BEGIN(slb_miss_common)
crset   4*cr0+eq
 #ifdef CONFIG_PPC_STD_MMU_64
 BEGIN_MMU_FTR_SECTION
-   bl  slb_allocate_realmode
+   bl  slb_allocate
 END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
 #endif
 
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 654a0d7ba0e7..13cfe413b40d 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -33,15 +33,7 @@ enum slb_index {
KSTACK_INDEX= 2, /* Kernel stack map */
 };
 
-extern void slb_allocate_realmode(unsigned long ea);
-
-static void slb_allocate(unsigned long ea)
-{
-   /* Currently, we do real mode for all SLBs including user, but
-* that will change if we bring back dynamic VSIDs
-*/
-   slb_allocate_realmode(ea);
-}
+extern void slb_allocate(unsigned long ea);
 
 #define slb_esid_mask(ssize)   \
(((ssize) == MMU_SEGSIZE_256M)? ESID_MASK: ESID_MASK_1T)
diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index 9869b44a04dc..bde378559d01 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -65,7 +65,7 @@ MMU_FTR_SECTION_ELSE  
\
 ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_68_BIT_VA)
 
 
-/* void slb_allocate_realmode(unsigned long ea);
+/* void slb_allocate(unsigned long ea);
  *
  * Create an SLB entry for the given EA (user or kernel).
  * r3 = faulting address, r13 = PACA
@@ -73,7 +73,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_68_BIT_VA)
  * r3 is preserved.
  * No other registers are examined or changed.
  */
-_GLOBAL(slb_allocate_realmode)
+_GLOBAL(slb_allocate)
/*
 * check for bad kernel/user address
 * (ea & ~REGION_MASK) >= PGTABLE_RANGE
@@ -309,7 +309,7 @@ slb_compare_rr_to_size:
b   7b
 
 
-_ASM_NOKPROBE_SYMBOL(slb_allocate_realmode)
+_ASM_NOKPROBE_SYMBOL(slb_allocate)
 _ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_linear)
 _ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_io)
 _ASM_NOKPROBE_SYMBOL(slb_compare_rr_to_size)
-- 
2.7.4



[PATCH 2/3] powerpc/64s: Rename slb_miss_realmode() to slb_miss_common()

2017-06-20 Thread Michael Ellerman
slb_miss_realmode() doesn't always run in real mode, which is what the
name implies. So rename it to avoid confusing people.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 7bdfddbe0328..6ad755e0cb29 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -514,7 +514,7 @@ EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
mfspr   r3,SPRN_DAR
mfspr   r11,SPRN_SRR1
crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_realmode)
+   BRANCH_TO_COMMON(r10, slb_miss_common)
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
@@ -525,7 +525,7 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
mfspr   r3,SPRN_DAR
mfspr   r11,SPRN_SRR1
crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_realmode)
+   BRANCH_TO_COMMON(r10, slb_miss_common)
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
@@ -558,7 +558,7 @@ EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
mfspr   r11,SPRN_SRR1
crclr   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_realmode)
+   BRANCH_TO_COMMON(r10, slb_miss_common)
 EXC_REAL_END(instruction_access_slb, 0x480, 0x80)
 
 EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
@@ -569,13 +569,16 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
mfspr   r11,SPRN_SRR1
crclr   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_realmode)
+   BRANCH_TO_COMMON(r10, slb_miss_common)
 EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80)
 TRAMP_KVM(PACA_EXSLB, 0x480)
 
 
-/* This handler is used by both 0x380 and 0x480 slb miss interrupts */
-EXC_COMMON_BEGIN(slb_miss_realmode)
+/*
+ * This handler is used by the 0x380 and 0x480 SLB miss interrupts, as well as
+ * the virtual mode 0x4380 and 0x4480 interrupts if AIL is enabled.
+ */
+EXC_COMMON_BEGIN(slb_miss_common)
/*
 * r13 points to the PACA, r9 contains the saved CR,
 * r12 contains the saved r3,
-- 
2.7.4



[PATCH 1/3] powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmode

2017-06-20 Thread Michael Ellerman
All the callers of slb_miss_realmode currently open code the #ifndef
CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case.
We have a macro to do this, BRANCH_TO_COMMON(), so use it.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/exceptions-64s.S | 42 
 1 file changed, 4 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index ed8628c6f0f4..7bdfddbe0328 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -514,18 +514,7 @@ EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
mfspr   r3,SPRN_DAR
mfspr   r11,SPRN_SRR1
crset   4*cr6+eq
-#ifndef CONFIG_RELOCATABLE
-   b   slb_miss_realmode
-#else
-   /*
-* We can't just use a direct branch to slb_miss_realmode
-* because the distance from here to there depends on where
-* the kernel ends up being put.
-*/
-   LOAD_HANDLER(r10, slb_miss_realmode)
-   mtctr   r10
-   bctr
-#endif
+   BRANCH_TO_COMMON(r10, slb_miss_realmode)
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
@@ -536,18 +525,7 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
mfspr   r3,SPRN_DAR
mfspr   r11,SPRN_SRR1
crset   4*cr6+eq
-#ifndef CONFIG_RELOCATABLE
-   b   slb_miss_realmode
-#else
-   /*
-* We can't just use a direct branch to slb_miss_realmode
-* because the distance from here to there depends on where
-* the kernel ends up being put.
-*/
-   LOAD_HANDLER(r10, slb_miss_realmode)
-   mtctr   r10
-   bctr
-#endif
+   BRANCH_TO_COMMON(r10, slb_miss_realmode)
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
@@ -580,13 +558,7 @@ EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80)
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
mfspr   r11,SPRN_SRR1
crclr   4*cr6+eq
-#ifndef CONFIG_RELOCATABLE
-   b   slb_miss_realmode
-#else
-   LOAD_HANDLER(r10, slb_miss_realmode)
-   mtctr   r10
-   bctr
-#endif
+   BRANCH_TO_COMMON(r10, slb_miss_realmode)
 EXC_REAL_END(instruction_access_slb, 0x480, 0x80)
 
 EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
@@ -597,13 +569,7 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80)
mfspr   r3,SPRN_SRR0/* SRR0 is faulting address */
mfspr   r11,SPRN_SRR1
crclr   4*cr6+eq
-#ifndef CONFIG_RELOCATABLE
-   b   slb_miss_realmode
-#else
-   LOAD_HANDLER(r10, slb_miss_realmode)
-   mtctr   r10
-   bctr
-#endif
+   BRANCH_TO_COMMON(r10, slb_miss_realmode)
 EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80)
 TRAMP_KVM(PACA_EXSLB, 0x480)
 
-- 
2.7.4



Re: [BUG][next-20170619][347de24] PowerPC boot fails with Oops

2017-06-20 Thread Nicholas Piggin
On Tue, 20 Jun 2017 12:49:25 +0530
Abdul Haleem  wrote:

> Hi,
> 
> commit: 347de24 (powerpc/64s: implement arch-specific hardlockup
> watchdog)
> 
> linux-next fails to boot on PowerPC Bare-metal box.
> 
> Test: boot
> Machine type: Power 8 Bare-metal
> Kernel: 4.12.0-rc5-next-20170619
> gcc: version 4.8.5
> 
> 
> In file arch/powerpc/kernel/watchdog.c
> 
> void soft_nmi_interrupt(struct pt_regs *regs)
> {
> unsigned long flags;
> int cpu = raw_smp_processor_id();
> u64 tb;
> 
> if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
> return;
> 
> >>> nmi_enter();  

Thanks for the report.

This is due to emergency stacks not zeroing preempt_count, so they get
garbage here, and it just trips the BUG_ON(in_nmi()) check.

I don't think it's a bug in the proposed new powerpc watchdog. (At least
I was able to reproduce your bug and fix it by fixing the stack init.)

Thanks,
Nick


Re: [RFC v2 02/12] powerpc: Free up four 64K PTE bits in 64K backed hpte pages.

2017-06-20 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> in the 64K backed hpte pages. This along with the earlier
> patch will entirely free up the four bits from 64K PTE.
> 
> This patch does the following change to 64K PTE that is
> backed by 64K hpte.
> 
> H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> of the pte.
> H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> second part of the pte.
> 
> Since bit 7 is now freed up, we move H_PAGE_BUSY from bit 9
> to bit 7. Trying to minimize gaps so that contiguous bits
> can be allocated if needed in the future.
> 
> The second part of the PTE will hold
> (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.

I still don't understand how we freed up the 5th bit, which is
used in the 5th patch. Was that bit never used for anything
on 64K page size (64K and 4K mappings)?

+#define _RPAGE_RSV5		0x00040UL

+#define H_PAGE_PKEY_BIT0   _RPAGE_RSV1
+#define H_PAGE_PKEY_BIT1   _RPAGE_RSV2
+#define H_PAGE_PKEY_BIT2   _RPAGE_RSV3
+#define H_PAGE_PKEY_BIT3   _RPAGE_RSV4
+#define H_PAGE_PKEY_BIT4   _RPAGE_RSV5



Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-20 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> in the 4K backed hpte pages. These bits continue to be used
> for 64K backed hpte pages in this patch, but will be freed
> up in the next patch.

The bit numbering 3, 4, 5 and 6 is in BE format, I believe; I was
initially trying to read it from right to left as we normally
do in the kernel and was getting confused. So basically these
bits (which are only applicable for 64K mapping IIUC) are going
to be freed up from the PTE format.

#define _RPAGE_RSV1 0x1000UL
#define _RPAGE_RSV2 0x0800UL
#define _RPAGE_RSV3 0x0400UL
#define _RPAGE_RSV4 0x0200UL

As you have mentioned before this feature is available for 64K
page size only and not for 4K mappings. So I assume we support
both the combinations.

* 64K mapping on 64K
* 64K mapping on 4K

These are the current users of the above bits

#define H_PAGE_BUSY _RPAGE_RSV1 /* software: PTE & hash are busy */
#define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
#define H_PAGE_F_GIX	(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
#define H_PAGE_HASHPTE	_RPAGE_RPN43 /* PTE has associated HPTE */

> 
> The patch does the following change to the 64K PTE format
> 
> H_PAGE_BUSY moves from bit 3 to bit 9

and what is there on bit 9 now? This?

#define _RPAGE_SW2  0x00400

which is used as 

#define _PAGE_SPECIAL   _RPAGE_SW2 /* software: special page */

which will not be required anymore?

> H_PAGE_F_SECOND which occupied bit 4 moves to the second part
>   of the pte.
> H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
>   second part of the pte.
> 
> the four bits (H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> are initialized to 0xF, indicating an invalid slot. If a hpte
> gets cached in a 0xF slot (i.e. the 7th slot of the secondary), it is
> released immediately. In other words, even though 0xF is a

Released immediately means we attempt again for a new hash slot?

> valid slot we discard and consider it as an invalid
> slot; i.e. hpte_soft_invalid(). This gives us an opportunity to not
> depend on a bit in the primary PTE in order to determine the
> validity of a slot.

So we have to see the slot number in the second half for each PTE to
figure out if it has got a valid slot in the hash page table.

> 
> When we release a hpte in the 0xF slot we also release a
> legitimate primary slot and unmap that entry. This is to
> ensure that we do get a legitimate non-0xF slot the next time we
> retry for a slot.

Okay.

> 
> Though treating the 0xF slot as invalid reduces the number of available
> slots and may have an effect on the performance, the probability
> of hitting a 0xF is extremely low.

Why do you say that? I thought every slot number has the same probability
of being hit by the hash function.

> 
> Compared to the current scheme, the above described scheme reduces
> the number of false hash table updates significantly and has the

How does it reduce false hash table updates?

> added advantage of releasing four valuable PTE bits for other
> purposes.
> 
> This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> Ellerman and myself.
> 
> The 4K PTE format remains unchanged currently.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  | 20 +++
>  arch/powerpc/include/asm/book3s/64/hash-64k.h | 32 +++
>  arch/powerpc/include/asm/book3s/64/hash.h | 15 +++--
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |  5 ++
>  arch/powerpc/mm/dump_linuxpagetables.c|  3 +-
>  arch/powerpc/mm/hash64_4k.c   | 14 ++---
>  arch/powerpc/mm/hash64_64k.c  | 81 
> ---
>  arch/powerpc/mm/hash_utils_64.c   | 30 +++---
>  8 files changed, 122 insertions(+), 78 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index b4b5e6b..5ef1d81 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,18 @@
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> 
> +
> +/*
> + * Only supported by 4k linux page size
> + */
> +#define H_PAGE_F_SECOND	_RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> +#define H_PAGE_F_GIX_SHIFT 56
> +
> +#define H_PAGE_BUSY  _RPAGE_RSV1 /* software: PTE & hash are busy */
> +#define H_PAGE_HASHPTE	_RPAGE_RPN43 /* PTE has associated HPTE */
> +
> +

So we moved the common 64K definitions here.


>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H

Re: [RFC v2 00/12] powerpc: Memory Protection Keys

2017-06-20 Thread Benjamin Herrenschmidt
On Tue, 2017-06-20 at 15:10 +1000, Balbir Singh wrote:
> On Fri, 2017-06-16 at 20:52 -0700, Ram Pai wrote:
> > Memory protection keys enable applications to protect its
> > address space from inadvertent access or corruption from
> > itself.
> 
> I presume by itself you mean protection between threads?

Not necessarily. You could have, for example, a JIT that,
when it runs the JITed code, only "opens" the keys for
the VM itself, preventing the JITed code from "leaking out".
There are plenty of other usages...
> 
> > The overall idea:
> > 
> >  A process allocates a key and associates it with
> >  an address range within its address space.
> 
> OK, so this is per VMA?
> 
> >  The process then can dynamically set read/write
> >  permissions on the key without involving the
> >  kernel.
> 
This bit is not clear: how can the key be set without
involving the kernel? I presume you mean the key is set
in the PTEs and the access protection values can be
set without involving the kernel?
> 
> >  Any code that violates the permissions
> >  of the address space, as defined by its associated
> >  key, will receive a segmentation fault.
> > 
> > This patch series enables the feature on PPC64.
> > It is enabled on HPTE 64K-page platform.
> > 
> > ISA3.0 section 5.7.13 describes the detailed specifications.
> > 
> > 
> > Testing:
> > This patch series has passed all the protection key
> > tests available in  the selftests directory.
> > The tests are updated to work on both x86 and powerpc.
> 
> Balbir


Re: clean up and modularize arch dma_mapping interface

2017-06-20 Thread Daniel Vetter
On Thu, Jun 08, 2017 at 03:25:25PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> for a while we have a generic implementation of the dma mapping routines
> that call into per-arch or per-device operations.  But right now there
> still are various bits in the interfaces where we don't clearly operate
> on these ops.  This series tries to clean up a lot of those (but not all
> yet, but the series is big enough).  It gets rid of the DMA_ERROR_CODE
> way of signaling failures of the mapping routines from the
> implementations to the generic code (and cleans up various drivers that
> were incorrectly using it), and gets rid of the ->set_dma_mask routine
> in favor of relying on the ->dma_capable method that can be used in
> the same way, but which requires less code duplication.
> 
> Btw, we don't seem to have a tree for the ever-growing amount of common dma
> mapping code, and given that I have a fair amount of all-over-the-tree
> work in that area on my plate I'd like to start one.  Any good reason
> not to?  Anyone willing to volunteer as co-maintainer?
> 
> The whole series is also available in git:
> 
> git://git.infradead.org/users/hch/misc.git dma-map

Ack for the 2 drm patches, but I can also pick them up through drm-misc if
you prefer that (but then it'll be 4.14).
-Daniel

> 
> Gitweb:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-map
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] powernv/npu-dma.c: Add explicit flush when sending an ATSD

2017-06-20 Thread Alistair Popple
NPU2 requires an extra explicit flush to an active GPU PID when sending
address translation shoot downs (ATSDs) to reliably flush the GPU TLB. This
patch adds just such a flush at the end of each sequence of ATSDs.

We can safely use PID 0 which is always reserved and active on the GPU. PID
0 is only used for init_mm which will never be a user mm on the GPU. To
enforce this we add a check in pnv_npu2_init_context() just in case someone
tries to use PID 0 on the GPU.

Signed-off-by: Alistair Popple 
---

Michael,

It turns out my assumptions about MMU_NO_CONTEXT were wrong so we have
reverted to using HW context id/PID 0 (init_mm) as that is quite clearly
reserved on hash and radix and it seems unlikely PID 0 would ever be used
for anything else.

That said if you feel strongly it would be easy enough to add functions to
reserve a PID and an exported function for device drivers to call to find
out what the reserved PID is. I was avoiding it because it would be more
invasive adding code and an external API for something that I'm not sure
will ever change, although if it does there is a check in
pnv_npu2_init_context() to flag it so it won't result in weird bugs.

Anyway let me know which way you would like us to go here and I can update
the patch as required, thanks!

- Alistair

arch/powerpc/platforms/powernv/npu-dma.c | 93 ++--
 1 file changed, 64 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index e6f444b..9468064 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -449,7 +449,7 @@ static int mmio_launch_invalidate(struct npu *npu, unsigned 
long launch,
return mmio_atsd_reg;
 }
 
-static int mmio_invalidate_pid(struct npu *npu, unsigned long pid)
+static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool flush)
 {
unsigned long launch;
 
@@ -465,12 +465,15 @@ static int mmio_invalidate_pid(struct npu *npu, unsigned 
long pid)
/* PID */
launch |= pid << PPC_BITLSHIFT(38);
 
+   /* No flush */
+   launch |= !flush << PPC_BITLSHIFT(39);
+
/* Invalidating the entire process doesn't use a va */
return mmio_launch_invalidate(npu, launch, 0);
 }
 
 static int mmio_invalidate_va(struct npu *npu, unsigned long va,
-   unsigned long pid)
+   unsigned long pid, bool flush)
 {
unsigned long launch;
 
@@ -486,26 +489,60 @@ static int mmio_invalidate_va(struct npu *npu, unsigned 
long va,
/* PID */
launch |= pid << PPC_BITLSHIFT(38);
 
+   /* No flush */
+   launch |= !flush << PPC_BITLSHIFT(39);
+
return mmio_launch_invalidate(npu, launch, va);
 }
 
 #define mn_to_npu_context(x) container_of(x, struct npu_context, mn)
 
+struct mmio_atsd_reg {
+   struct npu *npu;
+   int reg;
+};
+
+static void mmio_invalidate_wait(
+   struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS], bool flush)
+{
+   struct npu *npu;
+   int i, reg;
+
+   /* Wait for all invalidations to complete */
+   for (i = 0; i <= max_npu2_index; i++) {
+   if (mmio_atsd_reg[i].reg < 0)
+   continue;
+
+   /* Wait for completion */
+   npu = mmio_atsd_reg[i].npu;
+   reg = mmio_atsd_reg[i].reg;
+   while (__raw_readq(npu->mmio_atsd_regs[reg] + XTS_ATSD_STAT))
+   cpu_relax();
+
+   put_mmio_atsd_reg(npu, reg);
+
+   /*
+* The GPU requires two flush ATSDs to ensure all entries have
+* been flushed. We use PID 0 as it will never be used for a
+* process on the GPU.
+*/
+   if (flush)
+   mmio_invalidate_pid(npu, 0, 1);
+   }
+}
+
 /*
  * Invalidate either a single address or an entire PID depending on
  * the value of va.
  */
 static void mmio_invalidate(struct npu_context *npu_context, int va,
-   unsigned long address)
+   unsigned long address, bool flush)
 {
-   int i, j, reg;
+   int i, j;
struct npu *npu;
struct pnv_phb *nphb;
struct pci_dev *npdev;
-   struct {
-   struct npu *npu;
-   int reg;
-   } mmio_atsd_reg[NV_MAX_NPUS];
+   struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS];
unsigned long pid = npu_context->mm->context.id;
 
/*
@@ -525,10 +562,11 @@ static void mmio_invalidate(struct npu_context 
*npu_context, int va,
 
if (va)
mmio_atsd_reg[i].reg =
-   mmio_invalidate_va(npu, address, pid);
+   mmio_invalidate_va(npu, address, pid,
+   flush);
else
  

Re: [RFC v2 06/12] powerpc: Program HPTE key protection bits.

2017-06-20 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Map the PTE protection key bits to the HPTE key protection bits,
> while creating HPTE entries.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +
>  arch/powerpc/include/asm/pkeys.h  | 7 +++
>  arch/powerpc/mm/hash_utils_64.c   | 5 +
>  3 files changed, 17 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index cfb8169..3d7872c 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -90,6 +90,8 @@
>  #define HPTE_R_PP0		ASM_CONST(0x8000000000000000)
>  #define HPTE_R_TS		ASM_CONST(0x4000000000000000)
>  #define HPTE_R_KEY_HI		ASM_CONST(0x3000000000000000)
> +#define HPTE_R_KEY_BIT0	ASM_CONST(0x2000000000000000)
> +#define HPTE_R_KEY_BIT1	ASM_CONST(0x1000000000000000)
>  #define HPTE_R_RPN_SHIFT	12
>  #define HPTE_R_RPN		ASM_CONST(0x0ffffffffffff000)
>  #define HPTE_R_RPN_3_0		ASM_CONST(0x01fffffffffff000)
> @@ -104,6 +106,9 @@
>  #define HPTE_R_C		ASM_CONST(0x0000000000000080)
>  #define HPTE_R_R		ASM_CONST(0x0000000000000100)
>  #define HPTE_R_KEY_LO		ASM_CONST(0x0000000000000e00)
> +#define HPTE_R_KEY_BIT2	ASM_CONST(0x0000000000000800)
> +#define HPTE_R_KEY_BIT3	ASM_CONST(0x0000000000000400)
> +#define HPTE_R_KEY_BIT4	ASM_CONST(0x0000000000000200)
> 

Should we indicate/document how these 5 bits are not contiguous
in the HPTE format for any given real page?

>  #define HPTE_V_1TB_SEG	ASM_CONST(0x4000000000000000)
>  #define HPTE_V_VRMA_MASK	ASM_CONST(0x4001ffffff000000)
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 0f3dca8..9b6820d 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -27,6 +27,13 @@
>   ((vm_flags & VM_PKEY_BIT3) ? H_PAGE_PKEY_BIT1 : 0x0UL) | \
>   ((vm_flags & VM_PKEY_BIT4) ? H_PAGE_PKEY_BIT0 : 0x0UL))
> 
> +#define calc_pte_to_hpte_pkey_bits(pteflags) \
> + (((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) |\
> + ((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) | \
> + ((pteflags & H_PAGE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) | \
> + ((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) | \
> + ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL))
> +

We can drop calc_ in here. pte_to_hpte_pkey_bits should be
sufficient.



Re: [RFC v2 07/12] powerpc: Macro the mask used for checking DSI exception

2017-06-20 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Replace the magic number used to check for DSI exception
> with a meaningful value.
> 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/reg.h   | 9 -
>  arch/powerpc/kernel/exceptions-64s.S | 2 +-
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 7e50e47..2dcb8a1 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -272,16 +272,23 @@
>  #define SPRN_DAR 0x013   /* Data Address Register */
>  #define SPRN_DBCR0x136   /* e300 Data Breakpoint Control Reg */
>  #define SPRN_DSISR   0x012   /* Data Storage Interrupt Status Register */
> +#define   DSISR_BIT32		0x80000000	/* not defined */
>  #define   DSISR_NOHPTE		0x40000000	/* no translation found */
> +#define   DSISR_PAGEATTR_CONFLT	0x20000000	/* page attribute conflict */
> +#define   DSISR_BIT35		0x10000000	/* not defined */
>  #define   DSISR_PROTFAULT	0x08000000	/* protection fault */
>  #define   DSISR_BADACCESS	0x04000000	/* bad access to CI or G */
>  #define   DSISR_ISSTORE		0x02000000	/* access was a store */
>  #define   DSISR_DABRMATCH	0x00400000	/* hit data breakpoint */
> -#define   DSISR_NOSEGMENT	0x00200000	/* SLB miss */
>  #define   DSISR_KEYFAULT	0x00200000	/* Key fault */
> +#define   DSISR_BIT43		0x00100000	/* not defined */
>  #define   DSISR_UNSUPP_MMU	0x00080000	/* Unsupported MMU config */
>  #define   DSISR_SET_RC		0x00040000	/* Failed setting of R/C bits */
>  #define   DSISR_PGDIRFAULT	0x00020000	/* Fault on page directory */
> +#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> + DSISR_PAGEATTR_CONFLT | \
> + DSISR_BADACCESS |   \
> + DSISR_BIT43)

Sorry, missed this one. Seems like there are a couple of unnecessary
line additions in the subsequent patch, which adds the new PKEY
reason code.

-#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
-   DSISR_PAGEATTR_CONFLT | \
-   DSISR_BADACCESS |   \
+#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
+   DSISR_PAGEATTR_CONFLT | \
+   DSISR_BADACCESS |   \
+   DSISR_KEYFAULT |\
DSISR_BIT43)





unsubscribe

2017-06-20 Thread Gary Thomas

Sadly, after >20 years


Re: [PATCH V2 1/2] hwmon: (ibmpowernv) introduce a legacy_compatibles array

2017-06-20 Thread Cédric Le Goater
On 06/20/2017 09:15 AM, Shilpasri G Bhat wrote:
> 
> 
> On 06/20/2017 11:36 AM, Cédric Le Goater wrote:
>> On 06/20/2017 07:08 AM, Shilpasri G Bhat wrote:
>>> From: Cédric Le Goater 
>>>
>>> Today, the type of a PowerNV sensor system is determined with the
>>> "compatible" property for legacy Firmwares and with the "sensor-type"
>>> for newer ones. The same array of strings is used for both to do the
>>> matching, and this raises some issues when introducing new sensor types.
>>>
>>> Let's introduce two different arrays (legacy and current) to make
>>> things easier for new sensor types.
>>>
>>> Signed-off-by: Cédric Le Goater 
>>> Tested-by: Shilpasri G Bhat 
>>
>> Did you test on a Tuleta (IBM Power) system?
> 
> I have tested this patch on P9 FSP and Firestone.

OK. I just gave it a try on a Tuleta, P8 FSP, IBM Power system.
Looks good.

Thanks,

C.

> 
>>
>> Thanks,
>>
>> C. 
>>
>>> ---
>>>  drivers/hwmon/ibmpowernv.c | 26 ++
>>>  1 file changed, 18 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
>>> index 862b832..6d8909c 100644
>>> --- a/drivers/hwmon/ibmpowernv.c
>>> +++ b/drivers/hwmon/ibmpowernv.c
>>> @@ -55,17 +55,27 @@ enum sensors {
>>>  
>>>  #define INVALID_INDEX (-1U)
>>>  
>>> +/*
>>> + * 'compatible' string properties for sensor types as defined in old
>>> + * PowerNV firmware (skiboot). These are ordered as 'enum sensors'.
>>> + */
>>> +static const char * const legacy_compatibles[] = {
>>> +   "ibm,opal-sensor-cooling-fan",
>>> +   "ibm,opal-sensor-amb-temp",
>>> +   "ibm,opal-sensor-power-supply",
>>> +   "ibm,opal-sensor-power"
>>> +};
>>> +
>>>  static struct sensor_group {
>>> -   const char *name;
>>> -   const char *compatible;
>>> +   const char *name; /* matches property 'sensor-type' */
>>> struct attribute_group group;
>>> u32 attr_count;
>>> u32 hwmon_index;
>>>  } sensor_groups[] = {
>>> -   {"fan", "ibm,opal-sensor-cooling-fan"},
>>> -   {"temp", "ibm,opal-sensor-amb-temp"},
>>> -   {"in", "ibm,opal-sensor-power-supply"},
>>> -   {"power", "ibm,opal-sensor-power"}
>>> +   { "fan"   },
>>> +   { "temp"  },
>>> +   { "in"},
>>> +   { "power" }
>>>  };
>>>  
>>>  struct sensor_data {
>>> @@ -239,8 +249,8 @@ static int get_sensor_type(struct device_node *np)
>>> enum sensors type;
>>> const char *str;
>>>  
>>> -   for (type = 0; type < MAX_SENSOR_TYPE; type++) {
>>> -   if (of_device_is_compatible(np, sensor_groups[type].compatible))
>>> +   for (type = 0; type < ARRAY_SIZE(legacy_compatibles); type++) {
>>> +   if (of_device_is_compatible(np, legacy_compatibles[type]))
>>> return type;
>>> }
>>>  
>>>
>>
> 



[PATCH] powerpc/time: Fix tracing in time.c

2017-06-20 Thread Santosh Sivaraj
Since trace_clock is in a different file and already marked with notrace,
enable tracing in time.c by removing it from the disabled list in Makefile.
Also annotate clocksource read functions and sched_clock with notrace.

Testing: Timer and ftrace selftests run with different trace clocks.

Acked-by: Naveen N. Rao 
Signed-off-by: Santosh Sivaraj 
---
 arch/powerpc/kernel/Makefile | 2 --
 arch/powerpc/kernel/time.c   | 6 +++---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e132902..0845eeb 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -25,8 +25,6 @@ CFLAGS_REMOVE_cputable.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_prom_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_btext.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_prom.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
-# timers used by tracing
-CFLAGS_REMOVE_time.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 endif
 
 obj-y  := cputable.o ptrace.o syscalls.o \
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 2b33cfa..896ba1a 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -675,7 +675,7 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
  * the high 64 bits of a * b, i.e. (a * b) >> 64, where a and b
  * are 64-bit unsigned numbers.
  */
-unsigned long long sched_clock(void)
+notrace unsigned long long sched_clock(void)
 {
if (__USE_RTC())
return get_rtc();
@@ -823,12 +823,12 @@ void read_persistent_clock(struct timespec *ts)
 }
 
 /* clocksource code */
-static u64 rtc_read(struct clocksource *cs)
+static notrace u64 rtc_read(struct clocksource *cs)
 {
return (u64)get_rtc();
 }
 
-static u64 timebase_read(struct clocksource *cs)
+static notrace u64 timebase_read(struct clocksource *cs)
 {
return (u64)get_tb();
 }
-- 
2.9.4



Re: [RFC v2 08/12] powerpc: Handle exceptions caused by violation of pkey protection.

2017-06-20 Thread Anshuman Khandual
On 06/17/2017 09:22 AM, Ram Pai wrote:
> Handle Data and Instruction exceptions caused by memory
> protection-key.
> 
> Signed-off-by: Ram Pai 
> (cherry picked from commit a5e5217619a0c475fe0cacc3b0cf1d3d33c79a09)

Which tree does this commit belong to?

> 
> Conflicts:
>   arch/powerpc/include/asm/reg.h
>   arch/powerpc/kernel/exceptions-64s.S
> ---
>  arch/powerpc/include/asm/mmu_context.h | 12 +
>  arch/powerpc/include/asm/pkeys.h   |  9 
>  arch/powerpc/include/asm/reg.h |  7 +--
>  arch/powerpc/mm/fault.c| 21 +++-
>  arch/powerpc/mm/pkeys.c | 90 ++
>  5 files changed, 134 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index da7e943..71fffe0 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -175,11 +175,23 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
>  {
>  }
> 
> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> +bool arch_pte_access_permitted(pte_t pte, bool write);
> +bool arch_vma_access_permitted(struct vm_area_struct *vma,
> + bool write, bool execute, bool foreign);
> +#else /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> +static inline bool arch_pte_access_permitted(pte_t pte, bool write)
> +{
> + /* by default, allow everything */
> + return true;
> +}
>  static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
>   bool write, bool execute, bool foreign)
>  {
>   /* by default, allow everything */
>   return true;
>  }

Right, these are the two functions the core VM expects the
arch to provide.

> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> +
>  #endif /* __KERNEL__ */
>  #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 9b6820d..405e7db 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -14,6 +14,15 @@
>   VM_PKEY_BIT3 | \
>   VM_PKEY_BIT4)
> 
> +static inline u16 pte_flags_to_pkey(unsigned long pte_flags)
> +{
> + return ((pte_flags & H_PAGE_PKEY_BIT4) ? 0x1 : 0x0) |
> + ((pte_flags & H_PAGE_PKEY_BIT3) ? 0x2 : 0x0) |
> + ((pte_flags & H_PAGE_PKEY_BIT2) ? 0x4 : 0x0) |
> + ((pte_flags & H_PAGE_PKEY_BIT1) ? 0x8 : 0x0) |
> + ((pte_flags & H_PAGE_PKEY_BIT0) ? 0x10 : 0x0);
> +}

Add defines for the above 0x1, 0x2, 0x4, 0x8 etc ?

> +
>  #define pkey_to_vmflag_bits(key) (((key & 0x1UL) ? VM_PKEY_BIT0 : 0x0UL) | \
>   ((key & 0x2UL) ? VM_PKEY_BIT1 : 0x0UL) |\
>   ((key & 0x4UL) ? VM_PKEY_BIT2 : 0x0UL) |\
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 2dcb8a1..a11977f 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -285,9 +285,10 @@
>  #define   DSISR_UNSUPP_MMU 0x00080000  /* Unsupported MMU config */
>  #define   DSISR_SET_RC     0x00040000  /* Failed setting of R/C bits */
>  #define   DSISR_PGDIRFAULT 0x00020000  /* Fault on page directory */
> -#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> -		DSISR_PAGEATTR_CONFLT | \
> -		DSISR_BADACCESS |   \
> +#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 |   \
> +		DSISR_PAGEATTR_CONFLT | \
> +		DSISR_BADACCESS |   \
> +		DSISR_KEYFAULT |\
>  		DSISR_BIT43)

This should have been cleaned up before adding the new
DSISR_KEYFAULT reason code into it. But I guess it's
okay.

>  #define SPRN_TBRL0x10C   /* Time Base Read Lower Register (user, R/O) */
>  #define SPRN_TBRU0x10D   /* Time Base Read Upper Register (user, R/O) */
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 3a7d580..c31624f 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -216,9 +216,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>* bits we are interested in.  But there are some bits which
>* indicate errors in DSISR but can validly be set in SRR1.
>*/
> - if (trap == 0x400)
> + if (trap == 0x400) {
>   error_code &= 0x4820;
> - else
> + flags |= FAULT_FLAG_INSTRUCTION;
> + } else
>   is_write = error_code & DSISR_ISSTORE;
>  #else

Why is FAULT_FLAG_INSTRUCTION added here?

>   is_write = error_code & ESR_DST;
> @@ -261,6 +262,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>   }
>  #endif
> 
> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> + if (error_code & DSISR_KEYFAULT) {
> + code = SEGV_PKUERR;
> +  

[BUG][next-20170619][347de24] PowerPC boot fails with Oops

2017-06-20 Thread Abdul Haleem
Hi,

commit: 347de24 (powerpc/64s: implement arch-specific hardlockup
watchdog)

linux-next fails to boot on PowerPC Bare-metal box.

Test: boot
Machine type: Power 8 Bare-metal
Kernel: 4.12.0-rc5-next-20170619
gcc: version 4.8.5


In file arch/powerpc/kernel/watchdog.c

void soft_nmi_interrupt(struct pt_regs *regs)
{
unsigned long flags;
int cpu = raw_smp_processor_id();
u64 tb;

if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
return;

>>> nmi_enter();
tb = get_tb();



commit 347de24231df9f82969e2de3ad9f6976f1856a0f
Author: Nicholas Piggin 
Date:   Sat Jun 17 09:33:56 2017 +1000

powerpc/64s: implement arch-specific hardlockup watchdog

Implement an arch-specific watchdog rather than use the perf-based
hardlockup detector.

The new watchdog takes the soft-NMI directly, rather than going
through perf. Perf interrupts are to be made maskable in future, so
that would prevent the perf detector from working in those regions.



boot logs:
--
cpuidle: using governor menu
pstore: using zlib compression
pstore: Registered nvram as persistent store backend
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048 
NUMA 
PowerNV
Modules linked in:
CPU: 67 PID: 0 Comm: swapper/67 Not tainted 4.12.0-rc5-next-20170619 #1
task: c00f272be700 task.stack: c00f2736c000
NIP: c002c5fc LR: c002c5e8 CTR: c016f570
REGS: c0003fcd7a00 TRAP: 0700   Not tainted
(4.12.0-rc5-next-20170619)
MSR: 90021033 
  CR: 22004022  XER: 2000  
CFAR: c0149c6c SOFTE: 0 
GPR00: c002c5e8 c0003fcd7c80 c105e900
 
GPR04:  00073388 c00fff7cf014
 
GPR08: 000ffea9 0010 4000
 
GPR12: 90009033 cfd57080 c00f2736ff90
 
GPR16:   40376a80
40376ac8 
GPR20: c00ffe63 0001 0002
 
GPR24:  c00f2736c000 c00f2736c080
0008 
GPR28: c0003fcd7d80 0003 0008
0043 
NIP [c002c5fc] soft_nmi_interrupt+0x9c/0x2e0
LR [c002c5e8] soft_nmi_interrupt+0x88/0x2e0
Call Trace:
Instruction dump:
eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c7c1b78 4811d615 6000 
78290464 8129000c 552902d6 79290020 <0b09> 78290464 8149000c
3d4a0011 
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
random: print_oops_end_marker+0x6c/0xa0 get_random_bytes called with
crng_init=0
---[ end trace 9756c1a885c69f33 ]---
-- 


Regard's

Abdul Haleem
IBM Linux Technology Centre


 kernel:kexec: Starting new kernel
[  205.035822] kexec: waiting for cpu 48 (physical 168) to enter 1 state
[  205.035955] kexec: waiting for cpu 0 (physical 32) to enter OPAL
[  205.036870] kexec: waiting for cpu 2 (physical 34) to enter OPAL
[  205.037038] kexec: waiting for cpu 21 (physical 53) to enter OPAL
[0.00] opal: OPAL detected !
[0.00] Page sizes from device-tree:
[0.00] base_shift=12: shift=12, sllp=0x, avpnm=0x, 
tlbiel=1, penc=0
[0.00] base_shift=12: shift=16, sllp=0x, avpnm=0x, 
tlbiel=1, penc=7
[0.00] base_shift=12: shift=24, sllp=0x, avpnm=0x, 
tlbiel=1, penc=56
[0.00] base_shift=16: shift=16, sllp=0x0110, avpnm=0x, 
tlbiel=1, penc=1
[0.00] base_shift=16: shift=24, sllp=0x0110, avpnm=0x, 
tlbiel=1, penc=8
[0.00] base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, 
tlbiel=0, penc=0
[0.00] base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, 
tlbiel=0, penc=3
[0.00] Using 1TB segments
[0.00] Initializing hash mmu with SLB
[0.00] Linux version 4.12.0-rc5-next-20170619 
(r...@ltc-test-ci3.aus.stglabs.ibm.com) (gcc version 4.8.5 20150623 (Red Hat 
4.8.5-11) (GCC) ) #1 SMP Tue Jun 20 12:17:53 IST 2017
[0.00] Found initrd at 0xc291:0xc3bea97a
[0.00] Using PowerNV machine description
[0.00] bootconsole [udbg0] enabled
[0.00] CPU maps initialized for 8 threads per core
 -> smp_release_cpus()
spinning_secondaries = 79
 <- smp_release_cpus()
[0.00] -
[0.00] ppc64_pft_size= 0x0
[0.00] phys_mem_size = 0x10
[0.00] dcache_bsize  = 0x80
[0.00] icache_bsize  = 0x80
[0.00] cpu_features  = 0x17fc7aed18500249
[0.00]   possible= 0x5fff

Re: [PATCH V2 1/2] hwmon: (ibmpowernv) introduce a legacy_compatibles array

2017-06-20 Thread Shilpasri G Bhat


On 06/20/2017 11:36 AM, Cédric Le Goater wrote:
> On 06/20/2017 07:08 AM, Shilpasri G Bhat wrote:
>> From: Cédric Le Goater 
>>
>> Today, the type of a PowerNV sensor is determined with the
>> "compatible" property for legacy firmwares and with the "sensor-type"
>> property for newer ones. The same array of strings is used for both
>> to do the matching, and this raises some issues when introducing new
>> sensor types.
>>
>> Let's introduce two different arrays (legacy and current) to make
>> things easier for new sensor types.
>>
>> Signed-off-by: Cédric Le Goater 
>> Tested-by: Shilpasri G Bhat 
> 
> Did you test on a Tuleta (IBM Power) system ? 

I have tested this patch on P9 FSP and Firestone.

> 
> Thanks,
> 
> C. 
> 
>> ---
>>  drivers/hwmon/ibmpowernv.c | 26 ++
>>  1 file changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
>> index 862b832..6d8909c 100644
>> --- a/drivers/hwmon/ibmpowernv.c
>> +++ b/drivers/hwmon/ibmpowernv.c
>> @@ -55,17 +55,27 @@ enum sensors {
>>  
>>  #define INVALID_INDEX (-1U)
>>  
>> +/*
>> + * 'compatible' string properties for sensor types as defined in old
>> + * PowerNV firmware (skiboot). These are ordered as 'enum sensors'.
>> + */
>> +static const char * const legacy_compatibles[] = {
>> +"ibm,opal-sensor-cooling-fan",
>> +"ibm,opal-sensor-amb-temp",
>> +"ibm,opal-sensor-power-supply",
>> +"ibm,opal-sensor-power"
>> +};
>> +
>>  static struct sensor_group {
>> -const char *name;
>> -const char *compatible;
>> +const char *name; /* matches property 'sensor-type' */
>>  struct attribute_group group;
>>  u32 attr_count;
>>  u32 hwmon_index;
>>  } sensor_groups[] = {
>> -{"fan", "ibm,opal-sensor-cooling-fan"},
>> -{"temp", "ibm,opal-sensor-amb-temp"},
>> -{"in", "ibm,opal-sensor-power-supply"},
>> -{"power", "ibm,opal-sensor-power"}
>> +{ "fan"   },
>> +{ "temp"  },
>> +{ "in"},
>> +{ "power" }
>>  };
>>  
>>  struct sensor_data {
>> @@ -239,8 +249,8 @@ static int get_sensor_type(struct device_node *np)
>>  enum sensors type;
>>  const char *str;
>>  
>> -for (type = 0; type < MAX_SENSOR_TYPE; type++) {
>> -if (of_device_is_compatible(np, sensor_groups[type].compatible))
>> +for (type = 0; type < ARRAY_SIZE(legacy_compatibles); type++) {
>> +if (of_device_is_compatible(np, legacy_compatibles[type]))
>>  return type;
>>  }
>>  
>>
> 



Re: [RFC PATCH 0/7 v1] powerpc: Memory Protection Keys

2017-06-20 Thread Pavel Machek
Hi!

> Memory protection keys enable applications to protect their
> address space from inadvertent access or corruption by the
> application itself.
> 
> The overall idea:
> 
>  A process allocates a key and associates it with an
>  address range within its address space. The process
>  then can dynamically set read/write permissions on
>  the key without involving the kernel. Any code that
>  violates the permissions of the address space, as
>  defined by its associated key, will receive a
>  segmentation fault.

Do you have some documentation how userspace should use this? Will it
be possible to hide details in libc so that it works across
architectures? Do you have some kind of library that hides them?

Where would you like it to be used? Web browsers?

How does it interact with ptrace()? With /dev/mem? With /proc/XXX/mem?
Will it enable malware to become very hard to understand?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




Re: [PATCH] powerpc/time: Fix tracing in time.c

2017-06-20 Thread Naveen N . Rao
On 2017/06/20 11:50AM, Santosh Sivaraj wrote:
> Since trace_clock is in a different file and already marked with notrace,
> enable tracing in time.c by removing it from the disabled list in Makefile.
> Also annotate clocksource read functions and sched_clock with notrace.
> 
> Testing: Timer and ftrace selftests run with different trace clocks.
> 
> CC: Naveen N. Rao 
> Signed-off-by: Santosh Sivaraj 

Thanks for doing this! Apart from the minor nit below:
Acked-by: Naveen N. Rao 

> ---
>  arch/powerpc/kernel/Makefile | 2 --
>  arch/powerpc/kernel/time.c   | 6 +++---
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index e132902..0845eeb 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -25,8 +25,6 @@ CFLAGS_REMOVE_cputable.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_prom_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_btext.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_prom.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
> -# timers used by tracing
> -CFLAGS_REMOVE_time.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  endif
> 
>  obj-y:= cputable.o ptrace.o syscalls.o \
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 2b33cfa..6d10c5f 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -675,7 +675,7 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
>   * the high 64 bits of a * b, i.e. (a * b) >> 64, where a and b
>   * are 64-bit unsigned numbers.
>   */
> -unsigned long long sched_clock(void)
> +unsigned long long notrace sched_clock(void)

For the sake of consistency, it's probably better to add the notrace 
annotation before the return values, though I see that the prototype in 
include/sched.h has used this order.

- Naveen

>  {
>   if (__USE_RTC())
>   return get_rtc();
> @@ -823,12 +823,12 @@ void read_persistent_clock(struct timespec *ts)
>  }
> 
>  /* clocksource code */
> -static u64 rtc_read(struct clocksource *cs)
> +static notrace u64 rtc_read(struct clocksource *cs)
>  {
>   return (u64)get_rtc();
>  }
> 
> -static u64 timebase_read(struct clocksource *cs)
> +static notrace u64 timebase_read(struct clocksource *cs)
>  {
>   return (u64)get_tb();
>  }
> -- 
> 2.9.4
>