Re: [RFC v2 07/12] powerpc: Macro the mask used for checking DSI exception
On Tue, Jun 20, 2017 at 01:44:25PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Replace the magic number used to check for DSI exception
> > with a meaningful value.
> >
> > Signed-off-by: Ram Pai
> > ---
> >  arch/powerpc/include/asm/reg.h       | 9 -
> >  arch/powerpc/kernel/exceptions-64s.S | 2 +-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> > index 7e50e47..2dcb8a1 100644
> > --- a/arch/powerpc/include/asm/reg.h
> > +++ b/arch/powerpc/include/asm/reg.h
> > @@ -272,16 +272,23 @@
> >  #define SPRN_DAR	0x013	/* Data Address Register */
> >  #define SPRN_DBCR	0x136	/* e300 Data Breakpoint Control Reg */
> >  #define SPRN_DSISR	0x012	/* Data Storage Interrupt Status Register */
> > +#define   DSISR_BIT32		0x8000	/* not defined */
> >  #define   DSISR_NOHPTE		0x4000	/* no translation found */
> > +#define   DSISR_PAGEATTR_CONFLT	0x2000	/* page attribute conflict */
> > +#define   DSISR_BIT35		0x1000	/* not defined */
> >  #define   DSISR_PROTFAULT	0x0800	/* protection fault */
> >  #define   DSISR_BADACCESS	0x0400	/* bad access to CI or G */
> >  #define   DSISR_ISSTORE		0x0200	/* access was a store */
> >  #define   DSISR_DABRMATCH	0x0040	/* hit data breakpoint */
> > -#define   DSISR_NOSEGMENT	0x0020	/* SLB miss */
> >  #define   DSISR_KEYFAULT	0x0020	/* Key fault */
> > +#define   DSISR_BIT43		0x0010	/* not defined */
> >  #define   DSISR_UNSUPP_MMU	0x0008	/* Unsupported MMU config */
> >  #define   DSISR_SET_RC		0x0004	/* Failed setting of R/C bits */
> >  #define   DSISR_PGDIRFAULT	0x0002	/* Fault on page directory */
> > +#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> > +				DSISR_PAGEATTR_CONFLT | \
> > +				DSISR_BADACCESS | \
> > +				DSISR_BIT43)
>
> Sorry, missed this one. Seems like there are a couple of unnecessary
> line additions in the subsequent patch which adds the new PKEY
> reason code.
> -#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> -				DSISR_PAGEATTR_CONFLT | \
> -				DSISR_BADACCESS | \
> +#define   DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> +				DSISR_PAGEATTR_CONFLT | \
> +				DSISR_BADACCESS | \
> +				DSISR_KEYFAULT | \
> 				DSISR_BIT43)

I like to see them separately, one per line. But then you are right,
that is not the convention in this file. So will change it accordingly.

thanks,
RP

--
Ram Pai
Re: [RFC v2 06/12] powerpc: Program HPTE key protection bits.
On Tue, Jun 20, 2017 at 01:51:45PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Map the PTE protection key bits to the HPTE key protection bits,
> > while creating HPTE entries.
> >
> > Signed-off-by: Ram Pai
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +
> >  arch/powerpc/include/asm/pkeys.h              | 7 +++
> >  arch/powerpc/mm/hash_utils_64.c               | 5 +
> >  3 files changed, 17 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> > index cfb8169..3d7872c 100644
> > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> > @@ -90,6 +90,8 @@
> >  #define HPTE_R_PP0	ASM_CONST(0x8000)
> >  #define HPTE_R_TS	ASM_CONST(0x4000)
> >  #define HPTE_R_KEY_HI	ASM_CONST(0x3000)
> > +#define HPTE_R_KEY_BIT0	ASM_CONST(0x2000)
> > +#define HPTE_R_KEY_BIT1	ASM_CONST(0x1000)
> >  #define HPTE_R_RPN_SHIFT	12
> >  #define HPTE_R_RPN	ASM_CONST(0x0000)
> >  #define HPTE_R_RPN_3_0	ASM_CONST(0x01fff000)
> > @@ -104,6 +106,9 @@
> >  #define HPTE_R_C	ASM_CONST(0x0080)
> >  #define HPTE_R_R	ASM_CONST(0x0100)
> >  #define HPTE_R_KEY_LO	ASM_CONST(0x0e00)
> > +#define HPTE_R_KEY_BIT2	ASM_CONST(0x0800)
> > +#define HPTE_R_KEY_BIT3	ASM_CONST(0x0400)
> > +#define HPTE_R_KEY_BIT4	ASM_CONST(0x0200)
>
> Should we indicate/document how these 5 bits are not contiguous
> in the HPTE format for any given real page ?

I can, but it's all well documented in the ISA. In fact, all the bits
and the macros are a one-to-one translation from the ISA.

> >  #define HPTE_V_1TB_SEG	ASM_CONST(0x4000)
> >  #define HPTE_V_VRMA_MASK	ASM_CONST(0x4001ff00)
> >
> > diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> > index 0f3dca8..9b6820d 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -27,6 +27,13 @@
> >  	((vm_flags & VM_PKEY_BIT3) ? H_PAGE_PKEY_BIT1 : 0x0UL) | \
> >  	((vm_flags & VM_PKEY_BIT4) ? H_PAGE_PKEY_BIT0 : 0x0UL))
> >
> > +#define calc_pte_to_hpte_pkey_bits(pteflags) \
> > +	(((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) | \
> > +	((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) | \
> > +	((pteflags & H_PAGE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) | \
> > +	((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) | \
> > +	((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL))
> > +
>
> We can drop calc_ in here. pte_to_hpte_pkey_bits should be
> sufficient.

ok. will do.

thanks for your comments,
RP
Re: [RFC v2 02/12] powerpc: Free up four 64K PTE bits in 64K backed hpte pages.
On Tue, Jun 20, 2017 at 04:21:45PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> > in the 64K backed hpte pages. This along with the earlier
> > patch will entirely free up the four bits from 64K PTE.
> >
> > This patch does the following change to 64K PTE that is
> > backed by 64K hpte.
> >
> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> >
> > since bit 7 is now freed up, we move H_PAGE_BUSY from bit 9
> > to bit 7. Trying to minimize gaps so that contiguous bits
> > can be allocated if needed in the future.
> >
> > The second part of the PTE will hold
> > (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
>
> I still don't understand how we freed up the 5th bit which is
> used in the 5th patch. Was that bit never used for anything
> on 64K page size (64K and 4K mappings) ?

Yes, it was never used. So I gladly used it :-)

RP
Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.
On Tue, Jun 20, 2017 at 03:50:25PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
> > in the 4K backed hpte pages. These bits continue to be used
> > for 64K backed hpte pages in this patch, but will be freed
> > up in the next patch.
>
> The counting 3, 4, 5 and 6 are in BE format I believe, I was
> initially trying to see that from right to left as we normally
> do in the kernel and was getting confused. So basically these
> bits (which are only applicable for 64K mapping IIUC) are going
> to be freed up from the PTE format.
>
> #define _RPAGE_RSV1 0x1000UL
> #define _RPAGE_RSV2 0x0800UL
> #define _RPAGE_RSV3 0x0400UL
> #define _RPAGE_RSV4 0x0200UL
>
> As you have mentioned before this feature is available for 64K
> page size only and not for 4K mappings. So I assume we support
> both the combinations.
>
> * 64K mapping on 64K
> * 64K mapping on 4K

yes.

> These are the current users of the above bits
>
> #define H_PAGE_BUSY     _RPAGE_RSV1 /* software: PTE & hash are busy */
> #define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> #define H_PAGE_F_GIX    (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> #define H_PAGE_HASHPTE  _RPAGE_RPN43 /* PTE has associated HPTE */
>
> > The patch does the following change to the 64K PTE format
> >
> > H_PAGE_BUSY moves from bit 3 to bit 9
>
> and what is in there on bit 9 now ? This ?
>
> #define _RPAGE_SW2 0x00400
>
> which is used as
>
> #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
>
> which will not be required any more ?

I think you are reading bit 9 from right to left; the bit 9 I refer to
is from left to right, using the same numbering convention the ISA 3.0
uses. I know it is confusing; I will make a mention in the comment of
this patch to read it the big-endian way.

BTW: bit 9 is not used currently, so I am using it in this patch. But
this is a temporary move; H_PAGE_BUSY will move to bit 7 in the next
patch. I had to keep it at bit 9 because bit 7 is not yet entirely
freed up -- it is used by 64K PTE backed by 64K hpte.

> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> >
> > the four bits((H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot
> > is initialized to 0xF indicating an invalid slot. If a hpte
> > gets cached in a 0xF slot(i.e 7th slot of secondary), it is
> > released immediately. In other words, even though 0xF is a
>
> Release immediately means we attempt again for a new hash slot ?

yes.

> > valid slot we discard and consider it as an invalid
> > slot;i.e hpte_soft_invalid(). This gives us an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
>
> So we have to see the slot number in the second half for each PTE to
> figure out if it has got a valid slot in the hash page table.

yes.

> > When we release a hpte in the 0xF slot we also release a
> > legitimate primary slot and unmap that entry. This is to
> > ensure that we do get a legitimate non-0xF slot the next time we
> > retry for a slot.
>
> Okay.
>
> > Though treating 0xF slot as invalid reduces the number of available
> > slots and may have an effect on the performance, the probability
> > of hitting a 0xF is extremely low.
>
> Why you say that ? I thought every slot number has the same probability
> of hit from the hash function.

Every hash bucket has the same probability, but every slot within a
hash bucket is filled in sequentially. So it takes 15 hptes hashing to
the same bucket before we get to the 15th slot in the secondary.

> > Compared to the current scheme, the above described scheme reduces
> > the number of false hash table updates significantly and has the
>
> How it reduces false hash table updates ?

Earlier, we had 1 bit allocated in the first part of the 64K PTE for
four consecutive 4K hptes. If any one 4K hpte got hashed in, the bit
got set, which means any time we faulted on the remaining three 4K
hptes, we saw the bit already set and tried to erroneously update that
hpte. So we had a 75% update error rate -- functionally not bad, but
bad from a performance point of view.

With the current scheme, we decide if a 4K slot is valid by looking at
its value rather than depending on a bit in the main PTE, so there is
no chance of being misled, and hence no chance of trying to update an
invalid hpte. This should improve performance and at the same time give
us four valuable PTE bits.

> > added advantage of releasing four valuable PTE bits for other
> > purpose.
> >
> > This idea was jointly
Re: [RFC v2 03/12] powerpc: Implement sys_pkey_alloc and sys_pkey_free system call.
On Mon, Jun 19, 2017 at 10:18:01PM +1000, Michael Ellerman wrote:
> Hi Ram,
>
> Ram Pai writes:
> > Sys_pkey_alloc() allocates and returns available pkey
> > Sys_pkey_free() frees up the pkey.
> >
> > Total 32 keys are supported on powerpc. However pkey 0,1 and 31
> > are reserved. So effectively we have 29 pkeys.
> >
> > Signed-off-by: Ram Pai
> > ---
> >  include/linux/mm.h                     | 31 ---
> >  include/uapi/asm-generic/mman-common.h |  2 +-
>
> Those changes need to be split out and acked by mm folks.
>
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 7cb17c6..34ddac7 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -204,26 +204,35 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
> >  #define VM_MERGEABLE	0x8000	/* KSM may merge identical pages */
> >
> >  #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
> > -#define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
> > -#define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit architectures */
> > -#define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit architectures */
> > -#define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit architectures */
> > +#define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit arch */
> > +#define VM_HIGH_ARCH_BIT_1	33	/* bit only usable on 64-bit arch */
> > +#define VM_HIGH_ARCH_BIT_2	34	/* bit only usable on 64-bit arch */
> > +#define VM_HIGH_ARCH_BIT_3	35	/* bit only usable on 64-bit arch */
>
> Please don't change the comments, it makes the diff harder to read.

The lines were surpassing 80 columns; I tried to compress the comments
without losing meaning. Will restore.

> You're actually just adding this AFAICS:
>
> > +#define VM_HIGH_ARCH_BIT_4	36	/* bit only usable on 64-bit arch */
>
> >  #define VM_HIGH_ARCH_0	BIT(VM_HIGH_ARCH_BIT_0)
> >  #define VM_HIGH_ARCH_1	BIT(VM_HIGH_ARCH_BIT_1)
> >  #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
> >  #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
> > +#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
> >  #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
> >
> >  #if defined(CONFIG_X86)
>      ^
> >  # define VM_PAT	VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
> > -#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
> > -# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
> > -# define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
> > -# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
> > -# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
> > -# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
> > -#endif
> > +#if defined(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) \
> > +	|| defined(CONFIG_PPC64_MEMORY_PROTECTION_KEYS)
> > +#define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
> > +#define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 5-bit value */
>                                                                      ^ 4?
> > +#define VM_PKEY_BIT1	VM_HIGH_ARCH_1
> > +#define VM_PKEY_BIT2	VM_HIGH_ARCH_2
> > +#define VM_PKEY_BIT3	VM_HIGH_ARCH_3
> > +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
>
> That appears to be inside an #if defined(CONFIG_X86) ?
>
> >  #elif defined(CONFIG_PPC)
>        ^
> Should be CONFIG_PPC64_MEMORY_PROTECTION_KEYS no?

It's a little garbled. Will fix it.

> > +#define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 5-bit value */
> > +#define VM_PKEY_BIT1	VM_HIGH_ARCH_1
> > +#define VM_PKEY_BIT2	VM_HIGH_ARCH_2
> > +#define VM_PKEY_BIT3	VM_HIGH_ARCH_3
> > +#define VM_PKEY_BIT4	VM_HIGH_ARCH_4	/* intel does not use this bit */
> > +					/* but reserved for future expansion */
>
> But this hunk is for PPC ?
>
> Is it OK for the other arches & generic code to add another VM_PKEY_BIT4 ?

No, it has to be PPC specific.

> Do you need to update show_smap_vma_flags() ?
>
> >  # define VM_SAO	VM_ARCH_1	/* Strong Access Ordering (powerpc) */
> >  #elif defined(CONFIG_PARISC)
> >  # define VM_GROWSUP	VM_ARCH_1
>
> > diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> > index 8c27db0..b13ecc6 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -76,5 +76,5 @@
> >  #define PKEY_DISABLE_WRITE	0x2
> >  #define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
> >  				 PKEY_DISABLE_WRITE)
> > -
> > +#define PKEY_DISABLE_EXECUTE	0x4
>
> How you can set that if it's not in PKEY_ACCESS_MASK?

I was wondering how to handle this. x86 does not support this flag.
However powerpc has the ability to enable/disable execute permission
on a key. It cannot be done from userspace, but can be done through
the sys_mprotec
[PATCH] Documentation: remove overlay-notes reference to non-existent file
From: Frank Rowand File dt-object-internal.txt does not exist. Remove a reference to it and fix up tags for references to other files. Reported-by: afaer...@suse.de Signed-off-by: Frank Rowand --- Documentation/devicetree/overlay-notes.txt | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/Documentation/devicetree/overlay-notes.txt b/Documentation/devicetree/overlay-notes.txt index d418a6ce9812..eb7f2685fda1 100644 --- a/Documentation/devicetree/overlay-notes.txt +++ b/Documentation/devicetree/overlay-notes.txt @@ -3,8 +3,7 @@ Device Tree Overlay Notes This document describes the implementation of the in-kernel device tree overlay functionality residing in drivers/of/overlay.c and is a -companion document to Documentation/devicetree/dt-object-internal.txt[1] & -Documentation/devicetree/dynamic-resolution-notes.txt[2] +companion document to Documentation/devicetree/dynamic-resolution-notes.txt[1] How overlays work - @@ -16,8 +15,7 @@ Since the kernel mainly deals with devices, any new device node that result in an active device should have it created while if the device node is either disabled or removed all together, the affected device should be deregistered. -Lets take an example where we have a foo board with the following base tree -which is taken from [1]. +Lets take an example where we have a foo board with the following base tree: foo.dts - /* FOO platform */ @@ -36,7 +34,7 @@ which is taken from [1]. }; foo.dts - -The overlay bar.dts, when loaded (and resolved as described in [2]) should +The overlay bar.dts, when loaded (and resolved as described in [1]) should bar.dts - /plugin/; /* allow undefined label references and record them */ -- Frank Rowand
[PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd
Calling arch_update_cpu_topology from a CPU hotplug state machine callback hits a deadlock because the function tries to get a read lock on cpu_hotplug_lock while the state machine still holds a write lock on it. Since all callers of arch_update_cpu_topology except rtasd already hold cpu_hotplug_lock, this patch changes the function to use stop_machine_cpuslocked and creates a separate function for rtasd which still tries to obtain the lock. Michael Bringmann investigated the bug and provided a detailed analysis of the deadlock on this previous RFC for an alternate solution: https://patchwork.ozlabs.org/patch/771293/ Signed-off-by: Thiago Jung Bauermann --- Notes: This patch applies on tip/smp/hotplug, it should probably be carried there. arch/powerpc/include/asm/topology.h | 6 ++ arch/powerpc/kernel/rtasd.c | 2 +- arch/powerpc/mm/numa.c | 22 +++--- 3 files changed, 26 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 8b3b46b7b0f2..a2d36b7703ae 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -43,6 +43,7 @@ extern void __init dump_numa_cpu_topology(void); extern int sysfs_add_device_to_node(struct device *dev, int nid); extern void sysfs_remove_device_from_node(struct device *dev, int nid); +extern int numa_update_cpu_topology(bool cpus_locked); #else @@ -57,6 +58,11 @@ static inline void sysfs_remove_device_from_node(struct device *dev, int nid) { } + +static inline int numa_update_cpu_topology(bool cpus_locked) +{ + return 0; +} #endif /* CONFIG_NUMA */ #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR) diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c index 3650732639ed..0f0b1b2f3b60 100644 --- a/arch/powerpc/kernel/rtasd.c +++ b/arch/powerpc/kernel/rtasd.c @@ -283,7 +283,7 @@ static void prrn_work_fn(struct work_struct *work) * the RTAS event. 
*/ pseries_devicetree_update(-prrn_update_scope); - arch_update_cpu_topology(); + numa_update_cpu_topology(false); } static DECLARE_WORK(prrn_work, prrn_work_fn); diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 371792e4418f..b95c584ce19d 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1311,8 +1311,10 @@ static int update_lookup_table(void *data) /* * Update the node maps and sysfs entries for each cpu whose home node * has changed. Returns 1 when the topology has changed, and 0 otherwise. + * + * cpus_locked says whether we already hold cpu_hotplug_lock. */ -int arch_update_cpu_topology(void) +int numa_update_cpu_topology(bool cpus_locked) { unsigned int cpu, sibling, changed = 0; struct topology_update_data *updates, *ud; @@ -1400,15 +1402,23 @@ int arch_update_cpu_topology(void) if (!cpumask_weight(&updated_cpus)) goto out; - stop_machine(update_cpu_topology, &updates[0], &updated_cpus); + if (cpus_locked) + stop_machine_cpuslocked(update_cpu_topology, &updates[0], + &updated_cpus); + else + stop_machine(update_cpu_topology, &updates[0], &updated_cpus); /* * Update the numa-cpu lookup table with the new mappings, even for * offline CPUs. It is best to perform this update from the stop- * machine context. */ - stop_machine(update_lookup_table, &updates[0], + if (cpus_locked) + stop_machine_cpuslocked(update_lookup_table, &updates[0], cpumask_of(raw_smp_processor_id())); + else + stop_machine(update_lookup_table, &updates[0], +cpumask_of(raw_smp_processor_id())); for (ud = &updates[0]; ud; ud = ud->next) { unregister_cpu_under_node(ud->cpu, ud->old_nid); @@ -1426,6 +1436,12 @@ int arch_update_cpu_topology(void) return changed; } +int arch_update_cpu_topology(void) +{ + lockdep_assert_cpus_held(); + return numa_update_cpu_topology(true); +} + static void topology_work_fn(struct work_struct *work) { rebuild_sched_domains(); -- 2.7.4
[PATCH v6 4/4] of: detect invalid phandle in overlay
From: Frank Rowand Overlays are not allowed to modify phandle values of previously existing nodes because there is no information available to allow fixup up properties that use the previously existing phandle. Signed-off-by: Frank Rowand --- drivers/of/overlay.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index ca0b85f5deb1..20ab49d2f7a4 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -130,6 +130,10 @@ static int of_overlay_apply_single_device_node(struct of_overlay *ov, /* NOTE: Multiple mods of created nodes not supported */ tchild = of_get_child_by_name(target, cname); if (tchild != NULL) { + /* new overlay phandle value conflicts with existing value */ + if (child->phandle) + return -EINVAL; + /* apply overlay recursively */ ret = of_overlay_apply_one(ov, tchild, child); of_node_put(tchild); -- Frank Rowand
[PATCH v6 3/4] of: be consistent in form of file mode
From: Frank Rowand checkpatch whined about using S_IRUGO instead of octal equivalent when adding phandle sysfs code, so used octal in that patch. Change other instances of the S_* constants in the same file to the octal form. Signed-off-by: Frank Rowand --- drivers/of/base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/of/base.c b/drivers/of/base.c index 941c9a03471d..a4e2159c8671 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -168,7 +168,7 @@ int __of_add_property_sysfs(struct device_node *np, struct property *pp) sysfs_bin_attr_init(&pp->attr); pp->attr.attr.name = safe_name(&np->kobj, pp->name); - pp->attr.attr.mode = secure ? S_IRUSR : S_IRUGO; + pp->attr.attr.mode = secure ? 0400 : 0444; pp->attr.size = secure ? 0 : pp->length; pp->attr.read = of_node_property_read; -- Frank Rowand
[PATCH v6 2/4] of: make __of_attach_node() static
From: Frank Rowand __of_attach_node() is not used outside of drivers/of/dynamic.c. Make it static and remove it from drivers/of/of_private.h. Signed-off-by: Frank Rowand --- drivers/of/dynamic.c| 2 +- drivers/of/of_private.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index be320082178f..3367ed2da9ad 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -216,7 +216,7 @@ int of_property_notify(int action, struct device_node *np, return of_reconfig_notify(action, &pr); } -void __of_attach_node(struct device_node *np) +static void __of_attach_node(struct device_node *np) { np->child = NULL; np->sibling = np->parent->child; diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h index 1a041411b219..73da291a51cd 100644 --- a/drivers/of/of_private.h +++ b/drivers/of/of_private.h @@ -91,7 +91,6 @@ extern int __of_update_property(struct device_node *np, extern void __of_update_property_sysfs(struct device_node *np, struct property *newprop, struct property *oldprop); -extern void __of_attach_node(struct device_node *np); extern int __of_attach_node_sysfs(struct device_node *np); extern void __of_detach_node(struct device_node *np); extern void __of_detach_node_sysfs(struct device_node *np); -- Frank Rowand
[PATCH v6 1/4] of: remove *phandle properties from expanded device tree
From: Frank Rowand

Remove "phandle", "linux,phandle", and "ibm,phandle" properties from
the internal device tree. The phandle will still be in the struct
device_node phandle field and will still be displayed as if it is a
property in /proc/device-tree.

This is to resolve the issue found by Stephen Boyd [1] when he changed
the type of struct property.value from void * to const void *. As a
result of the type change, the overlay code had compile errors where
the resolver updates phandle values.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html

- Add sysfs infrastructure to report np->phandle, as if it was a
  property.
- Do not create "phandle", "ibm,phandle", and "linux,phandle"
  properties in the expanded device tree.
- Remove phandle properties in of_attach_node(), for nodes dynamically
  attached to the live tree. Add the phandle sysfs entry for these
  nodes.
- When creating an overlay changeset, duplicate the node phandle in
  __of_node_dup().
- Remove no longer needed checks to exclude "phandle" and
  "linux,phandle" properties in several locations.
- A side effect of these changes is that the obsolete "linux,phandle"
  and "ibm,phandle" properties will no longer appear in
  /proc/device-tree (they will appear as "phandle").
- A side effect is that the value of property "ibm,phandle" will no
  longer override the value of properties "phandle" and
  "linux,phandle".
Signed-off-by: Frank Rowand --- drivers/of/base.c | 48 +++--- drivers/of/dynamic.c| 55 + drivers/of/fdt.c| 43 +++--- drivers/of/of_private.h | 1 + drivers/of/overlay.c| 4 +--- drivers/of/resolver.c | 23 + include/linux/of.h | 1 + 7 files changed, 112 insertions(+), 63 deletions(-) diff --git a/drivers/of/base.c b/drivers/of/base.c index 28d5f53bc631..941c9a03471d 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -116,6 +116,19 @@ static ssize_t of_node_property_read(struct file *filp, struct kobject *kobj, return memory_read_from_buffer(buf, count, &offset, pp->value, pp->length); } +static ssize_t of_node_phandle_read(struct file *filp, struct kobject *kobj, + struct bin_attribute *bin_attr, char *buf, + loff_t offset, size_t count) +{ + phandle phandle; + struct device_node *np; + + np = container_of(bin_attr, struct device_node, attr_phandle); + phandle = cpu_to_be32(np->phandle); + return memory_read_from_buffer(buf, count, &offset, &phandle, + sizeof(phandle)); +} + /* always return newly allocated name, caller must free after use */ static const char *safe_name(struct kobject *kobj, const char *orig_name) { @@ -164,6 +177,35 @@ int __of_add_property_sysfs(struct device_node *np, struct property *pp) return rc; } +/* + * In the imported device tree (fdt), phandle is a property. In the + * internal data structure it is instead stored in the struct device_node. + * Make phandle visible in sysfs as if it was a property. 
+ */ +int __of_add_phandle_sysfs(struct device_node *np) +{ + int rc; + + if (!IS_ENABLED(CONFIG_SYSFS)) + return 0; + + if (!of_kset || !of_node_is_attached(np)) + return 0; + + if (!np->phandle || np->phandle == 0x) + return 0; + + sysfs_bin_attr_init(&np->attr_phandle); + np->attr_phandle.attr.name = "phandle"; + np->attr_phandle.attr.mode = 0444; + np->attr_phandle.size = sizeof(np->phandle); + np->attr_phandle.read = of_node_phandle_read; + + rc = sysfs_create_bin_file(&np->kobj, &np->attr_phandle); + WARN(rc, "error adding attribute phandle to node %s\n", np->full_name); + return rc; +} + int __of_attach_node_sysfs(struct device_node *np) { const char *name; @@ -193,6 +235,8 @@ int __of_attach_node_sysfs(struct device_node *np) if (rc) return rc; + __of_add_phandle_sysfs(np); + for_each_property_of_node(np, pp) __of_add_property_sysfs(np, pp); @@ -2128,9 +2172,7 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align)) int id, len; /* Skip those we do not want to proceed */ - if (!strcmp(pp->name, "name") || - !strcmp(pp->name, "phandle") || - !strcmp(pp->name, "linux,phandle")) + if (!strcmp(pp->name, "name")) continue; np = of_find_node_by_path(pp->value); diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index 888fdbc09992..be320082178f 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -218,19 +218,6 @@ int of_property_notify(int action, struct device_node *np, void __of_attach_node(struct device_node *np)
[PATCH v6 0/4] of: remove *phandle properties from expanded device tree
From: Frank Rowand

Remove "phandle" and "linux,phandle" properties from the internal
device tree. The phandle will still be in the struct device_node
phandle field and will still be displayed as if it is a property in
/proc/device-tree.

This is to resolve the issue found by Stephen Boyd [1] when he changed
the type of struct property.value from void * to const void *. As a
result of the type change, the overlay code had compile errors where
the resolver updates phandle values.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html

Patch 1 is the phandle related changes. Patches 2 - 4 are minor fixups
for issues that became visible while implementing patch 1.

Changes from v5:
  - patch 1: populate_properties(), prop_is_phandle was declared at the
    wrong scope and thus was initialized before the for loop instead of
    each time through the loop. This resulted in any property in a node
    after the phandle property not being unflattened.

Changes from v4:
  - rebase on 4.12-rc1
  - Add reason for "" in of_attach_node()
  - Simplify and consolidate phandle detection logic in
    populate_properties(). This results in a change of behaviour: the
    value of property "ibm,phandle" will no longer override the value
    of properties "phandle" and "linux,phandle".

Changes from v3:
  - patch 1: fix incorrect variable name in __of_add_phandle_sysfs().
    Problem was reported by the kbuild test robot

Changes from v2:
  - patch 1: Remove check in __of_add_phandle_sysfs() that would not
    add a sysfs entry if IS_ENABLED(CONFIG_PPC_PSERIES)

Changes from v1:
  - Remove phandle properties in of_attach_node(), before attaching
    the node to the live tree.
  - Add the phandle sysfs entry for the node in of_attach_node().
  - When creating an overlay changeset, duplicate the node phandle in
    __of_node_dup().

*** BLURB HERE ***

Frank Rowand (4):
  of: remove *phandle properties from expanded device tree
  of: make __of_attach_node() static
  of: be consistent in form of file mode
  of: detect invalid phandle in overlay

 drivers/of/base.c       | 50 +++
 drivers/of/dynamic.c    | 57 +
 drivers/of/fdt.c        | 43 ++---
 drivers/of/of_private.h |  2 +-
 drivers/of/overlay.c    |  8 ---
 drivers/of/resolver.c   | 23 +---
 include/linux/of.h      |  1 +

 7 files changed, 118 insertions(+), 66 deletions(-)

--
Frank Rowand
Re: [PATCH v2] perf: libdw support for powerpc
Em Thu, Jun 01, 2017 at 12:24:41PM +0200, Paolo Bonzini escreveu: > Porting PPC to libdw only needs an architecture-specific hook to move > the register state from perf to libdw. > > The ARM and x86 architectures already use libdw, and it is useful to > have as much common code for the unwinder as possible. Mark Wielaard > has contributed a frame-based unwinder to libdw, so that unwinding works > even for binaries that do not have CFI information. In addition, > libunwind is always preferred to libdw by the build machinery so this > cannot introduce regressions on machines that have both libunwind and > libdw installed. > > Cc: a...@kernel.org > Cc: Naveen N. Rao > Cc: Ravi Bangoria > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Paolo Bonzini > --- > v1->v2: fix for 4.11->4.12 changes Thanks, I'll test it and collect the Acked-by provided, will go into perf/core. - Arnaldo > tools/perf/Makefile.config | 2 +- > tools/perf/arch/powerpc/util/Build | 2 + > tools/perf/arch/powerpc/util/unwind-libdw.c | 73 > + > 3 files changed, 76 insertions(+), 1 deletion(-) > create mode 100644 tools/perf/arch/powerpc/util/unwind-libdw.c > > diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config > index 8354d04b392f..e7b04a729417 100644 > --- a/tools/perf/Makefile.config > +++ b/tools/perf/Makefile.config > @@ -61,7 +61,7 @@ endif > # Disable it on all other architectures in case libdw unwind > # support is detected in system. Add supported architectures > # to the check. 
> -ifneq ($(ARCH),$(filter $(ARCH),x86 arm)) > +ifneq ($(ARCH),$(filter $(ARCH),x86 arm powerpc)) >NO_LIBDW_DWARF_UNWIND := 1 > endif > > diff --git a/tools/perf/arch/powerpc/util/Build > b/tools/perf/arch/powerpc/util/Build > index 90ad64b231cd..2e6595310420 100644 > --- a/tools/perf/arch/powerpc/util/Build > +++ b/tools/perf/arch/powerpc/util/Build > @@ -5,4 +5,6 @@ libperf-y += perf_regs.o > > libperf-$(CONFIG_DWARF) += dwarf-regs.o > libperf-$(CONFIG_DWARF) += skip-callchain-idx.o > + > libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o > +libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o > diff --git a/tools/perf/arch/powerpc/util/unwind-libdw.c > b/tools/perf/arch/powerpc/util/unwind-libdw.c > new file mode 100644 > index ..3a24b3c43273 > --- /dev/null > +++ b/tools/perf/arch/powerpc/util/unwind-libdw.c > @@ -0,0 +1,73 @@ > +#include > +#include "../../util/unwind-libdw.h" > +#include "../../util/perf_regs.h" > +#include "../../util/event.h" > + > +/* See backends/ppc_initreg.c and backends/ppc_regs.c in elfutils. 
*/ > +static const int special_regs[3][2] = { > + { 65, PERF_REG_POWERPC_LINK }, > + { 101, PERF_REG_POWERPC_XER }, > + { 109, PERF_REG_POWERPC_CTR }, > +}; > + > +bool libdw__arch_set_initial_registers(Dwfl_Thread *thread, void *arg) > +{ > + struct unwind_info *ui = arg; > + struct regs_dump *user_regs = &ui->sample->user_regs; > + Dwarf_Word dwarf_regs[32], dwarf_nip; > + size_t i; > + > +#define REG(r) ({\ > + Dwarf_Word val = 0; \ > + perf_reg_value(&val, user_regs, PERF_REG_POWERPC_##r); \ > + val;\ > +}) > + > + dwarf_regs[0] = REG(R0); > + dwarf_regs[1] = REG(R1); > + dwarf_regs[2] = REG(R2); > + dwarf_regs[3] = REG(R3); > + dwarf_regs[4] = REG(R4); > + dwarf_regs[5] = REG(R5); > + dwarf_regs[6] = REG(R6); > + dwarf_regs[7] = REG(R7); > + dwarf_regs[8] = REG(R8); > + dwarf_regs[9] = REG(R9); > + dwarf_regs[10] = REG(R10); > + dwarf_regs[11] = REG(R11); > + dwarf_regs[12] = REG(R12); > + dwarf_regs[13] = REG(R13); > + dwarf_regs[14] = REG(R14); > + dwarf_regs[15] = REG(R15); > + dwarf_regs[16] = REG(R16); > + dwarf_regs[17] = REG(R17); > + dwarf_regs[18] = REG(R18); > + dwarf_regs[19] = REG(R19); > + dwarf_regs[20] = REG(R20); > + dwarf_regs[21] = REG(R21); > + dwarf_regs[22] = REG(R22); > + dwarf_regs[23] = REG(R23); > + dwarf_regs[24] = REG(R24); > + dwarf_regs[25] = REG(R25); > + dwarf_regs[26] = REG(R26); > + dwarf_regs[27] = REG(R27); > + dwarf_regs[28] = REG(R28); > + dwarf_regs[29] = REG(R29); > + dwarf_regs[30] = REG(R30); > + dwarf_regs[31] = REG(R31); > + if (!dwfl_thread_state_registers(thread, 0, 32, dwarf_regs)) > + return false; > + > + dwarf_nip = REG(NIP); > + dwfl_thread_state_register_pc(thread, dwarf_nip); > + for (i = 0; i < ARRAY_SIZE(special_regs); i++) { > + Dwarf_Word val = 0; > + perf_reg_value(&val, user_regs, special_regs[i][1]); > + if (!dwfl_thread_state_registers(thread, > + special_regs[i][0], 1, > + &val)) > +
Network TX Stall on 440EP Processor
I'm working on a project that is derived from the Yosemite PPC 440EP board. It's a legacy project that was running the 2.6.24 Kernel, and network traffic was stalling: transmission halted without an understandable error (in this error condition, the various status registers of the network interface showed no issues), other than TX stalling due to the Buffer Descriptor ring becoming full. In order to see if the problem has been resolved, the Kernel has been updated to 4.9.13, compiled with gcc version 5.4.0 (Buildroot 2017.02.2). Although the frequency of the problem has decreased, it still shows up. The test case is the Linux target running idle, with no application code. From a Linux host on a directly connected network, 30 flood pings are started. After a period of several minutes to perhaps hours, the transmit side of the network controller ceases to transmit packets (the Buffer Descriptor ring becomes full). RX still works. In the 2.6.24 Kernel, the problem happens within seconds, so it has improved with the new Kernel. Below is the output from the Kernel when this happens. Has anybody seen this problem before? I can't find any errata on it, nor can I find any reports of it. The original problem is rooted in the embedded application running, and after a period of heavy network traffic, the TX side of the network stalls. The flood ping test is used simply to force the problem to happen. 
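For readers unfamiliar with the failure mode described above, "Buffer Descriptor ring becomes full" is the generic circular-ring-full condition. The sketch below is only an illustration of that condition with made-up names (bd_ring, bd_ring_full); it is not the actual emac driver code, which keeps its equivalent indices in its own private state.

```c
#include <stdbool.h>

/* Hypothetical TX buffer-descriptor ring state, for illustration only. */
struct bd_ring {
	unsigned int head;	/* next descriptor the driver will fill */
	unsigned int tail;	/* next descriptor hardware will complete */
	unsigned int size;	/* number of descriptors (power of two) */
};

/* One slot is kept empty so that head == tail always means "empty". */
static bool bd_ring_full(const struct bd_ring *r)
{
	return ((r->head + 1) & (r->size - 1)) == r->tail;
}

static bool bd_ring_empty(const struct bd_ring *r)
{
	return r->head == r->tail;
}
```

When completions stop arriving (tail stops advancing) while the stack keeps queueing packets (head keeps advancing), the ring reaches the full condition and the watchdog eventually reports the "transmit queue 0 timed out" splat shown below.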
[ 3127.143572] NETDEV WATCHDOG: eth0 (emac): transmit queue 0 timed out [ 3127.150172] [ cut here ] [ 3127.154778] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x23c/0x244 [ 3127.162965] Modules linked in: [ 3127.166013] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.13 #9 [ 3127.171707] task: c0e67300 task.stack: c0f0 [ 3127.176192] NIP: c068e734 LR: c068e734 CTR: c04672f4 [ 3127.181107] REGS: c0f01c90 TRAP: 0700 Not tainted (4.9.13) [ 3127.186793] MSR: 00029000 [ 3127.190241] CR: 2812 XER: [ 3127.194210] GPR00: c068e734 c0f01d40 c0e67300 0038 d1006301 00df c04683e4 00df GPR08: 00df c0eff4b0 c0eff4b0 0004 24122424 00b960f0 c0e8 GPR16: 000ac8c1 c07b8618 c098bddc c0e69000 000a c0ee c0e73f20 c0f0 GPR24: c100e4e8 c0ee c0e77d60 c3128000 c068e4f8 c0e8 c3128000 NIP [c068e734] dev_watchdog+0x23c/0x244 [ 3127.227680] LR [c068e734] dev_watchdog+0x23c/0x244 [ 3127.232427] Call Trace: [ 3127.234857] [c0f01d40] [c068e734] dev_watchdog+0x23c/0x244 (unreliable) [ 3127.241447] [c0f01d60] [c00805e8] call_timer_fn+0x40/0x118 [ 3127.246889] [c0f01d80] [c00808e8] expire_timers.isra.13+0xbc/0x114 [ 3127.253032] [c0f01db0] [c0080a94] run_timer_softirq+0x90/0xf0 [ 3127.258753] [c0f01e00] [c07b31b4] __do_softirq+0x114/0x2b0 [ 3127.264202] [c0f01e60] [c002a158] irq_exit+0xe8/0xec [ 3127.269144] [c0f01e70] [c0008c98] timer_interrupt+0x34/0x4c [ 3127.274684] [c0f01e80] [c000ec94] ret_from_except+0x0/0x18 [ 3127.280151] --- interrupt: 901 at cpm_idle+0x3c/0x70 [ 3127.280151] LR = arch_cpu_idle+0x30/0x68 [ 3127.289300] [c0f01f40] [c0f058e4] cpu_idle_force_poll+0x0/0x4 (unreliable) [ 3127.296146] [c0f01f50] [c00073e4] arch_cpu_idle+0x30/0x68 [ 3127.301509] [c0f01f60] [c005bce8] cpu_startup_entry+0x184/0x1bc [ 3127.307392] [c0f01fb0] [c0a76a1c] start_kernel+0x3d4/0x3e8 [ 3127.312843] [c0f01ff0] [c0b4] _start+0xb4/0xf8 [ 3127.317599] Instruction dump: [ 3127.320557] 811f0284 4b78 3921 7fe3fb78 99281966 4bfd9cd5 7c651b78 3c60c0a1 [ 3127.328359] 7fc6f378 7fe4fb78 3863357c 
48125319 <0fe0> 4bb8 7c0802a6 90010004 [ 3127.336327] ---[ end trace c31dfe4772ff0e8f ]---
PPC 266MHz 8347E slow kernel decompress
Hi. I recently upgraded one of Ericsson's platforms from an old 3.6.x to the latest 3.16.x LTS kernel. It was a smooth upgrade, besides kernel decompression taking much longer to complete, from around 2 seconds to 10-12 seconds. I also tried a 4.10.z kernel with the same result. The decompression code doesn't have any significant changes as far as I can see. Maybe I am looking in the wrong place? Did the ppc arch init change between 3.6.x and 3.16.x? I am thinking of cache init, prefetch copy functions, etc. The old RedBoot loader starts the kernel with caches off. Maybe the older init re-initialized the caches before decompression? The bootloader is identical, and so is the compiler. Nothing has changed besides the kernel itself. It is 1.5M or so, with no external modules. Runtime speed after decompression seems absolutely normal. I couldn't find anything significant anywhere regarding this. Hints would be much appreciated. Regards, Christian
1M hugepage size being registered on Linux
Hi Alistair/Jeremy, I am working on a bug related to a 1M hugepage size being registered on Linux (Power 8 Baremetal - Garrison). I was checking dmesg and it seems that the 1M page size is coming from firmware to Linux. [0.00] base_shift=20: shift=20, sllp=0x0130, avpnm=0x, tlbiel=0, penc=2 [1.528867] HugeTLB registered 1 MB page size, pre-allocated 0 pages Should Linux support this page size? As far as I know, this was an unsupported page size in the past, wasn't it? If this should be supported now, is there any specific reason for that? Thanks, Victor Aoqui Software Engineer: Linux Kernel Backports Linux Technology Center IBM Systems
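For context on the dmesg line quoted above: base_shift is the log2 of the page size in bytes, so base_shift=20 does indeed describe a 2^20-byte, i.e. 1 MB, page, which is why HugeTLB then registers a 1 MB size. A quick sketch of the arithmetic (the helper names here are made up for illustration):

```c
#include <stdint.h>

/* base_shift is log2 of the page size in bytes, so base_shift=20
 * from the dmesg line above means 1 << 20 bytes, i.e. 1 MB. */
static uint64_t page_size_bytes(unsigned int base_shift)
{
	return (uint64_t)1 << base_shift;
}

static uint64_t page_size_mb(unsigned int base_shift)
{
	return page_size_bytes(base_shift) >> 20;
}
```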
Re: [PATCH V2 0/2] hwmon: (ibmpowernv) Add support for current(A) sensors
On Tue, Jun 20, 2017 at 10:38:11AM +0530, Shilpasri G Bhat wrote: > The first patch from Cedric in the patchset cleans up the driver to > provide a neater way to define new sensor types. The second patch adds > current sensor. > > Cédric Le Goater (1): > hwmon: (ibmpowernv) introduce a legacy_compatibles array > > Shilpasri G Bhat (1): > hwmon: (ibmpowernv) Add current(A) sensor > Series applied to hwmon-next. Thanks, Guenter
Re: [PATCH 0/2] fix loadable module for DPAA Ethernet
From: Madalin Bucur Date: Mon, 19 Jun 2017 18:04:15 +0300 > The DPAA Ethernet makes use of a symbol that is not exported. > Address the issue by propagating the dma_ops rather than calling > arch_setup_dma_ops(). Series applied, thanks.
Re: [PATCH v5 2/2] powerpc/fadump: update documentation about 'fadump_append=' parameter
On Friday 09 June 2017 05:34 PM, Michal Suchánek wrote: On Thu, 8 Jun 2017 23:30:37 +0530 Hari Bathini wrote: Hi Michal, Sorry for taking this long to respond. I was working on a few other things. On Monday 15 May 2017 02:59 PM, Michal Suchánek wrote: Hello, On Mon, 15 May 2017 12:59:46 +0530 Hari Bathini wrote: On Friday 12 May 2017 09:12 PM, Michal Suchánek wrote: On Fri, 12 May 2017 15:15:33 +0530 Hari Bathini wrote: On Thursday 11 May 2017 06:46 PM, Michal Suchánek wrote: On Thu, 11 May 2017 02:00:11 +0530 Hari Bathini wrote: Hello Michal, On Wednesday 10 May 2017 09:31 PM, Michal Suchánek wrote: Hello, On Wed, 03 May 2017 23:52:52 +0530 Hari Bathini wrote: With the introduction of 'fadump_append=' parameter to pass additional parameters to fadump (capture) kernel, update documentation about it. Signed-off-by: Hari Bathini --- Changes from v4: * Based on top of patchset that includes https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=akpm&id=05f383cdfba8793240e73f9a9fbff4e25d66003f Documentation/powerpc/firmware-assisted-dump.txt | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index 8394bc8..6327193 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -162,7 +162,15 @@ How to enable firmware-assisted dump (fadump): 1. Set config option CONFIG_FA_DUMP=y and build kernel. 2. Boot into linux kernel with 'fadump=on' kernel cmdline option. -3. Optionally, user can also set 'crashkernel=' kernel cmdline +3. A user can pass additional command line parameters as a comma + separated list through 'fadump_append=' parameter, to be enforced + when fadump is active. 
For example, if parameters like nr_cpus=1, + numa=off & udev.children-max=2 are to be enforced when fadump is + active, 'fadump_append=nr_cpus=1,numa=off,udev.children-max=2' + can be passed in command line, which will be replaced with + "nr_cpus=1 numa=off udev.children-max=2" when fadump is active. + This helps in reducing memory consumption during dump capture. +4. Optionally, user can also set 'crashkernel=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. Writing your own deficient parser for comma separated arguments when a perfectly fine parser for space separated quoted arguments exists in the kernel and the bootloader does not seem like a good idea to me. A couple of things that prompted me for v5 are: 1. Using parse_early_options() limits the kind of parameters that can be passed to the fadump capture kernel. Passing parameters like systemd.unit= & udev.children-max= has no effect with v4. Updating the boot_command_line parameter, when fadump is active, seems a better alternative. 2. Passing space-separated quoted arguments is not working as intended with lilo. Updating the bootloader with the below entry in the /etc/lilo.conf file results in a missing append entry in the /etc/yaboot.conf file. append = "quiet sysrq=1 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M fadump_append=\"nr_cpus=1 numa=off udev.children-max=2\"" Meaning that a script that emulates LILO semantics on top of yaboot, which is completely capable of passing quoted space separated arguments, fails. IMHO it is more reasonable to fix the script or whatever adaptation layer, or use yaboot directly, than working around a bug in said script by introducing a new argument parser in the kernel. Hmmm.. while trying to implement a space-separated parameter list with quotes as the syntax for the fadump_append parameter, I noticed that it can make the implementation more vulnerable. Here are some problems I am facing while implementing this.. How so? 
presumably you can reuse parse_args even if you do not register with early_param and call it yourself. Then your parsing of fadump_append is I wasn't aware of that. Thanks for pointing it out, Michal. Will try to use parse_args and get back. I was thinking a bit more about the uses of the commandline and how fadump_append potentially breaks it. The first thing that should be addressed is the special -- argument which denotes the start of init arguments that are not to be parsed by the kernel. Incidentally the initial implementation using early_param happened to handle that without issue. parse_args surely handles that so adding a hook somewhere should give you the location of that argument (if any). An interesting thing that can happen is passing an -- inside the fadump_append argument. It should be handled (or not) in some way or other and the handling documented. The intention with this patch is to replace "root=/dev/sda2 ro fadump_append=nr_cpus=1,numa=off crashkernel=1024M" with "root=/dev/sda2 ro nr_cpus=1 numa=off c
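The replacement the v5 documentation describes, turning the comma separated fadump_append list back into space separated kernel parameters, amounts to a character substitution on the saved string. The sketch below is only an illustrative userspace approximation of that transformation, not the kernel's actual implementation, and it shows one limitation raised in the thread: this simple form cannot express a parameter whose value itself contains a comma or needs quoting.

```c
/* Rewrite "nr_cpus=1,numa=off,udev.children-max=2" into
 * "nr_cpus=1 numa=off udev.children-max=2" in place.  Every comma
 * is a separator here, so a value containing a literal comma
 * cannot be passed -- one of the syntax limitations discussed. */
static void fadump_expand_args(char *s)
{
	for (; *s; s++) {
		if (*s == ',')
			*s = ' ';
	}
}
```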
[PATCH V6 2/2] powerpc/numa: Update CPU topology when VPHN enabled
powerpc/numa: Correct the currently broken capability to set the topology for shared CPUs in LPARs. At boot time for shared CPU lpars, the topology for each shared CPU is set to node zero, however, this is now updated correctly using the Virtual Processor Home Node (VPHN) capabilities information provided by the pHyp. Also, update initialization checks for device-tree attributes to independently recognize PRRN or VPHN usage. Signed-off-by: Michael Bringmann --- Changes in V6: -- Place extern of timed_topology_update() proto under additional #ifdef for hotplug-cpu. --- arch/powerpc/include/asm/topology.h | 16 +++ arch/powerpc/mm/numa.c | 64 +++--- arch/powerpc/platforms/pseries/dlpar.c |2 + arch/powerpc/platforms/pseries/hotplug-cpu.c |2 + 4 files changed, 77 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 9cc6ec9..ae3cdd0 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -79,6 +79,22 @@ static inline int prrn_is_enabled(void) } #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */ +#if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR) && \ + defined(CONFIG_HOTPLUG_CPU) +extern int timed_topology_update(int nsecs); +#else +static int timed_topology_update(int nsecs) +{ + return 0; +} +#endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR && CONFIG_HOTPLUG_CPU */ + +#if defined(CONFIG_PPC_SPLPAR) +extern void shared_topology_update(void); +#else +#defineshared_topology_update()0 +#endif /* CONFIG_PPC_SPLPAR */ + #include #ifdef CONFIG_SMP diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 0746d93..cf5992d 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -935,7 +936,7 @@ void __init initmem_init(void) /* * Reduce the possible NUMA nodes to the online NUMA nodes, -* since we do not support node hotplug. 
This ensures that we +* since we do not support node hotplug. This ensures that we * lower the maximum NUMA node ID to what is actually present. */ nodes_and(node_possible_map, node_possible_map, node_online_map); @@ -1179,11 +1180,32 @@ struct topology_update_data { int new_nid; }; +#defineTOPOLOGY_DEF_TIMER_SECS 60 + static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS]; static cpumask_t cpu_associativity_changes_mask; static int vphn_enabled; static int prrn_enabled; static void reset_topology_timer(void); +static int topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS; +static int topology_inited; +static int topology_update_needed; + +/* + * Change polling interval for associativity changes. + */ +int timed_topology_update(int nsecs) +{ + if (nsecs > 0) + topology_timer_secs = nsecs; + else + topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS; + + if (vphn_enabled) + reset_topology_timer(); + + return 0; +} /* * Store the current values of the associativity change counters in the @@ -1277,6 +1299,12 @@ static long vphn_get_associativity(unsigned long cpu, "hcall_vphn() experienced a hardware fault " "preventing VPHN. Disabling polling...\n"); stop_topology_update(); + break; + case H_SUCCESS: + printk(KERN_INFO + "VPHN hcall succeeded. 
Reset polling...\n"); + timed_topology_update(0); + break; } return rc; @@ -1354,8 +1382,11 @@ int numa_update_cpu_topology(bool cpus_locked) struct device *dev; int weight, new_nid, i = 0; - if (!prrn_enabled && !vphn_enabled) + if (!prrn_enabled && !vphn_enabled) { + if (!topology_inited) + topology_update_needed = 1; return 0; + } weight = cpumask_weight(&cpu_associativity_changes_mask); if (!weight) @@ -1394,6 +1425,8 @@ int numa_update_cpu_topology(bool cpus_locked) cpumask_andnot(&cpu_associativity_changes_mask, &cpu_associativity_changes_mask, cpu_sibling_mask(cpu)); + pr_info("Assoc chg gives same node %d for cpu%d\n", + new_nid, cpu); cpu = cpu_last_thread_sibling(cpu); continue; } @@ -1410,6 +1443,9 @@ int numa_update_cpu_topology(bool cpus_locked) cpu = cpu_last_thread_sibling(cpu); } + if (i) + updates[i-1].next = NULL; + pr_debug("Topology update for the following CPUs:\n"); if (cpumask_weight(
[PATCH V6 1/2] powerpc/hotplug: Ensure enough nodes avail for operations
powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized at boot. In order to meet both needs, this patch adds a new kernel command line option (numnodes=) for use by the PowerPC architecture- specific code that defines the maximum number of nodes that the kernel will ever need in its current hardware environment. The boot code that initializes nodes for PowerPC will read this value and use it to ensure that all of the desired nodes are setup in the 'node_possible_map', and elsewhere. Signed-off-by: Michael Bringmann --- --- arch/powerpc/mm/numa.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index e6f742d..0746d93 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -60,10 +60,27 @@ static int n_mem_addr_cells, n_mem_size_cells; static int form1_affinity; +#define TOPOLOGY_DEF_NUM_NODES 0 #define MAX_DISTANCE_REF_POINTS 4 static int distance_ref_points_depth; static const __be32 *distance_ref_points; static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS]; +static int topology_num_nodes = TOPOLOGY_DEF_NUM_NODES; + +/* + * Topology-related early parameters + */ +static int __init early_num_nodes(char *p) +{ + if (!p) + return 1; + + topology_num_nodes = memparse(p, &p); + dbg("topology num nodes = 0x%d\n", topology_num_nodes); + + return 0; +} +early_param("numnodes", early_num_nodes); /* * Allocate node_to_cpumask_map based on number of available nodes @@ -892,6 +909,18 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) NODE_DATA(nid)->node_spanned_pages = spanned_pages; } +static void __init setup_min_nodes(void) +{ + int i, l = topology_num_nodes; + + for (i = 0; i < l; i++) { + if (!node_possible(i)) { + setup_node_data(i, 0, 0); + 
node_set(i, node_possible_map); + } + } +} + void __init initmem_init(void) { int nid, cpu; @@ -911,6 +940,8 @@ void __init initmem_init(void) */ nodes_and(node_possible_map, node_possible_map, node_online_map); + setup_min_nodes(); + for_each_online_node(nid) { unsigned long start_pfn, end_pfn;
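One side note on the patch above: the numnodes= value is read with memparse(), which accepts an optional K/M/G suffix (arguably more than a plain node count needs, and the dbg() format "0x%d" mixes a hex prefix with a decimal conversion). As a rough userspace approximation of memparse() behaviour, under the assumption that only the common K/M/G suffixes matter:

```c
#include <stdlib.h>

/* Simplified take on the kernel's memparse(): parse a number (decimal,
 * octal, or 0x-prefixed hex) and scale it by an optional K/M/G suffix.
 * The real helper also handles T/P/E suffixes. */
static unsigned long long memparse_sketch(const char *ptr, char **retptr)
{
	char *end;
	unsigned long long ret = strtoull(ptr, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		ret <<= 10;	/* fall through */
	case 'M': case 'm':
		ret <<= 10;	/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		end++;
	default:
		break;
	}
	if (retptr)
		*retptr = end;
	return ret;
}
```

For a bare count such as "numnodes=4" the suffix handling is simply never triggered, so the helper degenerates to a plain strtoull().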
[PATCH V6 0/2] powerpc/dlpar: Correct display of hot-add/hot-remove CPUs and memory
On Power systems with shared configurations of CPUs and memory, there are some issues with association of additional CPUs and memory to nodes when hot-adding resources. These patches address some of those problems. powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized at boot. In order to meet both needs, this patch adds a new kernel command line option (numnodes=) for use by the PowerPC architecture-specific code that defines the maximum number of nodes that the kernel will ever need in its current hardware environment. The boot code that initializes nodes for PowerPC will read this value and use it to ensure that all of the desired nodes are setup in the 'node_possible_map', and elsewhere. powerpc/numa: Correct the currently broken capability to set the topology for shared CPUs in LPARs. At boot time for shared CPU lpars, the topology for each shared CPU is set to node zero, however, this is now updated correctly using the Virtual Processor Home Node (VPHN) capabilities information provided by the pHyp. The VPHN handling in Linux is disabled, if PRRN handling is present. Signed-off-by: Michael Bringmann Michael Bringmann (2): powerpc/hotplug: Add option to define max nodes allowing dynamic growth of resources. powerpc/numa: Update CPU topology when VPHN enabled --- Changes in V6: -- Reorder some code to better eliminate unused functions in conditional builds.
[PATCH] powerpc/64: Initialise thread_info for emergency stacks
Emergency stacks have their thread_info mostly uninitialised, which in particular means garbage preempt_count values. Emergency stack code runs with interrupts disabled entirely, and is used very rarely, so this has been unnoticed so far. It was found by a proposed new powerpc watchdog that takes a soft-NMI directly from the masked_interrupt handler and using the emergency stack. That crashed at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be garbage. Reported-by: Abdul Haleem Signed-off-by: Nicholas Piggin --- FYI, this bug looks to be breaking linux-next on some powerpc boxes due to interaction with a proposed new powerpc watchdog driver Andrew has in his tree: http://marc.info/?l=linuxppc-embedded&m=149794320519941&w=2 arch/powerpc/include/asm/thread_info.h | 19 +++ arch/powerpc/kernel/setup_64.c | 6 +++--- 2 files changed, 22 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index a941cc6fc3e9..5995e4b2996d 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -54,6 +54,7 @@ struct thread_info { .task = &tsk, \ .cpu = 0, \ .preempt_count = INIT_PREEMPT_COUNT,\ + .local_flags = 0, \ .flags =0, \ } @@ -62,6 +63,24 @@ struct thread_info { #define THREAD_SIZE_ORDER (THREAD_SHIFT - PAGE_SHIFT) +/* + * Emergency stacks are used for a range of things, from asynchronous + * NMIs (system reset, machine check) to synchronous, process context. + * Set HARDIRQ_OFFSET because we don't know exactly what context we + * come from or if it had a valid stack, which is about the best we + * can do. + * TODO: what to do with accounting? 
+ */ +#define emstack_init_thread_info(ti, c)\ +do { \ + (ti)->task = NULL; \ + (ti)->cpu = (c);\ + (ti)->preempt_count = HARDIRQ_OFFSET; \ + (ti)->local_flags = 0; \ + (ti)->flags = 0;\ + klp_init_thread_info(ti); \ +} while (0) + /* how to get the thread information struct from C */ static inline struct thread_info *current_thread_info(void) { diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index f35ff9dea4fb..54c4336655f8 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -639,18 +639,18 @@ void __init emergency_stack_init(void) for_each_possible_cpu(i) { struct thread_info *ti; ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit)); - klp_init_thread_info(ti); + emstack_init_thread_info(ti, i); paca[i].emergency_sp = (void *)ti + THREAD_SIZE; #ifdef CONFIG_PPC_BOOK3S_64 /* emergency stack for NMI exception handling. */ ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit)); - klp_init_thread_info(ti); + emstack_init_thread_info(ti, i); paca[i].nmi_emergency_sp = (void *)ti + THREAD_SIZE; /* emergency stack for machine check exception handling. */ ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit)); - klp_init_thread_info(ti); + emstack_init_thread_info(ti, i); paca[i].mc_emergency_sp = (void *)ti + THREAD_SIZE; #endif } -- 2.11.0
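To see why the patch initialises preempt_count to HARDIRQ_OFFSET, recall that nmi_enter()'s BUG_ON(in_nmi()) tests the NMI bit of preempt_count, so garbage in an uninitialised thread_info can trip it spuriously. The masks below mirror the preempt_count bit layout of kernels of that era (8 preempt bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit); treat the exact values as assumptions for illustration rather than a copy of include/linux/preempt.h:

```c
/* Sketch of the preempt_count bit layout this patch relies on. */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4
#define NMI_BITS	1

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)	/* 8 */
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)	/* 16 */
#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)	/* 20 */

#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)		/* 0x10000 */
#define NMI_MASK	(((1UL << NMI_BITS) - 1) << NMI_SHIFT)

/* in_nmi() is true when the NMI bit is set; an emergency stack whose
 * preempt_count is initialised to HARDIRQ_OFFSET does not trip it,
 * whereas uninitialised garbage with that bit set does. */
static int sketch_in_nmi(unsigned long preempt_count)
{
	return (preempt_count & NMI_MASK) != 0;
}
```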
Re: clean up and modularize arch dma_mapping interface
On Tue, Jun 20, 2017 at 11:19:02AM +0200, Daniel Vetter wrote: > Ack for the 2 drm patches, but I can also pick them up through drm-misc if > you prefer that (but then it'll be 4.14). Nah, I'll plan to set up a dma-mapping tree so that we'll have common place for dma-mapping work.
Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
On Tue, Jun 20, 2017 at 11:04:00PM +1000, Stephen Rothwell wrote: > git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next > > Contacts: Marek Szyprowski and Kyungmin Park (cc'd) > > I have called your tree dma-mapping-hch for now. The other tree has > not been updated since 4.9-rc1 and I am not sure how general it is. > Marek, Kyungmin, any comments? I'd be happy to join efforts - co-maintainers and reviers are always welcome.
Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
On Tue, Jun 20, 2017 at 02:14:36PM +0100, Robin Murphy wrote: > Hi Christoph, > > On 20/06/17 13:41, Christoph Hellwig wrote: > > On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote: > >> I plan to create a new dma-mapping tree to collect all this work. > >> Any volunteers for co-maintainers, especially from the iommu gang? > > > > Ok, I've created the new tree: > > > >git://git.infradead.org/users/hch/dma-mapping.git for-next > > > > Gitweb: > > > > > > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next > > > > And below is the patch to add the MAINTAINERS entry, additions welcome. > > I'm happy to be a reviewer, since I've been working in this area for > some time, particularly with the dma-iommu code and arm64 DMA ops. Great, I'll add you!
Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
Hi Christoph, On 20/06/17 13:41, Christoph Hellwig wrote: > On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote: >> I plan to create a new dma-mapping tree to collect all this work. >> Any volunteers for co-maintainers, especially from the iommu gang? > > Ok, I've created the new tree: > >git://git.infradead.org/users/hch/dma-mapping.git for-next > > Gitweb: > > > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next > > And below is the patch to add the MAINTAINERS entry, additions welcome. I'm happy to be a reviewer, since I've been working in this area for some time, particularly with the dma-iommu code and arm64 DMA ops. Robin. > Stephen, can you add this to linux-next? > > --- > From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig > Date: Tue, 20 Jun 2017 11:17:30 +0200 > Subject: MAINTAINERS: add entry for dma mapping helpers > > This code has been spread between getting in through arch trees, the iommu > tree, -mm and the drivers tree. There will be a lot of work in this area, > including consolidating various arch implementations into more common > code, so ensure we have a proper git tree that facilitates cooperation with > the architecture maintainers. 
> > Signed-off-by: Christoph Hellwig > --- > MAINTAINERS | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 09b5ab6a8a5c..56859d53a424 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -2595,6 +2595,19 @@ S: Maintained > F: net/bluetooth/ > F: include/net/bluetooth/ > > +DMA MAPPING HELPERS > +M: Christoph Hellwig > +L: linux-ker...@vger.kernel.org > +T: git git://git.infradead.org/users/hch/dma-mapping.git > +W: http://git.infradead.org/users/hch/dma-mapping.git > +S: Supported > +F: lib/dma-debug.c > +F: lib/dma-noop.c > +F: lib/dma-virt.c > +F: drivers/base/dma-mapping.c > +F: drivers/base/dma-coherent.c > +F: include/linux/dma-mapping.h > + > BONDING DRIVER > M: Jay Vosburgh > M: Veaceslav Falico >
Re: new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
Hi Christoph, On Tue, 20 Jun 2017 14:41:40 +0200 Christoph Hellwig wrote: > > On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote: > > I plan to create a new dma-mapping tree to collect all this work. > > Any volunteers for co-maintainers, especially from the iommu gang? > > Ok, I've created the new tree: > >git://git.infradead.org/users/hch/dma-mapping.git for-next > > Gitweb: > > > http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next > > And below is the patch to add the MAINTAINERS entry, additions welcome. > > Stephen, can you add this to linux-next? Added from tomorrow. I have another tree called dma-mapping: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git#dma-mapping-next Contacts: Marek Szyprowski and Kyungmin Park (cc'd) I have called your tree dma-mapping-hch for now. The other tree has not been updated since 4.9-rc1 and I am not sure how general it is. Marek, Kyungmin, any comments? Thanks for adding your subsystem tree as a participant of linux-next. As you may know, this is not a judgement of your code. The purpose of linux-next is for integration testing and to lower the impact of conflicts between subsystems in the next merge window. You will need to ensure that the patches/commits in your tree/series have been: * submitted under GPL v2 (or later) and include the Contributor's Signed-off-by, * posted to the relevant mailing list, * reviewed by you (or another maintainer of your subsystem tree), * successfully unit tested, and * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch). It is allowed to be rebased if you deem it necessary. -- Cheers, Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH 1/3] powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmode
On Tue, 20 Jun 2017 22:34:55 +1000 Michael Ellerman wrote: > All the callers of slb_miss_realmode currently open code the #ifndef > CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case. > We have a macro to do this, BRANCH_TO_COMMON(), so use it. > > Signed-off-by: Michael Ellerman These 3 all look good to me. Reviewed-by: Nicholas Piggin
new dma-mapping tree, was Re: clean up and modularize arch dma_mapping interface V2
On Fri, Jun 16, 2017 at 08:10:15PM +0200, Christoph Hellwig wrote: > I plan to create a new dma-mapping tree to collect all this work. > Any volunteers for co-maintainers, especially from the iommu gang? Ok, I've created the new tree: git://git.infradead.org/users/hch/dma-mapping.git for-next Gitweb: http://git.infradead.org/users/hch/dma-mapping.git/shortlog/refs/heads/for-next And below is the patch to add the MAINTAINERS entry, additions welcome. Stephen, can you add this to linux-next? --- >From 335979c41912e6c101a20b719862b2d837370df1 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 20 Jun 2017 11:17:30 +0200 Subject: MAINTAINERS: add entry for dma mapping helpers This code has been spread between getting in through arch trees, the iommu tree, -mm and the drivers tree. There will be a lot of work in this area, including consolidating various arch implementations into more common code, so ensure we have a proper git tree that facilitates cooperation with the architecture maintainers. Signed-off-by: Christoph Hellwig --- MAINTAINERS | 13 + 1 file changed, 13 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 09b5ab6a8a5c..56859d53a424 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2595,6 +2595,19 @@ S: Maintained F: net/bluetooth/ F: include/net/bluetooth/ +DMA MAPPING HELPERS +M: Christoph Hellwig +L: linux-ker...@vger.kernel.org +T: git git://git.infradead.org/users/hch/dma-mapping.git +W: http://git.infradead.org/users/hch/dma-mapping.git +S: Supported +F: lib/dma-debug.c +F: lib/dma-noop.c +F: lib/dma-virt.c +F: drivers/base/dma-mapping.c +F: drivers/base/dma-coherent.c +F: include/linux/dma-mapping.h + BONDING DRIVER M: Jay Vosburgh M: Veaceslav Falico -- 2.11.0
[PATCH 3/3] powerpc/64s: Rename slb_allocate_realmode() to slb_allocate()
As for slb_miss_realmode(), rename slb_allocate_realmode() to avoid confusion over whether it runs in real or virtual mode - it runs in both. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/exceptions-64s.S | 2 +- arch/powerpc/mm/slb.c| 10 +- arch/powerpc/mm/slb_low.S| 6 +++--- 3 files changed, 5 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 6ad755e0cb29..07b79c2c70f8 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -605,7 +605,7 @@ EXC_COMMON_BEGIN(slb_miss_common) crset 4*cr0+eq #ifdef CONFIG_PPC_STD_MMU_64 BEGIN_MMU_FTR_SECTION - bl slb_allocate_realmode + bl slb_allocate END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX) #endif diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index 654a0d7ba0e7..13cfe413b40d 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -33,15 +33,7 @@ enum slb_index { KSTACK_INDEX= 2, /* Kernel stack map */ }; -extern void slb_allocate_realmode(unsigned long ea); - -static void slb_allocate(unsigned long ea) -{ - /* Currently, we do real mode for all SLBs including user, but -* that will change if we bring back dynamic VSIDs -*/ - slb_allocate_realmode(ea); -} +extern void slb_allocate(unsigned long ea); #define slb_esid_mask(ssize) \ (((ssize) == MMU_SEGSIZE_256M)? ESID_MASK: ESID_MASK_1T) diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S index 9869b44a04dc..bde378559d01 100644 --- a/arch/powerpc/mm/slb_low.S +++ b/arch/powerpc/mm/slb_low.S @@ -65,7 +65,7 @@ MMU_FTR_SECTION_ELSE \ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_68_BIT_VA) -/* void slb_allocate_realmode(unsigned long ea); +/* void slb_allocate(unsigned long ea); * * Create an SLB entry for the given EA (user or kernel). * r3 = faulting address, r13 = PACA @@ -73,7 +73,7 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_68_BIT_VA) * r3 is preserved. * No other registers are examined or changed. 
*/ -_GLOBAL(slb_allocate_realmode) +_GLOBAL(slb_allocate) /* * check for bad kernel/user address * (ea & ~REGION_MASK) >= PGTABLE_RANGE @@ -309,7 +309,7 @@ slb_compare_rr_to_size: b 7b -_ASM_NOKPROBE_SYMBOL(slb_allocate_realmode) +_ASM_NOKPROBE_SYMBOL(slb_allocate) _ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_linear) _ASM_NOKPROBE_SYMBOL(slb_miss_kernel_load_io) _ASM_NOKPROBE_SYMBOL(slb_compare_rr_to_size) -- 2.7.4
[PATCH 2/3] powerpc/64s: Rename slb_miss_realmode() to slb_miss_common()
slb_miss_realmode() doesn't always runs in real mode, which is what the name implies. So rename it to avoid confusing people. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/exceptions-64s.S | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 7bdfddbe0328..6ad755e0cb29 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -514,7 +514,7 @@ EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) mfspr r3,SPRN_DAR mfspr r11,SPRN_SRR1 crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_realmode) + BRANCH_TO_COMMON(r10, slb_miss_common) EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) @@ -525,7 +525,7 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) mfspr r3,SPRN_DAR mfspr r11,SPRN_SRR1 crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_realmode) + BRANCH_TO_COMMON(r10, slb_miss_common) EXC_VIRT_END(data_access_slb, 0x4380, 0x80) TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) @@ -558,7 +558,7 @@ EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) mfspr r3,SPRN_SRR0/* SRR0 is faulting address */ mfspr r11,SPRN_SRR1 crclr 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_realmode) + BRANCH_TO_COMMON(r10, slb_miss_common) EXC_REAL_END(instruction_access_slb, 0x480, 0x80) EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) @@ -569,13 +569,16 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) mfspr r3,SPRN_SRR0/* SRR0 is faulting address */ mfspr r11,SPRN_SRR1 crclr 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_realmode) + BRANCH_TO_COMMON(r10, slb_miss_common) EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) TRAMP_KVM(PACA_EXSLB, 0x480) -/* This handler is used by both 0x380 and 0x480 slb miss interrupts */ -EXC_COMMON_BEGIN(slb_miss_realmode) +/* + * This handler is used by the 0x380 and 0x480 SLB miss interrupts, as well as + * the virtual mode 0x4380 and 0x4480 interrupts if AIL is enabled. 
+ */ +EXC_COMMON_BEGIN(slb_miss_common) /* * r13 points to the PACA, r9 contains the saved CR, * r12 contains the saved r3, -- 2.7.4
[PATCH 1/3] powerpc/64s: Use BRANCH_TO_COMMON() for slb_miss_realmode
All the callers of slb_miss_realmode currently open code the #ifndef CONFIG_RELOCATABLE check and the branch via CTR in the RELOCATABLE case. We have a macro to do this, BRANCH_TO_COMMON(), so use it. Signed-off-by: Michael Ellerman --- arch/powerpc/kernel/exceptions-64s.S | 42 1 file changed, 4 insertions(+), 38 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ed8628c6f0f4..7bdfddbe0328 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -514,18 +514,7 @@ EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) mfspr r3,SPRN_DAR mfspr r11,SPRN_SRR1 crset 4*cr6+eq -#ifndef CONFIG_RELOCATABLE - b slb_miss_realmode -#else - /* -* We can't just use a direct branch to slb_miss_realmode -* because the distance from here to there depends on where -* the kernel ends up being put. -*/ - LOAD_HANDLER(r10, slb_miss_realmode) - mtctr r10 - bctr -#endif + BRANCH_TO_COMMON(r10, slb_miss_realmode) EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) @@ -536,18 +525,7 @@ EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) mfspr r3,SPRN_DAR mfspr r11,SPRN_SRR1 crset 4*cr6+eq -#ifndef CONFIG_RELOCATABLE - b slb_miss_realmode -#else - /* -* We can't just use a direct branch to slb_miss_realmode -* because the distance from here to there depends on where -* the kernel ends up being put. 
-*/ - LOAD_HANDLER(r10, slb_miss_realmode) - mtctr r10 - bctr -#endif + BRANCH_TO_COMMON(r10, slb_miss_realmode) EXC_VIRT_END(data_access_slb, 0x4380, 0x80) TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) @@ -580,13 +558,7 @@ EXC_REAL_BEGIN(instruction_access_slb, 0x480, 0x80) mfspr r3,SPRN_SRR0/* SRR0 is faulting address */ mfspr r11,SPRN_SRR1 crclr 4*cr6+eq -#ifndef CONFIG_RELOCATABLE - b slb_miss_realmode -#else - LOAD_HANDLER(r10, slb_miss_realmode) - mtctr r10 - bctr -#endif + BRANCH_TO_COMMON(r10, slb_miss_realmode) EXC_REAL_END(instruction_access_slb, 0x480, 0x80) EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) @@ -597,13 +569,7 @@ EXC_VIRT_BEGIN(instruction_access_slb, 0x4480, 0x80) mfspr r3,SPRN_SRR0/* SRR0 is faulting address */ mfspr r11,SPRN_SRR1 crclr 4*cr6+eq -#ifndef CONFIG_RELOCATABLE - b slb_miss_realmode -#else - LOAD_HANDLER(r10, slb_miss_realmode) - mtctr r10 - bctr -#endif + BRANCH_TO_COMMON(r10, slb_miss_realmode) EXC_VIRT_END(instruction_access_slb, 0x4480, 0x80) TRAMP_KVM(PACA_EXSLB, 0x480) -- 2.7.4
Re: [BUG][next-20170619][347de24] PowerPC boot fails with Oops
On Tue, 20 Jun 2017 12:49:25 +0530 Abdul Haleem wrote: > Hi, > > commit: 347de24 (powerpc/64s: implement arch-specific hardlockup > watchdog) > > linux-next fails to boot on PowerPC Bare-metal box. > > Test: boot > Machine type: Power 8 Bare-metal > Kernel: 4.12.0-rc5-next-20170619 > gcc: version 4.8.5 > > > In file arch/powerpc/kernel/watchdog.c > > void soft_nmi_interrupt(struct pt_regs *regs) > { > unsigned long flags; > int cpu = raw_smp_processor_id(); > u64 tb; > > if (!cpumask_test_cpu(cpu, &wd_cpus_enabled)) > return; > > >>> nmi_enter(); Thanks for the report. This is due to emergency stacks not zeroing preempt_count, so they get garbage here, and it just trips the BUG_ON(in_nmi()) check. Don't think it's a bug in the proposed new powerpc watchdog. (at least I was able to reproduce your bug and fix it by fixing the stack init). Thanks, Nick
Re: [RFC v2 02/12] powerpc: Free up four 64K PTE bits in 64K backed hpte pages.
On 06/17/2017 09:22 AM, Ram Pai wrote: > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6 > in the 64K backed hpte pages. This along with the earlier > patch will entirely free up the four bits from 64K PTE. > > This patch does the following change to 64K PTE that is > backed by 64K hpte. > > H_PAGE_F_SECOND which occupied bit 4 moves to the second part > of the pte. > H_PAGE_F_GIX which occupied bit 5, 6 and 7 also moves to the > second part of the pte. > > since bit 7 is now freed up, we move H_PAGE_BUSY from bit 9 > to bit 7. Trying to minimize gaps so that contiguous bits > can be allocated if needed in the future. > > The second part of the PTE will hold > (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63. I still don't understand how we freed up the 5th bit, which is used in the 5th patch. Was that bit never used for anything on 64K page size (64K and 4K mappings)? +#define _RPAGE_RSV5 0x00040UL +#define H_PAGE_PKEY_BIT0 _RPAGE_RSV1 +#define H_PAGE_PKEY_BIT1 _RPAGE_RSV2 +#define H_PAGE_PKEY_BIT2 _RPAGE_RSV3 +#define H_PAGE_PKEY_BIT3 _RPAGE_RSV4 +#define H_PAGE_PKEY_BIT4 _RPAGE_RSV5
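The layout described in this thread - the 4-bit slot value (H_PAGE_F_SECOND|H_PAGE_F_GIX) living at IBM bits 60-63 of the second half of the PTE, with 0xF treated as "soft invalid" - can be sketched in plain C. The mask and helper names below are assumptions for the sketch, not the kernel's actual accessors:

```c
#include <assert.h>
#include <stdint.h>

/* IBM (big-endian) bit numbering: bit 63 is the least-significant bit,
 * so a field at bits 60-63 is simply the low nibble of the word.
 * The mask and helper names here are illustrative, not the kernel's. */
#define H_SLOT_MASK 0xfUL

/* Store the 4-bit slot (H_PAGE_F_SECOND | H_PAGE_F_GIX) in the second
 * 64-bit half of the PTE. */
static uint64_t pte_second_set_slot(uint64_t second, unsigned int hidx)
{
	return (second & ~H_SLOT_MASK) | (hidx & H_SLOT_MASK);
}

static unsigned int pte_second_get_slot(uint64_t second)
{
	return (unsigned int)(second & H_SLOT_MASK);
}

/* The series initializes the slot to 0xF and treats it as invalid, even
 * though 0xF (the 7th slot of the secondary group) is architecturally
 * valid. */
static int slot_is_soft_invalid(unsigned int hidx)
{
	return hidx == 0xf;
}
```

A freshly initialized PTE would carry 0xF here, so slot validity can be decided from the second half alone, without consuming a bit in the primary PTE.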
Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.
On 06/17/2017 09:22 AM, Ram Pai wrote: > Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6 > in the 4K backed hpte pages. These bits continue to be used > for 64K backed hpte pages in this patch, but will be freed > up in the next patch. The counting 3, 4, 5 and 6 are in BE format I believe, I was initially trying to see that from right to left as we normally do in the kernel and was getting confused. So basically these bits (which are only applicable for 64K mapping IIUC) are going to be freed up from the PTE format. #define _RPAGE_RSV1 0x1000UL #define _RPAGE_RSV2 0x0800UL #define _RPAGE_RSV3 0x0400UL #define _RPAGE_RSV4 0x0200UL As you have mentioned before this feature is available for 64K page size only and not for 4K mappings. So I assume we support both the combinations. * 64K mapping on 64K * 64K mapping on 4K These are the current users of the above bits #define H_PAGE_BUSY _RPAGE_RSV1 /* software: PTE & hash are busy */ #define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */ #define H_PAGE_F_GIX(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44) #define H_PAGE_HASHPTE _RPAGE_RPN43/* PTE has associated HPTE */ > > The patch does the following change to the 64K PTE format > > H_PAGE_BUSY moves from bit 3 to bit 9 and what is in there on bit 9 now ? This ? #define _RPAGE_SW2 0x00400 which is used as #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */ which will not be required any more ? > H_PAGE_F_SECOND which occupied bit 4 moves to the second part > of the pte. > H_PAGE_F_GIX which occupied bit 5, 6 and 7 also moves to the > second part of the pte. > > the four bits((H_PAGE_F_SECOND|H_PAGE_F_GIX) that represent a slot > is initialized to 0xF indicating an invalid slot. If a hpte > gets cached in a 0xF slot(i.e 7th slot of secondary), it is > released immediately. In other words, even though 0xF is a Release immediately means we attempt again for a new hash slot ? 
> valid slot we discard and consider it as an invalid > slot; i.e. hpte_soft_invalid(). This gives us an opportunity to not > depend on a bit in the primary PTE in order to determine the > validity of a slot. So we have to see the slot number in the second half of each PTE to figure out if it has got a valid slot in the hash page table. > > When we release a hpte in the 0xF slot we also release a > legitimate primary slot and unmap that entry. This is to > ensure that we do get a legitimate non-0xF slot the next time we > retry for a slot. Okay. > > Though treating the 0xF slot as invalid reduces the number of available > slots and may have an effect on the performance, the probability > of hitting a 0xF is extremely low. Why do you say that? I thought every slot number has the same probability of a hit from the hash function. > > Compared to the current scheme, the above described scheme reduces > the number of false hash table updates significantly and has the How does it reduce false hash table updates? > added advantage of releasing four valuable PTE bits for other > purposes. > > This idea was jointly developed by Paul Mackerras, Aneesh, Michael > Ellerman and myself. > > The 4K PTE format remains unchanged currently.
> > Signed-off-by: Ram Pai > --- > arch/powerpc/include/asm/book3s/64/hash-4k.h | 20 +++ > arch/powerpc/include/asm/book3s/64/hash-64k.h | 32 +++ > arch/powerpc/include/asm/book3s/64/hash.h | 15 +++-- > arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 ++ > arch/powerpc/mm/dump_linuxpagetables.c| 3 +- > arch/powerpc/mm/hash64_4k.c | 14 ++--- > arch/powerpc/mm/hash64_64k.c | 81 > --- > arch/powerpc/mm/hash_utils_64.c | 30 +++--- > 8 files changed, 122 insertions(+), 78 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h > b/arch/powerpc/include/asm/book3s/64/hash-4k.h > index b4b5e6b..5ef1d81 100644 > --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h > +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h > @@ -16,6 +16,18 @@ > #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE) > #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE) > > + > +/* > + * Only supported by 4k linux page size > + */ > +#define H_PAGE_F_SECOND_RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */ > +#define H_PAGE_F_GIX (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44) > +#define H_PAGE_F_GIX_SHIFT 56 > + > +#define H_PAGE_BUSY _RPAGE_RSV1 /* software: PTE & hash are busy */ > +#define H_PAGE_HASHPTE _RPAGE_RPN43/* PTE has associated HPTE */ > + > + So we moved the common 64K definitions here. > /* PTE flags to conserve for HPTE identification */ > #define _PAGE_HPTEFLAGS (H
Re: [RFC v2 00/12] powerpc: Memory Protection Keys
On Tue, 2017-06-20 at 15:10 +1000, Balbir Singh wrote: > On Fri, 2017-06-16 at 20:52 -0700, Ram Pai wrote: > > Memory protection keys enable applications to protect their > > address space from inadvertent access or corruption from > > themselves. > > I presume by itself you mean protection between threads? Not necessarily. You could have for example a JIT that when it runs the JITed code, only "opens" the keys for the VM itself, preventing the JITed code from "leaking out". There are plenty of other usages... > > > The overall idea: > > > > A process allocates a key and associates it with > > an address range within its address space. > > OK, so this is per VMA? > > > The process then can dynamically set read/write > > permissions on the key without involving the > > kernel. > > This bit is not clear, how can the key be set without > involving the kernel? I presume you mean the key is set > in the PTEs and the access protection values can be > set without involving the kernel? > > Any code that violates the permissions > > of the address space, as defined by its associated > > key, will receive a segmentation fault. > > > > This patch series enables the feature on PPC64. > > It is enabled on HPTE 64K-page platform. > > > > ISA3.0 section 5.7.13 describes the detailed specifications. > > > > > > Testing: > > This patch series has passed all the protection key > > tests available in the selftests directory. > > The tests are updated to work on both x86 and powerpc. > > Balbir
Re: clean up and modularize arch dma_mapping interface
On Thu, Jun 08, 2017 at 03:25:25PM +0200, Christoph Hellwig wrote: > Hi all, > > for a while we have a generic implementation of the dma mapping routines > that call into per-arch or per-device operations. But right now there > still are various bits in the interfaces where don't clearly operate > on these ops. This series tries to clean up a lot of those (but not all > yet, but the series is big enough). It gets rid of the DMA_ERROR_CODE > way of signaling failures of the mapping routines from the > implementations to the generic code (and cleans up various drivers that > were incorrectly using it), and gets rid of the ->set_dma_mask routine > in favor of relying on the ->dma_capable method that can be used in > the same way, but which requires less code duplication. > > Btw, we don't seem to have a tree every-growing amount of common dma > mapping code, and given that I have a fair amount of all over the tree > work in that area in my plate I'd like to start one. Any good reason > to that? Anyone willing to volunteer as co maintainer? > > The whole series is also available in git: > > git://git.infradead.org/users/hch/misc.git dma-map Ack for the 2 drm patches, but I can also pick them up through drm-misc if you prefer that (but then it'll be 4.14). -Daniel > > Gitweb: > > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-map > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
[PATCH] powernv/npu-dma.c: Add explicit flush when sending an ATSD
NPU2 requires an extra explicit flush to an active GPU PID when sending address translation shoot downs (ATSDs) to reliably flush the GPU TLB. This patch adds just such a flush at the end of each sequence of ATSDs. We can safely use PID 0, which is always reserved and active on the GPU. PID 0 is only used for init_mm, which will never be a user mm on the GPU. To enforce this we add a check in pnv_npu2_init_context() just in case someone tries to use PID 0 on the GPU. Signed-off-by: Alistair Popple --- Michael, It turns out my assumptions about MMU_NO_CONTEXT were wrong so we have reverted to using HW context id/PID 0 (init_mm) as that is quite clearly reserved on hash and radix and it seems unlikely PID 0 would ever be used for anything else. That said if you feel strongly it would be easy enough to add functions to reserve a PID and an exported function for device drivers to call to find out what the reserved PID is. I was avoiding it because it would be more invasive adding code and an external API for something that I'm not sure will ever change, although if it does there is a check in pnv_npu2_init_context() to flag it so it won't result in weird bugs. Anyway let me know which way you would like us to go here and I can update the patch as required, thanks!
- Alistair arch/powerpc/platforms/powernv/npu-dma.c | 93 ++-- 1 file changed, 64 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c index e6f444b..9468064 100644 --- a/arch/powerpc/platforms/powernv/npu-dma.c +++ b/arch/powerpc/platforms/powernv/npu-dma.c @@ -449,7 +449,7 @@ static int mmio_launch_invalidate(struct npu *npu, unsigned long launch, return mmio_atsd_reg; } -static int mmio_invalidate_pid(struct npu *npu, unsigned long pid) +static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool flush) { unsigned long launch; @@ -465,12 +465,15 @@ static int mmio_invalidate_pid(struct npu *npu, unsigned long pid) /* PID */ launch |= pid << PPC_BITLSHIFT(38); + /* No flush */ + launch |= !flush << PPC_BITLSHIFT(39); + /* Invalidating the entire process doesn't use a va */ return mmio_launch_invalidate(npu, launch, 0); } static int mmio_invalidate_va(struct npu *npu, unsigned long va, - unsigned long pid) + unsigned long pid, bool flush) { unsigned long launch; @@ -486,26 +489,60 @@ static int mmio_invalidate_va(struct npu *npu, unsigned long va, /* PID */ launch |= pid << PPC_BITLSHIFT(38); + /* No flush */ + launch |= !flush << PPC_BITLSHIFT(39); + return mmio_launch_invalidate(npu, launch, va); } #define mn_to_npu_context(x) container_of(x, struct npu_context, mn) +struct mmio_atsd_reg { + struct npu *npu; + int reg; +}; + +static void mmio_invalidate_wait( + struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS], bool flush) +{ + struct npu *npu; + int i, reg; + + /* Wait for all invalidations to complete */ + for (i = 0; i <= max_npu2_index; i++) { + if (mmio_atsd_reg[i].reg < 0) + continue; + + /* Wait for completion */ + npu = mmio_atsd_reg[i].npu; + reg = mmio_atsd_reg[i].reg; + while (__raw_readq(npu->mmio_atsd_regs[reg] + XTS_ATSD_STAT)) + cpu_relax(); + + put_mmio_atsd_reg(npu, reg); + + /* +* The GPU requires two flush ATSDs to ensure all entries have +* been flushed. 
We use PID 0 as it will never be used for a +* process on the GPU. +*/ + if (flush) + mmio_invalidate_pid(npu, 0, 1); + } +} + /* * Invalidate either a single address or an entire PID depending on * the value of va. */ static void mmio_invalidate(struct npu_context *npu_context, int va, - unsigned long address) + unsigned long address, bool flush) { - int i, j, reg; + int i, j; struct npu *npu; struct pnv_phb *nphb; struct pci_dev *npdev; - struct { - struct npu *npu; - int reg; - } mmio_atsd_reg[NV_MAX_NPUS]; + struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS]; unsigned long pid = npu_context->mm->context.id; /* @@ -525,10 +562,11 @@ static void mmio_invalidate(struct npu_context *npu_context, int va, if (va) mmio_atsd_reg[i].reg = - mmio_invalidate_va(npu, address, pid); + mmio_invalidate_va(npu, address, pid, + flush); else
Re: [RFC v2 06/12] powerpc: Program HPTE key protection bits.
On 06/17/2017 09:22 AM, Ram Pai wrote: > Map the PTE protection key bits to the HPTE key protection bits, > while creatiing HPTE entries. > > Signed-off-by: Ram Pai > --- > arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 + > arch/powerpc/include/asm/pkeys.h | 7 +++ > arch/powerpc/mm/hash_utils_64.c | 5 + > 3 files changed, 17 insertions(+) > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > index cfb8169..3d7872c 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > @@ -90,6 +90,8 @@ > #define HPTE_R_PP0 ASM_CONST(0x8000) > #define HPTE_R_TSASM_CONST(0x4000) > #define HPTE_R_KEY_HIASM_CONST(0x3000) > +#define HPTE_R_KEY_BIT0 ASM_CONST(0x2000) > +#define HPTE_R_KEY_BIT1 ASM_CONST(0x1000) > #define HPTE_R_RPN_SHIFT 12 > #define HPTE_R_RPN ASM_CONST(0x0000) > #define HPTE_R_RPN_3_0 ASM_CONST(0x01fff000) > @@ -104,6 +106,9 @@ > #define HPTE_R_C ASM_CONST(0x0080) > #define HPTE_R_R ASM_CONST(0x0100) > #define HPTE_R_KEY_LOASM_CONST(0x0e00) > +#define HPTE_R_KEY_BIT2 ASM_CONST(0x0800) > +#define HPTE_R_KEY_BIT3 ASM_CONST(0x0400) > +#define HPTE_R_KEY_BIT4 ASM_CONST(0x0200) > Should we indicate/document how these 5 bits are not contiguous in the HPTE format for any given real page ? > #define HPTE_V_1TB_SEG ASM_CONST(0x4000) > #define HPTE_V_VRMA_MASK ASM_CONST(0x4001ff00) > diff --git a/arch/powerpc/include/asm/pkeys.h > b/arch/powerpc/include/asm/pkeys.h > index 0f3dca8..9b6820d 100644 > --- a/arch/powerpc/include/asm/pkeys.h > +++ b/arch/powerpc/include/asm/pkeys.h > @@ -27,6 +27,13 @@ > ((vm_flags & VM_PKEY_BIT3) ? H_PAGE_PKEY_BIT1 : 0x0UL) | \ > ((vm_flags & VM_PKEY_BIT4) ? H_PAGE_PKEY_BIT0 : 0x0UL)) > > +#define calc_pte_to_hpte_pkey_bits(pteflags) \ > + (((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) |\ > + ((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) | \ > + ((pteflags & H_PAGE_PKEY_BIT2) ? 
HPTE_R_KEY_BIT2 : 0x0UL) | \ > + ((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) | \ > + ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL)) > + We can drop calc_ in here. pte_to_hpte_pkey_bits should be sufficient.
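The PTE-to-HPTE key mapping being reviewed here can be sketched as a plain C function, named without the "calc_" prefix as the review suggests. The bit positions below are reconstructed from the (truncated) constants quoted in the patch and should be checked against the tree before being relied on - treat them as illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* PTE-side key bits (the reserved _RPAGE_RSV* PTE bits in the series);
 * positions reconstructed, treat as illustrative. */
#define H_PAGE_PKEY_BIT0 (1UL << 60)
#define H_PAGE_PKEY_BIT1 (1UL << 59)
#define H_PAGE_PKEY_BIT2 (1UL << 58)
#define H_PAGE_PKEY_BIT3 (1UL << 57)
#define H_PAGE_PKEY_BIT4 (1UL << 42)

/* HPTE-side key bits (HPTE_R_KEY_HI holds two bits, HPTE_R_KEY_LO three);
 * positions reconstructed, treat as illustrative. */
#define HPTE_R_KEY_BIT0 (1UL << 61)
#define HPTE_R_KEY_BIT1 (1UL << 60)
#define HPTE_R_KEY_BIT2 (1UL << 11)
#define HPTE_R_KEY_BIT3 (1UL << 10)
#define HPTE_R_KEY_BIT4 (1UL << 9)

/* Mirror of the patch's macro, spelled as a function: scatter the five
 * protection-key bits from their PTE positions to their HPTE positions. */
static uint64_t pte_to_hpte_pkey_bits(uint64_t pteflags)
{
	return ((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0) |
	       ((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0) |
	       ((pteflags & H_PAGE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0) |
	       ((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0) |
	       ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0);
}
```

Writing it this way also makes the reviewer's point visible: the five key bits are not contiguous on the HPTE side, since two live in the high word (KEY_HI) and three in the low word (KEY_LO).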
Re: [RFC v2 07/12] powerpc: Macro the mask used for checking DSI exception
On 06/17/2017 09:22 AM, Ram Pai wrote: > Replace the magic number used to check for DSI exception > with a meaningful value. > > Signed-off-by: Ram Pai > --- > arch/powerpc/include/asm/reg.h | 9 - > arch/powerpc/kernel/exceptions-64s.S | 2 +- > 2 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > index 7e50e47..2dcb8a1 100644 > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -272,16 +272,23 @@ > #define SPRN_DAR 0x013 /* Data Address Register */ > #define SPRN_DBCR 0x136 /* e300 Data Breakpoint Control Reg */ > #define SPRN_DSISR 0x012 /* Data Storage Interrupt Status Register */ > +#define DSISR_BIT32 0x8000 /* not defined */ > #define DSISR_NOHPTE 0x4000 /* no translation found > */ > +#define DSISR_PAGEATTR_CONFLT 0x2000 /* page attribute > conflict */ > +#define DSISR_BIT35 0x1000 /* not defined */ > #define DSISR_PROTFAULT 0x0800 /* protection fault */ > #define DSISR_BADACCESS 0x0400 /* bad access to CI or G */ > #define DSISR_ISSTORE 0x0200 /* access was a store */ > #define DSISR_DABRMATCH 0x0040 /* hit data breakpoint */ > -#define DSISR_NOSEGMENT 0x0020 /* SLB miss */ > #define DSISR_KEYFAULT 0x0020 /* Key fault */ > +#define DSISR_BIT43 0x0010 /* not defined */ > #define DSISR_UNSUPP_MMU 0x0008 /* Unsupported MMU config */ > #define DSISR_SET_RC 0x0004 /* Failed setting of > R/C bits */ > #define DSISR_PGDIRFAULT 0x0002 /* Fault on page directory */ > +#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \ > + DSISR_PAGEATTR_CONFLT | \ > + DSISR_BADACCESS | \ > + DSISR_BIT43) Sorry missed this one. Seems like there are a couple of unnecessary line additions in the subsequent patch which adds the new PKEY reason code. -#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \ - DSISR_PAGEATTR_CONFLT | \ - DSISR_BADACCESS | \ +#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \ + DSISR_PAGEATTR_CONFLT | \ + DSISR_BADACCESS | \ + DSISR_KEYFAULT | \ DSISR_BIT43)
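The mask under discussion can be sanity-checked numerically. The hex constants in the quoted diff were truncated by the archive, so the values below are reconstructed from the IBM bit numbers in the names (bit 32 of the 64-bit register image is the most-significant bit of the 32-bit DSISR); verify against arch/powerpc/include/asm/reg.h before relying on them:

```c
#include <assert.h>
#include <stdint.h>

/* DSISR bit by IBM bit number n (32 <= n <= 63). */
#define DSISR_BIT(n)		(0x80000000u >> ((n) - 32))

#define DSISR_BIT32		DSISR_BIT(32)	/* not defined */
#define DSISR_NOHPTE		DSISR_BIT(33)	/* no translation found */
#define DSISR_PAGEATTR_CONFLT	DSISR_BIT(34)	/* page attribute conflict */
#define DSISR_BADACCESS		DSISR_BIT(37)	/* bad access to CI or G */
#define DSISR_KEYFAULT		DSISR_BIT(42)	/* key fault */
#define DSISR_BIT43		DSISR_BIT(43)	/* not defined */

/* The mask from the patch: DSISR bits that make the exception code skip
 * the hash-insert path and go straight to the page-fault handler. */
#define DSISR_PAGE_FAULT_MASK	(DSISR_BIT32 | DSISR_PAGEATTR_CONFLT | \
				 DSISR_BADACCESS | DSISR_BIT43)

static int dsisr_direct_page_fault(uint32_t dsisr)
{
	return (dsisr & DSISR_PAGE_FAULT_MASK) != 0;
}
```

With these positions the mask's upper halfword works out to 0xa410, which appears to be the magic constant the patch replaces (and 0xa430 once DSISR_KEYFAULT is folded in by the later patch).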
Re: [PATCH V2 1/2] hwmon: (ibmpowernv) introduce a legacy_compatibles array
On 06/20/2017 09:15 AM, Shilpasri G Bhat wrote: > > > On 06/20/2017 11:36 AM, Cédric Le Goater wrote: >> On 06/20/2017 07:08 AM, Shilpasri G Bhat wrote: >>> From: Cédric Le Goater >>> >>> Today, the type of a PowerNV sensor system is determined with the >>> "compatible" property for legacy Firmwares and with the "sensor-type" >>> for newer ones. The same array of strings is used for both to do the >>> matching and this raises some issue to introduce new sensor types. >>> >>> Let's introduce two different arrays (legacy and current) to make >>> things easier for new sensor types. >>> >>> Signed-off-by: Cédric Le Goater >>> Tested-by: Shilpasri G Bhat >> >> Did you test on a Tuleta (IBM Power) system ? > > I have tested this patch on P9 FSP and Firestone. OK. I just gave it a try on a Tuleta, P8 FSP, IBM Power system Looks good. Thanks, C. > >> >> Thanks, >> >> C. >> >>> --- >>> drivers/hwmon/ibmpowernv.c | 26 ++ >>> 1 file changed, 18 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c >>> index 862b832..6d8909c 100644 >>> --- a/drivers/hwmon/ibmpowernv.c >>> +++ b/drivers/hwmon/ibmpowernv.c >>> @@ -55,17 +55,27 @@ enum sensors { >>> >>> #define INVALID_INDEX (-1U) >>> >>> +/* >>> + * 'compatible' string properties for sensor types as defined in old >>> + * PowerNV firmware (skiboot). These are ordered as 'enum sensors'. 
>>> + */ >>> +static const char * const legacy_compatibles[] = { >>> + "ibm,opal-sensor-cooling-fan", >>> + "ibm,opal-sensor-amb-temp", >>> + "ibm,opal-sensor-power-supply", >>> + "ibm,opal-sensor-power" >>> +}; >>> + >>> static struct sensor_group { >>> - const char *name; >>> - const char *compatible; >>> + const char *name; /* matches property 'sensor-type' */ >>> struct attribute_group group; >>> u32 attr_count; >>> u32 hwmon_index; >>> } sensor_groups[] = { >>> - {"fan", "ibm,opal-sensor-cooling-fan"}, >>> - {"temp", "ibm,opal-sensor-amb-temp"}, >>> - {"in", "ibm,opal-sensor-power-supply"}, >>> - {"power", "ibm,opal-sensor-power"} >>> + { "fan" }, >>> + { "temp" }, >>> + { "in"}, >>> + { "power" } >>> }; >>> >>> struct sensor_data { >>> @@ -239,8 +249,8 @@ static int get_sensor_type(struct device_node *np) >>> enum sensors type; >>> const char *str; >>> >>> - for (type = 0; type < MAX_SENSOR_TYPE; type++) { >>> - if (of_device_is_compatible(np, sensor_groups[type].compatible)) >>> + for (type = 0; type < ARRAY_SIZE(legacy_compatibles); type++) { >>> + if (of_device_is_compatible(np, legacy_compatibles[type])) >>> return type; >>> } >>> >>> >> >
[PATCH] powerpc/time: Fix tracing in time.c
Since trace_clock is in a different file and already marked with notrace, enable tracing in time.c by removing it from the disabled list in Makefile. Also annotate clocksource read functions and sched_clock with notrace. Testing: Timer and ftrace selftests run with different trace clocks. Acked-by: Naveen N. Rao Signed-off-by: Santosh Sivaraj --- arch/powerpc/kernel/Makefile | 2 -- arch/powerpc/kernel/time.c | 6 +++--- 2 files changed, 3 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index e132902..0845eeb 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -25,8 +25,6 @@ CFLAGS_REMOVE_cputable.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_prom_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_btext.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_prom.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) -# timers used by tracing -CFLAGS_REMOVE_time.o = -mno-sched-epilog $(CC_FLAGS_FTRACE) endif obj-y := cputable.o ptrace.o syscalls.o \ diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 2b33cfa..896ba1a 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -675,7 +675,7 @@ EXPORT_SYMBOL_GPL(tb_to_ns); * the high 64 bits of a * b, i.e. (a * b) >> 64, where a and b * are 64-bit unsigned numbers. */ -unsigned long long sched_clock(void) +notrace unsigned long long sched_clock(void) { if (__USE_RTC()) return get_rtc(); @@ -823,12 +823,12 @@ void read_persistent_clock(struct timespec *ts) } /* clocksource code */ -static u64 rtc_read(struct clocksource *cs) +static notrace u64 rtc_read(struct clocksource *cs) { return (u64)get_rtc(); } -static u64 timebase_read(struct clocksource *cs) +static notrace u64 timebase_read(struct clocksource *cs) { return (u64)get_tb(); } -- 2.9.4
Re: [RFC v2 08/12] powerpc: Handle exceptions caused by violation of pkey protection.
On 06/17/2017 09:22 AM, Ram Pai wrote: > Handle Data and Instruction exceptions caused by memory > protection-key. > > Signed-off-by: Ram Pai > (cherry picked from commit a5e5217619a0c475fe0cacc3b0cf1d3d33c79a09) To which tree this commit belongs to ? > > Conflicts: > arch/powerpc/include/asm/reg.h > arch/powerpc/kernel/exceptions-64s.S > --- > arch/powerpc/include/asm/mmu_context.h | 12 + > arch/powerpc/include/asm/pkeys.h | 9 > arch/powerpc/include/asm/reg.h | 7 +-- > arch/powerpc/mm/fault.c| 21 +++- > arch/powerpc/mm/pkeys.c| 90 > ++ > 5 files changed, 134 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/asm/mmu_context.h > b/arch/powerpc/include/asm/mmu_context.h > index da7e943..71fffe0 100644 > --- a/arch/powerpc/include/asm/mmu_context.h > +++ b/arch/powerpc/include/asm/mmu_context.h > @@ -175,11 +175,23 @@ static inline void arch_bprm_mm_init(struct mm_struct > *mm, > { > } > > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS > +bool arch_pte_access_permitted(pte_t pte, bool write); > +bool arch_vma_access_permitted(struct vm_area_struct *vma, > + bool write, bool execute, bool foreign); > +#else /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */ > +static inline bool arch_pte_access_permitted(pte_t pte, bool write) > +{ > + /* by default, allow everything */ > + return true; > +} > static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, > bool write, bool execute, bool foreign) > { > /* by default, allow everything */ > return true; > } Right, these are the two functions the core VM expects the arch to provide. 
> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */ > + > #endif /* __KERNEL__ */ > #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ > diff --git a/arch/powerpc/include/asm/pkeys.h > b/arch/powerpc/include/asm/pkeys.h > index 9b6820d..405e7db 100644 > --- a/arch/powerpc/include/asm/pkeys.h > +++ b/arch/powerpc/include/asm/pkeys.h > @@ -14,6 +14,15 @@ > VM_PKEY_BIT3 | \ > VM_PKEY_BIT4) > > +static inline u16 pte_flags_to_pkey(unsigned long pte_flags) > +{ > + return ((pte_flags & H_PAGE_PKEY_BIT4) ? 0x1 : 0x0) | > + ((pte_flags & H_PAGE_PKEY_BIT3) ? 0x2 : 0x0) | > + ((pte_flags & H_PAGE_PKEY_BIT2) ? 0x4 : 0x0) | > + ((pte_flags & H_PAGE_PKEY_BIT1) ? 0x8 : 0x0) | > + ((pte_flags & H_PAGE_PKEY_BIT0) ? 0x10 : 0x0); > +} Add defines for the above 0x1, 0x2, 0x4, 0x8 etc ? > + > #define pkey_to_vmflag_bits(key) (((key & 0x1UL) ? VM_PKEY_BIT0 : 0x0UL) | \ > ((key & 0x2UL) ? VM_PKEY_BIT1 : 0x0UL) |\ > ((key & 0x4UL) ? VM_PKEY_BIT2 : 0x0UL) |\ > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h > index 2dcb8a1..a11977f 100644 > --- a/arch/powerpc/include/asm/reg.h > +++ b/arch/powerpc/include/asm/reg.h > @@ -285,9 +285,10 @@ > #define DSISR_UNSUPP_MMU 0x0008 /* Unsupported MMU config */ > #define DSISR_SET_RC 0x0004 /* Failed setting of > R/C bits */ > #define DSISR_PGDIRFAULT 0x0002 /* Fault on page directory */ > -#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \ > - DSISR_PAGEATTR_CONFLT | \ > - DSISR_BADACCESS | \ > +#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \ > + DSISR_PAGEATTR_CONFLT | \ > + DSISR_BADACCESS | \ > + DSISR_KEYFAULT |\ > DSISR_BIT43) This should have been cleaned up before adding new DSISR_KEYFAULT reason code into it. But I guess its okay. 
>  #define SPRN_TBRL	0x10C	/* Time Base Read Lower Register (user, R/O) */
>  #define SPRN_TBRU	0x10D	/* Time Base Read Upper Register (user, R/O) */
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 3a7d580..c31624f 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -216,9 +216,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  	 * bits we are interested in. But there are some bits which
>  	 * indicate errors in DSISR but can validly be set in SRR1.
>  	 */
> -	if (trap == 0x400)
> +	if (trap == 0x400) {
>  		error_code &= 0x4820;
> -	else
> +		flags |= FAULT_FLAG_INSTRUCTION;
> +	} else
>  		is_write = error_code & DSISR_ISSTORE;
>  #else

Why add FAULT_FLAG_INSTRUCTION here?

>  	is_write = error_code & ESR_DST;
> @@ -261,6 +262,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
>  	}
>  #endif
>
> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> +	if (error_code & DSISR_KEYFAULT) {
> +		code = SEGV_PKUERR;
> +
[BUG][next-20170619][347de24] PowerPC boot fails with Oops
Hi,

commit: 347de24 (powerpc/64s: implement arch-specific hardlockup watchdog)

linux-next fails to boot on PowerPC Bare-metal box.

Test: boot
Machine type: Power 8 Bare-metal
Kernel: 4.12.0-rc5-next-20170619
gcc: version 4.8.5

In file arch/powerpc/kernel/watchdog.c:

void soft_nmi_interrupt(struct pt_regs *regs)
{
	unsigned long flags;
	int cpu = raw_smp_processor_id();
	u64 tb;

	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
		return;

>>>	nmi_enter();

	tb = get_tb();

commit 347de24231df9f82969e2de3ad9f6976f1856a0f
Author: Nicholas Piggin
Date:   Sat Jun 17 09:33:56 2017 +1000

    powerpc/64s: implement arch-specific hardlockup watchdog

    Implement an arch-specific watchdog rather than use the perf-based
    hardlockup detector. The new watchdog takes the soft-NMI directly,
    rather than going through perf. Perf interrupts are to be made
    maskable in future, so that would prevent the perf detector from
    working in those regions.

boot logs:
--
cpuidle: using governor menu
pstore: using zlib compression
pstore: Registered nvram as persistent store backend
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 67 PID: 0 Comm: swapper/67 Not tainted 4.12.0-rc5-next-20170619 #1
task: c00f272be700 task.stack: c00f2736c000
NIP: c002c5fc LR: c002c5e8 CTR: c016f570
REGS: c0003fcd7a00 TRAP: 0700 Not tainted (4.12.0-rc5-next-20170619)
MSR: 90021033 CR: 22004022 XER: 2000
CFAR: c0149c6c SOFTE: 0
GPR00: c002c5e8 c0003fcd7c80 c105e900
GPR04: 00073388 c00fff7cf014
GPR08: 000ffea9 0010 4000
GPR12: 90009033 cfd57080 c00f2736ff90
GPR16: 40376a80 40376ac8
GPR20: c00ffe63 0001 0002
GPR24: c00f2736c000 c00f2736c080 0008
GPR28: c0003fcd7d80 0003 0008 0043
NIP [c002c5fc] soft_nmi_interrupt+0x9c/0x2e0
LR [c002c5e8] soft_nmi_interrupt+0x88/0x2e0
Call Trace:
Instruction dump:
eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c7c1b78 4811d615 6000
78290464 8129000c 552902d6 79290020 <0b09> 78290464 8149000c 3d4a0011
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
[ cut here ]
kernel BUG at arch/powerpc/kernel/watchdog.c:206!
random: print_oops_end_marker+0x6c/0xa0 get_random_bytes called with crng_init=0
---[ end trace 9756c1a885c69f33 ]---
--
Regards,
Abdul Haleem
IBM Linux Technology Centre

kernel:kexec: Starting new kernel
[ 205.035822] kexec: waiting for cpu 48 (physical 168) to enter 1 state
[ 205.035955] kexec: waiting for cpu 0 (physical 32) to enter OPAL
[ 205.036870] kexec: waiting for cpu 2 (physical 34) to enter OPAL
[ 205.037038] kexec: waiting for cpu 21 (physical 53) to enter OPAL
[0.00] opal: OPAL detected !
[0.00] Page sizes from device-tree:
[0.00] base_shift=12: shift=12, sllp=0x, avpnm=0x, tlbiel=1, penc=0
[0.00] base_shift=12: shift=16, sllp=0x, avpnm=0x, tlbiel=1, penc=7
[0.00] base_shift=12: shift=24, sllp=0x, avpnm=0x, tlbiel=1, penc=56
[0.00] base_shift=16: shift=16, sllp=0x0110, avpnm=0x, tlbiel=1, penc=1
[0.00] base_shift=16: shift=24, sllp=0x0110, avpnm=0x, tlbiel=1, penc=8
[0.00] base_shift=24: shift=24, sllp=0x0100, avpnm=0x0001, tlbiel=0, penc=0
[0.00] base_shift=34: shift=34, sllp=0x0120, avpnm=0x07ff, tlbiel=0, penc=3
[0.00] Using 1TB segments
[0.00] Initializing hash mmu with SLB
[0.00] Linux version 4.12.0-rc5-next-20170619 (r...@ltc-test-ci3.aus.stglabs.ibm.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Jun 20 12:17:53 IST 2017
[0.00] Found initrd at 0xc291:0xc3bea97a
[0.00] Using PowerNV machine description
[0.00] bootconsole [udbg0] enabled
[0.00] CPU maps initialized for 8 threads per core
 -> smp_release_cpus()
 spinning_secondaries = 79
 <- smp_release_cpus()
[0.00] -
[0.00] ppc64_pft_size= 0x0
[0.00] phys_mem_size = 0x10
[0.00] dcache_bsize = 0x80
[0.00] icache_bsize = 0x80
[0.00] cpu_features = 0x17fc7aed18500249
[0.00] possible= 0x5fff
Re: [PATCH V2 1/2] hwmon: (ibmpowernv) introduce a legacy_compatibles array
On 06/20/2017 11:36 AM, Cédric Le Goater wrote:
> On 06/20/2017 07:08 AM, Shilpasri G Bhat wrote:
>> From: Cédric Le Goater
>>
>> Today, the type of a PowerNV sensor system is determined with the
>> "compatible" property for legacy Firmwares and with the "sensor-type"
>> for newer ones. The same array of strings is used for both to do the
>> matching and this raises some issue to introduce new sensor types.
>>
>> Let's introduce two different arrays (legacy and current) to make
>> things easier for new sensor types.
>>
>> Signed-off-by: Cédric Le Goater
>> Tested-by: Shilpasri G Bhat
>
> Did you test on a Tuleta (IBM Power) system ?

I have tested this patch on P9 FSP and Firestone.

>
> Thanks,
>
> C.
>
>> ---
>>  drivers/hwmon/ibmpowernv.c | 26 ++
>>  1 file changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/hwmon/ibmpowernv.c b/drivers/hwmon/ibmpowernv.c
>> index 862b832..6d8909c 100644
>> --- a/drivers/hwmon/ibmpowernv.c
>> +++ b/drivers/hwmon/ibmpowernv.c
>> @@ -55,17 +55,27 @@ enum sensors {
>>
>>  #define INVALID_INDEX (-1U)
>>
>> +/*
>> + * 'compatible' string properties for sensor types as defined in old
>> + * PowerNV firmware (skiboot). These are ordered as 'enum sensors'.
>> + */
>> +static const char * const legacy_compatibles[] = {
>> +	"ibm,opal-sensor-cooling-fan",
>> +	"ibm,opal-sensor-amb-temp",
>> +	"ibm,opal-sensor-power-supply",
>> +	"ibm,opal-sensor-power"
>> +};
>> +
>>  static struct sensor_group {
>> -	const char *name;
>> -	const char *compatible;
>> +	const char *name; /* matches property 'sensor-type' */
>>  	struct attribute_group group;
>>  	u32 attr_count;
>>  	u32 hwmon_index;
>>  } sensor_groups[] = {
>> -	{"fan", "ibm,opal-sensor-cooling-fan"},
>> -	{"temp", "ibm,opal-sensor-amb-temp"},
>> -	{"in", "ibm,opal-sensor-power-supply"},
>> -	{"power", "ibm,opal-sensor-power"}
>> +	{ "fan" },
>> +	{ "temp" },
>> +	{ "in" },
>> +	{ "power" }
>>  };
>>
>>  struct sensor_data {
>> @@ -239,8 +249,8 @@ static int get_sensor_type(struct device_node *np)
>>  	enum sensors type;
>>  	const char *str;
>>
>> -	for (type = 0; type < MAX_SENSOR_TYPE; type++) {
>> -		if (of_device_is_compatible(np, sensor_groups[type].compatible))
>> +	for (type = 0; type < ARRAY_SIZE(legacy_compatibles); type++) {
>> +		if (of_device_is_compatible(np, legacy_compatibles[type]))
>>  			return type;
>>  	}
>>
>
Re: [RFC PATCH 0/7 v1] powerpc: Memory Protection Keys
Hi!

> Memory protection keys enable applications to protect their
> address space from inadvertent access or corruption from
> themselves.
>
> The overall idea:
>
> 	A process allocates a key and associates it with
> 	an address range within its address space.
> 	The process then can dynamically set read/write
> 	permissions on the key without involving the
> 	kernel. Any code that violates the permissions
> 	of the address space, as defined by its associated
> 	key, will receive a segmentation fault.

Do you have some documentation on how userspace should use this? Will it
be possible to hide the details in libc so that it works across
architectures? Do you have some kind of library that hides them?

Where would you like it to be used? Web browsers?

How does it interact with ptrace()? With /dev/mem? With /proc/XXX/mem?

Will it enable malware to become very hard to understand?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

signature.asc
Description: Digital signature
Re: [PATCH] powerpc/time: Fix tracing in time.c
On 2017/06/20 11:50AM, Santosh Sivaraj wrote:
> Since trace_clock is in a different file and already marked with notrace,
> enable tracing in time.c by removing it from the disabled list in Makefile.
> Also annotate clocksource read functions and sched_clock with notrace.
>
> Testing: Timer and ftrace selftests run with different trace clocks.
>
> CC: Naveen N. Rao
> Signed-off-by: Santosh Sivaraj

Thanks for doing this! Apart from the minor nit below:
Acked-by: Naveen N. Rao

> ---
>  arch/powerpc/kernel/Makefile | 2 --
>  arch/powerpc/kernel/time.c   | 6 +++---
>  2 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index e132902..0845eeb 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -25,8 +25,6 @@ CFLAGS_REMOVE_cputable.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_prom_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_btext.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  CFLAGS_REMOVE_prom.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
> -# timers used by tracing
> -CFLAGS_REMOVE_time.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
>  endif
>
>  obj-y		:= cputable.o ptrace.o syscalls.o \
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 2b33cfa..6d10c5f 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -675,7 +675,7 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
>   * the high 64 bits of a * b, i.e. (a * b) >> 64, where a and b
>   * are 64-bit unsigned numbers.
>   */
> -unsigned long long sched_clock(void)
> +unsigned long long notrace sched_clock(void)

For the sake of consistency, it's probably better to add the notrace
annotation before the return value, though I see that the prototype in
include/sched.h has used this order.
- Naveen

> {
>  	if (__USE_RTC())
>  		return get_rtc();
> @@ -823,12 +823,12 @@ void read_persistent_clock(struct timespec *ts)
>  }
>
>  /* clocksource code */
> -static u64 rtc_read(struct clocksource *cs)
> +static notrace u64 rtc_read(struct clocksource *cs)
>  {
>  	return (u64)get_rtc();
>  }
>
> -static u64 timebase_read(struct clocksource *cs)
> +static notrace u64 timebase_read(struct clocksource *cs)
>  {
>  	return (u64)get_tb();
>  }
> --
> 2.9.4
>