Re: [PATCH 2/2] powerpc/8xx: Add microcode patch to move SMC parameter RAM.

2019-05-13 Thread Michael Ellerman
Christophe Leroy  writes:

> Some SCC functions like the QMC require an extended parameter RAM.
> On modern 8xx (ie 866 and 885), the SPI area can already be relocated,
> allowing the use of those functions on SCC2. But the SCC3 and SCC4
> parameter RAMs collide with the SMC1 and SMC2 parameter RAMs.
>
> This patch adds microcode to allow the relocation of both SMC1 and
> SMC2, and relocate them at offsets 0x1ec0 and 0x1fc0.
> Those offsets are used by default by the CPM1 DSP1 and DSP2, but there
> is no kernel driver using them at the moment, so this area can be
> reused.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/platforms/8xx/Kconfig  |   7 ++
>  arch/powerpc/platforms/8xx/micropatch.c | 109 
> +++-
>  2 files changed, 114 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/8xx/micropatch.c 
> b/arch/powerpc/platforms/8xx/micropatch.c
> index 33a9042fca80..dc4423daf7d4 100644
> --- a/arch/powerpc/platforms/8xx/micropatch.c
> +++ b/arch/powerpc/platforms/8xx/micropatch.c
> @@ -622,6 +622,86 @@ static uint patch_2f00[] __initdata = {
>  };
>  #endif
>  
> +/*
> + * SMC relocation patch arrays.
> + */
> +
> +#ifdef CONFIG_SMC_UCODE_PATCH
> +
> +static uint patch_2000[] __initdata = {
> + 0x3fff, 0x3ffd, 0x3ffb, 0x3ff9,
> + 0x5fefeff8, 0x5f91eff8, 0x3ff3, 0x3ff1,
> + 0x3a11e710, 0xedf0ccb9, 0xf318ed66, 0x7f0e5fe2,

Do we have any doc on what these values are?

I get that it's microcode but do we have any more detail than that?
What's the source etc?

cheers


Re: [PATCH v2] powerpc: Fix compile issue with force DAWR

2019-05-13 Thread Michael Neuling


> > > > --
> > > > v2:
> > > > Fixes based on Christophe Leroy's comments:
> > > > - Fix commit message formatting
> > > > - Move more DAWR code into dawr.c
> > > > ---
> > > >arch/powerpc/Kconfig |  5 ++
> > > >arch/powerpc/include/asm/hw_breakpoint.h | 20 ---
> > > >arch/powerpc/kernel/Makefile |  1 +
> > > >arch/powerpc/kernel/dawr.c   | 75
> > > > 
> > > >arch/powerpc/kernel/hw_breakpoint.c  | 56 --
> > > >arch/powerpc/kvm/Kconfig |  1 +
> > > >6 files changed, 94 insertions(+), 64 deletions(-)
> > > >create mode 100644 arch/powerpc/kernel/dawr.c
> > > > 
> > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > > index 2711aac246..f4b19e48cc 100644
> > > > --- a/arch/powerpc/Kconfig
> > > > +++ b/arch/powerpc/Kconfig
> > > > @@ -242,6 +242,7 @@ config PPC
> > > > select SYSCTL_EXCEPTION_TRACE
> > > > select THREAD_INFO_IN_TASK
> > > > select VIRT_TO_BUS  if !PPC64
> > > > +   select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF
> > > 
> > > What's PERF ? Did you mean PERF_EVENTS ?
> > > 
> > > Then what you mean is:
> > > - Selected all the time for PPC64
> > > - Selected for PPC32 when PERF is also selected.
> > > 
> > > Is that what you want ? At first I thought it was linked to P9.
> > 
> > This is wrong.  I think we just want PPC64. PERF is a red herring.
> 
> Are you sure ? Michael suggested PERF || KVM as far as I remember.

It was mpe but I think it was wrong.

We could make it more selective with something like:
   PPC64 && (HW_BREAK || PPC_ADV_DEBUG_REGS || PERF)
but I think that will end up back in the larger mess of
  https://github.com/linuxppc/issues/issues/128

I don't think the minor size gain is worth delving into that mess, hence I made
it simply PPC64, which is hopefully an improvement on what we have since it
eliminates 32-bit.

> > > >#else/* CONFIG_HAVE_HW_BREAKPOINT */
> > > >static inline void hw_breakpoint_disable(void) { }
> > > >static inline void thread_change_pc(struct task_struct *tsk,
> > > > struct pt_regs *regs) { }
> > > > -static inline bool dawr_enabled(void) { return false; }
> > > > +
> > > >#endif   /* CONFIG_HAVE_HW_BREAKPOINT */
> > > > +
> > > > +extern bool dawr_force_enable;
> > > > +
> > > > +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE
> > > > +extern bool dawr_enabled(void);
> > > 
> > > Functions should not be 'extern'. I'm sure checkpatch --strict will tell
> > > you.
> > 
> > That's not what's currently being done in this header file.  I'm keeping
> > with
> > the style of that file.
> 
> Style is not defined on a per-file basis. There is the style from the 
> past and the current style. If you keep the old style just because the file 
> includes old-style statements, then the code will never improve.
> 
> All new patches should come with a clean 'checkpatch' report and follow 
> the current style. Having mixed styles in a file is not a problem; that's the 
> way to improvement. See arch/powerpc/mm/mmu_decl.h as an example.

I'm not sure that's mpe's policy.

mpe?

> > > > +
> > > > +static ssize_t dawr_write_file_bool(struct file *file,
> > > > +   const char __user *user_buf,
> > > > +   size_t count, loff_t *ppos)
> > > > +{
> > > > +   struct arch_hw_breakpoint null_brk = {0, 0, 0};
> > > > +   size_t rc;
> > > > +
> > > > +   /* Send error to user if the hypervisor won't allow us to write
> > > > DAWR */
> > > > +   if ((!dawr_force_enable) &&
> > > > +   (firmware_has_feature(FW_FEATURE_LPAR)) &&
> > > > +   (set_dawr(&null_brk) != H_SUCCESS))
> > > 
> > > The above is not really clear.
> > > set_dawr() returns 0, H_SUCCESS is not used there.
> > 
> > pseries_set_dawr() will return an hcall return code.
> 
> Right, then maybe it means set_dawr() should be fixed?

Sorry, I don't understand this.

> > This code hasn't changed. I'm just moving it.
> 
> Right, but that could be an improvement for another patch.
> As far as I remember you are the one who wrote that code in the first place, 
> aren't you ?

Yep, classic crap Mikey code :-)

Mikey


Re: [PATCH] EDAC, mpc85xx: Prevent building as a module

2019-05-13 Thread Michael Ellerman
Borislav Petkov  writes:
> On Fri, May 10, 2019 at 04:13:20PM +0200, Borislav Petkov wrote:
>> On Fri, May 10, 2019 at 08:50:52PM +1000, Michael Ellerman wrote:
>> > Yeah that looks better to me. I didn't think about the case where EDAC
>> > core is modular.
>> > 
>> > Do you want me to send a new patch?
>> 
>> Nah, I'll fix it up.
>
> I've pushed it here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=edac-fix-for-5.2
>
> in case you wanna throw your build tests on it. My dingy cross-compiler
> can't do much really.

Looks good. I even booted it :)

cheers


Re: [PATCH] powerpc/mm: Handle page table allocation failures

2019-05-13 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> This fixes the below crash, which arises from not handling page table
> allocation failures while allocating hugetlb page tables.
>
>  BUG: Kernel NULL pointer dereference at 0x001c
>  Faulting instruction address: 0xc1d1e58c
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>
>  CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: GW  O  
> 5.1.0-next-20190507-autotest #1
>  NIP:  c1d1e58c LR: c1d1e54c CTR: 
>  REGS: c4937890 TRAP: 0300   Tainted: GW  O   
> (5.1.0-next-20190507-autotest)
>  MSR:  80009033   CR: 22424822  XER: 
>  CFAR: c183e9e0 DAR: 001c DSISR: 4000 IRQMASK: 0
>  GPR00: c1901a80 c4937b20 c3938700 
>  GPR04: 00400cc0 0003efff 00027966e000 c3ba8700
>  GPR08: c3ba8700 0d601125 c3ba8700 8000
>  GPR12: 22424822 c0001ecae280  
>  GPR16:    
>  GPR20: 0018 c39e2d30 c39e2d28 c002762da460
>  GPR24: 001c  0001 c1901a80
>  GPR28: 00400cc0   00400cc0
>  NIP [c1d1e58c] kmem_cache_alloc+0xbc/0x5a0
>  LR [c1d1e54c] kmem_cache_alloc+0x7c/0x5a0
>  Call Trace:
>   [c1c91150] __pud_alloc+0x160/0x200 (unreliable)
>   [c1901a80] huge_pte_alloc+0x580/0x950
>   [c1cf7910] hugetlb_fault+0x9a0/0x1250
>   [c1c94a80] handle_mm_fault+0x490/0x4a0
>   [c18d529c] __do_page_fault+0x77c/0x1f00
>   [c18d6a48] do_page_fault+0x28/0x50
>   [c183b0d4] handle_page_fault+0x18/0x38
>
> Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a 
> different page table format")
> Reported-by: Sachin Sant 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>
> Note: I did add a recent commit for the Fixes tag. But in reality we never 
> checked for page table
> allocation failure there. If we want to go to that old commit, then we may 
> need.

If we never checked for failure in that path, is there some reason we've
only just noticed the crashes? Are we just testing under memory pressure
more effectively than we used to?

cheers


[PATCH] powerpc/book3s/mm: Clear MMU_FTR_HPTE_TABLE when radix is enabled.

2019-05-13 Thread Aneesh Kumar K.V
Avoid confusion when printing an Oops message like the one below:

 Faulting instruction address: 0xc008bdb4
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV

Either ibm,pa-features or ibm,powerpc-cpu-features can be used to enable the
MMU features, and we don't clear the related MMU feature bits there. We use the
kernel command line to determine what translation mode we want to use and clear
the HPTE or radix bit accordingly. On LPAR we do have to re-enable the HASH bit
if the hypervisor can't do radix.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/init_64.c | 11 ++-
 arch/powerpc/mm/pgtable.c |  7 +++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 45b02fa11cd8..652d95ba 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -355,15 +355,18 @@ static void __init early_check_vec5(void)
chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
if (chosen == -FDT_ERR_NOTFOUND) {
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
return;
}
vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
if (!vec5) {
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
return;
}
if (size <= OV5_INDX(OV5_MMU_SUPPORT)) {
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
return;
}
 
@@ -381,17 +384,23 @@ static void __init early_check_vec5(void)
}
/* Do radix anyway - the hypervisor said we had to */
cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
+   cur_cpu_spec->mmu_features &= ~MMU_FTR_HPTE_TABLE;
} else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
/* Hypervisor only supports hash - disable radix */
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
}
 }
 
 void __init mmu_early_init_devtree(void)
 {
/* Disable radix mode based on kernel command line. */
-   if (disable_radix)
+   if (disable_radix) {
cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+   } else {
+   if (early_radix_enabled())
+   cur_cpu_spec->mmu_features &= ~MMU_FTR_HPTE_TABLE;
+   }
 
/*
 * Check /chosen/ibm,architecture-vec-5 if running as a guest.
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index b97aee03924f..0fa6cac3fe82 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -77,9 +77,6 @@ static struct page *maybe_pte_to_page(pte_t pte)
 
 static pte_t set_pte_filter_hash(pte_t pte)
 {
-   if (radix_enabled())
-   return pte;
-
pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) 
||
   cpu_has_feature(CPU_FTR_NOEXECUTE))) {
@@ -110,6 +107,8 @@ static pte_t set_pte_filter(pte_t pte)
 
if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
return set_pte_filter_hash(pte);
+   else if (radix_enabled())
+   return pte;
 
/* No exec permission in the first place, move on */
if (!pte_exec(pte) || !pte_looks_normal(pte))
@@ -140,7 +139,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct 
vm_area_struct *vma,
 {
struct page *pg;
 
-   if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
+   if (mmu_has_feature(MMU_FTR_HPTE_TABLE) || radix_enabled())
return pte;
 
/* So here, we only care about exec faults, as we use them
-- 
2.21.0



[PATCH 3/3] powerpc/mm: Remove radix dependency on HugeTLB page

2019-05-13 Thread Aneesh Kumar K.V
Now that we have switched the page table walk to use pmd_is_leaf() we can
revert commit 8adddf349fda ("powerpc/mm/radix: Make Radix require HUGETLB_PAGE").

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index d0e172d47574..fa6b03205ae1 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -330,7 +330,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
 
 config PPC_RADIX_MMU
bool "Radix MMU Support"
-   depends on PPC_BOOK3S_64 && HUGETLB_PAGE
+   depends on PPC_BOOK3S_64
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
select PPC_HAVE_KUEP
select PPC_HAVE_KUAP
-- 
2.21.0



[PATCH 2/3] powerpc/mm: pmd_devmap implies pmd_large().

2019-05-13 Thread Aneesh Kumar K.V
Large devmap usage is dependent on THP, hence one check is sufficient.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/book3s64/pgtable.c | 2 +-
 arch/powerpc/mm/pgtable_64.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..471cea271dd3 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -76,7 +76,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 
WARN_ON(pte_hw_valid(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
assert_spin_locked(pmd_lockptr(mm, pmdp));
-   WARN_ON(!(pmd_large(pmd) || pmd_devmap(pmd)));
+   WARN_ON(!(pmd_large(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index bfb47b037a2f..c1541371f224 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -320,7 +320,7 @@ struct page *pud_page(pud_t pud)
 struct page *pmd_page(pmd_t pmd)
 {
if (pmd_is_leaf(pmd)) {
-   VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd) || 
pmd_devmap(pmd)));
+   VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd)));
return pte_page(pmd_pte(pmd));
}
return virt_to_page(pmd_page_vaddr(pmd));
-- 
2.21.0



[PATCH 1/3] powerpc/book3s: Use config independent helpers for page table walk

2019-05-13 Thread Aneesh Kumar K.V
Even when we have HugeTLB and THP disabled, the kernel linear map can still be
mapped with hugepages. This is only an issue with radix translation, because the
hash MMU doesn't map the kernel linear range in the Linux page table, and other
kernel map areas are not mapped using hugepages.

Add config independent helpers and add WARN_ON()s where we don't expect things
to be mapped via hugepages.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 21 +++
 arch/powerpc/include/asm/pgtable.h   | 24 +
 arch/powerpc/include/asm/pte-walk.h  | 28 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c   | 12 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 10 +++
 arch/powerpc/mm/pgtable.c| 16 +--
 arch/powerpc/mm/pgtable_64.c | 12 ++---
 arch/powerpc/mm/ptdump/ptdump.c  |  6 ++---
 arch/powerpc/xmon/xmon.c |  6 ++---
 9 files changed, 102 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..f402b2a96ef3 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1313,5 +1313,26 @@ static inline bool is_pte_rw_upgrade(unsigned long 
old_val, unsigned long new_va
return false;
 }
 
+/*
+ * Like pmd_huge() and pmd_large(), but works regardless of config options
+ */
+#define pmd_is_leaf pmd_is_leaf
+static inline bool pmd_is_leaf(pmd_t pmd)
+{
+   return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pud_is_leaf pud_is_leaf
+static inline bool pud_is_leaf(pud_t pud)
+{
+   return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pgd_is_leaf pgd_is_leaf
+static inline bool pgd_is_leaf(pgd_t pgd)
+{
+   return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 3f53be60fb01..bf7d771f342e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -140,6 +140,30 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
 }
 #endif
 
+#ifndef pmd_is_leaf
+#define pmd_is_leaf pmd_is_leaf
+static inline bool pmd_is_leaf(pmd_t pmd)
+{
+   return false;
+}
+#endif
+
+#ifndef pud_is_leaf
+#define pud_is_leaf pud_is_leaf
+static inline bool pud_is_leaf(pud_t pud)
+{
+   return false;
+}
+#endif
+
+#ifndef pgd_is_leaf
+#define pgd_is_leaf pgd_is_leaf
+static inline bool pgd_is_leaf(pgd_t pgd)
+{
+   return false;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/include/asm/pte-walk.h 
b/arch/powerpc/include/asm/pte-walk.h
index 2d633e9d686c..33fa5dd8ee6a 100644
--- a/arch/powerpc/include/asm/pte-walk.h
+++ b/arch/powerpc/include/asm/pte-walk.h
@@ -10,8 +10,20 @@ extern pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long 
ea,
 static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea,
bool *is_thp, unsigned *hshift)
 {
+   pte_t *pte;
+
VM_WARN(!arch_irqs_disabled(), "%s called with irq enabled\n", 
__func__);
-   return __find_linux_pte(pgdir, ea, is_thp, hshift);
+   pte = __find_linux_pte(pgdir, ea, is_thp, hshift);
+
+#if defined(CONFIG_DEBUG_VM) &&
\
+   !(defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+   /*
+* We should not find huge page if these configs are not enabled.
+*/
+   if (hshift)
+   WARN_ON(*hshift);
+#endif
+   return pte;
 }
 
 static inline pte_t *find_init_mm_pte(unsigned long ea, unsigned *hshift)
@@ -26,10 +38,22 @@ static inline pte_t *find_init_mm_pte(unsigned long ea, 
unsigned *hshift)
 static inline pte_t *find_current_mm_pte(pgd_t *pgdir, unsigned long ea,
 bool *is_thp, unsigned *hshift)
 {
+   pte_t *pte;
+
VM_WARN(!arch_irqs_disabled(), "%s called with irq enabled\n", 
__func__);
VM_WARN(pgdir != current->mm->pgd,
"%s lock less page table lookup called on wrong mm\n", 
__func__);
-   return __find_linux_pte(pgdir, ea, is_thp, hshift);
+   pte = __find_linux_pte(pgdir, ea, is_thp, hshift);
+
+#if defined(CONFIG_DEBUG_VM) &&
\
+   !(defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+   /*
+* We should not find huge page if these configs are not enabled.
+*/
+   if (hshift)
+   WARN_ON(*hshift);
+#endif
+   return pte;
 }
 
 #endif /* _ASM_POWERPC_PTE_WALK_H */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index f55ef071883f..91efee7f0329 100644
--- a/arch/powerpc/kvm/book

Re: [PATCH 1/2] [PowerPC] Add simd.h implementation

2019-05-13 Thread Benjamin Herrenschmidt
On Mon, 2019-05-13 at 22:44 -0300, Shawn Landden wrote:
> +
> +/*
> + * Were we in user mode when we were
> + * interrupted?
> + *
> + * Doing kernel_altivec/vsx_begin/end() is ok if we are running
> + * in an interrupt context from user mode - we'll just
> + * save the FPU state as required.
> + */
> +static bool interrupted_user_mode(void)
> +{
> +   struct pt_regs *regs = get_irq_regs();
> +
> +   return regs && user_mode(regs);
> +}
> +

That's interesting ... it *could* work but we'll have to carefully audit
the code to make sure that's ok.

We probably also want to handle the case where the CPU is in the idle
loop.

Do we always save the user state when switching out these days ? If
yes, then there's no "live" state to worry about...

Cheers,
Ben.




Re: [PATCH v2] powerpc: Fix compile issue with force DAWR

2019-05-13 Thread Christophe Leroy




Le 14/05/2019 à 06:47, Michael Neuling a écrit :

On Mon, 2019-05-13 at 11:08 +0200, Christophe Leroy wrote:


Le 13/05/2019 à 09:17, Michael Neuling a écrit :

If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined
reference to `dawr_force_enable'

This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
DAWR on P9 option").

This puts more of the dawr infrastructure in a new file.


I think you are doing a bit more than that. I think you should explain
that you define a new CONFIG_ option, when it is selected, etc ...

The commit you are referring to is talking about P9. It looks like your
patch covers a lot more, so it should be mentioned here I guess.


Not really. It looks like I'm doing a lot but I'm really just moving code around
to deal with the ugliness of a bunch of config options and dependencies.


Signed-off-by: Michael Neuling 


You should add a Fixes: tag, ie:

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")


Ok




--
v2:
Fixes based on Christophe Leroy's comments:
- Fix commit message formatting
- Move more DAWR code into dawr.c
---
   arch/powerpc/Kconfig |  5 ++
   arch/powerpc/include/asm/hw_breakpoint.h | 20 ---
   arch/powerpc/kernel/Makefile |  1 +
   arch/powerpc/kernel/dawr.c   | 75 
   arch/powerpc/kernel/hw_breakpoint.c  | 56 --
   arch/powerpc/kvm/Kconfig |  1 +
   6 files changed, 94 insertions(+), 64 deletions(-)
   create mode 100644 arch/powerpc/kernel/dawr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2711aac246..f4b19e48cc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -242,6 +242,7 @@ config PPC
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
+   select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF


What's PERF ? Did you mean PERF_EVENTS ?

Then what you mean is:
- Selected all the time for PPC64
- Selected for PPC32 when PERF is also selected.

Is that what you want ? At first I thought it was linked to P9.


This is wrong.  I think we just want PPC64. PERF is a red herring.


Are you sure ? Michael suggested PERF || KVM as far as I remember.




And ... did you read the statement below?


Clearly not :-)




#
# Please keep this list sorted alphabetically.
#
@@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y
   
+config PPC_DAWR_FORCE_ENABLE

+   bool
+   default y
+


Why defaulting it to y. Then how is it set to n ?


Good point.




   config ZONE_DMA
bool
default y if PPC_BOOK3E_64
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h
b/arch/powerpc/include/asm/hw_breakpoint.h
index 0fe8c1e46b..ffbc8eab41 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -47,6 +47,8 @@ struct arch_hw_breakpoint {
   #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL |
\
 HW_BRK_TYPE_HYP)
   
+extern int set_dawr(struct arch_hw_breakpoint *brk);

+
   #ifdef CONFIG_HAVE_HW_BREAKPOINT
   #include 
   #include 
@@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void)
   extern void thread_change_pc(struct task_struct *tsk, struct pt_regs
*regs);
   int hw_breakpoint_handler(struct die_args *args);
   
-extern int set_dawr(struct arch_hw_breakpoint *brk);

-extern bool dawr_force_enable;
-static inline bool dawr_enabled(void)
-{
-   return dawr_force_enable;
-}
-


That's a very simple function, why not keep it here (or in another .h)
as 'static inline' ?


Sure.


   #else/* CONFIG_HAVE_HW_BREAKPOINT */
   static inline void hw_breakpoint_disable(void) { }
   static inline void thread_change_pc(struct task_struct *tsk,
struct pt_regs *regs) { }
-static inline bool dawr_enabled(void) { return false; }
+
   #endif   /* CONFIG_HAVE_HW_BREAKPOINT */
+
+extern bool dawr_force_enable;
+
+#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE
+extern bool dawr_enabled(void);


Functions should not be 'extern'. I'm sure checkpatch --strict will tell
you.


That's not what's currently being done in this header file.  I'm keeping with
the style of that file.


Style is not defined on a per-file basis. There is the style from the 
past and the current style. If you keep the old style just because the file 
includes old-style statements, then the code will never improve.


All new patches should come with a clean 'checkpatch' report and follow 
the current style. Having mixed styles in a file is not a problem; that's the 
way to improvement. See arch/powerpc/mm/mmu_decl.h as an example.





+#else
+#define dawr_enabled() true


true by default ?
Previously it was false by default.

Re: [Bug 203597] New: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage

2019-05-13 Thread Christophe Leroy

Hi Greg,

Could you please apply b45ba4a51cde29b2939365ef0c07ad34c8321789 to 4.9, 
since 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 was applied although it was 
marked for #4.13+.


Thanks
Christophe

Le 13/05/2019 à 22:18, bugzilla-dae...@bugzilla.kernel.org a écrit :

https://bugzilla.kernel.org/show_bug.cgi?id=203597

 Bug ID: 203597
Summary: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at
 early stage
Product: Platform Specific/Hardware
Version: 2.5
 Kernel Version: 4.9.175
   Hardware: PPC-32
 OS: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: PPC-32
   Assignee: platform_ppc...@kernel-bugs.osdl.org
   Reporter: erhar...@mailbox.org
 Regression: No

Created attachment 282743
   --> https://bugzilla.kernel.org/attachment.cgi?id=282743&action=edit
kernel .config (PowerMac G4 MDD)

Trying out older kernels on the G4 MDD I noticed recent 4.9.x freeze the
machine. Only message displayed in black letters on a white screen:

done
found display   : /pco@f000/ATY,AlteracParent@10/ATY,Alterac_B@1,
opening...


It's a hard freeze, RCU_CPU_STALL_TIMEOUT does not kick in.

Tried other stable kernels, which all worked:
4.19.37
4.14.114
4.4.179

So I suppose it's only a 4.9.x issue. Last working 4.9.x kernel I had in
service was 4.9.161. First one I spotted freezing was 4.9.171. A bisect gave me
the following commit:

1c38a84d45862be06ac418618981631eddbda741 is the first bad commit
commit 1c38a84d45862be06ac418618981631eddbda741
Author: Michael Neuling 
Date:   Thu Apr 11 21:45:59 2019 +1000

 powerpc: Avoid code patching freed init sections

 commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream.

 This stops us from doing code patching in init sections after they've
 been freed.

 In this chain:
   kvm_guest_init() ->
 kvm_use_magic_page() ->
   fault_in_pages_readable() ->
  __get_user() ->
__get_user_nocheck() ->
  barrier_nospec();

 We have a code patching location at barrier_nospec() and
 kvm_guest_init() is an init function. This whole chain gets inlined,
 so when we free the init section (hence kvm_guest_init()), this code
 goes away and hence should no longer be patched.

 We've seen this as userspace memory corruption when using a memory
 checker while doing partition migration testing on powervm (this
 starts the code patching post migration via
 /sys/kernel/mobility/migration). In theory, it could also happen when
 using /sys/kernel/debug/powerpc/barrier_nospec.

 Cc: sta...@vger.kernel.org # 4.13+
 Signed-off-by: Michael Neuling 
 Reviewed-by: Nicholas Piggin 
 Reviewed-by: Christophe Leroy 
 Signed-off-by: Michael Ellerman 
 Signed-off-by: Sasha Levin 



Re: [PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties

2019-05-13 Thread Joakim Tjernlund
On Mon, 2019-05-13 at 17:40 +, Roy Pledge wrote:
> 
> On 5/13/2019 12:40 PM, Joakim Tjernlund wrote:
> > On Mon, 2019-05-13 at 16:09 +, Roy Pledge wrote:
> > > The index value should be passed to the of_parse_phandle()
> > > function to ensure the correct property is read.
> > Is this a bug fix? Maybe for stable too?
> > 
> >  Jocke
> Yes this could go to stable.  I will include sta...@vger.kernel.org when
> I send the next version.

I think you need to send this patch separately then. The whole series cannot go 
to stable.

 Jocke

> > > Signed-off-by: Roy Pledge 
> > > ---
> > >  drivers/soc/fsl/qbman/dpaa_sys.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c 
> > > b/drivers/soc/fsl/qbman/dpaa_sys.c
> > > index 3e0a7f3..0b901a8 100644
> > > --- a/drivers/soc/fsl/qbman/dpaa_sys.c
> > > +++ b/drivers/soc/fsl/qbman/dpaa_sys.c
> > > @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, 
> > > dma_addr_t *addr,
> > > idx, ret);
> > > return -ENODEV;
> > > }
> > > -   mem_node = of_parse_phandle(dev->of_node, "memory-region", 0);
> > > +   mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
> > > if (mem_node) {
> > > ret = of_property_read_u64(mem_node, "size", &size64);
> > > if (ret) {
> > > --
> > > 2.7.4
> > > 



Re: [PATCH v2] powerpc: Fix compile issue with force DAWR

2019-05-13 Thread Michael Neuling
On Mon, 2019-05-13 at 11:08 +0200, Christophe Leroy wrote:
> 
> Le 13/05/2019 à 09:17, Michael Neuling a écrit :
> > If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
> > at linking with:
> >arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined
> > reference to `dawr_force_enable'
> > 
> > This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
> > DAWR on P9 option").
> > 
> > This puts more of the dawr infrastructure in a new file.
> 
> I think you are doing a bit more than that. I think you should explain 
> that you define a new CONFIG_ option, when it is selected, etc ...
> 
> The commit you are referring to is talking about P9. It looks like your 
> patch covers a lot more, so it should be mentioned here I guess.

Not really. It looks like I'm doing a lot but I'm really just moving code around
to deal with the ugliness of a bunch of config options and dependencies. 

> > Signed-off-by: Michael Neuling 
> 
> You should add a Fixes: tag, ie:
> 
> Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")

Ok

> 
> > --
> > v2:
> >Fixes based on Christophe Leroy's comments:
> >- Fix commit message formatting
> >- Move more DAWR code into dawr.c
> > ---
> >   arch/powerpc/Kconfig |  5 ++
> >   arch/powerpc/include/asm/hw_breakpoint.h | 20 ---
> >   arch/powerpc/kernel/Makefile |  1 +
> >   arch/powerpc/kernel/dawr.c   | 75 
> >   arch/powerpc/kernel/hw_breakpoint.c  | 56 --
> >   arch/powerpc/kvm/Kconfig |  1 +
> >   6 files changed, 94 insertions(+), 64 deletions(-)
> >   create mode 100644 arch/powerpc/kernel/dawr.c
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 2711aac246..f4b19e48cc 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -242,6 +242,7 @@ config PPC
> > select SYSCTL_EXCEPTION_TRACE
> > select THREAD_INFO_IN_TASK
> > select VIRT_TO_BUS  if !PPC64
> > +   select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF
> 
> What's PERF ? Did you mean PERF_EVENTS ?
> 
> Then what you mean is:
> - Selected all the time for PPC64
> - Selected for PPC32 when PERF is also selected.
> 
> Is that what you want ? At first I thought it was linked to P9.

This is wrong.  I think we just want PPC64. PERF is a red herring.

> And ... did you read the statement below?

Clearly not :-)

> 
> > #
> > # Please keep this list sorted alphabetically.
> > #
> > @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE
> > depends on PPC_ADV_DEBUG_REGS && 44x
> > default y
> >   
> > +config PPC_DAWR_FORCE_ENABLE
> > +   bool
> > +   default y
> > +
> 
> Why defaulting it to y. Then how is it set to n ?

Good point.

> 
> >   config ZONE_DMA
> > bool
> > default y if PPC_BOOK3E_64
> > diff --git a/arch/powerpc/include/asm/hw_breakpoint.h
> > b/arch/powerpc/include/asm/hw_breakpoint.h
> > index 0fe8c1e46b..ffbc8eab41 100644
> > --- a/arch/powerpc/include/asm/hw_breakpoint.h
> > +++ b/arch/powerpc/include/asm/hw_breakpoint.h
> > @@ -47,6 +47,8 @@ struct arch_hw_breakpoint {
> >   #define HW_BRK_TYPE_PRIV_ALL  (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL |
> > \
> >  HW_BRK_TYPE_HYP)
> >   
> > +extern int set_dawr(struct arch_hw_breakpoint *brk);
> > +
> >   #ifdef CONFIG_HAVE_HW_BREAKPOINT
> >   #include 
> >   #include 
> > @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void)
> >   extern void thread_change_pc(struct task_struct *tsk, struct pt_regs
> > *regs);
> >   int hw_breakpoint_handler(struct die_args *args);
> >   
> > -extern int set_dawr(struct arch_hw_breakpoint *brk);
> > -extern bool dawr_force_enable;
> > -static inline bool dawr_enabled(void)
> > -{
> > -   return dawr_force_enable;
> > -}
> > -
> 
> That's a very simple function, why not keep it here (or in another .h) 
> as 'static inline' ?

Sure.

> >   #else /* CONFIG_HAVE_HW_BREAKPOINT */
> >   static inline void hw_breakpoint_disable(void) { }
> >   static inline void thread_change_pc(struct task_struct *tsk,
> > struct pt_regs *regs) { }
> > -static inline bool dawr_enabled(void) { return false; }
> > +
> >   #endif/* CONFIG_HAVE_HW_BREAKPOINT */
> > +
> > +extern bool dawr_force_enable;
> > +
> > +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE
> > +extern bool dawr_enabled(void);
> 
> Functions should not be 'extern'. I'm sure checkpatch --strict will tell 
> you.

That's not what's currently being done in this header file.  I'm keeping with
the style of that file.

> > +#else
> > +#define dawr_enabled() true
> 
> true by default ?
> Previously it was false by default.

Thanks, yeah that's wrong but I need to rethink the config option to make it
CONFIG_PPC_DAWR. 

This patch is far more difficult than it should be due to the mess that
CONFIG_HAVE_HW_BREAKPOINT and CONFIG_PPC_ADV_DEBUG_REGS create.

Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-13 Thread Aneesh Kumar K.V

On 5/14/19 9:42 AM, Dan Williams wrote:

On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V
 wrote:


On 5/14/19 9:28 AM, Dan Williams wrote:

On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V
 wrote:


The nfpns-related change is needed to fix the kernel message

"number of pfns truncated from 2617344 to 163584"

The change makes sure the nfpns value stored in the superblock is the right one.

Signed-off-by: Aneesh Kumar K.V 
---
   drivers/nvdimm/pfn_devs.c| 6 +++---
   drivers/nvdimm/region_devs.c | 8 
   2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 347cab166376..6751ff0296ef 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
   * when populating the vmemmap. This *should* be equal to
   * PMD_SIZE for most architectures.
   */
-   offset = ALIGN(start + reserve + 64 * npfns,
-   max(nd_pfn->align, PMD_SIZE)) - start;
+   offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
+  max(nd_pfn->align, PMD_SIZE)) - start;


No, I think we need to record the page-size into the superblock format
otherwise this breaks in debug builds where the struct-page size is
extended.


  } else if (nd_pfn->mode == PFN_MODE_RAM)
  offset = ALIGN(start + reserve, nd_pfn->align) - start;
  else
@@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
  return -ENXIO;
  }

-   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;


Similar comment, if the page size is variable then the superblock
needs to explicitly account for it.



PAGE_SIZE is not really variable. What we can run into is the issue you
mentioned above. The size of struct page can change which means the
reserved space for keeping vmemmap in device may not be sufficient for
certain kernel builds.

I was planning to add another patch that fails namespace init if we
don't have enough space to keep the struct page.

Why do you suggest we need to have PAGE_SIZE as part of pfn superblock?


So that the kernel has a chance to identify cases where the superblock
it is handling was created on a system with different PAGE_SIZE
assumptions.



The reason to do that is we don't have enough space to keep struct page 
backing the total number of pfns? If so, what I suggested above should
handle that.


or are you finding any other reason why we should fail a namespace init 
with a different PAGE_SIZE value?


My another patch handle the details w.r.t devdax alignment for which 
devdax got created with PAGE_SIZE 4K but we are now trying to load that 
in a kernel with PAGE_SIZE 64k.


-aneesh



[Bug 203597] kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage

2019-05-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203597

Christophe Leroy (christophe.le...@c-s.fr) changed:

   What|Removed |Added

 CC||christophe.le...@c-s.fr

--- Comment #2 from Christophe Leroy (christophe.le...@c-s.fr) ---
You are missing following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b45ba4a51cd

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region

2019-05-13 Thread Aneesh Kumar K.V

On 5/14/19 9:59 AM, Dan Williams wrote:

On Mon, May 13, 2019 at 7:55 PM Aneesh Kumar K.V
 wrote:


We already add start_pad to resource->start but fail to section-align the
start. This makes sure that, with an altmap, we compute the right first pfn
when start_pad is zero and we are aligning the start address down.

Signed-off-by: Aneesh Kumar K.V 
---
  kernel/memremap.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..23d77b60e728 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
  {
 const struct resource *res = &pgmap->res;
 struct vmem_altmap *altmap = &pgmap->altmap;
-   unsigned long pfn;
+   unsigned long pfn = PHYS_PFN(res->start);

-   pfn = res->start >> PAGE_SHIFT;
+   pfn = SECTION_ALIGN_DOWN(pfn);


This does not seem right to me it breaks the assumptions of where the
first expected valid pfn occurs in the passed in range.



How do we define the first valid pfn? Isn't that at pfn_sb->dataoff ?

-aneesh



Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_mremap_pages_release

2019-05-13 Thread Aneesh Kumar K.V

On 5/14/19 9:45 AM, Dan Williams wrote:

[ add Keith who was looking at something similar ]

On Mon, May 13, 2019 at 7:54 PM Aneesh Kumar K.V
 wrote:


When we initialize the namespace, if we support altmap, we don't initialize
all the backing struct pages, whereas while releasing the namespace we look at
some of these uninitialized struct pages. This results in a kernel crash as
below.

kernel BUG at include/linux/mm.h:1034!
cpu 0x2: Vector: 700 (Program Check) at [c0024146b870]
 pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0
 lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0
 sp: c0024146bb00
msr: 8282b033
   current = 0xc00241382f00
   paca= 0xc0003fffd680   irqmask: 0x03   irq_happened: 0x01
 pid   = 4114, comm = ndctl
  c09bf8c0 devm_action_release+0x30/0x50
  c09c0938 release_nodes+0x268/0x2d0
  c09b95b4 device_release_driver_internal+0x164/0x230
  c09b638c unbind_store+0x13c/0x190
  c09b4f44 drv_attr_store+0x44/0x60
  c058ccc0 sysfs_kf_write+0x70/0xa0
  c058b52c kernfs_fop_write+0x1ac/0x290
  c04a415c __vfs_write+0x3c/0x70
  c04a85ac vfs_write+0xec/0x200
  c04a8920 ksys_write+0x80/0x130
  c000bee4 system_call+0x5c/0x70

Signed-off-by: Aneesh Kumar K.V 
---
  mm/page_alloc.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59661106da16..892eabe1ec13 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,

  #ifdef CONFIG_ZONE_DEVICE
 /*
-* Honor reservation requested by the driver for this ZONE_DEVICE
-* memory. We limit the total number of pages to initialize to just
+* We limit the total number of pages to initialize to just
  * those that might contain the memory mapping. We will defer the
  * ZONE_DEVICE page initialization until after we have released
  * the hotplug lock.
@@ -5750,8 +5749,6 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
 if (!altmap)
 return;

-   if (start_pfn == altmap->base_pfn)
-   start_pfn += altmap->reserve;


If it's reserved then we should not be accessing, even if the above
works in practice. Isn't the fix something more like this to fix up
the assumptions at release time?

diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..9074ba14572c 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -90,6 +90,7 @@ static void devm_memremap_pages_release(void *data)
   struct device *dev = pgmap->dev;
   struct resource *res = &pgmap->res;
   resource_size_t align_start, align_size;
+ struct vmem_altmap *altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL;
   unsigned long pfn;
   int nid;

@@ -102,7 +103,10 @@ static void devm_memremap_pages_release(void *data)
   align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
   - align_start;

- nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
+ pfn = align_start >> PAGE_SHIFT;
+ if (altmap)
+ pfn += vmem_altmap_offset(altmap);
+ nid = page_to_nid(pfn_to_page(pfn));

   mem_hotplug_begin();
   if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
@@ -110,8 +114,7 @@ static void devm_memremap_pages_release(void *data)
   __remove_pages(page_zone(pfn_to_page(pfn)), pfn,
   align_size >> PAGE_SHIFT, NULL);
   } else {
- arch_remove_memory(nid, align_start, align_size,
- pgmap->altmap_valid ? &pgmap->altmap : NULL);
+ arch_remove_memory(nid, align_start, align_size, altmap);
   kasan_remove_zero_shadow(__va(align_start), align_size);
   }
   mem_hotplug_done();

I did try that first, but I was not sure about it from the memory add vs
remove perspective.


devm_memremap_pages:

align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
- align_start;
align_end = align_start + align_size - 1;

error = arch_add_memory(nid, align_start, align_size, altmap,
false);


devm_memremap_pages_release:

/* pages are dead and unused, undo the arch mapping */
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
- align_start;

arch_remove_memory(nid, align_start, align_size,
pgmap->altmap_valid ? &pgmap->altmap : NULL);


Now if we are fixing devm_memremap_pages_release(), shouldn't we adjust
align_start w.r.t. devm_memremap_pages() too? And I was not sure what that
means w.r.t. the add/remove alignment requirements.


What is the intended usage of the reserve area? I guess we want that part to
be added? If so, shouldn't we remove it as well?



-aneesh



Re: [PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region

2019-05-13 Thread Dan Williams
On Mon, May 13, 2019 at 7:55 PM Aneesh Kumar K.V
 wrote:
>
> We already add start_pad to resource->start but fail to section-align the
> start. This makes sure that, with an altmap, we compute the right first pfn
> when start_pad is zero and we are aligning the start address down.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  kernel/memremap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index a856cb5ff192..23d77b60e728 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
>  {
> const struct resource *res = &pgmap->res;
> struct vmem_altmap *altmap = &pgmap->altmap;
> -   unsigned long pfn;
> +   unsigned long pfn = PHYS_PFN(res->start);
>
> -   pfn = res->start >> PAGE_SHIFT;
> +   pfn = SECTION_ALIGN_DOWN(pfn);

This does not seem right to me it breaks the assumptions of where the
first expected valid pfn occurs in the passed in range.


Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_mremap_pages_release

2019-05-13 Thread Dan Williams
[ add Keith who was looking at something similar ]

On Mon, May 13, 2019 at 7:54 PM Aneesh Kumar K.V
 wrote:
>
> When we initialize the namespace, if we support altmap, we don't initialize
> all the backing struct pages, whereas while releasing the namespace we look
> at some of these uninitialized struct pages. This results in a kernel crash
> as below.
>
> kernel BUG at include/linux/mm.h:1034!
> cpu 0x2: Vector: 700 (Program Check) at [c0024146b870]
> pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0
> lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0
> sp: c0024146bb00
>msr: 8282b033
>   current = 0xc00241382f00
>   paca= 0xc0003fffd680   irqmask: 0x03   irq_happened: 0x01
> pid   = 4114, comm = ndctl
>  c09bf8c0 devm_action_release+0x30/0x50
>  c09c0938 release_nodes+0x268/0x2d0
>  c09b95b4 device_release_driver_internal+0x164/0x230
>  c09b638c unbind_store+0x13c/0x190
>  c09b4f44 drv_attr_store+0x44/0x60
>  c058ccc0 sysfs_kf_write+0x70/0xa0
>  c058b52c kernfs_fop_write+0x1ac/0x290
>  c04a415c __vfs_write+0x3c/0x70
>  c04a85ac vfs_write+0xec/0x200
>  c04a8920 ksys_write+0x80/0x130
>  c000bee4 system_call+0x5c/0x70
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  mm/page_alloc.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 59661106da16..892eabe1ec13 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
> nid, unsigned long zone,
>
>  #ifdef CONFIG_ZONE_DEVICE
> /*
> -* Honor reservation requested by the driver for this ZONE_DEVICE
> -* memory. We limit the total number of pages to initialize to just
> +* We limit the total number of pages to initialize to just
>  * those that might contain the memory mapping. We will defer the
>  * ZONE_DEVICE page initialization until after we have released
>  * the hotplug lock.
> @@ -5750,8 +5749,6 @@ void __meminit memmap_init_zone(unsigned long size, int 
> nid, unsigned long zone,
> if (!altmap)
> return;
>
> -   if (start_pfn == altmap->base_pfn)
> -   start_pfn += altmap->reserve;

If it's reserved then we should not be accessing, even if the above
works in practice. Isn't the fix something more like this to fix up
the assumptions at release time?

diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..9074ba14572c 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -90,6 +90,7 @@ static void devm_memremap_pages_release(void *data)
  struct device *dev = pgmap->dev;
  struct resource *res = &pgmap->res;
  resource_size_t align_start, align_size;
+ struct vmem_altmap *altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL;
  unsigned long pfn;
  int nid;

@@ -102,7 +103,10 @@ static void devm_memremap_pages_release(void *data)
  align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
  - align_start;

- nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT));
+ pfn = align_start >> PAGE_SHIFT;
+ if (altmap)
+ pfn += vmem_altmap_offset(altmap);
+ nid = page_to_nid(pfn_to_page(pfn));

  mem_hotplug_begin();
  if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
@@ -110,8 +114,7 @@ static void devm_memremap_pages_release(void *data)
  __remove_pages(page_zone(pfn_to_page(pfn)), pfn,
  align_size >> PAGE_SHIFT, NULL);
  } else {
- arch_remove_memory(nid, align_start, align_size,
- pgmap->altmap_valid ? &pgmap->altmap : NULL);
+ arch_remove_memory(nid, align_start, align_size, altmap);
  kasan_remove_zero_shadow(__va(align_start), align_size);
  }
  mem_hotplug_done();


Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-13 Thread Dan Williams
On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V
 wrote:
>
> On 5/14/19 9:28 AM, Dan Williams wrote:
> > On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V
> >  wrote:
> >>
> >> The nfpns-related change is needed to fix the kernel message
> >>
> >> "number of pfns truncated from 2617344 to 163584"
> >>
> >> The change makes sure the nfpns value stored in the superblock is the right one.
> >>
> >> Signed-off-by: Aneesh Kumar K.V 
> >> ---
> >>   drivers/nvdimm/pfn_devs.c| 6 +++---
> >>   drivers/nvdimm/region_devs.c | 8 
> >>   2 files changed, 7 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> >> index 347cab166376..6751ff0296ef 100644
> >> --- a/drivers/nvdimm/pfn_devs.c
> >> +++ b/drivers/nvdimm/pfn_devs.c
> >> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> >>   * when populating the vmemmap. This *should* be equal to
> >>   * PMD_SIZE for most architectures.
> >>   */
> >> -   offset = ALIGN(start + reserve + 64 * npfns,
> >> -   max(nd_pfn->align, PMD_SIZE)) - start;
> >> +   offset = ALIGN(start + reserve + sizeof(struct page) * 
> >> npfns,
> >> +  max(nd_pfn->align, PMD_SIZE)) - start;
> >
> > No, I think we need to record the page-size into the superblock format
> > otherwise this breaks in debug builds where the struct-page size is
> > extended.
> >
> >>  } else if (nd_pfn->mode == PFN_MODE_RAM)
> >>  offset = ALIGN(start + reserve, nd_pfn->align) - start;
> >>  else
> >> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> >>  return -ENXIO;
> >>  }
> >>
> >> -   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
> >> +   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;
> >
> > Similar comment, if the page size is variable then the superblock
> > needs to explicitly account for it.
> >
>
> PAGE_SIZE is not really variable. What we can run into is the issue you
> mentioned above. The size of struct page can change which means the
> reserved space for keeping vmemmap in device may not be sufficient for
> certain kernel builds.
>
> I was planning to add another patch that fails namespace init if we
> don't have enough space to keep the struct page.
>
> Why do you suggest we need to have PAGE_SIZE as part of pfn superblock?

So that the kernel has a chance to identify cases where the superblock
it is handling was created on a system with different PAGE_SIZE
assumptions.


Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-13 Thread Aneesh Kumar K.V

On 5/14/19 9:28 AM, Dan Williams wrote:

On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V
 wrote:


The nfpns-related change is needed to fix the kernel message

"number of pfns truncated from 2617344 to 163584"

The change makes sure the nfpns value stored in the superblock is the right one.

Signed-off-by: Aneesh Kumar K.V 
---
  drivers/nvdimm/pfn_devs.c| 6 +++---
  drivers/nvdimm/region_devs.c | 8 
  2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 347cab166376..6751ff0296ef 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
  * when populating the vmemmap. This *should* be equal to
  * PMD_SIZE for most architectures.
  */
-   offset = ALIGN(start + reserve + 64 * npfns,
-   max(nd_pfn->align, PMD_SIZE)) - start;
+   offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
+  max(nd_pfn->align, PMD_SIZE)) - start;


No, I think we need to record the page-size into the superblock format
otherwise this breaks in debug builds where the struct-page size is
extended.


 } else if (nd_pfn->mode == PFN_MODE_RAM)
 offset = ALIGN(start + reserve, nd_pfn->align) - start;
 else
@@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 return -ENXIO;
 }

-   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;


Similar comment, if the page size is variable then the superblock
needs to explicitly account for it.



PAGE_SIZE is not really variable. What we can run into is the issue you 
mentioned above. The size of struct page can change which means the 
reserved space for keeping vmemmap in device may not be sufficient for 
certain kernel builds.


I was planning to add another patch that fails namespace init if we 
don't have enough space to keep the struct page.


Why do you suggest we need to have PAGE_SIZE as part of pfn superblock?

-aneesh



Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_mremap_pages_release

2019-05-13 Thread Anshuman Khandual
On 05/14/2019 08:23 AM, Aneesh Kumar K.V wrote:
> When we initialize the namespace, if we support altmap, we don't initialize
> all the backing struct pages, whereas while releasing the namespace we look
> at some of these uninitialized struct pages. This results in a kernel crash
> as below.
Yes, this has been problematic; I have also previously encountered it, but in
a bit different way (while searching memory resources).

> 
> kernel BUG at include/linux/mm.h:1034!
What would that be? I did not see a corresponding BUG_ON() line in the file.

> cpu 0x2: Vector: 700 (Program Check) at [c0024146b870]
> pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0
> lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0
> sp: c0024146bb00
>msr: 8282b033
>   current = 0xc00241382f00
>   paca= 0xc0003fffd680   irqmask: 0x03   irq_happened: 0x01
> pid   = 4114, comm = ndctl
>  c09bf8c0 devm_action_release+0x30/0x50
>  c09c0938 release_nodes+0x268/0x2d0
>  c09b95b4 device_release_driver_internal+0x164/0x230
>  c09b638c unbind_store+0x13c/0x190
>  c09b4f44 drv_attr_store+0x44/0x60
>  c058ccc0 sysfs_kf_write+0x70/0xa0
>  c058b52c kernfs_fop_write+0x1ac/0x290
>  c04a415c __vfs_write+0x3c/0x70
>  c04a85ac vfs_write+0xec/0x200
>  c04a8920 ksys_write+0x80/0x130
>  c000bee4 system_call+0x5c/0x70

I saw this as a memory hotplug problem with respect to ZONE_DEVICE based
device memory, hence the somewhat different explanation, which I never posted.
I guess parts of the commit message here can be used for a more comprehensive
explanation of the problem.

mm/hotplug: Initialize struct pages for vmem_altmap reserved areas

The following ZONE_DEVICE ranges (altmap) have valid struct pages allocated
from within device memory memmap range.

A. Driver reserved area [BASE -> BASE + RESV)
B. Device mmap area [BASE + RESV -> BASE + RESV + FREE]
C. Device usable area   [BASE + RESV + FREE -> END]

BASE - pgmap->altmap.base_pfn (pgmap->res.start >> PAGE_SHIFT)
RESV - pgmap->altmap.reserve
FREE - pgmap->altmap.free
END  - pgmap->res->end >> PAGE_SHIFT

Struct page init for all areas happens in two phases which detects altmap
use case and init parts of the device range in each phase.

1. memmap_init_zone (Device mmap area)
2. memmap_init_zone_device  (Device usable area)

memmap_init_zone() skips driver reserved area and does not init the
struct pages. This is problematic primarily for two reasons.

Though NODE_DATA(device_node(dev))->node_zones[ZONE_DEVICE] contains the
device memory range in it's entirety (in zone->spanned_pages) parts of this
range does not have zone set to ZONE_DEVICE in their struct page.

__remove_pages() called directly or from within arch_remove_memory() during
ZONE_DEVICE tear down procedure (devm_memremap_pages_release) hits an error
(like below) if there are reserved pages. This is because the first pfn of
the device range (invariably also the first pfn from reserved area) cannot
be identified as belonging to ZONE_DEVICE. This erroneously leads to a range
search within the iomem_resource region, which never contained this device
memory region, so this eventually ends up printing the following error.

Unable to release resource <0x00068000-0x0006bfff> (-22)

Initialize struct pages for the driver reserved range while still staying
clear of its contents.

> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  mm/page_alloc.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 59661106da16..892eabe1ec13 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
> nid, unsigned long zone,
>  
>  #ifdef CONFIG_ZONE_DEVICE
>   /*
> -  * Honor reservation requested by the driver for this ZONE_DEVICE
> -  * memory. We limit the total number of pages to initialize to just
> +  * We limit the total number of pages to initialize to just
The comment needs a small change to reflect the fact that both the driver
reserved area and the mapped area (containing altmap struct pages) need init
here.


Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-13 Thread Dan Williams
On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V
 wrote:
>
> The nfpns-related change is needed to fix the kernel message
>
> "number of pfns truncated from 2617344 to 163584"
>
> The change makes sure the nfpns value stored in the superblock is the right one.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  drivers/nvdimm/pfn_devs.c| 6 +++---
>  drivers/nvdimm/region_devs.c | 8 
>  2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index 347cab166376..6751ff0296ef 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>  * when populating the vmemmap. This *should* be equal to
>  * PMD_SIZE for most architectures.
>  */
> -   offset = ALIGN(start + reserve + 64 * npfns,
> -   max(nd_pfn->align, PMD_SIZE)) - start;
> +   offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
> +  max(nd_pfn->align, PMD_SIZE)) - start;

No, I think we need to record the page-size into the superblock format
otherwise this breaks in debug builds where the struct-page size is
extended.

> } else if (nd_pfn->mode == PFN_MODE_RAM)
> offset = ALIGN(start + reserve, nd_pfn->align) - start;
> else
> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> return -ENXIO;
> }
>
> -   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
> +   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;

Similar comment, if the page size is variable then the superblock
needs to explicitly account for it.


[PATCH] mm/nvdimm: Use correct #defines instead of opencoding

2019-05-13 Thread Aneesh Kumar K.V
The nfpns-related change is needed to fix the kernel message

"number of pfns truncated from 2617344 to 163584"

The change makes sure the nfpns value stored in the superblock is the right one.

Signed-off-by: Aneesh Kumar K.V 
---
 drivers/nvdimm/pfn_devs.c| 6 +++---
 drivers/nvdimm/region_devs.c | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 347cab166376..6751ff0296ef 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
 * when populating the vmemmap. This *should* be equal to
 * PMD_SIZE for most architectures.
 */
-   offset = ALIGN(start + reserve + 64 * npfns,
-   max(nd_pfn->align, PMD_SIZE)) - start;
+   offset = ALIGN(start + reserve + sizeof(struct page) * npfns,
+  max(nd_pfn->align, PMD_SIZE)) - start;
} else if (nd_pfn->mode == PFN_MODE_RAM)
offset = ALIGN(start + reserve, nd_pfn->align) - start;
else
@@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
return -ENXIO;
}
 
-   npfns = (size - offset - start_pad - end_trunc) / SZ_4K;
+   npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE;
pfn_sb->mode = cpu_to_le32(nd_pfn->mode);
pfn_sb->dataoff = cpu_to_le64(offset);
pfn_sb->npfns = cpu_to_le64(npfns);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..2d8facea5a03 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct 
nvdimm_bus *nvdimm_bus,
struct nd_mapping_desc *mapping = &ndr_desc->mapping[i];
struct nvdimm *nvdimm = mapping->nvdimm;
 
-   if ((mapping->start | mapping->size) % SZ_4K) {
-   dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K 
aligned\n",
-   caller, dev_name(&nvdimm->dev), i);
-
+   if ((mapping->start | mapping->size) % PAGE_SIZE) {
+   dev_err(&nvdimm_bus->dev,
+   "%s: %s mapping%d is not 4K aligned\n",
+   caller, dev_name(&nvdimm->dev), i);
return NULL;
}
 
-- 
2.21.0



[PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region

2019-05-13 Thread Aneesh Kumar K.V
We already add start_pad to resource->start but fail to section-align the
start. This makes sure that, with an altmap, we compute the right first pfn
when start_pad is zero and we are aligning the start address down.

Signed-off-by: Aneesh Kumar K.V 
---
 kernel/memremap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/memremap.c b/kernel/memremap.c
index a856cb5ff192..23d77b60e728 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap)
 {
const struct resource *res = &pgmap->res;
struct vmem_altmap *altmap = &pgmap->altmap;
-   unsigned long pfn;
+   unsigned long pfn = PHYS_PFN(res->start);
 
-   pfn = res->start >> PAGE_SHIFT;
+   pfn = SECTION_ALIGN_DOWN(pfn);
if (pgmap->altmap_valid)
pfn += vmem_altmap_offset(altmap);
return pfn;
-- 
2.21.0



[PATCH] mm/nvdimm: Pick the right alignment default when creating dax devices

2019-05-13 Thread Aneesh Kumar K.V
Allow arch to provide the supported alignments and use hugepage alignment only
if we support hugepage. Right now we depend on compile time configs whereas this
patch switch this to runtime discovery.

Architectures like ppc64 can have THP enabled in code, but then can have
hugepage size disabled by the hypervisor. This allows us to create dax devices
with PAGE_SIZE alignment in this case.

Existing dax namespace with alignment larger than PAGE_SIZE will fail to
initialize in this specific case. We still allow fsdax namespace initialization.

With respect to identifying whether to enable hugepage fault for a dax device,
if THP is enabled during compile, we default to taking hugepage fault and in dax
fault handler if we find the fault size > alignment we retry with PAGE_SIZE
fault size.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/libnvdimm.h |  9 
 arch/powerpc/mm/Makefile |  1 +
 arch/powerpc/mm/nvdimm.c | 34 
 arch/x86/include/asm/libnvdimm.h | 19 
 drivers/nvdimm/nd.h  |  6 -
 drivers/nvdimm/pfn_devs.c| 32 +-
 include/linux/huge_mm.h  |  7 +-
 7 files changed, 100 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/libnvdimm.h
 create mode 100644 arch/powerpc/mm/nvdimm.c
 create mode 100644 arch/x86/include/asm/libnvdimm.h

diff --git a/arch/powerpc/include/asm/libnvdimm.h 
b/arch/powerpc/include/asm/libnvdimm.h
new file mode 100644
index ..d35fd7f48603
--- /dev/null
+++ b/arch/powerpc/include/asm/libnvdimm.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_LIBNVDIMM_H
+#define _ASM_POWERPC_LIBNVDIMM_H
+
+#define nd_pfn_supported_alignments nd_pfn_supported_alignments
+extern unsigned long *nd_pfn_supported_alignments(void);
+extern unsigned long nd_pfn_default_alignment(void);
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 0f499db315d6..42e4a399ba5d 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM) += highmem.o
 obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
 obj-$(CONFIG_PPC_PTDUMP)   += ptdump/
 obj-$(CONFIG_KASAN)+= kasan/
+obj-$(CONFIG_NVDIMM_PFN)   += nvdimm.o
diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c
new file mode 100644
index ..a29a4510715e
--- /dev/null
+++ b/arch/powerpc/mm/nvdimm.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+
+#include 
+/*
+ * We support only pte and pmd mappings for now.
+ */
+const unsigned long *nd_pfn_supported_alignments(void)
+{
+   static unsigned long supported_alignments[3];
+
+   supported_alignments[0] = PAGE_SIZE;
+
+   if (has_transparent_hugepage())
+   supported_alignments[1] = HPAGE_PMD_SIZE;
+   else
+   supported_alignments[1] = 0;
+
+   supported_alignments[2] = 0;
+   return supported_alignments;
+}
+
+/*
+ * Use pmd mapping if supported as default alignment
+ */
+unsigned long nd_pfn_default_alignment(void)
+{
+
+   if (has_transparent_hugepage())
+   return HPAGE_PMD_SIZE;
+   return PAGE_SIZE;
+}
diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h
new file mode 100644
index ..3d5361db9164
--- /dev/null
+++ b/arch/x86/include/asm/libnvdimm.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_LIBNVDIMM_H
+#define _ASM_X86_LIBNVDIMM_H
+
+static inline unsigned long nd_pfn_default_alignment(void)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   return HPAGE_PMD_SIZE;
+#else
+   return PAGE_SIZE;
+#endif
+}
+
+static inline unsigned long nd_altmap_align_size(unsigned long nd_align)
+{
+   return PMD_SIZE;
+}
+
+#endif
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..44fe923b2ee3 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -292,12 +292,6 @@ static inline struct device *nd_btt_create(struct 
nd_region *nd_region)
 struct nd_pfn *to_nd_pfn(struct device *dev);
 #if IS_ENABLED(CONFIG_NVDIMM_PFN)
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
-#else
-#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE
-#endif
-
 int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns);
 bool is_nd_pfn(struct device *dev);
 struct device *nd_pfn_create(struct nd_region *nd_region);
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 01f40672507f..347cab166376 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "nd-core.h"
 #include "pfn.h"
 #include "nd.h"
@@ -111,6 +112,8 @@ static ssize_t align_show(struct device *dev,
return sprintf(buf, "%ld\n", nd_pfn->align);
 }
 
+#ifndef nd_pfn_supported_alignments
+#define nd_

[RFC PATCH] mm/nvdimm: Fix kernel crash on devm_memremap_pages_release

2019-05-13 Thread Aneesh Kumar K.V
When we initialize the namespace, if we support altmap, we don't initialize
all of the backing struct pages, whereas while releasing the namespace we look
at some of these uninitialized struct pages. This results in a kernel crash
as below.

kernel BUG at include/linux/mm.h:1034!
cpu 0x2: Vector: 700 (Program Check) at [c0024146b870]
pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0
lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0
sp: c0024146bb00
   msr: 8282b033
  current = 0xc00241382f00
  paca= 0xc0003fffd680   irqmask: 0x03   irq_happened: 0x01
pid   = 4114, comm = ndctl
 c09bf8c0 devm_action_release+0x30/0x50
 c09c0938 release_nodes+0x268/0x2d0
 c09b95b4 device_release_driver_internal+0x164/0x230
 c09b638c unbind_store+0x13c/0x190
 c09b4f44 drv_attr_store+0x44/0x60
 c058ccc0 sysfs_kf_write+0x70/0xa0
 c058b52c kernfs_fop_write+0x1ac/0x290
 c04a415c __vfs_write+0x3c/0x70
 c04a85ac vfs_write+0xec/0x200
 c04a8920 ksys_write+0x80/0x130
 c000bee4 system_call+0x5c/0x70

Signed-off-by: Aneesh Kumar K.V 
---
 mm/page_alloc.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59661106da16..892eabe1ec13 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
 
 #ifdef CONFIG_ZONE_DEVICE
/*
-* Honor reservation requested by the driver for this ZONE_DEVICE
-* memory. We limit the total number of pages to initialize to just
+* We limit the total number of pages to initialize to just
 * those that might contain the memory mapping. We will defer the
 * ZONE_DEVICE page initialization until after we have released
 * the hotplug lock.
@@ -5750,8 +5749,6 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
if (!altmap)
return;
 
-   if (start_pfn == altmap->base_pfn)
-   start_pfn += altmap->reserve;
end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
}
 #endif
-- 
2.21.0



Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Sergey Senozhatsky
On (05/14/19 11:07), Sergey Senozhatsky wrote:
> How about this:
> 
>   if ptr < PAGE_SIZE  -> "(null)"

No, this is totally stupid. Forget about it. Sorry.


>   if IS_ERR_VALUE(ptr)-> "(fault)"

But Steven's "(fault)" is nice.

-ss


[v2 2/2] [PowerPC] Allow use of SIMD in interrupts from kernel code

2019-05-13 Thread Shawn Landden
This second patch is separate because it could be wrong: I am not sure
about how kernel thread migration works, and it even allows SIMD in
preemptible kernel code.

Signed-off-by: Shawn Landden 
---
 arch/powerpc/include/asm/switch_to.h | 15 ++-
 arch/powerpc/kernel/process.c| 60 ++--
 2 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a82..c79f7d24a 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -30,10 +30,7 @@ extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void giveup_fpu(struct task_struct *);
 extern void save_fpu(struct task_struct *);
-static inline void disable_kernel_fp(void)
-{
-   msr_check_and_clear(MSR_FP);
-}
+extern void disable_kernel_fp(void);
 #else
 static inline void save_fpu(struct task_struct *t) { }
 static inline void flush_fp_to_thread(struct task_struct *t) { }
@@ -44,10 +41,7 @@ extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
 extern void save_altivec(struct task_struct *);
-static inline void disable_kernel_altivec(void)
-{
-   msr_check_and_clear(MSR_VEC);
-}
+extern void disable_kernel_altivec(void);
 #else
 static inline void save_altivec(struct task_struct *t) { }
 static inline void __giveup_altivec(struct task_struct *t) { }
@@ -56,10 +50,7 @@ static inline void __giveup_altivec(struct task_struct *t) { 
}
 #ifdef CONFIG_VSX
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
-static inline void disable_kernel_vsx(void)
-{
-   msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
-}
+extern void disable_kernel_vsx(void);
 #endif
 
 #ifdef CONFIG_SPE
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ef534831f..d15f2cb51 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -170,6 +170,29 @@ void __msr_check_and_clear(unsigned long bits)
 EXPORT_SYMBOL(__msr_check_and_clear);
 
 #ifdef CONFIG_PPC_FPU
+/*
+ * Track whether the kernel is using the FPU state
+ * currently.
+ *
+ * This flag is used:
+ *
+ *   - by IRQ context code to potentially use the FPU
+ * if it's unused.
+ *
+ *   - to debug kernel_fpu/altivec/vsx_begin()/end() correctness
+ */
+static DEFINE_PER_CPU(bool, in_kernel_fpu);
+
+static bool kernel_fpu_disabled(void)
+{
+return this_cpu_read(in_kernel_fpu);
+}
+
+static bool interrupted_kernel_fpu_idle(void)
+{
+return !kernel_fpu_disabled();
+}
+
 static void __giveup_fpu(struct task_struct *tsk)
 {
unsigned long msr;
@@ -230,7 +253,8 @@ void enable_kernel_fp(void)
 {
unsigned long cpumsr;
 
-   WARN_ON(preemptible());
+WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+this_cpu_write(in_kernel_fpu, true);
 
cpumsr = msr_check_and_set(MSR_FP);
 
@@ -251,6 +275,15 @@ void enable_kernel_fp(void)
 }
 EXPORT_SYMBOL(enable_kernel_fp);
 
+void disable_kernel_fp(void)
+{
+WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+this_cpu_write(in_kernel_fpu, false);
+
+msr_check_and_clear(MSR_FP);
+}
+EXPORT_SYMBOL(disable_kernel_fp);
+
 static int restore_fp(struct task_struct *tsk)
 {
if (tsk->thread.load_fp || tm_active_with_fp(tsk)) {
@@ -295,7 +328,8 @@ void enable_kernel_altivec(void)
 {
unsigned long cpumsr;
 
-   WARN_ON(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+   this_cpu_write(in_kernel_fpu, true);
 
cpumsr = msr_check_and_set(MSR_VEC);
 
@@ -316,6 +350,14 @@ void enable_kernel_altivec(void)
 }
 EXPORT_SYMBOL(enable_kernel_altivec);
 
+extern void disable_kernel_altivec(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+   this_cpu_write(in_kernel_fpu, false);
+   msr_check_and_clear(MSR_VEC);
+}
+EXPORT_SYMBOL(disable_kernel_altivec);
+
 /*
  * Make sure the VMX/Altivec register state in the
  * the thread_struct is up to date for task tsk.
@@ -371,7 +413,8 @@ static bool interrupted_user_mode(void)
 bool may_use_simd(void)
 {
return !in_interrupt() ||
-   interrupted_user_mode();
+   interrupted_user_mode() ||
+   interrupted_kernel_fpu_idle();
 }
 EXPORT_SYMBOL(may_use_simd);
 
@@ -411,7 +454,8 @@ void enable_kernel_vsx(void)
 {
unsigned long cpumsr;
 
-   WARN_ON(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_fpu));
+   this_cpu_write(in_kernel_fpu, true);
 
cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 
@@ -433,6 +477,14 @@ void enable_kernel_vsx(void)
 }
 EXPORT_SYMBOL(enable_kernel_vsx);
 
+void disable_kernel_vsx(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu));
+   this_cpu_write(in_kernel_fpu, false);
+   msr_check_and_clear(MSR_FP|MSR_VEC|MSR_

[v2 1/2] [PowerPC] Add simd.h implementation

2019-05-13 Thread Shawn Landden
Based off the x86 one.

WireGuard really wants to be able to do SIMD in interrupts,
so it can accelerate its in-bound path.

Signed-off-by: Shawn Landden 
---
 arch/powerpc/include/asm/simd.h | 10 ++
 arch/powerpc/kernel/process.c   | 30 ++
 2 files changed, 40 insertions(+)
 create mode 100644 arch/powerpc/include/asm/simd.h

diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
new file mode 100644
index 0..9b066d633
--- /dev/null
+++ b/arch/powerpc/include/asm/simd.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+/*
+ * may_use_simd - whether it is allowable at this time to issue SIMD
+ *instructions or access the SIMD register file
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+extern bool may_use_simd(void);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dd9e0d538..ef534831f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -345,6 +345,36 @@ static int restore_altivec(struct task_struct *tsk)
}
return 0;
 }
+
+/*
+ * Were we in user mode when we were
+ * interrupted?
+ *
+ * Doing kernel_altivec/vsx_begin/end() is ok if we are running
+ * in an interrupt context from user mode - we'll just
+ * save the FPU state as required.
+ */
+static bool interrupted_user_mode(void)
+{
+   struct pt_regs *regs = get_irq_regs();
+
+   return regs && user_mode(regs);
+}
+
+/*
+ * Can we use FPU in kernel mode with the
+ * whole "kernel_fpu/altivec/vsx_begin/end()" sequence?
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+bool may_use_simd(void)
+{
+   return !in_interrupt() ||
+   interrupted_user_mode();
+}
+EXPORT_SYMBOL(may_use_simd);
+
 #else
 #define loadvec(thr) 0
 static inline int restore_altivec(struct task_struct *tsk) { return 0; }
-- 
2.21.0.1020.gf2820cf01a



Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Sergey Senozhatsky
On (05/13/19 14:42), Petr Mladek wrote:
> > The "(null)" is good enough by itself and already an established
> > practice..
> 
> (efault) made more sense with the probe_kernel_read() that
> checked wide range of addresses. Well, I still think that
> it makes sense to distinguish a pure NULL. And it still
> used also for IS_ERR_VALUE().

Wouldn't anything within the first PAGE_SIZE bytes be reported as
a NULL deref?

   char *p = (char *)(PAGE_SIZE - 2);
   *p = 'a';

gives

 kernel: BUG: kernel NULL pointer dereference, address = 0ffe
 kernel: #PF: supervisor-privileged write access from kernel code
 kernel: #PF: error_code(0x0002) - not-present page


And I like Steven's "(fault)" idea.
How about this:

if ptr < PAGE_SIZE  -> "(null)"
if IS_ERR_VALUE(ptr)-> "(fault)"

-ss


[PATCH 2/2] [PowerPC] Allow use of SIMD in interrupts from kernel code

2019-05-13 Thread Shawn Landden
This second patch is separate because it could be wrong: I am not sure
about how kernel thread migration works, and it even allows SIMD in
preemptible kernel code.

Signed-off-by: Shawn Landden 
---
 arch/powerpc/include/asm/simd.h  |  8 +
 arch/powerpc/include/asm/switch_to.h | 10 ++
 arch/powerpc/kernel/process.c| 50 ++--
 3 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
index 2c02ad531..7b582b07e 100644
--- a/arch/powerpc/include/asm/simd.h
+++ b/arch/powerpc/include/asm/simd.h
@@ -7,7 +7,15 @@
  * It's always ok in process context (ie "not interrupt")
  * but it is sometimes ok even from an irq.
  */
+#ifdef CONFIG_ALTIVEC
+extern bool irq_simd_usable(void);
 static __must_check inline bool may_use_simd(void)
 {
return irq_simd_usable();
 }
+#else
+static inline bool may_use_simd(void)
+{
+   return false;
+}
+#endif
diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 5b03d8a82..537998997 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -44,10 +44,7 @@ extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
 extern void save_altivec(struct task_struct *);
-static inline void disable_kernel_altivec(void)
-{
-   msr_check_and_clear(MSR_VEC);
-}
+extern void disable_kernel_altivec(void);
 #else
 static inline void save_altivec(struct task_struct *t) { }
 static inline void __giveup_altivec(struct task_struct *t) { }
@@ -56,10 +53,7 @@ static inline void __giveup_altivec(struct task_struct *t) { 
}
 #ifdef CONFIG_VSX
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
-static inline void disable_kernel_vsx(void)
-{
-   msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
-}
+extern void disable_kernel_vsx(void);
 #endif
 
 #ifdef CONFIG_SPE
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e436d708a..41a0ab500 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -267,6 +267,29 @@ static int restore_fp(struct task_struct *tsk) { return 0; 
}
 #ifdef CONFIG_ALTIVEC
 #define loadvec(thr) ((thr).load_vec)
 
+/*
+ * Track whether the kernel is using the SIMD state
+ * currently.
+ *
+ * This flag is used:
+ *
+ *   - by IRQ context code to potentially use the FPU
+ * if it's unused.
+ *
+ *   - to debug kernel_altivec/vsx_begin()/end() correctness
+ */
+static DEFINE_PER_CPU(bool, in_kernel_simd);
+
+static bool kernel_simd_disabled(void)
+{
+   return this_cpu_read(in_kernel_simd);
+}
+
+static bool interrupted_kernel_simd_idle(void)
+{
+   return !kernel_simd_disabled();
+}
+
 static void __giveup_altivec(struct task_struct *tsk)
 {
unsigned long msr;
@@ -295,7 +318,9 @@ void enable_kernel_altivec(void)
 {
unsigned long cpumsr;
 
-   WARN_ON(preemptible());
+   WARN_ON_ONCE(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_simd));
+   this_cpu_write(in_kernel_simd, true);
 
cpumsr = msr_check_and_set(MSR_VEC);
 
@@ -316,6 +341,14 @@ void enable_kernel_altivec(void)
 }
 EXPORT_SYMBOL(enable_kernel_altivec);
 
+extern void disable_kernel_altivec(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_simd));
+   this_cpu_write(in_kernel_simd, false);
+   msr_check_and_clear(MSR_VEC);
+}
+EXPORT_SYMBOL(disable_kernel_altivec);
+
 /*
  * Make sure the VMX/Altivec register state in the
  * the thread_struct is up to date for task tsk.
@@ -371,7 +404,8 @@ static bool interrupted_user_mode(void)
 bool irq_simd_usable(void)
 {
return !in_interrupt() ||
-   interrupted_user_mode();
+   interrupted_user_mode() ||
+   interrupted_kernel_simd_idle();
 }
 EXPORT_SYMBOL(irq_simd_usable);
 
@@ -411,7 +445,9 @@ void enable_kernel_vsx(void)
 {
unsigned long cpumsr;
 
-   WARN_ON(preemptible());
+   WARN_ON_ONCE(preemptible());
+   WARN_ON_ONCE(this_cpu_read(in_kernel_simd));
+   this_cpu_write(in_kernel_simd, true);
 
cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 
@@ -433,6 +469,14 @@ void enable_kernel_vsx(void)
 }
 EXPORT_SYMBOL(enable_kernel_vsx);
 
+void disable_kernel_vsx(void)
+{
+   WARN_ON_ONCE(!this_cpu_read(in_kernel_simd));
+   this_cpu_write(in_kernel_simd, false);
+   msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
+}
+EXPORT_SYMBOL(disable_kernel_vsx);
+
 void flush_vsx_to_thread(struct task_struct *tsk)
 {
if (tsk->thread.regs) {
-- 
2.21.0.1020.gf2820cf01a



[PATCH 1/2] [PowerPC] Add simd.h implementation

2019-05-13 Thread Shawn Landden
Based off the x86 one.

WireGuard really wants to be able to do SIMD in interrupts,
so it can accelerate its in-bound path.

Signed-off-by: Shawn Landden 
---
 arch/powerpc/include/asm/simd.h | 13 +
 arch/powerpc/kernel/process.c   | 30 ++
 2 files changed, 43 insertions(+)
 create mode 100644 arch/powerpc/include/asm/simd.h

diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
new file mode 100644
index 0..2c02ad531
--- /dev/null
+++ b/arch/powerpc/include/asm/simd.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+/*
+ * may_use_simd - whether it is allowable at this time to issue SIMD
+ *instructions or access the SIMD register file
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+static __must_check inline bool may_use_simd(void)
+{
+   return irq_simd_usable();
+}
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dd9e0d538..e436d708a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -345,6 +345,36 @@ static int restore_altivec(struct task_struct *tsk)
}
return 0;
 }
+
+/*
+ * Were we in user mode when we were
+ * interrupted?
+ *
+ * Doing kernel_altivec/vsx_begin/end() is ok if we are running
+ * in an interrupt context from user mode - we'll just
+ * save the FPU state as required.
+ */
+static bool interrupted_user_mode(void)
+{
+   struct pt_regs *regs = get_irq_regs();
+
+   return regs && user_mode(regs);
+}
+
+/*
+ * Can we use SIMD in kernel mode with the
+ * whole "kernel_altivec/vsx_begin/end()" sequence?
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+bool irq_simd_usable(void)
+{
+   return !in_interrupt() ||
+   interrupted_user_mode();
+}
+EXPORT_SYMBOL(irq_simd_usable);
+
 #else
 #define loadvec(thr) 0
 static inline int restore_altivec(struct task_struct *tsk) { return 0; }
-- 
2.21.0.1020.gf2820cf01a



Re: Kernel OOPS followed by a panic on next20190507 with 4K page size

2019-05-13 Thread Aneesh Kumar K.V

On 5/8/19 4:30 PM, Sachin Sant wrote:

While running LTP tests (specifically futex_wake04) against a next-20190507
build with 4K page size on a POWER8 LPAR, the following crash is observed.

[ 4233.214876] BUG: Kernel NULL pointer dereference at 0x001c
[ 4233.214898] Faulting instruction address: 0xc1d1e58c
[ 4233.214905] Oops: Kernel access of bad area, sig: 11 [#1]
[ 4233.214911] LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 4233.214920] Dumping ftrace buffer:
[ 4233.214928](ftrace buffer empty)
[ 4233.214933] Modules linked in: overlay rpadlpar_io rpaphp iptable_mangle 
xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 
ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter 
pseries_rng rng_core vmx_crypto ip_tables x_tables autofs4 [last unloaded: 
dummy_del_mod]
[ 4233.214973] CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: GW  O  
5.1.0-next-20190507-autotest #1
[ 4233.214980] NIP:  c1d1e58c LR: c1d1e54c CTR: 
[ 4233.214987] REGS: c4937890 TRAP: 0300   Tainted: GW  O   
(5.1.0-next-20190507-autotest)
[ 4233.214993] MSR:  80009033   CR: 22424822  
XER: 
[ 4233.215005] CFAR: c183e9e0 DAR: 001c DSISR: 4000 
IRQMASK: 0
[ 4233.215005] GPR00: c1901a80 c4937b20 c3938700 

[ 4233.215005] GPR04: 00400cc0 0003efff 00027966e000 
c3ba8700
[ 4233.215005] GPR08: c3ba8700 0d601125 c3ba8700 
8000
[ 4233.215005] GPR12: 22424822 c0001ecae280  

[ 4233.215005] GPR16:    

[ 4233.215005] GPR20: 0018 c39e2d30 c39e2d28 
c002762da460
[ 4233.215005] GPR24: 001c  0001 
c1901a80
[ 4233.215005] GPR28: 00400cc0   
00400cc0
[ 4233.215065] NIP [c1d1e58c] kmem_cache_alloc+0xbc/0x5a0
[ 4233.215071] LR [c1d1e54c] kmem_cache_alloc+0x7c/0x5a0
[ 4233.215075] Call Trace:
[ 4233.215081] [c4937b20] [c1c91150] __pud_alloc+0x160/0x200 
(unreliable)
[ 4233.215090] [c4937b80] [c1901a80] huge_pte_alloc+0x580/0x950
[ 4233.215098] [c4937c00] [c1cf7910] hugetlb_fault+0x9a0/0x1250
[ 4233.215106] [c4937ce0] [c1c94a80] handle_mm_fault+0x490/0x4a0
[ 4233.215114] [c4937d20] [c18d529c] 
__do_page_fault+0x77c/0x1f00
[ 4233.215121] [c4937e00] [c18d6a48] do_page_fault+0x28/0x50
[ 4233.215129] [c4937e20] [c183b0d4] handle_page_fault+0x18/0x38
[ 4233.215135] Instruction dump:
[ 4233.215139] 39290001 f92ac1b0 419e009c 3ce20027 3ba0 e927c1f0 39290001 
f927c1f0
[ 4233.215149] 3d420027 e92ac290 39290001 f92ac290 <8359001c> 83390018 6000 
3ce20027


I did send a patch to the list to handle page allocation failures in
this path. But I guess what we are finding here is get_current()
crashing. Any chance to bisect this?


-aneesh



[PATCH] powerpc/mm: Handle page table allocation failures

2019-05-13 Thread Aneesh Kumar K.V
This fixes the below crash, which arises due to not handling page table
allocation failures while allocating the hugetlb page table.

 BUG: Kernel NULL pointer dereference at 0x001c
 Faulting instruction address: 0xc1d1e58c
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries

 CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: GW  O  
5.1.0-next-20190507-autotest #1
 NIP:  c1d1e58c LR: c1d1e54c CTR: 
 REGS: c4937890 TRAP: 0300   Tainted: GW  O   
(5.1.0-next-20190507-autotest)
 MSR:  80009033   CR: 22424822  XER: 
 CFAR: c183e9e0 DAR: 001c DSISR: 4000 IRQMASK: 0
 GPR00: c1901a80 c4937b20 c3938700 
 GPR04: 00400cc0 0003efff 00027966e000 c3ba8700
 GPR08: c3ba8700 0d601125 c3ba8700 8000
 GPR12: 22424822 c0001ecae280  
 GPR16:    
 GPR20: 0018 c39e2d30 c39e2d28 c002762da460
 GPR24: 001c  0001 c1901a80
 GPR28: 00400cc0   00400cc0
 NIP [c1d1e58c] kmem_cache_alloc+0xbc/0x5a0
 LR [c1d1e54c] kmem_cache_alloc+0x7c/0x5a0
 Call Trace:
  [c1c91150] __pud_alloc+0x160/0x200 (unreliable)
  [c1901a80] huge_pte_alloc+0x580/0x950
  [c1cf7910] hugetlb_fault+0x9a0/0x1250
  [c1c94a80] handle_mm_fault+0x490/0x4a0
  [c18d529c] __do_page_fault+0x77c/0x1f00
  [c18d6a48] do_page_fault+0x28/0x50
  [c183b0d4] handle_page_fault+0x18/0x38

Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a 
different page table format")
Reported-by: Sachin Sant 
Signed-off-by: Aneesh Kumar K.V 
---

Note: I did add a recent commit for the Fixes tag. But in reality we never
checked for page table allocation failure there. If we want to go back to
that old commit, then we may need:

Fixes: a4fe3ce7699b ("powerpc/mm: Allow more flexible layouts for hugepage 
pagetables")

 arch/powerpc/mm/hugetlbpage.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index c5c9ff2d7afc..ae9d71da5219 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -130,6 +130,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long 
addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift == PUD_SHIFT)
return (pte_t *)pu;
else if (pshift > PMD_SHIFT) {
@@ -138,6 +140,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long 
addr, unsigned long sz
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
if (pshift == PMD_SHIFT)
/* 16MB hugepage */
return (pte_t *)pm;
@@ -154,12 +158,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long 
addr, unsigned long sz
} else {
pdshift = PUD_SHIFT;
pu = pud_alloc(mm, pg, addr);
+   if (!pu)
+   return NULL;
if (pshift >= PUD_SHIFT) {
ptl = pud_lockptr(mm, pu);
hpdp = (hugepd_t *)pu;
} else {
pdshift = PMD_SHIFT;
pm = pmd_alloc(mm, pu, addr);
+   if (!pm)
+   return NULL;
ptl = pmd_lockptr(mm, pm);
hpdp = (hugepd_t *)pm;
}
-- 
2.21.0



[Bug 203597] kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage

2019-05-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203597

--- Comment #1 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 282745
  --> https://bugzilla.kernel.org/attachment.cgi?id=282745&action=edit
bisect.log

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 203597] New: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage

2019-05-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203597

Bug ID: 203597
   Summary: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at
early stage
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 4.9.175
  Hardware: PPC-32
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-32
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: erhar...@mailbox.org
Regression: No

Created attachment 282743
  --> https://bugzilla.kernel.org/attachment.cgi?id=282743&action=edit
kernel .config (PowerMac G4 MDD)

Trying out older kernels on the G4 MDD, I noticed that recent 4.9.x kernels
freeze the machine. The only message displayed, in black letters on a white
screen:

done
found display   : /pco@f000/ATY,AlteracParent@10/ATY,Alterac_B@1,
opening...


It's a hard freeze, RCU_CPU_STALL_TIMEOUT does not kick in.

Tried other stable kernels, which all worked:
4.19.37
4.14.114 
4.4.179

So I suppose it's only a 4.9.x issue. Last working 4.9.x kernel I had in
service was 4.9.161. First one I spotted freezing was 4.9.171. A bisect gave me
the following commit:

1c38a84d45862be06ac418618981631eddbda741 is the first bad commit
commit 1c38a84d45862be06ac418618981631eddbda741
Author: Michael Neuling 
Date:   Thu Apr 11 21:45:59 2019 +1000

powerpc: Avoid code patching freed init sections

commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream.

This stops us from doing code patching in init sections after they've
been freed.

In this chain:
  kvm_guest_init() ->
kvm_use_magic_page() ->
  fault_in_pages_readable() ->
 __get_user() ->
   __get_user_nocheck() ->
 barrier_nospec();

We have a code patching location at barrier_nospec() and
kvm_guest_init() is an init function. This whole chain gets inlined,
so when we free the init section (hence kvm_guest_init()), this code
goes away and hence should no longer be patched.

We seen this as userspace memory corruption when using a memory
checker while doing partition migration testing on powervm (this
starts the code patching post migration via
/sys/kernel/mobility/migration). In theory, it could also happen when
using /sys/kernel/debug/powerpc/barrier_nospec.

Cc: sta...@vger.kernel.org # 4.13+
Signed-off-by: Michael Neuling 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Christophe Leroy 
Signed-off-by: Michael Ellerman 
Signed-off-by: Sasha Levin 

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] powerpc/powernv/npu: Fix reference leak

2019-05-13 Thread Greg Kurz
Michael,

Any comments on this patch ? Should I repost with a shorter comment
as suggested by Alexey ?

Cheers,

--
Greg

On Mon, 29 Apr 2019 12:36:59 +0200
Greg Kurz  wrote:

> On Mon, 29 Apr 2019 16:01:29 +1000
> Alexey Kardashevskiy  wrote:
> 
> > On 20/04/2019 01:34, Greg Kurz wrote:  
> > > Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). 
> > > This
> > > has the effect of incrementing the reference count of the PCI device, as
> > > explained in drivers/pci/search.c:
> > > 
> > >  * Given a PCI domain, bus, and slot/function number, the desired PCI
> > >  * device is located in the list of PCI devices. If the device is
> > >  * found, its reference count is increased and this function returns a
> > >  * pointer to its data structure.  The caller must decrement the
> > >  * reference count by calling pci_dev_put().  If no device is found,
> > >  * %NULL is returned.
> > > 
> > > Nothing was done to call pci_dev_put() and the reference count of GPU and
> > > NPU PCI devices rockets up.
> > > 
> > > A natural way to fix this would be to teach the callers about the change,
> > > so that they call pci_dev_put() when done with the pointer. This turns
> > > out to be quite intrusive, as it affects many paths in npu-dma.c,
> > > pci-ioda.c and vfio_pci_nvlink2.c.
> > 
> > 
> > afaict this referencing is only done to protect the current traverser
> > and what you've done is actually a natural way (and the generic
> > pci_get_dev_by_id() does exactly the same), although this looks a bit weird.
> >   
> 
> Not exactly the same: pci_get_dev_by_id() always increment the refcount
> of the returned PCI device. The refcount is only decremented when this
> device is passed to pci_get_dev_by_id() to continue searching.
> 
> That means that the users of the PCI device pointer returned by
> pci_get_dev_by_id() or its exported variants pci_get_subsys(),
> pci_get_device() and pci_get_class() do handle the refcount. They
> all pass the pointer to pci_dev_put() or continue the search,
> which calls pci_dev_put() internally.
> 
> Direct and indirect callers of get_pci_dev() don't care for the
> refcount at all unless I'm missing something.
> 
> >   
> > > Also, the issue appeared in 4.16 and
> > > some affected code got moved around since then: it would be problematic
> > > to backport the fix to stable releases.
> > > 
> > > All that code never cared for reference counting anyway. Call 
> > > pci_dev_put()
> > > from get_pci_dev() to revert to the previous behavior.
> > > Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev
> > > from pci_dn")
> > > Cc: sta...@vger.kernel.org # v4.16
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  arch/powerpc/platforms/powernv/npu-dma.c |   15 ++-
> > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
> > > b/arch/powerpc/platforms/powernv/npu-dma.c
> > > index e713ade30087..d8f3647e8fb2 100644
> > > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > > @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock);
> > >  static struct pci_dev *get_pci_dev(struct device_node *dn)
> > >  {
> > >   struct pci_dn *pdn = PCI_DN(dn);
> > > + struct pci_dev *pdev;
> > >  
> > > - return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
> > > + pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
> > >  pdn->busno, pdn->devfn);
> > > +
> > > + /*
> > > +  * pci_get_domain_bus_and_slot() increased the reference count of
> > > +  * the PCI device, but callers don't need that actually as the PE
> > > +  * already holds a reference to the device.
> > 
> > Imho this would be just enough.
> > 
> > Anyway,
> > 
> > Reviewed-by: Alexey Kardashevskiy 
> >   
> 
> Thanks !
> 
> I now realize that I forgot to add the --cc option for stable on my stgit
> command line :-\.
> 
> Cc'ing now.
> 
> > 
> > How did you find it? :)
> >   
> 
> While reading code to find some inspiration for OpenCAPI passthrough. :)
> 
> I saw the following in vfio_pci_ibm_npu2_init():
> 
>   if (!pnv_pci_get_gpu_dev(vdev->pdev))
>   return -ENODEV;
> 
> and simply followed the function calls.
> 
> >   
> > > Since callers aren't
> > > +  * aware of the reference count change, call pci_dev_put() now to
> > > +  * avoid leaks.
> > > +  */
> > > + if (pdev)
> > > + pci_dev_put(pdev);
> > > +
> > > + return pdev;
> > >  }
> > >  
> > >  /* Given a NPU device get the associated PCI device. */
> > > 
> >   
> 



Re: [PATCH v3 4/6] dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding

2019-05-13 Thread Rob Herring
On Mon, 13 May 2019 11:14:58 +, Rasmus Villemoes wrote:
> Reading table 4-30, and its footnotes, of the QUICC Engine Block
> Reference Manual shows that the set of snum _values_ is not
> necessarily just a function of the _number_ of snums, as given in the
> fsl,qe-num-snums property.
> 
> As an alternative, to make it easier to add support for other variants
> of the QUICC engine IP, this introduces a new binding fsl,qe-snums,
> which automatically encodes both the number of snums and the actual
> values to use.
> 
> Signed-off-by: Rasmus Villemoes 
> ---
> Rob, thanks for the review of v2. However, since I moved the example
> from the commit log to the binding (per Joakim's request), I didn't
> add a Reviewed-by tag for this revision.
> 
>  .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt   | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 

Reviewed-by: Rob Herring 


Re: [PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties

2019-05-13 Thread Roy Pledge
On 5/13/2019 12:40 PM, Joakim Tjernlund wrote:
> On Mon, 2019-05-13 at 16:09 +, Roy Pledge wrote:
>> The index value should be passed to the of_parse_phandle()
>> function to ensure the correct property is read.
> Is this a bug fix? Maybe for stable too?
>
>  Jocke
Yes this could go to stable.  I will include sta...@vger.kernel.org when
I send the next version.
>
>> Signed-off-by: Roy Pledge 
>> ---
>>  drivers/soc/fsl/qbman/dpaa_sys.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
>> index 3e0a7f3..0b901a8 100644
>> --- a/drivers/soc/fsl/qbman/dpaa_sys.c
>> +++ b/drivers/soc/fsl/qbman/dpaa_sys.c
>> @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
>> idx, ret);
>> return -ENODEV;
>> }
>> -   mem_node = of_parse_phandle(dev->of_node, "memory-region", 0);
>> +   mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
>> if (mem_node) {
>> ret = of_property_read_u64(mem_node, "size", &size64);
>> if (ret) {
>> --
>> 2.7.4
>>
>



Re: [PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties

2019-05-13 Thread Joakim Tjernlund
On Mon, 2019-05-13 at 16:09 +, Roy Pledge wrote:
> 
> The index value should be passed to the of_parse_phandle()
> function to ensure the correct property is read.

Is this a bug fix? Maybe for stable too?

 Jocke

> 
> Signed-off-by: Roy Pledge 
> ---
>  drivers/soc/fsl/qbman/dpaa_sys.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
> index 3e0a7f3..0b901a8 100644
> --- a/drivers/soc/fsl/qbman/dpaa_sys.c
> +++ b/drivers/soc/fsl/qbman/dpaa_sys.c
> @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
> idx, ret);
> return -ENODEV;
> }
> -   mem_node = of_parse_phandle(dev->of_node, "memory-region", 0);
> +   mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
> if (mem_node) {
> ret = of_property_read_u64(mem_node, "size", &size64);
> if (ret) {
> --
> 2.7.4
> 



Re: [PATCH 2/8] powerpc/pseries: Do not save the previous DTL mask value

2019-05-13 Thread Nathan Lynch
"Naveen N. Rao"  writes:

> When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is enabled, we always initialize
> DTL enable mask to DTL_LOG_PREEMPT (0x2). There are no other places
> where the mask is changed. As such, when reading the DTL log buffer
> through debugfs, there is no need to save and restore the previous mask
> value.
>
> We don't need to save and restore the earlier mask value if
> CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not enabled. So, remove the field
> from the structure as well.

Makes sense.

Acked-by: Nathan Lynch 



[PATCH v1 8/8] soc/fsl/qbman: Update device tree with reserved memory

2019-05-13 Thread Roy Pledge
When using the reserved memory node in the device tree there are
two options - dynamic or static. If a dynamic allocation was
selected (where the kernel selects the address for the allocation)
convert it to a static allocation by inserting the reg property.
This will ensure the same memory is reused after a kexec().

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/dpaa_sys.c | 58 
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
index 0b901a8..9dd8bb5 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.c
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -37,41 +37,53 @@
 int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
size_t *size)
 {
-   int ret;
struct device_node *mem_node;
-   u64 size64;
struct reserved_mem *rmem;
+   struct property *prop;
+   int len, err;
+   __be32 *res_array;
 
-   ret = of_reserved_mem_device_init_by_idx(dev, dev->of_node, idx);
-   if (ret) {
-   dev_err(dev,
-   "of_reserved_mem_device_init_by_idx(%d) failed 0x%x\n",
-   idx, ret);
-   return -ENODEV;
-   }
mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
-   if (mem_node) {
-   ret = of_property_read_u64(mem_node, "size", &size64);
-   if (ret) {
-   dev_err(dev, "of_address_to_resource fails 0x%x\n",
-   ret);
-   return -ENODEV;
-   }
-   *size = size64;
-   } else {
+   if (!mem_node) {
dev_err(dev, "No memory-region found for index %d\n", idx);
return -ENODEV;
}
 
rmem = of_reserved_mem_lookup(mem_node);
+   if (!rmem) {
+   dev_err(dev, "of_reserved_mem_lookup() returned NULL\n");
+   return -ENODEV;
+   }
*addr = rmem->base;
+   *size = rmem->size;
 
/*
-* Disassociate the reserved memory area from the device
-* because a device can only have one DMA memory area. This
-* should be fine since the memory is allocated and initialized
-* and only ever accessed by the QBMan device from now on
+* Check if the reg property exists - if not insert the node
+* so upon kexec() the same memory region address will be preserved.
+* This is needed because QBMan HW does not allow the base address/
+* size to be modified once set.
 */
-   of_reserved_mem_device_release(dev);
+   prop = of_find_property(mem_node, "reg", &len);
+   if (!prop) {
+   prop = devm_kzalloc(dev, sizeof(*prop), GFP_KERNEL);
+   if (!prop)
+   return -ENOMEM;
+   prop->value = res_array = devm_kzalloc(dev, sizeof(__be32) * 4,
+  GFP_KERNEL);
+   if (!prop->value)
+   return -ENOMEM;
+   res_array[0] = cpu_to_be32(upper_32_bits(*addr));
+   res_array[1] = cpu_to_be32(lower_32_bits(*addr));
+   res_array[2] = cpu_to_be32(upper_32_bits(*size));
+   res_array[3] = cpu_to_be32(lower_32_bits(*size));
+   prop->length = sizeof(__be32) * 4;
+   prop->name = devm_kstrdup(dev, "reg", GFP_KERNEL);
+   if (!prop->name)
+   return -ENOMEM;
+   err = of_add_property(mem_node, prop);
+   if (err)
+   return err;
+   }
+
return 0;
 }
-- 
2.7.4



[PATCH v1 7/8] soc/fsl/qbman: Fixup qman_shutdown_fq()

2019-05-13 Thread Roy Pledge
When shutting down a FQ on a dedicated channel only the
SW portal associated with that channel can dequeue from it.
Make sure the correct portal is used.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman.c | 53 +++-
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 4a99ce5..bf68d86 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1018,6 +1018,20 @@ static inline void put_affine_portal(void)
put_cpu_var(qman_affine_portal);
 }
 
+
+static inline struct qman_portal *get_portal_for_channel(u16 channel)
+{
+   int i;
+
+   for (i = 0; i < num_possible_cpus(); i++) {
+   if (affine_portals[i] &&
+   affine_portals[i]->config->channel == channel)
+   return affine_portals[i];
+   }
+
+   return NULL;
+}
+
 static struct workqueue_struct *qm_portal_wq;
 
 int qman_dqrr_set_ithresh(struct qman_portal *portal, u8 ithresh)
@@ -2601,7 +2615,7 @@ static int _qm_dqrr_consume_and_match(struct qm_portal *p, u32 fqid, int s,
 
 int qman_shutdown_fq(u32 fqid)
 {
-   struct qman_portal *p;
+   struct qman_portal *p, *channel_portal;
struct device *dev;
union qm_mc_command *mcc;
union qm_mc_result *mcr;
@@ -2641,17 +2655,28 @@ int qman_shutdown_fq(u32 fqid)
channel = qm_fqd_get_chan(&mcr->queryfq.fqd);
wq = qm_fqd_get_wq(&mcr->queryfq.fqd);
 
+   if (channel < qm_channel_pool1) {
+   channel_portal = get_portal_for_channel(channel);
+   if (channel_portal == NULL) {
+   dev_err(dev, "Can't find portal for dedicated channel 0x%x\n",
+   channel);
+   ret = -EIO;
+   goto out;
+   }
+   } else
+   channel_portal = p;
+
switch (state) {
case QM_MCR_NP_STATE_TEN_SCHED:
case QM_MCR_NP_STATE_TRU_SCHED:
case QM_MCR_NP_STATE_ACTIVE:
case QM_MCR_NP_STATE_PARKED:
orl_empty = 0;
-   mcc = qm_mc_start(&p->p);
+   mcc = qm_mc_start(&channel_portal->p);
qm_fqid_set(&mcc->fq, fqid);
-   qm_mc_commit(&p->p, QM_MCC_VERB_ALTER_RETIRE);
-   if (!qm_mc_result_timeout(&p->p, &mcr)) {
-   dev_err(dev, "QUERYFQ_NP timeout\n");
+   qm_mc_commit(&channel_portal->p, QM_MCC_VERB_ALTER_RETIRE);
+   if (!qm_mc_result_timeout(&channel_portal->p, &mcr)) {
+   dev_err(dev, "ALTER_RETIRE timeout\n");
ret = -ETIMEDOUT;
goto out;
}
@@ -2659,6 +2684,9 @@ int qman_shutdown_fq(u32 fqid)
QM_MCR_VERB_ALTER_RETIRE);
res = mcr->result; /* Make a copy as we reuse MCR below */
 
+   if (res == QM_MCR_RESULT_OK)
+   drain_mr_fqrni(&channel_portal->p);
+
if (res == QM_MCR_RESULT_PENDING) {
/*
 * Need to wait for the FQRN in the message ring, which
@@ -2688,21 +2716,25 @@ int qman_shutdown_fq(u32 fqid)
}
/* Set the sdqcr to drain this channel */
if (channel < qm_channel_pool1)
-   qm_dqrr_sdqcr_set(&p->p,
+   qm_dqrr_sdqcr_set(&channel_portal->p,
  QM_SDQCR_TYPE_ACTIVE |
  QM_SDQCR_CHANNELS_DEDICATED);
else
-   qm_dqrr_sdqcr_set(&p->p,
+   qm_dqrr_sdqcr_set(&channel_portal->p,
  QM_SDQCR_TYPE_ACTIVE |
  QM_SDQCR_CHANNELS_POOL_CONV
  (channel));
do {
/* Keep draining DQRR while checking the MR*/
-   qm_dqrr_drain_nomatch(&p->p);
+   qm_dqrr_drain_nomatch(&channel_portal->p);
/* Process message ring too */
-   found_fqrn = qm_mr_drain(&p->p, FQRN);
+   found_fqrn = qm_mr_drain(&channel_portal->p,
+FQRN);
cpu_relax();
} while (!found_fqrn);
+   /* Restore SDQCR */
+   qm_dqrr_sdqcr_set(&channel_portal->p,
+ channel_portal->sdqcr);
 
}
if (res != QM_MCR_RESULT_OK &&
@@ -2733,9 +2765,8 @@ int qman_shutdown_fq(u32 fqid)
 

[PATCH v1 6/8] soc/fsl/qbman: Disable interrupts during portal recovery

2019-05-13 Thread Roy Pledge
Disable the QBMan interrupts during recovery.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman.c  | 22 +++---
 drivers/soc/fsl/qbman/qman_ccsr.c |  1 +
 drivers/soc/fsl/qbman/qman_priv.h |  1 +
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 2989504..4a99ce5 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1070,6 +1070,20 @@ int qman_wq_alloc(void)
return 0;
 }
 
+
+void qman_enable_irqs(void)
+{
+   int i;
+
+   for (i = 0; i < num_possible_cpus(); i++) {
+   if (affine_portals[i]) {
+   qm_out(&affine_portals[i]->p, QM_REG_ISR, 0xffffffff);
+   qm_out(&affine_portals[i]->p, QM_REG_IIR, 0);
+   }
+
+   }
+}
+
 /*
  * This is what everything can wait on, even if it migrates to a different cpu
  * to the one whose affine portal it is waiting on.
@@ -1269,8 +1283,8 @@ static int qman_create_portal(struct qman_portal *portal,
qm_out(p, QM_REG_ISDR, isdr);
portal->irq_sources = 0;
qm_out(p, QM_REG_IER, 0);
-   qm_out(p, QM_REG_ISR, 0xffffffff);
snprintf(portal->irqname, MAX_IRQNAME, IRQNAME, c->cpu);
+   qm_out(p, QM_REG_IIR, 1);
if (request_irq(c->irq, portal_isr, 0, portal->irqname, portal)) {
dev_err(c->dev, "request_irq() failed\n");
goto fail_irq;
@@ -1290,7 +1304,7 @@ static int qman_create_portal(struct qman_portal *portal,
isdr &= ~(QM_PIRQ_DQRI | QM_PIRQ_MRI);
qm_out(p, QM_REG_ISDR, isdr);
if (qm_dqrr_current(p)) {
-   dev_err(c->dev, "DQRR unclean\n");
+   dev_dbg(c->dev, "DQRR unclean\n");
qm_dqrr_cdc_consume_n(p, 0xffff);
}
if (qm_mr_current(p) && drain_mr_fqrni(p)) {
@@ -1303,8 +1317,10 @@ static int qman_create_portal(struct qman_portal *portal,
}
/* Success */
portal->config = c;
+   qm_out(p, QM_REG_ISR, 0xffffffff);
qm_out(p, QM_REG_ISDR, 0);
-   qm_out(p, QM_REG_IIR, 0);
+   if (!qman_requires_cleanup())
+   qm_out(p, QM_REG_IIR, 0);
/* Write a sane SDQCR */
qm_dqrr_sdqcr_set(p, portal->sdqcr);
return 0;
diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index bee2d0e..3894172 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -735,6 +735,7 @@ int qman_requires_cleanup(void)
 
 void qman_done_cleanup(void)
 {
+   qman_enable_irqs();
__qman_requires_cleanup = 0;
 }
 
diff --git a/drivers/soc/fsl/qbman/qman_priv.h b/drivers/soc/fsl/qbman/qman_priv.h
index bf51c17..9d37ddd 100644
--- a/drivers/soc/fsl/qbman/qman_priv.h
+++ b/drivers/soc/fsl/qbman/qman_priv.h
@@ -272,3 +272,4 @@ int qman_shutdown_fq(u32 fqid);
 
 int qman_requires_cleanup(void);
 void qman_done_cleanup(void);
+void qman_enable_irqs(void);
-- 
2.7.4



[PATCH v1 3/8] soc/fsl/qbman: Cleanup QMan queues if device was already initialized

2019-05-13 Thread Roy Pledge
If the QMan device was previously initialized make sure all the
frame queues are out of service once all the portals are probed.
This handles the case where the kernel is restarted without the
SoC being reset (kexec for example).

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman.c|  4 ++--
 drivers/soc/fsl/qbman/qman_ccsr.c   | 13 -
 drivers/soc/fsl/qbman/qman_portal.c | 18 +-
 drivers/soc/fsl/qbman/qman_priv.h   |  7 +++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 636f83f..f10f77d 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -2581,7 +2581,7 @@ static int _qm_dqrr_consume_and_match(struct qm_portal *p, u32 fqid, int s,
 #define qm_dqrr_drain_nomatch(p) \
_qm_dqrr_consume_and_match(p, 0, 0, false)
 
-static int qman_shutdown_fq(u32 fqid)
+int qman_shutdown_fq(u32 fqid)
 {
struct qman_portal *p;
struct device *dev;
@@ -2754,7 +2754,7 @@ static int qman_shutdown_fq(u32 fqid)
 
DPAA_ASSERT((mcr->verb & QM_MCR_VERB_MASK) ==
QM_MCR_VERB_ALTER_OOS);
-   if (mcr->result) {
+   if (mcr->result != QM_MCR_RESULT_OK) {
dev_err(dev, "OOS fail: FQ 0x%x (0x%x)\n",
fqid, mcr->result);
ret = -EIO;
diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index d054e37..bee2d0e 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -483,7 +483,7 @@ RESERVEDMEM_OF_DECLARE(qman_pfdr, "fsl,qman-pfdr", qman_pfdr);
 
 #endif
 
-static unsigned int qm_get_fqid_maxcnt(void)
+unsigned int qm_get_fqid_maxcnt(void)
 {
return fqd_sz / 64;
 }
@@ -728,6 +728,17 @@ int qman_is_probed(void)
 }
 EXPORT_SYMBOL_GPL(qman_is_probed);
 
+int qman_requires_cleanup(void)
+{
+   return __qman_requires_cleanup;
+}
+
+void qman_done_cleanup(void)
+{
+   __qman_requires_cleanup = 0;
+}
+
+
 static int fsl_qman_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index 2460802..8295d75 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -233,7 +233,7 @@ static int qman_portal_probe(struct platform_device *pdev)
struct device_node *node = dev->of_node;
struct qm_portal_config *pcfg;
struct resource *addr_phys[2];
-   int irq, cpu, err;
+   int irq, cpu, err, i;
u32 val;
 
err = qman_is_probed();
@@ -327,6 +327,22 @@ static int qman_portal_probe(struct platform_device *pdev)
if (!cpu_online(cpu))
qman_offline_cpu(cpu);
 
+   if (__qman_portals_probed == 1 && qman_requires_cleanup()) {
+   /*
+* QMan wasn't reset prior to boot (Kexec for example)
+* Empty all the frame queues so they are in reset state
+*/
+   for (i = 0; i < qm_get_fqid_maxcnt(); i++) {
+   err =  qman_shutdown_fq(i);
+   if (err) {
+   dev_err(dev, "Failed to shutdown frame queue %d\n",
+   i);
+   goto err_portal_init;
+   }
+   }
+   qman_done_cleanup();
+   }
+
return 0;
 
 err_portal_init:
diff --git a/drivers/soc/fsl/qbman/qman_priv.h b/drivers/soc/fsl/qbman/qman_priv.h
index 75a8f90..bf51c17 100644
--- a/drivers/soc/fsl/qbman/qman_priv.h
+++ b/drivers/soc/fsl/qbman/qman_priv.h
@@ -265,3 +265,10 @@ extern struct qman_portal *affine_portals[NR_CPUS];
 extern struct qman_portal *qman_dma_portal;
 const struct qm_portal_config *qman_get_qm_portal_config(
struct qman_portal *portal);
+
+unsigned int qm_get_fqid_maxcnt(void);
+
+int qman_shutdown_fq(u32 fqid);
+
+int qman_requires_cleanup(void);
+void qman_done_cleanup(void);
-- 
2.7.4



[PATCH v1 5/8] soc/fsl/qbman: Fix drain_mr_fqni()

2019-05-13 Thread Roy Pledge
The drain_mr_fqni() function may be called from uninterruptible
context, so convert the msleep() to an mdelay(). Also ensure that
the valid bit is updated while polling.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index f10f77d..2989504 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1164,6 +1164,7 @@ static int drain_mr_fqrni(struct qm_portal *p)
 {
const union qm_mr_entry *msg;
 loop:
+   qm_mr_pvb_update(p);
msg = qm_mr_current(p);
if (!msg) {
/*
@@ -1180,7 +1181,8 @@ static int drain_mr_fqrni(struct qm_portal *p)
 * entries well before the ring has been fully consumed, so
 * we're being *really* paranoid here.
 */
-   msleep(1);
+   mdelay(1);
+   qm_mr_pvb_update(p);
msg = qm_mr_current(p);
if (!msg)
return 0;
-- 
2.7.4



[PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties

2019-05-13 Thread Roy Pledge
The index value should be passed to the of_parse_phandle()
function to ensure the correct property is read.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/dpaa_sys.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
index 3e0a7f3..0b901a8 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.c
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
idx, ret);
return -ENODEV;
}
-   mem_node = of_parse_phandle(dev->of_node, "memory-region", 0);
+   mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
if (mem_node) {
ret = of_property_read_u64(mem_node, "size", &size64);
if (ret) {
-- 
2.7.4



[PATCH v1 2/8] soc/fsl/qbman: Cleanup buffer pools if BMan was initialized prior to bootup

2019-05-13 Thread Roy Pledge
Clean the BMan buffer pools if the device had been initialized
previously.  This will ensure a consistent state if the kernel
was soft restarted (kexec for example).

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/bman.c| 17 +
 drivers/soc/fsl/qbman/bman_ccsr.c   | 10 ++
 drivers/soc/fsl/qbman/bman_portal.c | 18 +-
 drivers/soc/fsl/qbman/bman_priv.h   |  5 +
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index f84ab59..f4fb527 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -635,30 +635,31 @@ int bman_p_irqsource_add(struct bman_portal *p, u32 bits)
return 0;
 }
 
-static int bm_shutdown_pool(u32 bpid)
+int bm_shutdown_pool(u32 bpid)
 {
+   int err = 0;
struct bm_mc_command *bm_cmd;
union bm_mc_result *bm_res;
 
+
+   struct bman_portal *p = get_affine_portal();
while (1) {
-   struct bman_portal *p = get_affine_portal();
/* Acquire buffers until empty */
bm_cmd = bm_mc_start(&p->p);
bm_cmd->bpid = bpid;
bm_mc_commit(&p->p, BM_MCC_VERB_CMD_ACQUIRE | 1);
if (!bm_mc_result_timeout(&p->p, &bm_res)) {
-   put_affine_portal();
pr_crit("BMan Acquire Command timedout\n");
-   return -ETIMEDOUT;
+   err = -ETIMEDOUT;
+   goto done;
}
if (!(bm_res->verb & BM_MCR_VERB_ACQUIRE_BUFCOUNT)) {
-   put_affine_portal();
/* Pool is empty */
-   return 0;
+   goto done;
}
-   put_affine_portal();
}
-
+done:
+   put_affine_portal();
-   return 0;
+   return err;
 }
 
diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
index dc6d7e5..cb24a08 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -195,6 +195,16 @@ int bman_is_probed(void)
 }
 EXPORT_SYMBOL_GPL(bman_is_probed);
 
+int bman_requires_cleanup(void)
+{
+   return __bman_requires_cleanup;
+}
+
+void bman_done_cleanup(void)
+{
+   __bman_requires_cleanup = 0;
+}
+
 static int fsl_bman_probe(struct platform_device *pdev)
 {
int ret, err_irq;
diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 7819bc2..6c1a734 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -100,7 +100,7 @@ static int bman_portal_probe(struct platform_device *pdev)
struct device_node *node = dev->of_node;
struct bm_portal_config *pcfg;
struct resource *addr_phys[2];
-   int irq, cpu, err;
+   int irq, cpu, err, i;
 
err = bman_is_probed();
if (!err)
@@ -180,6 +180,22 @@ static int bman_portal_probe(struct platform_device *pdev)
if (!cpu_online(cpu))
bman_offline_cpu(cpu);
 
+   if (__bman_portals_probed == 1 && bman_requires_cleanup()) {
+   /*
+* BMan wasn't reset prior to boot (Kexec for example)
+* Empty all the buffer pools so they are in reset state
+*/
+   for (i = 0; i < BM_POOL_MAX; i++) {
+   err =  bm_shutdown_pool(i);
+   if (err) {
+   dev_err(dev, "Failed to shutdown bpool %d\n",
+   i);
+   goto err_portal_init;
+   }
+   }
+   bman_done_cleanup();
+   }
+
return 0;
 
 err_portal_init:
diff --git a/drivers/soc/fsl/qbman/bman_priv.h b/drivers/soc/fsl/qbman/bman_priv.h
index 751ce90..aa3981e 100644
--- a/drivers/soc/fsl/qbman/bman_priv.h
+++ b/drivers/soc/fsl/qbman/bman_priv.h
@@ -76,3 +76,8 @@ int bman_p_irqsource_add(struct bman_portal *p, u32 bits);
 
 const struct bm_portal_config *
 bman_get_bm_portal_config(const struct bman_portal *portal);
+
+int bman_requires_cleanup(void);
+void bman_done_cleanup(void);
+
+int bm_shutdown_pool(u32 bpid);
-- 
2.7.4



[PATCH v1 1/8] soc/fsl/qbman: Rework QBMan private memory setup

2019-05-13 Thread Roy Pledge
Rework QBMan private memory setup so that the areas are not
zeroed if the device was previously initialized.

If the QMan private memory was already initialized skip the PFDR
initialization.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/bman_ccsr.c | 26 --
 drivers/soc/fsl/qbman/dpaa_sys.c  |  7 +++---
 drivers/soc/fsl/qbman/qman_ccsr.c | 45 ++-
 3 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
index 7c3cc96..dc6d7e5 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -97,17 +97,40 @@ static void bm_get_version(u16 *id, u8 *major, u8 *minor)
 /* signal transactions for FBPRs with higher priority */
 #define FBPR_AR_RPRIO_HI BIT(30)
 
-static void bm_set_memory(u64 ba, u32 size)
+/* Track if probe has occurred and if cleanup is required */
+static int __bman_probed;
+static int __bman_requires_cleanup;
+
+
+static int bm_set_memory(u64 ba, u32 size)
 {
+   u32 bar, bare;
u32 exp = ilog2(size);
/* choke if size isn't within range */
DPAA_ASSERT(size >= 4096 && size <= 1024*1024*1024 &&
   is_power_of_2(size));
/* choke if '[e]ba' has lower-alignment than 'size' */
DPAA_ASSERT(!(ba & (size - 1)));
+
+   /* Check to see if BMan has already been initialized */
+   bar = bm_ccsr_in(REG_FBPR_BAR);
+   if (bar) {
+   /* Make sure ba == what was programmed */
+   bare = bm_ccsr_in(REG_FBPR_BARE);
+   if (bare != upper_32_bits(ba) || bar != lower_32_bits(ba)) {
+   pr_err("Attempted to reinitialize BMan with different BAR, got 0x%llx read BARE=0x%x BAR=0x%x\n",
+  ba, bare, bar);
+   return -ENOMEM;
+   }
+   pr_info("BMan BAR already configured\n");
+   __bman_requires_cleanup = 1;
+   return 1;
+   }
+
bm_ccsr_out(REG_FBPR_BARE, upper_32_bits(ba));
bm_ccsr_out(REG_FBPR_BAR, lower_32_bits(ba));
bm_ccsr_out(REG_FBPR_AR, exp - 1);
+   return 0;
 }
 
 /*
@@ -120,7 +143,6 @@ static void bm_set_memory(u64 ba, u32 size)
  */
 static dma_addr_t fbpr_a;
 static size_t fbpr_sz;
-static int __bman_probed;
 
 static int bman_fbpr(struct reserved_mem *rmem)
 {
diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
index e6d48dc..3e0a7f3 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.c
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -40,6 +40,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
int ret;
struct device_node *mem_node;
u64 size64;
+   struct reserved_mem *rmem;
 
ret = of_reserved_mem_device_init_by_idx(dev, dev->of_node, idx);
if (ret) {
@@ -62,10 +63,8 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
return -ENODEV;
}
 
-   if (!dma_alloc_coherent(dev, *size, addr, 0)) {
-   dev_err(dev, "DMA Alloc memory failed\n");
-   return -ENODEV;
-   }
+   rmem = of_reserved_mem_lookup(mem_node);
+   *addr = rmem->base;
 
/*
 * Disassociate the reserved memory area from the device
diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c
index 109b38d..d054e37 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -274,6 +274,7 @@ static u32 __iomem *qm_ccsr_start;
 /* A SDQCR mask comprising all the available/visible pool channels */
 static u32 qm_pools_sdqcr;
 static int __qman_probed;
+static int  __qman_requires_cleanup;
 
 static inline u32 qm_ccsr_in(u32 offset)
 {
@@ -340,19 +341,46 @@ static void qm_get_version(u16 *id, u8 *major, u8 *minor)
 }
 
 #define PFDR_AR_EN BIT(31)
-static void qm_set_memory(enum qm_memory memory, u64 ba, u32 size)
+static int qm_set_memory(enum qm_memory memory, u64 ba, u32 size)
 {
+   void *ptr;
u32 offset = (memory == qm_memory_fqd) ? REG_FQD_BARE : REG_PFDR_BARE;
u32 exp = ilog2(size);
+   u32 bar, bare;
 
/* choke if size isn't within range */
DPAA_ASSERT((size >= 4096) && (size <= 1024*1024*1024) &&
is_power_of_2(size));
/* choke if 'ba' has lower-alignment than 'size' */
DPAA_ASSERT(!(ba & (size - 1)));
+
+   /* Check to see if QMan has already been initialized */
+   bar = qm_ccsr_in(offset + REG_offset_BAR);
+   if (bar) {
+   /* Make sure ba == what was programmed */
+   bare = qm_ccsr_in(offset);
+   if (bare != upper_32_bits(ba) || bar != lower_32_bits(ba)) {
+   pr_err("Attempted to reinitialize QMan with different BAR, got 0x%llx read BARE=0x%x BAR=0x%x\n",
+  ba, bare, bar);
+

[PATCH v1 0/8] soc/fsl/qbman: Enable Kexec for DPAA1 devices

2019-05-13 Thread Roy Pledge
Most DPAA1 devices do not support a soft reset, which is an issue if
Kexec starts a new kernel. This patch series allows Kexec to function
by detecting that the QBMan device was previously initialized.

The patches fix some issues with device cleanup as well as ensuring
that the location of the QBMan private memories has not changed
after the execution of the Kexec.

Roy Pledge (8):
  soc/fsl/qbman: Rework QBMan private memory setup
  soc/fsl/qbman: Cleanup buffer pools if BMan was initialized prior to
bootup
  soc/fsl/qbman: Cleanup QMan queues if device was already initialized
  soc/fsl/qbman: Use index when accessing device tree properties
  soc/fsl/qbman: Fix drain_mr_fqni()
  soc/fsl/qbman: Disable interrupts during portal recovery
  soc/fsl/qbman: Fixup qman_shutdown_fq()
  soc/fsl/qbman: Update device tree with reserved memory

 drivers/soc/fsl/qbman/bman.c| 17 
 drivers/soc/fsl/qbman/bman_ccsr.c   | 36 +++-
 drivers/soc/fsl/qbman/bman_portal.c | 18 +++-
 drivers/soc/fsl/qbman/bman_priv.h   |  5 +++
 drivers/soc/fsl/qbman/dpaa_sys.c| 63 
 drivers/soc/fsl/qbman/qman.c| 83 +
 drivers/soc/fsl/qbman/qman_ccsr.c   | 59 +++---
 drivers/soc/fsl/qbman/qman_portal.c | 18 +++-
 drivers/soc/fsl/qbman/qman_priv.h   |  8 
 9 files changed, 246 insertions(+), 61 deletions(-)

--
2.7.4



Re: [PATCH 1/8] powerpc/pseries: Use macros for referring to the DTL enable mask

2019-05-13 Thread Nathan Lynch
"Naveen N. Rao"  writes:
> Introduce macros to encode the DTL enable mask fields and use those
> instead of hardcoding numbers.

This is a good cleanup on its own.

Acked-by: Nathan Lynch 


[PATCH v2] powerpc/boot: pass CONFIG options in a simpler and more robust way

2019-05-13 Thread Masahiro Yamada
Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures
with -j 1") was also wrong.

The correct dependency is:

  $(obj)/serial.o: $(obj)/autoconf.h

However, I do not see the reason why we need to copy autoconf.h to
arch/powerpc/boot/. Nor do I see consistency in the way of passing
CONFIG options.

decompress.c references CONFIG_KERNEL_GZIP and CONFIG_KERNEL_XZ, which
are passed via the command line.

serial.c includes autoconf.h to reference a couple of CONFIG options,
but this is fragile because we often forget to include "autoconf.h"
from source files.

In fact, it is already broken.

ppc_asm.h references CONFIG_PPC_8xx, but utils.S is not given any way
to access CONFIG options. So, CONFIG_PPC_8xx is never defined here.

Pass $(LINUXINCLUDE) to make sure CONFIG options are accessible from
all .c and .S files in arch/powerpc/boot/.

I also removed the -traditional flag to make include/linux/kconfig.h
work. This flag makes the preprocessor imitate the behavior of the
pre-standard C compiler, but I do not understand why it is necessary.

Signed-off-by: Masahiro Yamada 
---

Changes in v2:
 - reword commit log

 arch/powerpc/boot/.gitignore |  2 --
 arch/powerpc/boot/Makefile   | 14 +++---
 arch/powerpc/boot/serial.c   |  1 -
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore
index 32034a0c..6610665 100644
--- a/arch/powerpc/boot/.gitignore
+++ b/arch/powerpc/boot/.gitignore
@@ -44,5 +44,3 @@ fdt_sw.c
 fdt_wip.c
 libfdt.h
 libfdt_internal.h
-autoconf.h
-
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 73d1f35..b8a82be 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -20,9 +20,6 @@
 
 all: $(obj)/zImage
 
-compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP
-compress-$(CONFIG_KERNEL_XZ)   := CONFIG_KERNEL_XZ
-
 ifdef CROSS32_COMPILE
 BOOTCC := $(CROSS32_COMPILE)gcc
 BOOTAR := $(CROSS32_COMPILE)ar
@@ -34,7 +31,7 @@ endif
 BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \
 -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
--D$(compress-y)
+$(LINUXINCLUDE)
 
 ifdef CONFIG_PPC64_BOOT_WRAPPER
 BOOTCFLAGS += -m64
@@ -51,7 +48,7 @@ BOOTCFLAGS+= -mlittle-endian
 BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
 endif
 
-BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc
+BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc
 
 BOOTARFLAGS:= -cr$(KBUILD_ARFLAGS)
 
@@ -202,14 +199,9 @@ $(obj)/empty.c:
 $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S
$(Q)cp $< $@
 
-$(srctree)/$(src)/serial.c: $(obj)/autoconf.h
-
-$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/%
-   $(Q)cp $< $@
-
 clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \
$(zlib-decomp-) $(libfdt) $(libfdtheader) \
-   autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
+   empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
 
 quiet_cmd_bootcc = BOOTCC  $@
   cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
index b0491b8..9457863 100644
--- a/arch/powerpc/boot/serial.c
+++ b/arch/powerpc/boot/serial.c
@@ -18,7 +18,6 @@
 #include "stdio.h"
 #include "io.h"
 #include "ops.h"
-#include "autoconf.h"
 
 static int serial_open(void)
 {
-- 
2.7.4



Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Steven Rostedt
On Mon, 13 May 2019 14:42:20 +0200
Petr Mladek  wrote:

> > The "(null)" is good enough by itself and already an established
> > practice..  
> 
> (efault) made more sense with the probe_kernel_read() that
> checked a wide range of addresses. Well, I still think that
> it makes sense to distinguish a pure NULL. And it is still
> used for IS_ERR_VALUE() as well.

Why not just "(fault)"? That is self descriptive enough.

-- Steve


Re: [PATCH v6 1/1] iommu: enhance IOMMU dma mode build options

2019-05-13 Thread Leizhen (ThunderTown)



On 2019/5/8 17:42, John Garry wrote:
> On 18/04/2019 14:57, Zhen Lei wrote:
>> First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
>> opportunity to set {lazy|strict} mode as default at build time. Then put
>> the three config options in a choice, so people can only choose one of
>> the three at a time.
>>
>> The default IOMMU dma modes on each ARCHs have no change.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/ia64/kernel/pci-dma.c|  2 +-
>>  arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
>>  arch/s390/pci/pci_dma.c   |  2 +-
>>  arch/x86/kernel/pci-dma.c |  7 ++---
>>  drivers/iommu/Kconfig | 44 
>> ++-
>>  drivers/iommu/amd_iommu_init.c|  3 ++-
>>  drivers/iommu/intel-iommu.c   |  2 +-
>>  drivers/iommu/iommu.c |  3 ++-
>>  8 files changed, 48 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
>> index fe988c49f01ce6a..655511dbf3c3b34 100644
>> --- a/arch/ia64/kernel/pci-dma.c
>> +++ b/arch/ia64/kernel/pci-dma.c
>> @@ -22,7 +22,7 @@
>>  int force_iommu __read_mostly;
>>  #endif
>>
>> -int iommu_pass_through;
>> +int iommu_pass_through = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>>
>>  static int __init pci_iommu_init(void)
>>  {
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>> b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 3ead4c237ed0ec9..383e082a9bb985c 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const 
>> char *level,
>>  va_end(args);
>>  }
>>
>> -static bool pnv_iommu_bypass_disabled __read_mostly;
>> +static bool pnv_iommu_bypass_disabled __read_mostly =
>> +!IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>>  static bool pci_reset_phbs __read_mostly;
>>
>>  static int __init iommu_setup(char *str)
>> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
>> index 9e52d1527f71495..784ad1e0acecfb1 100644
>> --- a/arch/s390/pci/pci_dma.c
>> +++ b/arch/s390/pci/pci_dma.c
>> @@ -17,7 +17,7 @@
>>
>>  static struct kmem_cache *dma_region_table_cache;
>>  static struct kmem_cache *dma_page_table_cache;
>> -static int s390_iommu_strict;
>> +static int s390_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
>>
>>  static int zpci_refresh_global(struct zpci_dev *zdev)
>>  {
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index d460998ae828514..fb2bab42a0a3173 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -43,11 +43,8 @@
>>   * It is also possible to disable by default in kernel config, and enable 
>> with
>>   * iommu=nopt at boot time.
>>   */
>> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
>> -int iommu_pass_through __read_mostly = 1;
>> -#else
>> -int iommu_pass_through __read_mostly;
>> -#endif
>> +int iommu_pass_through __read_mostly =
>> +IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
>>
>>  extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 6f07f3b21816c64..8a1f1793cde76b4 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -74,17 +74,47 @@ config IOMMU_DEBUGFS
>>debug/iommu directory, and then populate a subdirectory with
>>entries as required.
>>
>> -config IOMMU_DEFAULT_PASSTHROUGH
>> -bool "IOMMU passthrough by default"
>> +choice
>> +prompt "IOMMU dma mode"
> 
> /s/dma/DMA/
OK

> 
> And how about add "default", as in "Default IOMMU DMA mode" or "IOMMU default 
> DMA mode"?
Yes. I prefer "IOMMU default DMA mode".

> 
>>  depends on IOMMU_API
>> -help
>> -  Enable passthrough by default, removing the need to pass in
>> -  iommu.passthrough=on or iommu=pt through command line. If this
>> -  is enabled, you can still disable with iommu.passthrough=off
>> -  or iommu=nopt depending on the architecture.
>> +default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
>> +default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
>> +default IOMMU_DEFAULT_STRICT
>> +help
>> +  This option allows IOMMU dma mode to be chose at build time, to
> 
> again, capitalize acronyms, i.e. /s/dma/DMA/ (more of these above and below)
OK, I will check it all. Thanks.

> 
>> +  override the default dma mode of each ARCHs, removing the need to
>> +  pass in kernel parameters through command line. You can still use
>> +  ARCHs specific boot options to override this option again.
>> +
>> +config IOMMU_DEFAULT_PASSTHROUGH
> 
> I think that it may need to be indented, along with the other choices
There is no problem. I referred to mm/Kconfig.

> 
>> +bool "passthrough"
>> +help
>> +  In this mode, the dma access through IOMMU withou

Re: [PATCH] powerpc/boot: fix broken way to pass CONFIG options

2019-05-13 Thread Oliver
On Mon, May 13, 2019 at 11:56 PM Masahiro Yamada
 wrote:
>
> On Mon, May 13, 2019 at 9:33 PM Masahiro Yamada
>  wrote:
> >
> > Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
> > was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures
> > with -j 1") was also wrong.
> >
> > Check-in source files never ever depend on build artifacts.
> >
> > The correct dependency is:
> >
> >   $(obj)/serial.o: $(obj)/autoconf.h
> >
> > However, copying autoconf.h to arch/power/boot/ is questionable
> > in the first place.
> >
> > arch/powerpc/Makefile adopted multiple ways to pass CONFIG options.
> >
> > arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and
> > CONFIG_KERNEL_XZ, which are passed via the command line.
> >
> > arch/powerpc/boot/serial.c includes the copied autoconf.h to
> > reference a couple of CONFIG options.
> >
> > Do not do this.
> >
> > We should have already learned that including autoconf.h from each
> > source file is really fragile.
> >
> > In fact, it is already broken.
> >
> > arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but
> > arch/powerpc/boot/utils.S is not given any way to access CONFIG
> > options. So, CONFIG_PPC_8xx is never defined here.
> >
> > Just pass $(LINUXINCLUDE) and remove all broken code.
> >
> > I also removed the -traditional flag to make include/linux/kconfig.h
> > work. I do not understand why it needs to imitate the behavior of
> > pre-standard C preprocessors.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
>
>
> I re-read my commit log, and I thought it was needlessly
> offensive. Sorry about that.
>
> I will reword the commit log and send v2.
>

No worries. We know the bootwrapper is... not great.

>
>
>
> --
> Best Regards
> Masahiro Yamada


Re: [PATCH] powerpc/boot: fix broken way to pass CONFIG options

2019-05-13 Thread Oliver
On Mon, May 13, 2019 at 9:23 PM Masahiro Yamada
 wrote:
>
> Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
> was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures
> with -j 1") was also wrong.
>
> Check-in source files never ever depend on build artifacts.
>
> The correct dependency is:
>
>   $(obj)/serial.o: $(obj)/autoconf.h
>
> However, copying autoconf.h to arch/power/boot/ is questionable
> in the first place.
>
> arch/powerpc/Makefile adopted multiple ways to pass CONFIG options.
>
> arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and
> CONFIG_KERNEL_XZ, which are passed via the command line.
>
> arch/powerpc/boot/serial.c includes the copied autoconf.h to
> reference a couple of CONFIG options.
>
> Do not do this.
>
> We should have already learned that including autoconf.h from each
> source file is really fragile.
>
> In fact, it is already broken.
>
> arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but
> arch/powerpc/boot/utils.S is not given any way to access CONFIG
> options. So, CONFIG_PPC_8xx is never defined here.
>
> Just pass $(LINUXINCLUDE) and remove all broken code.

I'm not sure how safe this is. The original reason for the
CONFIG_KERNEL_XZ hack in the makefile was because the kernel headers
couldn't be included directly. The bootwrapper is compiled with a
32bit toolchain when the kernel is compiled for 64bit big endian
because of older systems with broken firmware that can't load 64bit
ELFs directly. When I added XZ support to the wrapper I did experiment
with including the kernel headers directly and couldn't make it work
reliably. I don't remember what the exact reason was, but I think it
was something to do with the generated headers not always matching
what you would expect when compiling for 32bit. It's also possible I
was just being paranoid. Either way it's about time we found a real
fix...

The stuff in serial.c and ppc_asm.h was added later to work around
other issues without anyone thinking too hard about it. Oh well.

> I also removed the -traditional flag to make include/linux/kconfig.h
> work. I do not understand why it needs to imitate the behavior of
> pre-standard C preprocessors.

I'm not sure why it's there either. The boot wrapper was re-written at
some point so it might just be a hold over from the dawn of time.

> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/powerpc/boot/.gitignore |  2 --
>  arch/powerpc/boot/Makefile   | 14 +++---
>  arch/powerpc/boot/serial.c   |  1 -
>  3 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore
> index 32034a0cc554..6610665fcf5e 100644
> --- a/arch/powerpc/boot/.gitignore
> +++ b/arch/powerpc/boot/.gitignore
> @@ -44,5 +44,3 @@ fdt_sw.c
>  fdt_wip.c
>  libfdt.h
>  libfdt_internal.h
> -autoconf.h
> -
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 73d1f3562978..b8a82be2af2a 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -20,9 +20,6 @@
>
>  all: $(obj)/zImage
>
> -compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP
> -compress-$(CONFIG_KERNEL_XZ)   := CONFIG_KERNEL_XZ
> -
>  ifdef CROSS32_COMPILE
>  BOOTCC := $(CROSS32_COMPILE)gcc
>  BOOTAR := $(CROSS32_COMPILE)ar
> @@ -34,7 +31,7 @@ endif
>  BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
>  -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \
>  -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
> --D$(compress-y)
> +$(LINUXINCLUDE)
>
>  ifdef CONFIG_PPC64_BOOT_WRAPPER
>  BOOTCFLAGS += -m64
> @@ -51,7 +48,7 @@ BOOTCFLAGS+= -mlittle-endian
>  BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
>  endif
>
> -BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc
> +BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc
>
>  BOOTARFLAGS:= -cr$(KBUILD_ARFLAGS)
>
> @@ -202,14 +199,9 @@ $(obj)/empty.c:
>  $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: 
> $(srctree)/$(src)/%.S
> $(Q)cp $< $@
>
> -$(srctree)/$(src)/serial.c: $(obj)/autoconf.h
> -
> -$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/%
> -   $(Q)cp $< $@
> -
>  clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \
> $(zlib-decomp-) $(libfdt) $(libfdtheader) \
> -   autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
> +   empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
>
>  quiet_cmd_bootcc = BOOTCC  $@
>cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
> diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
> index b0491b8c0199..9457863147f9 100644
> --- a/arch/powerpc/boot/serial.c
> +++ b/arch/powerpc/boot/serial.c
> @@ -18,7 +18,6 @@
>  #include "stdio.h"
>  #include "io.h"
>  #include "ops.h"
> -#include "autoconf.h"
>
>  static int serial_open(void)
>  {

Re: [PATCH] powerpc/boot: fix broken way to pass CONFIG options

2019-05-13 Thread Masahiro Yamada
On Mon, May 13, 2019 at 9:33 PM Masahiro Yamada
 wrote:
>
> Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
> was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures
> with -j 1") was also wrong.
>
> Check-in source files never ever depend on build artifacts.
>
> The correct dependency is:
>
>   $(obj)/serial.o: $(obj)/autoconf.h
>
> However, copying autoconf.h to arch/power/boot/ is questionable
> in the first place.
>
> arch/powerpc/Makefile adopted multiple ways to pass CONFIG options.
>
> arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and
> CONFIG_KERNEL_XZ, which are passed via the command line.
>
> arch/powerpc/boot/serial.c includes the copied autoconf.h to
> reference a couple of CONFIG options.
>
> Do not do this.
>
> We should have already learned that including autoconf.h from each
> source file is really fragile.
>
> In fact, it is already broken.
>
> arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but
> arch/powerpc/boot/utils.S is not given any way to access CONFIG
> options. So, CONFIG_PPC_8xx is never defined here.
>
> Just pass $(LINUXINCLUDE) and remove all broken code.
>
> I also removed the -traditional flag to make include/linux/kconfig.h
> work. I do not understand why it needs to imitate the behavior of
> pre-standard C preprocessors.
>
> Signed-off-by: Masahiro Yamada 
> ---


I re-read my commit log, and I thought it was needlessly
offensive. Sorry about that.

I will reword the commit log and send v2.




-- 
Best Regards
Masahiro Yamada


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Petr Mladek
On Mon 2019-05-13 12:13:20, Andy Shevchenko wrote:
> On Mon, May 13, 2019 at 08:52:41AM +, David Laight wrote:
> > From: christophe leroy
> > > Sent: 10 May 2019 18:35
> > > Le 10/05/2019 à 18:24, Steven Rostedt a écrit :
> > > > On Fri, 10 May 2019 10:42:13 +0200
> > > > Petr Mladek  wrote:
> 
> > > >> -  if (probe_kernel_address(ptr, byte))
> > > >> +  if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > >>return "(efault)";
> > 
> > "efault" looks a bit like a spellling mistake for "default".
> 
> It's special, thus it's in parentheses, though somebody could be
> misled.
> 
> > > Usually, < PAGE_SIZE means NULL pointer dereference (via the member of a
> > > struct)
> > 
> > Maybe the caller should pass in a short buffer so that you can return
> > "(err-%d)"
> > or "(null+%#x)" ?

There are many vsprintf() users. I am afraid that nobody would want
to define extra buffers for error messages. It must fit into
the one for the formatted string.


> In both cases it should be limited to the size of pointer (8 or 16
> characters). Something like "(e:%4d)" would work for error codes.

I am afraid that we could get 10 different proposals from a huge
enough group of people. I wanted to start with something simple
(source code). I know that (efault) might be confused with
"default" but it is still just one string to grep.


> The "(null)" is good enough by itself and already an established
> practice..

(efault) made more sense with the probe_kernel_read() that
checked a wide range of addresses. Well, I still think that
it makes sense to distinguish a pure NULL. And it is still
used for IS_ERR_VALUE() as well.

Best Regards,
Petr


[Bug 203571] Allow use of SIMD in interrupts on PowerPC

2019-05-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203571

--- Comment #2 from Shawn Landden (sland...@gmail.com) ---
X86 manages to allow SIMD in interrupts through some very careful code in
arch/x86/kernel/fpu/core.c (starting with irq_fpu_usable())

PowerPC can do the same.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Petr Mladek
On Fri 2019-05-10 12:40:58, Steven Rostedt wrote:
> On Fri, 10 May 2019 18:32:58 +0200
> Martin Schwidefsky  wrote:
> 
> > On Fri, 10 May 2019 12:24:01 -0400
> > Steven Rostedt  wrote:
> > 
> > > On Fri, 10 May 2019 10:42:13 +0200
> > > Petr Mladek  wrote:
> > >   
> > > >  static const char *check_pointer_msg(const void *ptr)
> > > >  {
> > > > -   char byte;
> > > > -
> > > > if (!ptr)
> > > > return "(null)";
> > > >  
> > > > -   if (probe_kernel_address(ptr, byte))
> > > > +   if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > > return "(efault)";
> > > >  
> > > 
> > > 
> > >   < PAGE_SIZE ?
> > > 
> > > do you mean: < TASK_SIZE ?  
> > 
> > The check with < TASK_SIZE would break on s390. The 'ptr' is
> > in the kernel address space, *not* in the user address space.
> > Remember s390 has two separate address spaces for kernel/user
> > the check < TASK_SIZE only makes sense with a __user pointer.
> > 
> 
> So we allow this to read user addresses? Can't that cause a fault?

I did a quick check and did not find any place where a user pointer
is dereferenced via vsprintf().

In each case, %s did the check (ptr < PAGE_SIZE) even before this
patchset. The other checks are in %p format modifiers that are
used to print various kernel structures.

Finally, it accesses the pointer directly. I am not completely sure
but I think that it would not work that easily with an address
from the user address space.

Best Regards,
Petr


Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess

2019-05-13 Thread Christoph Hellwig
On Mon, May 13, 2019 at 01:50:19PM +0200, Dmitry Vyukov wrote:
> > We did have some bugs in the past (~1-2 y/ago) but AFAIK they are all
> > fixed now. These days I build most of my kernels with a bi-endian 64-bit
> > toolchain, and switching endian without running `make clean` also works.
> 
> For the record, yes, it turned out to be a problem in our code (a latent
> bug). We actually used the host (x86) gcc to build as-if-ppc code that can
> run on the host, so it defined neither LE nor BE macros. It just
> happened to work in the past :)

So Nick was right and these checks actually are useful..


Re: [PATCH RESEND] powerpc: add simd.h implementation specific to PowerPC

2019-05-13 Thread Michael Ellerman
Shawn Landden  writes:
> It is safe to do SIMD in an interrupt on PowerPC.

No it's not sorry :)

> Only disable when there is no SIMD available
> (and this is a static branch).
>
> Tested and works with the WireGuard (Zinc) patch I wrote that needs this.
> Also improves performance of the crypto subsystem that checks this.
>
> Re-sending because this linuxppc-dev didn't seem to accept it.

It did but you were probably moderated as a non-subscriber? In future if
you just wait a while for the moderators to wake up it should come
through. Though having posted once and been approved I think you might
not get moderated at all in future (?).

> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203571
> Signed-off-by: Shawn Landden 
> ---
>  arch/powerpc/include/asm/simd.h | 15 +++
>  1 file changed, 15 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/simd.h
>
> diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
> new file mode 100644
> index 0..b3fecb95a
> --- /dev/null
> +++ b/arch/powerpc/include/asm/simd.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#include 
> +
> +/*
> + * may_use_simd - whether it is allowable at this time to issue SIMD
> + *instructions or access the SIMD register file
> + *
> + * As documented in Chapter 6.2.1 Machine Status Save/Restore Registers
> + * of Power ISA (2.07 and 3.0), all registers are saved/restored in an 
> interrupt.

I think the confusion here is that the ISA says:

  When various interrupts occur, the state of the machine is saved in the
  Machine Status Save/Restore registers (SRR0 and SRR1).

And you've read that to mean all "the state" of the machine, ie.
including GPRs, FPRs etc.

But unfortunately it's not that simple. All the hardware does is write
two 64-bit registers (SRR0 & SRR1) with the information required to be
able to return to the state the CPU was in prior to the interrupt. That
includes (obviously) the PC you were executing at, and what state the
CPU was in (ie. 64/32-bit, MMU on/off, FP on/off etc.). All the saving
of registers etc. is left up to software. It's the RISC way :)


I guess we need to work out why exactly may_use_simd() is returning
false in your kworker. 

cheers


Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode

2019-05-13 Thread Michael Ellerman
Herbert Xu  writes:
> On Mon, May 06, 2019 at 08:53:17AM -0700, Eric Biggers wrote:
>>
>> Any progress on this?  Someone just reported this again here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=203515
>
> Guys if I don't get a fix for this soon I'll have to disable CTR
> in vmx.

No objection from me.

I'll try and debug it at some point if no one else does, but I can't
make it my top priority sorry.

cheers


Re: [PATCH v3 4/6] dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding

2019-05-13 Thread Joakim Tjernlund
On Mon, 2019-05-13 at 11:14 +, Rasmus Villemoes wrote:
> 
> Reading table 4-30, and its footnotes, of the QUICC Engine Block
> Reference Manual shows that the set of snum _values_ is not
> necessarily just a function of the _number_ of snums, as given in the
> fsl,qe-num-snums property.
> 
> As an alternative, to make it easier to add support for other variants
> of the QUICC engine IP, this introduces a new binding fsl,qe-snums,
> which automatically encodes both the number of snums and the actual
> values to use.
> 
> Signed-off-by: Rasmus Villemoes 
> ---
> Rob, thanks for the review of v2. However, since I moved the example
> from the commit log to the binding (per Joakim's request), I didn't

Thanks, looks good now.

> add a Reviewed-by tag for this revision.
> 
>  .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt   | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt 
> b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt
> index d7afaff5faff..05ec2a838c54 100644
> --- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt
> +++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt
> @@ -18,7 +18,8 @@ Required properties:
>  - reg : offset and length of the device registers.
>  - bus-frequency : the clock frequency for QUICC Engine.
>  - fsl,qe-num-riscs: define how many RISC engines the QE has.
> -- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use for 
> the
> +- fsl,qe-snums: This property has to be specified as '/bits/ 8' value,
> +  defining the array of serial number (SNUM) values for the virtual
>threads.
> 
>  Optional properties:
> @@ -34,6 +35,11 @@ Recommended properties
>  - brg-frequency : the internal clock source frequency for baud-rate
>generators in Hz.
> 
> +Deprecated properties
> +- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use
> +  for the threads. Use fsl,qe-snums instead to not only specify the
> +  number of snums, but also their values.
> +
>  Example:
>   qe@e010 {
> #address-cells = <1>;
> @@ -44,6 +50,11 @@ Example:
> reg = ;
> brg-frequency = <0>;
> bus-frequency = <179A7B00>;
> +   fsl,qe-snums = /bits/ 8 <
> +   0x04 0x05 0x0C 0x0D 0x14 0x15 0x1C 0x1D
> +   0x24 0x25 0x2C 0x2D 0x34 0x35 0x88 0x89
> +   0x98 0x99 0xA8 0xA9 0xB8 0xB9 0xC8 0xC9
> +   0xD8 0xD9 0xE8 0xE9>;
>   }
> 
>  * Multi-User RAM (MURAM)
> --
> 2.20.1
> 



Re: [RFC PATCH] powerpc/64/ftrace: mprofile-kernel patch out mflr

2019-05-13 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Michael Ellerman wrote:
>> Nicholas Piggin  writes:
>>> The new mprofile-kernel mcount sequence is
>>>
>>>   mflr  r0
>>>   bl_mcount
>>>
>>> Dynamic ftrace patches the branch instruction with a noop, but leaves
>>> the mflr. mflr is executed by the branch unit that can only execute one
>>> per cycle on POWER9 and shared with branches, so it would be nice to
>>> avoid it where possible.
>>>
>>> This patch is a hacky proof of concept to nop out the mflr. Can we do
>>> this or are there races or other issues with it?
>> 
>> There's a race, isn't there?
>> 
>> We have a function foo which currently has tracing disabled, so the mflr
>> and bl are nop'ed out.
>> 
>>   CPU 0  CPU 1
>>   ==
>>   bl foo
>>   nop (ie. not mflr)
>>   -> interrupt
>>   something else enable tracing for foo
>>   ...patch mflr and branch
>>   <- rfi
>>   bl _mcount
>> 
>> So we end up in _mcount() but with r0 not populated.
>
> Good catch! Looks like we need to patch the mflr with a "b +8" similar 
> to what we do in __ftrace_make_nop().

Would that actually make it any faster though? Nick?

cheers


Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess

2019-05-13 Thread Michael Ellerman
Dmitry Vyukov  writes:
> From: Arnd Bergmann 
> Date: Sat, May 11, 2019 at 2:51 AM
> To: Dmitry Vyukov
> Cc: Nick Kossifidis, Christoph Hellwig, Linus Torvalds, Andrew Morton,
> linux-arch, Linux Kernel Mailing List, linuxppc-dev
>
>> On Fri, May 10, 2019 at 6:53 AM Dmitry Vyukov  wrote:
>> > >
>> > > I think it's good to have a sanity check in-place for consistency.
>> >
>> >
>> > Hi,
>> >
>> > This broke our cross-builds from x86. I am using:
>> >
>> > $ powerpc64le-linux-gnu-gcc --version
>> > powerpc64le-linux-gnu-gcc (Debian 7.2.0-7) 7.2.0
>> >
>> > and it says that it's little-endian somehow:
>> >
>> > $ powerpc64le-linux-gnu-gcc -dM -E - < /dev/null | grep BYTE_ORDER
>> > #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> >
>> > Is it a broken compiler? Or am I holding it wrong? Is there some
>> > additional flag I need to add?
>>
>> It looks like a bug in the kernel Makefiles to me. powerpc32 is always
>> big-endian,
>> powerpc64 used to be big-endian but is now usually little-endian. There are
>> often three separate toolchains that default to the respective user
>> space targets
>> (ppc32be, ppc64be, ppc64le), but generally you should be able to build
>> any of the
>> three kernel configurations with any of those compilers, and have the 
>> Makefile
>> pass the correct -m32/-m64/-mbig-endian/-mlittle-endian command line options
>> depending on the kernel configuration. It seems that this is not happening
>> here. I have not checked why, but if this is the problem, it should be
>> easy enough
>> to figure out.
>
>
> Thanks! This clears a lot.
This may be a bug in our magic, as we try to build kernel files outside
of make with our own flags (required to extract parts of kernel
interfaces).
> So don't spend time looking for the Makefile bugs yet.

OK :)

We did have some bugs in the past (~1-2 y/ago) but AFAIK they are all
fixed now. These days I build most of my kernels with a bi-endian 64-bit
toolchain, and switching endian without running `make clean` also works.

cheers


[PATCH] powerpc/boot: fix broken way to pass CONFIG options

2019-05-13 Thread Masahiro Yamada
Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures
with -j 1") was also wrong.

Check-in source files never ever depend on build artifacts.

The correct dependency is:

  $(obj)/serial.o: $(obj)/autoconf.h

However, copying autoconf.h to arch/power/boot/ is questionable
in the first place.

arch/powerpc/Makefile adopted multiple ways to pass CONFIG options.

arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and
CONFIG_KERNEL_XZ, which are passed via the command line.

arch/powerpc/boot/serial.c includes the copied autoconf.h to
reference a couple of CONFIG options.

Do not do this.

We should have already learned that including autoconf.h from each
source file is really fragile.

In fact, it is already broken.

arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but
arch/powerpc/boot/utils.S is not given any way to access CONFIG
options. So, CONFIG_PPC_8xx is never defined here.

Just pass $(LINUXINCLUDE) and remove all broken code.

I also removed the -traditional flag to make include/linux/kconfig.h
work. I do not understand why it needs to imitate the behavior of
pre-standard C preprocessors.

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/boot/.gitignore |  2 --
 arch/powerpc/boot/Makefile   | 14 +++---
 arch/powerpc/boot/serial.c   |  1 -
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore
index 32034a0cc554..6610665fcf5e 100644
--- a/arch/powerpc/boot/.gitignore
+++ b/arch/powerpc/boot/.gitignore
@@ -44,5 +44,3 @@ fdt_sw.c
 fdt_wip.c
 libfdt.h
 libfdt_internal.h
-autoconf.h
-
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 73d1f3562978..b8a82be2af2a 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -20,9 +20,6 @@
 
 all: $(obj)/zImage
 
-compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP
-compress-$(CONFIG_KERNEL_XZ)   := CONFIG_KERNEL_XZ
-
 ifdef CROSS32_COMPILE
 BOOTCC := $(CROSS32_COMPILE)gcc
 BOOTAR := $(CROSS32_COMPILE)ar
@@ -34,7 +31,7 @@ endif
 BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \
 -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
--D$(compress-y)
+$(LINUXINCLUDE)
 
 ifdef CONFIG_PPC64_BOOT_WRAPPER
 BOOTCFLAGS += -m64
@@ -51,7 +48,7 @@ BOOTCFLAGS+= -mlittle-endian
 BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
 endif
 
-BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc
+BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc
 
 BOOTARFLAGS:= -cr$(KBUILD_ARFLAGS)
 
@@ -202,14 +199,9 @@ $(obj)/empty.c:
 $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S
$(Q)cp $< $@
 
-$(srctree)/$(src)/serial.c: $(obj)/autoconf.h
-
-$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/%
-   $(Q)cp $< $@
-
 clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \
$(zlib-decomp-) $(libfdt) $(libfdtheader) \
-   autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
+   empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
 
 quiet_cmd_bootcc = BOOTCC  $@
   cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
index b0491b8c0199..9457863147f9 100644
--- a/arch/powerpc/boot/serial.c
+++ b/arch/powerpc/boot/serial.c
@@ -18,7 +18,6 @@
 #include "stdio.h"
 #include "io.h"
 #include "ops.h"
-#include "autoconf.h"
 
 static int serial_open(void)
 {
-- 
2.17.1



[PATCH v3 1/6] soc/fsl/qe: qe.c: drop useless static qualifier

2019-05-13 Thread Rasmus Villemoes
The local variable snum_init has no reason to have static storage duration.

Reviewed-by: Christophe Leroy 
Reviewed-by: Qiang Zhao 
Signed-off-by: Rasmus Villemoes 
---
 drivers/soc/fsl/qe/qe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 612d9c551be5..855373deb746 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c
@@ -306,7 +306,7 @@ static void qe_snums_init(void)
0x28, 0x29, 0x38, 0x39, 0x48, 0x49, 0x58, 0x59,
0x68, 0x69, 0x78, 0x79, 0x80, 0x81,
};
-   static const u8 *snum_init;
+   const u8 *snum_init;
 
qe_num_of_snum = qe_get_num_of_snums();
 
-- 
2.20.1



[PATCH v3 6/6] soc/fsl/qe: qe.c: fold qe_get_num_of_snums into qe_snums_init

2019-05-13 Thread Rasmus Villemoes
The comment "No QE ever has fewer than 28 SNUMs" is false; e.g. the
MPC8309 has 14. The code path returning -EINVAL is also a recipe for
instant disaster, since the caller (qe_snums_init) uncritically
assigns the return value to the unsigned qe_num_of_snum, and would
thus proceed to attempt to copy 4GB from snum_init_46[] to the snum[]
array.

So fold the handling of the legacy fsl,qe-num-snums into
qe_snums_init, and make sure we do not end up using the snum_init_46
array in cases other than the two where we know it makes sense.

Reviewed-by: Christophe Leroy 
Reviewed-by: Qiang Zhao 
Signed-off-by: Rasmus Villemoes 
---
 drivers/soc/fsl/qe/qe.c | 46 ++---
 1 file changed, 16 insertions(+), 30 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 1d27187b251c..852060caff24 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c
@@ -308,24 +308,33 @@ static void qe_snums_init(void)
int i;
 
bitmap_zero(snum_state, QE_NUM_OF_SNUM);
+   qe_num_of_snum = 28; /* The default number of snum for threads is 28 */
qe = qe_get_device_node();
if (qe) {
i = of_property_read_variable_u8_array(qe, "fsl,qe-snums",
   snums, 1, QE_NUM_OF_SNUM);
-   of_node_put(qe);
if (i > 0) {
+   of_node_put(qe);
qe_num_of_snum = i;
return;
}
+   /*
+* Fall back to legacy binding of using the value of
+* fsl,qe-num-snums to choose one of the static arrays
+* above.
+*/
+   of_property_read_u32(qe, "fsl,qe-num-snums", &qe_num_of_snum);
+   of_node_put(qe);
}
 
-   qe_num_of_snum = qe_get_num_of_snums();
-
-   if (qe_num_of_snum == 76)
+   if (qe_num_of_snum == 76) {
snum_init = snum_init_76;
-   else
+   } else if (qe_num_of_snum == 28 || qe_num_of_snum == 46) {
snum_init = snum_init_46;
-
+   } else {
+   pr_err("QE: unsupported value of fsl,qe-num-snums: %u\n", 
qe_num_of_snum);
+   return;
+   }
memcpy(snums, snum_init, qe_num_of_snum);
 }
 
@@ -642,30 +651,7 @@ EXPORT_SYMBOL(qe_get_num_of_risc);
 
 unsigned int qe_get_num_of_snums(void)
 {
-   struct device_node *qe;
-   int size;
-   unsigned int num_of_snums;
-   const u32 *prop;
-
-   num_of_snums = 28; /* The default number of snum for threads is 28 */
-   qe = qe_get_device_node();
-   if (!qe)
-   return num_of_snums;
-
-   prop = of_get_property(qe, "fsl,qe-num-snums", &size);
-   if (prop && size == sizeof(*prop)) {
-   num_of_snums = *prop;
-   if ((num_of_snums < 28) || (num_of_snums > QE_NUM_OF_SNUM)) {
-   /* No QE ever has fewer than 28 SNUMs */
-   pr_err("QE: number of snum is invalid\n");
-   of_node_put(qe);
-   return -EINVAL;
-   }
-   }
-
-   of_node_put(qe);
-
-   return num_of_snums;
+   return qe_num_of_snum;
 }
 EXPORT_SYMBOL(qe_get_num_of_snums);
 
-- 
2.20.1



[PATCH v3 5/6] soc/fsl/qe: qe.c: support fsl,qe-snums property

2019-05-13 Thread Rasmus Villemoes
Add driver support for the newly introduced fsl,qe-snums property.

Conveniently, of_property_read_variable_u8_array does exactly what we
need: if the property fsl,qe-snums is found (and has an allowed size),
the array of values gets copied to snums, and the return value is the
number of snums. We cannot assign it directly to num_of_snums, since we
need to check whether the return value is negative.

Reviewed-by: Christophe Leroy 
Reviewed-by: Qiang Zhao 
Signed-off-by: Rasmus Villemoes 
---
 drivers/soc/fsl/qe/qe.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 4b444846d590..1d27187b251c 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c
@@ -283,7 +283,6 @@ EXPORT_SYMBOL(qe_clock_source);
  */
 static void qe_snums_init(void)
 {
-   int i;
static const u8 snum_init_76[] = {
0x04, 0x05, 0x0C, 0x0D, 0x14, 0x15, 0x1C, 0x1D,
0x24, 0x25, 0x2C, 0x2D, 0x34, 0x35, 0x88, 0x89,
@@ -304,7 +303,21 @@ static void qe_snums_init(void)
0x28, 0x29, 0x38, 0x39, 0x48, 0x49, 0x58, 0x59,
0x68, 0x69, 0x78, 0x79, 0x80, 0x81,
};
+   struct device_node *qe;
const u8 *snum_init;
+   int i;
+
+   bitmap_zero(snum_state, QE_NUM_OF_SNUM);
+   qe = qe_get_device_node();
+   if (qe) {
+   i = of_property_read_variable_u8_array(qe, "fsl,qe-snums",
+  snums, 1, QE_NUM_OF_SNUM);
+   of_node_put(qe);
+   if (i > 0) {
+   qe_num_of_snum = i;
+   return;
+   }
+   }
 
qe_num_of_snum = qe_get_num_of_snums();
 
@@ -313,7 +326,6 @@ static void qe_snums_init(void)
else
snum_init = snum_init_46;
 
-   bitmap_zero(snum_state, QE_NUM_OF_SNUM);
memcpy(snums, snum_init, qe_num_of_snum);
 }
 
-- 
2.20.1



[PATCH v3 4/6] dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding

2019-05-13 Thread Rasmus Villemoes
Reading table 4-30, and its footnotes, of the QUICC Engine Block
Reference Manual shows that the set of snum _values_ is not
necessarily just a function of the _number_ of snums, as given in the
fsl,qe-num-snums property.

As an alternative, to make it easier to add support for other variants
of the QUICC engine IP, this introduces a new binding fsl,qe-snums,
which automatically encodes both the number of snums and the actual
values to use.

Signed-off-by: Rasmus Villemoes 
---
Rob, thanks for the review of v2. However, since I moved the example
from the commit log to the binding (per Joakim's request), I didn't
add a Reviewed-by tag for this revision.

 .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt   | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt
index d7afaff5faff..05ec2a838c54 100644
--- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt
@@ -18,7 +18,8 @@ Required properties:
 - reg : offset and length of the device registers.
 - bus-frequency : the clock frequency for QUICC Engine.
 - fsl,qe-num-riscs: define how many RISC engines the QE has.
-- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use for the
+- fsl,qe-snums: This property has to be specified as a '/bits/ 8' value,
+  defining the array of serial number (SNUM) values for the virtual
   threads.
 
 Optional properties:
@@ -34,6 +35,11 @@ Recommended properties
 - brg-frequency : the internal clock source frequency for baud-rate
   generators in Hz.
 
+Deprecated properties
+- fsl,qe-num-snums: define how many serial numbers (SNUMs) the QE can use
+  for the threads. Use fsl,qe-snums instead to not only specify the
+  number of snums, but also their values.
+
 Example:
  qe@e010 {
#address-cells = <1>;
@@ -44,6 +50,11 @@ Example:
reg = ;
brg-frequency = <0>;
bus-frequency = <179A7B00>;
+   fsl,qe-snums = /bits/ 8 <
+   0x04 0x05 0x0C 0x0D 0x14 0x15 0x1C 0x1D
+   0x24 0x25 0x2C 0x2D 0x34 0x35 0x88 0x89
+   0x98 0x99 0xA8 0xA9 0xB8 0xB9 0xC8 0xC9
+   0xD8 0xD9 0xE8 0xE9>;
  }
 
 * Multi-User RAM (MURAM)
-- 
2.20.1



[PATCH v3 3/6] soc/fsl/qe: qe.c: introduce qe_get_device_node helper

2019-05-13 Thread Rasmus Villemoes
The 'try of_find_compatible_node(NULL, NULL, "fsl,qe"), fall back to
of_find_node_by_type(NULL, "qe")' pattern is repeated five
times. Factor it into a common helper.

Reviewed-by: Christophe Leroy 
Reviewed-by: Qiang Zhao 
Signed-off-by: Rasmus Villemoes 
---
 drivers/soc/fsl/qe/qe.c | 71 +
 1 file changed, 29 insertions(+), 42 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 4b59109df22b..4b444846d590 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c
@@ -56,6 +56,20 @@ static unsigned int qe_num_of_snum;
 
 static phys_addr_t qebase = -1;
 
+static struct device_node *qe_get_device_node(void)
+{
+   struct device_node *qe;
+
+   /*
+* Newer device trees have an "fsl,qe" compatible property for the QE
+* node, but we still need to support older device trees.
+*/
+   qe = of_find_compatible_node(NULL, NULL, "fsl,qe");
+   if (qe)
+   return qe;
+   return of_find_node_by_type(NULL, "qe");
+}
+
 static phys_addr_t get_qe_base(void)
 {
struct device_node *qe;
@@ -65,12 +79,9 @@ static phys_addr_t get_qe_base(void)
if (qebase != -1)
return qebase;
 
-   qe = of_find_compatible_node(NULL, NULL, "fsl,qe");
-   if (!qe) {
-   qe = of_find_node_by_type(NULL, "qe");
-   if (!qe)
-   return qebase;
-   }
+   qe = qe_get_device_node();
+   if (!qe)
+   return qebase;
 
ret = of_address_to_resource(qe, 0, &res);
if (!ret)
@@ -164,12 +175,9 @@ unsigned int qe_get_brg_clk(void)
if (brg_clk)
return brg_clk;
 
-   qe = of_find_compatible_node(NULL, NULL, "fsl,qe");
-   if (!qe) {
-   qe = of_find_node_by_type(NULL, "qe");
-   if (!qe)
-   return brg_clk;
-   }
+   qe = qe_get_device_node();
+   if (!qe)
+   return brg_clk;
 
prop = of_get_property(qe, "brg-frequency", &size);
if (prop && size == sizeof(*prop))
@@ -558,16 +566,9 @@ struct qe_firmware_info *qe_get_firmware_info(void)
 
initialized = 1;
 
-   /*
-* Newer device trees have an "fsl,qe" compatible property for the QE
-* node, but we still need to support older device trees.
-   */
-   qe = of_find_compatible_node(NULL, NULL, "fsl,qe");
-   if (!qe) {
-   qe = of_find_node_by_type(NULL, "qe");
-   if (!qe)
-   return NULL;
-   }
+   qe = qe_get_device_node();
+   if (!qe)
+   return NULL;
 
/* Find the 'firmware' child node */
fw = of_get_child_by_name(qe, "firmware");
@@ -613,16 +614,9 @@ unsigned int qe_get_num_of_risc(void)
unsigned int num_of_risc = 0;
const u32 *prop;
 
-   qe = of_find_compatible_node(NULL, NULL, "fsl,qe");
-   if (!qe) {
-   /* Older devices trees did not have an "fsl,qe"
-* compatible property, so we need to look for
-* the QE node by name.
-*/
-   qe = of_find_node_by_type(NULL, "qe");
-   if (!qe)
-   return num_of_risc;
-   }
+   qe = qe_get_device_node();
+   if (!qe)
+   return num_of_risc;
 
prop = of_get_property(qe, "fsl,qe-num-riscs", &size);
if (prop && size == sizeof(*prop))
@@ -642,16 +636,9 @@ unsigned int qe_get_num_of_snums(void)
const u32 *prop;
 
num_of_snums = 28; /* The default number of snum for threads is 28 */
-   qe = of_find_compatible_node(NULL, NULL, "fsl,qe");
-   if (!qe) {
-   /* Older devices trees did not have an "fsl,qe"
-* compatible property, so we need to look for
-* the QE node by name.
-*/
-   qe = of_find_node_by_type(NULL, "qe");
-   if (!qe)
-   return num_of_snums;
-   }
+   qe = qe_get_device_node();
+   if (!qe)
+   return num_of_snums;
 
prop = of_get_property(qe, "fsl,qe-num-snums", &size);
if (prop && size == sizeof(*prop)) {
-- 
2.20.1



[PATCH v3 2/6] soc/fsl/qe: qe.c: reduce static memory footprint by 1.7K

2019-05-13 Thread Rasmus Villemoes
The current array of struct qe_snum uses 256*4 bytes just for keeping
track of the free/used state of each index, and the struct layout
means there's another 768 bytes of padding. If we just unzip that
structure, the array of snum values uses only 256 bytes, while the
free/in-use state can be tracked in a 32-byte bitmap.

So this reduces the .data footprint by 1760 bytes. It also serves as
preparation for introducing another DT binding for specifying the snum
values.

Reviewed-by: Christophe Leroy 
Reviewed-by: Qiang Zhao 
Signed-off-by: Rasmus Villemoes 
---
 drivers/soc/fsl/qe/qe.c | 42 -
 1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 855373deb746..4b59109df22b 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c
@@ -14,6 +14,7 @@
  * Free Software Foundation;  either version 2 of the  License, or (at your
  * option) any later version.
  */
+#include 
 #include 
 #include 
 #include 
@@ -43,25 +44,14 @@ static DEFINE_SPINLOCK(qe_lock);
 DEFINE_SPINLOCK(cmxgcr_lock);
 EXPORT_SYMBOL(cmxgcr_lock);
 
-/* QE snum state */
-enum qe_snum_state {
-   QE_SNUM_STATE_USED,
-   QE_SNUM_STATE_FREE
-};
-
-/* QE snum */
-struct qe_snum {
-   u8 num;
-   enum qe_snum_state state;
-};
-
 /* We allocate this here because it is used almost exclusively for
  * the communication processor devices.
  */
 struct qe_immap __iomem *qe_immr;
 EXPORT_SYMBOL(qe_immr);
 
-static struct qe_snum snums[QE_NUM_OF_SNUM];   /* Dynamically allocated SNUMs */
+static u8 snums[QE_NUM_OF_SNUM];   /* Dynamically allocated SNUMs */
+static DECLARE_BITMAP(snum_state, QE_NUM_OF_SNUM);
 static unsigned int qe_num_of_snum;
 
 static phys_addr_t qebase = -1;
@@ -315,10 +305,8 @@ static void qe_snums_init(void)
else
snum_init = snum_init_46;
 
-   for (i = 0; i < qe_num_of_snum; i++) {
-   snums[i].num = snum_init[i];
-   snums[i].state = QE_SNUM_STATE_FREE;
-   }
+   bitmap_zero(snum_state, QE_NUM_OF_SNUM);
+   memcpy(snums, snum_init, qe_num_of_snum);
 }
 
 int qe_get_snum(void)
@@ -328,12 +316,10 @@ int qe_get_snum(void)
int i;
 
spin_lock_irqsave(&qe_lock, flags);
-   for (i = 0; i < qe_num_of_snum; i++) {
-   if (snums[i].state == QE_SNUM_STATE_FREE) {
-   snums[i].state = QE_SNUM_STATE_USED;
-   snum = snums[i].num;
-   break;
-   }
+   i = find_first_zero_bit(snum_state, qe_num_of_snum);
+   if (i < qe_num_of_snum) {
+   set_bit(i, snum_state);
+   snum = snums[i];
}
spin_unlock_irqrestore(&qe_lock, flags);
 
@@ -343,14 +329,10 @@ EXPORT_SYMBOL(qe_get_snum);
 
 void qe_put_snum(u8 snum)
 {
-   int i;
+   const u8 *p = memchr(snums, snum, qe_num_of_snum);
 
-   for (i = 0; i < qe_num_of_snum; i++) {
-   if (snums[i].num == snum) {
-   snums[i].state = QE_SNUM_STATE_FREE;
-   break;
-   }
-   }
+   if (p)
+   clear_bit(p - snums, snum_state);
 }
 EXPORT_SYMBOL(qe_put_snum);
 
-- 
2.20.1



[PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding

2019-05-13 Thread Rasmus Villemoes
This small series consists of some small cleanups and simplifications
of the QUICC engine driver, and introduces a new DT binding that makes
it much easier to support other variants of the QUICC engine IP block
that appears in the wild: There's no reason to expect in general that
the number of valid SNUMs uniquely determines the set of such, so it's
better to simply let the device tree specify the values (and,
implicitly via the array length, also the count).

Which tree should this go through?

v3:
- Move example from commit log into binding document (adapting to
  MPC8360 which the existing example pertains to).
- Add more review tags.
- Fix minor style issue.

v2:
- Address comments from Christophe Leroy
- Add his Reviewed-by to 1/6 and 3/6
- Split DT binding update to separate patch as per
  Documentation/devicetree/bindings/submitting-patches.txt

Rasmus Villemoes (6):
  soc/fsl/qe: qe.c: drop useless static qualifier
  soc/fsl/qe: qe.c: reduce static memory footprint by 1.7K
  soc/fsl/qe: qe.c: introduce qe_get_device_node helper
  dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding
  soc/fsl/qe: qe.c: support fsl,qe-snums property
  soc/fsl/qe: qe.c: fold qe_get_num_of_snums into qe_snums_init

 .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt |  13 +-
 drivers/soc/fsl/qe/qe.c   | 163 +++---
 2 files changed, 77 insertions(+), 99 deletions(-)

-- 
2.20.1



Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-13 Thread Ravi Bangoria



On 5/13/19 2:26 PM, Peter Zijlstra wrote:
> On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote:
>> On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote:
>>> Add a check for sample_period value sent from userspace. Negative
>>> value does not make sense. And in powerpc arch code this could cause
>>> a recursive PMI leading to a hang (reported when running perf-fuzzer).
>>>
>>> Signed-off-by: Ravi Bangoria 
>>> ---
>>>  kernel/events/core.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index abbd4b3b96c2..e44c90378940 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
>>> if (perf_event_check_period(event, value))
>>> return -EINVAL;
>>>  
>>> +   if (!event->attr.freq && (value & (1ULL << 63)))
>>> +   return -EINVAL;
>>
>> Well, perf_event_attr::sample_period is __u64. Would not be the site
>> using it as signed be the one in error?
> 
> You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes
> it consistent and is fine.
> 

Yeah, I was about to reply :)



[PATCH 2/2] powerpc/lib: only build ldstfp.o when CONFIG_PPC_FPU is set

2019-05-13 Thread Christophe Leroy
The entire code in ldstfp.o is enclosed in #ifdef CONFIG_PPC_FPU,
so there is no point in building it when this config is not selected.

Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/Makefile | 3 ++-
 arch/powerpc/lib/ldstfp.S | 4 
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 17fce3738d48..eebc782d89a5 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -49,7 +49,8 @@ obj64-$(CONFIG_KPROBES_SANITY_TEST)   += test_emulate_step.o \
 obj-y  += checksum_$(BITS).o checksum_wrappers.o \
   string_$(BITS).o
 
-obj-y  += sstep.o ldstfp.o
+obj-y  += sstep.o
+obj-$(CONFIG_PPC_FPU)  += ldstfp.o
 obj64-y+= quad.o
 
 obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
diff --git a/arch/powerpc/lib/ldstfp.S b/arch/powerpc/lib/ldstfp.S
index 32e91994b6b2..e388a3127cb6 100644
--- a/arch/powerpc/lib/ldstfp.S
+++ b/arch/powerpc/lib/ldstfp.S
@@ -18,8 +18,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC_FPU
-
 #define STKFRM (PPC_MIN_STKFRM + 16)
 
 /* Get the contents of frN into *p; N is in r3 and p is in r4. */
@@ -241,5 +239,3 @@ _GLOBAL(conv_dp_to_sp)
MTMSRD(r6)
isync
blr
-
-#endif /* CONFIG_PPC_FPU */
-- 
2.13.3



[PATCH 1/2] powerpc/lib: fix redundant inclusion of quad.o

2019-05-13 Thread Christophe Leroy
quad.o is only for PPC64, and already included in obj64-y,
so it doesn't have to be in obj-y.

Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/lib/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index c55f9c27bf79..17fce3738d48 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -49,7 +49,7 @@ obj64-$(CONFIG_KPROBES_SANITY_TEST)   += test_emulate_step.o \
 obj-y  += checksum_$(BITS).o checksum_wrappers.o \
   string_$(BITS).o
 
-obj-y  += sstep.o ldstfp.o quad.o
+obj-y  += sstep.o ldstfp.o
 obj64-y+= quad.o
 
 obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
-- 
2.13.3



Re: [PATCH 0/1] Forced-wakeup for stop lite states on Powernv

2019-05-13 Thread Abhishek

On 05/08/2019 10:29 AM, Nicholas Piggin wrote:

Abhishek Goel's on April 22, 2019 4:32 pm:

Currently, the cpuidle governors determine what idle state an idling CPU
should enter into based on heuristics that depend on the idle history on
that CPU. Given that no predictive heuristic is perfect, there are cases
where the governor predicts a shallow idle state, hoping that the CPU will
be busy soon. However, if no new workload is scheduled on that CPU in the
near future, the CPU will end up in the shallow state.

Motivation
--
In case of POWER, this is problematic, when the predicted state in the
aforementioned scenario is a lite stop state, as such lite states will
inhibit SMT folding, thereby depriving the other threads in the core from
using the core resources.

So we do not want to get stuck in such states for long durations. To
address this, the cpuidle core can queue a timer corresponding to the
residency value of the next available state. This timer will forcefully
wake up the CPU. A few such iterations will essentially train the governor
to select a deeper state for that CPU, since the timer corresponds to the
next available cpuidle state's residency. The CPU will be kicked out of the
lite state and end up in a non-lite state.

Experiment
--
I performed experiments for three scenarios to collect some data.

case 1 :
Without this patch and without the tick retained, i.e. in an upstream
kernel, it could spend more than a second to get out of stop0_lite.

case 2 : With the tick retained in an upstream kernel -

Generally, we have a sched tick every 4ms (CONFIG_HZ = 250). Ideally I
expected it to take 8 sched ticks to get out of stop0_lite.
Experimentally, the observation was:

=
sample  minmax   99percentile
20  4ms12ms  4ms
=

It would take at least one sched tick to get out of stop0_lite.

case 3 : With this patch (not stopping the tick, but explicitly queuing a
   timer)


sample  min max 99percentile

20  144us   192us   144us


In this patch, we queue a timer just before entering the stop0_lite
state. The timer fires at (residency of the next available state + 2 *
exit latency of the next available state). Say the next state (stop0) is
available with a residency of 20us; the CPU should then get out in as low
as (20 + 2*2) * 8 = 192us, based on the formula
(residency + 2*latency) * history length. Ideally we would expect 8
iterations; it was observed to get out in 6-7 iterations. Even if, say,
stop2 is the next available state (stop0 and stop1 both unavailable), it
would take (100 + 2*10) * 8 = 960us to get into stop2.

So, we are able to get out of stop0_lite generally in ~150us (with this
patch) as compared to 4ms (with the tick retained). As stated earlier, we
do not want to get stuck in stop0_lite as it inhibits SMT folding for the
other sibling threads, depriving them of core resources. The current patch
uses forced wakeup only for stop0_lite, as it gives a performance benefit
(the primary reason) along with lowering power consumption. We may extend
this model to other states in the future.

I still have to wonder, between our snooze loop and stop0, what does
stop0_lite buy us.

That said, the problem you're solving here is a generic one that all
stop states have, I think. Doesn't the same thing apply going from
stop0 to stop5? You might underestimate the sleep time and lose power
savings and therefore performance there too. Shouldn't we make it
generic for all stop states?

Thanks,
Nick


When a CPU is in snooze, it takes both space and time of the core. When in
stop0_lite, it frees up time but still takes space. When it is in stop0 or
deeper, it frees up both the space and the time slice of the core.

In stop0_lite, the CPU doesn't free up the core resources and thus inhibits
thread folding. When a CPU goes to stop0, it frees up the core resources,
thus increasing the single-thread performance of the other sibling threads.

Hence, we do not want to get stuck in stop0_lite for a long duration, and
want to quickly move on to the next state. If we get stuck in any other
state, we may lose some power savings, but will still gain the performance
benefits for the other sibling threads.


Thanks,
Abhishek



Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread Andy Shevchenko
On Mon, May 13, 2019 at 08:52:41AM +, David Laight wrote:
> From: christophe leroy
> > Sent: 10 May 2019 18:35
> > Le 10/05/2019 à 18:24, Steven Rostedt a écrit :
> > > On Fri, 10 May 2019 10:42:13 +0200
> > > Petr Mladek  wrote:

> > >> -if (probe_kernel_address(ptr, byte))
> > >> +if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > >>  return "(efault)";
> 
> "efault" looks a bit like a spellling mistake for "default".

It's a special value, thus it's in parentheses, though somebody could be
misled.

> > Usually, < PAGE_SIZE means NULL pointer dereference (via the member of a
> > struct)
> 
> Maybe the caller should pass in a short buffer so that you can return
> "(err-%d)"
> or "(null+%#x)" ?

In both cases it should be limited to the size of pointer (8 or 16
characters). Something like "(e:%4d)" would work for error codes.

The "(null)" is good enough by itself and already an established
practice.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH v2] powerpc: Fix compile issue with force DAWR

2019-05-13 Thread Christophe Leroy




Le 13/05/2019 à 09:17, Michael Neuling a écrit :

If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT, linking
fails with:
   arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable'

This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
DAWR on P9 option").

This puts more of the dawr infrastructure in a new file.


I think you are doing a bit more than that: you should explain that you 
define a new CONFIG_ option, when it is selected, etc.


The commit you are referring to is talking about P9. It looks like your 
patch covers a lot more, so it should be mentioned here I guess.




Signed-off-by: Michael Neuling 


You should add a Fixes: tag, ie:

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")


--
v2:
   Fixes based on Christophe Leroy's comments:
   - Fix commit message formatting
   - Move more DAWR code into dawr.c
---
  arch/powerpc/Kconfig |  5 ++
  arch/powerpc/include/asm/hw_breakpoint.h | 20 ---
  arch/powerpc/kernel/Makefile |  1 +
  arch/powerpc/kernel/dawr.c   | 75 
  arch/powerpc/kernel/hw_breakpoint.c  | 56 --
  arch/powerpc/kvm/Kconfig |  1 +
  6 files changed, 94 insertions(+), 64 deletions(-)
  create mode 100644 arch/powerpc/kernel/dawr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2711aac246..f4b19e48cc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -242,6 +242,7 @@ config PPC
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
+   select PPC_DAWR_FORCE_ENABLE if PPC64 || PERF


What's PERF ? Did you mean PERF_EVENTS ?

Then what you mean is:
- Selected all the time for PPC64
- Selected for PPC32 when PERF is also selected.

Is that what you want ? At first I thought it was linked to P9.


And ... did you read below statement ?


#
# Please keep this list sorted alphabetically.
#
@@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y
  
+config PPC_DAWR_FORCE_ENABLE

+   bool
+   default y
+


Why default it to y ? Then how is it set to n ?


  config ZONE_DMA
bool
default y if PPC_BOOK3E_64
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h
index 0fe8c1e46b..ffbc8eab41 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -47,6 +47,8 @@ struct arch_hw_breakpoint {
  #define HW_BRK_TYPE_PRIV_ALL  (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
 HW_BRK_TYPE_HYP)
  
+extern int set_dawr(struct arch_hw_breakpoint *brk);

+
  #ifdef CONFIG_HAVE_HW_BREAKPOINT
  #include 
  #include 
@@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void)
  extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
  int hw_breakpoint_handler(struct die_args *args);
  
-extern int set_dawr(struct arch_hw_breakpoint *brk);

-extern bool dawr_force_enable;
-static inline bool dawr_enabled(void)
-{
-   return dawr_force_enable;
-}
-


That's a very simple function, why not keep it here (or in another .h) 
as 'static inline' ?



  #else /* CONFIG_HAVE_HW_BREAKPOINT */
  static inline void hw_breakpoint_disable(void) { }
  static inline void thread_change_pc(struct task_struct *tsk,
struct pt_regs *regs) { }
-static inline bool dawr_enabled(void) { return false; }
+
  #endif/* CONFIG_HAVE_HW_BREAKPOINT */
+
+extern bool dawr_force_enable;
+
+#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE
+extern bool dawr_enabled(void);


Functions should not be 'extern'. I'm sure checkpatch --strict will tell 
you.



+#else
+#define dawr_enabled() true


true by default ?
Previously it was false by default.

And why a #define ? That's better to keep a static inline.


+#endif
+
  #endif/* __KERNEL__ */
  #endif/* _PPC_BOOK3S_64_HW_BREAKPOINT_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a..a9c497c34f 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64)   += setup_64.o sys_ppc32.o \
  obj-$(CONFIG_VDSO32)  += vdso32/
  obj-$(CONFIG_PPC_WATCHDOG)+= watchdog.o
  obj-$(CONFIG_HAVE_HW_BREAKPOINT)  += hw_breakpoint.o
+obj-$(CONFIG_PPC_DAWR_FORCE_ENABLE) += dawr.o
  obj-$(CONFIG_PPC_BOOK3S_64)   += cpu_setup_ppc970.o cpu_setup_pa6t.o
  obj-$(CONFIG_PPC_BOOK3S_64)   += cpu_setup_power.o
  obj-$(CONFIG_PPC_BOOK3S_64)   += mce.o mce_power.o
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
new file mode 100644
index 00..cf1d02fe1e
--- /dev/null
+++ b/arch/powerpc/kernel/dawr.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identif

Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-13 Thread Peter Zijlstra
On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote:
> On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote:
> > Add a check for sample_period value sent from userspace. Negative
> > value does not make sense. And in powerpc arch code this could cause
> > a recursive PMI leading to a hang (reported when running perf-fuzzer).
> > 
> > Signed-off-by: Ravi Bangoria 
> > ---
> >  kernel/events/core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index abbd4b3b96c2..e44c90378940 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
> > if (perf_event_check_period(event, value))
> > return -EINVAL;
> >  
> > +   if (!event->attr.freq && (value & (1ULL << 63)))
> > +   return -EINVAL;
> 
> Well, perf_event_attr::sample_period is __u64. Would not be the site
> using it as signed be the one in error?

You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes
it consistent and is fine.


RE: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-13 Thread David Laight
From: christophe leroy
> Sent: 10 May 2019 18:35
> Le 10/05/2019 à 18:24, Steven Rostedt a écrit :
> > On Fri, 10 May 2019 10:42:13 +0200
> > Petr Mladek  wrote:
> >
> >>   static const char *check_pointer_msg(const void *ptr)
> >>   {
> >> -  char byte;
> >> -
> >>if (!ptr)
> >>return "(null)";
> >>
> >> -  if (probe_kernel_address(ptr, byte))
> >> +  if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> >>return "(efault)";

"efault" looks a bit like a spellling mistake for "default".

> > < PAGE_SIZE ?
> >
> > do you mean: < TASK_SIZE ?
> 
> I guess not.
> 
> Usually, < PAGE_SIZE means NULL pointer dereference (via the member of a
> struct)

Maybe the caller should pass in a short buffer so that you can return
"(err-%d)" or "(null+%#x)" ?

David



Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG

2019-05-13 Thread David Hildenbrand
On 13.05.19 09:48, David Hildenbrand wrote:
> On 07.05.19 23:02, Dan Williams wrote:
>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand  wrote:
>>>
>>> Let's prepare for better error handling while adding memory by allowing
>>> to use arch_remove_memory() and __remove_pages() even if
>>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
>>> covers
>>> - Offlining of system ram (memory block devices) - offline_pages()
>>> - Unplug of system ram - remove_memory()
>>> - Unplug/remap of device memory - devm_memremap()
>>>
>>> This allows e.g. for handling like
>>>
>>> arch_add_memory()
>>> rc = do_something();
>>> if (rc) {
>>> arch_remove_memory();
>>> }
>>>
>>> Whereby do_something() will for example be memory block device creation
>>> after it has been factored out.
>>
>> What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE
>> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
>> clear to me why there was ever the option to compile out the remove
>> code when the add code is included.
>>
> 
> If there are no other comments, I will go ahead and rip out
> CONFIG_MEMORY_HOTREMOVE completely, gluing the functionality to
> CONFIG_MEMORY_HOTPLUG.
> 

Hmm, however this will require CONFIG_MEMORY_HOTPLUG to require

- MEMORY_ISOLATION
- HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64)

And depends on
- MIGRATION

Which would limit the configurations where memory hotplug would be
available. I guess going with this patch here is ok as a first step.

I just realized, that we'll need arch_remove_memory() for arm64 to make
this patch here work.

-- 

Thanks,

David / dhildenb


Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG

2019-05-13 Thread David Hildenbrand
On 07.05.19 23:02, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand  wrote:
>>
>> Let's prepare for better error handling while adding memory by allowing
>> to use arch_remove_memory() and __remove_pages() even if
>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
>> covers
>> - Offlining of system ram (memory block devices) - offline_pages()
>> - Unplug of system ram - remove_memory()
>> - Unplug/remap of device memory - devm_memremap()
>>
>> This allows e.g. for handling like
>>
>> arch_add_memory()
>> rc = do_something();
>> if (rc) {
>> arch_remove_memory();
>> }
>>
>> Whereby do_something() will for example be memory block device creation
>> after it has been factored out.
> 
> What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE
> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
> clear to me why there was ever the option to compile out the remove
> code when the add code is included.
> 

If there are no other comments, I will go ahead and rip out
CONFIG_MEMORY_HOTREMOVE completely, gluing the functionality to
CONFIG_MEMORY_HOTPLUG.

-- 

Thanks,

David / dhildenb


Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value

2019-05-13 Thread Peter Zijlstra
On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote:
> Add a check for sample_period value sent from userspace. Negative
> value does not make sense. And in powerpc arch code this could cause
> a recursive PMI leading to a hang (reported when running perf-fuzzer).
> 
> Signed-off-by: Ravi Bangoria 
> ---
>  kernel/events/core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..e44c90378940 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, 
> u64 __user *arg)
>   if (perf_event_check_period(event, value))
>   return -EINVAL;
>  
> + if (!event->attr.freq && (value & (1ULL << 63)))
> + return -EINVAL;

Well, perf_event_attr::sample_period is __u64. Would not be the site
using it as signed be the one in error?


Re: [PATCH] powerpc: Document xive=off option

2019-05-13 Thread Cédric Le Goater
On 5/13/19 7:39 AM, Michael Neuling wrote:
> commit 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE
> interrupt controller") added an option to turn off Linux native XIVE
> usage via the xive=off kernel command line option.
> 
> This documents this option.
> 
> Signed-off-by: Michael Neuling 



Reviewed-by: Cédric Le Goater 


But,

We should fix the behavior because xive=off does not work on pseries.
This is not handled correctly in prom when CAS negotiates with the
hypervisor over which interrupt mode to use.

I haven't tried this option on PowerNV.

Cheers,

C.


> ---
>  Documentation/admin-guide/kernel-parameters.txt | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index c45a19d654..ee410d0ef4 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5177,6 +5177,15 @@
>   Format:
>   
> ,[,[,[,]]]
>  
> + xive=   [PPC]
> + By default on POWER9 and above, the kernel will
> + natively use the XIVE interrupt controller. This option
> + allows the fallback firmware mode to be used:
> +
> + off   Fallback to firmware control of XIVE interrupt
> +   controller on both pseries and powernv
> +   platforms. Only useful on POWER9 and above.
> +
>   xhci-hcd.quirks [USB,KNL]
>   A hex value specifying bitmask with supplemental xhci
>   host controller quirks. Meaning of each bit can be
> 



[PATCH v2] powerpc: Fix compile issue with force DAWR

2019-05-13 Thread Michael Neuling
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
  arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to 
`dawr_force_enable'

This was caused by commit c1fe190c0672 ("powerpc: Add force enable of
DAWR on P9 option").

This puts more of the dawr infrastructure in a new file.

Signed-off-by: Michael Neuling 
--
v2:
  Fixes based on Christophe Leroy's comments:
  - Fix commit message formatting
  - Move more DAWR code into dawr.c
---
 arch/powerpc/Kconfig |  5 ++
 arch/powerpc/include/asm/hw_breakpoint.h | 20 ---
 arch/powerpc/kernel/Makefile |  1 +
 arch/powerpc/kernel/dawr.c   | 75 
 arch/powerpc/kernel/hw_breakpoint.c  | 56 --
 arch/powerpc/kvm/Kconfig |  1 +
 6 files changed, 94 insertions(+), 64 deletions(-)
 create mode 100644 arch/powerpc/kernel/dawr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2711aac246..f4b19e48cc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -242,6 +242,7 @@ config PPC
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select VIRT_TO_BUS  if !PPC64
+   select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF
#
# Please keep this list sorted alphabetically.
#
@@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE
depends on PPC_ADV_DEBUG_REGS && 44x
default y
 
+config PPC_DAWR_FORCE_ENABLE
+   bool
+   default y
+
 config ZONE_DMA
bool
default y if PPC_BOOK3E_64
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 0fe8c1e46b..ffbc8eab41 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -47,6 +47,8 @@ struct arch_hw_breakpoint {
 #define HW_BRK_TYPE_PRIV_ALL   (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \
 HW_BRK_TYPE_HYP)
 
+extern int set_dawr(struct arch_hw_breakpoint *brk);
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include 
 #include 
@@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void)
 extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs);
 int hw_breakpoint_handler(struct die_args *args);
 
-extern int set_dawr(struct arch_hw_breakpoint *brk);
-extern bool dawr_force_enable;
-static inline bool dawr_enabled(void)
-{
-   return dawr_force_enable;
-}
-
 #else  /* CONFIG_HAVE_HW_BREAKPOINT */
 static inline void hw_breakpoint_disable(void) { }
 static inline void thread_change_pc(struct task_struct *tsk,
struct pt_regs *regs) { }
-static inline bool dawr_enabled(void) { return false; }
+
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+extern bool dawr_force_enable;
+
+#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE
+extern bool dawr_enabled(void);
+#else
+#define dawr_enabled() true
+#endif
+
 #endif /* __KERNEL__ */
 #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a..a9c497c34f 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64)   += setup_64.o sys_ppc32.o \
 obj-$(CONFIG_VDSO32)   += vdso32/
 obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
+obj-$(CONFIG_PPC_DAWR_FORCE_ENABLE)+= dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
new file mode 100644
index 00..cf1d02fe1e
--- /dev/null
+++ b/arch/powerpc/kernel/dawr.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// DAWR infrastructure
+//
+// Copyright 2019, Michael Neuling, IBM Corporation.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+bool dawr_force_enable;
+EXPORT_SYMBOL_GPL(dawr_force_enable);
+
+extern bool dawr_enabled(void)
+{
+   return dawr_force_enable;
+}
+EXPORT_SYMBOL_GPL(dawr_enabled);
+
+static ssize_t dawr_write_file_bool(struct file *file,
+   const char __user *user_buf,
+   size_t count, loff_t *ppos)
+{
+   struct arch_hw_breakpoint null_brk = {0, 0, 0};
+   size_t rc;
+
+   /* Send error to user if the hypervisor won't allow us to write DAWR */
+   if ((!dawr_force_enable) &&
+   (firmware_has_feature(FW_FEATURE_LPAR)) &&
+   (set_dawr(&null_brk) != H_SUCCESS))
+   return -1;
+
+   rc = debugfs_write_file_bool(file, user_buf, count, ppos);
+   if (rc)
+   return rc;
+
+   /* If we are clearing, make sure all CPUs have the DAWR cleared */
+   if (!dawr_force_enable)
+   smp_c

Re: [PATCH 03/16] lib,treewide: add new match_string() helper/macro

2019-05-13 Thread Ardelean, Alexandru
On Fri, 2019-05-10 at 17:34 +0300, andriy.shevche...@linux.intel.com wrote:
> 
> On Fri, May 10, 2019 at 09:15:27AM +, Ardelean, Alexandru wrote:
> > On Wed, 2019-05-08 at 16:22 +0300, Alexandru Ardelean wrote:
> > > On Wed, 2019-05-08 at 15:18 +0200, Greg KH wrote:
> > > > On Wed, May 08, 2019 at 04:11:28PM +0300, Andy Shevchenko wrote:
> > > > > On Wed, May 08, 2019 at 02:28:29PM +0300, Alexandru Ardelean
> > > > > wrote:
> > > > > Can you split include/linux/ change from the rest?
> > > > 
> > > > That would break the build, why do you want it split out?  This
> > > > makes
> > > > sense all as a single patch to me.
> > > > 
> > > 
> > > Not really.
> > > It would be just be the new match_string() helper/macro in a new
> > > commit.
> > > And the conversions of the simple users of match_string() (the ones
> > > using
> > > ARRAY_SIZE()) in another commit.
> > > 
> > 
> > I should have asked in my previous reply.
> > Leave this as-is or re-formulate in 2 patches ?
> 
> Depends on what you would like to spend your time on: collecting Acks for
> all
> pieces in treewide patch or send new API first followed up by per driver
> /
> module update in next cycle.

I actually would have preferred the new API first, with the current
`match_string()` -> `__match_string()` rename from the start, but I wasn't
sure. I am still learning how feedback works in this realm.

I'll send a V2 with the API change-first/only; should be a smaller list.
Then see about follow-ups/changes per subsystems.

> 
> I also have no strong preference.
> And I think it's good to add Heikki Krogerus to Cc list for both patch
> series,
> since he is the author of sysfs variant and may have something to comment
> on
> the rest.

Thanks for the reference.

> 
> --
> With Best Regards,
> Andy Shevchenko
> 
>