Re: [PATCH 2/2] powerpc/8xx: Add microcode patch to move SMC parameter RAM.
Christophe Leroy writes:
> Some SCC functions like the QMC requires an extended parameter RAM.
> On modern 8xx (ie 866 and 885), SPI area can already be relocated,
> allowing the use of those functions on SCC2. But SCC3 and SCC4
> parameter RAM collide with SMC1 and SMC2 parameter RAMs.
>
> This patch adds microcode to allow the relocation of both SMC1 and
> SMC2, and relocate them at offsets 0x1ec0 and 0x1fc0.
> Those offsets are by default for the CPM1 DSP1 and DSP2, but there
> is no kernel driver using them at the moment so this area can be
> reused.
>
> Signed-off-by: Christophe Leroy
> ---
>  arch/powerpc/platforms/8xx/Kconfig      |   7 ++
>  arch/powerpc/platforms/8xx/micropatch.c | 109 +++-
>  2 files changed, 114 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/8xx/micropatch.c b/arch/powerpc/platforms/8xx/micropatch.c
> index 33a9042fca80..dc4423daf7d4 100644
> --- a/arch/powerpc/platforms/8xx/micropatch.c
> +++ b/arch/powerpc/platforms/8xx/micropatch.c
> @@ -622,6 +622,86 @@ static uint patch_2f00[] __initdata = {
>  };
>  #endif
>
> +/*
> + * SMC relocation patch arrays.
> + */
> +
> +#ifdef CONFIG_SMC_UCODE_PATCH
> +
> +static uint patch_2000[] __initdata = {
> +	0x3fff, 0x3ffd, 0x3ffb, 0x3ff9,
> +	0x5fefeff8, 0x5f91eff8, 0x3ff3, 0x3ff1,
> +	0x3a11e710, 0xedf0ccb9, 0xf318ed66, 0x7f0e5fe2,

Do we have any doc on what these values are? I get that it's microcode
but do we have any more detail than that? What's the source etc?

cheers
Re: [PATCH v2] powerpc: Fix compile issue with force DAWR
> > > > --
> > > > v2:
> > > >   Fixes based on Christophe Leroy's comments:
> > > >   - Fix commit message formatting
> > > >   - Move more DAWR code into dawr.c
> > > > ---
> > > >  arch/powerpc/Kconfig                     |  5 ++
> > > >  arch/powerpc/include/asm/hw_breakpoint.h | 20 ---
> > > >  arch/powerpc/kernel/Makefile             |  1 +
> > > >  arch/powerpc/kernel/dawr.c               | 75
> > > >  arch/powerpc/kernel/hw_breakpoint.c      | 56 --
> > > >  arch/powerpc/kvm/Kconfig                 |  1 +
> > > >  6 files changed, 94 insertions(+), 64 deletions(-)
> > > >  create mode 100644 arch/powerpc/kernel/dawr.c
> > > >
> > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > > index 2711aac246..f4b19e48cc 100644
> > > > --- a/arch/powerpc/Kconfig
> > > > +++ b/arch/powerpc/Kconfig
> > > > @@ -242,6 +242,7 @@ config PPC
> > > >  	select SYSCTL_EXCEPTION_TRACE
> > > >  	select THREAD_INFO_IN_TASK
> > > >  	select VIRT_TO_BUS if !PPC64
> > > > +	select PPC_DAWR_FORCE_ENABLE	if PPC64 || PERF
> > >
> > > What's PERF ? Did you mean PERF_EVENTS ?
> > >
> > > Then what you mean is:
> > > - Selected all the time for PPC64
> > > - Selected for PPC32 when PERF is also selected.
> > >
> > > Is that what you want ? At first I thought it was linked to P9.
> >
> > This is wrong. I think we just want PPC64. PERF is a red herring.
>
> Are you sure ? Michael suggested PERF || KVM as far as I remember.

It was mpe but I think it was wrong. We could make it more selective
with something like:

  PPC64 && (HW_BREAK || PPC_ADV_DEBUG_REGS || PERF)

but I think that will end up back in the larger mess of
https://github.com/linuxppc/issues/issues/128

I don't think the minor size gain is worth delving into that mess, hence
I made it simply PPC64, which is hopefully an improvement on what we
have since it eliminates 32bit.
> > > >  #else	/* CONFIG_HAVE_HW_BREAKPOINT */
> > > >  static inline void hw_breakpoint_disable(void) { }
> > > >  static inline void thread_change_pc(struct task_struct *tsk,
> > > >  					struct pt_regs *regs) { }
> > > > -static inline bool dawr_enabled(void) { return false; }
> > > > +
> > > >  #endif	/* CONFIG_HAVE_HW_BREAKPOINT */
> > > > +
> > > > +extern bool dawr_force_enable;
> > > > +
> > > > +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE
> > > > +extern bool dawr_enabled(void);
> > >
> > > Functions should not be 'extern'. I'm sure checkpatch --strict will tell
> > > you.
> >
> > That's not what's currently being done in this header file. I'm keeping
> > with the style of that file.
>
> style is not defined on a per file basis. There is the style from the
> past and the nowadays style. If you keep old style just because the file
> includes old style statements, then the code will never improve.
>
> All new patches should come with clean 'checkpatch' report and follow
> new style. Having mixed styles in a file is not a problem, that's the
> way to improvement. See arch/powerpc/mm/mmu_decl.h as an exemple.

I'm not sure that's mpe's policy. mpe?

> > > > +
> > > > +static ssize_t dawr_write_file_bool(struct file *file,
> > > > +				    const char __user *user_buf,
> > > > +				    size_t count, loff_t *ppos)
> > > > +{
> > > > +	struct arch_hw_breakpoint null_brk = {0, 0, 0};
> > > > +	size_t rc;
> > > > +
> > > > +	/* Send error to user if they hypervisor won't allow us to write DAWR */
> > > > +	if ((!dawr_force_enable) &&
> > > > +	    (firmware_has_feature(FW_FEATURE_LPAR)) &&
> > > > +	    (set_dawr(&null_brk) != H_SUCCESS))
> > >
> > > The above is not real clear.
> > > set_dabr() returns 0, H_SUCCESS is not used there.
> >
> > It pseries_set_dawr() will return a hcall number.
>
> Right, then it maybe means set_dawr() should be fixes ?

Sorry, I don't understand this.

> > This code hasn't changed. I'm just moving it.
>
> Right, but could be an improvment for another patch.
> As far as I remember you are the one who wrote that code at first place, > arent't you ? Yep, classic crap Mikey code :-) Mikey
Re: [PATCH] EDAC, mpc85xx: Prevent building as a module
Borislav Petkov writes:
> On Fri, May 10, 2019 at 04:13:20PM +0200, Borislav Petkov wrote:
>> On Fri, May 10, 2019 at 08:50:52PM +1000, Michael Ellerman wrote:
>> > Yeah that looks better to me. I didn't think about the case where EDAC
>> > core is modular.
>> >
>> > Do you want me to send a new patch?
>>
>> Nah, I'll fix it up.
>
> I've pushed it here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=edac-fix-for-5.2
>
> in case you wanna throw your build tests on it. My dingy cross-compiler
> can't do much really.

Looks good. I even booted it :)

cheers
Re: [PATCH] powerpc/mm: Handle page table allocation failures
"Aneesh Kumar K.V" writes:
> This fix the below crash that arise due to not handling page table
> allocation failures while allocating hugetlb page table.
>
> BUG: Kernel NULL pointer dereference at 0x001c
> Faulting instruction address: 0xc1d1e58c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>
> CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: GW O 5.1.0-next-20190507-autotest #1
> NIP: c1d1e58c LR: c1d1e54c CTR:
> REGS: c4937890 TRAP: 0300 Tainted: GW O (5.1.0-next-20190507-autotest)
> MSR: 80009033 CR: 22424822 XER:
> CFAR: c183e9e0 DAR: 001c DSISR: 4000 IRQMASK: 0
> GPR00: c1901a80 c4937b20 c3938700
> GPR04: 00400cc0 0003efff 00027966e000 c3ba8700
> GPR08: c3ba8700 0d601125 c3ba8700 8000
> GPR12: 22424822 c0001ecae280
> GPR16:
> GPR20: 0018 c39e2d30 c39e2d28 c002762da460
> GPR24: 001c 0001 c1901a80
> GPR28: 00400cc0 00400cc0
> NIP [c1d1e58c] kmem_cache_alloc+0xbc/0x5a0
> LR [c1d1e54c] kmem_cache_alloc+0x7c/0x5a0
> Call Trace:
> [c1c91150] __pud_alloc+0x160/0x200 (unreliable)
> [c1901a80] huge_pte_alloc+0x580/0x950
> [c1cf7910] hugetlb_fault+0x9a0/0x1250
> [c1c94a80] handle_mm_fault+0x490/0x4a0
> [c18d529c] __do_page_fault+0x77c/0x1f00
> [c18d6a48] do_page_fault+0x28/0x50
> [c183b0d4] handle_page_fault+0x18/0x38
>
> Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a
> different page table format")
> Reported-by: Sachin Sant
> Signed-off-by: Aneesh Kumar K.V
> ---
>
> Note: I did add a recent commit for the Fixes tag. But in reality we never
> checked for page table allocation failure there. If we want to go to that
> old commit, then we may need.

If we never checked for failure in that path, is there some reason we've
only just noticed the crashes? Are we just testing under memory pressure
more effectively than we used to?

cheers
[PATCH] powerpc/book3s/mm: Clear MMU_FTR_HPTE_TABLE when radix is enabled.
Avoids confusion when printing an Oops message like below:

Faulting instruction address: 0xc008bdb4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV

Either ibm,pa-features or ibm,powerpc-cpu-features can be used to enable
the MMU features. We don't clear the related MMU feature bits there. We
use the kernel command line to determine what translation mode we want
to use and clear the HPTE or radix bit accordingly. On LPAR we do have
to re-enable the HASH bit if the hypervisor can't do radix.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/mm/init_64.c | 11 ++-
 arch/powerpc/mm/pgtable.c |  7 +++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 45b02fa11cd8..652d95ba 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -355,15 +355,18 @@ static void __init early_check_vec5(void)
 	chosen = of_get_flat_dt_subnode_by_name(root, "chosen");
 	if (chosen == -FDT_ERR_NOTFOUND) {
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+		cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
 		return;
 	}
 	vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", &size);
 	if (!vec5) {
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+		cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
 		return;
 	}
 	if (size <= OV5_INDX(OV5_MMU_SUPPORT)) {
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+		cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
 		return;
 	}
@@ -381,17 +384,23 @@ static void __init early_check_vec5(void)
 		}
 		/* Do radix anyway - the hypervisor said we had to */
 		cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
+		cur_cpu_spec->mmu_features &= ~MMU_FTR_HPTE_TABLE;
 	} else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
 		/* Hypervisor only supports hash - disable radix */
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+		cur_cpu_spec->mmu_features |= MMU_FTR_HPTE_TABLE;
 	}
 }

 void __init mmu_early_init_devtree(void)
 {
 	/* Disable radix mode based on kernel command line. */
-	if (disable_radix)
+	if (disable_radix) {
 		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
+	} else {
+		if (early_radix_enabled())
+			cur_cpu_spec->mmu_features &= ~MMU_FTR_HPTE_TABLE;
+	}

 	/*
 	 * Check /chosen/ibm,architecture-vec-5 if running as a guest.
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index b97aee03924f..0fa6cac3fe82 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -77,9 +77,6 @@ static struct page *maybe_pte_to_page(pte_t pte)

 static pte_t set_pte_filter_hash(pte_t pte)
 {
-	if (radix_enabled())
-		return pte;
-
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
 				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
@@ -110,6 +107,8 @@ static pte_t set_pte_filter(pte_t pte)

 	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
 		return set_pte_filter_hash(pte);
+	else if (radix_enabled())
+		return pte;

 	/* No exec permission in the first place, move on */
 	if (!pte_exec(pte) || !pte_looks_normal(pte))
@@ -140,7 +139,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 {
 	struct page *pg;

-	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
+	if (mmu_has_feature(MMU_FTR_HPTE_TABLE) || radix_enabled())
 		return pte;

 	/* So here, we only care about exec faults, as we use them
-- 
2.21.0
[PATCH 3/3] powerpc/mm: Remove radix dependency on HugeTLB page
Now that we have switched the page table walk to use pmd_is_leaf, we can
revert commit 8adddf349fda ("powerpc/mm/radix: Make Radix require
HUGETLB_PAGE").

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/platforms/Kconfig.cputype | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index d0e172d47574..fa6b03205ae1 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -330,7 +330,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK

 config PPC_RADIX_MMU
 	bool "Radix MMU Support"
-	depends on PPC_BOOK3S_64 && HUGETLB_PAGE
+	depends on PPC_BOOK3S_64
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	select PPC_HAVE_KUEP
 	select PPC_HAVE_KUAP
-- 
2.21.0
[PATCH 2/3] powerpc/mm: pmd_devmap implies pmd_large().
Large devmap usage is dependent on THP, hence one check is sufficient.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/mm/book3s64/pgtable.c | 2 +-
 arch/powerpc/mm/pgtable_64.c       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 16bda049187a..471cea271dd3 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -76,7 +76,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 	WARN_ON(pte_hw_valid(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
 	assert_spin_locked(pmd_lockptr(mm, pmdp));
-	WARN_ON(!(pmd_large(pmd) || pmd_devmap(pmd)));
+	WARN_ON(!(pmd_large(pmd)));
 #endif
 	trace_hugepage_set_pmd(addr, pmd_val(pmd));
 	return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index bfb47b037a2f..c1541371f224 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -320,7 +320,7 @@ struct page *pud_page(pud_t pud)
 struct page *pmd_page(pmd_t pmd)
 {
 	if (pmd_is_leaf(pmd)) {
-		VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd) || pmd_devmap(pmd)));
+		VM_WARN_ON(!(pmd_large(pmd) || pmd_huge(pmd)));
 		return pte_page(pmd_pte(pmd));
 	}
 	return virt_to_page(pmd_page_vaddr(pmd));
-- 
2.21.0
[PATCH 1/3] powerpc/book3s: Use config independent helpers for page table walk
Even when we have HugeTLB and THP disabled, the kernel linear map can
still be mapped with hugepages. This is only an issue with radix
translation, because the hash MMU doesn't map the kernel linear range in
the linux page table, and other kernel map areas are not mapped using
hugepages.

Add config independent helpers and put a WARN_ON() where we don't expect
things to be mapped via hugepages.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 21 +++
 arch/powerpc/include/asm/pgtable.h           | 24 +
 arch/powerpc/include/asm/pte-walk.h          | 28 ++--
 arch/powerpc/kvm/book3s_64_mmu_radix.c       | 12 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c     | 10 +++
 arch/powerpc/mm/pgtable.c                    | 16 +--
 arch/powerpc/mm/pgtable_64.c                 | 12 ++---
 arch/powerpc/mm/ptdump/ptdump.c              |  6 ++---
 arch/powerpc/xmon/xmon.c                     |  6 ++---
 9 files changed, 102 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..f402b2a96ef3 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1313,5 +1313,26 @@ static inline bool is_pte_rw_upgrade(unsigned long old_val, unsigned long new_va
 	return false;
 }

+/*
+ * Like pmd_huge() and pmd_large(), but works regardless of config options
+ */
+#define pmd_is_leaf pmd_is_leaf
+static inline bool pmd_is_leaf(pmd_t pmd)
+{
+	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pud_is_leaf pud_is_leaf
+static inline bool pud_is_leaf(pud_t pud)
+{
+	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
+#define pgd_is_leaf pgd_is_leaf
+static inline bool pgd_is_leaf(pgd_t pgd)
+{
+	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 3f53be60fb01..bf7d771f342e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -140,6 +140,30 @@ static inline void pte_frag_set(mm_context_t *ctx, void *p)
 }
 #endif

+#ifndef pmd_is_leaf
+#define pmd_is_leaf pmd_is_leaf
+static inline bool pmd_is_leaf(pmd_t pmd)
+{
+	return false;
+}
+#endif
+
+#ifndef pud_is_leaf
+#define pud_is_leaf pud_is_leaf
+static inline bool pud_is_leaf(pud_t pud)
+{
+	return false;
+}
+#endif
+
+#ifndef pgd_is_leaf
+#define pgd_is_leaf pgd_is_leaf
+static inline bool pgd_is_leaf(pgd_t pgd)
+{
+	return false;
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/include/asm/pte-walk.h b/arch/powerpc/include/asm/pte-walk.h
index 2d633e9d686c..33fa5dd8ee6a 100644
--- a/arch/powerpc/include/asm/pte-walk.h
+++ b/arch/powerpc/include/asm/pte-walk.h
@@ -10,8 +10,20 @@ extern pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea,
 				    bool *is_thp, unsigned *hshift)
 {
+	pte_t *pte;
+
 	VM_WARN(!arch_irqs_disabled(), "%s called with irq enabled\n", __func__);
-	return __find_linux_pte(pgdir, ea, is_thp, hshift);
+	pte = __find_linux_pte(pgdir, ea, is_thp, hshift);
+
+#if defined(CONFIG_DEBUG_VM) && \
+	!(defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	/*
+	 * We should not find huge page if these configs are not enabled.
+	 */
+	if (hshift)
+		WARN_ON(*hshift);
+#endif
+	return pte;
 }

 static inline pte_t *find_init_mm_pte(unsigned long ea, unsigned *hshift)
@@ -26,10 +38,22 @@ static inline pte_t *find_init_mm_pte(unsigned long ea, unsigned *hshift)
 static inline pte_t *find_current_mm_pte(pgd_t *pgdir, unsigned long ea,
 					 bool *is_thp, unsigned *hshift)
 {
+	pte_t *pte;
+
 	VM_WARN(!arch_irqs_disabled(), "%s called with irq enabled\n", __func__);
 	VM_WARN(pgdir != current->mm->pgd,
 		"%s lock less page table lookup called on wrong mm\n", __func__);
-	return __find_linux_pte(pgdir, ea, is_thp, hshift);
+	pte = __find_linux_pte(pgdir, ea, is_thp, hshift);
+
+#if defined(CONFIG_DEBUG_VM) && \
+	!(defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE))
+	/*
+	 * We should not find huge page if these configs are not enabled.
+	 */
+	if (hshift)
+		WARN_ON(*hshift);
+#endif
+	return pte;
 }

 #endif /* _ASM_POWERPC_PTE_WALK_H */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index f55ef071883f..91efee7f0329 100644
--- a/arch/powerpc/kvm/book
Re: [PATCH 1/2] [PowerPC] Add simd.h implementation
On Mon, 2019-05-13 at 22:44 -0300, Shawn Landden wrote:
> +
> +/*
> + * Were we in user mode when we were
> + * interrupted?
> + *
> + * Doing kernel_altivec/vsx_begin/end() is ok if we are running
> + * in an interrupt context from user mode - we'll just
> + * save the FPU state as required.
> + */
> +static bool interrupted_user_mode(void)
> +{
> +	struct pt_regs *regs = get_irq_regs();
> +
> +	return regs && user_mode(regs);
> +}
> +

That's interesting, it *could* work, but we'll have to carefully audit
the code to make sure that's ok. We probably also want to handle the
case where the CPU is in the idle loop.

Do we always save the user state when switching out these days? If yes,
then there's no "live" state to worry about...

Cheers,
Ben.
Re: [PATCH v2] powerpc: Fix compile issue with force DAWR
Le 14/05/2019 à 06:47, Michael Neuling a écrit : On Mon, 2019-05-13 at 11:08 +0200, Christophe Leroy wrote: Le 13/05/2019 à 09:17, Michael Neuling a écrit : If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This puts more of the dawr infrastructure in a new file. I think you are doing a bit more than that. I think you should explain that you define a new CONFIG_ option, when it is selected, etc ... The commit you are referring to is talking about P9. It looks like your patch covers a lot more, so it should be mentionned her I guess. Not really. It looks like I'm doing a lot but I'm really just moving code around to deal with the ugliness of a bunch of config options and dependencies. Signed-off-by: Michael Neuling You should add a Fixes: tag, ie: Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Ok -- v2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 5 ++ arch/powerpc/include/asm/hw_breakpoint.h | 20 --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 75 arch/powerpc/kernel/hw_breakpoint.c | 56 -- arch/powerpc/kvm/Kconfig | 1 + 6 files changed, 94 insertions(+), 64 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2711aac246..f4b19e48cc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -242,6 +242,7 @@ config PPC select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK select VIRT_TO_BUS if !PPC64 + select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF What's PERF ? Did you mean PERF_EVENTS ? Then what you mean is: - Selected all the time for PPC64 - Selected for PPC32 when PERF is also selected. Is that what you want ? 
At first I thought it was linked to P9. This is wrong. I think we just want PPC64. PERF is a red herring. Are you sure ? Michael suggested PERF || KVM as far as I remember. And ... did you read below statement ? Clearly not :-) # # Please keep this list sorted alphabetically. # @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR_FORCE_ENABLE + bool + default y + Why defaulting it to y. Then how is it set to n ? Good point. config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..ffbc8eab41 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -47,6 +47,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +extern int set_dawr(struct arch_hw_breakpoint *brk); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); -extern bool dawr_force_enable; -static inline bool dawr_enabled(void) -{ - return dawr_force_enable; -} - That's a very simple function, why not keep it here (or in another .h) as 'static inline' ? Sure. #else/* CONFIG_HAVE_HW_BREAKPOINT */ static inline void hw_breakpoint_disable(void) { } static inline void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) { } -static inline bool dawr_enabled(void) { return false; } + #endif /* CONFIG_HAVE_HW_BREAKPOINT */ + +extern bool dawr_force_enable; + +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE +extern bool dawr_enabled(void); Functions should not be 'extern'. I'm sure checkpatch --strict will tell you. That's not what's currently being done in this header file. 
I'm keeping with the style of that file. style is not defined on a per file basis. There is the style from the past and the nowadays style. If you keep old style just because the file includes old style statements, then the code will never improve. All new patches should come with clean 'checkpatch' report and follow new style. Having mixed styles in a file is not a problem, that's the way to improvement. See arch/powerpc/mm/mmu_decl.h as an exemple. +#else +#define dawr_enabled() true true by default ? Previously it was false
Re: [Bug 203597] New: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage
Hi Greg,

Could you please apply b45ba4a51cde29b2939365ef0c07ad34c8321789 to 4.9,
since 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 was applied although it
was marked for #4.13+

Thanks
Christophe

Le 13/05/2019 à 22:18, bugzilla-dae...@bugzilla.kernel.org a écrit :
> https://bugzilla.kernel.org/show_bug.cgi?id=203597
>
>             Bug ID: 203597
>            Summary: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at
>                     early stage
>            Product: Platform Specific/Hardware
>            Version: 2.5
>     Kernel Version: 4.9.175
>           Hardware: PPC-32
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PPC-32
>           Assignee: platform_ppc...@kernel-bugs.osdl.org
>           Reporter: erhar...@mailbox.org
>         Regression: No
>
> Created attachment 282743
>   --> https://bugzilla.kernel.org/attachment.cgi?id=282743&action=edit
> kernel .config (PowerMac G4 MDD)
>
> Trying out older kernels on the G4 MDD I noticed recent 4.9.x freeze the
> machine. Only message displayed in black letters on a white screen:
>
> done
> found display : /pco@f000/ATY,AlteracParent@10/ATY,Alterac_B@1, opening...
>
> It's a hard freeze, RCU_CPU_STALL_TIMEOUT does not kick in.
>
> Tried other stable kernels, which all worked:
> 4.19.37
> 4.14.114
> 4.4.179
>
> So I suppose it's only a 4.9.x issue. Last working 4.9.x kernel I had in
> service was 4.9.161. First one I spotted freezing was 4.9.171.
>
> A bisect gave me the following commit:
>
> 1c38a84d45862be06ac418618981631eddbda741 is the first bad commit
> commit 1c38a84d45862be06ac418618981631eddbda741
> Author: Michael Neuling
> Date:   Thu Apr 11 21:45:59 2019 +1000
>
>     powerpc: Avoid code patching freed init sections
>
>     commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream.
>
>     This stops us from doing code patching in init sections after they've
>     been freed.
>
>     In this chain:
>       kvm_guest_init() ->
>       kvm_use_magic_page() ->
>       fault_in_pages_readable() ->
>       __get_user() ->
>       __get_user_nocheck() ->
>       barrier_nospec();
>
>     We have a code patching location at barrier_nospec() and
>     kvm_guest_init() is an init function. This whole chain gets inlined,
>     so when we free the init section (hence kvm_guest_init()), this code
>     goes away and hence should no longer be patched.
>
>     We seen this as userspace memory corruption when using a memory
>     checker while doing partition migration testing on powervm (this
>     starts the code patching post migration via
>     /sys/kernel/mobility/migration). In theory, it could also happen when
>     using /sys/kernel/debug/powerpc/barrier_nospec.
>
>     Cc: sta...@vger.kernel.org # 4.13+
>     Signed-off-by: Michael Neuling
>     Reviewed-by: Nicholas Piggin
>     Reviewed-by: Christophe Leroy
>     Signed-off-by: Michael Ellerman
>     Signed-off-by: Sasha Levin
Re: [PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties
On Mon, 2019-05-13 at 17:40 +, Roy Pledge wrote:
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you recognize the sender and
> know the content is safe.
>
> On 5/13/2019 12:40 PM, Joakim Tjernlund wrote:
> > On Mon, 2019-05-13 at 16:09 +, Roy Pledge wrote:
> > > The index value should be passed to the of_parse_phandle()
> > > function to ensure the correct property is read.
> >
> > Is this a bug fix? Maybe for stable too?
> >
> > Jocke
>
> Yes this could go to stable. I will include sta...@vger.kernel.org when
> I send the next version.

I think you need to send this patch separately then. The whole series
cannot go to stable.

 Jocke

> > > Signed-off-by: Roy Pledge
> > > ---
> > >  drivers/soc/fsl/qbman/dpaa_sys.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
> > > index 3e0a7f3..0b901a8 100644
> > > --- a/drivers/soc/fsl/qbman/dpaa_sys.c
> > > +++ b/drivers/soc/fsl/qbman/dpaa_sys.c
> > > @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
> > >  			idx, ret);
> > >  		return -ENODEV;
> > >  	}
> > > -	mem_node = of_parse_phandle(dev->of_node, "memory-region", 0);
> > > +	mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
> > >  	if (mem_node) {
> > >  		ret = of_property_read_u64(mem_node, "size", &size64);
> > >  		if (ret) {
> > > --
> > > 2.7.4
> > >
Re: [PATCH v2] powerpc: Fix compile issue with force DAWR
On Mon, 2019-05-13 at 11:08 +0200, Christophe Leroy wrote: > > Le 13/05/2019 à 09:17, Michael Neuling a écrit : > > If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail > > at linking with: > >arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined > > reference to `dawr_force_enable' > > > > This was caused by commit c1fe190c0672 ("powerpc: Add force enable of > > DAWR on P9 option"). > > > > This puts more of the dawr infrastructure in a new file. > > I think you are doing a bit more than that. I think you should explain > that you define a new CONFIG_ option, when it is selected, etc ... > > The commit you are referring to is talking about P9. It looks like your > patch covers a lot more, so it should be mentionned her I guess. Not really. It looks like I'm doing a lot but I'm really just moving code around to deal with the ugliness of a bunch of config options and dependencies. > > Signed-off-by: Michael Neuling > > You should add a Fixes: tag, ie: > > Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Ok > > > -- > > v2: > >Fixes based on Christophe Leroy's comments: > >- Fix commit message formatting > >- Move more DAWR code into dawr.c > > --- > > arch/powerpc/Kconfig | 5 ++ > > arch/powerpc/include/asm/hw_breakpoint.h | 20 --- > > arch/powerpc/kernel/Makefile | 1 + > > arch/powerpc/kernel/dawr.c | 75 > > arch/powerpc/kernel/hw_breakpoint.c | 56 -- > > arch/powerpc/kvm/Kconfig | 1 + > > 6 files changed, 94 insertions(+), 64 deletions(-) > > create mode 100644 arch/powerpc/kernel/dawr.c > > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > index 2711aac246..f4b19e48cc 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -242,6 +242,7 @@ config PPC > > select SYSCTL_EXCEPTION_TRACE > > select THREAD_INFO_IN_TASK > > select VIRT_TO_BUS if !PPC64 > > + select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF > > What's PERF ? Did you mean PERF_EVENTS ? 
> > Then what you mean is: > - Selected all the time for PPC64 > - Selected for PPC32 when PERF is also selected. > > Is that what you want ? At first I thought it was linked to P9. This is wrong. I think we just want PPC64. PERF is a red herring. > And ... did you read below statement ? Clearly not :-) > > > # > > # Please keep this list sorted alphabetically. > > # > > @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE > > depends on PPC_ADV_DEBUG_REGS && 44x > > default y > > > > +config PPC_DAWR_FORCE_ENABLE > > + bool > > + default y > > + > > Why defaulting it to y. Then how is it set to n ? Good point. > > > config ZONE_DMA > > bool > > default y if PPC_BOOK3E_64 > > diff --git a/arch/powerpc/include/asm/hw_breakpoint.h > > b/arch/powerpc/include/asm/hw_breakpoint.h > > index 0fe8c1e46b..ffbc8eab41 100644 > > --- a/arch/powerpc/include/asm/hw_breakpoint.h > > +++ b/arch/powerpc/include/asm/hw_breakpoint.h > > @@ -47,6 +47,8 @@ struct arch_hw_breakpoint { > > #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | > > \ > > HW_BRK_TYPE_HYP) > > > > +extern int set_dawr(struct arch_hw_breakpoint *brk); > > + > > #ifdef CONFIG_HAVE_HW_BREAKPOINT > > #include > > #include > > @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void) > > extern void thread_change_pc(struct task_struct *tsk, struct pt_regs > > *regs); > > int hw_breakpoint_handler(struct die_args *args); > > > > -extern int set_dawr(struct arch_hw_breakpoint *brk); > > -extern bool dawr_force_enable; > > -static inline bool dawr_enabled(void) > > -{ > > - return dawr_force_enable; > > -} > > - > > That's a very simple function, why not keep it here (or in another .h) > as 'static inline' ? Sure. 
> > #else /* CONFIG_HAVE_HW_BREAKPOINT */ > > static inline void hw_breakpoint_disable(void) { } > > static inline void thread_change_pc(struct task_struct *tsk, > > struct pt_regs *regs) { } > > -static inline bool dawr_enabled(void) { return false; } > > + > > #endif/* CONFIG_HAVE_HW_BREAKPOINT */ > > + > > +extern bool dawr_force_enable; > > + > > +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE > > +extern bool dawr_enabled(void); > > Functions should not be 'extern'. I'm sure checkpatch --strict will tell > you. That's not what's currently being done in this header file. I'm keeping with the style of that file. > > +#else > > +#define dawr_enabled() true > > true by default ? > Previously it was false by default. Thanks, yeah that's wrong but I need to rethink the config option to make it CONFIG_PPC_DAWR. This patch is far more difficult than it should be due to the mess that CONFIG_HAVE_HW_BREAKPOINT and CONFIG_PPC_ADV_DEBUG_REGS cr
Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding
On 5/14/19 9:42 AM, Dan Williams wrote: On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V wrote: On 5/14/19 9:28 AM, Dan Williams wrote: On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V wrote: The nfpn related change is needed to fix the kernel message "number of pfns truncated from 2617344 to 163584" The change makes sure the nfpns stored in the superblock is right value. Signed-off-by: Aneesh Kumar K.V --- drivers/nvdimm/pfn_devs.c| 6 +++--- drivers/nvdimm/region_devs.c | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 347cab166376..6751ff0296ef 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) * when populating the vmemmap. This *should* be equal to * PMD_SIZE for most architectures. */ - offset = ALIGN(start + reserve + 64 * npfns, - max(nd_pfn->align, PMD_SIZE)) - start; + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, + max(nd_pfn->align, PMD_SIZE)) - start; No, I think we need to record the page-size into the superblock format otherwise this breaks in debug builds where the struct-page size is extended. } else if (nd_pfn->mode == PFN_MODE_RAM) offset = ALIGN(start + reserve, nd_pfn->align) - start; else @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) return -ENXIO; } - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; Similar comment, if the page size is variable then the superblock needs to explicitly account for it. PAGE_SIZE is not really variable. What we can run into is the issue you mentioned above. The size of struct page can change which means the reserved space for keeping vmemmap in device may not be sufficient for certain kernel builds. I was planning to add another patch that fails namespace init if we don't have enough space to keep the struct page. 
Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? So that the kernel has a chance to identify cases where the superblock it is handling was created on a system with different PAGE_SIZE assumptions. The reason to do that is we don't have enough space to keep struct page backing the total number of pfns? If so, what I suggested above should handle that. Or are you finding any other reason why we should fail a namespace init with a different PAGE_SIZE value? Another patch of mine handles the details w.r.t devdax alignment for the case where the devdax got created with PAGE_SIZE 4K but we are now trying to load it in a kernel with PAGE_SIZE 64k. -aneesh
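The truncation message quoted above falls out of the arithmetic: the metadata offset must reserve `sizeof(struct page)` bytes per pfn, and the pfn count must be divided by the kernel's actual PAGE_SIZE rather than a hard-coded 4K. A minimal sketch of that arithmetic with illustrative sizes (1G region, 64K pages, 64-byte struct page, 2M alignment) — not the real pfn_devs.c code:

```c
#include <stdint.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Metadata offset: room for one struct page per pfn, rounded up to
 * the mapping alignment (PMD_SIZE in the patch). */
static uint64_t metadata_offset(uint64_t start, uint64_t reserve,
				uint64_t npfns, uint64_t page_struct_sz,
				uint64_t align)
{
	return ALIGN_UP(start + reserve + page_struct_sz * npfns, align) - start;
}

/* pfns actually backed once the metadata is carved out; dividing by a
 * hard-coded SZ_4K instead of PAGE_SIZE inflates this on 64K kernels,
 * producing the "number of pfns truncated" message. */
static uint64_t usable_npfns(uint64_t size, uint64_t offset,
			     uint64_t page_size)
{
	return (size - offset) / page_size;
}
```

On a 64K-page kernel the SZ_4K divisor reports sixteen times more pfns than the namespace can back, which the init path then has to truncate.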
[Bug 203597] kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage
https://bugzilla.kernel.org/show_bug.cgi?id=203597 Christophe Leroy (christophe.le...@c-s.fr) changed: What|Removed |Added CC||christophe.le...@c-s.fr --- Comment #2 from Christophe Leroy (christophe.le...@c-s.fr) --- You are missing following commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b45ba4a51cd -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region
On 5/14/19 9:59 AM, Dan Williams wrote: On Mon, May 13, 2019 at 7:55 PM Aneesh Kumar K.V wrote: We already add the start_pad to the resource->start but fails to section align the start. This make sure with altmap we compute the right first pfn when start_pad is zero and we are doing an align down of start address. Signed-off-by: Aneesh Kumar K.V --- kernel/memremap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..23d77b60e728 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap) { const struct resource *res = &pgmap->res; struct vmem_altmap *altmap = &pgmap->altmap; - unsigned long pfn; + unsigned long pfn = PHYS_PFN(res->start); - pfn = res->start >> PAGE_SHIFT; + pfn = SECTION_ALIGN_DOWN(pfn); This does not seem right to me it breaks the assumptions of where the first expected valid pfn occurs in the passed in range. How do we define the first valid pfn? Isn't that at pfn_sb->dataoff ? -aneesh
Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_memremap_pages_release
On 5/14/19 9:45 AM, Dan Williams wrote: [ add Keith who was looking at something similar ] On Mon, May 13, 2019 at 7:54 PM Aneesh Kumar K.V wrote: When we initialize the namespace, if we support altmap, we don't initialize all the backing struct page where as while releasing the namespace we look at some of these uninitilized struct page. This results in a kernel crash as below. kernel BUG at include/linux/mm.h:1034! cpu 0x2: Vector: 700 (Program Check) at [c0024146b870] pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0 lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0 sp: c0024146bb00 msr: 8282b033 current = 0xc00241382f00 paca= 0xc0003fffd680 irqmask: 0x03 irq_happened: 0x01 pid = 4114, comm = ndctl c09bf8c0 devm_action_release+0x30/0x50 c09c0938 release_nodes+0x268/0x2d0 c09b95b4 device_release_driver_internal+0x164/0x230 c09b638c unbind_store+0x13c/0x190 c09b4f44 drv_attr_store+0x44/0x60 c058ccc0 sysfs_kf_write+0x70/0xa0 c058b52c kernfs_fop_write+0x1ac/0x290 c04a415c __vfs_write+0x3c/0x70 c04a85ac vfs_write+0xec/0x200 c04a8920 ksys_write+0x80/0x130 c000bee4 system_call+0x5c/0x70 Signed-off-by: Aneesh Kumar K.V --- mm/page_alloc.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 59661106da16..892eabe1ec13 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, #ifdef CONFIG_ZONE_DEVICE /* -* Honor reservation requested by the driver for this ZONE_DEVICE -* memory. We limit the total number of pages to initialize to just +* We limit the total number of pages to initialize to just * those that might contain the memory mapping. We will defer the * ZONE_DEVICE page initialization until after we have released * the hotplug lock. 
@@ -5750,8 +5749,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, if (!altmap) return; - if (start_pfn == altmap->base_pfn) - start_pfn += altmap->reserve; If it's reserved then we should not be accessing, even if the above works in practice. Isn't the fix something more like this to fix up the assumptions at release time? diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..9074ba14572c 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -90,6 +90,7 @@ static void devm_memremap_pages_release(void *data) struct device *dev = pgmap->dev; struct resource *res = &pgmap->res; resource_size_t align_start, align_size; + struct vmem_altmap *altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL; unsigned long pfn; int nid; @@ -102,7 +103,10 @@ static void devm_memremap_pages_release(void *data) align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) - align_start; - nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT)); + pfn = align_start >> PAGE_SHIFT; + if (altmap) + pfn += vmem_altmap_offset(altmap); + nid = page_to_nid(pfn_to_page(pfn)); mem_hotplug_begin(); if (pgmap->type == MEMORY_DEVICE_PRIVATE) { @@ -110,8 +114,7 @@ static void devm_memremap_pages_release(void *data) __remove_pages(page_zone(pfn_to_page(pfn)), pfn, align_size >> PAGE_SHIFT, NULL); } else { - arch_remove_memory(nid, align_start, align_size, - pgmap->altmap_valid ? &pgmap->altmap : NULL); + arch_remove_memory(nid, align_start, align_size, altmap); kasan_remove_zero_shadow(__va(align_start), align_size); } mem_hotplug_done(); I did try that first. I was not sure about that. From the memory add vs remove perspective. 
devm_memremap_pages: align_start = res->start & ~(SECTION_SIZE - 1); align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) - align_start; align_end = align_start + align_size - 1; error = arch_add_memory(nid, align_start, align_size, altmap, false); devm_memremap_pages_release: /* pages are dead and unused, undo the arch mapping */ align_start = res->start & ~(SECTION_SIZE - 1); align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) - align_start; arch_remove_memory(nid, align_start, align_size, pgmap->altmap_valid ? &pgmap->altmap : NULL); Now if we are fixing the memremap_pages_release, shouldn't we adjust align_start w.r.t memremap_pages too? And I was not sure what that means w.r.t add/remove alignment requirements. What is the intended usage of the reserve area? I guess we want that part to be added? If so, shouldn't we remove them? -aneesh
Re: [PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region
On Mon, May 13, 2019 at 7:55 PM Aneesh Kumar K.V wrote: > > We already add the start_pad to the resource->start but fails to section > align the start. This make sure with altmap we compute the right first > pfn when start_pad is zero and we are doing an align down of start address. > > Signed-off-by: Aneesh Kumar K.V > --- > kernel/memremap.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/memremap.c b/kernel/memremap.c > index a856cb5ff192..23d77b60e728 100644 > --- a/kernel/memremap.c > +++ b/kernel/memremap.c > @@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap) > { > const struct resource *res = &pgmap->res; > struct vmem_altmap *altmap = &pgmap->altmap; > - unsigned long pfn; > + unsigned long pfn = PHYS_PFN(res->start); > > - pfn = res->start >> PAGE_SHIFT; > + pfn = SECTION_ALIGN_DOWN(pfn); This does not seem right to me it breaks the assumptions of where the first expected valid pfn occurs in the passed in range.
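The competing pfn_first() computations can be put side by side. The section size and page shift below are illustrative assumptions (they vary by architecture and config), and the helpers are a sketch of the two behaviours, not kernel/memremap.c itself:

```c
#include <stdint.h>

#define PAGE_SHIFT 16			/* illustrative: 64K pages */
#define SECTION_SIZE_BITS 24		/* illustrative: 16M sections */
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define SECTION_ALIGN_DOWN(pfn) ((pfn) & ~((1UL << PFN_SECTION_SHIFT) - 1))

/* Patched logic: align the resource start pfn down to a section
 * boundary before adding the altmap offset. */
static unsigned long pfn_first_patched(uint64_t res_start,
				       unsigned long altmap_offset)
{
	unsigned long pfn = res_start >> PAGE_SHIFT;

	return SECTION_ALIGN_DOWN(pfn) + altmap_offset;
}

/* Existing logic: take the resource start pfn as-is. */
static unsigned long pfn_first_existing(uint64_t res_start,
					unsigned long altmap_offset)
{
	return (res_start >> PAGE_SHIFT) + altmap_offset;
}
```

The two differ exactly when res->start is not section aligned, which is the case Dan's objection is about: the altmap offset was computed relative to the unaligned start, so aligning down shifts the first pfn out of the range the caller handed in.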
Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_memremap_pages_release
[ add Keith who was looking at something similar ] On Mon, May 13, 2019 at 7:54 PM Aneesh Kumar K.V wrote: > > When we initialize the namespace, if we support altmap, we don't initialize > all the > backing struct page where as while releasing the namespace we look at some of > these uninitilized struct page. This results in a kernel crash as below. > > kernel BUG at include/linux/mm.h:1034! > cpu 0x2: Vector: 700 (Program Check) at [c0024146b870] > pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0 > lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0 > sp: c0024146bb00 >msr: 8282b033 > current = 0xc00241382f00 > paca= 0xc0003fffd680 irqmask: 0x03 irq_happened: 0x01 > pid = 4114, comm = ndctl > c09bf8c0 devm_action_release+0x30/0x50 > c09c0938 release_nodes+0x268/0x2d0 > c09b95b4 device_release_driver_internal+0x164/0x230 > c09b638c unbind_store+0x13c/0x190 > c09b4f44 drv_attr_store+0x44/0x60 > c058ccc0 sysfs_kf_write+0x70/0xa0 > c058b52c kernfs_fop_write+0x1ac/0x290 > c04a415c __vfs_write+0x3c/0x70 > c04a85ac vfs_write+0xec/0x200 > c04a8920 ksys_write+0x80/0x130 > c000bee4 system_call+0x5c/0x70 > > Signed-off-by: Aneesh Kumar K.V > --- > mm/page_alloc.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 59661106da16..892eabe1ec13 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int > nid, unsigned long zone, > > #ifdef CONFIG_ZONE_DEVICE > /* > -* Honor reservation requested by the driver for this ZONE_DEVICE > -* memory. We limit the total number of pages to initialize to just > +* We limit the total number of pages to initialize to just > * those that might contain the memory mapping. We will defer the > * ZONE_DEVICE page initialization until after we have released > * the hotplug lock. 
> @@ -5750,8 +5749,6 @@ void __meminit memmap_init_zone(unsigned long size, int > nid, unsigned long zone, > if (!altmap) > return; > > - if (start_pfn == altmap->base_pfn) > - start_pfn += altmap->reserve; If it's reserved then we should not be accessing, even if the above works in practice. Isn't the fix something more like this to fix up the assumptions at release time? diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..9074ba14572c 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -90,6 +90,7 @@ static void devm_memremap_pages_release(void *data) struct device *dev = pgmap->dev; struct resource *res = &pgmap->res; resource_size_t align_start, align_size; + struct vmem_altmap *altmap = pgmap->altmap_valid ? &pgmap->altmap : NULL; unsigned long pfn; int nid; @@ -102,7 +103,10 @@ static void devm_memremap_pages_release(void *data) align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE) - align_start; - nid = page_to_nid(pfn_to_page(align_start >> PAGE_SHIFT)); + pfn = align_start >> PAGE_SHIFT; + if (altmap) + pfn += vmem_altmap_offset(altmap); + nid = page_to_nid(pfn_to_page(pfn)); mem_hotplug_begin(); if (pgmap->type == MEMORY_DEVICE_PRIVATE) { @@ -110,8 +114,7 @@ static void devm_memremap_pages_release(void *data) __remove_pages(page_zone(pfn_to_page(pfn)), pfn, align_size >> PAGE_SHIFT, NULL); } else { - arch_remove_memory(nid, align_start, align_size, - pgmap->altmap_valid ? &pgmap->altmap : NULL); + arch_remove_memory(nid, align_start, align_size, altmap); kasan_remove_zero_shadow(__va(align_start), align_size); } mem_hotplug_done();
Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding
On Mon, May 13, 2019 at 9:05 PM Aneesh Kumar K.V wrote: > > On 5/14/19 9:28 AM, Dan Williams wrote: > > On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V > > wrote: > >> > >> The nfpn related change is needed to fix the kernel message > >> > >> "number of pfns truncated from 2617344 to 163584" > >> > >> The change makes sure the nfpns stored in the superblock is right value. > >> > >> Signed-off-by: Aneesh Kumar K.V > >> --- > >> drivers/nvdimm/pfn_devs.c| 6 +++--- > >> drivers/nvdimm/region_devs.c | 8 > >> 2 files changed, 7 insertions(+), 7 deletions(-) > >> > >> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > >> index 347cab166376..6751ff0296ef 100644 > >> --- a/drivers/nvdimm/pfn_devs.c > >> +++ b/drivers/nvdimm/pfn_devs.c > >> @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> * when populating the vmemmap. This *should* be equal to > >> * PMD_SIZE for most architectures. > >> */ > >> - offset = ALIGN(start + reserve + 64 * npfns, > >> - max(nd_pfn->align, PMD_SIZE)) - start; > >> + offset = ALIGN(start + reserve + sizeof(struct page) * > >> npfns, > >> + max(nd_pfn->align, PMD_SIZE)) - start; > > > > No, I think we need to record the page-size into the superblock format > > otherwise this breaks in debug builds where the struct-page size is > > extended. > > > >> } else if (nd_pfn->mode == PFN_MODE_RAM) > >> offset = ALIGN(start + reserve, nd_pfn->align) - start; > >> else > >> @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > >> return -ENXIO; > >> } > >> > >> - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > >> + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; > > > > Similar comment, if the page size is variable then the superblock > > needs to explicitly account for it. > > > > PAGE_SIZE is not really variable. What we can run into is the issue you > mentioned above. 
The size of struct page can change which means the > reserved space for keeping vmemmap in device may not be sufficient for > certain kernel builds. > > I was planning to add another patch that fails namespace init if we > don't have enough space to keep the struct page. > > Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? So that the kernel has a chance to identify cases where the superblock it is handling was created on a system with different PAGE_SIZE assumptions.
Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding
On 5/14/19 9:28 AM, Dan Williams wrote: On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V wrote: The nfpn related change is needed to fix the kernel message "number of pfns truncated from 2617344 to 163584" The change makes sure the nfpns stored in the superblock is right value. Signed-off-by: Aneesh Kumar K.V --- drivers/nvdimm/pfn_devs.c| 6 +++--- drivers/nvdimm/region_devs.c | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 347cab166376..6751ff0296ef 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) * when populating the vmemmap. This *should* be equal to * PMD_SIZE for most architectures. */ - offset = ALIGN(start + reserve + 64 * npfns, - max(nd_pfn->align, PMD_SIZE)) - start; + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, + max(nd_pfn->align, PMD_SIZE)) - start; No, I think we need to record the page-size into the superblock format otherwise this breaks in debug builds where the struct-page size is extended. } else if (nd_pfn->mode == PFN_MODE_RAM) offset = ALIGN(start + reserve, nd_pfn->align) - start; else @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) return -ENXIO; } - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; Similar comment, if the page size is variable then the superblock needs to explicitly account for it. PAGE_SIZE is not really variable. What we can run into is the issue you mentioned above. The size of struct page can change which means the reserved space for keeping vmemmap in device may not be sufficient for certain kernel builds. I was planning to add another patch that fails namespace init if we don't have enough space to keep the struct page. Why do you suggest we need to have PAGE_SIZE as part of pfn superblock? -aneesh
Re: [RFC PATCH] mm/nvdimm: Fix kernel crash on devm_memremap_pages_release
On 05/14/2019 08:23 AM, Aneesh Kumar K.V wrote: > When we initialize the namespace, if we support altmap, we don't initialize > all the > backing struct page where as while releasing the namespace we look at some of > these uninitilized struct page. This results in a kernel crash as below. Yes this has been problematic which I have also previously encountered but in a bit different way (while searching memory resources). > > kernel BUG at include/linux/mm.h:1034! What that would be ? Did not see a corresponding BUG_ON() line in the file. > cpu 0x2: Vector: 700 (Program Check) at [c0024146b870] > pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0 > lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0 > sp: c0024146bb00 >msr: 8282b033 > current = 0xc00241382f00 > paca= 0xc0003fffd680 irqmask: 0x03 irq_happened: 0x01 > pid = 4114, comm = ndctl > c09bf8c0 devm_action_release+0x30/0x50 > c09c0938 release_nodes+0x268/0x2d0 > c09b95b4 device_release_driver_internal+0x164/0x230 > c09b638c unbind_store+0x13c/0x190 > c09b4f44 drv_attr_store+0x44/0x60 > c058ccc0 sysfs_kf_write+0x70/0xa0 > c058b52c kernfs_fop_write+0x1ac/0x290 > c04a415c __vfs_write+0x3c/0x70 > c04a85ac vfs_write+0xec/0x200 > c04a8920 ksys_write+0x80/0x130 > c000bee4 system_call+0x5c/0x70 I saw this as memory hotplug problem with respect to ZONE_DEVICE based device memory. Hence a bit different explanation which I never posted. I guess parts of the commit message here can be used for a better comprehensive explanation of the problem. mm/hotplug: Initialize struct pages for vmem_altmap reserved areas The following ZONE_DEVICE ranges (altmap) have valid struct pages allocated from within device memory memmap range. A. Driver reserved area [BASE -> BASE + RESV) B. Device mmap area [BASE + RESV -> BASE + RESV + FREE] C. 
Device usable area [BASE + RESV + FREE -> END] BASE - pgmap->altmap.base_pfn (pgmap->res.start >> PAGE_SHIFT) RESV - pgmap->altmap.reserve FREE - pgmap->altmap.free END - pgmap->res->end >> PAGE_SHIFT Struct page init for all areas happens in two phases which detects altmap use case and init parts of the device range in each phase. 1. memmap_init_zone (Device mmap area) 2. memmap_init_zone_device (Device usable area) memmap_init_zone() skips driver reserved area and does not init the struct pages. This is problematic primarily for two reasons. Though NODE_DATA(device_node(dev))->node_zones[ZONE_DEVICE] contains the device memory range in it's entirety (in zone->spanned_pages) parts of this range does not have zone set to ZONE_DEVICE in their struct page. __remove_pages() called directly or from within arch_remove_memory() during ZONE_DEVICE tear down procedure (devm_memremap_pages_release) hits an error (like below) if there are reserved pages. This is because the first pfn of the device range (invariably also the first pfn from reserved area) cannot be identified belonging to ZONE_DEVICE. This erroneously leads range search within iomem_resource region which never had this device memory region. So this eventually ends up flashing the following error. Unable to release resource <0x00068000-0x0006bfff> (-22) Initialize struct pages for the driver reserved range while still staying clear from it's contents. > > Signed-off-by: Aneesh Kumar K.V > --- > mm/page_alloc.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 59661106da16..892eabe1ec13 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int > nid, unsigned long zone, > > #ifdef CONFIG_ZONE_DEVICE > /* > - * Honor reservation requested by the driver for this ZONE_DEVICE > - * memory. 
We limit the total number of pages to initialize to just > + * We limit the total number of pages to initialize to just Comment needs bit change to reflect on the fact that both driver reserved as well as mapped area (containing altmap struct pages) needs init here.
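The A/B/C layout in the draft commit message maps directly onto the altmap fields; a small sketch of those boundaries (the struct mirrors only the fields discussed here, and the pfn values in the usage are illustrative):

```c
struct altmap_sketch {
	unsigned long base_pfn;	/* BASE: res.start >> PAGE_SHIFT */
	unsigned long reserve;	/* RESV: driver reserved pfns (area A) */
	unsigned long free;	/* FREE: pfns holding the device memmap (area B) */
};

/* Matches vmem_altmap_offset() semantics: number of metadata pfns
 * sitting in front of the device-usable memory. */
static unsigned long altmap_offset(const struct altmap_sketch *a)
{
	return a->reserve + a->free;
}

/* First pfn of area C, the device-usable range; this is also the
 * end_pfn up to which memmap_init_zone() must have valid struct pages
 * (areas A + B) for zone lookups at release time to work. */
static unsigned long first_usable_pfn(const struct altmap_sketch *a)
{
	return a->base_pfn + altmap_offset(a);
}
```

The crash described above happens precisely because area A (reserve) was skipped during init while __remove_pages() later consults the struct page of BASE, the first pfn of area A.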
Re: [PATCH] mm/nvdimm: Use correct #defines instead of opencoding
On Mon, May 13, 2019 at 7:56 PM Aneesh Kumar K.V wrote: > > The nfpn related change is needed to fix the kernel message > > "number of pfns truncated from 2617344 to 163584" > > The change makes sure the nfpns stored in the superblock is right value. > > Signed-off-by: Aneesh Kumar K.V > --- > drivers/nvdimm/pfn_devs.c| 6 +++--- > drivers/nvdimm/region_devs.c | 8 > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c > index 347cab166376..6751ff0296ef 100644 > --- a/drivers/nvdimm/pfn_devs.c > +++ b/drivers/nvdimm/pfn_devs.c > @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > * when populating the vmemmap. This *should* be equal to > * PMD_SIZE for most architectures. > */ > - offset = ALIGN(start + reserve + 64 * npfns, > - max(nd_pfn->align, PMD_SIZE)) - start; > + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, > + max(nd_pfn->align, PMD_SIZE)) - start; No, I think we need to record the page-size into the superblock format otherwise this breaks in debug builds where the struct-page size is extended. > } else if (nd_pfn->mode == PFN_MODE_RAM) > offset = ALIGN(start + reserve, nd_pfn->align) - start; > else > @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) > return -ENXIO; > } > > - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; > + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; Similar comment, if the page size is variable then the superblock needs to explicitly account for it.
[PATCH] mm/nvdimm: Use correct #defines instead of opencoding
The nfpn related change is needed to fix the kernel message "number of pfns truncated from 2617344 to 163584" The change makes sure the nfpns stored in the superblock is right value. Signed-off-by: Aneesh Kumar K.V --- drivers/nvdimm/pfn_devs.c| 6 +++--- drivers/nvdimm/region_devs.c | 8 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 347cab166376..6751ff0296ef 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -777,8 +777,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) * when populating the vmemmap. This *should* be equal to * PMD_SIZE for most architectures. */ - offset = ALIGN(start + reserve + 64 * npfns, - max(nd_pfn->align, PMD_SIZE)) - start; + offset = ALIGN(start + reserve + sizeof(struct page) * npfns, + max(nd_pfn->align, PMD_SIZE)) - start; } else if (nd_pfn->mode == PFN_MODE_RAM) offset = ALIGN(start + reserve, nd_pfn->align) - start; else @@ -790,7 +790,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) return -ENXIO; } - npfns = (size - offset - start_pad - end_trunc) / SZ_4K; + npfns = (size - offset - start_pad - end_trunc) / PAGE_SIZE; pfn_sb->mode = cpu_to_le32(nd_pfn->mode); pfn_sb->dataoff = cpu_to_le64(offset); pfn_sb->npfns = cpu_to_le64(npfns); diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b4ef7d9ff22e..2d8facea5a03 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -994,10 +994,10 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, struct nd_mapping_desc *mapping = &ndr_desc->mapping[i]; struct nvdimm *nvdimm = mapping->nvdimm; - if ((mapping->start | mapping->size) % SZ_4K) { - dev_err(&nvdimm_bus->dev, "%s: %s mapping%d is not 4K aligned\n", - caller, dev_name(&nvdimm->dev), i); - + if ((mapping->start | mapping->size) % PAGE_SIZE) { + dev_err(&nvdimm_bus->dev, + "%s: %s mapping%d is not 4K aligned\n", + caller, dev_name(&nvdimm->dev), i); return NULL; } -- 
2.21.0
[PATCH] mm/nvdimm: Use correct alignment when looking at first pfn from a region
We already add the start_pad to the resource->start but fail to section align the start. This makes sure that with altmap we compute the right first pfn when start_pad is zero and we are doing an align down of the start address. Signed-off-by: Aneesh Kumar K.V --- kernel/memremap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/memremap.c b/kernel/memremap.c index a856cb5ff192..23d77b60e728 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -59,9 +59,9 @@ static unsigned long pfn_first(struct dev_pagemap *pgmap) { const struct resource *res = &pgmap->res; struct vmem_altmap *altmap = &pgmap->altmap; - unsigned long pfn; + unsigned long pfn = PHYS_PFN(res->start); - pfn = res->start >> PAGE_SHIFT; + pfn = SECTION_ALIGN_DOWN(pfn); if (pgmap->altmap_valid) pfn += vmem_altmap_offset(altmap); return pfn; -- 2.21.0
[PATCH] mm/nvdimm: Pick the right alignment default when creating dax devices
Allow arch to provide the supported alignments and use hugepage alignment only if we support hugepage. Right now we depend on compile time configs whereas this patch switch this to runtime discovery. Architectures like ppc64 can have THP enabled in code, but then can have hugepage size disabled by the hypervisor. This allows us to create dax devices with PAGE_SIZE alignment in this case. Existing dax namespace with alignment larger than PAGE_SIZE will fail to initialize in this specific case. We still allow fsdax namespace initialization. With respect to identifying whether to enable hugepage fault for a dax device, if THP is enabled during compile, we default to taking hugepage fault and in dax fault handler if we find the fault size > alignment we retry with PAGE_SIZE fault size. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/libnvdimm.h | 9 arch/powerpc/mm/Makefile | 1 + arch/powerpc/mm/nvdimm.c | 34 arch/x86/include/asm/libnvdimm.h | 19 drivers/nvdimm/nd.h | 6 - drivers/nvdimm/pfn_devs.c| 32 +- include/linux/huge_mm.h | 7 +- 7 files changed, 100 insertions(+), 8 deletions(-) create mode 100644 arch/powerpc/include/asm/libnvdimm.h create mode 100644 arch/powerpc/mm/nvdimm.c create mode 100644 arch/x86/include/asm/libnvdimm.h diff --git a/arch/powerpc/include/asm/libnvdimm.h b/arch/powerpc/include/asm/libnvdimm.h new file mode 100644 index ..d35fd7f48603 --- /dev/null +++ b/arch/powerpc/include/asm/libnvdimm.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_LIBNVDIMM_H +#define _ASM_POWERPC_LIBNVDIMM_H + +#define nd_pfn_supported_alignments nd_pfn_supported_alignments +extern unsigned long *nd_pfn_supported_alignments(void); +extern unsigned long nd_pfn_default_alignment(void); + +#endif diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile index 0f499db315d6..42e4a399ba5d 100644 --- a/arch/powerpc/mm/Makefile +++ b/arch/powerpc/mm/Makefile @@ -20,3 +20,4 @@ obj-$(CONFIG_HIGHMEM) += highmem.o 
obj-$(CONFIG_PPC_COPRO_BASE) += copro_fault.o obj-$(CONFIG_PPC_PTDUMP) += ptdump/ obj-$(CONFIG_KASAN)+= kasan/ +obj-$(CONFIG_NVDIMM_PFN) += nvdimm.o diff --git a/arch/powerpc/mm/nvdimm.c b/arch/powerpc/mm/nvdimm.c new file mode 100644 index ..a29a4510715e --- /dev/null +++ b/arch/powerpc/mm/nvdimm.c @@ -0,0 +1,34 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include + +#include +/* + * We support only pte and pmd mappings for now. + */ +const unsigned long *nd_pfn_supported_alignments(void) +{ + static unsigned long supported_alignments[3]; + + supported_alignments[0] = PAGE_SIZE; + + if (has_transparent_hugepage()) + supported_alignments[1] = HPAGE_PMD_SIZE; + else + supported_alignments[1] = 0; + + supported_alignments[2] = 0; + return supported_alignments; +} + +/* + * Use pmd mapping if supported as default alignment + */ +unsigned long nd_pfn_default_alignment(void) +{ + + if (has_transparent_hugepage()) + return HPAGE_PMD_SIZE; + return PAGE_SIZE; +} diff --git a/arch/x86/include/asm/libnvdimm.h b/arch/x86/include/asm/libnvdimm.h new file mode 100644 index ..3d5361db9164 --- /dev/null +++ b/arch/x86/include/asm/libnvdimm.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_LIBNVDIMM_H +#define _ASM_X86_LIBNVDIMM_H + +static inline unsigned long nd_pfn_default_alignment(void) +{ +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + return HPAGE_PMD_SIZE; +#else + return PAGE_SIZE; +#endif +} + +static inline unsigned long nd_altmap_align_size(unsigned long nd_align) +{ + return PMD_SIZE; +} + +#endif diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index a5ac3b240293..44fe923b2ee3 100644 --- a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -292,12 +292,6 @@ static inline struct device *nd_btt_create(struct nd_region *nd_region) struct nd_pfn *to_nd_pfn(struct device *dev); #if IS_ENABLED(CONFIG_NVDIMM_PFN) -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#define PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE -#else -#define PFN_DEFAULT_ALIGNMENT PAGE_SIZE 
-#endif - int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns); bool is_nd_pfn(struct device *dev); struct device *nd_pfn_create(struct nd_region *nd_region); diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 01f40672507f..347cab166376 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -18,6 +18,7 @@ #include #include #include +#include #include "nd-core.h" #include "pfn.h" #include "nd.h" @@ -111,6 +112,8 @@ static ssize_t align_show(struct device *dev, return sprintf(buf, "%ld\n", nd_pfn->align); } +#ifndef nd_pfn_supported_alignments +#define nd_
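The runtime-discovery idea in the patch reduces to: always offer PAGE_SIZE, offer the PMD size only when has_transparent_hugepage() says THP is actually usable, and pick the default accordingly. A sketch with illustrative ppc64-style sizes (64K pages, 16M PMD hugepages); `thp_usable` stands in for has_transparent_hugepage():

```c
#define SK_PAGE_SIZE 65536UL		/* illustrative: 64K pages */
#define SK_HPAGE_PMD_SIZE (16UL << 20)	/* illustrative: 16M PMD hugepage */

/* Mirrors nd_pfn_supported_alignments(): a zero-terminated list that
 * only advertises the PMD size when THP is usable at runtime. */
static void supported_alignments(unsigned long out[3], int thp_usable)
{
	out[0] = SK_PAGE_SIZE;
	out[1] = thp_usable ? SK_HPAGE_PMD_SIZE : 0;
	out[2] = 0;
}

/* Mirrors nd_pfn_default_alignment(): prefer PMD mappings when possible,
 * fall back to PAGE_SIZE when the hypervisor disabled hugepages. */
static unsigned long default_alignment(int thp_usable)
{
	return thp_usable ? SK_HPAGE_PMD_SIZE : SK_PAGE_SIZE;
}
```

This is what lets a ppc64 guest with THP compiled in but disabled by the hypervisor still create PAGE_SIZE-aligned dax devices, as the commit message describes.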
[RFC PATCH] mm/nvdimm: Fix kernel crash on devm_memremap_pages_release
When we initialize the namespace, if we support altmap, we don't initialize all the backing struct pages, whereas while releasing the namespace we look at some of these uninitialized struct pages. This results in a kernel crash as below. kernel BUG at include/linux/mm.h:1034! cpu 0x2: Vector: 700 (Program Check) at [c0024146b870] pc: c03788f8: devm_memremap_pages_release+0x258/0x3a0 lr: c03788f4: devm_memremap_pages_release+0x254/0x3a0 sp: c0024146bb00 msr: 8282b033 current = 0xc00241382f00 paca= 0xc0003fffd680 irqmask: 0x03 irq_happened: 0x01 pid = 4114, comm = ndctl c09bf8c0 devm_action_release+0x30/0x50 c09c0938 release_nodes+0x268/0x2d0 c09b95b4 device_release_driver_internal+0x164/0x230 c09b638c unbind_store+0x13c/0x190 c09b4f44 drv_attr_store+0x44/0x60 c058ccc0 sysfs_kf_write+0x70/0xa0 c058b52c kernfs_fop_write+0x1ac/0x290 c04a415c __vfs_write+0x3c/0x70 c04a85ac vfs_write+0xec/0x200 c04a8920 ksys_write+0x80/0x130 c000bee4 system_call+0x5c/0x70 Signed-off-by: Aneesh Kumar K.V --- mm/page_alloc.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 59661106da16..892eabe1ec13 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5740,8 +5740,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, #ifdef CONFIG_ZONE_DEVICE /* -* Honor reservation requested by the driver for this ZONE_DEVICE -* memory. We limit the total number of pages to initialize to just +* We limit the total number of pages to initialize to just * those that might contain the memory mapping. We will defer the * ZONE_DEVICE page initialization until after we have released * the hotplug lock. @@ -5750,8 +5749,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, if (!altmap) return; - if (start_pfn == altmap->base_pfn) - start_pfn += altmap->reserve; end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); } #endif -- 2.21.0
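The behavioural change in the patch is one line: for an altmap range, memmap_init_zone() no longer skips the driver-reserved pfns. Reduced to a sketch (pfn values in the usage are illustrative):

```c
/* Start pfn of the struct pages memmap_init_zone() initializes for a
 * ZONE_DEVICE altmap range, before and after the patch. */
static unsigned long init_start_pfn(unsigned long start_pfn,
				    unsigned long base_pfn,
				    unsigned long reserve,
				    int skip_reserved)
{
	if (skip_reserved && start_pfn == base_pfn)
		return start_pfn + reserve; /* old: reserved struct pages left uninitialized */
	return start_pfn;		    /* patched: reserved area initialized too */
}
```

With the skip in place, the struct page for the very first pfn of the range never gets its zone set, which is what the release path later trips over.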
Re: [PATCH] vsprintf: Do not break early boot with probing addresses
On (05/14/19 11:07), Sergey Senozhatsky wrote: > How about this: > > if ptr < PAGE_SIZE -> "(null)" No, this is totally stupid. Forget about it. Sorry. > if IS_ERR_VALUE(ptr)-> "(fault)" But Steven's "(fault)" is nice. -ss
[v2 2/2] [PowerPC] Allow use of SIMD in interrupts from kernel code
This second patch is separate because it could be wrong, like I am not sure about how kernel thread migration works, and it is even allowing simd in preemptible kernel code. Signed-off-by: Shawn Landden --- arch/powerpc/include/asm/switch_to.h | 15 ++- arch/powerpc/kernel/process.c| 60 ++-- 2 files changed, 59 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 5b03d8a82..c79f7d24a 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -30,10 +30,7 @@ extern void enable_kernel_fp(void); extern void flush_fp_to_thread(struct task_struct *); extern void giveup_fpu(struct task_struct *); extern void save_fpu(struct task_struct *); -static inline void disable_kernel_fp(void) -{ - msr_check_and_clear(MSR_FP); -} +extern void disable_kernel_fp(void); #else static inline void save_fpu(struct task_struct *t) { } static inline void flush_fp_to_thread(struct task_struct *t) { } @@ -44,10 +41,7 @@ extern void enable_kernel_altivec(void); extern void flush_altivec_to_thread(struct task_struct *); extern void giveup_altivec(struct task_struct *); extern void save_altivec(struct task_struct *); -static inline void disable_kernel_altivec(void) -{ - msr_check_and_clear(MSR_VEC); -} +extern void disable_kernel_altivec(void); #else static inline void save_altivec(struct task_struct *t) { } static inline void __giveup_altivec(struct task_struct *t) { } @@ -56,10 +50,7 @@ static inline void __giveup_altivec(struct task_struct *t) { } #ifdef CONFIG_VSX extern void enable_kernel_vsx(void); extern void flush_vsx_to_thread(struct task_struct *); -static inline void disable_kernel_vsx(void) -{ - msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX); -} +extern void disable_kernel_vsx(void); #endif #ifdef CONFIG_SPE diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index ef534831f..d15f2cb51 100644 --- a/arch/powerpc/kernel/process.c +++ 
b/arch/powerpc/kernel/process.c @@ -170,6 +170,29 @@ void __msr_check_and_clear(unsigned long bits) EXPORT_SYMBOL(__msr_check_and_clear); #ifdef CONFIG_PPC_FPU +/* + * Track whether the kernel is using the FPU state + * currently. + * + * This flag is used: + * + * - by IRQ context code to potentially use the FPU + * if it's unused. + * + * - to debug kernel_fpu/altivec/vsx_begin()/end() correctness + */ +static DEFINE_PER_CPU(bool, in_kernel_fpu); + +static bool kernel_fpu_disabled(void) +{ +return this_cpu_read(in_kernel_fpu); +} + +static bool interrupted_kernel_fpu_idle(void) +{ +return !kernel_fpu_disabled(); +} + static void __giveup_fpu(struct task_struct *tsk) { unsigned long msr; @@ -230,7 +253,8 @@ void enable_kernel_fp(void) { unsigned long cpumsr; - WARN_ON(preemptible()); +WARN_ON_ONCE(this_cpu_read(in_kernel_fpu)); +this_cpu_write(in_kernel_fpu, true); cpumsr = msr_check_and_set(MSR_FP); @@ -251,6 +275,15 @@ void enable_kernel_fp(void) } EXPORT_SYMBOL(enable_kernel_fp); +void disable_kernel_fp(void) +{ +WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu)); +this_cpu_write(in_kernel_fpu, false); + +msr_check_and_clear(MSR_FP); +} +EXPORT_SYMBOL(disable_kernel_fp); + static int restore_fp(struct task_struct *tsk) { if (tsk->thread.load_fp || tm_active_with_fp(tsk)) { @@ -295,7 +328,8 @@ void enable_kernel_altivec(void) { unsigned long cpumsr; - WARN_ON(preemptible()); + WARN_ON_ONCE(this_cpu_read(in_kernel_fpu)); + this_cpu_write(in_kernel_fpu, true); cpumsr = msr_check_and_set(MSR_VEC); @@ -316,6 +350,14 @@ void enable_kernel_altivec(void) } EXPORT_SYMBOL(enable_kernel_altivec); +extern void disable_kernel_altivec(void) +{ + WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu)); + this_cpu_write(in_kernel_fpu, false); + msr_check_and_clear(MSR_VEC); +} +EXPORT_SYMBOL(disable_kernel_altivec); + /* * Make sure the VMX/Altivec register state in the * the thread_struct is up to date for task tsk. 
@@ -371,7 +413,8 @@ static bool interrupted_user_mode(void) bool may_use_simd(void) { return !in_interrupt() || - interrupted_user_mode(); + interrupted_user_mode() || + interrupted_kernel_fpu_idle(); } EXPORT_SYMBOL(may_use_simd); @@ -411,7 +454,8 @@ void enable_kernel_vsx(void) { unsigned long cpumsr; - WARN_ON(preemptible()); + WARN_ON_ONCE(this_cpu_read(in_kernel_fpu)); + this_cpu_write(in_kernel_fpu, true); cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX); @@ -433,6 +477,14 @@ void enable_kernel_vsx(void) } EXPORT_SYMBOL(enable_kernel_vsx); +void disable_kernel_vsx(void) +{ + WARN_ON_ONCE(!this_cpu_read(in_kernel_fpu)); + this_cpu_write(in_kernel_fpu, false); + msr_check_and_clear(MSR_FP|MSR_VEC|MSR_
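The per-CPU in_kernel_fpu flag introduced above turns the enable/disable pairs into a checkable begin/end protocol: begin must not nest, end must follow a begin, and an interrupt may borrow the unit only while the flag is clear. A single-CPU userspace sketch of that state machine (a plain bool stands in for the per-CPU variable, and error returns stand in for the patch's WARN_ON_ONCE):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for DEFINE_PER_CPU(bool, in_kernel_fpu). */
static bool in_kernel_fpu;

/* Mirrors enable_kernel_fp(): a nested begin is a bug. */
static int kernel_fpu_begin(void)
{
	if (in_kernel_fpu)
		return -1;      /* would trip WARN_ON_ONCE in the patch */
	in_kernel_fpu = true;
	/* msr_check_and_set(MSR_FP) would go here */
	return 0;
}

/* Mirrors disable_kernel_fp(): an end without a begin is also a bug. */
static int kernel_fpu_end(void)
{
	if (!in_kernel_fpu)
		return -1;
	in_kernel_fpu = false;
	/* msr_check_and_clear(MSR_FP) would go here */
	return 0;
}

/* An interrupt may use the FPU only while no kernel user holds it. */
static bool interrupted_kernel_fpu_idle(void)
{
	return !in_kernel_fpu;
}
```

The same pairing applies to the altivec and vsx variants in the patch; only the MSR bits differ.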
[v2 1/2] [PowerPC] Add simd.h implementation
Based off the x86 one. WireGuard really wants to be able to do SIMD in interrupts, so it can accelerate its in-bound path. Signed-off-by: Shawn Landden --- arch/powerpc/include/asm/simd.h | 10 ++ arch/powerpc/kernel/process.c | 30 ++ 2 files changed, 40 insertions(+) create mode 100644 arch/powerpc/include/asm/simd.h diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h new file mode 100644 index 0..9b066d633 --- /dev/null +++ b/arch/powerpc/include/asm/simd.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +/* + * may_use_simd - whether it is allowable at this time to issue SIMD + *instructions or access the SIMD register file + * + * It's always ok in process context (ie "not interrupt") + * but it is sometimes ok even from an irq. + */ +extern bool may_use_simd(void); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index dd9e0d538..ef534831f 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -345,6 +345,36 @@ static int restore_altivec(struct task_struct *tsk) } return 0; } + +/* + * Were we in user mode when we were + * interrupted? + * + * Doing kernel_altivec/vsx_begin/end() is ok if we are running + * in an interrupt context from user mode - we'll just + * save the FPU state as required. + */ +static bool interrupted_user_mode(void) +{ + struct pt_regs *regs = get_irq_regs(); + + return regs && user_mode(regs); +} + +/* + * Can we use FPU in kernel mode with the + * whole "kernel_fpu/altivec/vsx_begin/end()" sequence? + * + * It's always ok in process context (ie "not interrupt") + * but it is sometimes ok even from an irq. + */ +bool may_use_simd(void) +{ + return !in_interrupt() || + interrupted_user_mode(); +} +EXPORT_SYMBOL(may_use_simd); + #else #define loadvec(thr) 0 static inline int restore_altivec(struct task_struct *tsk) { return 0; } -- 2.21.0.1020.gf2820cf01a
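The predicate added above reduces to two inputs: are we in an interrupt, and if so, did the interrupt arrive from user mode (whose FP/VMX state can simply be saved)? A sketch with those inputs made explicit parameters (in the kernel they come from in_interrupt() and get_irq_regs(), not from arguments):

```c
#include <assert.h>
#include <stdbool.h>

/* Decision table of may_use_simd(): process context is always
 * fine; an irq is fine only if it interrupted user mode. */
static bool may_use_simd_model(bool in_interrupt, bool regs_from_user_mode)
{
	return !in_interrupt || regs_from_user_mode;
}
```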
Re: [PATCH] vsprintf: Do not break early boot with probing addresses
On (05/13/19 14:42), Petr Mladek wrote: > > The "(null)" is good enough by itself and already an established > > practice.. > > (efault) made more sense with the probe_kernel_read() that > checked wide range of addresses. Well, I still think that > it makes sense to distinguish a pure NULL. And it still > used also for IS_ERR_VALUE(). Wouldn't anything within first PAGE_SIZE bytes be reported as a NULL deref? char *p = (char *)(PAGE_SIZE - 2); *p = 'a'; gives kernel: BUG: kernel NULL pointer dereference, address = 0ffe kernel: #PF: supervisor-privileged write access from kernel code kernel: #PF: error_code(0x0002) - not-present page And I like Steven's "(fault)" idea. How about this: if ptr < PAGE_SIZE -> "(null)" if IS_ERR_VALUE(ptr)-> "(fault)" -ss
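The proposed scheme can be sketched as a pure classification function: pointers in the first page print "(null)", values in the errno range print "(fault)", everything else is dereferenced as usual. A userspace sketch (PAGE_SIZE and the IS_ERR_VALUE() cutoff follow the usual kernel definitions; classify_ptr itself is a hypothetical name, not the vsprintf.c helper):

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096UL
/* Matches the kernel's IS_ERR_VALUE(): the top 4095 values encode errnos. */
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-4095)

static const char *classify_ptr(const void *ptr)
{
	unsigned long p = (unsigned long)ptr;

	if (p < PAGE_SIZE)
		return "(null)";   /* NULL page: would fault as a NULL deref */
	if (IS_ERR_VALUE(p))
		return "(fault)";  /* ERR_PTR()-style encoded errno */
	return NULL;               /* plausibly valid, dereference it */
}
```

Note how this covers the PAGE_SIZE - 2 example from the mail: it is below PAGE_SIZE, so it is reported the same way the page-fault handler reports it, as a NULL dereference.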
[PATCH 2/2] [PowerPC] Allow use of SIMD in interrupts from kernel code
This second patch is separate because it could be wrong, like I am not sure about how kernel thread migration works, and it is even allowing simd in preemptible kernel code. Signed-off-by: Shawn Landden --- arch/powerpc/include/asm/simd.h | 8 + arch/powerpc/include/asm/switch_to.h | 10 ++ arch/powerpc/kernel/process.c| 50 ++-- 3 files changed, 57 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h index 2c02ad531..7b582b07e 100644 --- a/arch/powerpc/include/asm/simd.h +++ b/arch/powerpc/include/asm/simd.h @@ -7,7 +7,15 @@ * It's always ok in process context (ie "not interrupt") * but it is sometimes ok even from an irq. */ +#ifdef CONFIG_ALTIVEC +extern bool irq_simd_usable(void); static __must_check inline bool may_use_simd(void) { return irq_simd_usable(); } +#else +static inline bool may_use_simd(void) +{ + return false; +} +#endif diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 5b03d8a82..537998997 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -44,10 +44,7 @@ extern void enable_kernel_altivec(void); extern void flush_altivec_to_thread(struct task_struct *); extern void giveup_altivec(struct task_struct *); extern void save_altivec(struct task_struct *); -static inline void disable_kernel_altivec(void) -{ - msr_check_and_clear(MSR_VEC); -} +extern void disable_kernel_altivec(void); #else static inline void save_altivec(struct task_struct *t) { } static inline void __giveup_altivec(struct task_struct *t) { } @@ -56,10 +53,7 @@ static inline void __giveup_altivec(struct task_struct *t) { } #ifdef CONFIG_VSX extern void enable_kernel_vsx(void); extern void flush_vsx_to_thread(struct task_struct *); -static inline void disable_kernel_vsx(void) -{ - msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX); -} +extern void disable_kernel_vsx(void); #endif #ifdef CONFIG_SPE diff --git a/arch/powerpc/kernel/process.c 
b/arch/powerpc/kernel/process.c index e436d708a..41a0ab500 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -267,6 +267,29 @@ static int restore_fp(struct task_struct *tsk) { return 0; } #ifdef CONFIG_ALTIVEC #define loadvec(thr) ((thr).load_vec) +/* + * Track whether the kernel is using the SIMD state + * currently. + * + * This flag is used: + * + * - by IRQ context code to potentially use the FPU + * if it's unused. + * + * - to debug kernel_altivec/vsx_begin()/end() correctness + */ +static DEFINE_PER_CPU(bool, in_kernel_simd); + +static bool kernel_simd_disabled(void) +{ + return this_cpu_read(in_kernel_simd); +} + +static bool interrupted_kernel_simd_idle(void) +{ + return !kernel_simd_disabled(); +} + static void __giveup_altivec(struct task_struct *tsk) { unsigned long msr; @@ -295,7 +318,9 @@ void enable_kernel_altivec(void) { unsigned long cpumsr; - WARN_ON(preemptible()); + WARN_ON_ONCE(preemptible()); + WARN_ON_ONCE(this_cpu_read(in_kernel_simd)); + this_cpu_write(in_kernel_simd, true); cpumsr = msr_check_and_set(MSR_VEC); @@ -316,6 +341,14 @@ void enable_kernel_altivec(void) } EXPORT_SYMBOL(enable_kernel_altivec); +extern void disable_kernel_altivec(void) +{ + WARN_ON_ONCE(!this_cpu_read(in_kernel_simd)); + this_cpu_write(in_kernel_simd, false); + msr_check_and_clear(MSR_VEC); +} +EXPORT_SYMBOL(disable_kernel_altivec); + /* * Make sure the VMX/Altivec register state in the * the thread_struct is up to date for task tsk. 
@@ -371,7 +404,8 @@ static bool interrupted_user_mode(void) bool irq_simd_usable(void) { return !in_interrupt() || - interrupted_user_mode(); + interrupted_user_mode() || + interrupted_kernel_simd_idle(); } EXPORT_SYMBOL(irq_simd_usable); @@ -411,7 +445,9 @@ void enable_kernel_vsx(void) { unsigned long cpumsr; - WARN_ON(preemptible()); + WARN_ON_ONCE(preemptible()); + WARN_ON_ONCE(this_cpu_read(in_kernel_simd)); + this_cpu_write(in_kernel_simd, true); cpumsr = msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX); @@ -433,6 +469,14 @@ void enable_kernel_vsx(void) } EXPORT_SYMBOL(enable_kernel_vsx); +void disable_kernel_vsx(void) +{ + WARN_ON_ONCE(!this_cpu_read(in_kernel_simd)); + this_cpu_write(in_kernel_simd, false); + msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX); +} +EXPORT_SYMBOL(disable_kernel_vsx); + void flush_vsx_to_thread(struct task_struct *tsk) { if (tsk->thread.regs) { -- 2.21.0.1020.gf2820cf01a
[PATCH 1/2] [PowerPC] Add simd.h implementation
Based off the x86 one. WireGuard really wants to be able to do SIMD in interrupts, so it can accelerate its in-bound path. Signed-off-by: Shawn Landden --- arch/powerpc/include/asm/simd.h | 13 + arch/powerpc/kernel/process.c | 30 ++ 2 files changed, 43 insertions(+) create mode 100644 arch/powerpc/include/asm/simd.h diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h new file mode 100644 index 0..2c02ad531 --- /dev/null +++ b/arch/powerpc/include/asm/simd.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +/* + * may_use_simd - whether it is allowable at this time to issue SIMD + *instructions or access the SIMD register file + * + * It's always ok in process context (ie "not interrupt") + * but it is sometimes ok even from an irq. + */ +static __must_check inline bool may_use_simd(void) +{ + return irq_simd_usable(); +} diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index dd9e0d538..e436d708a 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -345,6 +345,36 @@ static int restore_altivec(struct task_struct *tsk) } return 0; } + +/* + * Were we in user mode when we were + * interrupted? + * + * Doing kernel_altivec/vsx_begin/end() is ok if we are running + * in an interrupt context from user mode - we'll just + * save the FPU state as required. + */ +static bool interrupted_user_mode(void) +{ + struct pt_regs *regs = get_irq_regs(); + + return regs && user_mode(regs); +} + +/* + * Can we use SIMD in kernel mode with the + * whole "kernel_altivec/vsx_begin/end()" sequence? + * + * It's always ok in process context (ie "not interrupt") + * but it is sometimes ok even from an irq. + */ +bool irq_simd_usable(void) +{ + return !in_interrupt() || + interrupted_user_mode(); +} +EXPORT_SYMBOL(irq_simd_usable); + #else #define loadvec(thr) 0 static inline int restore_altivec(struct task_struct *tsk) { return 0; } -- 2.21.0.1020.gf2820cf01a
Re: Kernel OOPS followed by a panic on next20190507 with 4K page size
On 5/8/19 4:30 PM, Sachin Sant wrote: While running LTP tests (specifically futex_wake04) against next-20199597 build with 4K page size on a POWER8 LPAR following crash is observed. [ 4233.214876] BUG: Kernel NULL pointer dereference at 0x001c [ 4233.214898] Faulting instruction address: 0xc1d1e58c [ 4233.214905] Oops: Kernel access of bad area, sig: 11 [#1] [ 4233.214911] LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 4233.214920] Dumping ftrace buffer: [ 4233.214928](ftrace buffer empty) [ 4233.214933] Modules linked in: overlay rpadlpar_io rpaphp iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc kvm iptable_filter pseries_rng rng_core vmx_crypto ip_tables x_tables autofs4 [last unloaded: dummy_del_mod] [ 4233.214973] CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: GW O 5.1.0-next-20190507-autotest #1 [ 4233.214980] NIP: c1d1e58c LR: c1d1e54c CTR: [ 4233.214987] REGS: c4937890 TRAP: 0300 Tainted: GW O (5.1.0-next-20190507-autotest) [ 4233.214993] MSR: 80009033 CR: 22424822 XER: [ 4233.215005] CFAR: c183e9e0 DAR: 001c DSISR: 4000 IRQMASK: 0 [ 4233.215005] GPR00: c1901a80 c4937b20 c3938700 [ 4233.215005] GPR04: 00400cc0 0003efff 00027966e000 c3ba8700 [ 4233.215005] GPR08: c3ba8700 0d601125 c3ba8700 8000 [ 4233.215005] GPR12: 22424822 c0001ecae280 [ 4233.215005] GPR16: [ 4233.215005] GPR20: 0018 c39e2d30 c39e2d28 c002762da460 [ 4233.215005] GPR24: 001c 0001 c1901a80 [ 4233.215005] GPR28: 00400cc0 00400cc0 [ 4233.215065] NIP [c1d1e58c] kmem_cache_alloc+0xbc/0x5a0 [ 4233.215071] LR [c1d1e54c] kmem_cache_alloc+0x7c/0x5a0 [ 4233.215075] Call Trace: [ 4233.215081] [c4937b20] [c1c91150] __pud_alloc+0x160/0x200 (unreliable) [ 4233.215090] [c4937b80] [c1901a80] huge_pte_alloc+0x580/0x950 [ 4233.215098] [c4937c00] [c1cf7910] hugetlb_fault+0x9a0/0x1250 [ 4233.215106] [c4937ce0] [c1c94a80] handle_mm_fault+0x490/0x4a0 [ 4233.215114] [c4937d20] [c18d529c] 
__do_page_fault+0x77c/0x1f00 [ 4233.215121] [c4937e00] [c18d6a48] do_page_fault+0x28/0x50 [ 4233.215129] [c4937e20] [c183b0d4] handle_page_fault+0x18/0x38 [ 4233.215135] Instruction dump: [ 4233.215139] 39290001 f92ac1b0 419e009c 3ce20027 3ba0 e927c1f0 39290001 f927c1f0 [ 4233.215149] 3d420027 e92ac290 39290001 f92ac290 <8359001c> 83390018 6000 3ce20027 I did send a patch to the list to handle page allocation failures in this patch. But i guess what we are finding here is get_current() crashing. Any chance to bisect this? -aneesh
[PATCH] powerpc/mm: Handle page table allocation failures
This fixes the below crash, which arises due to not handling page table allocation failures while allocating the hugetlb page table. BUG: Kernel NULL pointer dereference at 0x001c Faulting instruction address: 0xc1d1e58c Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 3 PID: 4635 Comm: futex_wake04 Tainted: GW O 5.1.0-next-20190507-autotest #1 NIP: c1d1e58c LR: c1d1e54c CTR: REGS: c4937890 TRAP: 0300 Tainted: GW O (5.1.0-next-20190507-autotest) MSR: 80009033 CR: 22424822 XER: CFAR: c183e9e0 DAR: 001c DSISR: 4000 IRQMASK: 0 GPR00: c1901a80 c4937b20 c3938700 GPR04: 00400cc0 0003efff 00027966e000 c3ba8700 GPR08: c3ba8700 0d601125 c3ba8700 8000 GPR12: 22424822 c0001ecae280 GPR16: GPR20: 0018 c39e2d30 c39e2d28 c002762da460 GPR24: 001c 0001 c1901a80 GPR28: 00400cc0 00400cc0 NIP [c1d1e58c] kmem_cache_alloc+0xbc/0x5a0 LR [c1d1e54c] kmem_cache_alloc+0x7c/0x5a0 Call Trace: [c1c91150] __pud_alloc+0x160/0x200 (unreliable) [c1901a80] huge_pte_alloc+0x580/0x950 [c1cf7910] hugetlb_fault+0x9a0/0x1250 [c1c94a80] handle_mm_fault+0x490/0x4a0 [c18d529c] __do_page_fault+0x77c/0x1f00 [c18d6a48] do_page_fault+0x28/0x50 [c183b0d4] handle_page_fault+0x18/0x38 Fixes: e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format") Reported-by: Sachin Sant Signed-off-by: Aneesh Kumar K.V --- Note: I did add a recent commit for the Fixes tag. But in reality we never checked for page table allocation failure there. If we want to go to that old commit, then we may need. 
Fixes: a4fe3ce7699b ("powerpc/mm: Allow more flexible layouts for hugepage pagetables") arch/powerpc/mm/hugetlbpage.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index c5c9ff2d7afc..ae9d71da5219 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -130,6 +130,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz } else { pdshift = PUD_SHIFT; pu = pud_alloc(mm, pg, addr); + if (!pu) + return NULL; if (pshift == PUD_SHIFT) return (pte_t *)pu; else if (pshift > PMD_SHIFT) { @@ -138,6 +140,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz } else { pdshift = PMD_SHIFT; pm = pmd_alloc(mm, pu, addr); + if (!pm) + return NULL; if (pshift == PMD_SHIFT) /* 16MB hugepage */ return (pte_t *)pm; @@ -154,12 +158,16 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz } else { pdshift = PUD_SHIFT; pu = pud_alloc(mm, pg, addr); + if (!pu) + return NULL; if (pshift >= PUD_SHIFT) { ptl = pud_lockptr(mm, pu); hpdp = (hugepd_t *)pu; } else { pdshift = PMD_SHIFT; pm = pmd_alloc(mm, pu, addr); + if (!pm) + return NULL; ptl = pmd_lockptr(mm, pm); hpdp = (hugepd_t *)pm; } -- 2.21.0
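The fix applies one pattern at every level of the pud -> pmd -> pte walk: an allocator can return NULL, and that NULL must be returned to the caller instead of being passed to the next level. A toy model of that pattern with an injectable allocation failure (names are illustrative; freeing on the partial-failure paths is omitted for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fail the Nth allocation to simulate memory pressure; 0 = never fail. */
static int fail_at, alloc_count;

static void *try_alloc(void)
{
	if (++alloc_count == fail_at)
		return NULL;
	return malloc(sizeof(long));
}

/* Mirrors huge_pte_alloc()'s walk: check each level, as the patch
 * adds, rather than handing a NULL table to the next allocator. */
static void *walk_alloc(void)
{
	void *pud, *pmd, *pte;

	pud = try_alloc();
	if (!pud)
		return NULL;
	pmd = try_alloc();
	if (!pmd)
		return NULL;
	pte = try_alloc();
	return pte; /* a NULL here propagates to the caller too */
}

/* Test helper: reset the counters and run one walk. */
static void *walk_alloc_failing_at(int n)
{
	fail_at = n;
	alloc_count = 0;
	return walk_alloc();
}
```

Before the patch, the missing checks after pud_alloc()/pmd_alloc() meant a NULL pud was handed straight to the next level, producing the 0x1c (offsetof-sized) NULL dereference in the crash above.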
[Bug 203597] kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage
https://bugzilla.kernel.org/show_bug.cgi?id=203597 --- Comment #1 from Erhard F. (erhar...@mailbox.org) --- Created attachment 282745 --> https://bugzilla.kernel.org/attachment.cgi?id=282745&action=edit bisect.log -- You are receiving this mail because: You are watching the assignee of the bug.
[Bug 203597] New: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage
https://bugzilla.kernel.org/show_bug.cgi?id=203597 Bug ID: 203597 Summary: kernel 4.9.175 fails to boot on a PowerMac G4 3,6 at early stage Product: Platform Specific/Hardware Version: 2.5 Kernel Version: 4.9.175 Hardware: PPC-32 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: PPC-32 Assignee: platform_ppc...@kernel-bugs.osdl.org Reporter: erhar...@mailbox.org Regression: No Created attachment 282743 --> https://bugzilla.kernel.org/attachment.cgi?id=282743&action=edit kernel .config (PowerMac G4 MDD) Trying out older kernels on the G4 MDD I noticed recent 4.9.x freeze the machine. Only message displayed in black letters on a white screen: done found display : /pco@f000/ATY,AlteracParent@10/ATY,Alterac_B@1, opening... It's a hard freeze, RCU_CPU_STALL_TIMEOUT does not kick in. Tried other stable kernels, which all worked: 4.19.37 4.14.114 4.4.179 So I suppose it's only a 4.9.x issue. Last working 4.9.x kernel I had in service was 4.9.161. First one I spotted freezing was 4.9.171. A bisect gave me the following commit: 1c38a84d45862be06ac418618981631eddbda741 is the first bad commit commit 1c38a84d45862be06ac418618981631eddbda741 Author: Michael Neuling Date: Thu Apr 11 21:45:59 2019 +1000 powerpc: Avoid code patching freed init sections commit 51c3c62b58b357e8d35e4cc32f7b4ec907426fe3 upstream. This stops us from doing code patching in init sections after they've been freed. In this chain: kvm_guest_init() -> kvm_use_magic_page() -> fault_in_pages_readable() -> __get_user() -> __get_user_nocheck() -> barrier_nospec(); We have a code patching location at barrier_nospec() and kvm_guest_init() is an init function. This whole chain gets inlined, so when we free the init section (hence kvm_guest_init()), this code goes away and hence should no longer be patched. 
We seen this as userspace memory corruption when using a memory checker while doing partition migration testing on powervm (this starts the code patching post migration via /sys/kernel/mobility/migration). In theory, it could also happen when using /sys/kernel/debug/powerpc/barrier_nospec. Cc: sta...@vger.kernel.org # 4.13+ Signed-off-by: Michael Neuling Reviewed-by: Nicholas Piggin Reviewed-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin
Re: [PATCH] powerpc/powernv/npu: Fix reference leak
Michael, Any comments on this patch ? Should I repost with a shorter comment as suggested by Alexey ? Cheers, -- Greg On Mon, 29 Apr 2019 12:36:59 +0200 Greg Kurz wrote: > On Mon, 29 Apr 2019 16:01:29 +1000 > Alexey Kardashevskiy wrote: > > > On 20/04/2019 01:34, Greg Kurz wrote: > > > Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). > > > This > > > has the effect of incrementing the reference count of the PCI device, as > > > explained in drivers/pci/search.c: > > > > > > * Given a PCI domain, bus, and slot/function number, the desired PCI > > > * device is located in the list of PCI devices. If the device is > > > * found, its reference count is increased and this function returns a > > > * pointer to its data structure. The caller must decrement the > > > * reference count by calling pci_dev_put(). If no device is found, > > > * %NULL is returned. > > > > > > Nothing was done to call pci_dev_put() and the reference count of GPU and > > > NPU PCI devices rockets up. > > > > > > A natural way to fix this would be to teach the callers about the change, > > > so that they call pci_dev_put() when done with the pointer. This turns > > > out to be quite intrusive, as it affects many paths in npu-dma.c, > > > pci-ioda.c and vfio_pci_nvlink2.c. > > > > > > afaict this referencing is only done to protect the current traverser > > and what you've done is actually a natural way (and the generic > > pci_get_dev_by_id() does exactly the same), although this looks a bit weird. > > > > Not exactly the same: pci_get_dev_by_id() always increment the refcount > of the returned PCI device. The refcount is only decremented when this > device is passed to pci_get_dev_by_id() to continue searching. > > That means that the users of the PCI device pointer returned by > pci_get_dev_by_id() or its exported variants pci_get_subsys(), > pci_get_device() and pci_get_class() do handle the refcount. 
They > all pass the pointer to pci_dev_put() or continue the search, > which calls pci_dev_put() internally. > > Direct and indirect callers of get_pci_dev() don't care for the > refcount at all unless I'm missing something. > > > > > > Also, the issue appeared in 4.16 and > > > some affected code got moved around since then: it would be problematic > > > to backport the fix to stable releases. > > > > > > All that code never cared for reference counting anyway. Call > > > pci_dev_put() > > > from get_pci_dev() to revert to the previous behavior. > > >> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev > > from pci_dn") > > > Cc: sta...@vger.kernel.org # v4.16 > > > Signed-off-by: Greg Kurz > > > --- > > > arch/powerpc/platforms/powernv/npu-dma.c | 15 ++- > > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c > > > b/arch/powerpc/platforms/powernv/npu-dma.c > > > index e713ade30087..d8f3647e8fb2 100644 > > > --- a/arch/powerpc/platforms/powernv/npu-dma.c > > > +++ b/arch/powerpc/platforms/powernv/npu-dma.c > > > @@ -31,9 +31,22 @@ static DEFINE_SPINLOCK(npu_context_lock); > > > static struct pci_dev *get_pci_dev(struct device_node *dn) > > > { > > > struct pci_dn *pdn = PCI_DN(dn); > > > + struct pci_dev *pdev; > > > > > > - return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus), > > > + pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus), > > > pdn->busno, pdn->devfn); > > > + > > > + /* > > > + * pci_get_domain_bus_and_slot() increased the reference count of > > > + * the PCI device, but callers don't need that actually as the PE > > > + * already holds a reference to the device. > > > > Imho this would be just enough. > > > > Anyway, > > > > Reviewed-by: Alexey Kardashevskiy > > > > Thanks ! > > I now realize that I forgot to add the --cc option for stable on my stgit > command line :-\. > > Cc'ing now. > > > > > How did you find it? 
:) > > > > While reading code to find some inspiration for OpenCAPI passthrough. :) > > I saw the following in vfio_pci_ibm_npu2_init(): > > if (!pnv_pci_get_gpu_dev(vdev->pdev)) > return -ENODEV; > > and simply followed the function calls. > > > > > > Since callers aren't > > > + * aware of the reference count change, call pci_dev_put() now to > > > + * avoid leaks. > > > + */ > > > + if (pdev) > > > + pci_dev_put(pdev); > > > + > > > + return pdev; > > > } > > > > > > /* Given a NPU device get the associated PCI device. */ > > > > > >
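The leak and the fix discussed above can be modeled with a toy reference counter: the lookup helper raises the count, its callers never drop it, so the fixed helper drops the extra reference itself, relying on the reference the PE already holds. A sketch under those assumptions (struct and function names are invented for illustration, not the PCI API):

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
	int refcount;
};

static struct toy_dev dev0 = { .refcount = 1 }; /* PE holds one ref */

/* Stand-in for pci_get_domain_bus_and_slot(): a successful lookup
 * returns the device with its reference count raised. */
static struct toy_dev *toy_get(struct toy_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

/* Stand-in for pci_dev_put(). */
static void toy_put(struct toy_dev *d)
{
	if (d)
		d->refcount--;
}

/* The fixed get_pci_dev(): callers are unaware of refcounting, so
 * drop the extra reference before handing back the pointer. */
static struct toy_dev *get_dev_fixed(struct toy_dev *d)
{
	struct toy_dev *found = toy_get(d);

	if (found)
		toy_put(found);
	return found;
}
```

Without the put, every lookup leaves the count one higher, which is exactly the "refcount rockets up" symptom in the changelog.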
Re: [PATCH v3 4/6] dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding
On Mon, 13 May 2019 11:14:58 +, Rasmus Villemoes wrote: > Reading table 4-30, and its footnotes, of the QUICC Engine Block > Reference Manual shows that the set of snum _values_ is not > necessarily just a function of the _number_ of snums, as given in the > fsl,qe-num-snums property. > > As an alternative, to make it easier to add support for other variants > of the QUICC engine IP, this introduces a new binding fsl,qe-snums, > which automatically encodes both the number of snums and the actual > values to use. > > Signed-off-by: Rasmus Villemoes > --- > Rob, thanks for the review of v2. However, since I moved the example > from the commit log to the binding (per Joakim's request), I didn't > add a Reviewed-by tag for this revision. > > .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > Reviewed-by: Rob Herring
Re: [PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties
On 5/13/2019 12:40 PM, Joakim Tjernlund wrote: > On Mon, 2019-05-13 at 16:09 +, Roy Pledge wrote: >> The index value should be passed to the of_parse_phandle() >> function to ensure the correct property is read. > Is this a bug fix? Maybe for stable too? > > Jocke Yes this could go to stable. I will include sta...@vger.kernel.org when I send the next version. > >> Signed-off-by: Roy Pledge >> --- >> drivers/soc/fsl/qbman/dpaa_sys.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c >> b/drivers/soc/fsl/qbman/dpaa_sys.c >> index 3e0a7f3..0b901a8 100644 >> --- a/drivers/soc/fsl/qbman/dpaa_sys.c >> +++ b/drivers/soc/fsl/qbman/dpaa_sys.c >> @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, >> dma_addr_t *addr, >> idx, ret); >> return -ENODEV; >> } >> - mem_node = of_parse_phandle(dev->of_node, "memory-region", 0); >> + mem_node = of_parse_phandle(dev->of_node, "memory-region", idx); >> if (mem_node) { >> ret = of_property_read_u64(mem_node, "size", &size64); >> if (ret) { >> -- >> 2.7.4 >> >
Re: [PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties
On Mon, 2019-05-13 at 16:09 +, Roy Pledge wrote: > > The index value should be passed to the of_parse_phandle() > function to ensure the correct property is read. Is this a bug fix? Maybe for stable too? Jocke > > Signed-off-by: Roy Pledge > --- > drivers/soc/fsl/qbman/dpaa_sys.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c > b/drivers/soc/fsl/qbman/dpaa_sys.c > index 3e0a7f3..0b901a8 100644 > --- a/drivers/soc/fsl/qbman/dpaa_sys.c > +++ b/drivers/soc/fsl/qbman/dpaa_sys.c > @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, > dma_addr_t *addr, > idx, ret); > return -ENODEV; > } > - mem_node = of_parse_phandle(dev->of_node, "memory-region", 0); > + mem_node = of_parse_phandle(dev->of_node, "memory-region", idx); > if (mem_node) { > ret = of_property_read_u64(mem_node, "size", &size64); > if (ret) { > -- > 2.7.4 >
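The bug has a simple shape: a lookup function that accepts an index but hard-codes entry 0, so every caller after the first reads the wrong phandle. A toy model of the indexed property lookup (the array stands in for a multi-entry "memory-region" property; values are arbitrary):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a "memory-region" property holding several phandles. */
static const int memory_regions[] = { 100, 200, 300 };
#define NUM_REGIONS (sizeof(memory_regions) / sizeof(memory_regions[0]))

static int parse_phandle(const int *prop, size_t len, size_t idx)
{
	if (idx >= len)
		return -1; /* of_parse_phandle() would return NULL */
	return prop[idx];
}

/* Buggy caller: always reads entry 0, whatever idx was passed in. */
static int lookup_buggy(size_t idx)
{
	(void)idx;
	return parse_phandle(memory_regions, NUM_REGIONS, 0);
}

/* Fixed caller: forwards idx, as the patch does. */
static int lookup_fixed(size_t idx)
{
	return parse_phandle(memory_regions, NUM_REGIONS, idx);
}
```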
Re: [PATCH 2/8] powerpc/pseries: Do not save the previous DTL mask value
"Naveen N. Rao" writes: > When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is enabled, we always initialize > DTL enable mask to DTL_LOG_PREEMPT (0x2). There are no other places > where the mask is changed. As such, when reading the DTL log buffer > through debugfs, there is no need to save and restore the previous mask > value. > > We don't need to save and restore the earlier mask value if > CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not enabled. So, remove the field > from the structure as well. Makes sense. Acked-by: Nathan Lynch
[PATCH v1 8/8] soc/fsl/qbman: Update device tree with reserved memory
When using the reserved memory node in the device tree there are two options - dynamic or static. If a dynamic allocation was selected (where the kernel selects the address for the allocation) convert it to a static allocation by inserting the reg property. This will ensure the same memory is reused after a kexec() Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/dpaa_sys.c | 58 1 file changed, 35 insertions(+), 23 deletions(-) diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c index 0b901a8..9dd8bb5 100644 --- a/drivers/soc/fsl/qbman/dpaa_sys.c +++ b/drivers/soc/fsl/qbman/dpaa_sys.c @@ -37,41 +37,53 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr, size_t *size) { - int ret; struct device_node *mem_node; - u64 size64; struct reserved_mem *rmem; + struct property *prop; + int len, err; + __be32 *res_array; - ret = of_reserved_mem_device_init_by_idx(dev, dev->of_node, idx); - if (ret) { - dev_err(dev, - "of_reserved_mem_device_init_by_idx(%d) failed 0x%x\n", - idx, ret); - return -ENODEV; - } mem_node = of_parse_phandle(dev->of_node, "memory-region", idx); - if (mem_node) { - ret = of_property_read_u64(mem_node, "size", &size64); - if (ret) { - dev_err(dev, "of_address_to_resource fails 0x%x\n", - ret); - return -ENODEV; - } - *size = size64; - } else { + if (!mem_node) { dev_err(dev, "No memory-region found for index %d\n", idx); return -ENODEV; } rmem = of_reserved_mem_lookup(mem_node); + if (!rmem) { + dev_err(dev, "of_reserved_mem_lookup() returned NULL\n"); + return -ENODEV; + } *addr = rmem->base; + *size = rmem->size; /* -* Disassociate the reserved memory area from the device -* because a device can only have one DMA memory area. This -* should be fine since the memory is allocated and initialized -* and only ever accessed by the QBMan device from now on +* Check if the reg property exists - if not insert the node +* so upon kexec() the same memory region address will be preserved. 
+* This is needed because QBMan HW does not allow the base address/ +* size to be modified once set. */ - of_reserved_mem_device_release(dev); + prop = of_find_property(mem_node, "reg", &len); + if (!prop) { + prop = devm_kzalloc(dev, sizeof(*prop), GFP_KERNEL); + if (!prop) + return -ENOMEM; + prop->value = res_array = devm_kzalloc(dev, sizeof(__be32) * 4, + GFP_KERNEL); + if (!prop->value) + return -ENOMEM; + res_array[0] = cpu_to_be32(upper_32_bits(*addr)); + res_array[1] = cpu_to_be32(lower_32_bits(*addr)); + res_array[2] = cpu_to_be32(upper_32_bits(*size)); + res_array[3] = cpu_to_be32(lower_32_bits(*size)); + prop->length = sizeof(__be32) * 4; + prop->name = devm_kstrdup(dev, "reg", GFP_KERNEL); + if (!prop->name) + return -ENOMEM; + err = of_add_property(mem_node, prop); + if (err) + return err; + } + return 0; } -- 2.7.4
[PATCH v1 7/8] soc/fsl/qbman: Fixup qman_shutdown_fq()
When shutting down a FQ on a dedicated channel only the SW portal associated with that channel can dequeue from it. Make sure the correct portal is used. Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/qman.c | 53 +++- 1 file changed, 42 insertions(+), 11 deletions(-) diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c index 4a99ce5..bf68d86 100644 --- a/drivers/soc/fsl/qbman/qman.c +++ b/drivers/soc/fsl/qbman/qman.c @@ -1018,6 +1018,20 @@ static inline void put_affine_portal(void) put_cpu_var(qman_affine_portal); } + +static inline struct qman_portal *get_portal_for_channel(u16 channel) +{ + int i; + + for (i = 0; i < num_possible_cpus(); i++) { + if (affine_portals[i] && + affine_portals[i]->config->channel == channel) + return affine_portals[i]; + } + + return NULL; +} + static struct workqueue_struct *qm_portal_wq; int qman_dqrr_set_ithresh(struct qman_portal *portal, u8 ithresh) @@ -2601,7 +2615,7 @@ static int _qm_dqrr_consume_and_match(struct qm_portal *p, u32 fqid, int s, int qman_shutdown_fq(u32 fqid) { - struct qman_portal *p; + struct qman_portal *p, *channel_portal; struct device *dev; union qm_mc_command *mcc; union qm_mc_result *mcr; @@ -2641,17 +2655,28 @@ int qman_shutdown_fq(u32 fqid) channel = qm_fqd_get_chan(&mcr->queryfq.fqd); wq = qm_fqd_get_wq(&mcr->queryfq.fqd); + if (channel < qm_channel_pool1) { + channel_portal = get_portal_for_channel(channel); + if (channel_portal == NULL) { + dev_err(dev, "Can't find portal for dedicated channel 0x%x\n", + channel); + ret = -EIO; + goto out; + } + } else + channel_portal = p; + switch (state) { case QM_MCR_NP_STATE_TEN_SCHED: case QM_MCR_NP_STATE_TRU_SCHED: case QM_MCR_NP_STATE_ACTIVE: case QM_MCR_NP_STATE_PARKED: orl_empty = 0; - mcc = qm_mc_start(&p->p); + mcc = qm_mc_start(&channel_portal->p); qm_fqid_set(&mcc->fq, fqid); - qm_mc_commit(&p->p, QM_MCC_VERB_ALTER_RETIRE); - if (!qm_mc_result_timeout(&p->p, &mcr)) { - dev_err(dev, "QUERYFQ_NP timeout\n"); + 
qm_mc_commit(&channel_portal->p, QM_MCC_VERB_ALTER_RETIRE); + if (!qm_mc_result_timeout(&channel_portal->p, &mcr)) { + dev_err(dev, "ALTER_RETIRE timeout\n"); ret = -ETIMEDOUT; goto out; } @@ -2659,6 +2684,9 @@ int qman_shutdown_fq(u32 fqid) QM_MCR_VERB_ALTER_RETIRE); res = mcr->result; /* Make a copy as we reuse MCR below */ + if (res == QM_MCR_RESULT_OK) + drain_mr_fqrni(&channel_portal->p); + if (res == QM_MCR_RESULT_PENDING) { /* * Need to wait for the FQRN in the message ring, which @@ -2688,21 +2716,25 @@ int qman_shutdown_fq(u32 fqid) } /* Set the sdqcr to drain this channel */ if (channel < qm_channel_pool1) - qm_dqrr_sdqcr_set(&p->p, + qm_dqrr_sdqcr_set(&channel_portal->p, QM_SDQCR_TYPE_ACTIVE | QM_SDQCR_CHANNELS_DEDICATED); else - qm_dqrr_sdqcr_set(&p->p, + qm_dqrr_sdqcr_set(&channel_portal->p, QM_SDQCR_TYPE_ACTIVE | QM_SDQCR_CHANNELS_POOL_CONV (channel)); do { /* Keep draining DQRR while checking the MR*/ - qm_dqrr_drain_nomatch(&p->p); + qm_dqrr_drain_nomatch(&channel_portal->p); /* Process message ring too */ - found_fqrn = qm_mr_drain(&p->p, FQRN); + found_fqrn = qm_mr_drain(&channel_portal->p, +FQRN); cpu_relax(); } while (!found_fqrn); + /* Restore SDQCR */ + qm_dqrr_sdqcr_set(&channel_portal->p, + channel_portal->sdqcr); } if (res != QM_MCR_RESULT_OK && @@ -2733,9 +2765,8 @@ int qman_shutdown_fq(u32 fqid)
[PATCH v1 6/8] soc/fsl/qbman: Disable interrupts during portal recovery
Disable the QBMan interrupts during recovery. Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/qman.c | 22 +++--- drivers/soc/fsl/qbman/qman_ccsr.c | 1 + drivers/soc/fsl/qbman/qman_priv.h | 1 + 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c index 2989504..4a99ce5 100644 --- a/drivers/soc/fsl/qbman/qman.c +++ b/drivers/soc/fsl/qbman/qman.c @@ -1070,6 +1070,20 @@ int qman_wq_alloc(void) return 0; } + +void qman_enable_irqs(void) +{ + int i; + + for (i = 0; i < num_possible_cpus(); i++) { + if (affine_portals[i]) { + qm_out(&affine_portals[i]->p, QM_REG_ISR, 0x); + qm_out(&affine_portals[i]->p, QM_REG_IIR, 0); + } + + } +} + /* * This is what everything can wait on, even if it migrates to a different cpu * to the one whose affine portal it is waiting on. @@ -1269,8 +1283,8 @@ static int qman_create_portal(struct qman_portal *portal, qm_out(p, QM_REG_ISDR, isdr); portal->irq_sources = 0; qm_out(p, QM_REG_IER, 0); - qm_out(p, QM_REG_ISR, 0x); snprintf(portal->irqname, MAX_IRQNAME, IRQNAME, c->cpu); + qm_out(p, QM_REG_IIR, 1); if (request_irq(c->irq, portal_isr, 0, portal->irqname, portal)) { dev_err(c->dev, "request_irq() failed\n"); goto fail_irq; @@ -1290,7 +1304,7 @@ static int qman_create_portal(struct qman_portal *portal, isdr &= ~(QM_PIRQ_DQRI | QM_PIRQ_MRI); qm_out(p, QM_REG_ISDR, isdr); if (qm_dqrr_current(p)) { - dev_err(c->dev, "DQRR unclean\n"); + dev_dbg(c->dev, "DQRR unclean\n"); qm_dqrr_cdc_consume_n(p, 0x); } if (qm_mr_current(p) && drain_mr_fqrni(p)) { @@ -1303,8 +1317,10 @@ static int qman_create_portal(struct qman_portal *portal, } /* Success */ portal->config = c; + qm_out(p, QM_REG_ISR, 0x); qm_out(p, QM_REG_ISDR, 0); - qm_out(p, QM_REG_IIR, 0); + if (!qman_requires_cleanup()) + qm_out(p, QM_REG_IIR, 0); /* Write a sane SDQCR */ qm_dqrr_sdqcr_set(p, portal->sdqcr); return 0; diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c index 
bee2d0e..3894172 100644 --- a/drivers/soc/fsl/qbman/qman_ccsr.c +++ b/drivers/soc/fsl/qbman/qman_ccsr.c @@ -735,6 +735,7 @@ int qman_requires_cleanup(void) void qman_done_cleanup(void) { + qman_enable_irqs(); __qman_requires_cleanup = 0; } diff --git a/drivers/soc/fsl/qbman/qman_priv.h b/drivers/soc/fsl/qbman/qman_priv.h index bf51c17..9d37ddd 100644 --- a/drivers/soc/fsl/qbman/qman_priv.h +++ b/drivers/soc/fsl/qbman/qman_priv.h @@ -272,3 +272,4 @@ int qman_shutdown_fq(u32 fqid); int qman_requires_cleanup(void); void qman_done_cleanup(void); +void qman_enable_irqs(void); -- 2.7.4
[PATCH v1 3/8] soc/fsl/qbman: Cleanup QMan queues if device was already initialized
If the QMan device was previously initialized make sure all the frame queues are out of service once all the portals are probed. This handles the case where the kernel is restarted without the SoC being reset (kexec for example) Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/qman.c| 4 ++-- drivers/soc/fsl/qbman/qman_ccsr.c | 13 - drivers/soc/fsl/qbman/qman_portal.c | 18 +- drivers/soc/fsl/qbman/qman_priv.h | 7 +++ 4 files changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c index 636f83f..f10f77d 100644 --- a/drivers/soc/fsl/qbman/qman.c +++ b/drivers/soc/fsl/qbman/qman.c @@ -2581,7 +2581,7 @@ static int _qm_dqrr_consume_and_match(struct qm_portal *p, u32 fqid, int s, #define qm_dqrr_drain_nomatch(p) \ _qm_dqrr_consume_and_match(p, 0, 0, false) -static int qman_shutdown_fq(u32 fqid) +int qman_shutdown_fq(u32 fqid) { struct qman_portal *p; struct device *dev; @@ -2754,7 +2754,7 @@ static int qman_shutdown_fq(u32 fqid) DPAA_ASSERT((mcr->verb & QM_MCR_VERB_MASK) == QM_MCR_VERB_ALTER_OOS); - if (mcr->result) { + if (mcr->result != QM_MCR_RESULT_OK) { dev_err(dev, "OOS fail: FQ 0x%x (0x%x)\n", fqid, mcr->result); ret = -EIO; diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c index d054e37..bee2d0e 100644 --- a/drivers/soc/fsl/qbman/qman_ccsr.c +++ b/drivers/soc/fsl/qbman/qman_ccsr.c @@ -483,7 +483,7 @@ RESERVEDMEM_OF_DECLARE(qman_pfdr, "fsl,qman-pfdr", qman_pfdr); #endif -static unsigned int qm_get_fqid_maxcnt(void) +unsigned int qm_get_fqid_maxcnt(void) { return fqd_sz / 64; } @@ -728,6 +728,17 @@ int qman_is_probed(void) } EXPORT_SYMBOL_GPL(qman_is_probed); +int qman_requires_cleanup(void) +{ + return __qman_requires_cleanup; +} + +void qman_done_cleanup(void) +{ + __qman_requires_cleanup = 0; +} + + static int fsl_qman_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; diff --git a/drivers/soc/fsl/qbman/qman_portal.c 
b/drivers/soc/fsl/qbman/qman_portal.c index 2460802..8295d75 100644 --- a/drivers/soc/fsl/qbman/qman_portal.c +++ b/drivers/soc/fsl/qbman/qman_portal.c @@ -233,7 +233,7 @@ static int qman_portal_probe(struct platform_device *pdev) struct device_node *node = dev->of_node; struct qm_portal_config *pcfg; struct resource *addr_phys[2]; - int irq, cpu, err; + int irq, cpu, err, i; u32 val; err = qman_is_probed(); @@ -327,6 +327,22 @@ static int qman_portal_probe(struct platform_device *pdev) if (!cpu_online(cpu)) qman_offline_cpu(cpu); + if (__qman_portals_probed == 1 && qman_requires_cleanup()) { + /* +* QMan wasn't reset prior to boot (Kexec for example) +* Empty all the frame queues so they are in reset state +*/ + for (i = 0; i < qm_get_fqid_maxcnt(); i++) { + err = qman_shutdown_fq(i); + if (err) { + dev_err(dev, "Failed to shutdown frame queue %d\n", + i); + goto err_portal_init; + } + } + qman_done_cleanup(); + } + return 0; err_portal_init: diff --git a/drivers/soc/fsl/qbman/qman_priv.h b/drivers/soc/fsl/qbman/qman_priv.h index 75a8f90..bf51c17 100644 --- a/drivers/soc/fsl/qbman/qman_priv.h +++ b/drivers/soc/fsl/qbman/qman_priv.h @@ -265,3 +265,10 @@ extern struct qman_portal *affine_portals[NR_CPUS]; extern struct qman_portal *qman_dma_portal; const struct qm_portal_config *qman_get_qm_portal_config( struct qman_portal *portal); + +unsigned int qm_get_fqid_maxcnt(void); + +int qman_shutdown_fq(u32 fqid); + +int qman_requires_cleanup(void); +void qman_done_cleanup(void); -- 2.7.4
[PATCH v1 5/8] soc/fsl/qbman: Fix drain_mr_fqrni()
The drain_mr_fqrni() function may be called from an uninterruptible context, so convert the msleep() to an mdelay(). Also ensure that the valid bit is updated while polling. Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/qman.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c index f10f77d..2989504 100644 --- a/drivers/soc/fsl/qbman/qman.c +++ b/drivers/soc/fsl/qbman/qman.c @@ -1164,6 +1164,7 @@ static int drain_mr_fqrni(struct qm_portal *p) { const union qm_mr_entry *msg; loop: + qm_mr_pvb_update(p); msg = qm_mr_current(p); if (!msg) { /* @@ -1180,7 +1181,8 @@ static int drain_mr_fqrni(struct qm_portal *p) * entries well before the ring has been fully consumed, so * we're being *really* paranoid here. */ - msleep(1); + mdelay(1); + qm_mr_pvb_update(p); msg = qm_mr_current(p); if (!msg) return 0; -- 2.7.4
[PATCH v1 4/8] soc/fsl/qbman: Use index when accessing device tree properties
The index value should be passed to the of_parse_phandle() function to ensure the correct property is read. Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/dpaa_sys.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c index 3e0a7f3..0b901a8 100644 --- a/drivers/soc/fsl/qbman/dpaa_sys.c +++ b/drivers/soc/fsl/qbman/dpaa_sys.c @@ -49,7 +49,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr, idx, ret); return -ENODEV; } - mem_node = of_parse_phandle(dev->of_node, "memory-region", 0); + mem_node = of_parse_phandle(dev->of_node, "memory-region", idx); if (mem_node) { ret = of_property_read_u64(mem_node, "size", &size64); if (ret) { -- 2.7.4
[PATCH v1 2/8] soc/fsl/qbman: Cleanup buffer pools if BMan was initialized prior to bootup
Clean the BMan buffer pools if the device had been initialized previously. This will ensure a consistent state if the kernel was soft restarted (kexec for example) Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/bman.c| 17 + drivers/soc/fsl/qbman/bman_ccsr.c | 10 ++ drivers/soc/fsl/qbman/bman_portal.c | 18 +- drivers/soc/fsl/qbman/bman_priv.h | 5 + 4 files changed, 41 insertions(+), 9 deletions(-) diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c index f84ab59..f4fb527 100644 --- a/drivers/soc/fsl/qbman/bman.c +++ b/drivers/soc/fsl/qbman/bman.c @@ -635,30 +635,31 @@ int bman_p_irqsource_add(struct bman_portal *p, u32 bits) return 0; } -static int bm_shutdown_pool(u32 bpid) +int bm_shutdown_pool(u32 bpid) { + int err = 0; struct bm_mc_command *bm_cmd; union bm_mc_result *bm_res; + + struct bman_portal *p = get_affine_portal(); while (1) { - struct bman_portal *p = get_affine_portal(); /* Acquire buffers until empty */ bm_cmd = bm_mc_start(&p->p); bm_cmd->bpid = bpid; bm_mc_commit(&p->p, BM_MCC_VERB_CMD_ACQUIRE | 1); if (!bm_mc_result_timeout(&p->p, &bm_res)) { - put_affine_portal(); pr_crit("BMan Acquire Command timedout\n"); - return -ETIMEDOUT; + err = -ETIMEDOUT; + goto done; } if (!(bm_res->verb & BM_MCR_VERB_ACQUIRE_BUFCOUNT)) { - put_affine_portal(); /* Pool is empty */ - return 0; + goto done; } - put_affine_portal(); } - +done: + put_affine_portal(); return 0; } diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c index dc6d7e5..cb24a08 100644 --- a/drivers/soc/fsl/qbman/bman_ccsr.c +++ b/drivers/soc/fsl/qbman/bman_ccsr.c @@ -195,6 +195,16 @@ int bman_is_probed(void) } EXPORT_SYMBOL_GPL(bman_is_probed); +int bman_requires_cleanup(void) +{ + return __bman_requires_cleanup; +} + +void bman_done_cleanup(void) +{ + __bman_requires_cleanup = 0; +} + static int fsl_bman_probe(struct platform_device *pdev) { int ret, err_irq; diff --git a/drivers/soc/fsl/qbman/bman_portal.c 
b/drivers/soc/fsl/qbman/bman_portal.c index 7819bc2..6c1a734 100644 --- a/drivers/soc/fsl/qbman/bman_portal.c +++ b/drivers/soc/fsl/qbman/bman_portal.c @@ -100,7 +100,7 @@ static int bman_portal_probe(struct platform_device *pdev) struct device_node *node = dev->of_node; struct bm_portal_config *pcfg; struct resource *addr_phys[2]; - int irq, cpu, err; + int irq, cpu, err, i; err = bman_is_probed(); if (!err) @@ -180,6 +180,22 @@ static int bman_portal_probe(struct platform_device *pdev) if (!cpu_online(cpu)) bman_offline_cpu(cpu); + if (__bman_portals_probed == 1 && bman_requires_cleanup()) { + /* +* BMan wasn't reset prior to boot (Kexec for example) +* Empty all the buffer pools so they are in reset state +*/ + for (i = 0; i < BM_POOL_MAX; i++) { + err = bm_shutdown_pool(i); + if (err) { + dev_err(dev, "Failed to shutdown bpool %d\n", + i); + goto err_portal_init; + } + } + bman_done_cleanup(); + } + return 0; err_portal_init: diff --git a/drivers/soc/fsl/qbman/bman_priv.h b/drivers/soc/fsl/qbman/bman_priv.h index 751ce90..aa3981e 100644 --- a/drivers/soc/fsl/qbman/bman_priv.h +++ b/drivers/soc/fsl/qbman/bman_priv.h @@ -76,3 +76,8 @@ int bman_p_irqsource_add(struct bman_portal *p, u32 bits); const struct bm_portal_config * bman_get_bm_portal_config(const struct bman_portal *portal); + +int bman_requires_cleanup(void); +void bman_done_cleanup(void); + +int bm_shutdown_pool(u32 bpid); -- 2.7.4
[PATCH v1 1/8] soc/fsl/qbman: Rework QBMan private memory setup
Rework QBMan private memory setup so that the areas are not zeroed if the device was previously initialized If the QMan private memory was already initialized skip the PFDR initialization. Signed-off-by: Roy Pledge --- drivers/soc/fsl/qbman/bman_ccsr.c | 26 -- drivers/soc/fsl/qbman/dpaa_sys.c | 7 +++--- drivers/soc/fsl/qbman/qman_ccsr.c | 45 ++- 3 files changed, 67 insertions(+), 11 deletions(-) diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c index 7c3cc96..dc6d7e5 100644 --- a/drivers/soc/fsl/qbman/bman_ccsr.c +++ b/drivers/soc/fsl/qbman/bman_ccsr.c @@ -97,17 +97,40 @@ static void bm_get_version(u16 *id, u8 *major, u8 *minor) /* signal transactions for FBPRs with higher priority */ #define FBPR_AR_RPRIO_HI BIT(30) -static void bm_set_memory(u64 ba, u32 size) +/* Track if probe has occurred and if cleanup is required */ +static int __bman_probed; +static int __bman_requires_cleanup; + + +static int bm_set_memory(u64 ba, u32 size) { + u32 bar, bare; u32 exp = ilog2(size); /* choke if size isn't within range */ DPAA_ASSERT(size >= 4096 && size <= 1024*1024*1024 && is_power_of_2(size)); /* choke if '[e]ba' has lower-alignment than 'size' */ DPAA_ASSERT(!(ba & (size - 1))); + + /* Check to see if BMan has already been initialized */ + bar = bm_ccsr_in(REG_FBPR_BAR); + if (bar) { + /* Maker sure ba == what was programmed) */ + bare = bm_ccsr_in(REG_FBPR_BARE); + if (bare != upper_32_bits(ba) || bar != lower_32_bits(ba)) { + pr_err("Attempted to reinitialize BMan with different BAR, got 0x%llx read BARE=0x%x BAR=0x%x\n", + ba, bare, bar); + return -ENOMEM; + } + pr_info("BMan BAR already configured\n"); + __bman_requires_cleanup = 1; + return 1; + } + bm_ccsr_out(REG_FBPR_BARE, upper_32_bits(ba)); bm_ccsr_out(REG_FBPR_BAR, lower_32_bits(ba)); bm_ccsr_out(REG_FBPR_AR, exp - 1); + return 0; } /* @@ -120,7 +143,6 @@ static void bm_set_memory(u64 ba, u32 size) */ static dma_addr_t fbpr_a; static size_t fbpr_sz; -static int __bman_probed; 
static int bman_fbpr(struct reserved_mem *rmem) { diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c index e6d48dc..3e0a7f3 100644 --- a/drivers/soc/fsl/qbman/dpaa_sys.c +++ b/drivers/soc/fsl/qbman/dpaa_sys.c @@ -40,6 +40,7 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr, int ret; struct device_node *mem_node; u64 size64; + struct reserved_mem *rmem; ret = of_reserved_mem_device_init_by_idx(dev, dev->of_node, idx); if (ret) { @@ -62,10 +63,8 @@ int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr, return -ENODEV; } - if (!dma_alloc_coherent(dev, *size, addr, 0)) { - dev_err(dev, "DMA Alloc memory failed\n"); - return -ENODEV; - } + rmem = of_reserved_mem_lookup(mem_node); + *addr = rmem->base; /* * Disassociate the reserved memory area from the device diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c b/drivers/soc/fsl/qbman/qman_ccsr.c index 109b38d..d054e37 100644 --- a/drivers/soc/fsl/qbman/qman_ccsr.c +++ b/drivers/soc/fsl/qbman/qman_ccsr.c @@ -274,6 +274,7 @@ static u32 __iomem *qm_ccsr_start; /* A SDQCR mask comprising all the available/visible pool channels */ static u32 qm_pools_sdqcr; static int __qman_probed; +static int __qman_requires_cleanup; static inline u32 qm_ccsr_in(u32 offset) { @@ -340,19 +341,46 @@ static void qm_get_version(u16 *id, u8 *major, u8 *minor) } #define PFDR_AR_EN BIT(31) -static void qm_set_memory(enum qm_memory memory, u64 ba, u32 size) +static int qm_set_memory(enum qm_memory memory, u64 ba, u32 size) { + void *ptr; u32 offset = (memory == qm_memory_fqd) ? 
REG_FQD_BARE : REG_PFDR_BARE; u32 exp = ilog2(size); + u32 bar, bare; /* choke if size isn't within range */ DPAA_ASSERT((size >= 4096) && (size <= 1024*1024*1024) && is_power_of_2(size)); /* choke if 'ba' has lower-alignment than 'size' */ DPAA_ASSERT(!(ba & (size - 1))); + + /* Check to see if QMan has already been initialized */ + bar = qm_ccsr_in(offset + REG_offset_BAR); + if (bar) { + /* Maker sure ba == what was programmed) */ + bare = qm_ccsr_in(offset); + if (bare != upper_32_bits(ba) || bar != lower_32_bits(ba)) { + pr_err("Attempted to reinitialize QMan with different BAR, got 0x%llx read BARE=0x%x BAR=0x%x\n", + ba, bare, bar); +
[PATCH v1 0/8] soc/fsl/qbman: Enable Kexec for DPAA1 devices
Most DPAA1 devices do not support a soft reset which is an issue if Kexec starts a new kernel. This patch series allows Kexec to function by detecting that the QBMan device was previously initialized. The patches fix some issues with device cleanup as well as ensuring that the location of the QBMan private memories has not changed after the execution of the Kexec. Roy Pledge (8): soc/fsl/qbman: Rework QBMan private memory setup soc/fsl/qbman: Cleanup buffer pools if BMan was initialized prior to bootup soc/fsl/qbman: Cleanup QMan queues if device was already initialized soc/fsl/qbman: Use index when accessing device tree properties soc/fsl/qbman: Fix drain_mr_fqni() soc/fsl/qbman: Disable interrupts during portal recovery soc/fsl/qbman: Fixup qman_shutdown_fq() soc/fsl/qbman: Update device tree with reserved memory drivers/soc/fsl/qbman/bman.c| 17 drivers/soc/fsl/qbman/bman_ccsr.c | 36 +++- drivers/soc/fsl/qbman/bman_portal.c | 18 +++- drivers/soc/fsl/qbman/bman_priv.h | 5 +++ drivers/soc/fsl/qbman/dpaa_sys.c| 63 drivers/soc/fsl/qbman/qman.c| 83 + drivers/soc/fsl/qbman/qman_ccsr.c | 59 +++--- drivers/soc/fsl/qbman/qman_portal.c | 18 +++- drivers/soc/fsl/qbman/qman_priv.h | 8 9 files changed, 246 insertions(+), 61 deletions(-) -- 2.7.4
Re: [PATCH 1/8] powerpc/pseries: Use macros for referring to the DTL enable mask
"Naveen N. Rao" writes: > Introduce macros to encode the DTL enable mask fields and use those > instead of hardcoding numbers. This is a good cleanup on its own. Acked-by: Nathan Lynch
[PATCH v2] powerpc/boot: pass CONFIG options in a simpler and more robust way
Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper") was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures with -j 1") was also wrong. The correct dependency is: $(obj)/serial.o: $(obj)/autoconf.h However, I do not see the reason why we need to copy autoconf.h to arch/powerpc/boot/. Nor do I see consistency in the way of passing CONFIG options. decompress.c references CONFIG_KERNEL_GZIP and CONFIG_KERNEL_XZ, which are passed via the command line. serial.c includes autoconf.h to reference a couple of CONFIG options, but this is fragile because we often forget to include "autoconf.h" from source files. In fact, it is already broken. ppc_asm.h references CONFIG_PPC_8xx, but utils.S is not given any way to access CONFIG options. So, CONFIG_PPC_8xx is never defined here. Pass $(LINUXINCLUDE) to make sure CONFIG options are accessible from all .c and .S files in arch/powerpc/boot/. I also removed the -traditional flag to make include/linux/kconfig.h work. This flag makes the preprocessor imitate the behavior of the pre-standard C compiler, but I do not understand why it is necessary. 
Signed-off-by: Masahiro Yamada --- Changes in v2: - reword commit log arch/powerpc/boot/.gitignore | 2 -- arch/powerpc/boot/Makefile | 14 +++--- arch/powerpc/boot/serial.c | 1 - 3 files changed, 3 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore index 32034a0c..6610665 100644 --- a/arch/powerpc/boot/.gitignore +++ b/arch/powerpc/boot/.gitignore @@ -44,5 +44,3 @@ fdt_sw.c fdt_wip.c libfdt.h libfdt_internal.h -autoconf.h - diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 73d1f35..b8a82be 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -20,9 +20,6 @@ all: $(obj)/zImage -compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP -compress-$(CONFIG_KERNEL_XZ) := CONFIG_KERNEL_XZ - ifdef CROSS32_COMPILE BOOTCC := $(CROSS32_COMPILE)gcc BOOTAR := $(CROSS32_COMPILE)ar @@ -34,7 +31,7 @@ endif BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \ -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ --D$(compress-y) +$(LINUXINCLUDE) ifdef CONFIG_PPC64_BOOT_WRAPPER BOOTCFLAGS += -m64 @@ -51,7 +48,7 @@ BOOTCFLAGS+= -mlittle-endian BOOTCFLAGS += $(call cc-option,-mabi=elfv2) endif -BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc +BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc BOOTARFLAGS:= -cr$(KBUILD_ARFLAGS) @@ -202,14 +199,9 @@ $(obj)/empty.c: $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S $(Q)cp $< $@ -$(srctree)/$(src)/serial.c: $(obj)/autoconf.h - -$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/% - $(Q)cp $< $@ - clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \ $(zlib-decomp-) $(libfdt) $(libfdtheader) \ - autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds + empty.c zImage.coff.lds zImage.ps3.lds zImage.lds quiet_cmd_bootcc = BOOTCC $@ cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< diff 
--git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c index b0491b8..9457863 100644 --- a/arch/powerpc/boot/serial.c +++ b/arch/powerpc/boot/serial.c @@ -18,7 +18,6 @@ #include "stdio.h" #include "io.h" #include "ops.h" -#include "autoconf.h" static int serial_open(void) { -- 2.7.4
Re: [PATCH] vsprintf: Do not break early boot with probing addresses
On Mon, 13 May 2019 14:42:20 +0200 Petr Mladek wrote: > > The "(null)" is good enough by itself and already an established > > practice.. > > (efault) made more sense with the probe_kernel_read() that > checked wide range of addresses. Well, I still think that > it makes sense to distinguish a pure NULL. And it still > used also for IS_ERR_VALUE(). Why not just "(fault)"? That is self descriptive enough. -- Steve
Re: [PATCH v6 1/1] iommu: enhance IOMMU dma mode build options
On 2019/5/8 17:42, John Garry wrote: > On 18/04/2019 14:57, Zhen Lei wrote: >> First, add build option IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the >> opportunity to set {lazy|strict} mode as default at build time. Then put >> the three config options in an choice, make people can only choose one of >> the three at a time. >> >> The default IOMMU dma modes on each ARCHs have no change. >> >> Signed-off-by: Zhen Lei >> --- >> arch/ia64/kernel/pci-dma.c| 2 +- >> arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++- >> arch/s390/pci/pci_dma.c | 2 +- >> arch/x86/kernel/pci-dma.c | 7 ++--- >> drivers/iommu/Kconfig | 44 >> ++- >> drivers/iommu/amd_iommu_init.c| 3 ++- >> drivers/iommu/intel-iommu.c | 2 +- >> drivers/iommu/iommu.c | 3 ++- >> 8 files changed, 48 insertions(+), 18 deletions(-) >> >> diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c >> index fe988c49f01ce6a..655511dbf3c3b34 100644 >> --- a/arch/ia64/kernel/pci-dma.c >> +++ b/arch/ia64/kernel/pci-dma.c >> @@ -22,7 +22,7 @@ >> int force_iommu __read_mostly; >> #endif >> >> -int iommu_pass_through; >> +int iommu_pass_through = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH); >> >> static int __init pci_iommu_init(void) >> { >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c >> b/arch/powerpc/platforms/powernv/pci-ioda.c >> index 3ead4c237ed0ec9..383e082a9bb985c 100644 >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c >> @@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const >> char *level, >> va_end(args); >> } >> >> -static bool pnv_iommu_bypass_disabled __read_mostly; >> +static bool pnv_iommu_bypass_disabled __read_mostly = >> +!IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH); >> static bool pci_reset_phbs __read_mostly; >> >> static int __init iommu_setup(char *str) >> diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c >> index 9e52d1527f71495..784ad1e0acecfb1 100644 >> --- a/arch/s390/pci/pci_dma.c >> +++ 
b/arch/s390/pci/pci_dma.c >> @@ -17,7 +17,7 @@ >> >> static struct kmem_cache *dma_region_table_cache; >> static struct kmem_cache *dma_page_table_cache; >> -static int s390_iommu_strict; >> +static int s390_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT); >> >> static int zpci_refresh_global(struct zpci_dev *zdev) >> { >> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c >> index d460998ae828514..fb2bab42a0a3173 100644 >> --- a/arch/x86/kernel/pci-dma.c >> +++ b/arch/x86/kernel/pci-dma.c >> @@ -43,11 +43,8 @@ >> * It is also possible to disable by default in kernel config, and enable >> with >> * iommu=nopt at boot time. >> */ >> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH >> -int iommu_pass_through __read_mostly = 1; >> -#else >> -int iommu_pass_through __read_mostly; >> -#endif >> +int iommu_pass_through __read_mostly = >> +IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH); >> >> extern struct iommu_table_entry __iommu_table[], __iommu_table_end[]; >> >> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig >> index 6f07f3b21816c64..8a1f1793cde76b4 100644 >> --- a/drivers/iommu/Kconfig >> +++ b/drivers/iommu/Kconfig >> @@ -74,17 +74,47 @@ config IOMMU_DEBUGFS >>debug/iommu directory, and then populate a subdirectory with >>entries as required. >> >> -config IOMMU_DEFAULT_PASSTHROUGH >> -bool "IOMMU passthrough by default" >> +choice >> +prompt "IOMMU dma mode" > > /s/dma/DMA/ OK > > And how about add "default", as in "Default IOMMU DMA mode" or "IOMMU default > DMA mode"? Yes. I prefer "IOMMU default DMA mode". > >> depends on IOMMU_API >> -help >> - Enable passthrough by default, removing the need to pass in >> - iommu.passthrough=on or iommu=pt through command line. If this >> - is enabled, you can still disable with iommu.passthrough=off >> - or iommu=nopt depending on the architecture. 
>> +default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI) >> +default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU) >> +default IOMMU_DEFAULT_STRICT >> +help >> + This option allows IOMMU dma mode to be chose at build time, to > > again, capitalize acronyms, i.e. /s/dma/DMA/ (more of these above and below) OK, I will check it all. Thanks. > >> + override the default dma mode of each ARCHs, removing the need to >> + pass in kernel parameters through command line. You can still use >> + ARCHs specific boot options to override this option again. >> + >> +config IOMMU_DEFAULT_PASSTHROUGH > > I think that it may need to be indented, along with the other choices There is no problem. I referred to mm/Kconfig. > >> +bool "passthrough" >> +help >> + In this mode, the dma access through IOMMU withou
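Pulling John Garry's review comments together, the revised choice block would look roughly like this (a sketch based on the discussion in this thread; the final prompt text, option names and defaults belong to the author's next revision):

```kconfig
choice
	prompt "IOMMU default DMA mode"
	depends on IOMMU_API
	default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
	default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
	default IOMMU_DEFAULT_STRICT
	help
	  This option allows the IOMMU default DMA mode to be chosen at
	  build time, overriding each architecture's default, and removing
	  the need to pass kernel parameters on the command line.
	  Architecture-specific boot options can still override this choice.

config IOMMU_DEFAULT_PASSTHROUGH
	bool "passthrough"
	...
endchoice
```

The `choice` construct enforces the point made earlier in the thread: only one of the three modes can be the build-time default at a time.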
Re: [PATCH] powerpc/boot: fix broken way to pass CONFIG options
On Mon, May 13, 2019 at 11:56 PM Masahiro Yamada wrote: > > On Mon, May 13, 2019 at 9:33 PM Masahiro Yamada > wrote: > > > > Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper") > > was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures > > with -j 1") was also wrong. > > > > Check-in source files never ever depend on build artifacts. > > > > The correct dependency is: > > > > $(obj)/serial.o: $(obj)/autoconf.h > > > > However, copying autoconf.h to arch/power/boot/ is questionable > > in the first place. > > > > arch/powerpc/Makefile adopted multiple ways to pass CONFIG options. > > > > arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and > > CONFIG_KERNEL_XZ, which are passed via the command line. > > > > arch/powerpc/boot/serial.c includes the copied autoconf.h to > > reference a couple of CONFIG options. > > > > Do not do this. > > > > We should have already learned that including autoconf.h from each > > source file is really fragile. > > > > In fact, it is already broken. > > > > arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but > > arch/powerpc/boot/utils.S is not given any way to access CONFIG > > options. So, CONFIG_PPC_8xx is never defined here. > > > > Just pass $(LINUXINCLUDE) and remove all broken code. > > > > I also removed the -traditional flag to make include/linux/kconfig.h > > work. I do not understand why it needs to imitate the behavior of > > pre-standard C preprocessors. > > > > Signed-off-by: Masahiro Yamada > > --- > > > I re-read my commit log, and I thought it was needlessly > too offensive. Sorry about that. > > I will reword the commit log and send v2. > No worries. We know the bootwrapper is... not great. > > > > -- > Best Regards > Masahiro Yamada
Re: [PATCH] powerpc/boot: fix broken way to pass CONFIG options
On Mon, May 13, 2019 at 9:23 PM Masahiro Yamada wrote: > > Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper") > was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures > with -j 1") was also wrong. > > Check-in source files never ever depend on build artifacts. > > The correct dependency is: > > $(obj)/serial.o: $(obj)/autoconf.h > > However, copying autoconf.h to arch/power/boot/ is questionable > in the first place. > > arch/powerpc/Makefile adopted multiple ways to pass CONFIG options. > > arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and > CONFIG_KERNEL_XZ, which are passed via the command line. > > arch/powerpc/boot/serial.c includes the copied autoconf.h to > reference a couple of CONFIG options. > > Do not do this. > > We should have already learned that including autoconf.h from each > source file is really fragile. > > In fact, it is already broken. > > arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but > arch/powerpc/boot/utils.S is not given any way to access CONFIG > options. So, CONFIG_PPC_8xx is never defined here. > > Just pass $(LINUXINCLUDE) and remove all broken code. I'm not sure how safe this is. The original reason for the CONFIG_KERNEL_XZ hack in the makefile was because the kernel headers couldn't be included directly. The bootwrapper is compiled with a 32bit toolchain when the kernel is compiled for 64bit big endian because of older systems with broken firmware that can't load 64bit ELFs directly. When I added XZ support to the wrapper I did experiment with including the kernel headers directly and couldn't make it work reliably. I don't remember what the exact reason was, but I think it was something to do with the generated headers not always matching what you would expect when compiling for 32bit. It's also possible I was just being paranoid. Either way it's about time we found a real fix... 
The stuff in serial.c and ppc_asm.h was added later to work around other issues without anyone thinking too hard about it. Oh well. > I also removed the -traditional flag to make include/linux/kconfig.h > work. I do not understand why it needs to imitate the behavior of > pre-standard C preprocessors. I'm not sure why it's there either. The boot wrapper was re-written at some point so it might just be a hold over from the dawn of time. > Signed-off-by: Masahiro Yamada > --- > > arch/powerpc/boot/.gitignore | 2 -- > arch/powerpc/boot/Makefile | 14 +++--- > arch/powerpc/boot/serial.c | 1 - > 3 files changed, 3 insertions(+), 14 deletions(-) > > diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore > index 32034a0cc554..6610665fcf5e 100644 > --- a/arch/powerpc/boot/.gitignore > +++ b/arch/powerpc/boot/.gitignore > @@ -44,5 +44,3 @@ fdt_sw.c > fdt_wip.c > libfdt.h > libfdt_internal.h > -autoconf.h > - > diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile > index 73d1f3562978..b8a82be2af2a 100644 > --- a/arch/powerpc/boot/Makefile > +++ b/arch/powerpc/boot/Makefile > @@ -20,9 +20,6 @@ > > all: $(obj)/zImage > > -compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP > -compress-$(CONFIG_KERNEL_XZ) := CONFIG_KERNEL_XZ > - > ifdef CROSS32_COMPILE > BOOTCC := $(CROSS32_COMPILE)gcc > BOOTAR := $(CROSS32_COMPILE)ar > @@ -34,7 +31,7 @@ endif > BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ > -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \ > -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ > --D$(compress-y) > +$(LINUXINCLUDE) > > ifdef CONFIG_PPC64_BOOT_WRAPPER > BOOTCFLAGS += -m64 > @@ -51,7 +48,7 @@ BOOTCFLAGS+= -mlittle-endian > BOOTCFLAGS += $(call cc-option,-mabi=elfv2) > endif > > -BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc > +BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc > > BOOTARFLAGS:= -cr$(KBUILD_ARFLAGS) > > @@ -202,14 +199,9 @@ $(obj)/empty.c: > 
$(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: > $(srctree)/$(src)/%.S > $(Q)cp $< $@ > > -$(srctree)/$(src)/serial.c: $(obj)/autoconf.h > - > -$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/% > - $(Q)cp $< $@ > - > clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \ > $(zlib-decomp-) $(libfdt) $(libfdtheader) \ > - autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds > + empty.c zImage.coff.lds zImage.ps3.lds zImage.lds > > quiet_cmd_bootcc = BOOTCC $@ >cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< > diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c > index b0491b8c0199..9457863147f9 100644 > --- a/arch/powerpc/boot/serial.c > +++ b/arch/powerpc/boot/serial.c > @@ -18,7 +18,6 @@ > #include "stdio.h" > #include "io.h" > #include "ops.h" > -#include "autoconf.h" > > static int serial_open(void) > {
Re: [PATCH] powerpc/boot: fix broken way to pass CONFIG options
On Mon, May 13, 2019 at 9:33 PM Masahiro Yamada wrote: > > Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper") > was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures > with -j 1") was also wrong. > > Check-in source files never ever depend on build artifacts. > > The correct dependency is: > > $(obj)/serial.o: $(obj)/autoconf.h > > However, copying autoconf.h to arch/power/boot/ is questionable > in the first place. > > arch/powerpc/Makefile adopted multiple ways to pass CONFIG options. > > arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and > CONFIG_KERNEL_XZ, which are passed via the command line. > > arch/powerpc/boot/serial.c includes the copied autoconf.h to > reference a couple of CONFIG options. > > Do not do this. > > We should have already learned that including autoconf.h from each > source file is really fragile. > > In fact, it is already broken. > > arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but > arch/powerpc/boot/utils.S is not given any way to access CONFIG > options. So, CONFIG_PPC_8xx is never defined here. > > Just pass $(LINUXINCLUDE) and remove all broken code. > > I also removed the -traditional flag to make include/linux/kconfig.h > work. I do not understand why it needs to imitate the behavior of > pre-standard C preprocessors. > > Signed-off-by: Masahiro Yamada > --- I re-read my commit log, and I thought it was needlessly too offensive. Sorry about that. I will reword the commit log and send v2. -- Best Regards Masahiro Yamada
Re: [PATCH] vsprintf: Do not break early boot with probing addresses
On Mon 2019-05-13 12:13:20, Andy Shevchenko wrote: > On Mon, May 13, 2019 at 08:52:41AM +, David Laight wrote: > > From: christophe leroy > > > Sent: 10 May 2019 18:35 > > > Le 10/05/2019 à 18:24, Steven Rostedt a écrit : > > > > On Fri, 10 May 2019 10:42:13 +0200 > > > > Petr Mladek wrote: > > > > >> - if (probe_kernel_address(ptr, byte)) > > > >> + if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > > > >>return "(efault)"; > > > > "efault" looks a bit like a spelling mistake for "default". > > It's special, thus it's in parentheses, though somebody can be > misguided. > > > > Usually, < PAGE_SIZE means NULL pointer dereference (via the member of a > > > struct) > > > > Maybe the caller should pass in a short buffer so that you can return > > "(err-%d)" > > or "(null+%#x)" ? There are many vsprintf() users. I am afraid that nobody would want to define extra buffers for error messages. It must fit into the one for the formatted string. > In both cases it should be limited to the size of pointer (8 or 16 > characters). Something like "(e:%4d)" would work for error codes. I am afraid that we could get 10 different proposals from a huge enough group of people. I wanted to start with something simple (source code). I know that (efault) might be confused with "default" but it is still just one string to grep. > The "(null)" is good enough by itself and already an established > practice. (efault) made more sense with the probe_kernel_read() that checked a wide range of addresses. Well, I still think that it makes sense to distinguish a pure NULL. And it is still used for IS_ERR_VALUE() too. Best Regards, Petr
[Bug 203571] Allow use of SIMD in interrupts on PowerPC
https://bugzilla.kernel.org/show_bug.cgi?id=203571 --- Comment #2 from Shawn Landden (sland...@gmail.com) --- x86 manages to allow SIMD in interrupts through some very careful code in arch/x86/kernel/fpu/core.c (starting with irq_fpu_usable()). PowerPC can do the same. -- You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH] vsprintf: Do not break early boot with probing addresses
On Fri 2019-05-10 12:40:58, Steven Rostedt wrote: > On Fri, 10 May 2019 18:32:58 +0200 > Martin Schwidefsky wrote: > > > On Fri, 10 May 2019 12:24:01 -0400 > > Steven Rostedt wrote: > > > > > On Fri, 10 May 2019 10:42:13 +0200 > > > Petr Mladek wrote: > > > > > > > static const char *check_pointer_msg(const void *ptr) > > > > { > > > > - char byte; > > > > - > > > > if (!ptr) > > > > return "(null)"; > > > > > > > > - if (probe_kernel_address(ptr, byte)) > > > > + if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > > > > return "(efault)"; > > > > > > > > > > > > > < PAGE_SIZE ? > > > > > > do you mean: < TASK_SIZE ? > > > > The check with < TASK_SIZE would break on s390. The 'ptr' is > > in the kernel address space, *not* in the user address space. > > Remember s390 has two separate address spaces for kernel/user > > the check < TASK_SIZE only makes sense with a __user pointer. > > > > So we allow this to read user addresses? Can't that cause a fault? I did a quick check and did not find any place where a user pointer is dereferenced via vsprintf(). In each case, %s did the check (ptr < PAGE_SIZE) even before this patchset. The other checks are in %p format modifiers that are used to print various kernel structures. Finally, it accesses the pointer directly. I am not completely sure but I think that it would not work that easily with an address from the user address space. Best Regards, Petr
Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess
On Mon, May 13, 2019 at 01:50:19PM +0200, Dmitry Vyukov wrote: > > We did have some bugs in the past (~1-2 y/ago) but AFAIK they are all > > fixed now. These days I build most of my kernels with a bi-endian 64-bit > > toolchain, and switching endian without running `make clean` also works. > > For the record, yes, it turned out to be a problem in our code (a latent > bug). We actually used host (x86) gcc to build as-if ppc code that can > run on the host, so it defined neither LE nor BE macros. It just > happened to work in the past :) So Nick was right and these checks actually are useful.
Re: [PATCH RESEND] powerpc: add simd.h implementation specific to PowerPC
Shawn Landden writes: > It is safe to do SIMD in an interrupt on PowerPC. No it's not sorry :) > Only disable when there is no SIMD available > (and this is a static branch). > > Tested and works with the WireGuard (Zinc) patch I wrote that needs this. > Also improves performance of the crypto subsystem that checks this. > > Re-sending because this linuxppc-dev didn't seem to accept it. It did but you were probably moderated as a non-subscriber? In future if you just wait a while for the moderators to wake up it should come through. Though having posted once and been approved I think you might not get moderated at all in future (?). > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203571 > Signed-off-by: Shawn Landden > --- > arch/powerpc/include/asm/simd.h | 15 +++ > 1 file changed, 15 insertions(+) > create mode 100644 arch/powerpc/include/asm/simd.h > > diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h > new file mode 100644 > index 0..b3fecb95a > --- /dev/null > +++ b/arch/powerpc/include/asm/simd.h > @@ -0,0 +1,15 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > + > +#include > + > +/* > + * may_use_simd - whether it is allowable at this time to issue SIMD > + *instructions or access the SIMD register file > + * > + * As documented in Chapter 6.2.1 Machine Status Save/Restore Registers > + * of Power ISA (2.07 and 3.0), all registers are saved/restored in an > interrupt. I think the confusion here is that the ISA says: When various interrupts occur, the state of the machine is saved in the Machine Status Save/Restore registers (SRR0 and SRR1). And you've read that to mean all "the state" of the machine, ie. including GPRs, FPRs etc. But unfortunately it's not that simple. All the hardware does is write two 64-bit registers (SRR0 & SRR1) with the information required to be able to return to the state the CPU was in prior to the interrupt. That includes (obviously) the PC you were executing at, and what state the CPU was in (ie. 
64/32-bit, MMU on/off, FP on/off etc.). All the saving of registers etc. is left up to software. It's the RISC way :) I guess we need to work out why exactly may_use_simd() is returning false in your kworker. cheers
Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode
Herbert Xu writes: > On Mon, May 06, 2019 at 08:53:17AM -0700, Eric Biggers wrote: >> >> Any progress on this? Someone just reported this again here: >> https://bugzilla.kernel.org/show_bug.cgi?id=203515 > > Guys if I don't get a fix for this soon I'll have to disable CTR > in vmx. No objection from me. I'll try and debug it at some point if no one else does, but I can't make it my top priority sorry. cheers
Re: [PATCH v3 4/6] dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding
On Mon, 2019-05-13 at 11:14 +, Rasmus Villemoes wrote: > Reading table 4-30, and its footnotes, of the QUICC Engine Block > Reference Manual shows that the set of snum _values_ is not > necessarily just a function of the _number_ of snums, as given in the > fsl,qe-num-snums property. > > As an alternative, to make it easier to add support for other variants > of the QUICC engine IP, this introduces a new binding fsl,qe-snums, > which automatically encodes both the number of snums and the actual > values to use. > > Signed-off-by: Rasmus Villemoes > --- > Rob, thanks for the review of v2. However, since I moved the example > from the commit log to the binding (per Joakim's request), I didn't Thanks, looks good now. > add a Reviewed-by tag for this revision. > > .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt > b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt > index d7afaff5faff..05ec2a838c54 100644 > --- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt > +++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt > @@ -18,7 +18,8 @@ Required properties: > - reg : offset and length of the device registers. > - bus-frequency : the clock frequency for QUICC Engine. > - fsl,qe-num-riscs: define how many RISC engines the QE has. > -- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use for > the > +- fsl,qe-snums: This property has to be specified as '/bits/ 8' value, > + defining the array of serial number (SNUM) values for the virtual >threads. > > Optional properties: > @@ -34,6 +35,11 @@ Recommended properties > - brg-frequency : the internal clock source frequency for baud-rate >generators in Hz.
> > +Deprecated properties > +- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use > + for the threads. Use fsl,qe-snums instead to not only specify the > + number of snums, but also their values. > + > Example: > qe@e010 { > #address-cells = <1>; > @@ -44,6 +50,11 @@ Example: > reg = ; > brg-frequency = <0>; > bus-frequency = <179A7B00>; > + fsl,qe-snums = /bits/ 8 < > + 0x04 0x05 0x0C 0x0D 0x14 0x15 0x1C 0x1D > + 0x24 0x25 0x2C 0x2D 0x34 0x35 0x88 0x89 > + 0x98 0x99 0xA8 0xA9 0xB8 0xB9 0xC8 0xC9 > + 0xD8 0xD9 0xE8 0xE9>; > } > > * Multi-User RAM (MURAM) > -- > 2.20.1 >
Re: [RFC PATCH] powerpc/64/ftrace: mprofile-kernel patch out mflr
"Naveen N. Rao" writes:
> Michael Ellerman wrote:
>> Nicholas Piggin writes:
>>> The new mprofile-kernel mcount sequence is
>>>
>>>   mflr  r0
>>>   bl    _mcount
>>>
>>> Dynamic ftrace patches the branch instruction with a noop, but leaves
>>> the mflr. mflr is executed by the branch unit that can only execute one
>>> per cycle on POWER9 and shared with branches, so it would be nice to
>>> avoid it where possible.
>>>
>>> This patch is a hacky proof of concept to nop out the mflr. Can we do
>>> this or are there races or other issues with it?
>>
>> There's a race, isn't there?
>>
>> We have a function foo which currently has tracing disabled, so the mflr
>> and bl are nop'ed out.
>>
>>   CPU 0                  CPU 1
>>   =====                  =====
>>   bl foo
>>   nop (ie. not mflr)
>>   -> interrupt
>>   something else         enable tracing for foo
>>   ...                    patch mflr and branch
>>   <- rfi
>>   bl _mcount
>>
>> So we end up in _mcount() but with r0 not populated.
>
> Good catch! Looks like we need to patch the mflr with a "b +8" similar
> to what we do in __ftrace_make_nop().

Would that actually make it any faster though? Nick?

cheers
Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess
Dmitry Vyukov writes: > From: Arnd Bergmann > Date: Sat, May 11, 2019 at 2:51 AM > To: Dmitry Vyukov > Cc: Nick Kossifidis, Christoph Hellwig, Linus Torvalds, Andrew Morton, > linux-arch, Linux Kernel Mailing List, linuxppc-dev > >> On Fri, May 10, 2019 at 6:53 AM Dmitry Vyukov wrote: >> > > >> > > I think it's good to have a sanity check in-place for consistency. >> > >> > >> > Hi, >> > >> > This broke our cross-builds from x86. I am using: >> > >> > $ powerpc64le-linux-gnu-gcc --version >> > powerpc64le-linux-gnu-gcc (Debian 7.2.0-7) 7.2.0 >> > >> > and it says that it's little-endian somehow: >> > >> > $ powerpc64le-linux-gnu-gcc -dM -E - < /dev/null | grep BYTE_ORDER >> > #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ >> > >> > Is it broke compiler? Or I always hold it wrong? Is there some >> > additional flag I need to add? >> >> It looks like a bug in the kernel Makefiles to me. powerpc32 is always >> big-endian, >> powerpc64 used to be big-endian but is now usually little-endian. There are >> often three separate toolchains that default to the respective user >> space targets >> (ppc32be, ppc64be, ppc64le), but generally you should be able to build >> any of the >> three kernel configurations with any of those compilers, and have the >> Makefile >> pass the correct -m32/-m64/-mbig-endian/-mlittle-endian command line options >> depending on the kernel configuration. It seems that this is not happening >> here. I have not checked why, but if this is the problem, it should be >> easy enough >> to figure out. > > > Thanks! This clears a lot. > This may be a bug in our magic as we try to build kernel files outside > of make with own flags (required to extract parts of kernel > interfaces). > So don't spend time looking for the Makefile bugs yet. OK :) We did have some bugs in the past (~1-2 y/ago) but AFAIK they are all fixed now. These days I build most of my kernels with a bi-endian 64-bit toolchain, and switching endian without running `make clean` also works. 
cheers
[PATCH] powerpc/boot: fix broken way to pass CONFIG options
Commit 5e9dcb6188a4 ("powerpc/boot: Expose Kconfig symbols to wrapper")
was wrong, but commit e41b93a6be57 ("powerpc/boot: Fix build failures
with -j 1") was also wrong.

Checked-in source files never ever depend on build artifacts.

The correct dependency is:

  $(obj)/serial.o: $(obj)/autoconf.h

However, copying autoconf.h to arch/powerpc/boot/ is questionable
in the first place.

arch/powerpc/Makefile adopted multiple ways to pass CONFIG options.

arch/powerpc/boot/decompress.c references CONFIG_KERNEL_GZIP and
CONFIG_KERNEL_XZ, which are passed via the command line.

arch/powerpc/boot/serial.c includes the copied autoconf.h to
reference a couple of CONFIG options.

Do not do this.

We should have already learned that including autoconf.h from each
source file is really fragile.

In fact, it is already broken.

arch/powerpc/boot/ppc_asm.h references CONFIG_PPC_8xx, but
arch/powerpc/boot/utils.S is not given any way to access CONFIG
options. So, CONFIG_PPC_8xx is never defined here.

Just pass $(LINUXINCLUDE) and remove all broken code.

I also removed the -traditional flag to make include/linux/kconfig.h
work. I do not understand why it needs to imitate the behavior of
pre-standard C preprocessors.
Signed-off-by: Masahiro Yamada --- arch/powerpc/boot/.gitignore | 2 -- arch/powerpc/boot/Makefile | 14 +++--- arch/powerpc/boot/serial.c | 1 - 3 files changed, 3 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/boot/.gitignore b/arch/powerpc/boot/.gitignore index 32034a0cc554..6610665fcf5e 100644 --- a/arch/powerpc/boot/.gitignore +++ b/arch/powerpc/boot/.gitignore @@ -44,5 +44,3 @@ fdt_sw.c fdt_wip.c libfdt.h libfdt_internal.h -autoconf.h - diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 73d1f3562978..b8a82be2af2a 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -20,9 +20,6 @@ all: $(obj)/zImage -compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP -compress-$(CONFIG_KERNEL_XZ) := CONFIG_KERNEL_XZ - ifdef CROSS32_COMPILE BOOTCC := $(CROSS32_COMPILE)gcc BOOTAR := $(CROSS32_COMPILE)ar @@ -34,7 +31,7 @@ endif BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx \ -pipe -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \ --D$(compress-y) +$(LINUXINCLUDE) ifdef CONFIG_PPC64_BOOT_WRAPPER BOOTCFLAGS += -m64 @@ -51,7 +48,7 @@ BOOTCFLAGS+= -mlittle-endian BOOTCFLAGS += $(call cc-option,-mabi=elfv2) endif -BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc +BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -nostdinc BOOTARFLAGS:= -cr$(KBUILD_ARFLAGS) @@ -202,14 +199,9 @@ $(obj)/empty.c: $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S $(Q)cp $< $@ -$(srctree)/$(src)/serial.c: $(obj)/autoconf.h - -$(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/% - $(Q)cp $< $@ - clean-files := $(zlib-) $(zlibheader-) $(zliblinuxheader-) \ $(zlib-decomp-) $(libfdt) $(libfdtheader) \ - autoconf.h empty.c zImage.coff.lds zImage.ps3.lds zImage.lds + empty.c zImage.coff.lds zImage.ps3.lds zImage.lds quiet_cmd_bootcc = BOOTCC $@ cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $< diff --git 
a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c index b0491b8c0199..9457863147f9 100644 --- a/arch/powerpc/boot/serial.c +++ b/arch/powerpc/boot/serial.c @@ -18,7 +18,6 @@ #include "stdio.h" #include "io.h" #include "ops.h" -#include "autoconf.h" static int serial_open(void) { -- 2.17.1
[PATCH v3 1/6] soc/fsl/qe: qe.c: drop useless static qualifier
The local variable snum_init has no reason to have static storage duration. Reviewed-by: Christophe Leroy Reviewed-by: Qiang Zhao Signed-off-by: Rasmus Villemoes --- drivers/soc/fsl/qe/qe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c index 612d9c551be5..855373deb746 100644 --- a/drivers/soc/fsl/qe/qe.c +++ b/drivers/soc/fsl/qe/qe.c @@ -306,7 +306,7 @@ static void qe_snums_init(void) 0x28, 0x29, 0x38, 0x39, 0x48, 0x49, 0x58, 0x59, 0x68, 0x69, 0x78, 0x79, 0x80, 0x81, }; - static const u8 *snum_init; + const u8 *snum_init; qe_num_of_snum = qe_get_num_of_snums(); -- 2.20.1
[PATCH v3 6/6] soc/fsl/qe: qe.c: fold qe_get_num_of_snums into qe_snums_init
The comment "No QE ever has fewer than 28 SNUMs" is false; e.g. the MPC8309 has 14. The code path returning -EINVAL is also a recipe for instant disaster, since the caller (qe_snums_init) uncritically assigns the return value to the unsigned qe_num_of_snum, and would thus proceed to attempt to copy 4GB from snum_init_46[] to the snum[] array. So fold the handling of the legacy fsl,qe-num-snums into qe_snums_init, and make sure we do not end up using the snum_init_46 array in cases other than the two where we know it makes sense. Reviewed-by: Christophe Leroy Reviewed-by: Qiang Zhao Signed-off-by: Rasmus Villemoes --- drivers/soc/fsl/qe/qe.c | 46 ++--- 1 file changed, 16 insertions(+), 30 deletions(-) diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c index 1d27187b251c..852060caff24 100644 --- a/drivers/soc/fsl/qe/qe.c +++ b/drivers/soc/fsl/qe/qe.c @@ -308,24 +308,33 @@ static void qe_snums_init(void) int i; bitmap_zero(snum_state, QE_NUM_OF_SNUM); + qe_num_of_snum = 28; /* The default number of snum for threads is 28 */ qe = qe_get_device_node(); if (qe) { i = of_property_read_variable_u8_array(qe, "fsl,qe-snums", snums, 1, QE_NUM_OF_SNUM); - of_node_put(qe); if (i > 0) { + of_node_put(qe); qe_num_of_snum = i; return; } + /* +* Fall back to legacy binding of using the value of +* fsl,qe-num-snums to choose one of the static arrays +* above. 
+*/ + of_property_read_u32(qe, "fsl,qe-num-snums", &qe_num_of_snum); + of_node_put(qe); } - qe_num_of_snum = qe_get_num_of_snums(); - - if (qe_num_of_snum == 76) + if (qe_num_of_snum == 76) { snum_init = snum_init_76; - else + } else if (qe_num_of_snum == 28 || qe_num_of_snum == 46) { snum_init = snum_init_46; - + } else { + pr_err("QE: unsupported value of fsl,qe-num-snums: %u\n", qe_num_of_snum); + return; + } memcpy(snums, snum_init, qe_num_of_snum); } @@ -642,30 +651,7 @@ EXPORT_SYMBOL(qe_get_num_of_risc); unsigned int qe_get_num_of_snums(void) { - struct device_node *qe; - int size; - unsigned int num_of_snums; - const u32 *prop; - - num_of_snums = 28; /* The default number of snum for threads is 28 */ - qe = qe_get_device_node(); - if (!qe) - return num_of_snums; - - prop = of_get_property(qe, "fsl,qe-num-snums", &size); - if (prop && size == sizeof(*prop)) { - num_of_snums = *prop; - if ((num_of_snums < 28) || (num_of_snums > QE_NUM_OF_SNUM)) { - /* No QE ever has fewer than 28 SNUMs */ - pr_err("QE: number of snum is invalid\n"); - of_node_put(qe); - return -EINVAL; - } - } - - of_node_put(qe); - - return num_of_snums; + return qe_num_of_snum; } EXPORT_SYMBOL(qe_get_num_of_snums); -- 2.20.1
[PATCH v3 5/6] soc/fsl/qe: qe.c: support fsl,qe-snums property
Add driver support for the newly introduced fsl,qe-snums property. Conveniently, of_property_read_variable_u8_array does exactly what we need: If the property fsl,qe-snums is found (and has an allowed size), the array of values get copied to snums, and the return value is the number of snums - we cannot assign directly to num_of_snums, since we need to check whether the return value is negative. Reviewed-by: Christophe Leroy Reviewed-by: Qiang Zhao Signed-off-by: Rasmus Villemoes --- drivers/soc/fsl/qe/qe.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c index 4b444846d590..1d27187b251c 100644 --- a/drivers/soc/fsl/qe/qe.c +++ b/drivers/soc/fsl/qe/qe.c @@ -283,7 +283,6 @@ EXPORT_SYMBOL(qe_clock_source); */ static void qe_snums_init(void) { - int i; static const u8 snum_init_76[] = { 0x04, 0x05, 0x0C, 0x0D, 0x14, 0x15, 0x1C, 0x1D, 0x24, 0x25, 0x2C, 0x2D, 0x34, 0x35, 0x88, 0x89, @@ -304,7 +303,21 @@ static void qe_snums_init(void) 0x28, 0x29, 0x38, 0x39, 0x48, 0x49, 0x58, 0x59, 0x68, 0x69, 0x78, 0x79, 0x80, 0x81, }; + struct device_node *qe; const u8 *snum_init; + int i; + + bitmap_zero(snum_state, QE_NUM_OF_SNUM); + qe = qe_get_device_node(); + if (qe) { + i = of_property_read_variable_u8_array(qe, "fsl,qe-snums", + snums, 1, QE_NUM_OF_SNUM); + of_node_put(qe); + if (i > 0) { + qe_num_of_snum = i; + return; + } + } qe_num_of_snum = qe_get_num_of_snums(); @@ -313,7 +326,6 @@ static void qe_snums_init(void) else snum_init = snum_init_46; - bitmap_zero(snum_state, QE_NUM_OF_SNUM); memcpy(snums, snum_init, qe_num_of_snum); } -- 2.20.1
[PATCH v3 4/6] dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding
Reading table 4-30, and its footnotes, of the QUICC Engine Block Reference Manual shows that the set of snum _values_ is not necessarily just a function of the _number_ of snums, as given in the fsl,qe-num-snums property. As an alternative, to make it easier to add support for other variants of the QUICC engine IP, this introduces a new binding fsl,qe-snums, which automatically encodes both the number of snums and the actual values to use. Signed-off-by: Rasmus Villemoes --- Rob, thanks for the review of v2. However, since I moved the example from the commit log to the binding (per Joakim's request), I didn't add a Reviewed-by tag for this revision. .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt index d7afaff5faff..05ec2a838c54 100644 --- a/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt +++ b/Documentation/devicetree/bindings/soc/fsl/cpm_qe/qe.txt @@ -18,7 +18,8 @@ Required properties: - reg : offset and length of the device registers. - bus-frequency : the clock frequency for QUICC Engine. - fsl,qe-num-riscs: define how many RISC engines the QE has. -- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use for the +- fsl,qe-snums: This property has to be specified as '/bits/ 8' value, + defining the array of serial number (SNUM) values for the virtual threads. Optional properties: @@ -34,6 +35,11 @@ Recommended properties - brg-frequency : the internal clock source frequency for baud-rate generators in Hz. +Deprecated properties +- fsl,qe-num-snums: define how many serial number(SNUM) the QE can use + for the threads. Use fsl,qe-snums instead to not only specify the + number of snums, but also their values. 
+ Example: qe@e010 { #address-cells = <1>; @@ -44,6 +50,11 @@ Example: reg = ; brg-frequency = <0>; bus-frequency = <179A7B00>; + fsl,qe-snums = /bits/ 8 < + 0x04 0x05 0x0C 0x0D 0x14 0x15 0x1C 0x1D + 0x24 0x25 0x2C 0x2D 0x34 0x35 0x88 0x89 + 0x98 0x99 0xA8 0xA9 0xB8 0xB9 0xC8 0xC9 + 0xD8 0xD9 0xE8 0xE9>; } * Multi-User RAM (MURAM) -- 2.20.1
[PATCH v3 3/6] soc/fsl/qe: qe.c: introduce qe_get_device_node helper
The 'try of_find_compatible_node(NULL, NULL, "fsl,qe"), fall back to of_find_node_by_type(NULL, "qe")' pattern is repeated five times. Factor it into a common helper. Reviewed-by: Christophe Leroy Reviewed-by: Qiang Zhao Signed-off-by: Rasmus Villemoes --- drivers/soc/fsl/qe/qe.c | 71 + 1 file changed, 29 insertions(+), 42 deletions(-) diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c index 4b59109df22b..4b444846d590 100644 --- a/drivers/soc/fsl/qe/qe.c +++ b/drivers/soc/fsl/qe/qe.c @@ -56,6 +56,20 @@ static unsigned int qe_num_of_snum; static phys_addr_t qebase = -1; +static struct device_node *qe_get_device_node(void) +{ + struct device_node *qe; + + /* +* Newer device trees have an "fsl,qe" compatible property for the QE +* node, but we still need to support older device trees. +*/ + qe = of_find_compatible_node(NULL, NULL, "fsl,qe"); + if (qe) + return qe; + return of_find_node_by_type(NULL, "qe"); +} + static phys_addr_t get_qe_base(void) { struct device_node *qe; @@ -65,12 +79,9 @@ static phys_addr_t get_qe_base(void) if (qebase != -1) return qebase; - qe = of_find_compatible_node(NULL, NULL, "fsl,qe"); - if (!qe) { - qe = of_find_node_by_type(NULL, "qe"); - if (!qe) - return qebase; - } + qe = qe_get_device_node(); + if (!qe) + return qebase; ret = of_address_to_resource(qe, 0, &res); if (!ret) @@ -164,12 +175,9 @@ unsigned int qe_get_brg_clk(void) if (brg_clk) return brg_clk; - qe = of_find_compatible_node(NULL, NULL, "fsl,qe"); - if (!qe) { - qe = of_find_node_by_type(NULL, "qe"); - if (!qe) - return brg_clk; - } + qe = qe_get_device_node(); + if (!qe) + return brg_clk; prop = of_get_property(qe, "brg-frequency", &size); if (prop && size == sizeof(*prop)) @@ -558,16 +566,9 @@ struct qe_firmware_info *qe_get_firmware_info(void) initialized = 1; - /* -* Newer device trees have an "fsl,qe" compatible property for the QE -* node, but we still need to support older device trees. 
- */ - qe = of_find_compatible_node(NULL, NULL, "fsl,qe"); - if (!qe) { - qe = of_find_node_by_type(NULL, "qe"); - if (!qe) - return NULL; - } + qe = qe_get_device_node(); + if (!qe) + return NULL; /* Find the 'firmware' child node */ fw = of_get_child_by_name(qe, "firmware"); @@ -613,16 +614,9 @@ unsigned int qe_get_num_of_risc(void) unsigned int num_of_risc = 0; const u32 *prop; - qe = of_find_compatible_node(NULL, NULL, "fsl,qe"); - if (!qe) { - /* Older devices trees did not have an "fsl,qe" -* compatible property, so we need to look for -* the QE node by name. -*/ - qe = of_find_node_by_type(NULL, "qe"); - if (!qe) - return num_of_risc; - } + qe = qe_get_device_node(); + if (!qe) + return num_of_risc; prop = of_get_property(qe, "fsl,qe-num-riscs", &size); if (prop && size == sizeof(*prop)) @@ -642,16 +636,9 @@ unsigned int qe_get_num_of_snums(void) const u32 *prop; num_of_snums = 28; /* The default number of snum for threads is 28 */ - qe = of_find_compatible_node(NULL, NULL, "fsl,qe"); - if (!qe) { - /* Older devices trees did not have an "fsl,qe" -* compatible property, so we need to look for -* the QE node by name. -*/ - qe = of_find_node_by_type(NULL, "qe"); - if (!qe) - return num_of_snums; - } + qe = qe_get_device_node(); + if (!qe) + return num_of_snums; prop = of_get_property(qe, "fsl,qe-num-snums", &size); if (prop && size == sizeof(*prop)) { -- 2.20.1
[PATCH v3 2/6] soc/fsl/qe: qe.c: reduce static memory footprint by 1.7K
The current array of struct qe_snum uses 256*4 bytes for just keeping track of the free/used state of each index, and the struct layout means there's another 768 bytes of padding. If we just unzip that structure, the array of snum values uses just 256 bytes, while the free/in-use state can be tracked in a 32 byte bitmap. So this reduces the .data footprint by 1760 bytes. It also serves as preparation for introducing another DT binding for specifying the snum values. Reviewed-by: Christophe Leroy Reviewed-by: Qiang Zhao Signed-off-by: Rasmus Villemoes --- drivers/soc/fsl/qe/qe.c | 42 - 1 file changed, 12 insertions(+), 30 deletions(-) diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c index 855373deb746..4b59109df22b 100644 --- a/drivers/soc/fsl/qe/qe.c +++ b/drivers/soc/fsl/qe/qe.c @@ -14,6 +14,7 @@ * Free Software Foundation; either version 2 of the License, or (at your * option) any later version. */ +#include #include #include #include @@ -43,25 +44,14 @@ static DEFINE_SPINLOCK(qe_lock); DEFINE_SPINLOCK(cmxgcr_lock); EXPORT_SYMBOL(cmxgcr_lock); -/* QE snum state */ -enum qe_snum_state { - QE_SNUM_STATE_USED, - QE_SNUM_STATE_FREE -}; - -/* QE snum */ -struct qe_snum { - u8 num; - enum qe_snum_state state; -}; - /* We allocate this here because it is used almost exclusively for * the communication processor devices.
*/ struct qe_immap __iomem *qe_immr; EXPORT_SYMBOL(qe_immr); -static struct qe_snum snums[QE_NUM_OF_SNUM]; /* Dynamically allocated SNUMs */ +static u8 snums[QE_NUM_OF_SNUM]; /* Dynamically allocated SNUMs */ +static DECLARE_BITMAP(snum_state, QE_NUM_OF_SNUM); static unsigned int qe_num_of_snum; static phys_addr_t qebase = -1; @@ -315,10 +305,8 @@ static void qe_snums_init(void) else snum_init = snum_init_46; - for (i = 0; i < qe_num_of_snum; i++) { - snums[i].num = snum_init[i]; - snums[i].state = QE_SNUM_STATE_FREE; - } + bitmap_zero(snum_state, QE_NUM_OF_SNUM); + memcpy(snums, snum_init, qe_num_of_snum); } int qe_get_snum(void) @@ -328,12 +316,10 @@ int qe_get_snum(void) int i; spin_lock_irqsave(&qe_lock, flags); - for (i = 0; i < qe_num_of_snum; i++) { - if (snums[i].state == QE_SNUM_STATE_FREE) { - snums[i].state = QE_SNUM_STATE_USED; - snum = snums[i].num; - break; - } + i = find_first_zero_bit(snum_state, qe_num_of_snum); + if (i < qe_num_of_snum) { + set_bit(i, snum_state); + snum = snums[i]; } spin_unlock_irqrestore(&qe_lock, flags); @@ -343,14 +329,10 @@ EXPORT_SYMBOL(qe_get_snum); void qe_put_snum(u8 snum) { - int i; + const u8 *p = memchr(snums, snum, qe_num_of_snum); - for (i = 0; i < qe_num_of_snum; i++) { - if (snums[i].num == snum) { - snums[i].state = QE_SNUM_STATE_FREE; - break; - } - } + if (p) + clear_bit(p - snums, snum_state); } EXPORT_SYMBOL(qe_put_snum); -- 2.20.1
[PATCH v3 0/6] soc/fsl/qe: cleanups and new DT binding
This small series consists of some small cleanups and simplifications of the QUICC engine driver, and introduces a new DT binding that makes it much easier to support other variants of the QUICC engine IP block that appear in the wild: There's no reason to expect in general that the number of valid SNUMs uniquely determines the set of valid values, so it's better to simply let the device tree specify the values (and, implicitly via the array length, also the count). Which tree should this go through? v3: - Move example from commit log into binding document (adapting to MPC8360 which the existing example pertains to). - Add more review tags. - Fix minor style issue. v2: - Address comments from Christophe Leroy - Add his Reviewed-by to 1/6 and 3/6 - Split DT binding update to separate patch as per Documentation/devicetree/bindings/submitting-patches.txt Rasmus Villemoes (6): soc/fsl/qe: qe.c: drop useless static qualifier soc/fsl/qe: qe.c: reduce static memory footprint by 1.7K soc/fsl/qe: qe.c: introduce qe_get_device_node helper dt-bindings: soc/fsl: qe: document new fsl,qe-snums binding soc/fsl/qe: qe.c: support fsl,qe-snums property soc/fsl/qe: qe.c: fold qe_get_num_of_snums into qe_snums_init .../devicetree/bindings/soc/fsl/cpm_qe/qe.txt | 13 +- drivers/soc/fsl/qe/qe.c | 163 +++--- 2 files changed, 77 insertions(+), 99 deletions(-) -- 2.20.1
Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value
On 5/13/19 2:26 PM, Peter Zijlstra wrote: > On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote: >> On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote: >>> Add a check for sample_period value sent from userspace. Negative >>> value does not make sense. And in powerpc arch code this could cause >>> a recursive PMI leading to a hang (reported when running perf-fuzzer). >>> >>> Signed-off-by: Ravi Bangoria >>> --- >>> kernel/events/core.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/kernel/events/core.c b/kernel/events/core.c >>> index abbd4b3b96c2..e44c90378940 100644 >>> --- a/kernel/events/core.c >>> +++ b/kernel/events/core.c >>> @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event >>> *event, u64 __user *arg) >>> if (perf_event_check_period(event, value)) >>> return -EINVAL; >>> >>> + if (!event->attr.freq && (value & (1ULL << 63))) >>> + return -EINVAL; >> >> Well, perf_event_attr::sample_period is __u64. Would not be the site >> using it as signed be the one in error? > > You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes > it consistent and is fine. > Yeah, I was about to reply :)
[PATCH 2/2] powerpc/lib: only build ldstfp.o when CONFIG_PPC_FPU is set
The entire code in ldstfp.o is enclosed into #ifdef CONFIG_PPC_FPU, so there is no point in building it when this config is not selected. Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined") Signed-off-by: Christophe Leroy --- arch/powerpc/lib/Makefile | 3 ++- arch/powerpc/lib/ldstfp.S | 4 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index 17fce3738d48..eebc782d89a5 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -49,7 +49,8 @@ obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o \ obj-y += checksum_$(BITS).o checksum_wrappers.o \ string_$(BITS).o -obj-y += sstep.o ldstfp.o +obj-y += sstep.o +obj-$(CONFIG_PPC_FPU) += ldstfp.o obj64-y+= quad.o obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o diff --git a/arch/powerpc/lib/ldstfp.S b/arch/powerpc/lib/ldstfp.S index 32e91994b6b2..e388a3127cb6 100644 --- a/arch/powerpc/lib/ldstfp.S +++ b/arch/powerpc/lib/ldstfp.S @@ -18,8 +18,6 @@ #include #include -#ifdef CONFIG_PPC_FPU - #define STKFRM (PPC_MIN_STKFRM + 16) /* Get the contents of frN into *p; N is in r3 and p is in r4. */ @@ -241,5 +239,3 @@ _GLOBAL(conv_dp_to_sp) MTMSRD(r6) isync blr - -#endif /* CONFIG_PPC_FPU */ -- 2.13.3
[PATCH 1/2] powerpc/lib: fix redundant inclusion of quad.o
quad.o is only for PPC64, and already included in obj64-y, so it doesn't have to be in obj-y Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") Signed-off-by: Christophe Leroy --- arch/powerpc/lib/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index c55f9c27bf79..17fce3738d48 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -49,7 +49,7 @@ obj64-$(CONFIG_KPROBES_SANITY_TEST) += test_emulate_step.o \ obj-y += checksum_$(BITS).o checksum_wrappers.o \ string_$(BITS).o -obj-y += sstep.o ldstfp.o quad.o +obj-y += sstep.o ldstfp.o obj64-y+= quad.o obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o -- 2.13.3
Re: [PATCH 0/1] Forced-wakeup for stop lite states on Powernv
On 05/08/2019 10:29 AM, Nicholas Piggin wrote: Abhishek Goel's on April 22, 2019 4:32 pm: Currently, the cpuidle governors determine what idle state an idling CPU should enter into based on heuristics that depend on the idle history on that CPU. Given that no predictive heuristic is perfect, there are cases where the governor predicts a shallow idle state, hoping that the CPU will be busy soon. However, if no new workload is scheduled on that CPU in the near future, the CPU will end up in the shallow state. Motivation -- In case of POWER, this is problematic, when the predicted state in the aforementioned scenario is a lite stop state, as such lite states will inhibit SMT folding, thereby depriving the other threads in the core from using the core resources. So we do not want to get stuck in such states for long durations. To address this, the cpuidle-core can queue a timer to correspond with the residency value of the next available state. This timer will forcefully wake up the CPU. A few such iterations will essentially train the governor to select a deeper state for that CPU, as the timer here corresponds to the next available cpuidle state residency. The CPU will be kicked out of the lite state and end up in a non-lite state. Experiment -- I performed experiments for three scenarios to collect some data. case 1 : Without this patch and without tick retained, i.e. in an upstream kernel, it would spend more than even a second to get out of stop0_lite. case 2 : With tick retained in an upstream kernel - Generally, we have a sched tick at 4ms (CONFIG_HZ = 250). Ideally I expected it to take 8 sched ticks to get out of stop0_lite. Experimentally, the observation was: sample=20 min=4ms max=12ms 99th percentile=4ms. It would take at least one sched tick to get out of stop0_lite. case 3 : With this patch (not stopping tick, but explicitly queuing a timer): sample=20 min=144us max=192us 99th percentile=144us. In this patch, we queue a timer just before entering into a stop0_lite state.
The timer fires at (residency of next available state + exit latency of next available state * 2). Let's say if the next state (stop0) is available, which has a residency of 20us, it should get out in as low as (20+2*2)*8 [based on the formula (residency + 2*latency) * history length] microseconds = 192us. Ideally we would expect 8 iterations; it was observed to get out in 6-7 iterations. Even if, let's say, stop2 is the next available state (stop0 and stop1 both are unavailable), it would take (100+2*10)*8 = 960us to get into stop2. So, we are able to get out of stop0_lite generally in 150us (with this patch) as compared to 4ms (with tick retained). As stated earlier, we do not want to get stuck in stop0_lite as it inhibits SMT folding for other sibling threads, depriving them of core resources. The current patch is using forced-wakeup only for stop0_lite, as it gives a performance benefit (the primary reason) along with lowering power consumption. We may extend this model to other states in future. I still have to wonder, between our snooze loop and stop0, what does stop0_lite buy us. That said, the problem you're solving here is a generic one that all stop states have, I think. Doesn't the same thing apply going from stop0 to stop5? You might underestimate the sleep time and lose power savings and therefore performance there too. Shouldn't we make it generic for all stop states? Thanks, Nick When a cpu is in snooze, it takes both space and time of the core. When in stop0_lite, it frees up time but still takes space. When it is in stop0 or deeper, it frees up both space and the time slice of the core. In stop0_lite, the cpu doesn't free up the core resources and thus inhibits thread folding. When a cpu goes to stop0, it will free up the core resources, thus increasing the single thread performance of the other sibling thread. Hence, we do not want to get stuck in stop0_lite for a long duration, and want to quickly move on to the next state.
If we get stuck in any other state we would possibly be losing out on power savings, but will still be able to gain the performance benefits for the other sibling threads. Thanks, Abhishek
Re: [PATCH] vsprintf: Do not break early boot with probing addresses
On Mon, May 13, 2019 at 08:52:41AM +, David Laight wrote: > From: christophe leroy > > Sent: 10 May 2019 18:35 > > Le 10/05/2019 à 18:24, Steven Rostedt a écrit : > > > On Fri, 10 May 2019 10:42:13 +0200 > > > Petr Mladek wrote: > > >> -if (probe_kernel_address(ptr, byte)) > > >> +if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > > >> return "(efault)"; > > "efault" looks a bit like a spelling mistake for "default". It's a special value, thus it's in parentheses, though somebody might still be misled. > > Usually, < PAGE_SIZE means NULL pointer dereference (via the member of a > > struct) > > Maybe the caller should pass in a short buffer so that you can return > "(err-%d)" > or "(null+%#x)" ? In both cases it should be limited to the size of a pointer (8 or 16 characters). Something like "(e:%4d)" would work for error codes. The "(null)" is good enough by itself and already an established practice. -- With Best Regards, Andy Shevchenko
Re: [PATCH v2] powerpc: Fix compile issue with force DAWR
Le 13/05/2019 à 09:17, Michael Neuling a écrit : If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This puts more of the dawr infrastructure in a new file. I think you are doing a bit more than that. I think you should explain that you define a new CONFIG_ option, when it is selected, etc ... The commit you are referring to is talking about P9. It looks like your patch covers a lot more, so it should be mentioned here, I guess. Signed-off-by: Michael Neuling You should add a Fixes: tag, ie: Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") -- v2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 5 ++ arch/powerpc/include/asm/hw_breakpoint.h | 20 --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 75 arch/powerpc/kernel/hw_breakpoint.c | 56 -- arch/powerpc/kvm/Kconfig | 1 + 6 files changed, 94 insertions(+), 64 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2711aac246..f4b19e48cc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -242,6 +242,7 @@ config PPC select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK select VIRT_TO_BUS if !PPC64 + select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF What's PERF ? Did you mean PERF_EVENTS ? Then what you mean is: - Selected all the time for PPC64 - Selected for PPC32 when PERF is also selected. Is that what you want ? At first I thought it was linked to P9. And ... did you read below statement ? # # Please keep this list sorted alphabetically.
# @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR_FORCE_ENABLE + bool + default y + Why default it to y? Then how is it ever set to n? config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..ffbc8eab41 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -47,6 +47,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +extern int set_dawr(struct arch_hw_breakpoint *brk); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); -extern bool dawr_force_enable; -static inline bool dawr_enabled(void) -{ - return dawr_force_enable; -} - That's a very simple function, why not keep it here (or in another .h) as 'static inline' ? #else /* CONFIG_HAVE_HW_BREAKPOINT */ static inline void hw_breakpoint_disable(void) { } static inline void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) { } -static inline bool dawr_enabled(void) { return false; } + #endif/* CONFIG_HAVE_HW_BREAKPOINT */ + +extern bool dawr_force_enable; + +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE +extern bool dawr_enabled(void); Functions should not be 'extern'. I'm sure checkpatch --strict will tell you. +#else +#define dawr_enabled() true true by default ? Previously it was false by default. And why a #define ? It's better to keep a static inline.
+#endif + #endif/* __KERNEL__ */ #endif/* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..a9c497c34f 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG)+= watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_PPC_DAWR_FORCE_ENABLE)+= dawr.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_ppc970.o cpu_setup_pa6t.o obj-$(CONFIG_PPC_BOOK3S_64) += cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64) += mce.o mce_power.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..cf1d02fe1e --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,75 @@ +// SPDX-License-Identif
Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value
On Mon, May 13, 2019 at 09:42:13AM +0200, Peter Zijlstra wrote: > On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote: > > Add a check for sample_period value sent from userspace. Negative > > value does not make sense. And in powerpc arch code this could cause > > a recursive PMI leading to a hang (reported when running perf-fuzzer). > > > > Signed-off-by: Ravi Bangoria > > --- > > kernel/events/core.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index abbd4b3b96c2..e44c90378940 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event > > *event, u64 __user *arg) > > if (perf_event_check_period(event, value)) > > return -EINVAL; > > > > + if (!event->attr.freq && (value & (1ULL << 63))) > > + return -EINVAL; > > Well, perf_event_attr::sample_period is __u64. Would not be the site > using it as signed be the one in error? You forgot to mention commit: 0819b2e30ccb9, so I guess this just makes it consistent and is fine.
RE: [PATCH] vsprintf: Do not break early boot with probing addresses
From: christophe leroy > Sent: 10 May 2019 18:35 > Le 10/05/2019 à 18:24, Steven Rostedt a écrit : > > On Fri, 10 May 2019 10:42:13 +0200 > > Petr Mladek wrote: > > > >> static const char *check_pointer_msg(const void *ptr) > >> { > >> - char byte; > >> - > >>if (!ptr) > >>return "(null)"; > >> > >> - if (probe_kernel_address(ptr, byte)) > >> + if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr)) > >>return "(efault)"; "efault" looks a bit like a spelling mistake for "default". > > < PAGE_SIZE ? > > > > do you mean: < TASK_SIZE ? > > I guess not. > > Usually, < PAGE_SIZE means NULL pointer dereference (via the member of a > struct) Maybe the caller should pass in a short buffer so that you can return "(err-%d)" or "(null+%#x)" ? David
Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
On 13.05.19 09:48, David Hildenbrand wrote: > On 07.05.19 23:02, Dan Williams wrote: >> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand wrote: >>> >>> Let's prepare for better error handling while adding memory by allowing >>> to use arch_remove_memory() and __remove_pages() even if >>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively >>> covers >>> - Offlining of system ram (memory block devices) - offline_pages() >>> - Unplug of system ram - remove_memory() >>> - Unplug/remap of device memory - devm_memremap() >>> >>> This allows e.g. for handling like >>> >>> arch_add_memory() >>> rc = do_something(); >>> if (rc) { >>> arch_remove_memory(); >>> } >>> >>> Whereby do_something() will for example be memory block device creation >>> after it has been factored out. >> >> What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE >> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not >> clear to me why there was ever the option to compile out the remove >> code when the add code is included. >> > > If there are no other comments, I will go ahead and rip out > CONFIG_MEMORY_HOTREMOVE completely, gluing the functionality to > CONFIG_MEMORY_HOTPLUG. > Hmm, however this will require CONFIG_MEMORY_HOTPLUG to require - MEMORY_ISOLATION - HAVE_BOOTMEM_INFO_NODE if (X86_64 || PPC64) And depends on - MIGRATION Which would limit the configurations where memory hotplug would be available. I guess going with this patch here is ok as a first step. I just realized that we'll need arch_remove_memory() for arm64 to make this patch here work. -- Thanks, David / dhildenb
Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
On 07.05.19 23:02, Dan Williams wrote: > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand wrote: >> >> Let's prepare for better error handling while adding memory by allowing >> to use arch_remove_memory() and __remove_pages() even if >> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively >> covers >> - Offlining of system ram (memory block devices) - offline_pages() >> - Unplug of system ram - remove_memory() >> - Unplug/remap of device memory - devm_memremap() >> >> This allows e.g. for handling like >> >> arch_add_memory() >> rc = do_something(); >> if (rc) { >> arch_remove_memory(); >> } >> >> Whereby do_something() will for example be memory block device creation >> after it has been factored out. > > What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE > option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not > clear to me why there was ever the option to compile out the remove > code when the add code is included. > If there are no other comments, I will go ahead and rip out CONFIG_MEMORY_HOTREMOVE completely, gluing the functionality to CONFIG_MEMORY_HOTPLUG. -- Thanks, David / dhildenb
Re: [PATCH 1/2] perf ioctl: Add check for the sample_period value
On Sat, May 11, 2019 at 08:12:16AM +0530, Ravi Bangoria wrote: > Add a check for sample_period value sent from userspace. Negative > value does not make sense. And in powerpc arch code this could cause > a recursive PMI leading to a hang (reported when running perf-fuzzer). > > Signed-off-by: Ravi Bangoria > --- > kernel/events/core.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index abbd4b3b96c2..e44c90378940 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -5005,6 +5005,9 @@ static int perf_event_period(struct perf_event *event, > u64 __user *arg) > if (perf_event_check_period(event, value)) > return -EINVAL; > > + if (!event->attr.freq && (value & (1ULL << 63))) > + return -EINVAL; Well, perf_event_attr::sample_period is __u64. Would not be the site using it as signed be the one in error?
Re: [PATCH] powerpc: Document xive=off option
On 5/13/19 7:39 AM, Michael Neuling wrote: > commit 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE > interrupt controller") added an option to turn off Linux native XIVE > usage via the xive=off kernel command line option. > > This documents this option. > > Signed-off-by: Michael Neuling Reviewed-by: Cédric Le Goater But we should fix the behavior because xive=off does not work on pseries. This is not handled correctly in prom when CAS negotiates with the hypervisor which interrupt mode is to be used. I haven't tried this option on PowerNV. Cheers, C. > --- > Documentation/admin-guide/kernel-parameters.txt | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index c45a19d654..ee410d0ef4 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -5177,6 +5177,15 @@ > Format: > > ,[,[,[,]]] > > + xive= [PPC] > + By default on POWER9 and above, the kernel will > + natively use the XIVE interrupt controller. This option > + allows the fallback firmware mode to be used: > + > + off Fallback to firmware control of XIVE interrupt > + controller on both pseries and powernv > + platforms. Only useful on POWER9 and above. > + > xhci-hcd.quirks [USB,KNL] > A hex value specifying bitmask with supplemental xhci > host controller quirks. Meaning of each bit can be >
[PATCH v2] powerpc: Fix compile issue with force DAWR
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail at linking with: arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable' This was caused by commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option"). This puts more of the dawr infrastructure in a new file. Signed-off-by: Michael Neuling -- v2: Fixes based on Christophe Leroy's comments: - Fix commit message formatting - Move more DAWR code into dawr.c --- arch/powerpc/Kconfig | 5 ++ arch/powerpc/include/asm/hw_breakpoint.h | 20 --- arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dawr.c | 75 arch/powerpc/kernel/hw_breakpoint.c | 56 -- arch/powerpc/kvm/Kconfig | 1 + 6 files changed, 94 insertions(+), 64 deletions(-) create mode 100644 arch/powerpc/kernel/dawr.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2711aac246..f4b19e48cc 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -242,6 +242,7 @@ config PPC select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK select VIRT_TO_BUS if !PPC64 + select PPC_DAWR_FORCE_ENABLEif PPC64 || PERF # # Please keep this list sorted alphabetically. 
# @@ -369,6 +370,10 @@ config PPC_ADV_DEBUG_DAC_RANGE depends on PPC_ADV_DEBUG_REGS && 44x default y +config PPC_DAWR_FORCE_ENABLE + bool + default y + config ZONE_DMA bool default y if PPC_BOOK3E_64 diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 0fe8c1e46b..ffbc8eab41 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -47,6 +47,8 @@ struct arch_hw_breakpoint { #define HW_BRK_TYPE_PRIV_ALL (HW_BRK_TYPE_USER | HW_BRK_TYPE_KERNEL | \ HW_BRK_TYPE_HYP) +extern int set_dawr(struct arch_hw_breakpoint *brk); + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include #include @@ -90,18 +92,20 @@ static inline void hw_breakpoint_disable(void) extern void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs); int hw_breakpoint_handler(struct die_args *args); -extern int set_dawr(struct arch_hw_breakpoint *brk); -extern bool dawr_force_enable; -static inline bool dawr_enabled(void) -{ - return dawr_force_enable; -} - #else /* CONFIG_HAVE_HW_BREAKPOINT */ static inline void hw_breakpoint_disable(void) { } static inline void thread_change_pc(struct task_struct *tsk, struct pt_regs *regs) { } -static inline bool dawr_enabled(void) { return false; } + #endif /* CONFIG_HAVE_HW_BREAKPOINT */ + +extern bool dawr_force_enable; + +#ifdef CONFIG_PPC_DAWR_FORCE_ENABLE +extern bool dawr_enabled(void); +#else +#define dawr_enabled() true +#endif + #endif /* __KERNEL__ */ #endif /* _PPC_BOOK3S_64_HW_BREAKPOINT_H */ diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 0ea6c4aa3a..a9c497c34f 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -56,6 +56,7 @@ obj-$(CONFIG_PPC64) += setup_64.o sys_ppc32.o \ obj-$(CONFIG_VDSO32) += vdso32/ obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o +obj-$(CONFIG_PPC_DAWR_FORCE_ENABLE)+= dawr.o obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o 
obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c new file mode 100644 index 00..cf1d02fe1e --- /dev/null +++ b/arch/powerpc/kernel/dawr.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0+ +// +// DAWR infrastructure +// +// Copyright 2019, Michael Neuling, IBM Corporation. + +#include +#include +#include +#include +#include +#include +#include + +bool dawr_force_enable; +EXPORT_SYMBOL_GPL(dawr_force_enable); + +extern bool dawr_enabled(void) +{ + return dawr_force_enable; +} +EXPORT_SYMBOL_GPL(dawr_enabled); + +static ssize_t dawr_write_file_bool(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct arch_hw_breakpoint null_brk = {0, 0, 0}; + size_t rc; + + /* Send error to user if the hypervisor won't allow us to write DAWR */ + if ((!dawr_force_enable) && + (firmware_has_feature(FW_FEATURE_LPAR)) && + (set_dawr(&null_brk) != H_SUCCESS)) + return -1; + + rc = debugfs_write_file_bool(file, user_buf, count, ppos); + if (rc) + return rc; + + /* If we are clearing, make sure all CPUs have the DAWR cleared */ + if (!dawr_force_enable) + smp_c
Re: [PATCH 03/16] lib,treewide: add new match_string() helper/macro
On Fri, 2019-05-10 at 17:34 +0300, andriy.shevche...@linux.intel.com wrote: > [External] > > > On Fri, May 10, 2019 at 09:15:27AM +, Ardelean, Alexandru wrote: > > On Wed, 2019-05-08 at 16:22 +0300, Alexandru Ardelean wrote: > > > On Wed, 2019-05-08 at 15:18 +0200, Greg KH wrote: > > > > On Wed, May 08, 2019 at 04:11:28PM +0300, Andy Shevchenko wrote: > > > > > On Wed, May 08, 2019 at 02:28:29PM +0300, Alexandru Ardelean > > > > > wrote: > > > > > Can you split include/linux/ change from the rest? > > > > > > > > That would break the build, why do you want it split out? This > > > > makes > > > > sense all as a single patch to me. > > > > > > > > > > Not really. > > > It would be just be the new match_string() helper/macro in a new > > > commit. > > > And the conversions of the simple users of match_string() (the ones > > > using > > > ARRAY_SIZE()) in another commit. > > > > > > > I should have asked in my previous reply. > > Leave this as-is or re-formulate in 2 patches ? > > Depends on what you would like to spend your time: collecting Acks for > all > pieces in treewide patch or send new API first followed up by per driver > / > module update in next cycle. I actually would have preferred new API first, with the current `match_string()` -> `__match_string()` rename from the start, but I wasn't sure. I am still figuring out how feedback works in this realm. I'll send a V2 with the API change-first/only; should be a smaller list. Then see about follow-ups/changes per subsystem. > > I also have no strong preference. > And I think it's good to add Heikki Krogerus to Cc list for both patch > series, > since he is the author of the sysfs variant and may have something to comment > on > the rest. Thanks for the reference. > > -- > With Best Regards, > Andy Shevchenko > >