Re: [PATCH 2/3] powerpc: mm: add p{te,md,ud}_user_accessible_page helpers

2022-09-11 Thread Christophe Leroy


Le 12/09/2022 à 03:47, Rohan McLure a écrit :
> Add the following helpers for detecting whether a page table entry
> is a leaf and is accessible to user space.
> 
>   * pte_user_accessible_page
>   * pmd_user_accessible_page
>   * pud_user_accessible_page
> 
> The heavy lifting is done by pte_user, which checks user accessibility
> on a per-mmu level, provided prior to this commit.

Not in this series it seems, so I guess that commit is already in the 
tree. Can you reference it ?

> 
> On 32-bit systems, provide stub implementations for these methods, with
> BUG(), as debug features such as page table checks will emit functions
> that call p{md,ud}_user_accessible_page but must not be used.

Please please please no new BUG or BUG_ON, please see the following 
discussion and especially the position of Linus Torvalds :

https://lore.kernel.org/all/CAHk-=wg40eazofo16eviaj7mfqdhz2gvebvfsmf6gyzsprj...@mail.gmail.com/

> 
> Signed-off-by: Rohan McLure 
> ---
>   arch/powerpc/include/asm/pgtable.h | 35 
>   1 file changed, 35 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pgtable.h 
> b/arch/powerpc/include/asm/pgtable.h
> index 522145b16a07..8c1f5feb9360 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -170,6 +170,41 @@ static inline int pud_pfn(pud_t pud)
>   return 0;
>   }
>   
> +static inline bool pte_user_accessible_page(pte_t pte)
> +{
> + return (pte_val(pte) & _PAGE_PRESENT) && pte_user(pte);
> +}
> +
> +#ifdef CONFIG_PPC64
> +
> +static inline bool pmd_user_accessible_page(pmd_t pmd)
> +{
> + return pmd_is_leaf(pmd) && pmd_present(pmd)
> + && pte_user(pmd_pte(pmd));

The && goes on previous line.

> +}
> +
> +static inline bool pud_user_accessible_page(pud_t pud)
> +{
> + return pud_is_leaf(pud) && pud_present(pud)
> + && pte_user(pud_pte(pud));

Same

> +}
> +
> +#else
> +
> +static inline bool pmd_user_accessible_page(pmd_t pmd)
> +{
> + BUG();

Can you use BUILD_BUG() instead ?

> + return false;
> +}
> +
> +static inline bool pud_user_accessible_page(pud_t pud)
> +{
> + BUG();

Same.

> + return false;
> +}
> +
> +#endif /* CONFIG_PPC64 */
> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif /* _ASM_POWERPC_PGTABLE_H */

By the way, there is a great tool that can help you :

$ ./arch/powerpc/tools/checkpatch.sh --strict -g HEAD~
CHECK:LOGICAL_CONTINUATIONS: Logical continuations should be on the 
previous line
#43: FILE: arch/powerpc/include/asm/pgtable.h:183:
+   return pmd_is_leaf(pmd) && pmd_present(pmd)
+   && pte_user(pmd_pte(pmd));

CHECK:LOGICAL_CONTINUATIONS: Logical continuations should be on the 
previous line
#49: FILE: arch/powerpc/include/asm/pgtable.h:189:
+   return pud_is_leaf(pud) && pud_present(pud)
+   && pte_user(pud_pte(pud));

WARNING:AVOID_BUG: Avoid crashing the kernel - try using WARN_ON & 
recovery code rather than BUG() or BUG_ON()
#56: FILE: arch/powerpc/include/asm/pgtable.h:196:
+   BUG();

WARNING:AVOID_BUG: Avoid crashing the kernel - try using WARN_ON & 
recovery code rather than BUG() or BUG_ON()
#62: FILE: arch/powerpc/include/asm/pgtable.h:202:
+   BUG();


Re: [PATCH 1/3] powerpc: mm: move pud_pfn stub to common pgtable header

2022-09-11 Thread Christophe Leroy


Le 12/09/2022 à 03:47, Rohan McLure a écrit :
> The pud_pfn inline call is only referenced on 64-bit Book3S systems,
> but its invocations are gated by pud_devmap() invocations, rendering the
> body of this function as dead code.
> 
> As such, this function is readily exportable to all platforms in the
> instance where kernel features depend on it at least being defined.

I don't understand. If this function is dead code, why move it to a 
larger scope ? Shouldn't you remove it instead ? Or are you planning to 
re-use it on other platforms ? In that case say it.

> 
> Signed-off-by: Rohan McLure 
> ---
>   arch/powerpc/include/asm/book3s/64/pgtable.h | 10 --
>   arch/powerpc/include/asm/pgtable.h   | 12 
>   2 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 392ff48f77df..8874f2a3661d 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1411,16 +1411,6 @@ static inline int pgd_devmap(pgd_t pgd)
>   }
>   #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>   
> -static inline int pud_pfn(pud_t pud)
> -{
> - /*
> -  * Currently all calls to pud_pfn() are gated around a pud_devmap()
> -  * check so this should never be used. If it grows another user we
> -  * want to know about it.
> -  */
> - BUILD_BUG();
> - return 0;
> -}
>   #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
>   pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t 
> *);
>   void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
> diff --git a/arch/powerpc/include/asm/pgtable.h 
> b/arch/powerpc/include/asm/pgtable.h
> index 33f4bf8d22b0..522145b16a07 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -158,6 +158,18 @@ struct seq_file;
>   void arch_report_meminfo(struct seq_file *m);
>   #endif /* CONFIG_PPC64 */
>   
> +#define pud_pfn pud_pfn

Why do you need to add that define ?

> +static inline int pud_pfn(pud_t pud)
> +{
> + /*
> +  * Currently all calls to pud_pfn() are gated around a pud_devmap()
> +  * check so this should never be used. If it grows another user we
> +  * want to know about it.
> +  */
> + BUILD_BUG();
> + return 0;
> +}
> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif /* _ASM_POWERPC_PGTABLE_H */

Re: [PATCH] ppc64/kdump: Limit kdump base to 512MB

2022-09-11 Thread Christophe Leroy


Le 09/09/2022 à 19:40, Hari Bathini a écrit :
> Since commit e641eb03ab2b0 ("powerpc: Fix up the kdump base cap to
> 128M") memory for kdump kernel has been reserved at an offset of
> 128MB. This held up well for a long time before running into boot
> failure on LPARs having a lot of cores. Commit 7c5ed82b800d8
> ("powerpc: Set crashkernel offset to mid of RMA region") fixed this
> boot failure by moving the offset to mid of RMA region. Limit this
> offset to 512MB to avoid running into boot failures, during kdump
> kernel boot, due RTAS or other allocation restrictions.
> 
> Signed-off-by: Hari Bathini 
> ---
>   arch/powerpc/kexec/core.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
> index cf84bfe9e27e..c2cbfcf81cea 100644
> --- a/arch/powerpc/kexec/core.c
> +++ b/arch/powerpc/kexec/core.c
> @@ -136,7 +136,7 @@ void __init reserve_crashkernel(void)
>   #ifdef CONFIG_PPC64
>   /*
>* On the LPAR platform place the crash kernel to mid of
> -  * RMA size (512MB or more) to ensure the crash kernel
> +  * RMA size (max. of 512MB) to ensure the crash kernel
>* gets enough space to place itself and some stack to be
>* in the first segment. At the same time normal kernel
>* also get enough space to allocate memory for essential
> @@ -144,7 +144,7 @@ void __init reserve_crashkernel(void)
>* kernel starts at 128MB offset on other platforms.
>*/
>   if (firmware_has_feature(FW_FEATURE_LPAR))
> - crashk_res.start = ppc64_rma_size / 2;
> + crashk_res.start = min(0x2000ULL, (ppc64_rma_size / 
> 2));

Use SZ_512M instead of open coding.

Remove the ( ) around ppc64_rma_size / 2

>   else
>   crashk_res.start = min(0x800ULL, (ppc64_rma_size / 
> 2));
>   #else

Christophe

Re: [PATCH 1/3] powerpc: mm: move pud_pfn stub to common pgtable header

2022-09-11 Thread Rohan McLure
This patch and its successor would be avoidable if architectures could specify
that they wish to use page_table_check_p{ud,md}_{clear,set}.

> On 12 Sep 2022, at 11:47 am, Rohan McLure  wrote:
> 
> The pud_pfn inline call is only referenced on 64-bit Book3S systems,
> but its invocations are gated by pud_devmap() invocations, rendering the
> body of this function as dead code.
> 
> As such, this function is readily exportable to all platforms in the
> instance where kernel features depend on it at least being defined.
> 
> Signed-off-by: Rohan McLure 
> ---
> arch/powerpc/include/asm/book3s/64/pgtable.h | 10 --
> arch/powerpc/include/asm/pgtable.h   | 12 
> 2 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 392ff48f77df..8874f2a3661d 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -1411,16 +1411,6 @@ static inline int pgd_devmap(pgd_t pgd)
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> 
> -static inline int pud_pfn(pud_t pud)
> -{
> - /*
> -  * Currently all calls to pud_pfn() are gated around a pud_devmap()
> -  * check so this should never be used. If it grows another user we
> -  * want to know about it.
> -  */
> - BUILD_BUG();
> - return 0;
> -}
> #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
> pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
> void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
> diff --git a/arch/powerpc/include/asm/pgtable.h 
> b/arch/powerpc/include/asm/pgtable.h
> index 33f4bf8d22b0..522145b16a07 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -158,6 +158,18 @@ struct seq_file;
> void arch_report_meminfo(struct seq_file *m);
> #endif /* CONFIG_PPC64 */
> 
> +#define pud_pfn pud_pfn
> +static inline int pud_pfn(pud_t pud)
> +{
> + /*
> +  * Currently all calls to pud_pfn() are gated around a pud_devmap()
> +  * check so this should never be used. If it grows another user we
> +  * want to know about it.
> +  */
> + BUILD_BUG();
> + return 0;
> +}
> +
> #endif /* __ASSEMBLY__ */
> 
> #endif /* _ASM_POWERPC_PGTABLE_H */
> -- 
> 2.34.1
> 



[PATCH 3/3] powerpc: mm: support page table check

2022-09-11 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all ppc64, and 32-bit
platforms implementing Book3S.

Change pud_pfn to be a runtime bug rather than a build bug as it is
consumed by page_table_check_pud_{clear,set} which are not called.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable 
ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table
check")

Signed-off-by: Rohan McLure 
---
 arch/powerpc/Kconfig | 1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 9 -
 arch/powerpc/include/asm/nohash/32/pgtable.h | 5 -
 arch/powerpc/include/asm/nohash/64/pgtable.h | 2 ++
 arch/powerpc/include/asm/nohash/pgtable.h| 1 +
 arch/powerpc/include/asm/pgtable.h   | 4 
 7 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4c466acdc70d..6c213ac46a92 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -149,6 +149,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_DEBUG_PAGEALLOCif PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 40041ac713d9..3d05c8fe4604 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -53,6 +53,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 static inline bool pte_user(pte_t pte)
 {
return pte_val(pte) & _PAGE_USER;
@@ -353,7 +355,9 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   unsigned long old = pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0);
+   page_table_check_pte_clear(mm, addr, __pte(old));
+   return __pte(old);
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
@@ -541,6 +545,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
 {
+   page_table_check_pte_set(mm, addr, ptep, pte);
 #if defined(CONFIG_SMP) && !defined(CONFIG_PTE_64BIT)
/* First case is 32-bit Hash MMU in SMP mode with 32-bit PTEs. We use 
the
 * helper pte_update() which does an atomic update. We need to do that
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8874f2a3661d..cbb7bd99c897 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -179,6 +179,8 @@
 #define PAGE_AGP   (PAGE_KERNEL_NC)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -483,6 +485,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
+   page_table_check_pte_clear(mm, addr, __pte(old));
return __pte(old);
 }
 
@@ -491,12 +494,15 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
unsigned long addr,
pte_t *ptep, int full)
 {
+   pte_t old_pte;
if (full && radix_enabled()) {
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -872,6 +878,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
 */
pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
 
+   page_table_check_pte_set(mm, addr, ptep, pte);
if (radix_enabled())
return radix__set_pte_at(mm, addr, ptep, pte, percpu);
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
diff --git 

[PATCH 2/3] powerpc: mm: add p{te,md,ud}_user_accessible_page helpers

2022-09-11 Thread Rohan McLure
Add the following helpers for detecting whether a page table entry
is a leaf and is accessible to user space.

 * pte_user_accessible_page
 * pmd_user_accessible_page
 * pud_user_accessible_page

The heavy lifting is done by pte_user, which checks user accessibility
on a per-mmu level, provided prior to this commit.

On 32-bit systems, provide stub implementations for these methods, with
BUG(), as debug features such as page table checks will emit functions
that call p{md,ud}_user_accessible_page but must not be used.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/pgtable.h | 35 
 1 file changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 522145b16a07..8c1f5feb9360 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -170,6 +170,41 @@ static inline int pud_pfn(pud_t pud)
return 0;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte)
+{
+   return (pte_val(pte) & _PAGE_PRESENT) && pte_user(pte);
+}
+
+#ifdef CONFIG_PPC64
+
+static inline bool pmd_user_accessible_page(pmd_t pmd)
+{
+   return pmd_is_leaf(pmd) && pmd_present(pmd)
+   && pte_user(pmd_pte(pmd));
+}
+
+static inline bool pud_user_accessible_page(pud_t pud)
+{
+   return pud_is_leaf(pud) && pud_present(pud)
+   && pte_user(pud_pte(pud));
+}
+
+#else
+
+static inline bool pmd_user_accessible_page(pmd_t pmd)
+{
+   BUG();
+   return false;
+}
+
+static inline bool pud_user_accessible_page(pud_t pud)
+{
+   BUG();
+   return false;
+}
+
+#endif /* CONFIG_PPC64 */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.34.1



[PATCH 1/3] powerpc: mm: move pud_pfn stub to common pgtable header

2022-09-11 Thread Rohan McLure
The pud_pfn inline call is only referenced on 64-bit Book3S systems,
but its invocations are gated by pud_devmap() invocations, rendering the
body of this function as dead code.

As such, this function is readily exportable to all platforms in the
instance where kernel features depend on it at least being defined.

Signed-off-by: Rohan McLure 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 10 --
 arch/powerpc/include/asm/pgtable.h   | 12 
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 392ff48f77df..8874f2a3661d 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1411,16 +1411,6 @@ static inline int pgd_devmap(pgd_t pgd)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline int pud_pfn(pud_t pud)
-{
-   /*
-* Currently all calls to pud_pfn() are gated around a pud_devmap()
-* check so this should never be used. If it grows another user we
-* want to know about it.
-*/
-   BUILD_BUG();
-   return 0;
-}
 #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
 pte_t ptep_modify_prot_start(struct vm_area_struct *, unsigned long, pte_t *);
 void ptep_modify_prot_commit(struct vm_area_struct *, unsigned long,
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 33f4bf8d22b0..522145b16a07 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -158,6 +158,18 @@ struct seq_file;
 void arch_report_meminfo(struct seq_file *m);
 #endif /* CONFIG_PPC64 */
 
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   /*
+* Currently all calls to pud_pfn() are gated around a pud_devmap()
+* check so this should never be used. If it grows another user we
+* want to know about it.
+*/
+   BUILD_BUG();
+   return 0;
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.34.1



Re: [PATCH v4 00/20] powerpc: Syscall wrapper and register clearing

2022-09-11 Thread Rohan McLure
Any comments for this revision? Hopefully these revisions address 32-bit and 
embedded systems appropriately.

Thanks,
Rohan

> On 24 Aug 2022, at 12:05 pm, Rohan McLure  wrote:
> 
> V3 available here:
> 
> Link: 
> https://lore.kernel.org/all/4c3a8815-67ff-41eb-a703-981920ca1...@linux.ibm.com/T/
> 
> Implement a syscall wrapper, causing arguments to handlers to be passed
> via a struct pt_regs on the stack. The syscall wrapper is implemented
> for all platforms other than the Cell processor, from which SPUs expect
> the ability to directly call syscall handler symbols with the regular
> in-register calling convention.
> 
> Adopting syscall wrappers requires redefinition of architecture-specific
> syscalls and compatibility syscalls to use the SYSCALL_DEFINE and
> COMPAT_SYSCALL_DEFINE macros, as well as removal of direct-references to
> the emitted syscall-handler symbols from within the kernel. This work
> lead to the following modernisations of powerpc's syscall handlers:
> 
> - Replace syscall 82 semantics with sys_old_select and remove
>   ppc_select handler, which features direct call to both sys_old_select
>   and sys_select.
> - Use a generic fallocate compatibility syscall
> 
> Replace asm implementation of syscall table with C implementation for
> more compile-time checks.
> 
> Many compatibility syscalls are candidates to be removed in favour of
> generically defined handlers, but exhibit different parameter orderings
> and numberings due to 32-bit ABI support for 64-bit parameters. The
> parameter reorderings are however consistent with arm. A future patch
> series will serve to modernise syscalls by providing generic
> implementations featuring these reorderings.
> 
> The design of this syscall is very similar to the s390, x86 and arm64
> implementations. See also Commit 4378a7d4be30 (arm64: implement syscall 
> wrappers).
> The motivation for this change is that it allows for the clearing of
> register state when entering the kernel via through interrupt handlers
> on 64-bit servers. This serves to reduce the influence of values in
> registers carried over from the interrupted process, e.g. syscall
> parameters from user space, or user state at the site of a pagefault.
> All values in registers are saved and zeroized at the entry to an
> interrupt handler and restored afterward. While this may sound like a
> heavy-weight mitigation, many gprs are already saved and restored on
> handling of an interrupt, and the mmap_bench benchmark on Power 9 guest,
> repeatedly invoking the pagefault handler suggests at most ~0.8%
> regression in performance. Realistic workloads are not constantly
> producing interrupts, and so this does not indicate realistic slowdown.
> 
> Using wrapped syscalls yields to a performance improvement of ~5.6% on
> the null_syscall benchmark on pseries guests, by removing the need for
> system_call_exception to allocate its own stack frame. This amortises
> the additional costs of saving and restoring non-volatile registers
> (register clearing is cheap on super scalar platforms), and so the
> final mitigation actually yields a net performance improvement of ~0.6%
> on the null_syscall benchmark.
> 
> Patch Changelog:
> 
> - Fix instances where NULLIFY_GPRS were still present
> - Minimise unrecoverable windows in entry_32.S between SRR0/1 restores
>   and RFI
> - Remove all references to syscall symbols prior to introducing syscall
>   wrapper.
> - Remove unnecessary duplication of syscall handlers with sys_... and
>   powerpc_sys_... symbols.
> - Clear non-volatile registers on Book3E systems, as some of these
>   systems feature hardware speculation, and we already unconditionally
>   restore NVGPRS.
> 
> Rohan McLure (20):
>  powerpc: Remove asmlinkage from syscall handler definitions
>  powerpc: Use generic fallocate compatibility syscall
>  powerpc/32: Remove powerpc select specialisation
>  powerpc: Provide do_ppc64_personality helper
>  powerpc: Remove direct call to personality syscall handler
>  powerpc: Remove direct call to mmap2 syscall handlers
>  powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers
>  powerpc: Include all arch-specific syscall prototypes
>  powerpc: Enable compile-time check for syscall handlers
>  powerpc: Use common syscall handler type
>  powerpc: Add ZEROIZE_GPRS macros for register clears
>  Revert "powerpc/syscall: Save r3 in regs->orig_r3"
>  powerpc: Provide syscall wrapper
>  powerpc/64s: Clear/restore caller gprs in syscall interrupt/return
>  powerpc/64s: Use {ZEROIZE,SAVE,REST}_GPRS macros in sc, scv 0 handlers
>  powerpc/32: Clarify interrupt restores with REST_GPR macro in
>entry_32.S
>  powerpc/64e: Clarify register saves and clears with
>{SAVE,ZEROIZE}_GPRS
>  powerpc/64s: Fix comment on interrupt handler prologue
>  powerpc/64s: Clear gprs on interrupt routine entry in Book3S
>  powerpc/64e: Clear gprs on interrupt routine entry
> 
> arch/powerpc/Kconfig |   1 +
> 

Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-11 Thread Serge Semin
On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
> 
> 
> On 9/12/22 12:22 AM, Serge Semin wrote:
> > Hi Zhuo
> > 
> > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
> > > Status bits for ERR_NONFATAL errors only are cleared in
> > > pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> > > error status in ntb_hw_idt.c and lpfc_attr.c. So we add
> > > pci_aer_clear_uncorrect_error_status() and change to use it.
> > 
> > What about the next drivers
> > 
> > drivers/scsi/lpfc/lpfc_attr.c
> > drivers/crypto/hisilicon/qm.c
> > drivers/net/ethernet/intel/ice/ice_main.c
> > 
> > which call the pci_aer_clear_nonfatal_status() method too?
> 
> ‘pci_aer_clear_nonfatal_status()’ in
> drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
> merged in kernel in: 
> https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776
> 
> ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
> removed in the next kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406
> 
> Uncorrectable error status register was intended to be cleared in
> drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in 
> https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
> and
> https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f

Got it. Thanks for clarification.

-Sergey

> > 
> > > 
> > > Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
> > > no functional changes.
> > > 
> > > Since pci_aer_clear_nonfatal_status() is used only internally, move
> > > its declaration to the PCI internal header file. Also, no one cares
> > > about return value of pci_aer_clear_nonfatal_status(), so make it void.
> > > 
> > > Signed-off-by: Zhuo Chen 
> > > ---
> > >   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
> > >   drivers/pci/pci.h   |  2 ++
> > >   drivers/pci/pcie/aer.c  | 23 ++-
> > >   drivers/pci/pcie/dpc.c  |  3 +--
> > >   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
> > >   include/linux/aer.h |  4 ++--
> > >   6 files changed, 27 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c 
> > > b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > index 733557231ed0..de1dbbc5b9de 100644
> > > --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> > > @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> > >   ret = pci_enable_pcie_error_reporting(pdev);
> > >   if (ret != 0)
> > >   dev_warn(>dev, "PCIe AER capability disabled\n");
> > 
> > > - else /* Cleanup nonfatal error status before getting to init */
> > > - pci_aer_clear_nonfatal_status(pdev);
> > > + else /* Cleanup uncorrectable error status before getting to init */
> > > + pci_aer_clear_uncorrect_error_status(pdev);
> > 
> >  From the IDT NTB PCIe initialization procedure point of view both of
> > these methods are equivalent. So for the IDT NTB part:
> > 
> IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original
> function is clear uncorrectable error status register including fatal and
> non-fatal error status bits.
> 
> > Acked-by: Serge Semin 
> > 
> > -Sergey
> > 
> > >   /* First enable the PCI device */
> > >   ret = pcim_enable_device(pdev);
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index e10cdec6c56e..574176f43025 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
> > >   void pci_aer_exit(struct pci_dev *dev);
> > >   extern const struct attribute_group aer_stats_attr_group;
> > >   void pci_aer_clear_fatal_status(struct pci_dev *dev);
> > > +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> > >   int pci_aer_clear_status(struct pci_dev *dev);
> > >   int pci_aer_raw_clear_status(struct pci_dev *dev);
> > >   #else
> > > @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
> > >   static inline void pci_aer_init(struct pci_dev *d) { }
> > >   static inline void pci_aer_exit(struct pci_dev *d) { }
> > >   static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> > > +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
> > >   static inline int pci_aer_clear_status(struct pci_dev *dev) { return 
> > > -EINVAL; }
> > >   static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { 
> > > return -EINVAL; }
> > >   #endif
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 7952e5efd6cf..d2996afa80f6 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev 
> > > *dev)
> > >   }
> > >   

Re: [External] Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-11 Thread Zhuo Chen




On 9/12/22 12:22 AM, Serge Semin wrote:

Hi Zhuo

On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:

Status bits for ERR_NONFATAL errors only are cleared in
pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
error status in ntb_hw_idt.c and lpfc_attr.c. So we add
pci_aer_clear_uncorrect_error_status() and change to use it.


What about the next drivers

drivers/scsi/lpfc/lpfc_attr.c
drivers/crypto/hisilicon/qm.c
drivers/net/ethernet/intel/ice/ice_main.c

which call the pci_aer_clear_nonfatal_status() method too?


‘pci_aer_clear_nonfatal_status()’ in 
drivers/net/ethernet/intel/ice/ice_main.c has already been removed and 
merged in kernel in: 
https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776


‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will 
be removed in the next kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406

Uncorrectable error status register was intended to be cleared in 
drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in 
https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb

and
https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f




Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
  drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
  drivers/pci/pci.h   |  2 ++
  drivers/pci/pcie/aer.c  | 23 ++-
  drivers/pci/pcie/dpc.c  |  3 +--
  drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
  include/linux/aer.h |  4 ++--
  6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 733557231ed0..de1dbbc5b9de 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(>dev, "PCIe AER capability disabled\n");



-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);


 From the IDT NTB PCIe initialization procedure point of view both of
these methods are equivalent. So for the IDT NTB part:

IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original 
function is clear uncorrectable error status register including fatal 
and non-fatal error status bits.



Acked-by: Serge Semin 

-Sergey

  
  	/* First enable the PCI device */

ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..574176f43025 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
  void pci_aer_exit(struct pci_dev *dev);
  extern const struct attribute_group aer_stats_attr_group;
  void pci_aer_clear_fatal_status(struct pci_dev *dev);
+void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
  int pci_aer_clear_status(struct pci_dev *dev);
  int pci_aer_raw_clear_status(struct pci_dev *dev);
  #else
@@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
  static inline void pci_aer_init(struct pci_dev *d) { }
  static inline void pci_aer_exit(struct pci_dev *d) { }
  static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
+static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
  static inline int pci_aer_clear_status(struct pci_dev *dev) { return -EINVAL; 
}
  static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
-EINVAL; }
  #endif
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 7952e5efd6cf..d2996afa80f6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev *dev)
  }
  EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
  
-int pci_aer_clear_nonfatal_status(struct pci_dev *dev)

+void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
  {
int aer = dev->aer_cap;
u32 status, sev;
  
  	if (!pcie_aer_is_native(dev))

-   return -EIO;
+   return;
  
  	/* Clear status bits for ERR_NONFATAL errors only */

pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
@@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
status &= ~sev;
if (status)
pci_write_config_dword(dev, aer + 

Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-11 Thread Serge Semin
Hi Zhuo

On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:
> Status bits for ERR_NONFATAL errors only are cleared in
> pci_aer_clear_nonfatal_status(), but we want clear uncorrectable
> error status in ntb_hw_idt.c and lpfc_attr.c. So we add
> pci_aer_clear_uncorrect_error_status() and change to use it.

What about the next drivers

drivers/scsi/lpfc/lpfc_attr.c
drivers/crypto/hisilicon/qm.c
drivers/net/ethernet/intel/ice/ice_main.c

which call the pci_aer_clear_nonfatal_status() method too?

> 
> Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
> no functional changes.
> 
> Since pci_aer_clear_nonfatal_status() is used only internally, move
> its declaration to the PCI internal header file. Also, no one cares
> about return value of pci_aer_clear_nonfatal_status(), so make it void.
> 
> Signed-off-by: Zhuo Chen 
> ---
>  drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
>  drivers/pci/pci.h   |  2 ++
>  drivers/pci/pcie/aer.c  | 23 ++-
>  drivers/pci/pcie/dpc.c  |  3 +--
>  drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
>  include/linux/aer.h |  4 ++--
>  6 files changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 733557231ed0..de1dbbc5b9de 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
>   ret = pci_enable_pcie_error_reporting(pdev);
>   if (ret != 0)
>   dev_warn(>dev, "PCIe AER capability disabled\n");

> - else /* Cleanup nonfatal error status before getting to init */
> - pci_aer_clear_nonfatal_status(pdev);
> + else /* Cleanup uncorrectable error status before getting to init */
> + pci_aer_clear_uncorrect_error_status(pdev);

>From the IDT NTB PCIe initialization procedure point of view both of
these methods are equivalent. So for the IDT NTB part:

Acked-by: Serge Semin 

-Sergey

>  
>   /* First enable the PCI device */
>   ret = pcim_enable_device(pdev);
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index e10cdec6c56e..574176f43025 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
>  void pci_aer_exit(struct pci_dev *dev);
>  extern const struct attribute_group aer_stats_attr_group;
>  void pci_aer_clear_fatal_status(struct pci_dev *dev);
> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>  int pci_aer_clear_status(struct pci_dev *dev);
>  int pci_aer_raw_clear_status(struct pci_dev *dev);
>  #else
> @@ -693,6 +694,7 @@ static inline void pci_no_aer(void) { }
>  static inline void pci_aer_init(struct pci_dev *d) { }
>  static inline void pci_aer_exit(struct pci_dev *d) { }
>  static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> +static inline void pci_aer_clear_nonfatal_status(struct pci_dev *dev) { }
>  static inline int pci_aer_clear_status(struct pci_dev *dev) { return 
> -EINVAL; }
>  static inline int pci_aer_raw_clear_status(struct pci_dev *dev) { return 
> -EINVAL; }
>  #endif
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 7952e5efd6cf..d2996afa80f6 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -251,13 +251,13 @@ int pci_disable_pcie_error_reporting(struct pci_dev 
> *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_disable_pcie_error_reporting);
>  
> -int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> +void pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>  {
>   int aer = dev->aer_cap;
>   u32 status, sev;
>  
>   if (!pcie_aer_is_native(dev))
> - return -EIO;
> + return;
>  
>   /* Clear status bits for ERR_NONFATAL errors only */
>   pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
> @@ -265,10 +265,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>   status &= ~sev;
>   if (status)
>   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> -
> - return 0;
>  }
> -EXPORT_SYMBOL_GPL(pci_aer_clear_nonfatal_status);
>  
>  void pci_aer_clear_fatal_status(struct pci_dev *dev)
>  {
> @@ -286,6 +283,22 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>   pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
>  }
>  
> +int pci_aer_clear_uncorrect_error_status(struct pci_dev *dev)
> +{
> + int aer = dev->aer_cap;
> + u32 status;
> +
> + if (!pcie_aer_is_native(dev))
> + return -EIO;
> +
> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, );
> + if (status)
> + pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, status);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_aer_clear_uncorrect_error_status);
> +
>  /**
>   * pci_aer_raw_clear_status - Clear AER error registers.
>   * @dev: the PCI device
> diff --git 

Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-11 Thread Ard Biesheuvel
On Sun, 11 Sept 2022 at 16:26, Peter Zijlstra  wrote:
>
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > Alternatives
> > 
> >
> > Another idea which has been floated in the past is for objtool to read
> > DWARF (or .eh_frame) to help it figure out the control flow.  That
> > hasn't been tried yet, but would be considerably more difficult and
> > fragile IMO.
>
> I though Ard played around with that a bit on ARM64. And yes, given that
> most toolchains consider DWARF itself best-effort, I'm not holding my
> breath there.
>

I have patches out that use unwind data to locate pointer auth
sign/authenticate instructions in the code, in order to patch them to
shadow call stack pushes and pops at runtime if pointer authentication
is not supported by the hardware. This has little to do with objtool
or reliable stack traces.

I still think DWARF could help to make objtool's job a bit easier, but
I don't think it will be of any use with jump tables or noreturn
functions in particular.


Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}

2022-09-11 Thread Peter Zijlstra
On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> Alternatives
> 
> 
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow.  That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.

I though Ard played around with that a bit on ARM64. And yes, given that
most toolchains consider DWARF itself best-effort, I'm not holding my
breath there.

On top of that, building a kernel with DWARFs on is just so much
slower..


Re: [PATCH v2] powerpc: select HAVE_PATA_PLATFORM in PPC instead of creating a PPC dependency

2022-09-11 Thread Damien Le Moal
On 2022/09/11 21:41, Arnd Bergmann wrote:
> On Sun, Sep 11, 2022, at 1:54 PM, Damien Le Moal wrote:
>> On 2022/09/09 20:31, Arnd Bergmann wrote:
>>>  
>>>  config PATA_PLATFORM
>>> -   tristate "Generic platform device PATA support"
>>> -   depends on EXPERT || PPC || HAVE_PATA_PLATFORM
>>> +   tristate "Generic platform device PATA support" if EXPERT || 
>>> HAVE_PATA_PLATFORM
>>
>> Shouldn't this be:
>>
>>  tristate "Generic platform device PATA support" if EXPERT || PPC
>>
>> ?
>>
>> And while at it, it would be nice to add "|| COMPILE_TEST" too.
> 
> The idea was that this can be selected by CONFIG_PATA_OF_PLATFORM
> in any configuration that has CONFIG_OF enabled. Since PPC
> has CONFIG_OF enabled unconditionally, there is no need to
> make this option visible separately.
> 
> Same for compile-testing: since CONFIG_OF can be enabled on
> any architecture, PATA_OF_PLATFORM is already covered by
> allmodconfig builds anywhere. The separate HAVE_PATA_PLATFORM
> is only needed for machines that want the non-OF pata-platform
> module (sh, m68k-mac, mips-sibyte arm-s3c-simtec).

Got it. Thanks for the details.

> 
>Arnd

-- 
Damien Le Moal
Western Digital Research



Re: [PATCH v2] powerpc: select HAVE_PATA_PLATFORM in PPC instead of creating a PPC dependency

2022-09-11 Thread Arnd Bergmann
On Sun, Sep 11, 2022, at 1:54 PM, Damien Le Moal wrote:
> On 2022/09/09 20:31, Arnd Bergmann wrote:
>>  
>>  config PATA_PLATFORM
>> -tristate "Generic platform device PATA support"
>> -depends on EXPERT || PPC || HAVE_PATA_PLATFORM
>> +tristate "Generic platform device PATA support" if EXPERT || 
>> HAVE_PATA_PLATFORM
>
> Shouldn't this be:
>
>   tristate "Generic platform device PATA support" if EXPERT || PPC
>
> ?
>
> And while at it, it would be nice to add "|| COMPILE_TEST" too.

The idea was that this can be selected by CONFIG_PATA_OF_PLATFORM
in any configuration that has CONFIG_OF enabled. Since PPC
has CONFIG_OF enabled unconditionally, there is no need to
make this option visible separately.

Same for compile-testing: since CONFIG_OF can be enabled on
any architecture, PATA_OF_PLATFORM is already covered by
allmodconfig builds anywhere. The separate HAVE_PATA_PLATFORM
is only needed for machines that want the non-OF pata-platform
module (sh, m68k-mac, mips-sibyte arm-s3c-simtec).

   Arnd


Re: [PATCH v2] powerpc: select HAVE_PATA_PLATFORM in PPC instead of creating a PPC dependency

2022-09-11 Thread Damien Le Moal
On 2022/09/09 20:31, Arnd Bergmann wrote:
> On Fri, Sep 9, 2022, at 1:19 PM, Christophe Leroy wrote:
>> Le 09/09/2022 à 13:09, Arnd Bergmann a écrit :
>>> On Fri, Sep 9, 2022, at 11:03 AM, Lukas Bulwahn wrote:
>>>
>>> I don't see a single powerpc machine that creates a
>>>   name="pata_platform" platform_device. I suspect this was
>>> only needed bwfore 2007 commit 9cd55be4d223 ("[POWERPC] pasemi:
>>> Move electra-ide to pata_of_platform"), so the "|| PPC"
>>> bit should just get removed without adding the HAVE_PATA_PLATFORM
>>> bit.
>>
>> But that was added in 2008 by commit 61f7162117d4 ("libata: 
>> pata_of_platform: OF-Platform PATA device driver")
> 
> Ah, I see. In that case, I think we should probably just always
> allow PATA_OF_PLATFORM to be enabled regardless of
> HAVE_PATA_PLATFORM, something like
> 
> diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
> index 1c9f4fb2595d..c93d97455744 100644
> --- a/drivers/ata/Kconfig
> +++ b/drivers/ata/Kconfig
> @@ -1102,8 +1102,7 @@ config PATA_PCMCIA
> If unsure, say N.
>  
>  config PATA_PLATFORM
> - tristate "Generic platform device PATA support"
> - depends on EXPERT || PPC || HAVE_PATA_PLATFORM
> + tristate "Generic platform device PATA support" if EXPERT || 
> HAVE_PATA_PLATFORM

Shouldn't this be:

tristate "Generic platform device PATA support" if EXPERT || PPC

?

And while at it, it would be nice to add "|| COMPILE_TEST" too.

>   help
> This option enables support for generic directly connected ATA
> devices commonly found on embedded systems.
> @@ -1112,7 +,8 @@ config PATA_PLATFORM
>  
>  config PATA_OF_PLATFORM
>   tristate "OpenFirmware platform device PATA support"
> - depends on PATA_PLATFORM && OF
> + depends on OF
> + select PATA_PLATFORM
>   help
> This option enables support for generic directly connected ATA
> devices commonly found on embedded systems with OpenFirmware
> 
> and then also drop the "select HAVE_PATA_PLATFORM" from
> arm64 and arm/versatile.
> 
> Or we can go one step further, and either split out the
> 'pata_platform_driver' into a separate file from
> '__pata_platform_probe', or merge pata_of_platform.c
> back into pata_platform.c.
> 
>   Arnd

-- 
Damien Le Moal
Western Digital Research



Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal

2022-09-11 Thread Vlastimil Babka
On 9/2/22 01:26, Suren Baghdasaryan wrote:
> On Thu, Sep 1, 2022 at 1:58 PM Kent Overstreet
>  wrote:
>>
>> On Thu, Sep 01, 2022 at 10:34:48AM -0700, Suren Baghdasaryan wrote:
>> > Resending to fix the issue with the In-Reply-To tag in the original
>> > submission at [4].
>> >
>> > This is a proof of concept for per-vma locks idea that was discussed
>> > during SPF [1] discussion at LSF/MM this year [2], which concluded with
>> > suggestion that “a reader/writer semaphore could be put into the VMA
>> > itself; that would have the effect of using the VMA as a sort of range
>> > lock. There would still be contention at the VMA level, but it would be an
>> > improvement.” This patchset implements this suggested approach.
>> >
>> > When handling page faults we lookup the VMA that contains the faulting
>> > page under RCU protection and try to acquire its lock. If that fails we
>> > fall back to using mmap_lock, similar to how SPF handled this situation.
>> >
>> > One notable way the implementation deviates from the proposal is the way
>> > VMAs are marked as locked. Because during some of mm updates multiple
>> > VMAs need to be locked until the end of the update (e.g. vma_merge,
>> > split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks
>> > and other complications would make the code more complex. Therefore we
>> > provide a way to "mark" VMAs as locked and then unmark all locked VMAs
>> > all at once. This is done using two sequence numbers - one in the
>> > vm_area_struct and one in the mm_struct. VMA is considered locked when
>> > these sequence numbers are equal. To mark a VMA as locked we set the
>> > sequence number in vm_area_struct to be equal to the sequence number
>> > in mm_struct. To unlock all VMAs we increment mm_struct's seq number.
>> > This allows for an efficient way to track locked VMAs and to drop the
>> > locks on all VMAs at the end of the update.
>>
>> I like it - the sequence numbers are a stroke of genuius. For what it's doing
>> the patchset seems almost small.
> 
> Thanks for reviewing it!
> 
>>
>> Two complaints so far:
>>  - I don't like the vma_mark_locked() name. To me it says that the caller
>>already took or is taking the lock and this function is just marking that
>>we're holding the lock, but it's really taking a different type of lock. 
>> But
>>this function can block, it really is taking a lock, so it should say 
>> that.
>>
>>This is AFAIK a new concept, not sure I'm going to have anything good 
>> either,
>>but perhaps vma_lock_multiple()?
> 
> I'm open to name suggestions but vma_lock_multiple() is a bit
> confusing to me. Will wait for more suggestions.

Well, it does act like a vma_write_lock(), no? So why not that name. The
checking function for it is even called vma_assert_write_locked().

We just don't provide a single vma_write_unlock(), but a
vma_mark_unlocked_all(), that could be instead named e.g.
vma_write_unlock_all().
But it's called on a mm, so maybe e.g. mm_vma_write_unlock_all()?