Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread Vineet Gupta
+CC Alexey

On 3/27/24 09:22, Arnd Bergmann wrote:
> On Wed, Mar 27, 2024, at 16:39, David Hildenbrand wrote:
>> On 27.03.24 16:21, Peter Xu wrote:
>>> On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:
>>>
>>> I'm not sure what config you tried there; while doing some build tests
>>> recently, I found that turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS
>>> avoids a lot of issues, I think due to libc missing. But maybe that's not
>>> the case here.
>> CCing Arnd; I use some of his compiler chains, others from Fedora directly.
>> For example, for alpha and arc the Fedora gcc is "13.2.1".
>> But there is other stuff like (arc):
>>
>> ./arch/arc/include/asm/mmu-arcv2.h: In function 'mmu_setup_asid':
>> ./arch/arc/include/asm/mmu-arcv2.h:82:9: error: implicit declaration of
>> function 'write_aux_reg' [-Werror=implicit-function-declaration]
>>    82 |         write_aux_reg(ARC_REG_PID, asid | MMU_ENABLE);
>>       |         ^
> Seems to be missing an #include of soc/arc/aux.h, but I can't
> tell when this first broke without bisecting.

Weird, I don't see this one, but I only have gcc 12 handy ATM.

    gcc version 12.2.1 20230306 (ARC HS GNU/Linux glibc toolchain - build 1360)

I even tried W=1, which (according to scripts/Makefile.extrawarn) should
include -Werror=implicit-function-declaration, but I still don't see this.

Tomorrow I'll try building a gcc 13.2.1 for ARC.


>
>> or (alpha)
>>
>> WARNING: modpost: "saved_config" [vmlinux] is COMMON symbol
>> ERROR: modpost: "memcpy" [fs/reiserfs/reiserfs.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/nfs/nfs.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/nfs/nfsv3.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/nfsd/nfsd.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/lockd/lockd.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/crypto.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/crypto_algapi.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/aead.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/crypto_skcipher.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/seqiv.ko] undefined!

Are these from the ARC build or otherwise?

Thx,
-Vineet


Re: [PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-03-27 Thread Rohan McLure
On Thu, 2024-03-28 at 05:40 +, Christophe Leroy wrote:
> 
> 
> On 28/03/2024 at 05:55, Rohan McLure wrote:
> > Page table checking depends on architectures providing an
> > implementation of p{te,md,ud}_user_accessible_page. With
> > refactorisations made on powerpc/mm, the pte_access_permitted() and
> > similar methods verify whether a userland page is accessible with the
> > required permissions.
> > 
> > Since page table checking is the only user of
> > p{te,md,ud}_user_accessible_page(), implement these for all platforms,
> > using some of the same preliminary checks taken by pte_access_permitted()
> > on that platform.
> > 
> > Since commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
> > pte_user() is no longer required to be present on all platforms as it
> > may be equivalent to or implied by pte_read(). Hence implementations of
> > pte_user_accessible_page() are specialised.
> > 
> > Signed-off-by: Rohan McLure 
> > ---
> > v9: New implementation
> > v10: Let book3s/64 use pte_user(), but otherwise default other platforms
> > to using the address provided with the call to infer whether it is a
> > user page or not. pmd/pud variants will warn on all other platforms, as
> > they should not be used for user page mappings
> > v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
> > platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
> 
> See my comment to v10 patch 10.
> 
> p{m,u}d_leaf() is defined for all platforms (there is a fallback
> definition in include/linux/pgtable.h), so p{m,u}d_user_accessible_page()
> can be defined for all platforms; no need for a conditional define.

The issue I see is that the definition in include/linux/pgtable.h
occurs after this header is included. Prior to the removal of the local
definitions of p{m,u}d_leaf() etc. we didn't run into this issue, but
we do now.

I'm not insistent on doing it this way with #ifndef, so I'm amenable to
suggestions if you have a preference.

> 
> > ---
> >   arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
> >   arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
> >   arch/powerpc/include/asm/nohash/pgtable.h    |  5 +
> >   arch/powerpc/include/asm/pgtable.h           |  8 
> >   4 files changed, 35 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> > index 52971ee30717..83f7b98ef49f 100644
> > --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> > @@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
> >     return true;
> >   }
> >   
> > +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> > +{
> > +   return pte_present(pte) && !is_kernel_addr(addr);
> > +}
> > +
> >   /* Conversion functions: convert a page and protection to a page entry,
> >    * and a page entry and page directory to the page they refer to.
> >    *
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index fac5615e6bc5..d8640ddbcad1 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
> >     return arch_pte_access_permitted(pte_val(pte), write, 0);
> >   }
> >   
> > +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> > +{
> > +   return pte_present(pte) && pte_user(pte);
> > +}
> > +
> >   /*
> >    * Conversion functions: convert a page and protection to a page entry,
> >    * and a page entry and page directory to the page they refer to.
> > @@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
> >     return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> >   }
> >   
> > +#define pmd_user_accessible_page pmd_user_accessible_page
> > +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
> > +{
> > +   return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
> > +}
> > +
> > +#define pud_user_accessible_page pud_user_accessible_page
> > +static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
> > +{
> > +   return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
> > +}
> > +
> >   #endif /* __ASSEMBLY__ */
> >   #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
> > diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> > index 427db14292c9..413d01a51e6f 100644
> > --- a/arch/powerpc/include/asm/nohash/pgtable.h
> > +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> > @@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
> >     return true;
> >   }
> >   
> > +static inline bool pte_user_accessible_page(pte_t pte, 

Re: [PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-03-27 Thread Christophe Leroy


On 28/03/2024 at 05:55, Rohan McLure wrote:
> Page table checking depends on architectures providing an
> implementation of p{te,md,ud}_user_accessible_page. With
> refactorisations made on powerpc/mm, the pte_access_permitted() and
> similar methods verify whether a userland page is accessible with the
> required permissions.
> 
> Since page table checking is the only user of
> p{te,md,ud}_user_accessible_page(), implement these for all platforms,
> using some of the same preliminary checks taken by pte_access_permitted()
> on that platform.
> 
> Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
> pte_user() is no longer required to be present on all platforms as it
> may be equivalent to or implied by pte_read(). Hence implementations of
> pte_user_accessible_page() are specialised.
> 
> Signed-off-by: Rohan McLure 
> ---
> v9: New implementation
> v10: Let book3s/64 use pte_user(), but otherwise default other platforms
> to using the address provided with the call to infer whether it is a
> user page or not. pmd/pud variants will warn on all other platforms, as
> they should not be used for user page mappings
> v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
> platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.

See my comment to v10 patch 10.

p{m,u}d_leaf() is defined for all platforms (there is a fallback
definition in include/linux/pgtable.h), so p{m,u}d_user_accessible_page()
can be defined for all platforms; no need for a conditional define.

> ---
>   arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
>   arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
>   arch/powerpc/include/asm/nohash/pgtable.h|  5 +
>   arch/powerpc/include/asm/pgtable.h   |  8 
>   4 files changed, 35 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 52971ee30717..83f7b98ef49f 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>   return true;
>   }
>   
> +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> +{
> + return pte_present(pte) && !is_kernel_addr(addr);
> +}
> +
>   /* Conversion functions: convert a page and protection to a page entry,
>* and a page entry and page directory to the page they refer to.
>*
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index fac5615e6bc5..d8640ddbcad1 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>   return arch_pte_access_permitted(pte_val(pte), write, 0);
>   }
>   
> +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> +{
> + return pte_present(pte) && pte_user(pte);
> +}
> +
>   /*
>* Conversion functions: convert a page and protection to a page entry,
>* and a page entry and page directory to the page they refer to.
> @@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
>   return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
>   }
>   
> +#define pmd_user_accessible_page pmd_user_accessible_page
> +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
> +{
> + return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
> +}
> +
> +#define pud_user_accessible_page pud_user_accessible_page
> +static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
> +{
> + return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
> +}
> +
>   #endif /* __ASSEMBLY__ */
>   #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index 427db14292c9..413d01a51e6f 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>   return true;
>   }
>   
> +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> +{
> + return pte_present(pte) && !is_kernel_addr(addr);
> +}
> +
>   /* Conversion functions: convert a page and protection to a page entry,
>* and a page entry and page directory to the page they refer to.
>*
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index ee8c82c0528f..f1ceae778cb1 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -219,6 +219,14 @@ static inline int pud_pfn(pud_t pud)
>   }
>   #endif
>   
> +#ifndef pmd_user_accessible_page
> +#define pmd_user_accessible_page(pmd, addr)  false
> +#endif
> +
> +#ifndef 

Re: [PATCH v11 08/11] powerpc: mm: Add pud_pfn() stub

2024-03-27 Thread Christophe Leroy


On 28/03/2024 at 05:55, Rohan McLure wrote:
> The page table check feature requires that pud_pfn() be defined
> on each consuming architecture. Since only 64-bit, Book3S platforms
> allow for hugepages at this upper level, and since the calling code is
> gated by a call to pud_user_accessible_page(), which will return zero,
> include this stub as a BUILD_BUG().
> 
> Signed-off-by: Rohan McLure 
> ---
> v11: pud_pfn() stub has been removed upstream as it has valid users now
> in transparent hugepages. Create a BUG_ON() for other, non Book3S64
> platforms.
> ---
>   arch/powerpc/include/asm/pgtable.h | 8 
>   1 file changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 239709a2f68e..ee8c82c0528f 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -211,6 +211,14 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
>   
>   #endif /* CONFIG_PPC64 */
>   
> +#ifndef pud_pfn
> +#define pud_pfn pud_pfn
> +static inline int pud_pfn(pud_t pud)
> +{
> + BUILD_BUG();

This function must return something.

> +}
> +#endif
> +
>   #endif /* __ASSEMBLY__ */
>   
>   #endif /* _ASM_POWERPC_PGTABLE_H */


[PATCH v11 02/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"

2024-03-27 Thread Rohan McLure
This reverts commit a3b837130b5865521fa8662aceaa6ebc8d29389a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/riscv/include/asm/pgtable.h |  4 ++--
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c            |  3 ++-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7334e5526185..995cc6213d0d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -560,7 +560,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
PMD_SIZE >> PAGE_SHIFT);
 }
@@ -1239,7 +1239,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1e0c0717b3f9..7b4053ff597e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -712,7 +712,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
 }
 
@@ -783,7 +783,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 09db55fa8856..82bbe115a1a4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1238,7 +1238,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
  pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(mm, pmdp, pmd);
+   page_table_check_pmd_set(mm, addr, pmdp, pmd);
set_pmd(pmdp, pmd);
 }
 
@@ -1383,7 +1383,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-   page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+   page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
if (IS_ENABLED(CONFIG_SMP)) {
return xchg(pmdp, pmd);
} else {
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d188428512f5..5855d690c48a 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,7 +19,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+   pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -75,13 +76,14 @@ static inline void page_table_check_ptes_set(struct mm_struct *mm,
__page_table_check_ptes_set(mm, ptep, pte, nr);
 }
 
-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
+static inline void 

[PATCH v11 08/11] powerpc: mm: Add pud_pfn() stub

2024-03-27 Thread Rohan McLure
The page table check feature requires that pud_pfn() be defined
on each consuming architecture. Since only 64-bit, Book3S platforms
allow for hugepages at this upper level, and since the calling code is
gated by a call to pud_user_accessible_page(), which will return zero,
include this stub as a BUILD_BUG().

Signed-off-by: Rohan McLure 
---
v11: pud_pfn() stub has been removed upstream as it has valid users now
in transparent hugepages. Create a BUG_ON() for other, non Book3S64
platforms.
---
 arch/powerpc/include/asm/pgtable.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 239709a2f68e..ee8c82c0528f 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -211,6 +211,14 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+   BUILD_BUG();
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v11 03/11] mm: Provide addr parameter to page_table_check_pte_set()

2024-03-27 Thread Rohan McLure
To provide support for powerpc platforms, provide an addr parameter to
the page_table_check_pte_set() routine. This parameter is needed on some
powerpc platforms which do not encode whether a mapping is for user or
kernel in the pte. On such platforms, this can be inferred from the
addr parameter.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 include/linux/page_table_check.h | 12 +++-
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 995cc6213d0d..b3938f80a1b6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -376,7 +376,7 @@ static inline void __set_ptes(struct mm_struct *mm,
  unsigned long __always_unused addr,
  pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
__sync_cache_and_tags(pte, nr);
 
for (;;) {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7b4053ff597e..a153d3d143d2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -532,7 +532,7 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pteval, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
 
for (;;) {
__set_pte_at(ptep, pteval);
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 5855d690c48a..9243c920ed02 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -17,8 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -68,12 +68,13 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
	if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_ptes_set(mm, ptep, pte, nr);
+   __page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -129,7 +130,8 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-   pte_t *ptep, pte_t pte, unsigned int nr)
+unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..b2b4c1160d4a 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -264,7 +264,7 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned int nr)
 {
-   page_table_check_ptes_set(mm, ptep, pte, nr);
+   page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 
arch_enter_lazy_mmu_mode();
for (;;) {
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7b9d7b45505d..3a338fee6d00 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -182,8 +182,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-   unsigned int nr)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+pte_t *ptep, pte_t pte, unsigned int nr)
 {
unsigned int i;
 
-- 
2.44.0



[PATCH v11 11/11] powerpc: mm: Support page table check

2024-03-27 Thread Rohan McLure
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check")

Reviewed-by: Christophe Leroy 
Signed-off-by: Rohan McLure 
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two
constituent calls to avoid header hell
v10: Cause p{u,m}dp_huge_get_and_clear() to resemble one another
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 45 +++-
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c   | 11 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c |  3 ++
 arch/powerpc/mm/pgtable.c|  4 ++
 7 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a68b9e637eda..66a72f9078f5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select ARCH_STACKWALK
select ARCH_SUPPORTS_ATOMIC_RMW
	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx || 40x
+   select ARCH_SUPPORTS_PAGE_TABLE_CHECK
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 83f7b98ef49f..703deb5749e6 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS0
@@ -314,7 +315,11 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index d8640ddbcad1..6199d2b4bded 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -145,6 +145,8 @@
 #define PAGE_KERNEL_ROX__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include 
+
 /*
  * page table defines
  */
@@ -415,8 +417,11 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-   return __pte(old);
+   pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -425,11 +430,16 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
pte_t *ptep, int full)
 {
if (full && radix_enabled()) {
+   pte_t old_pte;
+
/*
 * We know that this is a full mm pte clear and
 * hence can be sure there is no parallel set_pte.
 */
-   return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+   page_table_check_pte_clear(mm, addr, old_pte);
+
+   return old_pte;
}
return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1306,19 +1316,34 @@ extern int pudp_test_and_clear_young(struct vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
 {
-   if (radix_enabled())
-   return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-   return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   pmd_t old_pmd;
+
+   if (radix_enabled()) {
+   old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   } else {
+   old_pmd = hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+   }
+
+   

[PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes

2024-03-27 Thread Rohan McLure
Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations of
pte_user_accessible_page() are specialised.

Signed-off-by: Rohan McLure 
---
v9: New implementation
v10: Let book3s/64 use pte_user(), but otherwise default other platforms
to using the address provided with the call to infer whether it is a
user page or not. pmd/pud variants will warn on all other platforms, as
they should not be used for user page mappings
v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +
 arch/powerpc/include/asm/nohash/pgtable.h|  5 +
 arch/powerpc/include/asm/pgtable.h   |  8 
 4 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..83f7b98ef49f 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fac5615e6bc5..d8640ddbcad1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
 
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+   return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+   return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..413d01a51e6f 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool 
write)
return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+   return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index ee8c82c0528f..f1ceae778cb1 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -219,6 +219,14 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page(pmd, addr)false
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page(pud, addr)false
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0



[PATCH v11 10/11] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages

2024-03-27 Thread Rohan McLure
In the new set_ptes() API, set_pte_at() (a special case of set_ptes())
is intended to be instrumented by the page table check facility. There
are however several other routines that constitute the API for setting
page table entries, including set_pmd_at() among others. Such routines
are themselves implemented in terms of set_ptes().

A future patch providing support for page table checking on powerpc
must take care to avoid duplicate calls to
page_table_check_p{te,md,ud}_set(). Allow for assignment of pte entries
without instrumentation through the set_pte_at_unchecked() routine
introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page
table check. set_ptes() is itself implemented by calls to
__set_pte_at(), so this eliminates redundant code.

Also prefer set_pte_at_unchecked() in early-boot usages which should not be
instrumented.

Signed-off-by: Rohan McLure 
---
v9: New patch
v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
use new set_pte_at_unchecked().
v11: Include the assertion that hwvalid => !protnone. It is possible that
some of these calls can be safely replaced with __set_pte_at(), however
that will have to be done at a later stage.
---
 arch/powerpc/include/asm/pgtable.h   | 2 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8 
 arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
 arch/powerpc/mm/pgtable.c| 8 
 arch/powerpc/mm/pgtable_32.c | 2 +-
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index f1ceae778cb1..ad0c1451502d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -46,6 +46,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c 
b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..871472f99a01 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long 
pa, pgprot_t prot)
ptep = pte_alloc_kernel(pmdp, ea);
if (!ptep)
return -ENOMEM;
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
} else {
/*
 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 83823db3488b..f7be5fa058e8 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pmd_leaf(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
-   return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+   return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }
 
 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
WARN_ON(!(pud_leaf(pud)));
 #endif
trace_hugepage_set_pud(addr, pud_val(pud));
-   return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+   return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }
 
 static void do_serialize(void *arg)
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, 
unsigned long addr,
if (radix_enabled())
return radix__ptep_modify_prot_commit(vma, addr,
  ptep, old_pte, pte);
-   set_pte_at(vma->vm_mm, addr, ptep, pte);
+   set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 15e88f1439ec..e8da30536bd5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned 
long pa,
ptep = pte_offset_kernel(pmdp, ea);
 
 set_the_pte:
-   set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+   set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pfn, flags));
asm volatile("ptesync": : :"memory");
return 0;
 }
@@ -1522,7 +1522,7 @@ void 

[PATCH v11 07/11] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()

2024-03-27 Thread Rohan McLure
On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this from the address being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c| 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 040c2e664cff..f698b30463f3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1074,17 +1074,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 92bf5c309055..b9663e03475b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -724,17 +724,17 @@ static inline void set_pud_at(struct mm_struct *mm, 
unsigned long addr,
 }
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return pte_present(pte) && pte_user(pte);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && pmd_user(pmd);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b2b3902f8df4..e898813fce01 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1688,17 +1688,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif
 
 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }
 
-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) & _PAGE_USER);
 }
 
-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) & _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 98cccee74b02..aa5e16c8328e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -155,7 +155,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pte_user_accessible_page(pte)) {
+   if (pte_user_accessible_page(pte, addr)) {
page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
}
 }
@@ -167,7 +167,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pmd_user_accessible_page(pmd)) {
+   if (pmd_user_accessible_page(pmd, addr)) {
page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
}
 }
@@ -179,7 +179,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   if (pud_user_accessible_page(pud)) {
+   if (pud_user_accessible_page(pud, addr)) {
page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
}
 }
@@ -195,7 +195,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, 
unsigned long addr,
 
for (i = 0; i < nr; i++)
__page_table_check_pte_clear(mm, addr, ptep_get(ptep + i));
-   if (pte_user_accessible_page(pte))
+   if (pte_user_accessible_page(pte, addr))
page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
 }
 

[PATCH v11 06/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"

2024-03-27 Thread Rohan McLure
This reverts commit aa232204c4689427cefa55fe975692b57291523a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  7 ---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d20afcfae530..040c2e664cff 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1145,7 +1145,7 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct 
*mm,
 {
pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 0066626159a5..92bf5c309055 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -563,7 +563,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));
 
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
 
return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9876e6d92799..b2b3902f8df4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1276,7 +1276,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct 
*mm, unsigned long addr,
   pte_t *ptep)
 {
pte_t pte = native_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
return pte;
 }
 
@@ -1292,7 +1292,7 @@ static inline pte_t ptep_get_and_clear_full(struct 
mm_struct *mm,
 * care about updates and native needs no locking
 */
pte = native_local_ptep_get_and_clear(ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, addr, pte);
} else {
pte = ptep_get_and_clear(mm, addr, ptep);
}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 0a6ebfa46a31..48721a4a2b84 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
__page_table_check_zero(page, order);
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pte_clear(mm, pte);
+   __page_table_check_pte_clear(mm, addr, pte);
 }
 
 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -121,7 +123,8 @@ static inline void page_table_check_free(struct page *page, 
unsigned int order)
 {
 }
 
-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+ unsigned long addr, pte_t pte)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d17fbca4da7b..7c18a1e55696 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -454,7 +454,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
pte_t pte = ptep_get(ptep);
pte_clear(mm, address, ptep);
-   page_table_check_pte_clear(mm, pte);
+   page_table_check_pte_clear(mm, address, pte);
return pte;
 }
 #endif
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7afaad9c6e6f..98cccee74b02 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -149,7 +149,8 @@ void __page_table_check_zero(struct page *page, unsigned 
int order)

[PATCH v11 05/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"

2024-03-27 Thread Rohan McLure
This reverts commit 1831414cd729a34af937d56ad684a66599de6344.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3938f80a1b6..d20afcfae530 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1188,7 +1188,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a153d3d143d2..0066626159a5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -767,7 +767,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pmd_t pmd = __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0));
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e35b2b4f5ea1..9876e6d92799 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1345,7 +1345,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm, unsigned long
 {
pmd_t pmd = native_pmdp_get_and_clear(pmdp);
 
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, addr, pmd);
 
return pmd;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d01a00ffc1f9..0a6ebfa46a31 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -15,7 +15,8 @@ extern struct page_ext_operations page_table_check_ops;
 
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
@@ -52,12 +53,13 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
__page_table_check_pte_clear(mm, pte);
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pmd_clear(mm, pmd);
+   __page_table_check_pmd_clear(mm, addr, pmd);
 }
 
 static inline void page_table_check_pud_clear(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pte_clear(struct 
mm_struct *mm, pte_t pte)
 {
 }
 
-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+ unsigned long addr, pmd_t pmd)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 6a5c44c2208e..d17fbca4da7b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -557,7 +557,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct 
mm_struct *mm,
pmd_t pmd = *pmdp;
 
pmd_clear(pmdp);
-   page_table_check_pmd_clear(mm, pmd);
+   page_table_check_pmd_clear(mm, address, pmd);
 
return pmd;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index a8c8fd7f06f8..7afaad9c6e6f 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -160,7 +160,8 @@ void __page_table_check_pte_clear(struct mm_struct *mm, 
pte_t pte)
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);
 
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+ pmd_t pmd)
 {
if (&init_mm == mm)
return;
@@ -204,7 +205,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pmd_clear(mm, *pmdp);
+   

[PATCH v11 04/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"

2024-03-27 Thread Rohan McLure
This reverts commit 931c38e16499a057e30a3033f4d6a9c242f0f156.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure 
---
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h  |  2 +-
 mm/page_table_check.c|  5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 82bbe115a1a4..e35b2b4f5ea1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1356,7 +1356,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
 {
pud_t pud = native_pudp_get_and_clear(pudp);
 
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, addr, pud);
 
return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 9243c920ed02..d01a00ffc1f9 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
__page_table_check_pmd_clear(mm, pmd);
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_clear(mm, pud);
+   __page_table_check_pud_clear(mm, addr, pud);
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -125,7 +127,8 @@ static inline void page_table_check_pmd_clear(struct 
mm_struct *mm, pmd_t pmd)
 {
 }
 
-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+ unsigned long addr, pud_t pud)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b2b4c1160d4a..6a5c44c2208e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,7 +570,7 @@ static inline pud_t pudp_huge_get_and_clear(struct 
mm_struct *mm,
pud_t pud = *pudp;
 
pud_clear(pudp);
-   page_table_check_pud_clear(mm, pud);
+   page_table_check_pud_clear(mm, address, pud);
 
return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 3a338fee6d00..a8c8fd7f06f8 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -171,7 +171,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, 
pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);
 
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+ pud_t pud)
 {
if (&init_mm == mm)
return;
@@ -217,7 +218,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, 
unsigned long addr,
if (&init_mm == mm)
return;
 
-   __page_table_check_pud_clear(mm, *pudp);
+   __page_table_check_pud_clear(mm, addr, *pudp);
if (pud_user_accessible_page(pud)) {
page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
 pud_write(pud));
-- 
2.44.0



[PATCH v11 00/11] Support page table check PowerPC

2024-03-27 Thread Rohan McLure
Support page table check on all PowerPC platforms. This works by
serialising assignments, reassignments and clears of page table
entries at each level in order to ensure that anonymous mappings
have at most one writable consumer, and likewise that file-backed
mappings are not simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, set_pte_at() and
set_pte_at_unchecked() are kept separate, to allow for internal,
uninstrumented mappings.

v11:
 * The pud_pfn() stub, which previously had no legitimate users on any
   powerpc platform, now has users in Book3s64 with transparent pages.
   Include a stub of the same name for each platform that does not
   define their own.
 * Drop patch that standardised use of p*d_leaf(), as already included
   upstream in v6.9.
 * Provide fallback definitions of p{m,u}d_user_accessible_page() that
   do not reference p*d_leaf(), p*d_pte(), as they are defined after
   powerpc/mm headers by linux/mm headers.
 * Ensure that set_pte_at_unchecked() has the same checks as
   set_pte_at().

v10:
 * Revert patches that removed address and mm parameters from page table
   check routines, including consuming code from arm64, x86_64 and
   riscv.
 * Implement *_user_accessible_page() routines in terms of pte_user()
   where available (64-bit, book3s) but otherwise by checking the
   address (on platforms where the pte does not imply whether the
   mapping is for user or kernel) 
 * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
   is identical, but prevents double instrumentation.
Link: 
https://lore.kernel.org/linuxppc-dev/20240313042118.230397-9-rmcl...@linux.ibm.com/T/

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we need
   must avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still default
   implement these to assist in refactoring out extant
   p{m,u,4}_is_leaf().
 * Add p{m,u}_pte() stubs where asm-generic does not provide them, as
   page table check wants all *user_accessible_page() variants, and we
   would like to default implement the variants in terms of
   pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.
Link: 
https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking
   32-bit.
Link: 
https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7
Link: 
https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.
Link: 
https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: 
https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (11):
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_set"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_set"
  mm: Provide addr parameter to page_table_check_pte_set()
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pud_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pmd_clear"
  Revert "mm/page_table_check: remove unused parameter in
[__]page_table_check_pte_clear"
  mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
  powerpc: mm: Add pud_pfn() stub
  powerpc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal
usages
  powerpc: mm: Support page table check

 arch/arm64/include/asm/pgtable.h | 18 +++---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 12 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 62 +++---
 arch/powerpc/include/asm/nohash/pgtable.h|  5 ++
 arch/powerpc/include/asm/pgtable.h   | 18 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  |  6 +-
 arch/powerpc/mm/book3s64/pgtable.c   | 17 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c | 11 ++--
 arch/powerpc/mm/nohash/book3e_pgtable.c  |  2 +-
 arch/powerpc/mm/pgtable.c| 12 
 arch/powerpc/mm/pgtable_32.c |  2 +-
 arch/riscv/include/asm/pgtable.h | 18 

[PATCH v11 01/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"

2024-03-27 Thread Rohan McLure
This reverts commit 6d144436d954311f2dbacb5bf7b084042448d83e.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure 
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c|  3 ++-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7334e5526185 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -568,7 +568,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
PUD_SIZE >> PAGE_SHIFT);
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 20242402fc11..1e0c0717b3f9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -719,7 +719,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
return __set_pte_at((pte_t *)pudp, pud_pte(pud));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 315535ffb258..09db55fa8856 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1245,7 +1245,7 @@ static inline void set_pmd_at(struct mm_struct *mm, 
unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
  pud_t *pudp, pud_t pud)
 {
-   page_table_check_pud_set(mm, pudp, pud);
+   page_table_check_pud_set(mm, addr, pudp, pud);
native_set_pud(pudp, pud);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..d188428512f5 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,7 +20,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t 
pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+   pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
unsigned long addr,
pmd_t pmd);
@@ -83,13 +84,14 @@ static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
__page_table_check_pmd_set(mm, pmdp, pmd);
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
	if (static_branch_likely(&page_table_check_disabled))
return;
 
-   __page_table_check_pud_set(mm, pudp, pud);
+   __page_table_check_pud_set(mm, addr, pudp, pud);
 }
 
 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -134,7 +136,8 @@ static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
 {
 }
 
-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+   unsigned long addr, pud_t *pudp,
pud_t pud)
 {
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..75167537ebd7 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -210,7 +210,8 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_set);
 
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, 

[PATCH] serial/pmac_zilog: Remove flawed mitigation for rx irq flood

2024-03-27 Thread Finn Thain
The mitigation was intended to stop the irq completely. That might have
been better than a hard lock-up but it turns out that you get a crash
anyway if you're using pmac_zilog as a serial console.

That's because the pr_err() call in pmz_receive_chars() results in
pmz_console_write() attempting to lock a spinlock already locked in
pmz_interrupt(). With CONFIG_DEBUG_SPINLOCK=y, this produces a fatal
BUG splat like the one below. (The spinlock at 0x62e140 is the one in
struct uart_port.)

Even when it's not fatal, the serial port rx function ceases to work.
Also, the iteration limit doesn't play nicely with QEMU. Please see
bug report linked below.

A web search for reports of the error message "pmz: rx irq flood" didn't
produce anything. So I don't think this code is needed any more. Remove it.

[   14.56] ttyPZ0: pmz: rx irq flood !
[   14.56] BUG: spinlock recursion on CPU#0, swapper/0
[   14.56]  lock: 0x62e140, .magic: dead4ead, .owner: swapper/0, 
.owner_cpu: 0
[   14.56] CPU: 0 PID: 0 Comm: swapper Not tainted 
6.8.0-mac-dbg-preempt-4-g4143b7b9144a #1
[   14.56] Stack from 0059bcc4:
[   14.56] 0059bcc4 0056316f 0056316f 2700 004b6444 0059bce4 
004ad8c6 0056316f
[   14.56] 0059bd10 004a6546 00556759 0062e140 dead4ead 0059f892 
 
[   14.56] 0062e140 0059bde8 005c03d0 0059bd24 0004daf6 0062e140 
005567bf 0062e140
[   14.56] 0059bd34 004b64c2 0062e140 0001 0059bd50 002e15ea 
0062e140 0001
[   14.56] 0059bde7 0059bde8 005c03d0 0059bdac 0005124e 005c03d0 
005cdc00 002b
[   14.56] 005a3caa 005a3caa  0059bde8 0004ff00 0059be8b 
00038200 000529ba
[   14.56] Call Trace: [<2700>] ret_from_kernel_thread+0xc/0x14
[   14.56]  [<004b6444>] _raw_spin_lock+0x0/0x28
[   14.56]  [<004ad8c6>] dump_stack+0x10/0x16
[   14.56]  [<004a6546>] spin_dump+0x6e/0x7c
[   14.56]  [<0004daf6>] do_raw_spin_lock+0x9c/0xa6
[   14.56]  [<004b64c2>] _raw_spin_lock_irqsave+0x2a/0x34
[   14.56]  [<002e15ea>] pmz_console_write+0x32/0x9a
[   14.56]  [<0005124e>] console_flush_all+0x112/0x3a2
[   14.56]  [<0004ff00>] console_trylock+0x0/0x7a
[   14.56]  [<00038200>] parameq+0x48/0x6e
[   14.56]  [<000529ba>] __printk_safe_enter+0x0/0x36
[   14.56]  [<0005113c>] console_flush_all+0x0/0x3a2
[   14.56]  [<000542c4>] prb_read_valid+0x0/0x1a
[   14.56]  [<004b65a4>] _raw_spin_unlock+0x0/0x38
[   14.56]  [<0005151e>] console_unlock+0x40/0xb8
[   14.56]  [<00038200>] parameq+0x48/0x6e
[   14.56]  [<002c778c>] __tty_insert_flip_string_flags+0x0/0x14e
[   14.56]  [<00051798>] vprintk_emit+0x156/0x238
[   14.56]  [<00051894>] vprintk_default+0x1a/0x1e
[   14.56]  [<000529a8>] vprintk+0x74/0x86
[   14.56]  [<004a6596>] _printk+0x12/0x16
[   14.56]  [<002e23be>] pmz_receive_chars+0x1cc/0x394
[   14.56]  [<004b6444>] _raw_spin_lock+0x0/0x28
[   14.56]  [<00038226>] parse_args+0x0/0x3a6
[   14.56]  [<004b6466>] _raw_spin_lock+0x22/0x28
[   14.56]  [<002e26b4>] pmz_interrupt+0x12e/0x1e0
[   14.56]  [<00048680>] arch_cpu_idle_enter+0x0/0x8
[   14.56]  [<00054ebc>] __handle_irq_event_percpu+0x24/0x106
[   14.56]  [<004ae576>] default_idle_call+0x0/0x46
[   14.56]  [<00055020>] handle_irq_event+0x30/0x90
[   14.56]  [<00058320>] handle_simple_irq+0x5e/0xc0
[   14.56]  [<00048688>] arch_cpu_idle_exit+0x0/0x8
[   14.56]  [<00054800>] generic_handle_irq+0x3c/0x4a
[   14.56]  [<2978>] do_IRQ+0x24/0x3a
[   14.56]  [<004ae508>] cpu_idle_poll.isra.0+0x0/0x6e
[   14.56]  [<2874>] auto_irqhandler_fixup+0x4/0xc
[   14.56]  [<004ae508>] cpu_idle_poll.isra.0+0x0/0x6e
[   14.56]  [<004ae576>] default_idle_call+0x0/0x46
[   14.56]  [<004ae598>] default_idle_call+0x22/0x46
[   14.56]  [<00048710>] do_idle+0x6a/0xf0
[   14.56]  [<000486a6>] do_idle+0x0/0xf0
[   14.56]  [<000367d2>] find_task_by_pid_ns+0x0/0x2a
[   14.56]  [<0005d064>] __rcu_read_lock+0x0/0x12
[   14.56]  [<00048a5a>] cpu_startup_entry+0x18/0x1c
[   14.56]  [<00063a06>] __rcu_read_unlock+0x0/0x26
[   14.56]  [<004ae65a>] kernel_init+0x0/0xfa
[   14.56]  [<0049c5a8>] strcpy+0x0/0x1e
[   14.56]  [<004a6584>] _printk+0x0/0x16
[   14.56]  [<0049c72a>] strlen+0x0/0x22
[   14.56]  [<006452d4>] memblock_alloc_try_nid+0x0/0x82
[   14.56]  [<0063939a>] arch_post_acpi_subsys_init+0x0/0x8
[   14.56]  [<0063991e>] console_on_rootfs+0x0/0x60
[   14.56]  [<00638410>] _sinittext+0x410/0xadc
[   14.56]

Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: "Naveen N. Rao" 
Cc: linux-m...@lists.linux-m68k.org
Link: https://github.com/vivier/qemu-m68k/issues/44
Link: https://lore.kernel.org/all/1078874617.9746.36.camel@gaston/
Signed-off-by: Finn Thain 
---
 drivers/tty/serial/pmac_zilog.c | 14 

Re: [PATCH v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id

2024-03-27 Thread Shijie Huang



在 2024/3/28 2:17, Andrew Morton 写道:

On Fri, 26 Jan 2024 14:44:51 +0800 Huang Shijie  
wrote:


During the kernel booting, the generic cpu_to_node() is called too early in
arm64, powerpc and riscv when CONFIG_NUMA is enabled.

There are at least four places in the common code where
the generic cpu_to_node() is called before it is initialized:
   1.) early_trace_init()      in kernel/trace/trace.c
   2.) sched_init()            in kernel/sched/core.c
   3.) init_sched_fair_class() in kernel/sched/fair.c
   4.) workqueue_init_early()  in kernel/workqueue.c

In order to fix the bug, this patch introduces early_numa_node_init(),
which is called after smp_prepare_boot_cpu() in start_kernel().
early_numa_node_init() initializes "numa_node" as soon as
early_cpu_to_node() is ready, before cpu_to_node() is called
for the first time.

What are the userspace-visible runtime effects of this bug?

For this bug, I do not see much performance impact on userspace
applications.

It just pollutes the CPU caches on NUMA systems.


Thanks

Huang Shijie




Re: [PATCH v2 5/6] mm/mm_init.c: remove unneeded calc_memmap_size()

2024-03-27 Thread Baoquan He
On 03/27/24 at 06:21pm, Mike Rapoport wrote:
> On Mon, Mar 25, 2024 at 10:56:45PM +0800, Baoquan He wrote:
> > Nobody calls calc_memmap_size() now.
> > 
> > Signed-off-by: Baoquan He 
> 
> Reviewed-by: Mike Rapoport (IBM) 
> 
> Looks like I replied to patch 6/6 twice by mistake and missed this one.

Thanks for your careful reviewing.

> 
> > ---
> >  mm/mm_init.c | 20 
> >  1 file changed, 20 deletions(-)
> > 
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 7f71e56e83f3..e269a724f70e 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -1331,26 +1331,6 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
> > 	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
> >  }
> >  
> > -static unsigned long __init calc_memmap_size(unsigned long spanned_pages,
> > -   unsigned long present_pages)
> > -{
> > -   unsigned long pages = spanned_pages;
> > -
> > -   /*
> > -* Provide a more accurate estimation if there are holes within
> > -* the zone and SPARSEMEM is in use. If there are holes within the
> > -* zone, each populated memory region may cost us one or two extra
> > -* memmap pages due to alignment because memmap pages for each
> > -* populated regions may not be naturally aligned on page boundary.
> > -* So the (present_pages >> 4) heuristic is a tradeoff for that.
> > -*/
> > -   if (spanned_pages > present_pages + (present_pages >> 4) &&
> > -   IS_ENABLED(CONFIG_SPARSEMEM))
> > -   pages = present_pages;
> > -
> > -   return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> > -}
> > -
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  static void pgdat_init_split_queue(struct pglist_data *pgdat)
> >  {
> > -- 
> > 2.41.0
> > 
> 
> -- 
> Sincerely yours,
> Mike.
> 



Re: [PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
On 2024-03-27 4:25 PM, Andrew Morton wrote:
> On Wed, 27 Mar 2024 13:00:43 -0700 Samuel Holland  
> wrote:
> 
>> Now that all previously-supported architectures select
>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>> of the existing list of architectures. It can also take advantage of the
>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>
>> ...
>>
>> @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
>>  WARN_ON_ONCE(!in_task());
>>  preempt_disable();
>>  depth = __this_cpu_inc_return(fpu_recursion_depth);
>> -
>>  if (depth == 1) {
>> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>> +BUG_ON(!kernel_fpu_available());
>>  kernel_fpu_begin();
> 
> For some reason kernel_fpu_available() was undefined in my x86_64
> allmodconfig build.  I just removed the statement.

This is because the include guard in asm/fpu.h conflicts with the existing one
in asm/fpu/types.h (which doesn't match its filename), so the definition of
kernel_fpu_available() is not seen. I can fix up the include guard in
asm/fpu/types.h in the next version:

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index ace9aa3b78a3..75a3910d867a 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -2,8 +2,8 @@
 /*
  * FPU data structures:
  */
-#ifndef _ASM_X86_FPU_H
-#define _ASM_X86_FPU_H
+#ifndef _ASM_X86_FPU_TYPES_H
+#define _ASM_X86_FPU_TYPES_H

 #include 

@@ -596,4 +596,4 @@ struct fpu_state_config {
 /* FPU state configuration information */
 extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;

-#endif /* _ASM_X86_FPU_H */
+#endif /* _ASM_X86_FPU_TYPES_H */


Regards,
Samuel



Re: [PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Andrew Morton
On Wed, 27 Mar 2024 13:00:43 -0700 Samuel Holland  
wrote:

> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
> 
> ...
>
> @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
>   WARN_ON_ONCE(!in_task());
>   preempt_disable();
>   depth = __this_cpu_inc_return(fpu_recursion_depth);
> -
>   if (depth == 1) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> + BUG_ON(!kernel_fpu_available());
>   kernel_fpu_begin();

For some reason kernel_fpu_available() was undefined in my x86_64
allmodconfig build.  I just removed the statement.



Re: [PATCH 9/9] mmc: Convert from tasklet to BH workqueue

2024-03-27 Thread Jernej Škrabec
On Wednesday, 27 March 2024 at 17:03:14 CET, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
> 
> This patch converts drivers/infiniband/* from tasklet to BH workqueue.

infiniband -> mmc

Best regards,
Jernej

> 
> Based on the work done by Tejun Heo 
> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
> 
> Signed-off-by: Allen Pais 





Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue

2024-03-27 Thread Alan Stern
On Wed, Mar 27, 2024 at 04:03:09PM +, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
> 
> This patch converts drivers/infiniband/* from tasklet to BH workqueue.
> 
> Based on the work done by Tejun Heo 
> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
> 
> Signed-off-by: Allen Pais 
> ---

> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index c0e005670d67..88d8e1c366cd 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c

> @@ -1662,10 +1663,9 @@ static void __usb_hcd_giveback_urb(struct urb *urb)
>   usb_put_urb(urb);
>  }
>  
> -static void usb_giveback_urb_bh(struct work_struct *work)
> +static void usb_giveback_urb_bh(struct work_struct *t)
>  {
> - struct giveback_urb_bh *bh =
> - container_of(work, struct giveback_urb_bh, bh);
> + struct giveback_urb_bh *bh = from_work(bh, t, bh);
>   struct list_head local_list;
>  
>   spin_lock_irq(&bh->lock);

Is there any reason for this apparently pointless change of a local
variable's name?

Alan Stern


Re: [PATCH 6/9] ipmi: Convert from tasklet to BH workqueue

2024-03-27 Thread Corey Minyard
On Wed, Mar 27, 2024 at 04:03:11PM +, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
> 
> This patch converts drivers/infiniband/* from tasklet to BH workqueue.

I think you mean drivers/char/ipmi/* here.

I believe that work items are executed single-threaded for a given work
queue, so this should be good.  I need to test this, though.  It may be
that an IPMI device can have its own work queue; it may not be important
to run it in BH context.

-corey

> 
> Based on the work done by Tejun Heo 
> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
> 
> Signed-off-by: Allen Pais 
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 30 ++---
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c 
> b/drivers/char/ipmi/ipmi_msghandler.c
> index b0eedc4595b3..fce2a2dbdc82 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -36,12 +36,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define IPMI_DRIVER_VERSION "39.2"
>  
>  static struct ipmi_recv_msg *ipmi_alloc_recv_msg(void);
>  static int ipmi_init_msghandler(void);
> -static void smi_recv_tasklet(struct tasklet_struct *t);
> +static void smi_recv_work(struct work_struct *t);
>  static void handle_new_recv_msgs(struct ipmi_smi *intf);
>  static void need_waiter(struct ipmi_smi *intf);
>  static int handle_one_recv_msg(struct ipmi_smi *intf,
> @@ -498,13 +499,13 @@ struct ipmi_smi {
>   /*
>* Messages queued for delivery.  If delivery fails (out of memory
>* for instance), They will stay in here to be processed later in a
> -  * periodic timer interrupt.  The tasklet is for handling received
> +  * periodic timer interrupt.  The work is for handling received
>* messages directly from the handler.
>*/
>   spinlock_t   waiting_rcv_msgs_lock;
>   struct list_head waiting_rcv_msgs;
>   atomic_t watchdog_pretimeouts_to_deliver;
> - struct tasklet_struct recv_tasklet;
> + struct work_struct recv_work;
>  
>   spinlock_t xmit_msgs_lock;
>   struct list_head   xmit_msgs;
> @@ -704,7 +705,7 @@ static void clean_up_interface_data(struct ipmi_smi *intf)
>   struct cmd_rcvr  *rcvr, *rcvr2;
>   struct list_head list;
>  
> - tasklet_kill(&intf->recv_tasklet);
> + cancel_work_sync(&intf->recv_work);
>  
> - free_smi_msg_list(&intf->waiting_rcv_msgs);
> - free_recv_msg_list(&intf->waiting_events);
> @@ -1319,7 +1320,7 @@ static void free_user(struct kref *ref)
>  {
>   struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
>  
> - /* SRCU cleanup must happen in task context. */
> + /* SRCU cleanup must happen in work context. */
> + queue_work(remove_work_wq, &user->remove_work);
>  }
>  
> @@ -3605,8 +3606,7 @@ int ipmi_add_smi(struct module *owner,
>   intf->curr_seq = 0;
>   spin_lock_init(&intf->waiting_rcv_msgs_lock);
>   INIT_LIST_HEAD(&intf->waiting_rcv_msgs);
> - tasklet_setup(&intf->recv_tasklet,
> -  smi_recv_tasklet);
> + INIT_WORK(&intf->recv_work, smi_recv_work);
>   atomic_set(&intf->watchdog_pretimeouts_to_deliver, 0);
>   spin_lock_init(&intf->xmit_msgs_lock);
>   INIT_LIST_HEAD(&intf->xmit_msgs);
> @@ -4779,7 +4779,7 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf)
>* To preserve message order, quit if we
>* can't handle a message.  Add the message
>* back at the head, this is safe because this
> -  * tasklet is the only thing that pulls the
> +  * work is the only thing that pulls the
>* messages.
>*/
>   list_add(&smi_msg->link, &intf->waiting_rcv_msgs);
> @@ -4812,10 +4812,10 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf)
>   }
>  }
>  
> -static void smi_recv_tasklet(struct tasklet_struct *t)
> +static void smi_recv_work(struct work_struct *t)
>  {
>   unsigned long flags = 0; /* keep us warning-free. */
> - struct ipmi_smi *intf = from_tasklet(intf, t, recv_tasklet);
> + struct ipmi_smi *intf = from_work(intf, t, recv_work);
>   int run_to_completion = intf->run_to_completion;
>   struct ipmi_smi_msg *newmsg = NULL;
>  
> @@ -4866,7 +4866,7 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf,
>  
>   /*
>* To preserve message order, we keep a queue and deliver from
> -  * a tasklet.
> +  * a work.
>*/
>   if (!run_to_completion)
>   spin_lock_irqsave(&intf->waiting_rcv_msgs_lock, flags);
> @@ -4887,9 

Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen
> > The only generic interface to execute asynchronously in the BH context is
> > tasklet; however, it's marked deprecated and has some design flaws. To
> > replace tasklets, BH workqueue support was recently added. A BH workqueue
> > behaves similarly to regular workqueues except that the queued work items
> > are executed in the BH context.
> >
> > This patch converts drivers/infiniband/* from tasklet to BH workqueue.
>
> No it does not, I think your changelog is wrong :(

Whoops, sorry about that. I messed up the commit messages. I will fix it in v2.
>
> >
> > Based on the work done by Tejun Heo 
> > Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
> >
> > Signed-off-by: Allen Pais 
> > ---
> >  drivers/usb/atm/usbatm.c| 55 +++--
> >  drivers/usb/atm/usbatm.h|  3 +-
> >  drivers/usb/core/hcd.c  | 22 ++--
> >  drivers/usb/gadget/udc/fsl_qe_udc.c | 21 +--
> >  drivers/usb/gadget/udc/fsl_qe_udc.h |  4 +--
> >  drivers/usb/host/ehci-sched.c   |  2 +-
> >  drivers/usb/host/fhci-hcd.c |  3 +-
> >  drivers/usb/host/fhci-sched.c   | 10 +++---
> >  drivers/usb/host/fhci.h |  5 +--
> >  drivers/usb/host/xhci-dbgcap.h  |  3 +-
> >  drivers/usb/host/xhci-dbgtty.c  | 15 
> >  include/linux/usb/cdc_ncm.h |  2 +-
> >  include/linux/usb/usbnet.h  |  2 +-
> >  13 files changed, 76 insertions(+), 71 deletions(-)
> >
> > diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c
> > index 2da6615fbb6f..74849f24e52e 100644
> > --- a/drivers/usb/atm/usbatm.c
> > +++ b/drivers/usb/atm/usbatm.c
> > @@ -17,7 +17,7 @@
> >   *   - Removed the limit on the number of devices
> >   *   - Module now autoloads on device plugin
> >   *   - Merged relevant parts of sarlib
> > - *   - Replaced the kernel thread with a tasklet
> > + *   - Replaced the kernel thread with a work
>
> a "work"?
 will fix the comments.

>
> >   *   - New packet transmission code
> >   *   - Changed proc file contents
> >   *   - Fixed all known SMP races
> > @@ -68,6 +68,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #ifdef VERBOSE_DEBUG
> >  static int usbatm_print_packet(struct usbatm_data *instance, const 
> > unsigned char *data, int len);
> > @@ -249,7 +250,7 @@ static void usbatm_complete(struct urb *urb)
> >   /* vdbg("%s: urb 0x%p, status %d, actual_length %d",
> >__func__, urb, status, urb->actual_length); */
> >
> > - /* Can be invoked from task context, protect against interrupts */
> > + /* Can be invoked from work context, protect against interrupts */
>
> "workqueue"?  This too seems wrong.
>
> Same for other comment changes in this patch.

Thanks for the quick review, I will fix the comments and send out v2.

- Allen

> thanks,
>
> greg k-h
>


Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue

2024-03-27 Thread Greg KH
On Wed, Mar 27, 2024 at 04:03:09PM +, Allen Pais wrote:
> The only generic interface to execute asynchronously in the BH context is
> tasklet; however, it's marked deprecated and has some design flaws. To
> replace tasklets, BH workqueue support was recently added. A BH workqueue
> behaves similarly to regular workqueues except that the queued work items
> are executed in the BH context.
> 
> This patch converts drivers/infiniband/* from tasklet to BH workqueue.

No it does not, I think your changelog is wrong :(

> 
> Based on the work done by Tejun Heo 
> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10
> 
> Signed-off-by: Allen Pais 
> ---
>  drivers/usb/atm/usbatm.c| 55 +++--
>  drivers/usb/atm/usbatm.h|  3 +-
>  drivers/usb/core/hcd.c  | 22 ++--
>  drivers/usb/gadget/udc/fsl_qe_udc.c | 21 +--
>  drivers/usb/gadget/udc/fsl_qe_udc.h |  4 +--
>  drivers/usb/host/ehci-sched.c   |  2 +-
>  drivers/usb/host/fhci-hcd.c |  3 +-
>  drivers/usb/host/fhci-sched.c   | 10 +++---
>  drivers/usb/host/fhci.h |  5 +--
>  drivers/usb/host/xhci-dbgcap.h  |  3 +-
>  drivers/usb/host/xhci-dbgtty.c  | 15 
>  include/linux/usb/cdc_ncm.h |  2 +-
>  include/linux/usb/usbnet.h  |  2 +-
>  13 files changed, 76 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c
> index 2da6615fbb6f..74849f24e52e 100644
> --- a/drivers/usb/atm/usbatm.c
> +++ b/drivers/usb/atm/usbatm.c
> @@ -17,7 +17,7 @@
>   *   - Removed the limit on the number of devices
>   *   - Module now autoloads on device plugin
>   *   - Merged relevant parts of sarlib
> - *   - Replaced the kernel thread with a tasklet
> + *   - Replaced the kernel thread with a work

a "work"?

>   *   - New packet transmission code
>   *   - Changed proc file contents
>   *   - Fixed all known SMP races
> @@ -68,6 +68,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef VERBOSE_DEBUG
>  static int usbatm_print_packet(struct usbatm_data *instance, const unsigned 
> char *data, int len);
> @@ -249,7 +250,7 @@ static void usbatm_complete(struct urb *urb)
>   /* vdbg("%s: urb 0x%p, status %d, actual_length %d",
>__func__, urb, status, urb->actual_length); */
>  
> - /* Can be invoked from task context, protect against interrupts */
> + /* Can be invoked from work context, protect against interrupts */

"workqueue"?  This too seems wrong.

Same for other comment changes in this patch.

thanks,

greg k-h


Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue

2024-03-27 Thread Duncan Sands
Hi Allen, the usbatm bits look very reasonable to me.  Unfortunately I don't 
have the hardware to test any more.  Still, for what it's worth:


Signed-off-by: Duncan Sands 


[PATCH 5/9] mailbox: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/infiniband/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/mailbox/bcm-pdc-mailbox.c | 21 +++--
 drivers/mailbox/imx-mailbox.c | 16 
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/mailbox/bcm-pdc-mailbox.c 
b/drivers/mailbox/bcm-pdc-mailbox.c
index 1768d3d5aaa0..242e7504a628 100644
--- a/drivers/mailbox/bcm-pdc-mailbox.c
+++ b/drivers/mailbox/bcm-pdc-mailbox.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define PDC_SUCCESS  0
 
@@ -293,8 +294,8 @@ struct pdc_state {
 
unsigned int pdc_irq;
 
-   /* tasklet for deferred processing after DMA rx interrupt */
-   struct tasklet_struct rx_tasklet;
+   /* work for deferred processing after DMA rx interrupt */
+   struct work_struct rx_work;
 
/* Number of bytes of receive status prior to each rx frame */
u32 rx_status_len;
@@ -952,18 +953,18 @@ static irqreturn_t pdc_irq_handler(int irq, void *data)
iowrite32(intstatus, pdcs->pdc_reg_vbase + PDC_INTSTATUS_OFFSET);
 
/* Wakeup IRQ thread */
-   tasklet_schedule(&pdcs->rx_tasklet);
+   queue_work(system_bh_wq, &pdcs->rx_work);
return IRQ_HANDLED;
 }
 
 /**
- * pdc_tasklet_cb() - Tasklet callback that runs the deferred processing after
+ * pdc_work_cb() - Work callback that runs the deferred processing after
  * a DMA receive interrupt. Reenables the receive interrupt.
  * @t: Pointer to the Altera sSGDMA channel structure
  */
-static void pdc_tasklet_cb(struct tasklet_struct *t)
+static void pdc_work_cb(struct work_struct *t)
 {
-   struct pdc_state *pdcs = from_tasklet(pdcs, t, rx_tasklet);
+   struct pdc_state *pdcs = from_work(pdcs, t, rx_work);
 
pdc_receive(pdcs);
 
@@ -1577,8 +1578,8 @@ static int pdc_probe(struct platform_device *pdev)
 
pdc_hw_init(pdcs);
 
-   /* Init tasklet for deferred DMA rx processing */
-   tasklet_setup(&pdcs->rx_tasklet, pdc_tasklet_cb);
+   /* Init work for deferred DMA rx processing */
+   INIT_WORK(&pdcs->rx_work, pdc_work_cb);
 
err = pdc_interrupts_init(pdcs);
if (err)
@@ -1595,7 +1596,7 @@ static int pdc_probe(struct platform_device *pdev)
return PDC_SUCCESS;
 
 cleanup_buf_pool:
-   tasklet_kill(&pdcs->rx_tasklet);
+   cancel_work_sync(&pdcs->rx_work);
dma_pool_destroy(pdcs->rx_buf_pool);
 
 cleanup_ring_pool:
@@ -1611,7 +1612,7 @@ static void pdc_remove(struct platform_device *pdev)
 
pdc_free_debugfs();
 
-   tasklet_kill(&pdcs->rx_tasklet);
+   cancel_work_sync(&pdcs->rx_work);
 
pdc_hw_disable(pdcs);
 
diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c
index 5c1d09cad761..933727f89431 100644
--- a/drivers/mailbox/imx-mailbox.c
+++ b/drivers/mailbox/imx-mailbox.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mailbox.h"
 
@@ -80,7 +81,7 @@ struct imx_mu_con_priv {
	char			irq_desc[IMX_MU_CHAN_NAME_SIZE];
enum imx_mu_chan_type   type;
struct mbox_chan*chan;
-   struct tasklet_struct   txdb_tasklet;
+   struct work_struct  txdb_work;
 };
 
 struct imx_mu_priv {
@@ -232,7 +233,7 @@ static int imx_mu_generic_tx(struct imx_mu_priv *priv,
break;
case IMX_MU_TYPE_TXDB:
imx_mu_xcr_rmw(priv, IMX_MU_GCR, 
IMX_MU_xCR_GIRn(priv->dcfg->type, cp->idx), 0);
-   tasklet_schedule(&cp->txdb_tasklet);
+   queue_work(system_bh_wq, &cp->txdb_work);
break;
case IMX_MU_TYPE_TXDB_V2:
imx_mu_xcr_rmw(priv, IMX_MU_GCR, 
IMX_MU_xCR_GIRn(priv->dcfg->type, cp->idx), 0);
@@ -420,7 +421,7 @@ static int imx_mu_seco_tx(struct imx_mu_priv *priv, struct 
imx_mu_con_priv *cp,
}
 
/* Simulate hack for mbox framework */
-   tasklet_schedule(&cp->txdb_tasklet);
+   queue_work(system_bh_wq, &cp->txdb_work);
 
break;
default:
@@ -484,9 +485,9 @@ static int imx_mu_seco_rxdb(struct imx_mu_priv *priv, 
struct imx_mu_con_priv *cp
return err;
 }
 
-static void imx_mu_txdb_tasklet(unsigned long data)
+static void imx_mu_txdb_work(struct work_struct *t)
 {
-   struct imx_mu_con_priv *cp = (struct imx_mu_con_priv *)data;
+   struct imx_mu_con_priv *cp = from_work(cp, t, txdb_work);
 
mbox_chan_txdone(cp->chan, 0);
 }
@@ -570,8 +571,7 @@ static int imx_mu_startup(struct mbox_chan *chan)
 
if (cp->type == IMX_MU_TYPE_TXDB) 

[PATCH 7/9] s390: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/infiniband/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Note: Not tested. Please test/review.

Signed-off-by: Allen Pais 
---
 drivers/s390/block/dasd.c  | 42 
 drivers/s390/block/dasd_int.h  | 10 +++---
 drivers/s390/char/con3270.c| 27 
 drivers/s390/crypto/ap_bus.c   | 24 +++---
 drivers/s390/crypto/ap_bus.h   |  2 +-
 drivers/s390/crypto/zcrypt_msgtype50.c |  2 +-
 drivers/s390/crypto/zcrypt_msgtype6.c  |  4 +--
 drivers/s390/net/ctcm_fsms.c   |  4 +--
 drivers/s390/net/ctcm_main.c   | 15 -
 drivers/s390/net/ctcm_main.h   |  5 +--
 drivers/s390/net/ctcm_mpc.c| 12 +++
 drivers/s390/net/ctcm_mpc.h|  7 ++--
 drivers/s390/net/lcs.c | 26 +++
 drivers/s390/net/lcs.h |  2 +-
 drivers/s390/net/qeth_core_main.c  |  2 +-
 drivers/s390/scsi/zfcp_qdio.c  | 45 +-
 drivers/s390/scsi/zfcp_qdio.h  |  9 +++---
 17 files changed, 117 insertions(+), 121 deletions(-)

diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c
index 0a97cfedd706..c6f9910f0a98 100644
--- a/drivers/s390/block/dasd.c
+++ b/drivers/s390/block/dasd.c
@@ -54,8 +54,8 @@ MODULE_LICENSE("GPL");
  * SECTION: prototypes for static functions of dasd.c
  */
 static int dasd_flush_block_queue(struct dasd_block *);
-static void dasd_device_tasklet(unsigned long);
-static void dasd_block_tasklet(unsigned long);
+static void dasd_device_work(struct work_struct *);
+static void dasd_block_work(struct work_struct *);
 static void do_kick_device(struct work_struct *);
 static void do_reload_device(struct work_struct *);
 static void do_requeue_requests(struct work_struct *);
@@ -114,9 +114,8 @@ struct dasd_device *dasd_alloc_device(void)
dasd_init_chunklist(&device->erp_chunks, device->erp_mem, PAGE_SIZE);
dasd_init_chunklist(&device->ese_chunks, device->ese_mem, PAGE_SIZE * 2);
spin_lock_init(&device->mem_lock);
-   atomic_set(&device->tasklet_scheduled, 0);
-   tasklet_init(&device->tasklet, dasd_device_tasklet,
-(unsigned long) device);
+   atomic_set(&device->work_scheduled, 0);
+   INIT_WORK(&device->bh, dasd_device_work);
INIT_LIST_HEAD(&device->ccw_queue);
timer_setup(&device->timer, dasd_device_timeout, 0);
INIT_WORK(&device->kick_work, do_kick_device);
@@ -154,9 +153,8 @@ struct dasd_block *dasd_alloc_block(void)
/* open_count = 0 means device online but not in use */
atomic_set(&block->open_count, -1);
 
-   atomic_set(&block->tasklet_scheduled, 0);
-   tasklet_init(&block->tasklet, dasd_block_tasklet,
-(unsigned long) block);
+   atomic_set(&block->work_scheduled, 0);
+   INIT_WORK(&block->bh, dasd_block_work);
INIT_LIST_HEAD(&block->ccw_queue);
spin_lock_init(&block->queue_lock);
INIT_LIST_HEAD(&block->format_list);
@@ -2148,12 +2146,12 @@ EXPORT_SYMBOL_GPL(dasd_flush_device_queue);
 /*
  * Acquire the device lock and process queues for the device.
  */
-static void dasd_device_tasklet(unsigned long data)
+static void dasd_device_work(struct work_struct *t)
 {
-   struct dasd_device *device = (struct dasd_device *) data;
+   struct dasd_device *device = from_work(device, t, bh);
struct list_head final_queue;
 
-	atomic_set (&device->tasklet_scheduled, 0);
+	atomic_set (&device->work_scheduled, 0);
	INIT_LIST_HEAD(&final_queue);
spin_lock_irq(get_ccwdev_lock(device->cdev));
/* Check expire time of first request on the ccw queue. */
@@ -2174,15 +2172,15 @@ static void dasd_device_tasklet(unsigned long data)
 }
 
 /*
- * Schedules a call to dasd_tasklet over the device tasklet.
+ * Schedules a call to dasd_work over the device wq.
  */
 void dasd_schedule_device_bh(struct dasd_device *device)
 {
/* Protect against rescheduling. */
-	if (atomic_cmpxchg (&device->tasklet_scheduled, 0, 1) != 0)
+	if (atomic_cmpxchg (&device->work_scheduled, 0, 1) != 0)
		return;
	dasd_get_device(device);
-	tasklet_hi_schedule(&device->tasklet);
+	queue_work(system_bh_highpri_wq, &device->bh);
 }
 EXPORT_SYMBOL(dasd_schedule_device_bh);
 
@@ -2595,7 +2593,7 @@ int dasd_sleep_on_immediatly(struct dasd_ccw_req *cqr)
else
rc = -EIO;
 
-   /* kick tasklets */
+   /* kick works */
dasd_schedule_device_bh(device);
if (device->block)
dasd_schedule_block_bh(device->block);
@@ -2891,15 +2889,15 @@ static void __dasd_block_start_head(struct dasd_block 

[PATCH 9/9] mmc: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/mmc/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/mmc/host/atmel-mci.c  | 35 -
 drivers/mmc/host/au1xmmc.c| 37 -
 drivers/mmc/host/cb710-mmc.c  | 15 ++--
 drivers/mmc/host/cb710-mmc.h  |  3 +-
 drivers/mmc/host/dw_mmc.c | 25 ---
 drivers/mmc/host/dw_mmc.h |  9 ++-
 drivers/mmc/host/omap.c   | 17 +++--
 drivers/mmc/host/renesas_sdhi.h   |  3 +-
 drivers/mmc/host/renesas_sdhi_internal_dmac.c | 24 +++---
 drivers/mmc/host/renesas_sdhi_sys_dmac.c  |  9 +--
 drivers/mmc/host/sdhci-bcm-kona.c |  2 +-
 drivers/mmc/host/tifm_sd.c| 15 ++--
 drivers/mmc/host/tmio_mmc.h   |  3 +-
 drivers/mmc/host/tmio_mmc_core.c  |  4 +-
 drivers/mmc/host/uniphier-sd.c| 13 ++--
 drivers/mmc/host/via-sdmmc.c  | 25 ---
 drivers/mmc/host/wbsd.c   | 75 ++-
 drivers/mmc/host/wbsd.h   | 10 +--
 18 files changed, 167 insertions(+), 157 deletions(-)

diff --git a/drivers/mmc/host/atmel-mci.c b/drivers/mmc/host/atmel-mci.c
index dba826db739a..0a92a7fd020f 100644
--- a/drivers/mmc/host/atmel-mci.c
+++ b/drivers/mmc/host/atmel-mci.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -284,12 +285,12 @@ struct atmel_mci_dma {
  * EVENT_DATA_ERROR is pending.
  * @stop_cmdr: Value to be loaded into CMDR when the stop command is
  * to be sent.
- * @tasklet: Tasklet running the request state machine.
+ * @work: Work running the request state machine.
  * @pending_events: Bitmask of events flagged by the interrupt handler
- * to be processed by the tasklet.
+ * to be processed by the work.
  * @completed_events: Bitmask of events which the state machine has
  * processed.
- * @state: Tasklet state.
+ * @state: Work state.
  * @queue: List of slots waiting for access to the controller.
  * @need_clock_update: Update the clock rate before the next request.
  * @need_reset: Reset controller before next request.
@@ -363,7 +364,7 @@ struct atmel_mci {
u32 data_status;
u32 stop_cmdr;
 
-   struct tasklet_struct   tasklet;
+   struct work_struct  work;
unsigned long   pending_events;
unsigned long   completed_events;
enum atmel_mci_statestate;
@@ -761,7 +762,7 @@ static void atmci_timeout_timer(struct timer_list *t)
host->need_reset = 1;
host->state = STATE_END_REQUEST;
smp_wmb();
-	tasklet_schedule(&host->tasklet);
+	queue_work(system_bh_wq, &host->work);
 }
 
 static inline unsigned int atmci_ns_to_clocks(struct atmel_mci *host,
@@ -983,7 +984,7 @@ static void atmci_pdc_complete(struct atmel_mci *host)
 
dev_dbg(>pdev->dev, "(%s) set pending xfer complete\n", __func__);
atmci_set_pending(host, EVENT_XFER_COMPLETE);
-	tasklet_schedule(&host->tasklet);
+	queue_work(system_bh_wq, &host->work);
 }
 
 static void atmci_dma_cleanup(struct atmel_mci *host)
@@ -997,7 +998,7 @@ static void atmci_dma_cleanup(struct atmel_mci *host)
 }
 
 /*
- * This function is called by the DMA driver from tasklet context.
+ * This function is called by the DMA driver from work context.
  */
 static void atmci_dma_complete(void *arg)
 {
@@ -1020,7 +1021,7 @@ static void atmci_dma_complete(void *arg)
dev_dbg(>pdev->dev,
"(%s) set pending xfer complete\n", __func__);
atmci_set_pending(host, EVENT_XFER_COMPLETE);
-	tasklet_schedule(&host->tasklet);
+	queue_work(system_bh_wq, &host->work);
 
/*
 * Regardless of what the documentation says, we have
@@ -1033,7 +1034,7 @@ static void atmci_dma_complete(void *arg)
 * haven't seen all the potential error bits yet.
 *
 * The interrupt handler will schedule a different
-* tasklet to finish things up when the data transfer
+* work to finish things up when the data transfer
 * is completely done.
 *
 * We may not complete the mmc request here anyway
@@ -1765,9 +1766,9 @@ static void atmci_detect_change(struct timer_list *t)
}
 }
 
-static void atmci_tasklet_func(struct tasklet_struct *t)
+static 

[PATCH 8/9] drivers/media/*: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/media/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/media/pci/bt8xx/bt878.c   |  8 ++--
 drivers/media/pci/bt8xx/bt878.h   |  3 +-
 drivers/media/pci/bt8xx/dvb-bt8xx.c   |  9 ++--
 drivers/media/pci/ddbridge/ddbridge.h |  3 +-
 drivers/media/pci/mantis/hopper_cards.c   |  2 +-
 drivers/media/pci/mantis/mantis_cards.c   |  2 +-
 drivers/media/pci/mantis/mantis_common.h  |  3 +-
 drivers/media/pci/mantis/mantis_dma.c |  5 ++-
 drivers/media/pci/mantis/mantis_dma.h |  2 +-
 drivers/media/pci/mantis/mantis_dvb.c | 12 +++---
 drivers/media/pci/ngene/ngene-core.c  | 23 ++-
 drivers/media/pci/ngene/ngene.h   |  5 ++-
 drivers/media/pci/smipcie/smipcie-main.c  | 18 
 drivers/media/pci/smipcie/smipcie.h   |  3 +-
 drivers/media/pci/ttpci/budget-av.c   |  3 +-
 drivers/media/pci/ttpci/budget-ci.c   | 27 ++--
 drivers/media/pci/ttpci/budget-core.c | 10 ++---
 drivers/media/pci/ttpci/budget.h  |  5 ++-
 drivers/media/pci/tw5864/tw5864-core.c|  2 +-
 drivers/media/pci/tw5864/tw5864-video.c   | 13 +++---
 drivers/media/pci/tw5864/tw5864.h |  7 ++--
 drivers/media/platform/intel/pxa_camera.c | 15 +++
 drivers/media/platform/marvell/mcam-core.c| 11 ++---
 drivers/media/platform/marvell/mcam-core.h|  3 +-
 .../st/sti/c8sectpfe/c8sectpfe-core.c | 15 +++
 .../st/sti/c8sectpfe/c8sectpfe-core.h |  2 +-
 drivers/media/radio/wl128x/fmdrv.h|  7 ++--
 drivers/media/radio/wl128x/fmdrv_common.c | 41 ++-
 drivers/media/rc/mceusb.c |  2 +-
 drivers/media/usb/ttusb-dec/ttusb_dec.c   | 21 +-
 30 files changed, 151 insertions(+), 131 deletions(-)

diff --git a/drivers/media/pci/bt8xx/bt878.c b/drivers/media/pci/bt8xx/bt878.c
index 90972d6952f1..983ec29108f0 100644
--- a/drivers/media/pci/bt8xx/bt878.c
+++ b/drivers/media/pci/bt8xx/bt878.c
@@ -300,8 +300,8 @@ static irqreturn_t bt878_irq(int irq, void *dev_id)
}
if (astat & BT878_ARISCI) {
bt->finished_block = (stat & BT878_ARISCS) >> 28;
-   if (bt->tasklet.callback)
-			tasklet_schedule(&bt->tasklet);
+		if (bt->work.func)
+			queue_work(system_bh_wq, &bt->work);
break;
}
count++;
@@ -478,8 +478,8 @@ static int bt878_probe(struct pci_dev *dev, const struct 
pci_device_id *pci_id)
btwrite(0, BT878_AINT_MASK);
bt878_num++;
 
-   if (!bt->tasklet.func)
-		tasklet_disable(&bt->tasklet);
+	if (!bt->work.func)
+		disable_work_sync(&bt->work);
 
return 0;
 
diff --git a/drivers/media/pci/bt8xx/bt878.h b/drivers/media/pci/bt8xx/bt878.h
index fde8db293c54..b9ce78e5116b 100644
--- a/drivers/media/pci/bt8xx/bt878.h
+++ b/drivers/media/pci/bt8xx/bt878.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bt848.h"
 #include "bttv.h"
@@ -120,7 +121,7 @@ struct bt878 {
dma_addr_t risc_dma;
u32 risc_pos;
 
-   struct tasklet_struct tasklet;
+   struct work_struct work;
int shutdown;
 };
 
diff --git a/drivers/media/pci/bt8xx/dvb-bt8xx.c 
b/drivers/media/pci/bt8xx/dvb-bt8xx.c
index 390cbba6c065..8c0e1fa764a4 100644
--- a/drivers/media/pci/bt8xx/dvb-bt8xx.c
+++ b/drivers/media/pci/bt8xx/dvb-bt8xx.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -39,9 +40,9 @@ DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr);
 
 #define IF_FREQUENCYx6 217/* 6 * 36.167MHz */
 
-static void dvb_bt8xx_task(struct tasklet_struct *t)
+static void dvb_bt8xx_task(struct work_struct *t)
 {
-   struct bt878 *bt = from_tasklet(bt, t, tasklet);
+   struct bt878 *bt = from_work(bt, t, work);
	struct dvb_bt8xx_card *card = dev_get_drvdata(&bt->adapter->dev);
 
dprintk("%d\n", card->bt->finished_block);
@@ -782,7 +783,7 @@ static int dvb_bt8xx_load_card(struct dvb_bt8xx_card *card, 
u32 type)
goto err_disconnect_frontend;
}
 
-	tasklet_setup(&card->bt->tasklet, dvb_bt8xx_task);
+	INIT_WORK(&card->bt->work, dvb_bt8xx_task);
 
frontend_init(card, type);
 
@@ -922,7 +923,7 @@ static void dvb_bt8xx_remove(struct bttv_sub_device *sub)
dprintk("dvb_bt8xx: unloading 

[PATCH 6/9] ipmi: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/char/ipmi/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/char/ipmi/ipmi_msghandler.c | 30 ++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c 
b/drivers/char/ipmi/ipmi_msghandler.c
index b0eedc4595b3..fce2a2dbdc82 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -36,12 +36,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IPMI_DRIVER_VERSION "39.2"
 
 static struct ipmi_recv_msg *ipmi_alloc_recv_msg(void);
 static int ipmi_init_msghandler(void);
-static void smi_recv_tasklet(struct tasklet_struct *t);
+static void smi_recv_work(struct work_struct *t);
 static void handle_new_recv_msgs(struct ipmi_smi *intf);
 static void need_waiter(struct ipmi_smi *intf);
 static int handle_one_recv_msg(struct ipmi_smi *intf,
@@ -498,13 +499,13 @@ struct ipmi_smi {
/*
 * Messages queued for delivery.  If delivery fails (out of memory
 * for instance), They will stay in here to be processed later in a
-* periodic timer interrupt.  The tasklet is for handling received
+* periodic timer interrupt.  The work is for handling received
 * messages directly from the handler.
 */
spinlock_t   waiting_rcv_msgs_lock;
struct list_head waiting_rcv_msgs;
atomic_t watchdog_pretimeouts_to_deliver;
-   struct tasklet_struct recv_tasklet;
+   struct work_struct recv_work;
 
spinlock_t xmit_msgs_lock;
struct list_head   xmit_msgs;
@@ -704,7 +705,7 @@ static void clean_up_interface_data(struct ipmi_smi *intf)
struct cmd_rcvr  *rcvr, *rcvr2;
struct list_head list;
 
-	tasklet_kill(&intf->recv_tasklet);
+	cancel_work_sync(&intf->recv_work);
 
	free_smi_msg_list(&intf->waiting_rcv_msgs);
	free_recv_msg_list(&intf->waiting_events);
@@ -1319,7 +1320,7 @@ static void free_user(struct kref *ref)
 {
struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
 
-   /* SRCU cleanup must happen in task context. */
+   /* SRCU cleanup must happen in work context. */
	queue_work(remove_work_wq, &user->remove_work);
 }
 
@@ -3605,8 +3606,7 @@ int ipmi_add_smi(struct module *owner,
intf->curr_seq = 0;
	spin_lock_init(&intf->waiting_rcv_msgs_lock);
	INIT_LIST_HEAD(&intf->waiting_rcv_msgs);
-	tasklet_setup(&intf->recv_tasklet,
-		      smi_recv_tasklet);
+	INIT_WORK(&intf->recv_work, smi_recv_work);
	atomic_set(&intf->watchdog_pretimeouts_to_deliver, 0);
	spin_lock_init(&intf->xmit_msgs_lock);
	INIT_LIST_HEAD(&intf->xmit_msgs);
@@ -4779,7 +4779,7 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf)
 * To preserve message order, quit if we
 * can't handle a message.  Add the message
 * back at the head, this is safe because this
-* tasklet is the only thing that pulls the
+* work is the only thing that pulls the
 * messages.
 */
	list_add(&smi_msg->link, &intf->waiting_rcv_msgs);
@@ -4812,10 +4812,10 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf)
}
 }
 
-static void smi_recv_tasklet(struct tasklet_struct *t)
+static void smi_recv_work(struct work_struct *t)
 {
unsigned long flags = 0; /* keep us warning-free. */
-   struct ipmi_smi *intf = from_tasklet(intf, t, recv_tasklet);
+   struct ipmi_smi *intf = from_work(intf, t, recv_work);
int run_to_completion = intf->run_to_completion;
struct ipmi_smi_msg *newmsg = NULL;
 
@@ -4866,7 +4866,7 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf,
 
/*
 * To preserve message order, we keep a queue and deliver from
-* a tasklet.
+* a work.
 */
if (!run_to_completion)
		spin_lock_irqsave(&intf->waiting_rcv_msgs_lock, flags);
@@ -4887,9 +4887,9 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf,
		spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags);
 
if (run_to_completion)
-		smi_recv_tasklet(&intf->recv_tasklet);
+		smi_recv_work(&intf->recv_work);
	else
-		tasklet_schedule(&intf->recv_tasklet);
+		queue_work(system_bh_wq, &intf->recv_work);
 }
 EXPORT_SYMBOL(ipmi_smi_msg_received);
 
@@ -4899,7 +4899,7 @@ void ipmi_smi_watchdog_pretimeout(struct 

[PATCH 4/9] USB: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/usb/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/usb/atm/usbatm.c| 55 +++--
 drivers/usb/atm/usbatm.h|  3 +-
 drivers/usb/core/hcd.c  | 22 ++--
 drivers/usb/gadget/udc/fsl_qe_udc.c | 21 +--
 drivers/usb/gadget/udc/fsl_qe_udc.h |  4 +--
 drivers/usb/host/ehci-sched.c   |  2 +-
 drivers/usb/host/fhci-hcd.c |  3 +-
 drivers/usb/host/fhci-sched.c   | 10 +++---
 drivers/usb/host/fhci.h |  5 +--
 drivers/usb/host/xhci-dbgcap.h  |  3 +-
 drivers/usb/host/xhci-dbgtty.c  | 15 
 include/linux/usb/cdc_ncm.h |  2 +-
 include/linux/usb/usbnet.h  |  2 +-
 13 files changed, 76 insertions(+), 71 deletions(-)

diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c
index 2da6615fbb6f..74849f24e52e 100644
--- a/drivers/usb/atm/usbatm.c
+++ b/drivers/usb/atm/usbatm.c
@@ -17,7 +17,7 @@
  * - Removed the limit on the number of devices
  * - Module now autoloads on device plugin
  * - Merged relevant parts of sarlib
- * - Replaced the kernel thread with a tasklet
+ * - Replaced the kernel thread with a work
  * - New packet transmission code
  * - Changed proc file contents
  * - Fixed all known SMP races
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef VERBOSE_DEBUG
 static int usbatm_print_packet(struct usbatm_data *instance, const unsigned 
char *data, int len);
@@ -249,7 +250,7 @@ static void usbatm_complete(struct urb *urb)
/* vdbg("%s: urb 0x%p, status %d, actual_length %d",
 __func__, urb, status, urb->actual_length); */
 
-   /* Can be invoked from task context, protect against interrupts */
+   /* Can be invoked from work context, protect against interrupts */
	spin_lock_irqsave(&instance->lock, flags);
 
/* must add to the back when receiving; doesn't matter when sending */
@@ -269,7 +270,7 @@ static void usbatm_complete(struct urb *urb)
/* throttle processing in case of an error */
		mod_timer(&channel->delay, jiffies +
			  msecs_to_jiffies(THROTTLE_MSECS));
	} else
-		tasklet_schedule(&channel->tasklet);
+		queue_work(system_bh_wq, &channel->work);
 }
 
 
@@ -511,10 +512,10 @@ static unsigned int usbatm_write_cells(struct usbatm_data 
*instance,
 **  receive  **
 **/
 
-static void usbatm_rx_process(struct tasklet_struct *t)
+static void usbatm_rx_process(struct work_struct *t)
 {
-   struct usbatm_data *instance = from_tasklet(instance, t,
-   rx_channel.tasklet);
+   struct usbatm_data *instance = from_work(instance, t,
+   rx_channel.work);
struct urb *urb;
 
	while ((urb = usbatm_pop_urb(&instance->rx_channel))) {
@@ -565,10 +566,10 @@ static void usbatm_rx_process(struct tasklet_struct *t)
 **  send  **
 ***/
 
-static void usbatm_tx_process(struct tasklet_struct *t)
+static void usbatm_tx_process(struct work_struct *t)
 {
-   struct usbatm_data *instance = from_tasklet(instance, t,
-   tx_channel.tasklet);
+   struct usbatm_data *instance = from_work(instance, t,
+   tx_channel.work);
struct sk_buff *skb = instance->current_skb;
struct urb *urb = NULL;
const unsigned int buf_size = instance->tx_channel.buf_size;
@@ -632,13 +633,13 @@ static void usbatm_cancel_send(struct usbatm_data 
*instance,
}
	spin_unlock_irq(&instance->sndqueue.lock);

-	tasklet_disable(&instance->tx_channel.tasklet);
+	disable_work_sync(&instance->tx_channel.work);
	if ((skb = instance->current_skb) && (UDSL_SKB(skb)->atm.vcc == vcc)) {
		atm_dbg(instance, "%s: popping current skb (0x%p)\n", __func__,
			skb);
		instance->current_skb = NULL;
		usbatm_pop(vcc, skb);
	}
-	tasklet_enable(&instance->tx_channel.tasklet);
+	enable_and_queue_work(system_bh_wq, &instance->tx_channel.work);
 }
 
 static int usbatm_atm_send(struct atm_vcc *vcc, struct sk_buff *skb)
@@ -677,7 +678,7 @@ static int usbatm_atm_send(struct atm_vcc *vcc, struct 
sk_buff *skb)
ctrl->crc = crc32_be(~0, skb->data, skb->len);
 
	skb_queue_tail(&instance->sndqueue, skb);
-	tasklet_schedule(&instance->tx_channel.tasklet);
+	queue_work(system_bh_wq, &instance->tx_channel.work);

[PATCH 3/9] IB: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/infiniband/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h|  3 +-
 drivers/infiniband/hw/bnxt_re/qplib_fp.c   | 21 ++--
 drivers/infiniband/hw/bnxt_re/qplib_fp.h   |  2 +-
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 25 ---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h |  2 +-
 drivers/infiniband/hw/erdma/erdma.h|  3 +-
 drivers/infiniband/hw/erdma/erdma_eq.c | 11 ---
 drivers/infiniband/hw/hfi1/rc.c|  2 +-
 drivers/infiniband/hw/hfi1/sdma.c  | 37 +++---
 drivers/infiniband/hw/hfi1/sdma.h  |  9 +++---
 drivers/infiniband/hw/hfi1/tid_rdma.c  |  6 ++--
 drivers/infiniband/hw/irdma/ctrl.c |  2 +-
 drivers/infiniband/hw/irdma/hw.c   | 24 +++---
 drivers/infiniband/hw/irdma/main.h |  5 +--
 drivers/infiniband/hw/qib/qib.h|  7 ++--
 drivers/infiniband/hw/qib/qib_iba7322.c|  9 +++---
 drivers/infiniband/hw/qib/qib_rc.c | 16 +-
 drivers/infiniband/hw/qib/qib_ruc.c|  4 +--
 drivers/infiniband/hw/qib/qib_sdma.c   | 11 ---
 drivers/infiniband/sw/rdmavt/qp.c  |  2 +-
 20 files changed, 106 insertions(+), 95 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h 
b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 9dca451ed522..f511c8415806 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -42,6 +42,7 @@
 #include 
 #include "hw_counters.h"
 #include 
+#include 
 #define ROCE_DRV_MODULE_NAME   "bnxt_re"
 
 #define BNXT_RE_DESC   "Broadcom NetXtreme-C/E RoCE Driver"
@@ -162,7 +163,7 @@ struct bnxt_re_dev {
u8  cur_prio_map;
 
/* FP Notification Queue (CQ & SRQ) */
-   struct tasklet_struct   nq_task;
+   struct work_struct  nq_work;
 
/* RCFW Channel */
struct bnxt_qplib_rcfw  rcfw;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c 
b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index 439d0c7c5d0c..052906982cdf 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "roce_hsi.h"
@@ -294,9 +295,9 @@ static void __wait_for_all_nqes(struct bnxt_qplib_cq *cq, 
u16 cnq_events)
}
 }
 
-static void bnxt_qplib_service_nq(struct tasklet_struct *t)
+static void bnxt_qplib_service_nq(struct work_struct *t)
 {
-   struct bnxt_qplib_nq *nq = from_tasklet(nq, t, nq_tasklet);
+   struct bnxt_qplib_nq *nq = from_work(nq, t, nq_work);
	struct bnxt_qplib_hwq *hwq = &nq->hwq;
struct bnxt_qplib_cq *cq;
int budget = nq->budget;
@@ -394,7 +395,7 @@ void bnxt_re_synchronize_nq(struct bnxt_qplib_nq *nq)
int budget = nq->budget;
 
nq->budget = nq->hwq.max_elements;
-	bnxt_qplib_service_nq(&nq->nq_tasklet);
+	bnxt_qplib_service_nq(&nq->nq_work);
nq->budget = budget;
 }
 
@@ -409,7 +410,7 @@ static irqreturn_t bnxt_qplib_nq_irq(int irq, void 
*dev_instance)
prefetch(bnxt_qplib_get_qe(hwq, sw_cons, NULL));
 
/* Fan out to CPU affinitized kthreads? */
-	tasklet_schedule(&nq->nq_tasklet);
+	queue_work(system_bh_wq, &nq->nq_work);
 
return IRQ_HANDLED;
 }
@@ -430,8 +431,8 @@ void bnxt_qplib_nq_stop_irq(struct bnxt_qplib_nq *nq, bool 
kill)
nq->name = NULL;
 
if (kill)
-		tasklet_kill(&nq->nq_tasklet);
-	tasklet_disable(&nq->nq_tasklet);
+		cancel_work_sync(&nq->nq_work);
+	disable_work_sync(&nq->nq_work);
 }
 
 void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq)
@@ -465,9 +466,9 @@ int bnxt_qplib_nq_start_irq(struct bnxt_qplib_nq *nq, int 
nq_indx,
 
nq->msix_vec = msix_vector;
if (need_init)
-		tasklet_setup(&nq->nq_tasklet, bnxt_qplib_service_nq);
+		INIT_WORK(&nq->nq_work, bnxt_qplib_service_nq);
	else
-		tasklet_enable(&nq->nq_tasklet);
+		enable_and_queue_work(system_bh_wq, &nq->nq_work);
 
nq->name = kasprintf(GFP_KERNEL, "bnxt_re-nq-%d@pci:%s",
 nq_indx, pci_name(res->pdev));
@@ -477,7 +478,7 @@ int bnxt_qplib_nq_start_irq(struct bnxt_qplib_nq *nq, int 
nq_indx,
if (rc) {
kfree(nq->name);
nq->name = NULL;
-		tasklet_disable(&nq->nq_tasklet);
+		disable_work_sync(&nq->nq_work);
   

[PATCH 2/9] dma: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/dma/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/dma/altera-msgdma.c   | 15 
 drivers/dma/apple-admac.c | 15 
 drivers/dma/at_hdmac.c|  2 +-
 drivers/dma/at_xdmac.c| 15 
 drivers/dma/bcm2835-dma.c |  2 +-
 drivers/dma/dma-axi-dmac.c|  2 +-
 drivers/dma/dma-jz4780.c  |  2 +-
 .../dma/dw-axi-dmac/dw-axi-dmac-platform.c|  2 +-
 drivers/dma/dw-edma/dw-edma-core.c|  2 +-
 drivers/dma/dw/core.c | 13 +++
 drivers/dma/dw/regs.h |  3 +-
 drivers/dma/ep93xx_dma.c  | 15 
 drivers/dma/fsl-edma-common.c |  2 +-
 drivers/dma/fsl-qdma.c|  2 +-
 drivers/dma/fsl_raid.c| 11 +++---
 drivers/dma/fsl_raid.h|  2 +-
 drivers/dma/fsldma.c  | 15 
 drivers/dma/fsldma.h  |  3 +-
 drivers/dma/hisi_dma.c|  2 +-
 drivers/dma/hsu/hsu.c |  2 +-
 drivers/dma/idma64.c  |  4 +--
 drivers/dma/img-mdc-dma.c |  2 +-
 drivers/dma/imx-dma.c | 27 +++---
 drivers/dma/imx-sdma.c|  6 ++--
 drivers/dma/ioat/dma.c| 17 -
 drivers/dma/ioat/dma.h|  5 +--
 drivers/dma/ioat/init.c   |  2 +-
 drivers/dma/k3dma.c   | 19 +-
 drivers/dma/mediatek/mtk-cqdma.c  | 35 ++-
 drivers/dma/mediatek/mtk-hsdma.c  |  2 +-
 drivers/dma/mediatek/mtk-uart-apdma.c |  4 +--
 drivers/dma/mmp_pdma.c| 13 +++
 drivers/dma/mmp_tdma.c| 11 +++---
 drivers/dma/mpc512x_dma.c | 17 -
 drivers/dma/mv_xor.c  | 13 +++
 drivers/dma/mv_xor.h  |  5 +--
 drivers/dma/mv_xor_v2.c   | 23 ++--
 drivers/dma/mxs-dma.c | 13 +++
 drivers/dma/nbpfaxi.c | 15 
 drivers/dma/owl-dma.c |  2 +-
 drivers/dma/pch_dma.c | 17 -
 drivers/dma/pl330.c   | 31 
 drivers/dma/plx_dma.c | 13 +++
 drivers/dma/ppc4xx/adma.c | 17 -
 drivers/dma/ppc4xx/adma.h |  5 +--
 drivers/dma/pxa_dma.c |  2 +-
 drivers/dma/qcom/bam_dma.c| 35 ++-
 drivers/dma/qcom/gpi.c| 18 +-
 drivers/dma/qcom/hidma.c  | 11 +++---
 drivers/dma/qcom/hidma.h  |  5 +--
 drivers/dma/qcom/hidma_ll.c   | 11 +++---
 drivers/dma/qcom/qcom_adm.c   |  2 +-
 drivers/dma/sa11x0-dma.c  | 27 +++---
 drivers/dma/sf-pdma/sf-pdma.c | 23 ++--
 drivers/dma/sf-pdma/sf-pdma.h |  5 +--
 drivers/dma/sprd-dma.c|  2 +-
 drivers/dma/st_fdma.c |  2 +-
 drivers/dma/ste_dma40.c   | 17 -
 drivers/dma/sun6i-dma.c   | 33 -
 drivers/dma/tegra186-gpc-dma.c|  2 +-
 drivers/dma/tegra20-apb-dma.c | 19 +-
 drivers/dma/tegra210-adma.c   |  2 +-
 drivers/dma/ti/edma.c |  2 +-
 drivers/dma/ti/k3-udma.c  | 11 +++---
 drivers/dma/ti/omap-dma.c |  2 +-
 drivers/dma/timb_dma.c| 23 ++--
 drivers/dma/txx9dmac.c| 29 +++
 drivers/dma/txx9dmac.h|  5 +--
 drivers/dma/virt-dma.c|  9 ++---
 drivers/dma/virt-dma.h|  9 ++---
 drivers/dma/xgene-dma.c   | 21 +--
 drivers/dma/xilinx/xilinx_dma.c   | 23 ++--
 drivers/dma/xilinx/xilinx_dpdma.c | 21 +--
 drivers/dma/xilinx/zynqmp_dma.c   | 21 +--
 74 files changed, 442 insertions(+), 395 deletions(-)

diff --git 

[PATCH 1/9] hyperv: Convert from tasklet to BH workqueue

2024-03-27 Thread Allen Pais
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

This patch converts drivers/hv/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais 
---
 drivers/hv/channel.c  |  8 
 drivers/hv/channel_mgmt.c |  5 ++---
 drivers/hv/connection.c   |  9 +
 drivers/hv/hv.c   |  3 +--
 drivers/hv/hv_balloon.c   |  4 ++--
 drivers/hv/hv_fcopy.c |  8 
 drivers/hv/hv_kvp.c   |  8 
 drivers/hv/hv_snapshot.c  |  8 
 drivers/hv/hyperv_vmbus.h |  9 +
 drivers/hv/vmbus_drv.c| 19 ++-
 include/linux/hyperv.h|  2 +-
 11 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index adbf674355b2..876d78eb4dce 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -859,7 +859,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
unsigned long flags;
 
/*
-* vmbus_on_event(), running in the per-channel tasklet, can race
+* vmbus_on_event(), running in the per-channel work, can race
 * with vmbus_close_internal() in the case of SMP guest, e.g., when
 * the former is accessing channel->inbound.ring_buffer, the latter
 * could be freeing the ring_buffer pages, so here we must stop it
@@ -871,7 +871,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 * and that the channel ring buffer is no longer being accessed, cf.
 * the calls to napi_disable() in netvsc_device_remove().
 */
-	tasklet_disable(&channel->callback_event);
+	disable_work_sync(&channel->callback_event);
 
/* See the inline comments in vmbus_chan_sched(). */
	spin_lock_irqsave(&channel->sched_lock, flags);
@@ -880,8 +880,8 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 
channel->sc_creation_callback = NULL;
 
-	/* Re-enable tasklet for use on re-open */
-	tasklet_enable(&channel->callback_event);
+	/* Re-enable work for use on re-open */
+	enable_and_queue_work(system_bh_wq, &channel->callback_event);
 }
 
 static int vmbus_close_internal(struct vmbus_channel *channel)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 2f4d09ce027a..58397071a0de 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -353,8 +353,7 @@ static struct vmbus_channel *alloc_channel(void)
 
	INIT_LIST_HEAD(&channel->sc_list);

-	tasklet_init(&channel->callback_event,
-		     vmbus_on_event, (unsigned long)channel);
+	INIT_WORK(&channel->callback_event, vmbus_on_event);
 
hv_ringbuffer_pre_init(channel);
 
@@ -366,7 +365,7 @@ static struct vmbus_channel *alloc_channel(void)
  */
 static void free_channel(struct vmbus_channel *channel)
 {
-	tasklet_kill(&channel->callback_event);
+	cancel_work_sync(&channel->callback_event);
	vmbus_remove_channel_attr_group(channel);

	kobject_put(&channel->kobj);
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 3cabeeabb1ca..f2a3394a8303 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -372,12 +372,13 @@ struct vmbus_channel *relid2channel(u32 relid)
  * 3. Once we return, enable signaling from the host. Once this
  *state is set we check to see if additional packets are
  *available to read. In this case we repeat the process.
- *If this tasklet has been running for a long time
+ *If this work has been running for a long time
  *then reschedule ourselves.
  */
-void vmbus_on_event(unsigned long data)
+void vmbus_on_event(struct work_struct *t)
 {
-   struct vmbus_channel *channel = (void *) data;
+   struct vmbus_channel *channel = from_work(channel, t,
+   callback_event);
void (*callback_fn)(void *context);
 
trace_vmbus_on_event(channel);
@@ -401,7 +402,7 @@ void vmbus_on_event(unsigned long data)
return;
 
	hv_begin_read(&channel->inbound);
-	tasklet_schedule(&channel->callback_event);
+	queue_work(system_bh_wq, &channel->callback_event);
 }
 
 /*
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a8ad728354cb..2af92f08f9ce 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -119,8 +119,7 @@ int hv_synic_alloc(void)
for_each_present_cpu(cpu) {
hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		tasklet_init(&hv_cpu->msg_dpc,
-			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
+		INIT_WORK(&hv_cpu->msg_dpc, vmbus_on_msg_dpc);
 
if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {

[PATCH 0/9] Convert Tasklets to BH Workqueues

2024-03-27 Thread Allen Pais
This patch series represents a significant shift in how asynchronous
execution in the bottom half (BH) context is handled within the kernel.
Traditionally, tasklets have been the go-to mechanism for such operations.
This series introduces the conversion of existing tasklet implementations
to the newly supported BH workqueues, marking a pivotal enhancement
in how asynchronous tasks are managed and executed.

Background and Motivation:
Tasklets have served as the kernel's lightweight mechanism for
scheduling bottom-half processing, providing a simple interface
for deferring work from interrupt context. There have been increasing
requests and motivations to deprecate and eventually remove tasklets
in favor of more modern and flexible mechanisms.

Introduction of BH Workqueues:
BH workqueues are designed to behave similarly to regular workqueues
with the added benefit of execution in the BH context.

Conversion Details:
The conversion process involved identifying all instances where
tasklets were used within the kernel and replacing them with BH workqueue
implementations.

This patch series is a first step toward broader adoption of BH workqueues
across the kernel, and soon other subsystems using tasklets will undergo
a similar transition. The groundwork laid here could serve as a
blueprint for such future conversions.

Testing Request:
In addition to a thorough review of these changes,
I kindly request that reviewers engage in both functional and
performance testing of this patch series, specifically benchmarks
that measure interrupt handling efficiency, latency, and throughput.

I welcome your feedback, suggestions, and any further discussion on this
patch series.


Additional Info:
Based on the work done by Tejun Heo 
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Allen Pais (9):
  hyperv: Convert from tasklet to BH workqueue
  dma: Convert from tasklet to BH workqueue
  IB: Convert from tasklet to BH workqueue
  USB: Convert from tasklet to BH workqueue
  mailbox: Convert from tasklet to BH workqueue
  ipmi: Convert from tasklet to BH workqueue
  s390: Convert from tasklet to BH workqueue
  drivers/media/*: Convert from tasklet to BH workqueue
  mmc: Convert from tasklet to BH workqueue

 drivers/char/ipmi/ipmi_msghandler.c   | 30 
 drivers/dma/altera-msgdma.c   | 15 ++--
 drivers/dma/apple-admac.c | 15 ++--
 drivers/dma/at_hdmac.c|  2 +-
 drivers/dma/at_xdmac.c| 15 ++--
 drivers/dma/bcm2835-dma.c |  2 +-
 drivers/dma/dma-axi-dmac.c|  2 +-
 drivers/dma/dma-jz4780.c  |  2 +-
 .../dma/dw-axi-dmac/dw-axi-dmac-platform.c|  2 +-
 drivers/dma/dw-edma/dw-edma-core.c|  2 +-
 drivers/dma/dw/core.c | 13 ++--
 drivers/dma/dw/regs.h |  3 +-
 drivers/dma/ep93xx_dma.c  | 15 ++--
 drivers/dma/fsl-edma-common.c |  2 +-
 drivers/dma/fsl-qdma.c|  2 +-
 drivers/dma/fsl_raid.c| 11 +--
 drivers/dma/fsl_raid.h|  2 +-
 drivers/dma/fsldma.c  | 15 ++--
 drivers/dma/fsldma.h  |  3 +-
 drivers/dma/hisi_dma.c|  2 +-
 drivers/dma/hsu/hsu.c |  2 +-
 drivers/dma/idma64.c  |  4 +-
 drivers/dma/img-mdc-dma.c |  2 +-
 drivers/dma/imx-dma.c | 27 +++
 drivers/dma/imx-sdma.c|  6 +-
 drivers/dma/ioat/dma.c| 17 +++--
 drivers/dma/ioat/dma.h|  5 +-
 drivers/dma/ioat/init.c   |  2 +-
 drivers/dma/k3dma.c   | 19 ++---
 drivers/dma/mediatek/mtk-cqdma.c  | 35 -
 drivers/dma/mediatek/mtk-hsdma.c  |  2 +-
 drivers/dma/mediatek/mtk-uart-apdma.c |  4 +-
 drivers/dma/mmp_pdma.c| 13 ++--
 drivers/dma/mmp_tdma.c| 11 +--
 drivers/dma/mpc512x_dma.c | 17 +++--
 drivers/dma/mv_xor.c  | 13 ++--
 drivers/dma/mv_xor.h  |  5 +-
 drivers/dma/mv_xor_v2.c   | 23 +++---
 drivers/dma/mxs-dma.c | 13 ++--
 drivers/dma/nbpfaxi.c | 15 ++--
 drivers/dma/owl-dma.c |  2 +-
 drivers/dma/pch_dma.c | 17 +++--
 drivers/dma/pl330.c   | 31 
 drivers/dma/plx_dma.c | 13 ++--
 drivers/dma/ppc4xx/adma.c | 17 +++--
 drivers/dma/ppc4xx/adma.h |  5 +-
 drivers/dma/pxa_dma.c |  2 +-
 drivers/dma/qcom/bam_dma.c| 35 -
 

Re: [PATCH v4] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests

2024-03-27 Thread Madhavan Srinivasan


On 3/26/24 3:10 PM, Gautam Menghani wrote:

The PAPR hypervisor has introduced three new counters in the VPA area
of LPAR CPUs for KVM L2 guest observability (see [1] for terminology):
two for context switches from host to guest and vice versa, and one
for the total time spent inside the KVM guest. Add a tracepoint that
enables reading the counters for use by ftrace/perf. Note that this
tracepoint is only available with the nestedv2 API (i.e., KVM on
PowerVM).

Also maintain an aggregation of the context switch times in vcpu->arch.
This will be useful in getting the aggregate times with a pmu driver
which will be upstreamed in the near future.

[1] Terminology:
a. L1 refers to the VM (LPAR) booted on top of PAPR hypervisor
b. L2 refers to the KVM guest booted on top of L1.

Signed-off-by: Vaibhav Jain
Signed-off-by: Gautam Menghani
---
v1 -> v2:
1. Fix the build error due to invalid struct member reference.

v2 -> v3:
1. Move the counter disabling and zeroing code to a different function.
2. Move the get_lppaca() inside the tracepoint_enabled() branch.
3. Add the aggregation logic to maintain total context switch time.

v3 -> v4:
1. After vcpu_run, check the VPA flag instead of checking for tracepoint
being enabled for disabling the cs time accumulation.

  arch/powerpc/include/asm/kvm_host.h |  5 +
  arch/powerpc/include/asm/lppaca.h   | 11 ---
  arch/powerpc/kvm/book3s_hv.c| 30 +
  arch/powerpc/kvm/trace_hv.h | 25 
  4 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8abac5321..d953b32dd 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -847,6 +847,11 @@ struct kvm_vcpu_arch {
gpa_t nested_io_gpr;
/* For nested APIv2 guests*/
struct kvmhv_nestedv2_io nestedv2_io;
+
+   /* Aggregate context switch and guest run time info (in ns) */
+   u64 l1_to_l2_cs_agg;
+   u64 l2_to_l1_cs_agg;
+   u64 l2_runtime_agg;
  #endif
  
  #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 61ec2447d..bda6b86b9 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -62,7 +62,8 @@ struct lppaca {
u8  donate_dedicated_cpu;   /* Donate dedicated CPU cycles */
u8  fpregs_in_use;
u8  pmcregs_in_use;
-   u8  reserved8[28];
+	u8	l2_accumul_cntrs_enable; /* Enable usage of counters for KVM guest */
+   u8  reserved8[27];
__be64  wait_state_cycles;  /* Wait cycles for this proc */
u8  reserved9[28];
__be16  slb_count;  /* # of SLBs to maintain */
@@ -92,9 +93,13 @@ struct lppaca {
/* cacheline 4-5 */
  
  	__be32	page_ins;		/* CMO Hint - # page ins by OS */

-   u8  reserved12[148];
+   u8  reserved12[28];
+   volatile __be64 l1_to_l2_cs_tb;
+   volatile __be64 l2_to_l1_cs_tb;
+   volatile __be64 l2_runtime_tb;
+   u8 reserved13[96];
volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
-   u8  reserved13[96];
+   u8  reserved14[96];
  } cacheline_aligned;
  
  #define lppaca_of(cpu)	(*paca_ptrs[cpu]->lppaca_ptr)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8e86eb577..dcd6edd3b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4108,6 +4108,27 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
}
  }
  
+static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu)
+{
+   struct lppaca *lp = get_lppaca();
+   u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns;
+
+   l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb));
+   l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb));
+   l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb));
+   trace_kvmppc_vcpu_exit_cs_time(vcpu, l1_to_l2_ns, l2_to_l1_ns,
+   l2_runtime_ns);
+   lp->l1_to_l2_cs_tb = 0;
+   lp->l2_to_l1_cs_tb = 0;
+   lp->l2_runtime_tb = 0;
+   lp->l2_accumul_cntrs_enable = 0;
+
+   // Maintain an aggregate of context switch times
+   vcpu->arch.l1_to_l2_cs_agg += l1_to_l2_ns;
+   vcpu->arch.l2_to_l1_cs_agg += l2_to_l1_ns;
+   vcpu->arch.l2_runtime_agg += l2_runtime_ns;
+}
+
  static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 time_limit,
 unsigned long lpcr, u64 *tb)
  {
@@ -4130,6 +4151,11 @@ static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 time_limit,
kvmppc_gse_put_u64(io->vcpu_run_input, KVMPPC_GSID_LPCR, lpcr);
  
 	accumulate_time(vcpu, &vcpu->arch.in_guest);

+
+   /* Enable the guest host context switch time tracking */
+   if 

[PATCH v2 3/3] arch: Rename fbdev header and source files

2024-03-27 Thread Thomas Zimmermann
The per-architecture fbdev code has no dependencies on fbdev and can
be used for any video-related subsystem. Rename the files to 'video'.
Use video-sti.c on parisc as the source file depends on CONFIG_STI_CORE.

Further update all include statements, include guards, and Makefiles.
Also update a few strings and comments to refer to video instead of
fbdev.

Signed-off-by: Thomas Zimmermann 
Cc: Vineet Gupta 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Huacai Chen 
Cc: WANG Xuerui 
Cc: Geert Uytterhoeven 
Cc: Thomas Bogendoerfer 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: John Paul Adrian Glaubitz 
Cc: "David S. Miller" 
Cc: Andreas Larsson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
---
 arch/arc/include/asm/fb.h|  8 
 arch/arc/include/asm/video.h |  8 
 arch/arm/include/asm/fb.h|  6 --
 arch/arm/include/asm/video.h |  6 ++
 arch/arm64/include/asm/fb.h  | 10 --
 arch/arm64/include/asm/video.h   | 10 ++
 arch/loongarch/include/asm/{fb.h => video.h} |  8 
 arch/m68k/include/asm/{fb.h => video.h}  |  8 
 arch/mips/include/asm/{fb.h => video.h}  | 12 ++--
 arch/parisc/include/asm/{fb.h => video.h}|  8 
 arch/parisc/video/Makefile   |  2 +-
 arch/parisc/video/{fbdev.c => video-sti.c}   |  2 +-
 arch/powerpc/include/asm/{fb.h => video.h}   |  8 
 arch/powerpc/kernel/pci-common.c |  2 +-
 arch/sh/include/asm/fb.h |  7 ---
 arch/sh/include/asm/video.h  |  7 +++
 arch/sparc/include/asm/{fb.h => video.h} |  8 
 arch/sparc/video/Makefile|  2 +-
 arch/sparc/video/{fbdev.c => video.c}|  4 ++--
 arch/x86/include/asm/{fb.h => video.h}   |  8 
 arch/x86/video/Makefile  |  2 +-
 arch/x86/video/{fbdev.c => video.c}  |  3 ++-
 include/asm-generic/Kbuild   |  2 +-
 include/asm-generic/{fb.h => video.h}|  6 +++---
 include/linux/fb.h   |  2 +-
 25 files changed, 75 insertions(+), 74 deletions(-)
 delete mode 100644 arch/arc/include/asm/fb.h
 create mode 100644 arch/arc/include/asm/video.h
 delete mode 100644 arch/arm/include/asm/fb.h
 create mode 100644 arch/arm/include/asm/video.h
 delete mode 100644 arch/arm64/include/asm/fb.h
 create mode 100644 arch/arm64/include/asm/video.h
 rename arch/loongarch/include/asm/{fb.h => video.h} (86%)
 rename arch/m68k/include/asm/{fb.h => video.h} (86%)
 rename arch/mips/include/asm/{fb.h => video.h} (76%)
 rename arch/parisc/include/asm/{fb.h => video.h} (68%)
 rename arch/parisc/video/{fbdev.c => video-sti.c} (96%)
 rename arch/powerpc/include/asm/{fb.h => video.h} (76%)
 delete mode 100644 arch/sh/include/asm/fb.h
 create mode 100644 arch/sh/include/asm/video.h
 rename arch/sparc/include/asm/{fb.h => video.h} (89%)
 rename arch/sparc/video/{fbdev.c => video.c} (86%)
 rename arch/x86/include/asm/{fb.h => video.h} (77%)
 rename arch/x86/video/{fbdev.c => video.c} (97%)
 rename include/asm-generic/{fb.h => video.h} (96%)

diff --git a/arch/arc/include/asm/fb.h b/arch/arc/include/asm/fb.h
deleted file mode 100644
index 9c2383d29cbb9..0
--- a/arch/arc/include/asm/fb.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-#ifndef _ASM_FB_H_
-#define _ASM_FB_H_
-
-#include 
-
-#endif /* _ASM_FB_H_ */
diff --git a/arch/arc/include/asm/video.h b/arch/arc/include/asm/video.h
new file mode 100644
index 0..8ff7263727fe7
--- /dev/null
+++ b/arch/arc/include/asm/video.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_VIDEO_H_
+#define _ASM_VIDEO_H_
+
+#include 
+
+#endif /* _ASM_VIDEO_H_ */
diff --git a/arch/arm/include/asm/fb.h b/arch/arm/include/asm/fb.h
deleted file mode 100644
index ce20a43c30339..0
--- a/arch/arm/include/asm/fb.h
+++ /dev/null
@@ -1,6 +0,0 @@
-#ifndef _ASM_FB_H_
-#define _ASM_FB_H_
-
-#include 
-
-#endif /* _ASM_FB_H_ */
diff --git a/arch/arm/include/asm/video.h b/arch/arm/include/asm/video.h
new file mode 100644
index 0..f570565366e67
--- /dev/null
+++ b/arch/arm/include/asm/video.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_VIDEO_H_
+#define _ASM_VIDEO_H_
+
+#include 
+
+#endif /* _ASM_VIDEO_H_ */
diff --git a/arch/arm64/include/asm/fb.h b/arch/arm64/include/asm/fb.h
deleted file mode 100644
index 1a495d8fb2ce0..0
--- a/arch/arm64/include/asm/fb.h
+++ /dev/null
@@ -1,10 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2012 ARM Ltd.
- */
-#ifndef __ASM_FB_H_
-#define __ASM_FB_H_
-
-#include 
-
-#endif /* __ASM_FB_H_ */
diff --git a/arch/arm64/include/asm/video.h b/arch/arm64/include/asm/video.h
new file mode 100644
index 

[PATCH v2 1/3] arch: Select fbdev helpers with CONFIG_VIDEO

2024-03-27 Thread Thomas Zimmermann
Various Kconfig options selected the per-architecture helpers for
fbdev. But none of the contained code depends on fbdev. Standardize
on CONFIG_VIDEO, which will allow adding more general helpers for
video functionality.

CONFIG_VIDEO protects each architecture's video/ directory. This
allows more fine-grained control over each directory's files, such
as the use of CONFIG_STI_CORE on parisc.

v2:
- sparc: rebased onto Makefile changes

Signed-off-by: Thomas Zimmermann 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: "David S. Miller" 
Cc: Andreas Larsson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
---
 arch/parisc/Makefile  | 2 +-
 arch/sparc/Makefile   | 4 ++--
 arch/sparc/video/Makefile | 2 +-
 arch/x86/Makefile | 2 +-
 arch/x86/video/Makefile   | 3 ++-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 316f84f1d15c8..21b8166a68839 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -119,7 +119,7 @@ export LIBGCC
 
 libs-y += arch/parisc/lib/ $(LIBGCC)
 
-drivers-y += arch/parisc/video/
+drivers-$(CONFIG_VIDEO) += arch/parisc/video/
 
 boot   := arch/parisc/boot
 
diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index 2a03daa68f285..757451c3ea1df 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -59,8 +59,8 @@ endif
 libs-y += arch/sparc/prom/
 libs-y += arch/sparc/lib/
 
-drivers-$(CONFIG_PM) += arch/sparc/power/
-drivers-$(CONFIG_FB_CORE) += arch/sparc/video/
+drivers-$(CONFIG_PM)+= arch/sparc/power/
+drivers-$(CONFIG_VIDEO) += arch/sparc/video/
 
 boot := arch/sparc/boot
 
diff --git a/arch/sparc/video/Makefile b/arch/sparc/video/Makefile
index d4d83f1702c61..9dd82880a027a 100644
--- a/arch/sparc/video/Makefile
+++ b/arch/sparc/video/Makefile
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-obj-$(CONFIG_FB_CORE) += fbdev.o
+obj-y  += fbdev.o
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 15a5f4f2ff0aa..c0ea612c62ebe 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -265,7 +265,7 @@ drivers-$(CONFIG_PCI)+= arch/x86/pci/
 # suspend and hibernation support
 drivers-$(CONFIG_PM) += arch/x86/power/
 
-drivers-$(CONFIG_FB_CORE) += arch/x86/video/
+drivers-$(CONFIG_VIDEO) += arch/x86/video/
 
 
 # boot loader support. Several targets are kept for legacy purposes
diff --git a/arch/x86/video/Makefile b/arch/x86/video/Makefile
index 5ebe48752ffc4..9dd82880a027a 100644
--- a/arch/x86/video/Makefile
+++ b/arch/x86/video/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_FB_CORE)  += fbdev.o
+
+obj-y  += fbdev.o
-- 
2.44.0



[PATCH v2 2/3] arch: Remove struct fb_info from video helpers

2024-03-27 Thread Thomas Zimmermann
The per-architecture video helpers do not depend on struct fb_info
or anything else from fbdev. Remove it from the interface and replace
fb_is_primary_device() with video_is_primary_device(). The new helper
is similar in functionality, but can operate on non-fbdev devices.

Signed-off-by: Thomas Zimmermann 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: "David S. Miller" 
Cc: Andreas Larsson 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
---
 arch/parisc/include/asm/fb.h |  8 +---
 arch/parisc/video/fbdev.c|  9 +
 arch/sparc/include/asm/fb.h  |  7 ---
 arch/sparc/video/fbdev.c | 17 -
 arch/x86/include/asm/fb.h|  8 +---
 arch/x86/video/fbdev.c   | 18 +++---
 drivers/video/fbdev/core/fbcon.c |  2 +-
 include/asm-generic/fb.h | 11 ++-
 8 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/arch/parisc/include/asm/fb.h b/arch/parisc/include/asm/fb.h
index 658a8a7dc5312..ed2a195a3e762 100644
--- a/arch/parisc/include/asm/fb.h
+++ b/arch/parisc/include/asm/fb.h
@@ -2,11 +2,13 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-struct fb_info;
+#include 
+
+struct device;
 
 #if defined(CONFIG_STI_CORE)
-int fb_is_primary_device(struct fb_info *info);
-#define fb_is_primary_device fb_is_primary_device
+bool video_is_primary_device(struct device *dev);
+#define video_is_primary_device video_is_primary_device
 #endif
 
 #include 
diff --git a/arch/parisc/video/fbdev.c b/arch/parisc/video/fbdev.c
index e4f8ac99fc9e0..540fa0c919d59 100644
--- a/arch/parisc/video/fbdev.c
+++ b/arch/parisc/video/fbdev.c
@@ -5,12 +5,13 @@
  * Copyright (C) 2001-2002 Thomas Bogendoerfer 
  */
 
-#include 
 #include 
 
 #include 
 
-int fb_is_primary_device(struct fb_info *info)
+#include 
+
+bool video_is_primary_device(struct device *dev)
 {
struct sti_struct *sti;
 
@@ -21,6 +22,6 @@ int fb_is_primary_device(struct fb_info *info)
return true;
 
/* return true if it's the default built-in framebuffer driver */
-   return (sti->dev == info->device);
+   return (sti->dev == dev);
 }
-EXPORT_SYMBOL(fb_is_primary_device);
+EXPORT_SYMBOL(video_is_primary_device);
diff --git a/arch/sparc/include/asm/fb.h b/arch/sparc/include/asm/fb.h
index 24440c0fda490..07f0325d6921c 100644
--- a/arch/sparc/include/asm/fb.h
+++ b/arch/sparc/include/asm/fb.h
@@ -3,10 +3,11 @@
 #define _SPARC_FB_H_
 
 #include 
+#include 
 
 #include 
 
-struct fb_info;
+struct device;
 
 #ifdef CONFIG_SPARC32
 static inline pgprot_t pgprot_framebuffer(pgprot_t prot,
@@ -18,8 +19,8 @@ static inline pgprot_t pgprot_framebuffer(pgprot_t prot,
 #define pgprot_framebuffer pgprot_framebuffer
 #endif
 
-int fb_is_primary_device(struct fb_info *info);
-#define fb_is_primary_device fb_is_primary_device
+bool video_is_primary_device(struct device *dev);
+#define video_is_primary_device video_is_primary_device
 
 static inline void fb_memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
 {
diff --git a/arch/sparc/video/fbdev.c b/arch/sparc/video/fbdev.c
index bff66dd1909a4..e46f0499c2774 100644
--- a/arch/sparc/video/fbdev.c
+++ b/arch/sparc/video/fbdev.c
@@ -1,26 +1,25 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include 
-#include 
+#include 
 #include 
 
+#include 
 #include 
 
-int fb_is_primary_device(struct fb_info *info)
+bool video_is_primary_device(struct device *dev)
 {
-   struct device *dev = info->device;
-   struct device_node *node;
+   struct device_node *node = dev->of_node;
 
if (console_set_on_cmdline)
-   return 0;
+   return false;
 
-   node = dev->of_node;
if (node && node == of_console_device)
-   return 1;
+   return true;
 
-   return 0;
+   return false;
 }
-EXPORT_SYMBOL(fb_is_primary_device);
+EXPORT_SYMBOL(video_is_primary_device);
 
 MODULE_DESCRIPTION("Sparc fbdev helpers");
 MODULE_LICENSE("GPL");
diff --git a/arch/x86/include/asm/fb.h b/arch/x86/include/asm/fb.h
index c3b9582de7efd..999db33792869 100644
--- a/arch/x86/include/asm/fb.h
+++ b/arch/x86/include/asm/fb.h
@@ -2,17 +2,19 @@
 #ifndef _ASM_X86_FB_H
 #define _ASM_X86_FB_H
 
+#include 
+
 #include 
 
-struct fb_info;
+struct device;
 
 pgprot_t pgprot_framebuffer(pgprot_t prot,
unsigned long vm_start, unsigned long vm_end,
unsigned long offset);
 #define pgprot_framebuffer pgprot_framebuffer
 
-int fb_is_primary_device(struct fb_info *info);
-#define fb_is_primary_device fb_is_primary_device
+bool video_is_primary_device(struct device *dev);
+#define video_is_primary_device video_is_primary_device
 
 #include 
 
diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c
index 1dd6528cc947c..4d87ce8e257fe 100644
--- a/arch/x86/video/fbdev.c
+++ b/arch/x86/video/fbdev.c
@@ -7,7 +7,6 @@
  *
  */
 
-#include 
 #include 
 

[PATCH v2 0/3] arch: Remove fbdev dependency from video helpers

2024-03-27 Thread Thomas Zimmermann
Make architecture helpers for display functionality depend on general
video functionality instead of fbdev. This avoids the dependency on
fbdev and makes the functionality available for non-fbdev code.

Patch 1 replaces the variety of Kconfig options that control the
Makefiles with CONFIG_VIDEO. More fine-grained control of the build
can then be done within each video/ directory; see parisc for an
example.

Patch 2 replaces fb_is_primary_device() with video_is_primary_device(),
which has no dependencies on fbdev. The implementation remains identical
on all affected platforms. There's one minor change in fbcon, which is
the only caller of fb_is_primary_device().

Patch 3 renames the source and files from fbdev to video.

v2:
- improve cover letter
- rebase onto v6.9-rc1

Thomas Zimmermann (3):
  arch: Select fbdev helpers with CONFIG_VIDEO
  arch: Remove struct fb_info from video helpers
  arch: Rename fbdev header and source files

 arch/arc/include/asm/fb.h|  8 --
 arch/arc/include/asm/video.h |  8 ++
 arch/arm/include/asm/fb.h|  6 -
 arch/arm/include/asm/video.h |  6 +
 arch/arm64/include/asm/fb.h  | 10 
 arch/arm64/include/asm/video.h   | 10 
 arch/loongarch/include/asm/{fb.h => video.h} |  8 +++---
 arch/m68k/include/asm/{fb.h => video.h}  |  8 +++---
 arch/mips/include/asm/{fb.h => video.h}  | 12 -
 arch/parisc/Makefile |  2 +-
 arch/parisc/include/asm/fb.h | 14 ---
 arch/parisc/include/asm/video.h  | 16 
 arch/parisc/video/Makefile   |  2 +-
 arch/parisc/video/{fbdev.c => video-sti.c}   |  9 ---
 arch/powerpc/include/asm/{fb.h => video.h}   |  8 +++---
 arch/powerpc/kernel/pci-common.c |  2 +-
 arch/sh/include/asm/fb.h |  7 --
 arch/sh/include/asm/video.h  |  7 ++
 arch/sparc/Makefile  |  4 +--
 arch/sparc/include/asm/{fb.h => video.h} | 15 +--
 arch/sparc/video/Makefile|  2 +-
 arch/sparc/video/fbdev.c | 26 
 arch/sparc/video/video.c | 25 +++
 arch/x86/Makefile|  2 +-
 arch/x86/include/asm/fb.h| 19 --
 arch/x86/include/asm/video.h | 21 
 arch/x86/video/Makefile  |  3 ++-
 arch/x86/video/{fbdev.c => video.c}  | 21 +++-
 drivers/video/fbdev/core/fbcon.c |  2 +-
 include/asm-generic/Kbuild   |  2 +-
 include/asm-generic/{fb.h => video.h}| 17 +++--
 include/linux/fb.h   |  2 +-
 32 files changed, 154 insertions(+), 150 deletions(-)
 delete mode 100644 arch/arc/include/asm/fb.h
 create mode 100644 arch/arc/include/asm/video.h
 delete mode 100644 arch/arm/include/asm/fb.h
 create mode 100644 arch/arm/include/asm/video.h
 delete mode 100644 arch/arm64/include/asm/fb.h
 create mode 100644 arch/arm64/include/asm/video.h
 rename arch/loongarch/include/asm/{fb.h => video.h} (86%)
 rename arch/m68k/include/asm/{fb.h => video.h} (86%)
 rename arch/mips/include/asm/{fb.h => video.h} (76%)
 delete mode 100644 arch/parisc/include/asm/fb.h
 create mode 100644 arch/parisc/include/asm/video.h
 rename arch/parisc/video/{fbdev.c => video-sti.c} (78%)
 rename arch/powerpc/include/asm/{fb.h => video.h} (76%)
 delete mode 100644 arch/sh/include/asm/fb.h
 create mode 100644 arch/sh/include/asm/video.h
 rename arch/sparc/include/asm/{fb.h => video.h} (75%)
 delete mode 100644 arch/sparc/video/fbdev.c
 create mode 100644 arch/sparc/video/video.c
 delete mode 100644 arch/x86/include/asm/fb.h
 create mode 100644 arch/x86/include/asm/video.h
 rename arch/x86/video/{fbdev.c => video.c} (66%)
 rename include/asm-generic/{fb.h => video.h} (89%)

-- 
2.44.0



[PATCH v3 14/14] selftests/fpu: Allow building on other architectures

2024-03-27 Thread Samuel Holland
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile
and run floating-point code, this test is no longer x86-specific.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v1)

 lib/Kconfig.debug   |  2 +-
 lib/Makefile| 25 ++---
 lib/test_fpu_glue.c |  5 -
 3 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c63a5fbf1f1c..f93e778e0405 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES
 
 config TEST_FPU
tristate "Test floating point operations in kernel space"
-   depends on X86 && !KCOV_INSTRUMENT_ALL
+   depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL
help
  Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu
	  which will trigger a sequence of floating point operations. This is used
diff --git a/lib/Makefile b/lib/Makefile
index fcb35bf50979..e44ad11f77b5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
 obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
 
-#
-# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
-# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
-# get appended last to CFLAGS and thus override those previous compiler options.
-#
-FPU_CFLAGS := -msse -msse2
-ifdef CONFIG_CC_IS_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
-#
-# The "-msse" in the first argument is there so that the
-# -mpreferred-stack-boundary=3 build error:
-#
-#  -mpreferred-stack-boundary=3 is not between 4 and 12
-#
-# can be triggered. Otherwise gcc doesn't complain.
-FPU_CFLAGS += -mhard-float
-FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
-endif
-
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
 test_fpu-y := test_fpu_glue.o test_fpu_impl.o
-CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
+CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU)
 
 # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
 # so we can't just use obj-$(CONFIG_KUNIT).
diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c
index 85963d7be826..eef282a2715f 100644
--- a/lib/test_fpu_glue.c
+++ b/lib/test_fpu_glue.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include "test_fpu.h"
 
@@ -38,6 +38,9 @@ static struct dentry *selftest_dir;
 
 static int __init test_fpu_init(void)
 {
+   if (!kernel_fpu_available())
+   return -EINVAL;
+
selftest_dir = debugfs_create_dir("selftest_helpers", NULL);
if (!selftest_dir)
return -ENOMEM;
-- 
2.43.1



[PATCH v3 13/14] selftests/fpu: Move FP code to a separate translation unit

2024-03-27 Thread Samuel Holland
This ensures no compiler-generated floating-point code can appear
outside kernel_fpu_{begin,end}() sections, and some architectures
enforce this separation.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - Declare test_fpu() in a header

 lib/Makefile|  3 ++-
 lib/test_fpu.h  |  8 +++
 lib/{test_fpu.c => test_fpu_glue.c} | 32 +
 lib/test_fpu_impl.c | 37 +
 4 files changed, 48 insertions(+), 32 deletions(-)
 create mode 100644 lib/test_fpu.h
 rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
 create mode 100644 lib/test_fpu_impl.c

diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45..fcb35bf50979 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st
 endif
 
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
-CFLAGS_test_fpu.o += $(FPU_CFLAGS)
+test_fpu-y := test_fpu_glue.o test_fpu_impl.o
+CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
 
 # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
 # so we can't just use obj-$(CONFIG_KUNIT).
diff --git a/lib/test_fpu.h b/lib/test_fpu.h
new file mode 100644
index ..4459807084bc
--- /dev/null
+++ b/lib/test_fpu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _LIB_TEST_FPU_H
+#define _LIB_TEST_FPU_H
+
+int test_fpu(void);
+
+#endif
diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c
similarity index 71%
rename from lib/test_fpu.c
rename to lib/test_fpu_glue.c
index e82db19fed84..85963d7be826 100644
--- a/lib/test_fpu.c
+++ b/lib/test_fpu_glue.c
@@ -19,37 +19,7 @@
 #include 
 #include 
 
-static int test_fpu(void)
-{
-   /*
-* This sequence of operations tests that rounding mode is
-* to nearest and that denormal numbers are supported.
-* Volatile variables are used to avoid compiler optimizing
-* the calculations away.
-*/
-   volatile double a, b, c, d, e, f, g;
-
-   a = 4.0;
-   b = 1e-15;
-   c = 1e-310;
-
-   /* Sets precision flag */
-   d = a + b;
-
-   /* Result depends on rounding mode */
-   e = a + b / 2;
-
-   /* Denormal and very large values */
-   f = b / c;
-
-   /* Depends on denormal support */
-   g = a + c * f;
-
-   if (d > a && e > a && g > a)
-   return 0;
-   else
-   return -EINVAL;
-}
+#include "test_fpu.h"
 
 static int test_fpu_get(void *data, u64 *val)
 {
diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c
new file mode 100644
index ..777894dbbe86
--- /dev/null
+++ b/lib/test_fpu_impl.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include 
+
+#include "test_fpu.h"
+
+int test_fpu(void)
+{
+   /*
+* This sequence of operations tests that rounding mode is
+* to nearest and that denormal numbers are supported.
+* Volatile variables are used to avoid compiler optimizing
+* the calculations away.
+*/
+   volatile double a, b, c, d, e, f, g;
+
+   a = 4.0;
+   b = 1e-15;
+   c = 1e-310;
+
+   /* Sets precision flag */
+   d = a + b;
+
+   /* Result depends on rounding mode */
+   e = a + b / 2;
+
+   /* Denormal and very large values */
+   f = b / c;
+
+   /* Depends on denormal support */
+   g = a + c * f;
+
+   if (d > a && e > a && g > a)
+   return 0;
+   else
+   return -EINVAL;
+}
-- 
2.43.1



[PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
Now that all previously-supported architectures select
ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
of the existing list of architectures. It can also take advantage of the
common kernel-mode FPU API and method of adjusting CFLAGS.

Acked-by: Alex Deucher 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers

 drivers/gpu/drm/amd/display/Kconfig   |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 27 ++
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 ++-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 ++-
 4 files changed, 7 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 901d1961b739..5fcd4f778dc3 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
select SND_HDA_COMPONENT if SND_HDA_CORE
# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+	select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
help
  Choose this option if you want to use the new display engine
  support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 0de16796466b..e46f8ce41d87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -26,16 +26,7 @@
 
 #include "dc_trace.h"
 
-#if defined(CONFIG_X86)
-#include <asm/fpu/api.h>
-#elif defined(CONFIG_PPC64)
-#include <asm/switch_to.h>
-#include <asm/cputable.h>
-#elif defined(CONFIG_ARM64)
-#include <asm/neon.h>
-#elif defined(CONFIG_LOONGARCH)
-#include <asm/fpu.h>
-#endif
+#include <linux/fpu.h>
 
 /**
  * DOC: DC FPU manipulation overview
@@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
WARN_ON_ONCE(!in_task());
preempt_disable();
depth = __this_cpu_inc_return(fpu_recursion_depth);
-
if (depth == 1) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
+   BUG_ON(!kernel_fpu_available());
kernel_fpu_begin();
-#elif defined(CONFIG_PPC64)
-   if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
-   enable_kernel_fp();
-#elif defined(CONFIG_ARM64)
-   kernel_neon_begin();
-#endif
}
 
TRACE_DCN_FPU(true, function_name, line, depth);
@@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)
 
depth = __this_cpu_dec_return(fpu_recursion_depth);
if (depth == 0) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
-#elif defined(CONFIG_PPC64)
-   if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
-   disable_kernel_fp();
-#elif defined(CONFIG_ARM64)
-   kernel_neon_end();
-#endif
} else {
WARN_ON_ONCE(depth < 0);
}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 59d3972341d2..a94b6d546cd1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -25,40 +25,8 @@
 # It provides the general basic services required by other DAL
 # subcomponents.
 
-ifdef CONFIG_X86
-dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml_ccflags := $(dml_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml_ccflags := -mfpu=64
-dml_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifneq ($(call gcc-min-version, 70100),y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml_ccflags += -mpreferred-stack-boundary=4
-else
-dml_ccflags += -msse2
-endif
-endif
+dml_ccflags := $(CC_FLAGS_FPU)
+dml_rcflags := $(CC_FLAGS_NO_FPU)
 
 ifneq ($(CONFIG_FRAME_WARN),0)
 ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 7b51364084b5..4f6c804a26ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -24,40 +24,8 @@
 #
 # Makefile for dml2.
 
-ifdef CONFIG_X86
-dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml2_ccflags := $(dml2_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml2_ccflags := 

[PATCH v3 11/14] drm/amd/display: Only use hard-float, not altivec on powerpc

2024-03-27 Thread Samuel Holland
From: Michael Ellerman 

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function.

Drop altivec from the compiler flags and adjust the enable/disable code
to only enable FPU use.

Signed-off-by: Michael Ellerman 
Acked-by: Alex Deucher 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++--
 drivers/gpu/drm/amd/display/dc/dml/Makefile|  2 +-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile   |  2 +-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-   if (cpu_has_feature(CPU_FTR_VSX_COMP))
-   enable_kernel_vsx();
-   else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-   enable_kernel_altivec();
-   else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+   if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
enable_kernel_fp();
 #elif defined(CONFIG_ARM64)
kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-   if (cpu_has_feature(CPU_FTR_VSX_COMP))
-   disable_kernel_vsx();
-   else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-   disable_kernel_altivec();
-   else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+   if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
disable_kernel_fp();
 #elif defined(CONFIG_ARM64)
kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index c4a5efd2dda5..59d3972341d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
-- 
2.43.1



[PATCH v3 10/14] riscv: Add support for kernel-mode FPU

2024-03-27 Thread Samuel Holland
This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware. That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Support is limited to riscv64 because riscv32 requires runtime (libgcc)
assistance to convert between doubles and 64-bit integers.

Acked-by: Palmer Dabbelt 
Reviewed-by: Palmer Dabbelt 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

Changes in v3:
 - Rebase on v6.9-rc1
 - Limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Remove RISC-V architecture-specific preprocessor check

 arch/riscv/Kconfig  |  1 +
 arch/riscv/Makefile |  3 +++
 arch/riscv/include/asm/fpu.h| 16 
 arch/riscv/kernel/Makefile  |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c | 28 
 5 files changed, 49 insertions(+)
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..3bcd0d250810 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+   select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MMIOWB
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 252d63942f34..76ff4033c854 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i
 
 KBUILD_AFLAGS += -march=$(riscv-march-y)
 
+# For C code built with floating-point support, exclude V but keep F and D.
+CC_FLAGS_FPU  := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
+
 KBUILD_CFLAGS += -mno-save-restore
 KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
 
diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
new file mode 100644
index ..91c04c244e12
--- /dev/null
+++ b/arch/riscv/include/asm/fpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_RISCV_FPU_H
+#define _ASM_RISCV_FPU_H
+
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available() has_fpu()
+
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
+#endif /* ! _ASM_RISCV_FPU_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 81d94a8ee10f..5b243d46f4b1 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED)+= unaligned_access_speed.o
 obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o
 
 obj-$(CONFIG_FPU)  += fpu.o
+obj-$(CONFIG_FPU)  += kernel_mode_fpu.o
 obj-$(CONFIG_RISCV_ISA_V)  += vector.o
 obj-$(CONFIG_RISCV_ISA_V)  += kernel_mode_vector.o
 obj-$(CONFIG_SMP)  += smpboot.o
diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
new file mode 100644
index ..0ac8348876c4
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_fpu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#include <linux/export.h>
+#include <linux/preempt.h>
+
+#include <asm/csr.h>
+#include <asm/fpu.h>
+#include <asm/processor.h>
+#include <asm/switch_to.h>
+
+void kernel_fpu_begin(void)
+{
+   preempt_disable();
+   fstate_save(current, task_pt_regs(current));
+   csr_set(CSR_SSTATUS, SR_FS);
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+   csr_clear(CSR_SSTATUS, SR_FS);
+   fstate_restore(current, task_pt_regs(current));
+   preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
-- 
2.43.1



[PATCH v3 09/14] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a
different header. Add a wrapper header, and export the CFLAGS
adjustments as found in lib/Makefile.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v1)

 arch/x86/Kconfig   |  1 +
 arch/x86/Makefile  | 20 
 arch/x86/include/asm/fpu.h | 13 +
 3 files changed, 34 insertions(+)
 create mode 100644 arch/x86/include/asm/fpu.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..7c9d032ee675 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,6 +83,7 @@ config X86
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_KCOVif X86_64
+   select ARCH_HAS_KERNEL_FPU_SUPPORT
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 662d9d4033e6..5a5f5999c505 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
 KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json
KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2
 
+#
+# CFLAGS for compiling floating point code inside the kernel.
+#
+CC_FLAGS_FPU := -msse -msse2
+ifdef CONFIG_CC_IS_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
+#
+# The "-msse" in the first argument is there so that the
+# -mpreferred-stack-boundary=3 build error:
+#
+#  -mpreferred-stack-boundary=3 is not between 4 and 12
+#
+# can be triggered. Otherwise gcc doesn't complain.
+CC_FLAGS_FPU += -mhard-float
+CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
+endif
+
 ifeq ($(CONFIG_X86_KERNEL_IBT),y)
 #
 # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h
new file mode 100644
index ..b2743fe19339
--- /dev/null
+++ b/arch/x86/include/asm/fpu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_X86_FPU_H
+#define _ASM_X86_FPU_H
+
+#include <asm/fpu/api.h>
+
+#define kernel_fpu_available() true
+
+#endif /* ! _ASM_X86_FPU_H */
-- 
2.43.1



[PATCH v3 08/14] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
PowerPC provides an equivalent to the common kernel-mode FPU API, but in
a different header and using different function names. The PowerPC API
also requires a non-preemptible context. Add a wrapper header, and
export the CFLAGS adjustments.

Acked-by: Michael Ellerman  (powerpc)
Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v1)

 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/Makefile  |  5 -
 arch/powerpc/include/asm/fpu.h | 28 
 3 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/fpu.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..c42a57b6839d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_HUGEPD  if HUGETLB_PAGE
select ARCH_HAS_KCOV
+   select ARCH_HAS_KERNEL_FPU_SUPPORT  if PPC_FPU
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MEMREMAP_COMPAT_ALIGN   if PPC_64S_HASH_MMU
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 65261cbe5bfd..93d89f055b70 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32)  += $(call cc-option, $(MULTIPLEWORD))
 
 CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)
 
+CC_FLAGS_FPU   := $(call cc-option,-mhard-float)
+CC_FLAGS_NO_FPU:= $(call cc-option,-msoft-float)
+
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
@@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
 
 KBUILD_CPPFLAGS+= -I $(srctree)/arch/powerpc $(asinstr)
 KBUILD_AFLAGS  += $(AFLAGS-y)
-KBUILD_CFLAGS  += $(call cc-option,-msoft-float)
+KBUILD_CFLAGS  += $(CC_FLAGS_NO_FPU)
 KBUILD_CFLAGS  += $(CFLAGS-y)
 CPP= $(CC) -E $(KBUILD_CFLAGS)
 
diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h
new file mode 100644
index ..ca584e4bc40f
--- /dev/null
+++ b/arch/powerpc/include/asm/fpu.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_POWERPC_FPU_H
+#define _ASM_POWERPC_FPU_H
+
+#include <linux/preempt.h>
+
+#include <asm/cpu_has_feature.h>
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+
+static inline void kernel_fpu_begin(void)
+{
+   preempt_disable();
+   enable_kernel_fp();
+}
+
+static inline void kernel_fpu_end(void)
+{
+   disable_kernel_fp();
+   preempt_enable();
+}
+
+#endif /* ! _ASM_POWERPC_FPU_H */
-- 
2.43.1



[PATCH v3 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Acked-by: WANG Xuerui 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

Changes in v3:
 - Rebase on v6.9-rc1

 arch/loongarch/Kconfig   | 1 +
 arch/loongarch/Makefile  | 5 -
 arch/loongarch/include/asm/fpu.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index a5f300ec6f28..2266c6c41c38 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -18,6 +18,7 @@ config LOONGARCH
select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KCOV
+   select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index df6caf79537a..efb5440a43ec 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -26,6 +26,9 @@ endif
 32bit-emul = elf32loongarch
 64bit-emul = elf64loongarch
 
+CC_FLAGS_FPU   := -mfpu=64
+CC_FLAGS_NO_FPU:= -msoft-float
+
 ifdef CONFIG_UNWINDER_ORC
 orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h
 orc_hash_sh := $(srctree)/scripts/orc_hash.sh
@@ -59,7 +62,7 @@ ld-emul   = $(64bit-emul)
 cflags-y   += -mabi=lp64s
 endif
 
-cflags-y   += -pipe -msoft-float
+cflags-y   += -pipe $(CC_FLAGS_NO_FPU)
 LDFLAGS_vmlinux+= -static -n -nostdlib
 
 # When the assembler supports explicit relocation hint, we must use it.
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index c2d8962fda00..3177674228f8 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -21,6 +21,7 @@
 
 struct sigcontext;
 
+#define kernel_fpu_available() cpu_has_fpu
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 
-- 
2.43.1



[PATCH v3 06/14] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

2024-03-27 Thread Samuel Holland
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v1)

 lib/raid6/Makefile | 31 ---
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 385a94aa0b99..c71984e04c4d 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float
 endif
 endif
 
-# The GCC option -ffreestanding is required in order to compile code containing
-# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
-ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-NEON_FLAGS := -ffreestanding
-# Enable <arm_neon.h>
-NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include)
-ifeq ($(ARCH),arm)
-NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-endif
-CFLAGS_recov_neon_inner.o += $(NEON_FLAGS)
-ifeq ($(ARCH),arm64)
-CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
-endif
-endif
-
 quiet_cmd_unroll = UNROLL  $@
   cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@
 
@@ -75,10 +56,14 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c
 $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
$(call if_changed,unroll)
 
-CFLAGS_neon1.o += $(NEON_FLAGS)
-CFLAGS_neon2.o += $(NEON_FLAGS)
-CFLAGS_neon4.o += $(NEON_FLAGS)
-CFLAGS_neon8.o += $(NEON_FLAGS)
+CFLAGS_neon1.o += $(CC_FLAGS_FPU)
+CFLAGS_neon2.o += $(CC_FLAGS_FPU)
+CFLAGS_neon4.o += $(CC_FLAGS_FPU)
+CFLAGS_neon8.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
 targets += neon1.c neon2.c neon4.c neon8.c
 $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE
$(call if_changed,unroll)
-- 
2.43.1



[PATCH v3 05/14] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

2024-03-27 Thread Samuel Holland
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 arch/arm64/lib/Makefile | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..13e6a2829116 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -7,10 +7,8 @@ lib-y  := clear_user.o delay.o copy_from_user.o \
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
 obj-$(CONFIG_XOR_BLOCKS)   += xor-neon.o
-CFLAGS_REMOVE_xor-neon.o   += -mgeneral-regs-only
-CFLAGS_xor-neon.o  += -ffreestanding
-# Enable <arm_neon.h>
-CFLAGS_xor-neon.o  += -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_xor-neon.o  += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_xor-neon.o   += $(CC_FLAGS_NO_FPU)
 endif
 
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
-- 
2.43.1



[PATCH v3 04/14] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
arm64 provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - Remove file name from header comment

 arch/arm64/Kconfig   |  1 +
 arch/arm64/Makefile  |  9 -
 arch/arm64/include/asm/fpu.h | 15 +++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/fpu.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..67f0d3b5b7df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -30,6 +30,7 @@ config ARM64
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
+   select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 0e075d3c546b..3e863e5b0169 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y)
 $(warning Detected assembler with broken .inst; disassembly will be unreliable)
 endif
 
-KBUILD_CFLAGS  += -mgeneral-regs-only  \
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU   := -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU   += -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_NO_FPU:= -mgeneral-regs-only
+
+KBUILD_CFLAGS  += $(CC_FLAGS_NO_FPU) \
   $(compat_vdso) $(cc_has_k_constraint)
 KBUILD_CFLAGS  += $(call cc-disable-warning, psabi)
 KBUILD_AFLAGS  += $(compat_vdso)
diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
new file mode 100644
index ..2ae50bdce59b
--- /dev/null
+++ b/arch/arm64/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available() cpu_has_neon()
+#define kernel_fpu_begin() kernel_neon_begin()
+#define kernel_fpu_end()   kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
-- 
2.43.1



[PATCH v3 03/14] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

2024-03-27 Thread Samuel Holland
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v1)

 arch/arm/lib/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o:  $(obj)/csumpartialcopygeneric.S
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-  NEON_FLAGS   := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-  CFLAGS_xor-neon.o+= $(NEON_FLAGS)
+  CFLAGS_xor-neon.o+= $(CC_FLAGS_FPU)
   obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
 endif
 
-- 
2.43.1



[PATCH v3 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
Several architectures provide an API to enable the FPU and run
floating-point SIMD code in kernel space. However, the function names,
header locations, and semantics are inconsistent across architectures,
and FPU support may be gated behind other Kconfig options.

Provide a standard way for architectures to declare that kernel space
FPU support is available. Architectures selecting this option must
implement what is currently the most common API (kernel_fpu_begin() and
kernel_fpu_end(), plus a new function kernel_fpu_available()) and
provide the appropriate CFLAGS for compiling floating-point C code.

Suggested-by: Christoph Hellwig 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - Add documentation explaining the build-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement

 Documentation/core-api/floating-point.rst | 78 +++
 Documentation/core-api/index.rst  |  1 +
 Makefile  |  5 ++
 arch/Kconfig  |  6 ++
 include/linux/fpu.h   | 12 
 5 files changed, 102 insertions(+)
 create mode 100644 Documentation/core-api/floating-point.rst
 create mode 100644 include/linux/fpu.h

diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst
new file mode 100644
index ..a8d0d4b05052
--- /dev/null
+++ b/Documentation/core-api/floating-point.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Floating-point API
+==================
+
+Kernel code is normally prohibited from using floating-point (FP) registers or
+instructions, including the C float and double data types. This rule reduces
+system call overhead, because the kernel does not need to save and restore the
+userspace floating-point register state.
+
+However, occasionally drivers or library functions may need to include FP code.
+This is supported by isolating the functions containing FP code to a separate
+translation unit (a separate source file), and saving/restoring the FP register
+state around calls to those functions. This creates "critical sections" of
+floating-point usage.
+
+The reason for this isolation is to prevent the compiler from generating code
+touching the FP registers outside these critical sections. Compilers sometimes
+use FP registers to optimize inlined ``memcpy`` or variable assignment, as
+floating-point registers may be wider than general-purpose registers.
+
+Usability of floating-point code within the kernel is architecture-specific.
+Additionally, because a single kernel may be configured to support platforms
+both with and without a floating-point unit, FPU availability must be checked
+both at build time and at run time.
+
+Several architectures implement the generic kernel floating-point API from
+``linux/fpu.h``, as described below. Some other architectures implement their
+own unique APIs, which are documented separately.
+
+Build-time API
+--------------
+
+Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT``
+is enabled. For C code, such code must be placed in a separate file, and that
+file must have its compilation flags adjusted using the following pattern::
+
+CFLAGS_foo.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)
+
+Architectures are expected to define one or both of these variables in their
+top-level Makefile as needed. For example::
+
+CC_FLAGS_FPU := -mhard-float
+
+or::
+
+CC_FLAGS_NO_FPU := -msoft-float
+
+Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``.
+
+Runtime API
+-----------
+
+The runtime API is provided in ``linux/fpu.h``. This header cannot be included
+from files implementing FP code (those with their compilation flags adjusted as
+above). Instead, it must be included when defining the FP critical sections.
+
+.. c:function:: bool kernel_fpu_available( void )
+
+This function reports if floating-point code can be used on this CPU or
+platform. The value returned by this function is not expected to change
+at runtime, so it only needs to be called once, not before every
+critical section.
+
+.. c:function:: void kernel_fpu_begin( void )
+void kernel_fpu_end( void )
+
+These functions create a floating-point critical section. It is only
+valid to call ``kernel_fpu_begin()`` after a previous call to
+``kernel_fpu_available()`` returned ``true``. These functions are only
+guaranteed to be callable from (preemptible or non-preemptible) process
+context.
+
+Preemption may be disabled inside critical sections, so their size
+should be minimized. They are *not* required to be reentrant. If the
+caller expects to nest critical sections, it must implement its own
+reference counting.
diff --git a/Documentation/core-api/index.rst 

[PATCH v3 00/14] Unified cross-architecture kernel-mode FPU API

2024-03-27 Thread Samuel Holland
This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU
self test.

The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.

Previous versions:
v2: 
https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holl...@sifive.com/
v1: 
https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holl...@sifive.com/
v0: 
https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holl...@sifive.com/

Changes in v3:
 - Rebase on v6.9-rc1
 - Limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Add documentation explaining the build-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement
 - Remove file name from header comment
 - Clean up arch/arm64/lib/Makefile, like for arch/arm
 - Remove RISC-V architecture-specific preprocessor check
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers
 - Declare test_fpu() in a header

Michael Ellerman (1):
  drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (13):
  arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
  LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  riscv: Add support for kernel-mode FPU
  drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  selftests/fpu: Move FP code to a separate translation unit
  selftests/fpu: Allow building on other architectures

 Documentation/core-api/floating-point.rst | 78 +++
 Documentation/core-api/index.rst  |  1 +
 Makefile  |  5 ++
 arch/Kconfig  |  6 ++
 arch/arm/Kconfig  |  1 +
 arch/arm/Makefile |  7 ++
 arch/arm/include/asm/fpu.h| 15 
 arch/arm/lib/Makefile |  3 +-
 arch/arm64/Kconfig|  1 +
 arch/arm64/Makefile   |  9 ++-
 arch/arm64/include/asm/fpu.h  | 15 
 arch/arm64/lib/Makefile   |  6 +-
 arch/loongarch/Kconfig|  1 +
 arch/loongarch/Makefile   |  5 +-
 arch/loongarch/include/asm/fpu.h  |  1 +
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/Makefile |  5 +-
 arch/powerpc/include/asm/fpu.h| 28 +++
 arch/riscv/Kconfig|  1 +
 arch/riscv/Makefile   |  3 +
 arch/riscv/include/asm/fpu.h  | 16 
 arch/riscv/kernel/Makefile|  1 +
 arch/riscv/kernel/kernel_mode_fpu.c   | 28 +++
 arch/x86/Kconfig  |  1 +
 arch/x86/Makefile | 20 +
 arch/x86/include/asm/fpu.h| 13 
 drivers/gpu/drm/amd/display/Kconfig   |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 35 +
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 +
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 +
 include/linux/fpu.h   | 12 +++
 lib/Kconfig.debug |  2 +-
 lib/Makefile  | 26 +--
 lib/raid6/Makefile| 31 ++--
 lib/test_fpu.h|  8 ++
 lib/{test_fpu.c => test_fpu_glue.c}   | 37 ++---
 lib/test_fpu_impl.c   | 37 +
 37 files changed, 343 insertions(+), 190 deletions(-)
 create mode 100644 Documentation/core-api/floating-point.rst
 create mode 100644 arch/arm/include/asm/fpu.h
 create mode 100644 arch/arm64/include/asm/fpu.h
 create mode 100644 arch/powerpc/include/asm/fpu.h
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
 create mode 

[PATCH v3 02/14] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2024-03-27 Thread Samuel Holland
ARM provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - Remove file name from header comment

 arch/arm/Kconfig   |  1 +
 arch/arm/Makefile  |  7 +++
 arch/arm/include/asm/fpu.h | 15 +++
 3 files changed, 23 insertions(+)
 create mode 100644 arch/arm/include/asm/fpu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b14aed3a17ab..b1751c2cab87 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -15,6 +15,7 @@ config ARM
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_KCOV
+   select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d82908b1b1bb..71afdd98ddf2 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -130,6 +130,13 @@ endif
 # Accept old syntax despite ".syntax unified"
 AFLAGS_NOWARN  :=$(call 
as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)
 
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU   := -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU   += -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_FPU   += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
+
 ifeq ($(CONFIG_THUMB2_KERNEL),y)
 CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN)
 AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb
diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h
new file mode 100644
index ..2ae50bdce59b
--- /dev/null
+++ b/arch/arm/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available() cpu_has_neon()
+#define kernel_fpu_begin() kernel_neon_begin()
+#define kernel_fpu_end()   kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
-- 
2.43.1



Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces

2024-03-27 Thread Simon Horman
On Wed, Mar 27, 2024 at 08:10:51AM -0700, Guenter Roeck wrote:
> On 3/27/24 07:44, Simon Horman wrote:
> > On Mon, Mar 25, 2024 at 10:52:46AM -0700, Guenter Roeck wrote:
> > > Add name of functions triggering warning backtraces to the __bug_table
> > > object section to enable support for suppressing WARNING backtraces.
> > > 
> > > To limit image size impact, the pointer to the function name is only added
> > > to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
> > > CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
> > > parameter is replaced with a (dummy) NULL parameter to avoid an image size
> > > increase due to unused __func__ entries (this is necessary because 
> > > __func__
> > > is not a define but a virtual variable).
> > > 
> > > Tested-by: Linux Kernel Functional Testing 
> > > Acked-by: Dan Carpenter 
> > > Signed-off-by: Guenter Roeck 
> > > ---
> > > - Rebased to v6.9-rc1
> > > - Added Tested-by:, Acked-by:, and Reviewed-by: tags
> > > - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option
> > > 
> > >   arch/sh/include/asm/bug.h | 26 ++
> > >   1 file changed, 22 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
> > > index 05a485c4fabc..470ce6567d20 100644
> > > --- a/arch/sh/include/asm/bug.h
> > > +++ b/arch/sh/include/asm/bug.h
> > > @@ -24,21 +24,36 @@
> > >* The offending file and line are encoded in the __bug_table section.
> > >*/
> > >   #ifdef CONFIG_DEBUG_BUGVERBOSE
> > > +
> > > +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> > > +# define HAVE_BUG_FUNCTION
> > > +# define __BUG_FUNC_PTR  "\t.long %O2\n"
> > > +#else
> > > +# define __BUG_FUNC_PTR
> > > +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
> > > +
> > 
> > Hi Guenter,
> > 
> > a minor nit from my side: this change results in a Kernel doc warning.
> > 
> >   .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). 
> > Prototype was for HAVE_BUG_FUNCTION() instead
> > 
> > Perhaps either the new code should be placed above the Kernel doc,
> > or scripts/kernel-doc should be enhanced?
> > 
> 
> Thanks a lot for the feedback.
> 
> The definition block needs to be inside CONFIG_DEBUG_BUGVERBOSE,
> so it would be a bit odd to move it above the documentation
> just to make kerneldoc happy. I am not really sure what to do
> about it.

FWIW, I agree that would be odd.
But perhaps the #ifdef could also move above the Kernel doc?
Maybe not a great idea, but the best one I've had so far.

> I'll wait for comments from others before making any changes.
> 
> Thanks,
> Guenter
> 
> > >   #define _EMIT_BUG_ENTRY \
> > >   "\t.pushsection __bug_table,\"aw\"\n"   \
> > >   "2:\t.long 1b, %O1\n"   \
> > > - "\t.short %O2, %O3\n"   \
> > > - "\t.org 2b+%O4\n"   \
> > > + __BUG_FUNC_PTR  \
> > > + "\t.short %O3, %O4\n"   \
> > > + "\t.org 2b+%O5\n"   \
> > >   "\t.popsection\n"
> > >   #else
> > >   #define _EMIT_BUG_ENTRY \
> > >   "\t.pushsection __bug_table,\"aw\"\n"   \
> > >   "2:\t.long 1b\n"\
> > > - "\t.short %O3\n"\
> > > - "\t.org 2b+%O4\n"   \
> > > + "\t.short %O4\n"\
> > > + "\t.org 2b+%O5\n"   \
> > >   "\t.popsection\n"
> > >   #endif
> > > +#ifdef HAVE_BUG_FUNCTION
> > > +# define __BUG_FUNC  __func__
> > > +#else
> > > +# define __BUG_FUNC  NULL
> > > +#endif
> > > +
> > >   #define BUG()   \
> > >   do {\
> > >   __asm__ __volatile__ (  \
> > 
> > ...
> 


Re: [PATCH v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id

2024-03-27 Thread Andrew Morton
On Fri, 26 Jan 2024 14:44:51 +0800 Huang Shijie  
wrote:

> During the kernel booting, the generic cpu_to_node() is called too early in
> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
> 
> There are at least four places in the common code where
> the generic cpu_to_node() is called before it is initialized:
>  1.) early_trace_init() in kernel/trace/trace.c
>  2.) sched_init()   in kernel/sched/core.c
>  3.) init_sched_fair_class()in kernel/sched/fair.c
>  4.) workqueue_init_early() in kernel/workqueue.c
> 
> In order to fix the bug, the patch introduces early_numa_node_init()
> which is called after smp_prepare_boot_cpu() in start_kernel.
> early_numa_node_init() will initialize "numa_node" as soon as
> early_cpu_to_node() is ready, before cpu_to_node() is called
> for the first time.

What are the userspace-visible runtime effects of this bug?




[PATCH] usb: phy: MAINTAINERS: mark Freescale USB PHY as orphaned

2024-03-27 Thread Krzysztof Kozlowski
Emails to the only maintainer bounce:

  : host nxp-com.mail.protection.outlook.com[52.101.68.39]
  said: 550 5.4.1 Recipient address rejected: Access denied.

Signed-off-by: Krzysztof Kozlowski 
---
 MAINTAINERS | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 51d5a64a5a36..b66812e99caf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8760,10 +8760,9 @@ S:   Maintained
 F: drivers/usb/gadget/udc/fsl*
 
 FREESCALE USB PHY DRIVER
-M: Ran Wang 
 L: linux-...@vger.kernel.org
 L: linuxppc-dev@lists.ozlabs.org
-S: Maintained
+S: Orphan
 F: drivers/usb/phy/phy-fsl-usb*
 
 FREEVXFS FILESYSTEM
-- 
2.34.1



[PATCH 2/2] usb: typec: nvidia: drop driver owner assignment

2024-03-27 Thread Krzysztof Kozlowski
Core in typec_altmode_register_driver() already sets the .owner, so
driver does not need to.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/usb/typec/altmodes/nvidia.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/typec/altmodes/nvidia.c 
b/drivers/usb/typec/altmodes/nvidia.c
index c36769736405..fe70b36f078f 100644
--- a/drivers/usb/typec/altmodes/nvidia.c
+++ b/drivers/usb/typec/altmodes/nvidia.c
@@ -35,7 +35,6 @@ static struct typec_altmode_driver nvidia_altmode_driver = {
.remove = nvidia_altmode_remove,
.driver = {
.name = "typec_nvidia",
-   .owner = THIS_MODULE,
},
 };
 module_typec_altmode_driver(nvidia_altmode_driver);
-- 
2.34.1



[PATCH 1/2] usb: phy: fsl-usb: drop driver owner assignment

2024-03-27 Thread Krzysztof Kozlowski
Core in platform_driver_register() already sets the .owner, so driver
does not need to.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/usb/phy/phy-fsl-usb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/phy/phy-fsl-usb.c b/drivers/usb/phy/phy-fsl-usb.c
index 79617bb0a70e..1ebbf189a535 100644
--- a/drivers/usb/phy/phy-fsl-usb.c
+++ b/drivers/usb/phy/phy-fsl-usb.c
@@ -1005,7 +1005,6 @@ struct platform_driver fsl_otg_driver = {
.remove_new = fsl_otg_remove,
.driver = {
.name = driver_name,
-   .owner = THIS_MODULE,
},
 };
 
-- 
2.34.1



Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()

2024-03-27 Thread Jason Gunthorpe
On Wed, Mar 27, 2024 at 09:58:35AM +, Christophe Leroy wrote:
> > Just general remarks on the ones with huge pages:
> > 
> >   hash 64k and hugepage 16M/16G
> >   radix 64k/radix hugepage 2M/1G
> >   radix 4k/radix hugepage 2M/1G
> >   nohash 32
> >- I think this is just a normal x86 like scheme? PMD/PUD can be a
> >  leaf with the same size as a next level table.
> > 
> >  Do any of these cases need to know the higher level to parse the
> >  lower? eg is there a 2M bit in the PUD indicating that the PMD
> >  is a table of 2M leafs or does each PMD entry have a bit
> >  indicating it is a leaf?
> 
> For hash and radix there is a bit that tells it is leaf (_PAGE_PTE)
> 
> For nohash32/e500 I think the drawing is not fully right, there is a huge 
> page directory (hugepd) with a single entry. I think it should be 
> possible to change it to a leaf entry, it seems we have bit _PAGE_SW1 
> available in the PTE.

It sounds to me like PPC breaks down into only a couple fundamental
behaviors
 - x86 like leaf in many page levels. Use the pgd/pud/pmd_leaf() and
   related to implement it
 - ARM like contig PTE within a single page table level. Use the
   contig stuff to implement it
 - Contig PTE across two page table levels with a bit in the
   PMD. Needs new support like you showed
 - Page table levels with a variable page size. Ie a PUD can point to
   a directory of 8 pages or 512 pages of different size. Probably
   needs some new core support, but I think your changes to the
   *_offset go a long way already.

> > 
> >   hash 4k and hugepage 16M/16G
> >   nohash 64
> >- How does this work? I guess since 8xx explicitly calls out
> >  consecutive this is actually the pgd can point to 512 256M
> >  entries or 8 16G entries? Ie the table size at each level is
> >  variable? Or is it the same and the table size is still 512 and
> >  each 16G entry is replicated 64 times?
> 
> For those it is using the huge page directory (hugepd) which can be 
> hooked at any level and is a directory of huge pages on its own. There 
> are no consecutive entries involved here I think, although I'm not 
> completely sure.
> 
> For hash4k I'm not sure how it works, this was changed by commit 
> e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a 
> different page table format")
> 
> For the nohash/64, a PGD entry points either to a regular PUD directory 
> or to a HUGEPD directory. The size of the HUGEPD directory is encoded in 
> the 6 lower bits of the PGD entry.

If it is a software walker there might be value in just aligning to
the contig pte scheme in all levels and forgetting about the variable
size page table levels. That quarter page stuff is a PITA to manage
the memory allocation for on PPC anyhow..

Jason


Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread Arnd Bergmann
On Wed, Mar 27, 2024, at 16:39, David Hildenbrand wrote:
> On 27.03.24 16:21, Peter Xu wrote:
>> On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:
>> 
>> I'm not sure what config you tried there; as I am doing some build tests
>> recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could
>> avoid a lot of issues, I think it's due to libc missing.  But maybe not the
>> case there.
>
> CCin Arnd; I use some of his compiler chains, others from Fedora directly. For
> example for alpha and arc, the Fedora gcc is "13.2.1".

>
> But there is other stuff like (arc):
>
> ./arch/arc/include/asm/mmu-arcv2.h: In function 'mmu_setup_asid':
> ./arch/arc/include/asm/mmu-arcv2.h:82:9: error: implicit declaration of function 'write_aux_reg' [-Werror=implicit-function-declaration]
> 82 | write_aux_reg(ARC_REG_PID, asid | MMU_ENABLE);
>| ^

Seems to be missing an #include of soc/arc/aux.h, but I can't
tell when this first broke without bisecting.

> or (alpha)
>
> WARNING: modpost: "saved_config" [vmlinux] is COMMON symbol
> ERROR: modpost: "memcpy" [fs/reiserfs/reiserfs.ko] undefined!
> ERROR: modpost: "memcpy" [fs/nfs/nfs.ko] undefined!
> ERROR: modpost: "memcpy" [fs/nfs/nfsv3.ko] undefined!
> ERROR: modpost: "memcpy" [fs/nfsd/nfsd.ko] undefined!
> ERROR: modpost: "memcpy" [fs/lockd/lockd.ko] undefined!
> ERROR: modpost: "memcpy" [crypto/crypto.ko] undefined!
> ERROR: modpost: "memcpy" [crypto/crypto_algapi.ko] undefined!
> ERROR: modpost: "memcpy" [crypto/aead.ko] undefined!
> ERROR: modpost: "memcpy" [crypto/crypto_skcipher.ko] undefined!
> ERROR: modpost: "memcpy" [crypto/seqiv.ko] undefined!

Al did a series to fix various build problems on alpha, see
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=work.alpha
Not sure if he still has to send them to Matt, or if Matt
just needs to apply them.

I also have some alpha patches that I should send upstream.

 Arnd


Re: [PATCH v2 5/6] mm/mm_init.c: remove unneeded calc_memmap_size()

2024-03-27 Thread Mike Rapoport
On Mon, Mar 25, 2024 at 10:56:45PM +0800, Baoquan He wrote:
> Nobody calls calc_memmap_size() now.
> 
> Signed-off-by: Baoquan He 

Reviewed-by: Mike Rapoport (IBM) 

Looks like I replied to patch 6/6 twice by mistake and missed this one.

> ---
>  mm/mm_init.c | 20 
>  1 file changed, 20 deletions(-)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 7f71e56e83f3..e269a724f70e 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1331,26 +1331,6 @@ static void __init calculate_node_totalpages(struct 
> pglist_data *pgdat,
>   pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, 
> realtotalpages);
>  }
>  
> -static unsigned long __init calc_memmap_size(unsigned long spanned_pages,
> - unsigned long present_pages)
> -{
> - unsigned long pages = spanned_pages;
> -
> - /*
> -  * Provide a more accurate estimation if there are holes within
> -  * the zone and SPARSEMEM is in use. If there are holes within the
> -  * zone, each populated memory region may cost us one or two extra
> -  * memmap pages due to alignment because memmap pages for each
> -  * populated regions may not be naturally aligned on page boundary.
> -  * So the (present_pages >> 4) heuristic is a tradeoff for that.
> -  */
> - if (spanned_pages > present_pages + (present_pages >> 4) &&
> - IS_ENABLED(CONFIG_SPARSEMEM))
> - pages = present_pages;
> -
> - return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT;
> -}
> -
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  static void pgdat_init_split_queue(struct pglist_data *pgdat)
>  {
> -- 
> 2.41.0
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread David Hildenbrand

On 27.03.24 16:46, Ryan Roberts wrote:


Some of them look like mm-unstable issue, For example, arm64 fails with

   CC  arch/arm64/mm/extable.o
In file included from ./include/linux/hugetlb.h:828,
  from security/commoncap.c:19:
./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of
'arch_clear_hugetlb_flags'
    25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
   |  ^~~~
./include/linux/hugetlb.h:840:20: note: in expansion of macro
'arch_clear_hugetlb_flags'
   840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { }
   |    ^~~~
./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of
'arch_clear_hugetlb_flags' with type 'void(struct folio *)'
    21 | static inline void arch_clear_hugetlb_flags(struct folio *folio)
   |    ^~~~
In file included from ./include/linux/hugetlb.h:828,
  from mm/filemap.c:37:
./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of
'arch_clear_hugetlb_flags'
    25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
   |  ^~~~
./include/linux/hugetlb.h:840:20: note: in expansion of macro
'arch_clear_hugetlb_flags'
   840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { }
   |    ^~~~
./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of
'arch_clear_hugetlb_flags' with type 'void(struct folio *)'
    21 | static inline void arch_clear_hugetlb_flags(struct folio *folio)


see: https://lore.kernel.org/linux-mm/zgqvnkgdldkwh...@casper.infradead.org/



Yes, besides the other failures I see (odd targets), I was expecting 
that someone else noticed that already :) thanks!


--
Cheers,

David / dhildenb



Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread Ryan Roberts
> 
> Some of them look like mm-unstable issue, For example, arm64 fails with
> 
>   CC  arch/arm64/mm/extable.o
> In file included from ./include/linux/hugetlb.h:828,
>  from security/commoncap.c:19:
> ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of
> 'arch_clear_hugetlb_flags'
>    25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
>   |  ^~~~
> ./include/linux/hugetlb.h:840:20: note: in expansion of macro
> 'arch_clear_hugetlb_flags'
>   840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { }
>   |    ^~~~
> ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of
> 'arch_clear_hugetlb_flags' with type 'void(struct folio *)'
>    21 | static inline void arch_clear_hugetlb_flags(struct folio *folio)
>   |    ^~~~
> In file included from ./include/linux/hugetlb.h:828,
>  from mm/filemap.c:37:
> ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of
> 'arch_clear_hugetlb_flags'
>    25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
>   |  ^~~~
> ./include/linux/hugetlb.h:840:20: note: in expansion of macro
> 'arch_clear_hugetlb_flags'
>   840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { }
>   |    ^~~~
> ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of
> 'arch_clear_hugetlb_flags' with type 'void(struct folio *)'
>    21 | static inline void arch_clear_hugetlb_flags(struct folio *folio)

see: https://lore.kernel.org/linux-mm/zgqvnkgdldkwh...@casper.infradead.org/



Re: [PATCH v2 6/6] mm/mm_init.c: remove arch_reserved_kernel_pages()

2024-03-27 Thread Mike Rapoport
On Mon, Mar 25, 2024 at 10:56:46PM +0800, Baoquan He wrote:
> Since the current calculation of calc_nr_kernel_pages() has taken into
> consideration of kernel reserved memory, no need to have
> arch_reserved_kernel_pages() any more.
> 
> Signed-off-by: Baoquan He 

Reviewed-by: Mike Rapoport (IBM) 

> ---
>  arch/powerpc/include/asm/mmu.h |  4 
>  arch/powerpc/kernel/fadump.c   |  5 -
>  include/linux/mm.h |  3 ---
>  mm/mm_init.c   | 12 
>  4 files changed, 24 deletions(-)
> 


Re: [PATCH v2 4/6] mm/mm_init.c: remove meaningless calculation of zone->managed_pages in free_area_init_core()

2024-03-27 Thread Mike Rapoport
On Mon, Mar 25, 2024 at 10:56:44PM +0800, Baoquan He wrote:
> Currently, in free_area_init_core(), when initialize zone's field, a
> rough value is set to zone->managed_pages. That value is calculated by
> (zone->present_pages - memmap_pages).
> 
> In the meantime, add the value to nr_all_pages and nr_kernel_pages which
> represent all free pages of system (only low memory or including HIGHMEM
> memory separately). Both of them are gonna be used in
> alloc_large_system_hash().
> 
> However, the rough calculation and setting of zone->managed_pages is
> meaningless because
>   a) memmap pages are allocated on units of node in sparse_init() or
>  alloc_node_mem_map(pgdat); The simple (zone->present_pages -
>  memmap_pages) is too rough to make sense for zone;
>   b) the set zone->managed_pages will be zeroed out and reset with
>  actual value in mem_init() via memblock_free_all(). Before the
>  resetting, no buddy allocation request is issued.
> 
> Here, remove the meaningless and complicated calculation of
> (zone->present_pages - memmap_pages), initialize zone->managed_pages as 0
> which reflects its actual value because no page has been added to the
> buddy system yet. It will be reset in mem_init().
> 
> And also remove the assignment of nr_all_pages and nr_kernel_pages in
> free_area_init_core(). Instead, call the newly added calc_nr_kernel_pages()
> to count up all free but not reserved memory in memblock and assign to
> nr_all_pages and nr_kernel_pages. The counting excludes memmap_pages,
> and other kernel-used data, which is more accurate than the old way and
> simpler, and can also cover the ppc-required arch_reserved_kernel_pages()
> case.
> 
> And also clean up the outdated code comment above free_area_init_core().
> And free_area_init_core() is easy to understand now, no need to add
> words to explain.
> 
> Signed-off-by: Baoquan He 

Reviewed-by: Mike Rapoport (IBM) 

> ---
>  mm/mm_init.c | 46 +-
>  1 file changed, 5 insertions(+), 41 deletions(-)


Re: [PATCH v2 3/6] mm/mm_init.c: add new function calc_nr_all_pages()

2024-03-27 Thread Mike Rapoport
On Mon, Mar 25, 2024 at 10:56:43PM +0800, Baoquan He wrote:
> This is a preparation to calculate nr_kernel_pages and nr_all_pages,
> both of which will be used later in alloc_large_system_hash().
> 
> nr_all_pages counts up all free but not reserved memory in memblock
> allocator, including HIGHMEM memory. While nr_kernel_pages counts up
> all free but not reserved low memory in memblock allocator, excluding
> HIGHMEM memory.
> 
> Signed-off-by: Baoquan He 

Reviewed-by: Mike Rapoport (IBM) 

> ---
>  mm/mm_init.c | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 153fb2dc666f..c57a7fc97a16 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1264,6 +1264,30 @@ static void __init 
> reset_memoryless_node_totalpages(struct pglist_data *pgdat)
>   pr_debug("On node %d totalpages: 0\n", pgdat->node_id);
>  }
>  
> +static void __init calc_nr_kernel_pages(void)
> +{
> + unsigned long start_pfn, end_pfn;
> + phys_addr_t start_addr, end_addr;
> + u64 u;
> +#ifdef CONFIG_HIGHMEM
> + unsigned long high_zone_low = 
> arch_zone_lowest_possible_pfn[ZONE_HIGHMEM];
> +#endif
> +
> + for_each_free_mem_range(u, NUMA_NO_NODE, MEMBLOCK_NONE, &start_addr, &end_addr, NULL) {
> + start_pfn = PFN_UP(start_addr);
> + end_pfn   = PFN_DOWN(end_addr);
> +
> + if (start_pfn < end_pfn) {
> + nr_all_pages += end_pfn - start_pfn;
> +#ifdef CONFIG_HIGHMEM
> + start_pfn = clamp(start_pfn, 0, high_zone_low);
> + end_pfn = clamp(end_pfn, 0, high_zone_low);
> +#endif
> + nr_kernel_pages += end_pfn - start_pfn;
> + }
> + }
> +}
> +
>  static void __init calculate_node_totalpages(struct pglist_data *pgdat,
>   unsigned long node_start_pfn,
>   unsigned long node_end_pfn)
> -- 
> 2.41.0
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v7 6/6] docs: trusted-encrypted: add DCP as new trust source

2024-03-27 Thread Jarkko Sakkinen
On Wed Mar 27, 2024 at 10:24 AM EET, David Gstir wrote:
> Update the documentation for trusted and encrypted KEYS with DCP as new
> trust source:
>
> - Describe security properties of DCP trust source
> - Describe key usage
> - Document blob format
>
> Co-developed-by: Richard Weinberger 
> Signed-off-by: Richard Weinberger 
> Co-developed-by: David Oberhollenzer 
> Signed-off-by: David Oberhollenzer 
> Signed-off-by: David Gstir 
> ---
>  .../security/keys/trusted-encrypted.rst   | 85 +++
>  1 file changed, 85 insertions(+)
>
> diff --git a/Documentation/security/keys/trusted-encrypted.rst 
> b/Documentation/security/keys/trusted-encrypted.rst
> index e989b9802f92..81fb3540bb20 100644
> --- a/Documentation/security/keys/trusted-encrypted.rst
> +++ b/Documentation/security/keys/trusted-encrypted.rst
> @@ -42,6 +42,14 @@ safe.
>   randomly generated and fused into each SoC at manufacturing time.
>   Otherwise, a common fixed test key is used instead.
>  
> + (4) DCP (Data Co-Processor: crypto accelerator of various i.MX SoCs)
> +
> + Rooted to a one-time programmable key (OTP) that is generally burnt
> + in the on-chip fuses and is accessible to the DCP encryption engine 
> only.
> + DCP provides two keys that can be used as root of trust: the OTP key
> + and the UNIQUE key. Default is to use the UNIQUE key, but selecting
> + the OTP key can be done via a module parameter (dcp_use_otp_key).
> +
>*  Execution isolation
>  
>   (1) TPM
> @@ -57,6 +65,12 @@ safe.
>  
>   Fixed set of operations running in isolated execution environment.
>  
> + (4) DCP
> +
> + Fixed set of cryptographic operations running in isolated execution
> + environment. Only basic blob key encryption is executed there.
> + The actual key sealing/unsealing is done on main processor/kernel 
> space.
> +
>* Optional binding to platform integrity state
>  
>   (1) TPM
> @@ -79,6 +93,11 @@ safe.
>   Relies on the High Assurance Boot (HAB) mechanism of NXP SoCs
>   for platform integrity.
>  
> + (4) DCP
> +
> + Relies on Secure/Trusted boot process (called HAB by vendor) for
> + platform integrity.
> +
>*  Interfaces and APIs
>  
>   (1) TPM
> @@ -94,6 +113,11 @@ safe.
>  
>   Interface is specific to silicon vendor.
>  
> + (4) DCP
> +
> + Vendor-specific API that is implemented as part of the DCP crypto 
> driver in
> + ``drivers/crypto/mxs-dcp.c``.
> +
>*  Threat model
>  
>   The strength and appropriateness of a particular trust source for a 
> given
> @@ -129,6 +153,13 @@ selected trust source:
>   CAAM HWRNG, enable CRYPTO_DEV_FSL_CAAM_RNG_API and ensure the device
>   is probed.
>  
> +  *  DCP (Data Co-Processor: crypto accelerator of various i.MX SoCs)
> +
> + The DCP hardware device itself does not provide a dedicated RNG 
> interface,
> + so the kernel default RNG is used. SoCs with DCP like the i.MX6ULL do 
> have
> + a dedicated hardware RNG that is independent from DCP which can be 
> enabled
> + to back the kernel RNG.
> +
>  Users may override this by specifying ``trusted.rng=kernel`` on the kernel
>  command-line to override the used RNG with the kernel's random number pool.
>  
> @@ -231,6 +262,19 @@ Usage::
>  CAAM-specific format.  The key length for new keys is always in bytes.
>  Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
>  
> +Trusted Keys usage: DCP
> +---
> +
> +Usage::
> +
> +keyctl add trusted name "new keylen" ring
> +keyctl add trusted name "load hex_blob" ring
> +keyctl print keyid
> +
> +"keyctl print" returns an ASCII hex copy of the sealed key, which is in 
> format
> +specific to this DCP key-blob implementation.  The key length for new keys is
> +always in bytes. Trusted Keys can be 32 - 128 bytes (256 - 1024 bits).
> +
>  Encrypted Keys usage
>  
>  
> @@ -426,3 +470,44 @@ string length.
>  privkey is the binary representation of TPM2B_PUBLIC excluding the
>  initial TPM2B header which can be reconstructed from the ASN.1 octed
>  string length.
> +
> +DCP Blob Format
> +---
> +
> +The Data Co-Processor (DCP) provides hardware-bound AES keys using its
> +AES encryption engine only. It does not provide direct key sealing/unsealing.
> +To make DCP hardware encryption keys usable as trust source, we define
> +our own custom format that uses a hardware-bound key to secure the sealing
> +key stored in the key blob.
> +
> +Whenever a new trusted key using DCP is generated, we generate a random 
> 128-bit
> +blob encryption key (BEK) and 128-bit nonce. The BEK and nonce are used to
> +encrypt the trusted key payload using AES-128-GCM.
> +
> +The BEK itself is encrypted using the hardware-bound key using the DCP's AES
> +encryption engine with AES-128-ECB. The encrypted BEK, generated nonce,
> 

Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread David Hildenbrand

On 27.03.24 16:21, Peter Xu wrote:

On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:

Some cleanups around function names, comments and the config option of
"GUP-fast" -- GUP without "lock" safety belts on.

With this cleanup it's easy to judge which functions are GUP-fast specific.
We now consistently call it "GUP-fast", avoiding mixing it with "fast GUP",
"lockless", or simply "gup" (which I always considered confusing in the
code).

So the magic now happens in functions that contain "gup_fast", whereby
gup_fast() is the entry point into that magic. Comments consistently
reference either "GUP-fast" or "gup_fast()".

Based on mm-unstable from today. I won't CC arch maintainers, but only
arch mailing lists, to reduce noise.

Tested on x86_64, cross compiled on a bunch of archs, whereby some of them
don't properly even compile on mm-unstable anymore in my usual setup
(alpha, arc, parisc64, sh) ... maybe the cross compilers are outdated,
but there are no new ones around. Hm.


I'm not sure what config you tried there; as I am doing some build tests
recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could
avoid a lot of issues, I think it's due to libc missing.  But maybe not the
case there.


CCin Arnd; I use some of his compiler chains, others from Fedora directly. For
example for alpha and arc, the Fedora gcc is "13.2.1".


I compile quite some targets, usually with defconfig. From my compile script:

# COMPILER NAME ARCH CROSS_COMPILE CONFIG(if different from defconfig)

compile_gcc "alpha" "alpha" "alpha-linux-gnu-"
compile_gcc "arc" "arc" "arc-linux-gnu-"
compile_gcc "arm" "arm" "arm-linux-gnu-" "axm55xx_defconfig"
compile_gcc "arm-nommu" "arm" "arm-linux-gnu-" "imxrt_defconfig"
compile_gcc "arm64" "arm64" "aarch64-linux-gnu-"
compile_gcc "csky" "csky" 
"../cross/gcc-13.2.0-nolibc/csky-linux/bin/csky-linux-"
compile_gcc "loongarch" "loongarch" 
"../cross/gcc-13.2.0-nolibc/loongarch64-linux/bin/loongarch64-linux-"
compile_gcc "m68k-nommu" "m68k" "m68k-linux-gnu-" "amcore_defconfig"
compile_gcc "m68k-sun3" "m68k" "m68k-linux-gnu-" "sun3_defconfig"
compile_gcc "m68k-coldfire" "m68k" "m68k-linux-gnu-" "m5475evb_defconfig"
compile_gcc "m68k-virt" "m68k" "m68k-linux-gnu-" "virt_defconfig"
compile_gcc "microblaze" "microblaze" "microblaze-linux-gnu-"
compile_gcc "mips64" "mips" "mips64-linux-gnu-" "bigsur_defconfig"
compile_gcc "mips32-xpa" "mips" "mips64-linux-gnu-" "maltaup_xpa_defconfig"
compile_gcc "mips32-alchemy" "mips" "mips64-linux-gnu-" "gpr_defconfig"
compile_gcc "mips32" "mips" "mips64-linux-gnu-"
compile_gcc "nios2" "nios2" "nios2-linux-gnu-" "3c120_defconfig"
compile_gcc "openrisc" "openrisc" "../cross/gcc-13.2.0-nolibc/or1k-linux/bin/or1k-linux-" 
"virt_defconfig"
compile_gcc "parisc32" "parisc" "hppa-linux-gnu-" "generic-32bit_defconfig"
compile_gcc "parisc64" "parisc" "hppa64-linux-gnu-" "generic-64bit_defconfig"
compile_gcc "riscv32" "riscv" "riscv64-linux-gnu-" "32-bit.config"
compile_gcc "riscv64" "riscv" "riscv64-linux-gnu-" "64-bit.config"
compile_gcc "riscv64-nommu" "riscv" "riscv64-linux-gnu-" "nommu_virt_defconfig"
compile_gcc "s390x" "s390" "s390x-linux-gnu-"
compile_gcc "sh" "sh" "../cross/gcc-13.2.0-nolibc/sh4-linux/bin/sh4-linux-"
compile_gcc "sparc32" "sparc" "../cross/gcc-13.2.0-nolibc/sparc-linux/bin/sparc-linux-" "sparc32_defconfig"
compile_gcc "sparc64" "sparc" "../cross/gcc-13.2.0-nolibc/sparc64-linux/bin/sparc64-linux-" "sparc64_defconfig"
compile_gcc "uml64" "um" "" "x86_64_defconfig"
compile_gcc "x86" "x86" "" "i386_defconfig"
compile_gcc "x86-pae" "x86" "" "i386_defconfig"
compile_gcc "x86_64" "x86" ""
compile_gcc "xtensa" "xtensa" "../cross/gcc-13.2.0-nolibc/xtensa-linux/bin/xtensa-linux-" "virt_defconfig"
compile_gcc "powernv" "powerpc" "../cross/gcc-13.2.0-nolibc/powerpc64-linux/bin/powerpc64-linux-" "powernv_defconfig"
compile_gcc "pseries" "powerpc" "../cross/gcc-13.2.0-nolibc/powerpc64-linux/bin/powerpc64-linux-" "pseries_defconfig"
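
compile_gcc itself is not shown; presumably it is a thin wrapper around make.
A minimal sketch under that assumption (the build-dir naming is made up):

```shell
# Hypothetical sketch of the compile_gcc helper; only the argument
# convention (name, ARCH, CROSS_COMPILE, optional config) is taken from
# the list above.  Build-dir layout and flags are assumptions.
compile_gcc() {
    local name="$1" arch="$2" cross="$3" config="${4:-defconfig}"
    # Configure and build out-of-tree, one build dir per target name.
    # (echo'd here instead of executed, to show the commands produced)
    echo make ARCH="$arch" CROSS_COMPILE="$cross" O="build-$name" "$config"
    echo make ARCH="$arch" CROSS_COMPILE="$cross" O="build-$name" -j"$(nproc)"
}

compile_gcc "alpha" "alpha" "alpha-linux-gnu-"
```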



Some of them look like mm-unstable issues. For example, arm64 fails with:

  CC  arch/arm64/mm/extable.o
In file included from ./include/linux/hugetlb.h:828,
                 from security/commoncap.c:19:
./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of 'arch_clear_hugetlb_flags'
   25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/hugetlb.h:840:20: note: in expansion of macro 'arch_clear_hugetlb_flags'
  840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { }
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~
./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of 'arch_clear_hugetlb_flags' with type 'void(struct folio *)'
   21 | static inline void arch_clear_hugetlb_flags(struct folio *folio)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from ./include/linux/hugetlb.h:828,
                 from mm/filemap.c:37:

Re: [PATCH v7 5/6] docs: document DCP-backed trusted keys kernel params

2024-03-27 Thread Jarkko Sakkinen
On Wed Mar 27, 2024 at 10:24 AM EET, David Gstir wrote:
> Document the kernel parameters trusted.dcp_use_otp_key
> and trusted.dcp_skip_zk_test for DCP-backed trusted keys.
>
> Co-developed-by: Richard Weinberger 
> Signed-off-by: Richard Weinberger 
> Co-developed-by: David Oberhollenzer 
> Signed-off-by: David Oberhollenzer 
> Signed-off-by: David Gstir 
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 24c02c704049..b6944e57768a 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6698,6 +6698,7 @@
>   - "tpm"
>   - "tee"
>   - "caam"
> + - "dcp"
>   If not specified then it defaults to iterating through
>   the trust source list starting with TPM and assigns the
>   first trust source as a backend which is initialized
> @@ -6713,6 +6714,18 @@
>   If not specified, "default" is used. In this case,
>   the RNG's choice is left to each individual trust 
> source.
>  
> + trusted.dcp_use_otp_key
> + This is intended to be used in combination with
> + trusted.source=dcp and will select the DCP OTP key
> + instead of the DCP UNIQUE key blob encryption.
> +
> + trusted.dcp_skip_zk_test
> + This is intended to be used in combination with
> + trusted.source=dcp and will disable the check if all
> + the blob key is zero'ed. This is helpful for situations 
> where
> + having this key zero'ed is acceptable. E.g. in testing
> + scenarios.
> +
>   tsc=Disable clocksource stability checks for TSC.
>   Format: 
>   [x86] reliable: mark tsc clocksource as reliable, this

Nicely documented, i.e. even I can understand what is said here :-)
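
For example (an illustrative command line, not taken from the patch itself),
selecting DCP as the trust source with the OTP key would look like:

```
trusted.source=dcp trusted.dcp_use_otp_key
```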

Reviewed-by: Jarkko Sakkinen 

BR, Jarkko


Re: [PATCH v7 2/6] KEYS: trusted: improve scalability of trust source config

2024-03-27 Thread Jarkko Sakkinen
On Wed Mar 27, 2024 at 10:24 AM EET, David Gstir wrote:
> Enabling trusted keys requires at least one trust source implementation
> (currently TPM, TEE or CAAM) to be enabled. Currently, this is
> done by checking each trust source's config option individually.
> This does not scale when more trust sources like the one for DCP
> are added, because the condition will get long and hard to read.
>
> Add config HAVE_TRUSTED_KEYS which is set to true by each trust source
> once its enabled and adapt the check for having at least one active trust
> source to use this option. Whenever a new trust source is added, it now
> needs to select HAVE_TRUSTED_KEYS.
>
> Signed-off-by: David Gstir 
> ---
>  security/keys/trusted-keys/Kconfig | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/security/keys/trusted-keys/Kconfig 
> b/security/keys/trusted-keys/Kconfig
> index dbfdd8536468..553dc117f385 100644
> --- a/security/keys/trusted-keys/Kconfig
> +++ b/security/keys/trusted-keys/Kconfig
> @@ -1,3 +1,6 @@
> +config HAVE_TRUSTED_KEYS
> + bool
> +
>  config TRUSTED_KEYS_TPM
>   bool "TPM-based trusted keys"
>   depends on TCG_TPM >= TRUSTED_KEYS
> @@ -9,6 +12,7 @@ config TRUSTED_KEYS_TPM
>   select ASN1_ENCODER
>   select OID_REGISTRY
>   select ASN1
> + select HAVE_TRUSTED_KEYS
>   help
> Enable use of the Trusted Platform Module (TPM) as trusted key
> backend. Trusted keys are random number symmetric keys,
> @@ -20,6 +24,7 @@ config TRUSTED_KEYS_TEE
>   bool "TEE-based trusted keys"
>   depends on TEE >= TRUSTED_KEYS
>   default y
> + select HAVE_TRUSTED_KEYS
>   help
> Enable use of the Trusted Execution Environment (TEE) as trusted
> key backend.
> @@ -29,10 +34,11 @@ config TRUSTED_KEYS_CAAM
>   depends on CRYPTO_DEV_FSL_CAAM_JR >= TRUSTED_KEYS
>   select CRYPTO_DEV_FSL_CAAM_BLOB_GEN
>   default y
> + select HAVE_TRUSTED_KEYS
>   help
> Enable use of NXP's Cryptographic Accelerator and Assurance Module
> (CAAM) as trusted key backend.
>  
> -if !TRUSTED_KEYS_TPM && !TRUSTED_KEYS_TEE && !TRUSTED_KEYS_CAAM
> -comment "No trust source selected!"
> +if !HAVE_TRUSTED_KEYS
> + comment "No trust source selected!"
>  endif
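
For illustration, a hypothetical future trust source would then only need
something like the fragment below (the symbol and dependency are made up,
not from this series):

```
config TRUSTED_KEYS_FOO
	bool "FOO-based trusted keys"
	depends on CRYPTO_DEV_FOO >= TRUSTED_KEYS
	default y
	select HAVE_TRUSTED_KEYS
	help
	  Enable use of the hypothetical FOO engine as trusted key backend.
```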

Tested-by: Jarkko Sakkinen  # for TRUSTED_KEYS_TPM
Reviewed-by: Jarkko Sakkinen 

BR, Jarkko


[PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code

2024-03-27 Thread peterx
From: Peter Xu 

Now follow_page() is ready to handle hugetlb pages in whatever form, and
over all architectures.  Switch to the generic code path.

Time to retire hugetlb_follow_page_mask(), following the previous
retirement of follow_hugetlb_page() in 4849807114b8.

There may be a slight difference in how the loops run when processing slow
GUP over a large hugetlb range on archs that support cont_pte/cont_pmd: with
the patch applied, each loop of __get_user_pages() will resolve one pgtable
entry, rather than relying on the size of the hugetlb hstate, which may
cover multiple entries in one loop.

A quick performance test on an aarch64 VM on an M1 chip shows a 15%
degradation over a tight loop of slow gup after the path switch.  That
shouldn't be a problem, because slow gup should not be a hot path for GUP in
general: when a page is present, fast-gup will already succeed, while when
the page is indeed missing and requires a follow-up page fault, the slow-gup
degradation will probably be buried in the fault paths anyway.  It also
explains why slow gup for THP used to be very slow before 57edfcfd3419
("mm/gup: accelerate thp gup even for "pages != NULL"") landed; the latter
is not part of this performance analysis but a side benefit.  If the
performance becomes a concern, we can consider handling CONT_PTE in
follow_page().

Until that is justified to be necessary, keep everything clean and simple.

Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 include/linux/hugetlb.h |  7 
 mm/gup.c| 15 +++--
 mm/hugetlb.c| 71 -
 3 files changed, 5 insertions(+), 88 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 294c78b3549f..a546140f89cd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -328,13 +328,6 @@ static inline void hugetlb_zap_end(
 {
 }
 
-static inline struct page *hugetlb_follow_page_mask(
-struct vm_area_struct *vma, unsigned long address, unsigned int flags,
-unsigned int *page_mask)
-{
-   BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
-}
-
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
  struct mm_struct *src,
  struct vm_area_struct *dst_vma,
diff --git a/mm/gup.c b/mm/gup.c
index a02463c9420e..c803d0b0f358 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1135,18 +1135,11 @@ static struct page *follow_page_mask(struct 
vm_area_struct *vma,
 {
pgd_t *pgd;
struct mm_struct *mm = vma->vm_mm;
+   struct page *page;
 
-   ctx->page_mask = 0;
-
-   /*
-* Call hugetlb_follow_page_mask for hugetlb vmas as it will use
-* special hugetlb page table walking code.  This eliminates the
-* need to check for hugetlb entries in the general walking code.
-*/
-   if (is_vm_hugetlb_page(vma))
-   return hugetlb_follow_page_mask(vma, address, flags,
-   >page_mask);
+   vma_pgtable_walk_begin(vma);
 
+   ctx->page_mask = 0;
pgd = pgd_offset(mm, address);
 
if (unlikely(is_hugepd(__hugepd(pgd_val(*pgd)
@@ -1157,6 +1150,8 @@ static struct page *follow_page_mask(struct 
vm_area_struct *vma,
else
page = follow_p4d_mask(vma, address, pgd, flags, ctx);
 
+   vma_pgtable_walk_end(vma);
+
return page;
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 65b9c9a48fd2..cc79891a3597 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6870,77 +6870,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
-struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags,
- unsigned int *page_mask)
-{
-   struct hstate *h = hstate_vma(vma);
-   struct mm_struct *mm = vma->vm_mm;
-   unsigned long haddr = address & huge_page_mask(h);
-   struct page *page = NULL;
-   spinlock_t *ptl;
-   pte_t *pte, entry;
-   int ret;
-
-   hugetlb_vma_lock_read(vma);
-   pte = hugetlb_walk(vma, haddr, huge_page_size(h));
-   if (!pte)
-   goto out_unlock;
-
-   ptl = huge_pte_lock(h, mm, pte);
-   entry = huge_ptep_get(pte);
-   if (pte_present(entry)) {
-   page = pte_page(entry);
-
-   if (!huge_pte_write(entry)) {
-   if (flags & FOLL_WRITE) {
-   page = NULL;
-   goto out;
-   }
-
-   if (gup_must_unshare(vma, flags, page)) {
-   /* Tell the caller to do unsharing */
-   page = ERR_PTR(-EMLINK);
-   goto out;
-   }
-   }
-
-   page = nth_page(page, 

[PATCH v4 11/13] mm/gup: Handle huge pmd for follow_pmd_mask()

2024-03-27 Thread peterx
From: Peter Xu 

Replace pmd_trans_huge() with pmd_leaf() to also cover pmd_huge() as long
as it is enabled.

FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not to hugetlb.

Since follow_trans_huge_pmd() can now process hugetlb pages, rename it to
follow_huge_pmd() to match what it does.  Move it into gup.c so it does not
depend on CONFIG_THP.

While at it, move the ctx->page_mask setup into follow_huge_pmd(), and only
set it when the page is valid.  It was not a bug to set it before even if
GUP failed (page==NULL), because follow_page_mask() callers always ignore
page_mask in that case.  But doing so makes the code cleaner.

Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 107 ---
 mm/huge_memory.c |  86 +
 mm/internal.h|   5 +--
 3 files changed, 105 insertions(+), 93 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1e5d42211bb4..a81184b01276 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -580,6 +580,93 @@ static struct page *follow_huge_pud(struct vm_area_struct 
*vma,
 
return page;
 }
+
+/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
+static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
+   struct vm_area_struct *vma,
+   unsigned int flags)
+{
+   /* If the pmd is writable, we can write to the page. */
+   if (pmd_write(pmd))
+   return true;
+
+   /* Maybe FOLL_FORCE is set to override it? */
+   if (!(flags & FOLL_FORCE))
+   return false;
+
+   /* But FOLL_FORCE has no effect on shared mappings */
+   if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+   return false;
+
+   /* ... or read-only private ones */
+   if (!(vma->vm_flags & VM_MAYWRITE))
+   return false;
+
+   /* ... or already writable ones that just need to take a write fault */
+   if (vma->vm_flags & VM_WRITE)
+   return false;
+
+   /*
+* See can_change_pte_writable(): we broke COW and could map the page
+* writable if we have an exclusive anonymous page ...
+*/
+   if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+   return false;
+
+   /* ... and a write-fault isn't required for other reasons. */
+   if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
+   return false;
+   return !userfaultfd_huge_pmd_wp(vma, pmd);
+}
+
+static struct page *follow_huge_pmd(struct vm_area_struct *vma,
+   unsigned long addr, pmd_t *pmd,
+   unsigned int flags,
+   struct follow_page_context *ctx)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   pmd_t pmdval = *pmd;
+   struct page *page;
+   int ret;
+
+   assert_spin_locked(pmd_lockptr(mm, pmd));
+
+   page = pmd_page(pmdval);
+   VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+   if ((flags & FOLL_WRITE) &&
+   !can_follow_write_pmd(pmdval, page, vma, flags))
+   return NULL;
+
+   /* Avoid dumping huge zero page */
+   if ((flags & FOLL_DUMP) && is_huge_zero_pmd(pmdval))
+   return ERR_PTR(-EFAULT);
+
+   if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
+   return NULL;
+
+   if (!pmd_write(pmdval) && gup_must_unshare(vma, flags, page))
+   return ERR_PTR(-EMLINK);
+
+   VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+   !PageAnonExclusive(page), page);
+
+   ret = try_grab_page(page, flags);
+   if (ret)
+   return ERR_PTR(ret);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+   if (pmd_trans_huge(pmdval) && (flags & FOLL_TOUCH))
+   touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+   page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
+   ctx->page_mask = HPAGE_PMD_NR - 1;
+   VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
+
+   return page;
+}
+
 #else  /* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
unsigned long addr, pud_t *pudp,
@@ -587,6 +674,14 @@ static struct page *follow_huge_pud(struct vm_area_struct 
*vma,
 {
return NULL;
 }
+
+static struct page *follow_huge_pmd(struct vm_area_struct *vma,
+   unsigned long addr, pmd_t *pmd,
+   unsigned int flags,
+   struct follow_page_context *ctx)
+{
+   return NULL;
+}
 #endif /* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
 
 static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
@@ -784,31 +879,31 @@ static struct page *follow_pmd_mask(struct vm_area_struct 
*vma,
 

[PATCH v4 12/13] mm/gup: Handle hugepd for follow_page()

2024-03-27 Thread peterx
From: Peter Xu 

Hugepd is so far only used on PowerPC, on 4K page size kernels where the
hash MMU is used.  follow_page_mask() used to leverage the hugetlb APIs to
access hugepd entries.  Teach follow_page_mask() itself about hugepd.

With the previous refactors on fast-gup's gup_huge_pd(), most of the code
can be leveraged.  There's something not needed for follow page; for
example, gup_hugepte() tries to detect pgtable entry changes, which will
never happen with slow gup (which holds the pgtable lock), but that's not a
problem to check.

Since follow_page() always fetches only one page, setting the end to
"address + PAGE_SIZE" should suffice.  We will still do the pgtable walk
only once for each hugetlb page, by setting ctx->page_mask properly.

One thing worth mentioning is that some levels of the pgtable _bad()
helpers will report is_hugepd() entries as TRUE on Power8 hash MMUs.  I
think it at least applies to PUD on Power8 with 4K pgsize, meaning that
feeding a hugepd entry to pud_bad() will report a false positive.  Let's
leave that for now, because it can be arch-specific and I am a bit
disinclined to touch it.  In this patch it's not a problem as long as
hugepd is detected before any bad pgtable entries.

To allow slow gup like follow_*_page() to access the hugepd helpers, the
hugepd code is moved to the top.  Besides that, the helper
record_subpages() will now be used by both hugepd and fast-gup.  To avoid
"unused function" warnings we must provide a "#ifdef" for it, unfortunately.

Signed-off-by: Peter Xu 
---
 mm/gup.c | 269 +--
 1 file changed, 163 insertions(+), 106 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a81184b01276..a02463c9420e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -500,6 +500,149 @@ static inline void mm_set_has_pinned_flag(unsigned long 
*mm_flags)
 }
 
 #ifdef CONFIG_MMU
+
+#if defined(CONFIG_ARCH_HAS_HUGEPD) || defined(CONFIG_HAVE_FAST_GUP)
+static int record_subpages(struct page *page, unsigned long sz,
+  unsigned long addr, unsigned long end,
+  struct page **pages)
+{
+   struct page *start_page;
+   int nr;
+
+   start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
+   for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
+   pages[nr] = nth_page(start_page, nr);
+
+   return nr;
+}
+#endif /* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_FAST_GUP */
+
+#ifdef CONFIG_ARCH_HAS_HUGEPD
+static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
+ unsigned long sz)
+{
+   unsigned long __boundary = (addr + sz) & ~(sz-1);
+   return (__boundary - 1 < end - 1) ? __boundary : end;
+}
+
+static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
+  unsigned long end, unsigned int flags,
+  struct page **pages, int *nr)
+{
+   unsigned long pte_end;
+   struct page *page;
+   struct folio *folio;
+   pte_t pte;
+   int refs;
+
+   pte_end = (addr + sz) & ~(sz-1);
+   if (pte_end < end)
+   end = pte_end;
+
+   pte = huge_ptep_get(ptep);
+
+   if (!pte_access_permitted(pte, flags & FOLL_WRITE))
+   return 0;
+
+   /* hugepages are never "special" */
+   VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+
+   page = pte_page(pte);
+   refs = record_subpages(page, sz, addr, end, pages + *nr);
+
+   folio = try_grab_folio(page, refs, flags);
+   if (!folio)
+   return 0;
+
+   if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep {
+   gup_put_folio(folio, refs, flags);
+   return 0;
+   }
+
+   if (!pte_write(pte) && gup_must_unshare(NULL, flags, >page)) {
+   gup_put_folio(folio, refs, flags);
+   return 0;
+   }
+
+   *nr += refs;
+   folio_set_referenced(folio);
+   return 1;
+}
+
+/*
+ * NOTE: currently GUP for a hugepd is only possible on hugetlbfs file
+ * systems on Power, which does not have issue with folio writeback against
+ * GUP updates.  When hugepd will be extended to support non-hugetlbfs or
+ * even anonymous memory, we need to do extra check as what we do with most
+ * of the other folios. See writable_file_mapping_allowed() and
+ * folio_fast_pin_allowed() for more information.
+ */
+static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
+   unsigned int pdshift, unsigned long end, unsigned int flags,
+   struct page **pages, int *nr)
+{
+   pte_t *ptep;
+   unsigned long sz = 1UL << hugepd_shift(hugepd);
+   unsigned long next;
+
+   ptep = hugepte_offset(hugepd, addr, pdshift);
+   do {
+   next = hugepte_addr_end(addr, end, sz);
+   if (!gup_hugepte(ptep, sz, addr, end, flags, pages, nr))
+   return 0;
+   } while (ptep++, addr = next, addr != end);
+
+   return 1;
+}
+
+static struct page 

[PATCH v4 10/13] mm/gup: Handle huge pud for follow_pud_mask()

2024-03-27 Thread peterx
From: Peter Xu 

Teach follow_pud_mask() to be able to handle normal PUD pages like hugetlb.

Rename follow_devmap_pud() to follow_huge_pud() so that it can process
either huge devmap or hugetlb.  Move it out of TRANSPARENT_HUGEPAGE_PUD
and huge_memory.c (which relies on CONFIG_THP).  Switch to pud_leaf() to
detect both cases in the slow gup.

In the new follow_huge_pud(), take care of possible CoR for hugetlb where
necessary.  touch_pud() needs to be moved out of huge_memory.c to be
accessible from gup.c even if !THP.

While at it, optimize the non-present check by adding a pud_present() early
check before taking the pgtable lock, failing follow_page() early if the
PUD is not present: that is required by both devmap and hugetlb.  Use
pud_huge() to also cover the pud_devmap() case.

One more trivial thing to mention: introduce "pud_t pud" in the code paths
along the way, so the code doesn't dereference *pudp multiple times.  Not
only does the repeated dereference look less straightforward, but if it
really happened, it's also not clear whether there can be a race that sees
different *pudp values while the entry is being modified at the same time.

Set ctx->page_mask properly for a PUD entry.  As a side effect, this
patch should also be able to optimize devmap GUP on PUD to be able to jump
over the whole PUD range, but not yet verified.  Hugetlb already can do so
prior to this patch.

Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 include/linux/huge_mm.h |  8 -
 mm/gup.c| 70 +++--
 mm/huge_memory.c| 47 ++-
 mm/internal.h   |  2 ++
 4 files changed, 71 insertions(+), 56 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d3bb25c39482..3f36511bdc02 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -351,8 +351,6 @@ static inline bool folio_test_pmd_mappable(struct folio 
*folio)
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
-struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
-   pud_t *pud, int flags, struct dev_pagemap **pgmap);
 
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
@@ -507,12 +505,6 @@ static inline struct page *follow_devmap_pmd(struct 
vm_area_struct *vma,
return NULL;
 }
 
-static inline struct page *follow_devmap_pud(struct vm_area_struct *vma,
-   unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap)
-{
-   return NULL;
-}
-
 static inline bool thp_migration_supported(void)
 {
return false;
diff --git a/mm/gup.c b/mm/gup.c
index 26b8cca24077..1e5d42211bb4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -525,6 +525,70 @@ static struct page *no_page_table(struct vm_area_struct 
*vma,
return NULL;
 }
 
+#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+static struct page *follow_huge_pud(struct vm_area_struct *vma,
+   unsigned long addr, pud_t *pudp,
+   int flags, struct follow_page_context *ctx)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   struct page *page;
+   pud_t pud = *pudp;
+   unsigned long pfn = pud_pfn(pud);
+   int ret;
+
+   assert_spin_locked(pud_lockptr(mm, pudp));
+
+   if ((flags & FOLL_WRITE) && !pud_write(pud))
+   return NULL;
+
+   if (!pud_present(pud))
+   return NULL;
+
+   pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
+
+   if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
+   pud_devmap(pud)) {
+   /*
+* device mapped pages can only be returned if the caller
+* will manage the page reference count.
+*
+* At least one of FOLL_GET | FOLL_PIN must be set, so
+* assert that here:
+*/
+   if (!(flags & (FOLL_GET | FOLL_PIN)))
+   return ERR_PTR(-EEXIST);
+
+   if (flags & FOLL_TOUCH)
+   touch_pud(vma, addr, pudp, flags & FOLL_WRITE);
+
+   ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap);
+   if (!ctx->pgmap)
+   return ERR_PTR(-EFAULT);
+   }
+
+   page = pfn_to_page(pfn);
+
+   if (!pud_devmap(pud) && !pud_write(pud) &&
+   gup_must_unshare(vma, flags, page))
+   return ERR_PTR(-EMLINK);
+
+   ret = try_grab_page(page, flags);
+   if (ret)
+   page = ERR_PTR(ret);
+   else
+   ctx->page_mask = HPAGE_PUD_NR - 1;
+
+   return page;
+}
+#else  /* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
+static struct page *follow_huge_pud(struct vm_area_struct *vma,
+   unsigned long addr, pud_t *pudp,
+   int flags, struct follow_page_context *ctx)
+{
+   return NULL;
+}

[PATCH v4 09/13] mm/gup: Cache *pudp in follow_pud_mask()

2024-03-27 Thread peterx
From: Peter Xu 

Introduce "pud_t pud" in the function, so the code won't dereference *pudp
multiple times.  Not only does the repeated dereference look less
straightforward, but if it really happened, it's also not clear whether
there can be a race that sees different *pudp values while the entry is
being modified at the same time.

Acked-by: James Houghton 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index ef46a7053e16..26b8cca24077 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -753,26 +753,27 @@ static struct page *follow_pud_mask(struct vm_area_struct 
*vma,
unsigned int flags,
struct follow_page_context *ctx)
 {
-   pud_t *pud;
+   pud_t *pudp, pud;
spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
 
-   pud = pud_offset(p4dp, address);
-   if (pud_none(*pud))
+   pudp = pud_offset(p4dp, address);
+   pud = READ_ONCE(*pudp);
+   if (pud_none(pud))
return no_page_table(vma, flags, address);
-   if (pud_devmap(*pud)) {
-   ptl = pud_lock(mm, pud);
-   page = follow_devmap_pud(vma, address, pud, flags, >pgmap);
+   if (pud_devmap(pud)) {
+   ptl = pud_lock(mm, pudp);
+   page = follow_devmap_pud(vma, address, pudp, flags, 
>pgmap);
spin_unlock(ptl);
if (page)
return page;
return no_page_table(vma, flags, address);
}
-   if (unlikely(pud_bad(*pud)))
+   if (unlikely(pud_bad(pud)))
return no_page_table(vma, flags, address);
 
-   return follow_pmd_mask(vma, address, pud, flags, ctx);
+   return follow_pmd_mask(vma, address, pudp, flags, ctx);
 }
 
 static struct page *follow_p4d_mask(struct vm_area_struct *vma,
-- 
2.44.0



[PATCH v4 08/13] mm/gup: Handle hugetlb for no_page_table()

2024-03-27 Thread peterx
From: Peter Xu 

no_page_table() is not yet used for hugetlb code paths.  Prepare it for that.

The major difference here is that hugetlb will return -EFAULT as long as the
page cache does not exist, even if VM_SHARED.  See hugetlb_follow_page_mask().

Pass "address" into no_page_table() too, as hugetlb will need it.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index c2881772216b..ef46a7053e16 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -501,19 +501,27 @@ static inline void mm_set_has_pinned_flag(unsigned long 
*mm_flags)
 
 #ifdef CONFIG_MMU
 static struct page *no_page_table(struct vm_area_struct *vma,
-   unsigned int flags)
+ unsigned int flags, unsigned long address)
 {
+   if (!(flags & FOLL_DUMP))
+   return NULL;
+
/*
-* When core dumping an enormous anonymous area that nobody
-* has touched so far, we don't want to allocate unnecessary pages or
+* When core dumping, we don't want to allocate unnecessary pages or
 * page tables.  Return error instead of NULL to skip handle_mm_fault,
 * then get_dump_page() will return NULL to leave a hole in the dump.
 * But we can only make this optimization where a hole would surely
 * be zero-filled if handle_mm_fault() actually did handle it.
 */
-   if ((flags & FOLL_DUMP) &&
-   (vma_is_anonymous(vma) || !vma->vm_ops->fault))
+   if (is_vm_hugetlb_page(vma)) {
+   struct hstate *h = hstate_vma(vma);
+
+   if (!hugetlbfs_pagecache_present(h, vma, address))
+   return ERR_PTR(-EFAULT);
+   } else if ((vma_is_anonymous(vma) || !vma->vm_ops->fault)) {
return ERR_PTR(-EFAULT);
+   }
+
return NULL;
 }
 
@@ -593,7 +601,7 @@ static struct page *follow_page_pte(struct vm_area_struct 
*vma,
 
ptep = pte_offset_map_lock(mm, pmd, address, );
if (!ptep)
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
pte = ptep_get(ptep);
if (!pte_present(pte))
goto no_page;
@@ -685,7 +693,7 @@ static struct page *follow_page_pte(struct vm_area_struct 
*vma,
pte_unmap_unlock(ptep, ptl);
if (!pte_none(pte))
return NULL;
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
 }
 
 static struct page *follow_pmd_mask(struct vm_area_struct *vma,
@@ -701,27 +709,27 @@ static struct page *follow_pmd_mask(struct vm_area_struct 
*vma,
pmd = pmd_offset(pudp, address);
pmdval = pmdp_get_lockless(pmd);
if (pmd_none(pmdval))
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
if (!pmd_present(pmdval))
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
if (pmd_devmap(pmdval)) {
ptl = pmd_lock(mm, pmd);
page = follow_devmap_pmd(vma, address, pmd, flags, >pgmap);
spin_unlock(ptl);
if (page)
return page;
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
}
if (likely(!pmd_trans_huge(pmdval)))
return follow_page_pte(vma, address, pmd, flags, >pgmap);
 
if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
 
ptl = pmd_lock(mm, pmd);
if (unlikely(!pmd_present(*pmd))) {
spin_unlock(ptl);
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
}
if (unlikely(!pmd_trans_huge(*pmd))) {
spin_unlock(ptl);
@@ -752,17 +760,17 @@ static struct page *follow_pud_mask(struct vm_area_struct 
*vma,
 
pud = pud_offset(p4dp, address);
if (pud_none(*pud))
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
if (pud_devmap(*pud)) {
ptl = pud_lock(mm, pud);
page = follow_devmap_pud(vma, address, pud, flags, >pgmap);
spin_unlock(ptl);
if (page)
return page;
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
}
if (unlikely(pud_bad(*pud)))
-   return no_page_table(vma, flags);
+   return no_page_table(vma, flags, address);
 
return follow_pmd_mask(vma, address, pud, flags, ctx);
 }
@@ -777,10 +785,10 @@ static 

[PATCH v4 07/13] mm/gup: Refactor record_subpages() to find 1st small page

2024-03-27 Thread peterx
From: Peter Xu 

All the fast-gup functions take a tail page to operate on, and always need
to do page mask calculations before feeding it into record_subpages().

Merge that logic into record_subpages(), so that it does the nth_page()
calculation itself.

Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 mm/gup.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index db35b056fc9a..c2881772216b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2779,13 +2779,16 @@ static int __gup_device_huge_pud(pud_t pud, pud_t 
*pudp, unsigned long addr,
 }
 #endif
 
-static int record_subpages(struct page *page, unsigned long addr,
-  unsigned long end, struct page **pages)
+static int record_subpages(struct page *page, unsigned long sz,
+  unsigned long addr, unsigned long end,
+  struct page **pages)
 {
+   struct page *start_page;
int nr;
 
+   start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
-   pages[nr] = nth_page(page, nr);
+   pages[nr] = nth_page(start_page, nr);
 
return nr;
 }
@@ -2820,8 +2823,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, 
unsigned long addr,
/* hugepages are never "special" */
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-   page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
-   refs = record_subpages(page, addr, end, pages + *nr);
+   page = pte_page(pte);
+   refs = record_subpages(page, sz, addr, end, pages + *nr);
 
folio = try_grab_folio(page, refs, flags);
if (!folio)
@@ -2894,8 +2897,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned 
long addr,
 pages, nr);
}
 
-   page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
-   refs = record_subpages(page, addr, end, pages + *nr);
+   page = pmd_page(orig);
+   refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
 
folio = try_grab_folio(page, refs, flags);
if (!folio)
@@ -2938,8 +2941,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned 
long addr,
 pages, nr);
}
 
-   page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
-   refs = record_subpages(page, addr, end, pages + *nr);
+   page = pud_page(orig);
+   refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
 
folio = try_grab_folio(page, refs, flags);
if (!folio)
@@ -2978,8 +2981,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned 
long addr,
 
BUILD_BUG_ON(pgd_devmap(orig));
 
-   page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-   refs = record_subpages(page, addr, end, pages + *nr);
+   page = pgd_page(orig);
+   refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
 
folio = try_grab_folio(page, refs, flags);
if (!folio)
-- 
2.44.0



[PATCH v4 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2024-03-27 Thread peterx
From: Peter Xu 

Hugepd format for GUP is only used in PowerPC with hugetlbfs.  There are
some kernel usages of hugepd (see hugepd_populate_kernel() for PPC_8XX);
however, those pages are not candidates for GUP.

Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to
file-backed mappings") added a check to fail gup-fast if there's potential
risk of violating GUP over writeback file systems.  That should never apply
to hugepd.  Considering that hugepd is an old (and even software-only)
format, there is no plan to extend hugepd to other file-backed memory
types that are prone to the same issue.

Drop that check, not only because it will never be true for hugepd per any
known plan, but also because it paves the way for reusing the function
outside fast-gup.

To make sure we still remember this issue in case hugepd is ever extended
to support non-hugetlbfs memories, add a rich comment above
gup_huge_pd(), explaining the issue with proper references.

Cc: Christoph Hellwig 
Cc: Lorenzo Stoakes 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu 
---
 mm/gup.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index e7510b6ce765..db35b056fc9a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2832,11 +2832,6 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
return 0;
}
 
-   if (!folio_fast_pin_allowed(folio, flags)) {
-   gup_put_folio(folio, refs, flags);
-   return 0;
-   }
-
	if (!pte_write(pte) && gup_must_unshare(NULL, flags, &folio->page)) {
gup_put_folio(folio, refs, flags);
return 0;
@@ -2847,6 +2842,14 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
return 1;
 }
 
+/*
+ * NOTE: currently GUP for a hugepd is only possible on hugetlbfs file
+ * systems on Power, which do not have the issue of folio writeback against
+ * GUP updates.  If hugepd is ever extended to support non-hugetlbfs or
+ * even anonymous memory, we will need to do the extra checks we do for
+ * most of the other folios.  See writable_file_mapping_allowed() and
+ * folio_fast_pin_allowed() for more information.
+ */
 static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
unsigned int pdshift, unsigned long end, unsigned int flags,
struct page **pages, int *nr)
-- 
2.44.0



[PATCH v4 05/13] mm/arch: Provide pud_pfn() fallback

2024-03-27 Thread peterx
From: Peter Xu 

The comment in the code explains the reasons.  We took a different approach
compared to pmd_pfn() by providing a fallback function.

Another option is to provide some lower-level config options (comparable to
HUGETLB_PAGE or THP) to identify which levels an arch supports for such
huge mappings.  However, that would be overkill.

Cc: Mike Rapoport (IBM) 
Cc: Matthew Wilcox 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 arch/riscv/include/asm/pgtable.h|  1 +
 arch/s390/include/asm/pgtable.h |  1 +
 arch/sparc/include/asm/pgtable_64.h |  1 +
 arch/x86/include/asm/pgtable.h  |  1 +
 include/linux/pgtable.h | 10 ++
 5 files changed, 14 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 20242402fc11..0ca28cc8e3fa 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -646,6 +646,7 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
 
 #define __pud_to_phys(pud)  (__page_val_to_pfn(pud_val(pud)) << PAGE_SHIFT)
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
return ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1a71cb19c089..6cbbe473f680 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1414,6 +1414,7 @@ static inline unsigned long pud_deref(pud_t pud)
return (unsigned long)__va(pud_val(pud) & origin_mask);
 }
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
return __pa(pud_deref(pud)) >> PAGE_SHIFT;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 4d1bafaba942..26efc9bb644a 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -875,6 +875,7 @@ static inline bool pud_leaf(pud_t pud)
return pte_val(pte) & _PAGE_PMD_HUGE;
 }
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
pte_t pte = __pte(pud_val(pud));
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index cefc7a84f7a4..273f7557218c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -234,6 +234,7 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
phys_addr_t pfn = pud_val(pud);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 600e17d03659..75fe309a4e10 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1817,6 +1817,16 @@ typedef unsigned int pgtbl_mod_mask;
 #define pte_leaf_size(x) PAGE_SIZE
 #endif
 
+/*
+ * We always define pmd_pfn for all archs as it's used in lots of generic
+ * code.  Now it happens too for pud_pfn (and can happen for larger
+ * mappings too in the future; we're not there yet).  Instead of defining
+ * it for all archs (like pmd_pfn), provide a fallback.
+ */
+#ifndef pud_pfn
+#define pud_pfn(x) ({ BUILD_BUG(); 0; })
+#endif
+
 /*
  * Some architectures have MMUs that are configurable or selectable at boot
  * time. These lead to variable PTRS_PER_x. For statically allocated arrays it
-- 
2.44.0



[PATCH v4 04/13] mm: Introduce vma_pgtable_walk_{begin|end}()

2024-03-27 Thread peterx
From: Peter Xu 

Introduce per-vma begin()/end() helpers for pgtable walks.  This is a
preparation work to merge hugetlb pgtable walkers with generic mm.

The helpers need to be called before and after a pgtable walk; they will
start to be needed once the pgtable walker code supports hugetlb pages.
It's a hook point for any type of VMA, but for now only hugetlb uses it to
stabilize the pgtable pages against going away (due to possible pmd
unsharing).

Reviewed-by: Christoph Hellwig 
Reviewed-by: Muchun Song 
Signed-off-by: Peter Xu 
---
 include/linux/mm.h |  3 +++
 mm/memory.c        | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index afe27ff3fa94..d8f78017d271 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4233,4 +4233,7 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
 }
 
+void vma_pgtable_walk_begin(struct vm_area_struct *vma);
+void vma_pgtable_walk_end(struct vm_area_struct *vma);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 3d0c0cc33c57..27d173f9a521 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6438,3 +6438,15 @@ void ptlock_free(struct ptdesc *ptdesc)
kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
 }
 #endif
+
+void vma_pgtable_walk_begin(struct vm_area_struct *vma)
+{
+   if (is_vm_hugetlb_page(vma))
+   hugetlb_vma_lock_read(vma);
+}
+
+void vma_pgtable_walk_end(struct vm_area_struct *vma)
+{
+   if (is_vm_hugetlb_page(vma))
+   hugetlb_vma_unlock_read(vma);
+}
-- 
2.44.0



[PATCH v4 03/13] mm: Make HPAGE_PXD_* macros even if !THP

2024-03-27 Thread peterx
From: Peter Xu 

These macros can be helpful when we plan to merge hugetlb code into generic
code.  Move them out and define them as long as PGTABLE_HAS_HUGE_LEAVES is
selected, because there are systems that only define HUGETLB_PAGE not THP.

One note here is that HPAGE_PMD_SHIFT must be defined even if PMD_SHIFT is
not defined (e.g. the !CONFIG_MMU case); it (or other forms, like
HPAGE_PMD_NR) is already used in lots of common code without ifdef guards.
Use the old trick to let compilations work.

Here we only need to differentiate the HPAGE_PXD_SHIFT definitions.  All
the remaining macros will be defined based on it.  While at it, move
HPAGE_PMD_NR / HPAGE_PMD_ORDER over together.

Signed-off-by: Peter Xu 
---
 include/linux/huge_mm.h | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7576025db55d..d3bb25c39482 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -64,9 +64,6 @@ ssize_t single_hugepage_flag_show(struct kobject *kobj,
  enum transparent_hugepage_flag flag);
 extern struct kobj_attribute shmem_enabled_attr;
 
-#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
-#define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)

[PATCH v4 02/13] mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static

2024-03-27 Thread peterx
From: Peter Xu 

It will be used outside hugetlb.c soon.

Signed-off-by: Peter Xu 
---
 include/linux/hugetlb.h | 9 +++++++++
 mm/hugetlb.c            | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d748628efc5e..294c78b3549f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -174,6 +174,9 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
  unsigned long addr, pud_t *pud);
+bool hugetlbfs_pagecache_present(struct hstate *h,
+struct vm_area_struct *vma,
+unsigned long address);
 
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
@@ -1228,6 +1231,12 @@ static inline void hugetlb_register_node(struct node *node)
 static inline void hugetlb_unregister_node(struct node *node)
 {
 }
+
+static inline bool hugetlbfs_pagecache_present(
+struct hstate *h, struct vm_area_struct *vma, unsigned long address)
+{
+   return false;
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f9640a81226e..65b9c9a48fd2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6110,8 +6110,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 /*
  * Return whether there is a pagecache page to back given address within VMA.
  */
-static bool hugetlbfs_pagecache_present(struct hstate *h,
-   struct vm_area_struct *vma, unsigned long address)
+bool hugetlbfs_pagecache_present(struct hstate *h,
+				 struct vm_area_struct *vma, unsigned long address)
 {
struct address_space *mapping = vma->vm_file->f_mapping;
pgoff_t idx = linear_page_index(vma, address);
-- 
2.44.0



[PATCH v4 01/13] mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES

2024-03-27 Thread peterx
From: Peter Xu 

Introduce a config option that will be selected as long as huge leaves are
involved in pgtable (thp or hugetlbfs).  It would be useful to mark any
code with this new config that can process either hugetlb or thp pages in
any level that is higher than pte level.

Reviewed-by: Jason Gunthorpe 
Signed-off-by: Peter Xu 
---
 mm/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index b924f4a5a3ef..497cdf4d8ebf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -850,6 +850,12 @@ config READ_ONLY_THP_FOR_FS
 
 endif # TRANSPARENT_HUGEPAGE
 
+#
+# The architecture supports pgtable leaves that is larger than PAGE_SIZE
+#
+config PGTABLE_HAS_HUGE_LEAVES
+   def_bool TRANSPARENT_HUGEPAGE || HUGETLB_PAGE
+
 #
 # UP and nommu archs use km based percpu allocator
 #
-- 
2.44.0



[PATCH v4 00/13] mm/gup: Unify hugetlb, part 2

2024-03-27 Thread peterx
From: Peter Xu 

v4:
- Fix build issues, tested on more archs/configs ([x86_64, i386, arm, arm64,
  powerpc, riscv, s390] x [allno, alldef, allmod]).
  - Squashed the fixup series into v3, touched up commit messages [1]
  - Added the patch to fix pud_pfn() into the series [2]
  - Fixed one more build issue on arm+alldefconfig, where pgd_t is a
two-item array.
- Manage R-bs: add some, remove some (due to the squashes above)
- Rebase to latest mm-unstable (2f6182cd23a7, March 26th)

rfc: https://lore.kernel.org/r/20231116012908.392077-1-pet...@redhat.com
v1:  https://lore.kernel.org/r/20231219075538.414708-1-pet...@redhat.com
v2:  https://lore.kernel.org/r/20240103091423.400294-1-pet...@redhat.com
v3:  https://lore.kernel.org/r/20240321220802.679544-1-pet...@redhat.com

The series removes the hugetlb slow gup path after a previous refactor work
[1], so that slow gup now uses the exact same path to process all kinds of
memory including hugetlb.

For the long term, we may want to remove most, if not all, call sites of
huge_pte_offset().  It'll be ideal if that API can be completely dropped
from arch hugetlb API.  This series is one small step towards merging
hugetlb specific codes into generic mm paths.  From that POV, this series
removes one reference to huge_pte_offset() out of many others.

One goal of such a route is that we can reconsider merging hugetlb features
like High Granularity Mapping (HGM).  It was not accepted in the past
because it would add lots of hugetlb-specific code and make the mm code even
harder to maintain.  With a merged codebase, features like HGM can hopefully
share some code with THP, whether legacy (PMD+) or modern (contiguous PTEs).

To make it work, the generic slow gup code will need to at least understand
hugepd, which fast-gup already does.  Because hugepd is a software-only
solution (no hardware recognizes the hugepd format, so the structures are
purely artificial), there is a chance we can merge some or all hugepd
formats with cont_pte in the future.  That question is still unsettled,
pending an acknowledgement from the Power side.  For now, this series
keeps the hugepd handling because we may still need it until we have a
clearer picture of hugepd's future.  The other reason is simply that we
already did this for fast-gup, and most of that code is still around to be
reused.  It makes more sense for slow and fast gup to behave the same
until a decision is made to remove hugepd.

There's one major difference for slow-gup on cont_pte / cont_pmd handling,
currently supported on three architectures (aarch64, riscv, ppc).  Before
the series, slow gup was able to recognize e.g. cont_pte entries with the
help of huge_pte_offset() when an hstate was around.  Now that is gone, but
things still work by looking up pgtable entries one by one.

It's not ideal, but hopefully this change should not yet affect major
workloads.  There's some more information in the commit message of the last
patch.  If this turns out to be a concern, we can consider teaching slow
gup to recognize cont pte/pmd entries, which should recover the lost
performance.  But I doubt that is necessary for now, so I kept it as simple
as it can be.

Test Done
=========

For x86_64, tested full gup_test matrix over 2MB huge pages. For aarch64,
tested the same over 64KB cont_pte huge pages.

One note is that this v3 didn't go through any ppc test anymore, as finding
such a system can take time.  This is based on the fact that it was tested
in previous versions, and this version should have zero changes in the
hugepd sections.

If anyone (Christophe?) wants to give it a shot on PowerPC, please do and I
would appreciate it: "./run_vmtests.sh -a -t gup_test" should do well
enough (please consider [2] applied if hugepd is <1MB), as long as we're
sure the hugepd pages are touched as expected.

Patch layout
============

Patch 1-8:    Preparation works, or cleanups in relevant code paths
Patch 9-11:   Teach slow gup with all kinds of huge entries (pXd, hugepd)
Patch 12:     Drop hugetlb_follow_page_mask()

More information can be found in the commit messages of each patch.  Any
comment will be welcomed.  Thanks.

[1] https://lore.kernel.org/all/20230628215310.73782-1-pet...@redhat.com
[2] https://lore.kernel.org/r/20240321215047.678172-1-pet...@redhat.com

Peter Xu (13):
  mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES
  mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static
  mm: Make HPAGE_PXD_* macros even if !THP
  mm: Introduce vma_pgtable_walk_{begin|end}()
  mm/arch: Provide pud_pfn() fallback
  mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
  mm/gup: Refactor record_subpages() to find 1st small page
  mm/gup: Handle hugetlb for no_page_table()
  mm/gup: Cache *pudp in follow_pud_mask()
  mm/gup: Handle huge pud for follow_pud_mask()
  mm/gup: Handle huge pmd for follow_pmd_mask()
  mm/gup: Handle hugepd for follow_page()
  mm/gup: Handle hugetlb in the generic follow_page_mask code

 

Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast

2024-03-27 Thread Peter Xu
On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:
> Some cleanups around function names, comments and the config option of
> "GUP-fast" -- GUP without "lock" safety belts on.
> 
> With this cleanup it's easy to judge which functions are GUP-fast specific.
> We now consistently call it "GUP-fast", avoiding mixing it with "fast GUP",
> "lockless", or simply "gup" (which I always considered confusing in the
> code).
> 
> So the magic now happens in functions that contain "gup_fast", whereby
> gup_fast() is the entry point into that magic. Comments consistently
> reference either "GUP-fast" or "gup_fast()".
> 
> Based on mm-unstable from today. I won't CC arch maintainers, but only
> arch mailing lists, to reduce noise.
> 
> Tested on x86_64, cross compiled on a bunch of archs, whereby some of them
> don't properly even compile on mm-unstable anymore in my usual setup
> (alpha, arc, parisc64, sh) ... maybe the cross compilers are outdated,
> but there are no new ones around. Hm.

I'm not sure what config you tried there; as I am doing some build tests
recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could
avoid a lot of issues, I think it's due to libc missing.  But maybe not the
case there.

The series makes sense to me, the naming is confusing.  Btw, thanks for
posting this as RFC. This definitely has a conflict with the other gup
series that I had; I'll post v4 of that shortly.

-- 
Peter Xu



Re: [PATCH] Add static_key_feature_checks_initialized flag

2024-03-27 Thread Christophe Leroy


Le 27/03/2024 à 05:59, Nicholas Miehlbradt a écrit :
> JUMP_LABEL_FEATURE_CHECK_DEBUG used static_key_initialized to determine
> whether {cpu,mmu}_has_feature() was used before static keys were
> initialized. However, {cpu,mmu}_has_feature() should not be used before
> setup_feature_keys() is called. As static_key_initialized is set much
> earlier during boot there is a window in which JUMP_LABEL_FEATURE_CHECK_DEBUG
> will not report errors. Add a flag specifically to indicate when
> {cpu,mmu}_has_feature() is safe to use.

What do you mean by "much earlier" ?

As far as I can see, static_key_initialized is set by jump_label_init(), 
and cpu_feature_keys_init() and mmu_feature_keys_init() are called 
immediately after. I don't think it is possible to do anything in between.

Or maybe you mean the problem is the call to jump_label_init() in 
early_init_devtree()? You should make that explicit in the message, and 
see whether it wouldn't be better to also call cpu_feature_keys_init() and 
mmu_feature_keys_init() in early_init_devtree() in that case.

Christophe


Re: FAILED: Patch "powerpc: xor_vmx: Add '-mhard-float' to CFLAGS" failed to apply to 5.10-stable tree

2024-03-27 Thread Nathan Chancellor
On Wed, Mar 27, 2024 at 08:20:07AM -0400, Sasha Levin wrote:
> The patch below does not apply to the 5.10-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to .
...
> -- original commit in Linus's tree --
> 
> From 35f20786c481d5ced9283ff42de5c69b65e5ed13 Mon Sep 17 00:00:00 2001
> From: Nathan Chancellor 
> Date: Sat, 27 Jan 2024 11:07:43 -0700
> Subject: [PATCH] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS

I have attached a backport that will work for 5.15 and earlier. I think
you worked around this conflict in 5.15 by taking 04e85bbf71c9 but I am
not sure that is a smart idea. I think it might just be better to drop
that dependency and apply this version in 5.15.

Cheers,
Nathan
From c6cb80d94871cbb4ff151f7eb2586cadeb364ef7 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor 
Date: Sat, 27 Jan 2024 11:07:43 -0700
Subject: [PATCH 4.19 to 5.15] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS

commit 35f20786c481d5ced9283ff42de5c69b65e5ed13 upstream.

arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main
powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an
error when building with clang after a recent change in main:

  error: option '-msoft-float' cannot be specified with '-maltivec'
  make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1

Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS
to override the previous inclusion of '-msoft-float' (as the last option
wins), which matches how other areas of the kernel use '-maltivec', such
as AMDGPU.
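The "last option wins" behaviour relied on here can be sketched as a kbuild fragment (paths and variable names follow kbuild conventions but are illustrative):

```make
# Global powerpc default (from the main powerpc Makefile):
KBUILD_CFLAGS += -msoft-float

# Per-object flags; kbuild appends these after KBUILD_CFLAGS, so the
# effective compiler command line ends in
#   ... -msoft-float ... -mhard-float -maltivec ...
# and -mhard-float, coming later, overrides -msoft-float.
CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
```

This is why the fix needs no change to the global flags: prepending '-mhard-float' to the per-object CFLAGS is enough to neutralize the earlier '-msoft-float' for that one object.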

Cc: sta...@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1986
Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873be3c28556c
Signed-off-by: Nathan Chancellor 
Signed-off-by: Michael Ellerman 
Link: https://msgid.link/20240127-ppc-xor_vmx-drop-msoft-float-v1-1-f24140e81...@kernel.org
[nathan: Fixed conflicts due to lack of 04e85bbf71c9 in older trees]
Signed-off-by: Nathan Chancellor 
---
 arch/powerpc/lib/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 321cab5c3ea0..bd5012aa94e3 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -67,6 +67,6 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
 obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
 
 obj-$(CONFIG_ALTIVEC)  += xor_vmx.o xor_vmx_glue.o
-CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
 
 obj-$(CONFIG_PPC64) += $(obj64-y)
-- 
2.44.0



Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces

2024-03-27 Thread Guenter Roeck

On 3/27/24 07:44, Simon Horman wrote:

On Mon, Mar 25, 2024 at 10:52:46AM -0700, Guenter Roeck wrote:

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because __func__
is not a define but a virtual variable).

Tested-by: Linux Kernel Functional Testing 
Acked-by: Dan Carpenter 
Signed-off-by: Guenter Roeck 
---
- Rebased to v6.9-rc1
- Added Tested-by:, Acked-by:, and Reviewed-by: tags
- Introduced KUNIT_SUPPRESS_BACKTRACE configuration option

  arch/sh/include/asm/bug.h | 26 ++++++++++++++++++++++----
  1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
index 05a485c4fabc..470ce6567d20 100644
--- a/arch/sh/include/asm/bug.h
+++ b/arch/sh/include/asm/bug.h
@@ -24,21 +24,36 @@
   * The offending file and line are encoded in the __bug_table section.
   */
  #ifdef CONFIG_DEBUG_BUGVERBOSE
+
+#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR"\t.long %O2\n"
+#else
+# define __BUG_FUNC_PTR
+#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
+


Hi Guenter,

a minor nit from my side: this change results in a Kernel doc warning.

  .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). Prototype was for HAVE_BUG_FUNCTION() instead

Perhaps either the new code should be placed above the Kernel doc,
or scripts/kernel-doc should be enhanced?



Thanks a lot for the feedback.

The definition block needs to be inside CONFIG_DEBUG_BUGVERBOSE,
so it would be a bit odd to move it above the documentation
just to make kerneldoc happy. I am not really sure what to do
about it.

I'll wait for comments from others before making any changes.

Thanks,
Guenter


  #define _EMIT_BUG_ENTRY   \
"\t.pushsection __bug_table,\"aw\"\n"   \
"2:\t.long 1b, %O1\n" \
-   "\t.short %O2, %O3\n" \
-   "\t.org 2b+%O4\n" \
+   __BUG_FUNC_PTR  \
+   "\t.short %O3, %O4\n" \
+   "\t.org 2b+%O5\n" \
"\t.popsection\n"
  #else
  #define _EMIT_BUG_ENTRY   \
"\t.pushsection __bug_table,\"aw\"\n"   \
"2:\t.long 1b\n"  \
-   "\t.short %O3\n"  \
-   "\t.org 2b+%O4\n" \
+   "\t.short %O4\n"  \
+   "\t.org 2b+%O5\n" \
"\t.popsection\n"
  #endif
  
+#ifdef HAVE_BUG_FUNCTION

+# define __BUG_FUNC__func__
+#else
+# define __BUG_FUNCNULL
+#endif
+
  #define BUG() \
  do {  \
__asm__ __volatile__ (  \


...




Re: [PATCH RFC 3/3] mm: use "GUP-fast" instead "fast GUP" in remaining comments

2024-03-27 Thread Mike Rapoport
On Wed, Mar 27, 2024 at 02:05:38PM +0100, David Hildenbrand wrote:
> Let's fixup the remaining comments to consistently call that thing
> "GUP-fast". With this change, we consistently call it "GUP-fast".
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Mike Rapoport (IBM) 

> ---
>  mm/filemap.c| 2 +-
>  mm/khugepaged.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 387b394754fa..c668e11cd6ef 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1810,7 +1810,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
>   * C. Return the page to the page allocator
>   *
>   * This means that any page may have its reference count temporarily
> - * increased by a speculative page cache (or fast GUP) lookup as it can
> + * increased by a speculative page cache (or GUP-fast) lookup as it can
>   * be allocated by another user before the RCU grace period expires.
>   * Because the refcount temporarily acquired here may end up being the
>   * last refcount on the page, any page allocation must be freeable by
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 38830174608f..6972fa05132e 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1169,7 +1169,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>* huge and small TLB entries for the same virtual address to
>* avoid the risk of CPU bugs in that area.
>*
> -  * Parallel fast GUP is fine since fast GUP will back off when
> +  * Parallel GUP-fast is fine since GUP-fast will back off when
>* it detects PMD is changed.
>*/
>   _pmd = pmdp_collapse_flush(vma, address, pmd);
> -- 
> 2.43.2
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH RFC 2/3] mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST

2024-03-27 Thread Mike Rapoport
On Wed, Mar 27, 2024 at 02:05:37PM +0100, David Hildenbrand wrote:
> Nowadays, we call it "GUP-fast", the external interface includes
> functions like "get_user_pages_fast()", and we renamed all internal
> functions to reflect that as well.
> 
> Let's make the config option reflect that.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Mike Rapoport (IBM) 

> ---
>  arch/arm/Kconfig   | 2 +-
>  arch/arm64/Kconfig | 2 +-
>  arch/loongarch/Kconfig | 2 +-
>  arch/mips/Kconfig  | 2 +-
>  arch/powerpc/Kconfig   | 2 +-
>  arch/s390/Kconfig  | 2 +-
>  arch/sh/Kconfig| 2 +-
>  arch/x86/Kconfig   | 2 +-
>  include/linux/rmap.h   | 8 
>  kernel/events/core.c   | 4 ++--
>  mm/Kconfig | 2 +-
>  mm/gup.c   | 6 +++---
>  mm/internal.h  | 2 +-
>  13 files changed, 19 insertions(+), 19 deletions(-)
> 


Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces

2024-03-27 Thread Simon Horman
On Mon, Mar 25, 2024 at 10:52:46AM -0700, Guenter Roeck wrote:
> Add name of functions triggering warning backtraces to the __bug_table
> object section to enable support for suppressing WARNING backtraces.
> 
> To limit image size impact, the pointer to the function name is only added
> to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
> CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
> parameter is replaced with a (dummy) NULL parameter to avoid an image size
> increase due to unused __func__ entries (this is necessary because __func__
> is not a define but a virtual variable).
> 
> Tested-by: Linux Kernel Functional Testing 
> Acked-by: Dan Carpenter 
> Signed-off-by: Guenter Roeck 
> ---
> - Rebased to v6.9-rc1
> - Added Tested-by:, Acked-by:, and Reviewed-by: tags
> - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option
> 
>  arch/sh/include/asm/bug.h | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
> index 05a485c4fabc..470ce6567d20 100644
> --- a/arch/sh/include/asm/bug.h
> +++ b/arch/sh/include/asm/bug.h
> @@ -24,21 +24,36 @@
>   * The offending file and line are encoded in the __bug_table section.
>   */
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
> +
> +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> +# define HAVE_BUG_FUNCTION
> +# define __BUG_FUNC_PTR  "\t.long %O2\n"
> +#else
> +# define __BUG_FUNC_PTR
> +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
> +

Hi Guenter,

a minor nit from my side: this change results in a Kernel doc warning.

 .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). Prototype was for HAVE_BUG_FUNCTION() instead

Perhaps either the new code should be placed above the Kernel doc,
or scripts/kernel-doc should be enhanced?

>  #define _EMIT_BUG_ENTRY  \
>   "\t.pushsection __bug_table,\"aw\"\n"   \
>   "2:\t.long 1b, %O1\n"   \
> - "\t.short %O2, %O3\n"   \
> - "\t.org 2b+%O4\n"   \
> + __BUG_FUNC_PTR  \
> + "\t.short %O3, %O4\n"   \
> + "\t.org 2b+%O5\n"   \
>   "\t.popsection\n"
>  #else
>  #define _EMIT_BUG_ENTRY  \
>   "\t.pushsection __bug_table,\"aw\"\n"   \
>   "2:\t.long 1b\n"\
> - "\t.short %O3\n"\
> - "\t.org 2b+%O4\n"   \
> + "\t.short %O4\n"\
> + "\t.org 2b+%O5\n"   \
>   "\t.popsection\n"
>  #endif
>  
> +#ifdef HAVE_BUG_FUNCTION
> +# define __BUG_FUNC  __func__
> +#else
> +# define __BUG_FUNC  NULL
> +#endif
> +
>  #define BUG()\
>  do { \
>   __asm__ __volatile__ (  \

...


Re: [PATCH v2 2/5] arm64, powerpc, riscv, s390, x86: ptdump: Refactor CONFIG_DEBUG_WX

2024-03-27 Thread Palmer Dabbelt

On Tue, 30 Jan 2024 02:34:33 PST (-0800), christophe.le...@csgroup.eu wrote:

All architectures using the core ptdump functionality also implement
CONFIG_DEBUG_WX, and they all do it more or less the same way, with a
function called debug_checkwx() that is called by mark_rodata_ro(),
which is a substitute for ptdump_check_wx() when CONFIG_DEBUG_WX is
set and a no-op otherwise.

Refactor by centrally defining debug_checkwx() in linux/ptdump.h and
calling debug_checkwx() immediately after mark_rodata_ro() instead of
at the end of every mark_rodata_ro().

On x86_32, mark_rodata_ro() first checks __supported_pte_mask has
_PAGE_NX before calling debug_checkwx(). Now the check is inside the
callee ptdump_walk_pgd_level_checkwx().

On powerpc_64, mark_rodata_ro() bails out early before calling
ptdump_check_wx() when the MMU doesn't have the KERNEL_RO feature. The
check is now also done in ptdump_check_wx(), as it is called outside
mark_rodata_ro().

Signed-off-by: Christophe Leroy 
Reviewed-by: Alexandre Ghiti 
---
v2: For x86 change macro ptdump_check_wx() to ptdump_check_wx
---
 arch/arm64/include/asm/ptdump.h |  7 ---
 arch/arm64/mm/mmu.c |  2 --
 arch/powerpc/mm/mmu_decl.h  |  6 --
 arch/powerpc/mm/pgtable_32.c|  4 
 arch/powerpc/mm/pgtable_64.c|  3 ---
 arch/powerpc/mm/ptdump/ptdump.c |  3 +++
 arch/riscv/include/asm/ptdump.h | 22 --
 arch/riscv/mm/init.c|  3 ---
 arch/riscv/mm/ptdump.c  |  1 -
 arch/s390/include/asm/ptdump.h  | 14 --
 arch/s390/mm/dump_pagetables.c  |  1 -
 arch/s390/mm/init.c |  2 --
 arch/x86/include/asm/pgtable.h  |  3 +--
 arch/x86/mm/dump_pagetables.c   |  3 +++
 arch/x86/mm/init_32.c   |  2 --
 arch/x86/mm/init_64.c   |  2 --
 include/linux/ptdump.h  |  7 +++
 init/main.c |  2 ++
 18 files changed, 16 insertions(+), 71 deletions(-)
 delete mode 100644 arch/riscv/include/asm/ptdump.h
 delete mode 100644 arch/s390/include/asm/ptdump.h

diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 581caac525b0..5b1701c76d1c 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -29,13 +29,6 @@ void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
 static inline void ptdump_debugfs_register(struct ptdump_info *info,
   const char *name) { }
 #endif
-void ptdump_check_wx(void);
 #endif /* CONFIG_PTDUMP_CORE */

-#ifdef CONFIG_DEBUG_WX
-#define debug_checkwx()ptdump_check_wx()
-#else
-#define debug_checkwx()do { } while (0)
-#endif
-
 #endif /* __ASM_PTDUMP_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1ac7467d34c9..3a27d887f7dd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -632,8 +632,6 @@ void mark_rodata_ro(void)
	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
			    section_size, PAGE_KERNEL_RO);
-
-   debug_checkwx();
 }

 static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 72341b9fb552..90dcc2844056 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -171,12 +171,6 @@ static inline void mmu_mark_rodata_ro(void) { }
 void __init mmu_mapin_immr(void);
 #endif

-#ifdef CONFIG_DEBUG_WX
-void ptdump_check_wx(void);
-#else
-static inline void ptdump_check_wx(void) { }
-#endif
-
 static inline bool debug_pagealloc_enabled_or_kfence(void)
 {
return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled();
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 5c02fd08d61e..12498017da8e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -153,7 +153,6 @@ void mark_rodata_ro(void)

if (v_block_mapped((unsigned long)_stext + 1)) {
mmu_mark_rodata_ro();
-   ptdump_check_wx();
return;
}

@@ -166,9 +165,6 @@ void mark_rodata_ro(void)
   PFN_DOWN((unsigned long)_stext);

set_memory_ro((unsigned long)_stext, numpages);
-
-   // mark_initmem_nx() should have already run by now
-   ptdump_check_wx();
 }
 #endif

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 5ac1fd30341b..1b366526f4f2 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -150,9 +150,6 @@ void mark_rodata_ro(void)
radix__mark_rodata_ro();
else
hash__mark_rodata_ro();
-
-   // mark_initmem_nx() should have already run by now
-   ptdump_check_wx();
 }

 void mark_initmem_nx(void)
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 

Re: [PATCH v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id

2024-03-27 Thread Palmer Dabbelt

On Thu, 25 Jan 2024 22:44:51 PST (-0800), shi...@os.amperecomputing.com wrote:

During the kernel booting, the generic cpu_to_node() is called too early in
arm64, powerpc and riscv when CONFIG_NUMA is enabled.

There are at least four places in the common code where
the generic cpu_to_node() is called before it is initialized:
   1.) early_trace_init() in kernel/trace/trace.c
   2.) sched_init()   in kernel/sched/core.c
   3.) init_sched_fair_class()in kernel/sched/fair.c
   4.) workqueue_init_early() in kernel/workqueue.c

In order to fix the bug, the patch introduces early_numa_node_init()
which is called after smp_prepare_boot_cpu() in start_kernel.
early_numa_node_init() will initialize "numa_node" as soon as
early_cpu_to_node() is ready, before cpu_to_node() is called
for the first time.

Signed-off-by: Huang Shijie 
---
v2 --> v3:
Do not change the cpu_to_node to function pointer.
Introduce early_numa_node_init() which initialize
the numa_node at an early stage.

v2: 
https://lore.kernel.org/all/20240123045843.75969-1-shi...@os.amperecomputing.com/

v1 --> v2:
In order to fix the x86 compiling error, move the cpu_to_node()
from driver/base/arch_numa.c to driver/base/node.c.

v1: 
http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/896160.html

An old different title patch:

http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/895963.html

---
 init/main.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/init/main.c b/init/main.c
index e24b0780fdff..39efe5ed58a0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -870,6 +870,19 @@ static void __init print_unknown_bootoptions(void)
memblock_free(unknown_options, len);
 }

+static void __init early_numa_node_init(void)
+{
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+#ifndef cpu_to_node
+   int cpu;
+
+   /* The early_cpu_to_node() should be ready here. */
+   for_each_possible_cpu(cpu)
+   set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
+#endif
+#endif
+}
+
asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
 void start_kernel(void)
 {
@@ -900,6 +913,7 @@ void start_kernel(void)
setup_nr_cpu_ids();
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
+   early_numa_node_init();
boot_cpu_hotplug_init();

pr_notice("Kernel command line: %s\n", saved_command_line);


Acked-by: Palmer Dabbelt  # RISC-V

I don't really understand the init/main.c stuff all that well, I'm 
adding Andrew as it looks like he's been merging stuff here.


Re: [PATCH RFC 1/3] mm/gup: consistently name GUP-fast functions

2024-03-27 Thread David Hildenbrand

On 27.03.24 14:52, Jason Gunthorpe wrote:

On Wed, Mar 27, 2024 at 02:05:36PM +0100, David Hildenbrand wrote:

Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename
all relevant internal functions to start with "gup_fast", to make it
clearer that this is not ordinary GUP. The current mixture of
"lockless", "gup" and "gup_fast" is confusing.

Further, avoid the term "huge" when talking about a "leaf" -- for
example, we nowadays check pmd_leaf() because pmd_huge() is gone. For the
"hugepd"/"hugepte" stuff, it's part of the name ("is_hugepd"), so that
stays.

What remains is the "external" interface:
* get_user_pages_fast_only()
* get_user_pages_fast()
* pin_user_pages_fast()

And the "internal" interface that handles GUP-fast + fallback:
* internal_get_user_pages_fast()


This would like a better name too. How about gup_fast_fallback() ?


Yes, I was not able to come up with something I liked. But I do like
your proposal, so I'll do that!

[...]



I think it is a great idea, it always takes a moment to figure out if
a function is part of the fast callchain or not..

(even better would be to shift the fast stuff into its own file, but I
expect that is too much)


Yes, one step at a time :)



Reviewed-by: Jason Gunthorpe 


Thanks Jason!

--
Cheers,

David / dhildenb



Re: [PATCH RFC 1/3] mm/gup: consistently name GUP-fast functions

2024-03-27 Thread Jason Gunthorpe
On Wed, Mar 27, 2024 at 02:05:36PM +0100, David Hildenbrand wrote:
> Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename
> all relevant internal functions to start with "gup_fast", to make it
> clearer that this is not ordinary GUP. The current mixture of
> "lockless", "gup" and "gup_fast" is confusing.
> 
> Further, avoid the term "huge" when talking about a "leaf" -- for
> example, we nowadays check pmd_leaf() because pmd_huge() is gone. For the
> "hugepd"/"hugepte" stuff, it's part of the name ("is_hugepd"), so that
> stays.
> 
> What remains is the "external" interface:
> * get_user_pages_fast_only()
> * get_user_pages_fast()
> * pin_user_pages_fast()
> 
> And the "internal" interface that handles GUP-fast + fallback:
> * internal_get_user_pages_fast()

This would like a better name too. How about gup_fast_fallback() ?

> The high-level internal function for GUP-fast is now:
> * gup_fast()
> 
> The basic GUP-fast walker functions:
> * gup_pgd_range() -> gup_fast_pgd_range()
> * gup_p4d_range() -> gup_fast_p4d_range()
> * gup_pud_range() -> gup_fast_pud_range()
> * gup_pmd_range() -> gup_fast_pmd_range()
> * gup_pte_range() -> gup_fast_pte_range()
> * gup_huge_pgd()  -> gup_fast_pgd_leaf()
> * gup_huge_pud()  -> gup_fast_pud_leaf()
> * gup_huge_pmd()  -> gup_fast_pmd_leaf()
> 
> The weird hugepd stuff:
> * gup_huge_pd() -> gup_fast_hugepd()
> * gup_hugepte() -> gup_fast_hugepte()
> 
> The weird devmap stuff:
> * __gup_device_huge_pud() -> gup_fast_devmap_pud_leaf()
> * __gup_device_huge_pmd() -> gup_fast_devmap_pmd_leaf()
> * __gup_device_huge() -> gup_fast_devmap_leaf()
>
> Helper functions:
> * unpin_user_pages_lockless() -> gup_fast_unpin_user_pages()
> * gup_fast_folio_allowed() is already properly named
> * gup_fast_permitted() is already properly named
> 
> With "gup_fast()", we now even have a function that is referred to in
> comment in mm/mmu_gather.c.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  mm/gup.c | 164 ---
>  1 file changed, 84 insertions(+), 80 deletions(-)

I think it is a great idea, it always takes a moment to figure out if
a function is part of the fast callchain or not..

(even better would be to shift the fast stuff into its own file, but I
expect that is too much)

Reviewed-by: Jason Gunthorpe 

Jason


[PATCH RFC 3/3] mm: use "GUP-fast" instead "fast GUP" in remaining comments

2024-03-27 Thread David Hildenbrand
Let's fix up the remaining comments to consistently call that thing
"GUP-fast". With this change, we consistently call it "GUP-fast".

Signed-off-by: David Hildenbrand 
---
 mm/filemap.c| 2 +-
 mm/khugepaged.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 387b394754fa..c668e11cd6ef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1810,7 +1810,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
  * C. Return the page to the page allocator
  *
  * This means that any page may have its reference count temporarily
- * increased by a speculative page cache (or fast GUP) lookup as it can
+ * increased by a speculative page cache (or GUP-fast) lookup as it can
  * be allocated by another user before the RCU grace period expires.
  * Because the refcount temporarily acquired here may end up being the
  * last refcount on the page, any page allocation must be freeable by
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 38830174608f..6972fa05132e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1169,7 +1169,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 * huge and small TLB entries for the same virtual address to
 * avoid the risk of CPU bugs in that area.
 *
-* Parallel fast GUP is fine since fast GUP will back off when
+* Parallel GUP-fast is fine since GUP-fast will back off when
 * it detects PMD is changed.
 */
_pmd = pmdp_collapse_flush(vma, address, pmd);
-- 
2.43.2
