Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast
+CC Alexey

On 3/27/24 09:22, Arnd Bergmann wrote:
> On Wed, Mar 27, 2024, at 16:39, David Hildenbrand wrote:
>> On 27.03.24 16:21, Peter Xu wrote:
>>> On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:
>>>
>>> I'm not sure what config you tried there; as I am doing some build tests
>>> recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could
>>> avoid a lot of issues, I think it's due to libc missing. But maybe not the
>>> case there.
>> CCing Arnd; I use some of his compiler chains, others from Fedora directly.
>> For example for alpha and arc, the Fedora gcc is "13.2.1".
>> But there is other stuff like (arc):
>>
>> ./arch/arc/include/asm/mmu-arcv2.h: In function 'mmu_setup_asid':
>> ./arch/arc/include/asm/mmu-arcv2.h:82:9: error: implicit declaration of
>> function 'write_aux_reg' [-Werror=implicit-function-declaration]
>>    82 |         write_aux_reg(ARC_REG_PID, asid | MMU_ENABLE);
>>       |         ^
> Seems to be missing an #include of soc/arc/aux.h, but I can't
> tell when this first broke without bisecting.

Weird, I don't see this one, but I only have gcc 12 handy ATM.

gcc version 12.2.1 20230306 (ARC HS GNU/Linux glibc toolchain - build 1360)

I even tried W=1 (which, according to scripts/Makefile.extrawarn, should
include -Werror=implicit-function-declaration) but I still don't see this.

Tomorrow I'll try building a gcc 13.2.1 for ARC.

>
>> or (alpha)
>>
>> WARNING: modpost: "saved_config" [vmlinux] is COMMON symbol
>> ERROR: modpost: "memcpy" [fs/reiserfs/reiserfs.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/nfs/nfs.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/nfs/nfsv3.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/nfsd/nfsd.ko] undefined!
>> ERROR: modpost: "memcpy" [fs/lockd/lockd.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/crypto.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/crypto_algapi.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/aead.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/crypto_skcipher.ko] undefined!
>> ERROR: modpost: "memcpy" [crypto/seqiv.ko] undefined!

Are these from the ARC build or otherwise?

Thx,
-Vineet
Re: [PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes
On Thu, 2024-03-28 at 05:40 +, Christophe Leroy wrote:
>
>
> On 28/03/2024 at 05:55, Rohan McLure wrote:
> > Page table checking depends on architectures providing an
> > implementation of p{te,md,ud}_user_accessible_page. With
> > refactorisations made on powerpc/mm, the pte_access_permitted() and
> > similar methods verify whether a userland page is accessible with the
> > required permissions.
> >
> > Since page table checking is the only user of
> > p{te,md,ud}_user_accessible_page(), implement these for all platforms,
> > using some of the same preliminary checks taken by pte_access_permitted()
> > on that platform.
> >
> > Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
> > pte_user() is no longer required to be present on all platforms as it
> > may be equivalent to or implied by pte_read(). Hence implementations of
> > pte_user_accessible_page() are specialised.
> >
> > Signed-off-by: Rohan McLure
> > ---
> > v9: New implementation
> > v10: Let book3s/64 use pte_user(), but otherwise default other platforms
> > to using the address provided with the call to infer whether it is a
> > user page or not. pmd/pud variants will warn on all other platforms, as
> > they should not be used for user page mappings
> > v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
> > platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
>
> See my comment to v10 patch 10.
>
> p{m,u}d_leaf() is defined for all platforms (there is a fallback
> definition in include/linux/pgtable.h), so p{m,u}d_user_accessible_page()
> can be defined for all platforms; no need for a conditional define.

The issue I see is that the definition in include/linux/pgtable.h occurs
after this header is included. Prior to the removal of a local definition
of p{m,u}d_leaf() etc. we didn't run into this issue, but we do now.

I'm not insistent on doing it this way with #ifndef, so I'm amenable to
suggestions if you have a preference.

>
> > ---
> >  arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +++++
> >  arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +++++++++++++++++
> >  arch/powerpc/include/asm/nohash/pgtable.h    |  5 +++++
> >  arch/powerpc/include/asm/pgtable.h           |  8 ++++++++
> >  4 files changed, 35 insertions(+)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> > index 52971ee30717..83f7b98ef49f 100644
> > --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> > @@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
> >  	return true;
> >  }
> >
> > +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> > +{
> > +	return pte_present(pte) && !is_kernel_addr(addr);
> > +}
> > +
> >  /* Conversion functions: convert a page and protection to a page entry,
> >   * and a page entry and page directory to the page they refer to.
> >   *
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index fac5615e6bc5..d8640ddbcad1 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
> >  	return arch_pte_access_permitted(pte_val(pte), write, 0);
> >  }
> >
> > +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> > +{
> > +	return pte_present(pte) && pte_user(pte);
> > +}
> > +
> >  /*
> >   * Conversion functions: convert a page and protection to a page entry,
> >   * and a page entry and page directory to the page they refer to.
> > @@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
> >  	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> >  }
> >
> > +#define pmd_user_accessible_page pmd_user_accessible_page
> > +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
> > +{
> > +	return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
> > +}
> > +
> > +#define pud_user_accessible_page pud_user_accessible_page
> > +static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
> > +{
> > +	return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
> > +}
> > +
> >  #endif /* __ASSEMBLY__ */
> >  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
> > diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> > index 427db14292c9..413d01a51e6f 100644
> > --- a/arch/powerpc/include/asm/nohash/pgtable.h
> > +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> > @@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
> >  	return true;
> >  }
> >
> > +static inline bool pte_user_accessible_page(pte_t pte,
Re: [PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes
On 28/03/2024 at 05:55, Rohan McLure wrote:
> Page table checking depends on architectures providing an
> implementation of p{te,md,ud}_user_accessible_page. With
> refactorisations made on powerpc/mm, the pte_access_permitted() and
> similar methods verify whether a userland page is accessible with the
> required permissions.
>
> Since page table checking is the only user of
> p{te,md,ud}_user_accessible_page(), implement these for all platforms,
> using some of the same preliminary checks taken by pte_access_permitted()
> on that platform.
>
> Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
> pte_user() is no longer required to be present on all platforms as it
> may be equivalent to or implied by pte_read(). Hence implementations of
> pte_user_accessible_page() are specialised.
>
> Signed-off-by: Rohan McLure
> ---
> v9: New implementation
> v10: Let book3s/64 use pte_user(), but otherwise default other platforms
> to using the address provided with the call to infer whether it is a
> user page or not. pmd/pud variants will warn on all other platforms, as
> they should not be used for user page mappings
> v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
> platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.

See my comment to v10 patch 10.

p{m,u}d_leaf() is defined for all platforms (there is a fallback
definition in include/linux/pgtable.h), so p{m,u}d_user_accessible_page()
can be defined for all platforms; no need for a conditional define.

> ---
>  arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +++++
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +++++++++++++++++
>  arch/powerpc/include/asm/nohash/pgtable.h    |  5 +++++
>  arch/powerpc/include/asm/pgtable.h           |  8 ++++++++
>  4 files changed, 35 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 52971ee30717..83f7b98ef49f 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>  	return true;
>  }
>
> +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> +{
> +	return pte_present(pte) && !is_kernel_addr(addr);
> +}
> +
>  /* Conversion functions: convert a page and protection to a page entry,
>   * and a page entry and page directory to the page they refer to.
>   *
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index fac5615e6bc5..d8640ddbcad1 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>  	return arch_pte_access_permitted(pte_val(pte), write, 0);
>  }
>
> +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> +{
> +	return pte_present(pte) && pte_user(pte);
> +}
> +
>  /*
>   * Conversion functions: convert a page and protection to a page entry,
>   * and a page entry and page directory to the page they refer to.
> @@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
>  	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
>  }
>
> +#define pmd_user_accessible_page pmd_user_accessible_page
> +static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
> +{
> +	return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
> +}
> +
> +#define pud_user_accessible_page pud_user_accessible_page
> +static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
> +{
> +	return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index 427db14292c9..413d01a51e6f 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>  	return true;
>  }
>
> +static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
> +{
> +	return pte_present(pte) && !is_kernel_addr(addr);
> +}
> +
>  /* Conversion functions: convert a page and protection to a page entry,
>   * and a page entry and page directory to the page they refer to.
>   *
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index ee8c82c0528f..f1ceae778cb1 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -219,6 +219,14 @@ static inline int pud_pfn(pud_t pud)
>  }
>  #endif
>
> +#ifndef pmd_user_accessible_page
> +#define pmd_user_accessible_page(pmd, addr) false
> +#endif
> +
> +#ifndef
Re: [PATCH v11 08/11] powerpc: mm: Add pud_pfn() stub
On 28/03/2024 at 05:55, Rohan McLure wrote:
> The page table check feature requires that pud_pfn() be defined
> on each consuming architecture. Since only 64-bit, Book3S platforms
> allow for hugepages at this upper level, and since the calling code is
> gated by a call to pud_user_accessible_page(), which will return zero,
> include this stub as a BUILD_BUG().
>
> Signed-off-by: Rohan McLure
> ---
> v11: pud_pfn() stub has been removed upstream as it has valid users now
> in transparent hugepages. Create a BUG_ON() for other, non Book3S64
> platforms.
> ---
>  arch/powerpc/include/asm/pgtable.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 239709a2f68e..ee8c82c0528f 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -211,6 +211,14 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
>
>  #endif /* CONFIG_PPC64 */
>
> +#ifndef pud_pfn
> +#define pud_pfn pud_pfn
> +static inline int pud_pfn(pud_t pud)
> +{
> +	BUILD_BUG();

This function must return something.

> +}
> +#endif
> +
>  #endif /* __ASSEMBLY__ */
>
>  #endif /* _ASM_POWERPC_PGTABLE_H */
[PATCH v11 02/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"
This reverts commit a3b837130b5865521fa8662aceaa6ebc8d29389a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure
---
 arch/arm64/include/asm/pgtable.h |  4 ++--
 arch/riscv/include/asm/pgtable.h |  4 ++--
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++++++----
 mm/page_table_check.c            |  3 ++-
 5 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7334e5526185..995cc6213d0d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -560,7 +560,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
-	page_table_check_pmd_set(mm, pmdp, pmd);
+	page_table_check_pmd_set(mm, addr, pmdp, pmd);
 	return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
 						PMD_SIZE >> PAGE_SHIFT);
 }
@@ -1239,7 +1239,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-	page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+	page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
 	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1e0c0717b3f9..7b4053ff597e 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -712,7 +712,7 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd)
 {
-	page_table_check_pmd_set(mm, pmdp, pmd);
+	page_table_check_pmd_set(mm, addr, pmdp, pmd);
 	return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
 }
 
@@ -783,7 +783,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 				unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-	page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+	page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
 	return __pmd(atomic_long_xchg((atomic_long_t *)pmdp, pmd_val(pmd)));
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 09db55fa8856..82bbe115a1a4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1238,7 +1238,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
-	page_table_check_pmd_set(mm, pmdp, pmd);
+	page_table_check_pmd_set(mm, addr, pmdp, pmd);
 	set_pmd(pmdp, pmd);
 }
 
@@ -1383,7 +1383,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
-	page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+	page_table_check_pmd_set(vma->vm_mm, address, pmdp, pmd);
 	if (IS_ENABLED(CONFIG_SMP)) {
 		return xchg(pmdp, pmd);
 	} else {
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d188428512f5..5855d690c48a 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,7 +19,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 		unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
+void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
+				pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
 				pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -75,13 +76,14 @@ static inline void page_table_check_ptes_set(struct mm_struct *mm,
 	__page_table_check_ptes_set(mm, ptep, pte, nr);
 }
 
-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
+static inline void
[PATCH v11 08/11] powerpc: mm: Add pud_pfn() stub
The page table check feature requires that pud_pfn() be defined
on each consuming architecture. Since only 64-bit, Book3S platforms
allow for hugepages at this upper level, and since the calling code is
gated by a call to pud_user_accessible_page(), which will return zero,
include this stub as a BUILD_BUG().

Signed-off-by: Rohan McLure
---
v11: pud_pfn() stub has been removed upstream as it has valid users now
in transparent hugepages. Create a BUG_ON() for other, non Book3S64
platforms.
---
 arch/powerpc/include/asm/pgtable.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 239709a2f68e..ee8c82c0528f 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -211,6 +211,14 @@ static inline bool arch_supports_memmap_on_memory(unsigned long vmemmap_size)
 
 #endif /* CONFIG_PPC64 */
 
+#ifndef pud_pfn
+#define pud_pfn pud_pfn
+static inline int pud_pfn(pud_t pud)
+{
+	BUILD_BUG();
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0
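The gating argument in this commit message can be modelled in userspace. The sketch below is an analogue under stated assumptions, not kernel code: `pud_t` is a one-word struct, `checker_pud_set()` is a hypothetical stand-in for the generic page table check caller, and a runtime `assert()` plays the role of `BUILD_BUG()` (which in the kernel fails the build unless the compiler can prove the call is dead and drops it). The stub still ends in a `return`, which is the point raised in review of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* One-word stand-in for the kernel's pud_t (assumption). */
typedef struct { unsigned long val; } pud_t;

/* Mirrors the powerpc fallback
 * "#define pud_user_accessible_page(pud, addr) false". */
static bool pud_user_accessible_page(pud_t pud, unsigned long addr)
{
	(void)pud;
	(void)addr;
	return false;
}

/* Stub for platforms without PUD-level leaf mappings. Reaching it is a
 * bug: assert() traps at run time here, whereas BUILD_BUG() traps at
 * build time. The return keeps this non-void function well defined when
 * assertions are compiled out (NDEBUG). */
static unsigned long pud_pfn(pud_t pud)
{
	(void)pud;
	assert(0 && "pud_pfn() stub reached");
	return 0;
}

/* Hypothetical model of the generic caller: pud_pfn() is evaluated only
 * behind the accessibility check. Returns the number of pages that
 * would be accounted for this entry. */
static unsigned long checker_pud_set(pud_t pud, unsigned long addr,
				     unsigned long pages_per_pud)
{
	if (!pud_user_accessible_page(pud, addr))
		return 0;
	(void)pud_pfn(pud);
	return pages_per_pud;
}
```

With the fallback always returning false, `checker_pud_set()` returns 0 and the stub is never evaluated, which is the property the commit message relies on.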
[PATCH v11 03/11] mm: Provide addr parameter to page_table_check_pte_set()
To provide support for powerpc platforms, provide an addr parameter to
the page_table_check_pte_set() routine. This parameter is needed on some
powerpc platforms which do not encode whether a mapping is for user or
kernel in the pte. On such platforms, this can be inferred from the addr
parameter.

Signed-off-by: Rohan McLure
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 include/linux/page_table_check.h | 12 +++++++-----
 include/linux/pgtable.h          |  2 +-
 mm/page_table_check.c            |  4 ++--
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 995cc6213d0d..b3938f80a1b6 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -376,7 +376,7 @@ static inline void __set_ptes(struct mm_struct *mm,
 			      unsigned long __always_unused addr,
 			      pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, ptep, pte, nr);
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 	__sync_cache_and_tags(pte, nr);
 
 	for (;;) {
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7b4053ff597e..a153d3d143d2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -532,7 +532,7 @@ static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, ptep, pteval, nr);
+	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
 
 	for (;;) {
 		__set_pte_at(ptep, pteval);
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 5855d690c48a..9243c920ed02 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -17,8 +17,8 @@ void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-		unsigned int nr);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+				 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -68,12 +68,13 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-		pte_t *ptep, pte_t pte, unsigned int nr)
+					     unsigned long addr, pte_t *ptep,
+					     pte_t pte, unsigned int nr)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;
 
-	__page_table_check_ptes_set(mm, ptep, pte, nr);
+	__page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -129,7 +130,8 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 
 static inline void page_table_check_ptes_set(struct mm_struct *mm,
-		pte_t *ptep, pte_t pte, unsigned int nr)
+					     unsigned long addr, pte_t *ptep,
+					     pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..b2b4c1160d4a 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -264,7 +264,7 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, ptep, pte, nr);
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 
 	arch_enter_lazy_mmu_mode();
 	for (;;) {
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7b9d7b45505d..3a338fee6d00 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -182,8 +182,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
-		unsigned int nr)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+				 pte_t *ptep, pte_t pte, unsigned int nr)
 {
 	unsigned int i;
 
-- 
2.44.0
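The reason this patch threads `addr` through is that, on some powerpc platforms, only the address can tell a user mapping from a kernel one. A minimal userspace sketch of that inference, with the same shape as the book3s/32 and nohash `pte_user_accessible_page()` in patch 09 of this series. The boundary `MODEL_KERNEL_START`, the one-bit PTE encoding and the local `is_kernel_addr()` are illustrative assumptions, not powerpc definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative 3G/1G split; not taken from any powerpc config. */
#define MODEL_KERNEL_START 0xC0000000UL

/* One-word PTE where bit 0 means "present" (assumption). */
typedef struct { unsigned long val; } pte_t;
#define MODEL_PTE_PRESENT 0x1UL

static bool pte_present(pte_t pte)
{
	return pte.val & MODEL_PTE_PRESENT;
}

/* Local stand-in for the platform helper of the same role. */
static bool is_kernel_addr(unsigned long addr)
{
	return addr >= MODEL_KERNEL_START;
}

/* The PTE carries no user/kernel bit here, so the mapping address
 * decides; this is why the checker hooks need the addr argument. */
static bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
	return pte_present(pte) && !is_kernel_addr(addr);
}
```

For example, a present entry mapped at 0x10000000 counts as a user page, while the same entry mapped at 0xC0001000 does not; without `addr` the two cases would be indistinguishable.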
[PATCH v11 11/11] powerpc: mm: Support page table check
On creation and clearing of a page table mapping, instrument such calls
by invoking page_table_check_pte_set and page_table_check_pte_clear
respectively. These calls serve as a sanity check against illegal
mappings.

Enable ARCH_SUPPORTS_PAGE_TABLE_CHECK for all platforms.

See also:

riscv support in commit 3fee229a8eb9 ("riscv/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
arm64 in commit 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
x86_64 in commit d283d422c6c4 ("x86: mm: add x86_64 support for page table check")

Reviewed-by: Christophe Leroy
Signed-off-by: Rohan McLure
---
v9: Updated for new API. Instrument pmdp_collapse_flush's two constituent
calls to avoid header hell
v10: Cause p{u,m}dp_huge_get_and_clear() to resemble one another
---
 arch/powerpc/Kconfig                         |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h |  7 ++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 45 +++-
 arch/powerpc/mm/book3s64/hash_pgtable.c      |  4 ++
 arch/powerpc/mm/book3s64/pgtable.c           | 11 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c     |  3 ++
 arch/powerpc/mm/pgtable.c                    |  4 ++
 7 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a68b9e637eda..66a72f9078f5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
 	select ARCH_STACKWALK
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_DEBUG_PAGEALLOC	if PPC_BOOK3S || PPC_8xx || 40x
+	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
 	select ARCH_USE_MEMTEST
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 83f7b98ef49f..703deb5749e6 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -201,6 +201,7 @@ void unmap_kernel_page(unsigned long va);
 #ifndef __ASSEMBLY__
 #include
 #include
+#include
 
 /* Bits to mask out from a PGD to get to the PUD page */
 #define PGD_MASKED_BITS		0
@@ -314,7 +315,11 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pte_t *ptep)
 {
-	return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+	pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
+
+	page_table_check_pte_clear(mm, addr, old_pte);
+
+	return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index d8640ddbcad1..6199d2b4bded 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -145,6 +145,8 @@
 #define PAGE_KERNEL_ROX	__pgprot(_PAGE_BASE | _PAGE_KERNEL_ROX)
 
 #ifndef __ASSEMBLY__
+#include
+
 /*
  * page table defines
  */
@@ -415,8 +417,11 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long addr, pte_t *ptep)
 {
-	unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
-	return __pte(old);
+	pte_t old_pte = __pte(pte_update(mm, addr, ptep, ~0UL, 0, 0));
+
+	page_table_check_pte_clear(mm, addr, old_pte);
+
+	return old_pte;
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
@@ -425,11 +430,16 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 					    pte_t *ptep, int full)
 {
 	if (full && radix_enabled()) {
+		pte_t old_pte;
+
 		/*
 		 * We know that this is a full mm pte clear and
 		 * hence can be sure there is no parallel set_pte.
 		 */
-		return radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+		old_pte = radix__ptep_get_and_clear_full(mm, addr, ptep, full);
+		page_table_check_pte_clear(mm, addr, old_pte);
+
+		return old_pte;
 	}
 	return ptep_get_and_clear(mm, addr, ptep);
 }
@@ -1306,19 +1316,34 @@ extern int pudp_test_and_clear_young(struct vm_area_struct *vma,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 					    unsigned long addr, pmd_t *pmdp)
 {
-	if (radix_enabled())
-		return radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
-	return hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+	pmd_t old_pmd;
+
+	if (radix_enabled()) {
+		old_pmd = radix__pmdp_huge_get_and_clear(mm, addr, pmdp);
+	} else {
+		old_pmd = hash__pmdp_huge_get_and_clear(mm, addr, pmdp);
+	}
+
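Each get-and-clear helper instrumented in this patch follows the same shape: capture the old entry, report it to page table check, then return it unchanged. A userspace sketch of that shape, assuming a one-word PTE and a hypothetical counter in place of page table check's real per-page accounting; unlike `pte_update()`, the plain load and store below are not atomic.

```c
#include <assert.h>

typedef struct { unsigned long val; } pte_t;
#define MODEL_PTE_PRESENT 0x1UL

/* Hypothetical checker state: number of present entries installed. */
static long mapped_entries;

static void checker_pte_set(pte_t pte)
{
	if (pte.val & MODEL_PTE_PRESENT)
		mapped_entries++;
}

static void checker_pte_clear(pte_t old)
{
	if (old.val & MODEL_PTE_PRESENT) {
		mapped_entries--;
		/* A clear without a matching set indicates a bug. */
		assert(mapped_entries >= 0);
	}
}

/* Instrumented setter (simplified: no mm/addr arguments). */
static void set_pte(pte_t *ptep, pte_t pte)
{
	checker_pte_set(pte);
	*ptep = pte;
}

/* Capture the old value first, report it, then hand it back. */
static pte_t ptep_get_and_clear(pte_t *ptep)
{
	pte_t old_pte = *ptep;

	ptep->val = 0;
	checker_pte_clear(old_pte);
	return old_pte;
}
```

Because the check is made on the value that was actually removed, the caller sees exactly what it saw before the instrumentation was added, which is why the patch converts the direct `return` statements into a store to `old_pte` followed by the check.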
[PATCH v11 09/11] powerpc: mm: Implement *_user_accessible_page() for ptes
Page table checking depends on architectures providing an
implementation of p{te,md,ud}_user_accessible_page. With
refactorisations made on powerpc/mm, the pte_access_permitted() and
similar methods verify whether a userland page is accessible with the
required permissions.

Since page table checking is the only user of
p{te,md,ud}_user_accessible_page(), implement these for all platforms,
using some of the same preliminary checks taken by pte_access_permitted()
on that platform.

Since Commit 8e9bd41e4ce1 ("powerpc/nohash: Replace pte_user() by pte_read()")
pte_user() is no longer required to be present on all platforms as it
may be equivalent to or implied by pte_read(). Hence implementations of
pte_user_accessible_page() are specialised.

Signed-off-by: Rohan McLure
---
v9: New implementation
v10: Let book3s/64 use pte_user(), but otherwise default other platforms
to using the address provided with the call to infer whether it is a
user page or not. pmd/pud variants will warn on all other platforms, as
they should not be used for user page mappings
v11: Conditionally define p{m,u}d_user_accessible_page(), as not all
platforms have p{m,u}d_leaf(), p{m,u}d_pte() stubs.
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  5 +++++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 17 +++++++++++++++++
 arch/powerpc/include/asm/nohash/pgtable.h    |  5 +++++
 arch/powerpc/include/asm/pgtable.h           |  8 ++++++++
 4 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 52971ee30717..83f7b98ef49f 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -436,6 +436,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 	return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+	return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fac5615e6bc5..d8640ddbcad1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -538,6 +538,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 	return arch_pte_access_permitted(pte_val(pte), write, 0);
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+	return pte_present(pte) && pte_user(pte);
+}
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
@@ -1441,5 +1446,17 @@ static inline bool pud_leaf(pud_t pud)
 	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
 }
 
+#define pmd_user_accessible_page pmd_user_accessible_page
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
+{
+	return pmd_leaf(pmd) && pte_user_accessible_page(pmd_pte(pmd), addr);
+}
+
+#define pud_user_accessible_page pud_user_accessible_page
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
+{
+	return pud_leaf(pud) && pte_user_accessible_page(pud_pte(pud), addr);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..413d01a51e6f 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -213,6 +213,11 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 	return true;
 }
 
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
+{
+	return pte_present(pte) && !is_kernel_addr(addr);
+}
+
 /* Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
  *
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index ee8c82c0528f..f1ceae778cb1 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -219,6 +219,14 @@ static inline int pud_pfn(pud_t pud)
 }
 #endif
 
+#ifndef pmd_user_accessible_page
+#define pmd_user_accessible_page(pmd, addr)	false
+#endif
+
+#ifndef pud_user_accessible_page
+#define pud_user_accessible_page(pud, addr)	false
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
-- 
2.44.0
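The p{m,u}d_user_accessible_page() definitions in this patch use a common kernel idiom: a platform header defines a function together with a same-named macro, and a generic fallback is supplied only under `#ifndef`. A single-file userspace sketch of the idiom, with made-up bit values (`MODEL_*` names are assumptions): the PMD variant is "provided by the platform", while the PUD variant falls back to `false`. As discussed in review, the fallback only helps if it is visible before the first use.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { unsigned long val; } pte_t;
typedef struct { unsigned long val; } pmd_t;
typedef struct { unsigned long val; } pud_t;

#define MODEL_PRESENT 0x1UL	/* hypothetical "present" bit */
#define MODEL_LEAF    0x2UL	/* hypothetical "maps a page" bit */
#define MODEL_USER    0x4UL	/* hypothetical "user accessible" bit */

static bool pte_user_accessible_page(pte_t pte, unsigned long addr)
{
	(void)addr;
	return (pte.val & MODEL_PRESENT) && (pte.val & MODEL_USER);
}

/* "Platform" header: provides a PMD variant and advertises it with a
 * self-referential macro, so later #ifndef checks can see it. */
#define pmd_user_accessible_page pmd_user_accessible_page
static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
{
	pte_t as_pte = { pmd.val };	/* plays the role of pmd_pte() */

	return (pmd.val & MODEL_LEAF) && pte_user_accessible_page(as_pte, addr);
}

/* "Generic" header: fallbacks only for names nobody defined. */
#ifndef pmd_user_accessible_page
#define pmd_user_accessible_page(pmd, addr) false
#endif

#ifndef pud_user_accessible_page
#define pud_user_accessible_page(pud, addr) false
#endif
```

Here the PMD check uses the platform function, while every PUD entry reports false, even one with all model bits set.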
[PATCH v11 10/11] powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages
In the new set_ptes() API, set_pte_at() (a special case of set_ptes()) is
intended to be instrumented by the page table check facility. There are,
however, several other routines that make up the API for setting page
table entries, including set_pmd_at(). Such routines are themselves
implemented in terms of set_pte_at().

A future patch providing support for page table checking on powerpc must
take care to avoid duplicate calls to page_table_check_p{te,md,ud}_set().
Allow for assignment of pte entries without instrumentation through the
set_pte_at_unchecked() routine introduced in this patch.

Cause API-facing routines that call set_pte_at() to instead call
set_pte_at_unchecked(), which will remain uninstrumented by page table
check. set_ptes() is itself implemented by calls to __set_pte_at(), so
this eliminates redundant code.

Also prefer set_pte_at_unchecked() in early-boot usages, which should not
be instrumented.

Signed-off-by: Rohan McLure
---
v9: New patch
v10: don't reuse __set_pte_at(), as that will not apply filters. Instead
use new set_pte_at_unchecked().
v11: Include the assertion that hwvalid => !protnone. It is possible that
some of these calls can be safely replaced with __set_pte_at(), however
that will have to be done at a later stage.
---
 arch/powerpc/include/asm/pgtable.h       | 2 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c  | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c       | 6 +++---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 8
 arch/powerpc/mm/nohash/book3e_pgtable.c  | 2 +-
 arch/powerpc/mm/pgtable.c                | 8
 arch/powerpc/mm/pgtable_32.c             | 2 +-
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index f1ceae778cb1..ad0c1451502d 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -46,6 +46,8 @@ struct mm_struct;
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr);
 #define set_ptes set_ptes
+void set_pte_at_unchecked(struct mm_struct *mm, unsigned long addr,
+			  pte_t *ptep, pte_t pte);
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(NULL, vma, addr, ptep, 1)

diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
index 988948d69bc1..871472f99a01 100644
--- a/arch/powerpc/mm/book3s64/hash_pgtable.c
+++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
@@ -165,7 +165,7 @@ int hash__map_kernel_page(unsigned long ea, unsigned long pa, pgprot_t prot)
 		ptep = pte_alloc_kernel(pmdp, ea);
 		if (!ptep)
 			return -ENOMEM;
-		set_pte_at(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
+		set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pa >> PAGE_SHIFT, prot));
 	} else {
 		/*
 		 * If the mm subsystem is not fully up, we cannot create a
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 83823db3488b..f7be5fa058e8 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -116,7 +116,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 	WARN_ON(!(pmd_leaf(pmd)));
 #endif
 	trace_hugepage_set_pmd(addr, pmd_val(pmd));
-	return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
+	return set_pte_at_unchecked(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
 }

 void set_pud_at(struct mm_struct *mm, unsigned long addr,
@@ -133,7 +133,7 @@ void set_pud_at(struct mm_struct *mm, unsigned long addr,
 	WARN_ON(!(pud_leaf(pud)));
 #endif
 	trace_hugepage_set_pud(addr, pud_val(pud));
-	return set_pte_at(mm, addr, pudp_ptep(pudp), pud_pte(pud));
+	return set_pte_at_unchecked(mm, addr, pudp_ptep(pudp), pud_pte(pud));
 }

 static void do_serialize(void *arg)
@@ -539,7 +539,7 @@ void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
 	if (radix_enabled())
 		return radix__ptep_modify_prot_commit(vma, addr, ptep,
 						      old_pte, pte);
-	set_pte_at(vma->vm_mm, addr, ptep, pte);
+	set_pte_at_unchecked(vma->vm_mm, addr, ptep, pte);
 }

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 15e88f1439ec..e8da30536bd5 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -109,7 +109,7 @@ static int early_map_kernel_page(unsigned long ea, unsigned long pa,
 	ptep = pte_offset_kernel(pmdp, ea);

 set_the_pte:
-	set_pte_at(&init_mm, ea, ptep, pfn_pte(pfn, flags));
+	set_pte_at_unchecked(&init_mm, ea, ptep, pfn_pte(pfn, flags));
 	asm volatile("ptesync": : :"memory");
 	return 0;
 }
@@ -1522,7 +1522,7 @@ void
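Why set_pmd_at() should reach the hardware through an uninstrumented helper can be shown with a small counting model. This is a user-space sketch under simplifying assumptions: the two counters are hypothetical stand-ins for the page_table_check_p{te,md}_set() hooks, the mm/addr arguments are dropped, and unlike the patch's set_pte_at_unchecked(), this helper applies no filters or sanity checks.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's pte_t (hypothetical). */
typedef struct { unsigned long val; } pte_t;

/* Hypothetical counters standing in for page_table_check_p{te,md}_set(). */
static int pte_set_checks;
static int pmd_set_checks;

/* Uninstrumented store: the role set_pte_at_unchecked() plays. */
static void set_pte_at_unchecked(pte_t *ptep, pte_t pte)
{
	*ptep = pte;
}

/* API-facing PTE store: instrumented exactly once. */
static void set_pte_at(pte_t *ptep, pte_t pte)
{
	pte_set_checks++;
	set_pte_at_unchecked(ptep, pte);
}

/*
 * PMD store that reuses the PTE storage path. It is instrumented at the
 * PMD level; going through set_pte_at() here would also bump the PTE
 * counter, accounting the same store twice.
 */
static void set_pmd_at(pte_t *pmdp_as_ptep, pte_t pmd)
{
	pmd_set_checks++;
	set_pte_at_unchecked(pmdp_as_ptep, pmd);
}
```

In this model a single set_pmd_at() bumps only pmd_set_checks; had it been built on set_pte_at(), one store would leave both counters at 1.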
[PATCH v11 07/11] mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
On several powerpc platforms, a page table entry may not imply whether
the relevant mapping is for userspace or kernelspace. Instead, such
platforms infer this from the address which is being accessed.

Add an additional address argument to each of these routines in order to
provide support for page table check on powerpc.

Signed-off-by: Rohan McLure
---
 arch/arm64/include/asm/pgtable.h |  6 +++---
 arch/riscv/include/asm/pgtable.h |  6 +++---
 arch/x86/include/asm/pgtable.h   |  6 +++---
 mm/page_table_check.c            | 12 ++--
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 040c2e664cff..f698b30463f3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1074,17 +1074,17 @@ static inline int pgd_devmap(pgd_t pgd)
 #endif

 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
 	return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
 }

-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
 	return pmd_leaf(pmd) && !pmd_present_invalid(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
 }

-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
 	return pud_leaf(pud) && (pud_user(pud) || pud_user_exec(pud));
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 92bf5c309055..b9663e03475b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -724,17 +724,17 @@ static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 }

 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
 	return pte_present(pte) && pte_user(pte);
 }

-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
 	return pmd_leaf(pmd) && pmd_user(pmd);
 }

-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
 	return pud_leaf(pud) && pud_user(pud);
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b2b3902f8df4..e898813fce01 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1688,17 +1688,17 @@ static inline bool arch_has_hw_nonleaf_pmd_young(void)
 #endif

 #ifdef CONFIG_PAGE_TABLE_CHECK
-static inline bool pte_user_accessible_page(pte_t pte)
+static inline bool pte_user_accessible_page(pte_t pte, unsigned long addr)
 {
 	return (pte_val(pte) & _PAGE_PRESENT) && (pte_val(pte) & _PAGE_USER);
 }

-static inline bool pmd_user_accessible_page(pmd_t pmd)
+static inline bool pmd_user_accessible_page(pmd_t pmd, unsigned long addr)
 {
 	return pmd_leaf(pmd) && (pmd_val(pmd) & _PAGE_PRESENT) && (pmd_val(pmd) & _PAGE_USER);
 }

-static inline bool pud_user_accessible_page(pud_t pud)
+static inline bool pud_user_accessible_page(pud_t pud, unsigned long addr)
 {
 	return pud_leaf(pud) && (pud_val(pud) & _PAGE_PRESENT) && (pud_val(pud) & _PAGE_USER);
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 98cccee74b02..aa5e16c8328e 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -155,7 +155,7 @@ void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	if (pte_user_accessible_page(pte)) {
+	if (pte_user_accessible_page(pte, addr)) {
 		page_table_check_clear(pte_pfn(pte), PAGE_SIZE >> PAGE_SHIFT);
 	}
 }
@@ -167,7 +167,7 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	if (pmd_user_accessible_page(pmd)) {
+	if (pmd_user_accessible_page(pmd, addr)) {
 		page_table_check_clear(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT);
 	}
 }
@@ -179,7 +179,7 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	if (pud_user_accessible_page(pud)) {
+	if (pud_user_accessible_page(pud, addr)) {
 		page_table_check_clear(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT);
 	}
 }
@@ -195,7 +195,7 @@ void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 	for (i = 0; i < nr; i++)
 		__page_table_check_pte_clear(mm, addr, ptep_get(ptep + i));
-	if (pte_user_accessible_page(pte))
+	if (pte_user_accessible_page(pte, addr))
 		page_table_check_set(pte_pfn(pte), nr, pte_write(pte));
 }
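The clear paths above account a leaf entry as `size >> PAGE_SHIFT` base pages, and only when the entry is judged user-accessible, which on address-based platforms needs `addr`. The following user-space sketch works through that arithmetic. The geometry (4 KiB base pages, 2 MiB PMD leaves, 1 GiB PUD leaves) and the `HYP_TASK_SIZE` boundary are assumptions chosen for illustration; real values differ by platform and configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD leaves, 1 GiB PUD leaves. */
#define HYP_PAGE_SHIFT	12
#define HYP_PMD_SIZE	(1UL << 21)
#define HYP_PUD_SIZE	(1UL << 30)
/* Hypothetical user/kernel boundary for an address-based platform. */
#define HYP_TASK_SIZE	0x80000000UL

static unsigned long pages_cleared;	/* stands in for per-page accounting */

/* Ownership can only be decided from the address, hence the new parameter. */
static bool user_accessible(bool present, unsigned long addr)
{
	return present && addr < HYP_TASK_SIZE;
}

/* Models __page_table_check_{pmd,pud}_clear(): one update per base page. */
static void check_clear(bool present, unsigned long addr, unsigned long size)
{
	if (user_accessible(present, addr))
		pages_cleared += size >> HYP_PAGE_SHIFT;
}
```

With these assumed sizes a user PMD leaf accounts for 2^21 >> 12 = 512 base pages and a PUD leaf for 2^30 >> 12 = 262144, while a leaf at a kernel address accounts for none.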
[PATCH v11 06/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"
This reverts commit aa232204c4689427cefa55fe975692b57291523a.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  4 ++--
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h          |  2 +-
 mm/page_table_check.c            |  7 ---
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d20afcfae530..040c2e664cff 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1145,7 +1145,7 @@ static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 {
 	pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));

-	page_table_check_pte_clear(mm, pte);
+	page_table_check_pte_clear(mm, address, pte);

 	return pte;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 0066626159a5..92bf5c309055 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -563,7 +563,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
 	pte_t pte = __pte(atomic_long_xchg((atomic_long_t *)ptep, 0));

-	page_table_check_pte_clear(mm, pte);
+	page_table_check_pte_clear(mm, address, pte);

 	return pte;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 9876e6d92799..b2b3902f8df4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1276,7 +1276,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 				       pte_t *ptep)
 {
 	pte_t pte = native_ptep_get_and_clear(ptep);
-	page_table_check_pte_clear(mm, pte);
+	page_table_check_pte_clear(mm, addr, pte);
 	return pte;
 }

@@ -1292,7 +1292,7 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 		 * care about updates and native needs no locking
 		 */
 		pte = native_local_ptep_get_and_clear(ptep);
-		page_table_check_pte_clear(mm, pte);
+		page_table_check_pte_clear(mm, addr, pte);
 	} else {
 		pte = ptep_get_and_clear(mm, addr, ptep);
 	}
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 0a6ebfa46a31..48721a4a2b84 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -14,7 +14,8 @@ extern struct static_key_true page_table_check_disabled;
 extern struct page_ext_operations page_table_check_ops;

 void __page_table_check_zero(struct page *page, unsigned int order);
-void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
+void __page_table_check_pte_clear(struct mm_struct *mm, unsigned long addr,
+				  pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 				  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
@@ -45,12 +46,13 @@ static inline void page_table_check_free(struct page *page, unsigned int order)
 	__page_table_check_zero(page, order);
 }

-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+					      unsigned long addr, pte_t pte)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pte_clear(mm, pte);
+	__page_table_check_pte_clear(mm, addr, pte);
 }

 static inline void page_table_check_pmd_clear(struct mm_struct *mm,
@@ -121,7 +123,8 @@ static inline void page_table_check_free(struct page *page, unsigned int order)
 {
 }

-static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
+static inline void page_table_check_pte_clear(struct mm_struct *mm,
+					      unsigned long addr, pte_t pte)
 {
 }

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d17fbca4da7b..7c18a1e55696 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -454,7 +454,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 {
 	pte_t pte = ptep_get(ptep);
 	pte_clear(mm, address, ptep);
-	page_table_check_pte_clear(mm, pte);
+	page_table_check_pte_clear(mm, address, pte);
 	return pte;
 }
 #endif
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 7afaad9c6e6f..98cccee74b02 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -149,7 +149,8 @@ void __page_table_check_zero(struct page *page, unsigned int order)
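The header changes follow one pattern three times over: an inline wrapper that returns early unless checking is enabled, an out-of-line worker, and an empty stub for when the feature is compiled out, all sharing one signature so callers need no `#ifdef`s. Reinstating `addr` therefore has to touch all three. Below is a user-space sketch of that pattern. Every `hyp`-prefixed name is hypothetical, and a plain `bool` replaces the `page_table_check_disabled` static key that the kernel tests with `static_branch_likely()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Pretend the feature is configured in; remove to get the no-op stub. */
#define HYP_CONFIG_PAGE_TABLE_CHECK 1

typedef struct { unsigned long val; } pte_t;	/* hypothetical */
struct mm_struct { int id; };			/* hypothetical */

static int slow_path_calls;	/* counts calls into the out-of-line check */

#ifdef HYP_CONFIG_PAGE_TABLE_CHECK
/* Plain flag standing in for the page_table_check_disabled static key. */
static bool hyp_ptc_disabled = true;

/* Out-of-line worker, analogous to __page_table_check_pte_clear(). */
static void hyp__check_pte_clear(struct mm_struct *mm, unsigned long addr,
				 pte_t pte)
{
	(void)mm;
	(void)addr;
	(void)pte;
	slow_path_calls++;
}

/* Inline wrapper: cheap early return while checking is disabled. */
static inline void hyp_check_pte_clear(struct mm_struct *mm,
				       unsigned long addr, pte_t pte)
{
	if (hyp_ptc_disabled)
		return;
	hyp__check_pte_clear(mm, addr, pte);
}
#else
/* Feature compiled out: same signature, no work. */
static inline void hyp_check_pte_clear(struct mm_struct *mm,
				       unsigned long addr, pte_t pte)
{
}
#endif
```

Callers such as ptep_get_and_clear() always pass `addr`; whether it is used is decided entirely inside the wrapper and the worker.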
[PATCH v11 05/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"
This reverts commit 1831414cd729a34af937d56ad684a66599de6344.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h          |  2 +-
 mm/page_table_check.c            |  5 +++--
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3938f80a1b6..d20afcfae530 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1188,7 +1188,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 {
 	pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));

-	page_table_check_pmd_clear(mm, pmd);
+	page_table_check_pmd_clear(mm, address, pmd);

 	return pmd;
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index a153d3d143d2..0066626159a5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -767,7 +767,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 {
 	pmd_t pmd = __pmd(atomic_long_xchg((atomic_long_t *)pmdp, 0));

-	page_table_check_pmd_clear(mm, pmd);
+	page_table_check_pmd_clear(mm, address, pmd);

 	return pmd;
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e35b2b4f5ea1..9876e6d92799 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1345,7 +1345,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long
 {
 	pmd_t pmd = native_pmdp_get_and_clear(pmdp);

-	page_table_check_pmd_clear(mm, pmd);
+	page_table_check_pmd_clear(mm, addr, pmd);

 	return pmd;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index d01a00ffc1f9..0a6ebfa46a31 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -15,7 +15,8 @@ extern struct page_ext_operations page_table_check_ops;

 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+				  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 				  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
@@ -52,12 +53,13 @@ static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
 	__page_table_check_pte_clear(mm, pte);
 }

-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+					      unsigned long addr, pmd_t pmd)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pmd_clear(mm, pmd);
+	__page_table_check_pmd_clear(mm, addr, pmd);
 }

 static inline void page_table_check_pud_clear(struct mm_struct *mm,
@@ -123,7 +125,8 @@ static inline void page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
 {
 }

-static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+static inline void page_table_check_pmd_clear(struct mm_struct *mm,
+					      unsigned long addr, pmd_t pmd)
 {
 }

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 6a5c44c2208e..d17fbca4da7b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -557,7 +557,7 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 	pmd_t pmd = *pmdp;

 	pmd_clear(pmdp);
-	page_table_check_pmd_clear(mm, pmd);
+	page_table_check_pmd_clear(mm, address, pmd);

 	return pmd;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index a8c8fd7f06f8..7afaad9c6e6f 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -160,7 +160,8 @@ void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte)
 }
 EXPORT_SYMBOL(__page_table_check_pte_clear);

-void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
+void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
+				  pmd_t pmd)
 {
 	if (&init_mm == mm)
 		return;
@@ -204,7 +205,7 @@ void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	__page_table_check_pmd_clear(mm, *pmdp);
+
[PATCH v11 04/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"
This reverts commit 931c38e16499a057e30a3033f4d6a9c242f0f156.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

Signed-off-by: Rohan McLure
---
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 include/linux/pgtable.h          |  2 +-
 mm/page_table_check.c            |  5 +++--
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 82bbe115a1a4..e35b2b4f5ea1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1356,7 +1356,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 {
 	pud_t pud = native_pudp_get_and_clear(pudp);

-	page_table_check_pud_clear(mm, pud);
+	page_table_check_pud_clear(mm, addr, pud);

 	return pud;
 }
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 9243c920ed02..d01a00ffc1f9 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -16,7 +16,8 @@ extern struct page_ext_operations page_table_check_ops;
 void __page_table_check_zero(struct page *page, unsigned int order);
 void __page_table_check_pte_clear(struct mm_struct *mm, pte_t pte);
 void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+				  pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
 				 pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
@@ -59,12 +60,13 @@ static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
 	__page_table_check_pmd_clear(mm, pmd);
 }

-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+					      unsigned long addr, pud_t pud)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pud_clear(mm, pud);
+	__page_table_check_pud_clear(mm, addr, pud);
 }

 static inline void page_table_check_ptes_set(struct mm_struct *mm,
@@ -125,7 +127,8 @@ static inline void page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
 {
 }

-static inline void page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+static inline void page_table_check_pud_clear(struct mm_struct *mm,
+					      unsigned long addr, pud_t pud)
 {
 }

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index b2b4c1160d4a..6a5c44c2208e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -570,7 +570,7 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 	pud_t pud = *pudp;

 	pud_clear(pudp);
-	page_table_check_pud_clear(mm, pud);
+	page_table_check_pud_clear(mm, address, pud);

 	return pud;
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 3a338fee6d00..a8c8fd7f06f8 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -171,7 +171,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_clear);

-void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud)
+void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
+				  pud_t pud)
 {
 	if (&init_mm == mm)
 		return;
@@ -217,7 +218,7 @@ void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
 	if (&init_mm == mm)
 		return;

-	__page_table_check_pud_clear(mm, *pudp);
+	__page_table_check_pud_clear(mm, addr, *pudp);
 	if (pud_user_accessible_page(pud)) {
 		page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
 				     pud_write(pud));
-- 
2.44.0
[PATCH v11 00/11] Support page table check PowerPC
Support page table check on all PowerPC platforms. This works by
serialising assignments, reassignments and clears of page table entries
at each level in order to ensure that anonymous mappings have at most one
writable consumer, and likewise that file-backed mappings are not
simultaneously also anonymous mappings.

In order to support this infrastructure, a number of stubs must be
defined for all powerpc platforms. Additionally, separate set_pte_at()
from set_pte_at_unchecked() to allow for internal, uninstrumented
mappings.

v11:
 * The pud_pfn() stub, which previously had no legitimate users on any
   powerpc platform, now has users in Book3s64 with transparent huge
   pages. Include a stub of the same name for each platform that does
   not define its own.
 * Drop the patch that standardised use of p*d_leaf(), as it was already
   included upstream in v6.9.
 * Provide fallback definitions of p{m,u}d_user_accessible_page() that
   do not reference p*d_leaf() or p*d_pte(), as those are defined by the
   linux/mm headers only after the powerpc/mm headers.
 * Ensure that set_pte_at_unchecked() has the same checks as set_pte_at().

v10:
 * Revert patches that removed address and mm parameters from page table
   check routines, including consuming code from arm64, x86_64 and
   riscv.
 * Implement *_user_accessible_page() routines in terms of pte_user()
   where available (64-bit, book3s) but otherwise by checking the address
   (on platforms where the pte does not imply whether the mapping is for
   user or kernel).
 * Internal set_pte_at() calls replaced with set_pte_at_unchecked(), which
   is identical, but prevents double instrumentation.

Link: https://lore.kernel.org/linuxppc-dev/20240313042118.230397-9-rmcl...@linux.ibm.com/T/

v9:
 * Adapt to using the set_ptes() API, using __set_pte_at() where we must
   avoid instrumentation.
 * Use the logic of *_access_permitted() for implementing
   *_user_accessible_page(), which are required routines for page table
   check.
 * Even though we no longer need p{m,u,4}d_leaf(), still provide default
   implementations of these to assist in refactoring out the extant
   p{m,u,4}d_is_leaf().
 * Add p{m,u}d_pte() stubs where asm-generic does not provide them, as
   page table check wants all *_user_accessible_page() variants, and we
   would like to provide default implementations of the variants in terms
   of pte_user_accessible_page().
 * Avoid the ugly pmdp_collapse_flush() macro nonsense! Just instrument
   its constituent calls instead for radix and hash.

Link: https://lore.kernel.org/linuxppc-dev/20231130025404.37179-2-rmcl...@linux.ibm.com/

v8:
 * Fix linux/page_table_check.h include in asm/pgtable.h breaking 32-bit.

Link: https://lore.kernel.org/linuxppc-dev/20230215231153.2147454-1-rmcl...@linux.ibm.com/

v7:
 * Remove use of extern in set_pte prototypes
 * Clean up pmdp_collapse_flush macro
 * Replace set_pte_at with static inline function
 * Fix commit message for patch 7

Link: https://lore.kernel.org/linuxppc-dev/20230215020155.1969194-1-rmcl...@linux.ibm.com/

v6:
 * Support huge pages and p{m,u}d accounting.
 * Remove instrumentation from set_pte from kernel internal pages.
 * 64s: Implement pmdp_collapse_flush in terms of __pmdp_collapse_flush
   as access to the mm_struct * is required.

Link: https://lore.kernel.org/linuxppc-dev/20230214015939.1853438-1-rmcl...@linux.ibm.com/

v5:
Link: https://lore.kernel.org/linuxppc-dev/20221118002146.25979-1-rmcl...@linux.ibm.com/

Rohan McLure (11):
  Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"
  Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"
  mm: Provide addr parameter to page_table_check_pte_set()
  Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"
  Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"
  Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"
  mm: Provide address parameter to p{te,md,ud}_user_accessible_page()
  powerpc: mm: Add pud_pfn() stub
  powerpc: mm: Implement *_user_accessible_page() for ptes
  powerpc: mm: Use set_pte_at_unchecked() for early-boot / internal usages
  powerpc: mm: Support page table check

 arch/arm64/include/asm/pgtable.h             | 18 +++---
 arch/powerpc/Kconfig                         |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 12 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h | 62 +++---
 arch/powerpc/include/asm/nohash/pgtable.h    |  5 ++
 arch/powerpc/include/asm/pgtable.h           | 18 ++
 arch/powerpc/mm/book3s64/hash_pgtable.c      |  6 +-
 arch/powerpc/mm/book3s64/pgtable.c           | 17 +++--
 arch/powerpc/mm/book3s64/radix_pgtable.c     | 11 ++--
 arch/powerpc/mm/nohash/book3e_pgtable.c      |  2 +-
 arch/powerpc/mm/pgtable.c                    | 12
 arch/powerpc/mm/pgtable_32.c                 |  2 +-
 arch/riscv/include/asm/pgtable.h             | 18
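The invariant in the first paragraph (at most one writable consumer of an anonymous page, and no page that is both anonymous and file-backed) can be pictured as two counters per physical page. The sketch below is a simplified user-space model, not the kernel implementation: the struct and function names are hypothetical, a violation is reported by returning false rather than by crashing the kernel, and the writable check is applied to the mapping being added.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state, loosely modelled on mm/page_table_check.c. */
struct hyp_ptc {
	int anon_map_count;
	int file_map_count;
};

/* Returns false where the kernel facility would report a violation. */
static bool hyp_check_set(struct hyp_ptc *ptc, bool anon, bool rw)
{
	if (anon) {
		if (ptc->file_map_count)
			return false;	/* page already mapped as file-backed */
		if (++ptc->anon_map_count > 1 && rw)
			return false;	/* new writable mapping of a shared anon page */
	} else {
		if (ptc->anon_map_count)
			return false;	/* page already mapped as anonymous */
		ptc->file_map_count++;
	}
	return true;
}

/* Drops one mapping of the given kind; counts must never go negative. */
static void hyp_check_clear(struct hyp_ptc *ptc, bool anon)
{
	if (anon)
		ptc->anon_map_count--;
	else
		ptc->file_map_count--;
	assert(ptc->anon_map_count >= 0 && ptc->file_map_count >= 0);
}
```

Every set, reassignment and clear at any page table level has to pass through such a hook exactly once for the counts to stay meaningful, which is why the series takes care to avoid double instrumentation.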
[PATCH v11 01/11] Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"
This reverts commit 6d144436d954311f2dbacb5bf7b084042448d83e.

Reinstate previously unused parameters for the purpose of supporting
powerpc platforms, as many do not encode user/kernel ownership of the
page in the pte, but instead in the address of the access.

riscv: Respect change to delete mm, addr parameters from __set_pte_at()

This commit also changed calls to __set_pte_at() to use fewer parameters
on riscv. Keep that change rather than reverting it, as the signature of
__set_pte_at() is changed in a different commit.

Signed-off-by: Rohan McLure
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 11 +++
 mm/page_table_check.c            |  3 ++-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7334e5526185 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -568,7 +568,7 @@ static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 			      pud_t *pudp, pud_t pud)
 {
-	page_table_check_pud_set(mm, pudp, pud);
+	page_table_check_pud_set(mm, addr, pudp, pud);
 	return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
 			    PUD_SIZE >> PAGE_SHIFT);
 }
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 20242402fc11..1e0c0717b3f9 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -719,7 +719,7 @@ static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 			      pud_t *pudp, pud_t pud)
 {
-	page_table_check_pud_set(mm, pudp, pud);
+	page_table_check_pud_set(mm, addr, pudp, pud);
 	return __set_pte_at((pte_t *)pudp, pud_pte(pud));
 }

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 315535ffb258..09db55fa8856 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1245,7 +1245,7 @@ static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
 			      pud_t *pudp, pud_t pud)
 {
-	page_table_check_pud_set(mm, pudp, pud);
+	page_table_check_pud_set(mm, addr, pudp, pud);
 	native_set_pud(pudp, pud);
 }

diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..d188428512f5 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,7 +20,8 @@ void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 				 unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
+				pud_t *pudp, pud_t pud);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
 					unsigned long addr,
 					pmd_t pmd);
@@ -83,13 +84,14 @@ static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
 	__page_table_check_pmd_set(mm, pmdp, pmd);
 }

-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+					    unsigned long addr, pud_t *pudp,
 					    pud_t pud)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pud_set(mm, pudp, pud);
+	__page_table_check_pud_set(mm, addr, pudp, pud);
 }

 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -134,7 +136,8 @@ static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
 {
 }

-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
+static inline void page_table_check_pud_set(struct mm_struct *mm,
+					    unsigned long addr, pud_t *pudp,
 					    pud_t pud)
 {
 }
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index af69c3c8f7c2..75167537ebd7 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -210,7 +210,8 @@ void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
 }
 EXPORT_SYMBOL(__page_table_check_pmd_set);

-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
[PATCH] serial/pmac_zilog: Remove flawed mitigation for rx irq flood
The mitigation was intended to stop the irq completely. That might have been better than a hard lock-up but it turns out that you get a crash anyway if you're using pmac_zilog as a serial console. That's because the pr_err() call in pmz_receive_chars() results in pmz_console_write() attempting to lock a spinlock already locked in pmz_interrupt(). With CONFIG_DEBUG_SPINLOCK=y, this produces a fatal BUG splat like the one below. (The spinlock at 0x62e140 is the one in struct uart_port.) Even when it's not fatal, the serial port rx function ceases to work. Also, the iteration limit doesn't play nicely with QEMU. Please see bug report linked below. A web search for reports of the error message "pmz: rx irq flood" didn't produce anything. So I don't think this code is needed any more. Remove it. [ 14.56] ttyPZ0: pmz: rx irq flood ! [ 14.56] BUG: spinlock recursion on CPU#0, swapper/0 [ 14.56] lock: 0x62e140, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0 [ 14.56] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-mac-dbg-preempt-4-g4143b7b9144a #1 [ 14.56] Stack from 0059bcc4: [ 14.56] 0059bcc4 0056316f 0056316f 2700 004b6444 0059bce4 004ad8c6 0056316f [ 14.56] 0059bd10 004a6546 00556759 0062e140 dead4ead 0059f892 [ 14.56] 0062e140 0059bde8 005c03d0 0059bd24 0004daf6 0062e140 005567bf 0062e140 [ 14.56] 0059bd34 004b64c2 0062e140 0001 0059bd50 002e15ea 0062e140 0001 [ 14.56] 0059bde7 0059bde8 005c03d0 0059bdac 0005124e 005c03d0 005cdc00 002b [ 14.56] 005a3caa 005a3caa 0059bde8 0004ff00 0059be8b 00038200 000529ba [ 14.56] Call Trace: [<2700>] ret_from_kernel_thread+0xc/0x14 [ 14.56] [<004b6444>] _raw_spin_lock+0x0/0x28 [ 14.56] [<004ad8c6>] dump_stack+0x10/0x16 [ 14.56] [<004a6546>] spin_dump+0x6e/0x7c [ 14.56] [<0004daf6>] do_raw_spin_lock+0x9c/0xa6 [ 14.56] [<004b64c2>] _raw_spin_lock_irqsave+0x2a/0x34 [ 14.56] [<002e15ea>] pmz_console_write+0x32/0x9a [ 14.56] [<0005124e>] console_flush_all+0x112/0x3a2 [ 14.56] [<0004ff00>] console_trylock+0x0/0x7a [ 14.56] [<00038200>] 
parameq+0x48/0x6e [ 14.56] [<000529ba>] __printk_safe_enter+0x0/0x36 [ 14.56] [<0005113c>] console_flush_all+0x0/0x3a2 [ 14.56] [<000542c4>] prb_read_valid+0x0/0x1a [ 14.56] [<004b65a4>] _raw_spin_unlock+0x0/0x38 [ 14.56] [<0005151e>] console_unlock+0x40/0xb8 [ 14.56] [<00038200>] parameq+0x48/0x6e [ 14.56] [<002c778c>] __tty_insert_flip_string_flags+0x0/0x14e [ 14.56] [<00051798>] vprintk_emit+0x156/0x238 [ 14.56] [<00051894>] vprintk_default+0x1a/0x1e [ 14.56] [<000529a8>] vprintk+0x74/0x86 [ 14.56] [<004a6596>] _printk+0x12/0x16 [ 14.56] [<002e23be>] pmz_receive_chars+0x1cc/0x394 [ 14.56] [<004b6444>] _raw_spin_lock+0x0/0x28 [ 14.56] [<00038226>] parse_args+0x0/0x3a6 [ 14.56] [<004b6466>] _raw_spin_lock+0x22/0x28 [ 14.56] [<002e26b4>] pmz_interrupt+0x12e/0x1e0 [ 14.56] [<00048680>] arch_cpu_idle_enter+0x0/0x8 [ 14.56] [<00054ebc>] __handle_irq_event_percpu+0x24/0x106 [ 14.56] [<004ae576>] default_idle_call+0x0/0x46 [ 14.56] [<00055020>] handle_irq_event+0x30/0x90 [ 14.56] [<00058320>] handle_simple_irq+0x5e/0xc0 [ 14.56] [<00048688>] arch_cpu_idle_exit+0x0/0x8 [ 14.56] [<00054800>] generic_handle_irq+0x3c/0x4a [ 14.56] [<2978>] do_IRQ+0x24/0x3a [ 14.56] [<004ae508>] cpu_idle_poll.isra.0+0x0/0x6e [ 14.56] [<2874>] auto_irqhandler_fixup+0x4/0xc [ 14.56] [<004ae508>] cpu_idle_poll.isra.0+0x0/0x6e [ 14.56] [<004ae576>] default_idle_call+0x0/0x46 [ 14.56] [<004ae598>] default_idle_call+0x22/0x46 [ 14.56] [<00048710>] do_idle+0x6a/0xf0 [ 14.56] [<000486a6>] do_idle+0x0/0xf0 [ 14.56] [<000367d2>] find_task_by_pid_ns+0x0/0x2a [ 14.56] [<0005d064>] __rcu_read_lock+0x0/0x12 [ 14.56] [<00048a5a>] cpu_startup_entry+0x18/0x1c [ 14.56] [<00063a06>] __rcu_read_unlock+0x0/0x26 [ 14.56] [<004ae65a>] kernel_init+0x0/0xfa [ 14.56] [<0049c5a8>] strcpy+0x0/0x1e [ 14.56] [<004a6584>] _printk+0x0/0x16 [ 14.56] [<0049c72a>] strlen+0x0/0x22 [ 14.56] [<006452d4>] memblock_alloc_try_nid+0x0/0x82 [ 14.56] [<0063939a>] arch_post_acpi_subsys_init+0x0/0x8 [ 14.56] [<0063991e>] 
console_on_rootfs+0x0/0x60 [ 14.56] [<00638410>] _sinittext+0x410/0xadc [ 14.56] Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: "Aneesh Kumar K.V" Cc: "Naveen N. Rao" Cc: linux-m...@lists.linux-m68k.org Link: https://github.com/vivier/qemu-m68k/issues/44 Link: https://lore.kernel.org/all/1078874617.9746.36.camel@gaston/ Signed-off-by: Finn Thain --- drivers/tty/serial/pmac_zilog.c | 14
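The crash above comes from one CPU re-acquiring a lock it already holds. pmz_interrupt() takes the port lock, pr_err() in pmz_receive_chars() flushes the console, and pmz_console_write() then tries to take the same lock. The following is a minimal userspace sketch of that call chain. It assumes a hypothetical `dbg_lock` that records its owner, in the way CONFIG_DEBUG_SPINLOCK records `.owner_cpu`, and reports the recursion instead of spinning forever. All function names are stand-ins, not the driver's code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical debug lock: remembers which CPU owns it, like .owner_cpu. */
struct dbg_lock {
	bool locked;
	int owner;
};

/* Returns false instead of spinning forever if @cpu already holds the lock. */
static bool dbg_lock_acquire(struct dbg_lock *l, int cpu)
{
	if (l->locked && l->owner == cpu) {
		fprintf(stderr, "BUG: spinlock recursion on CPU#%d\n", cpu);
		return false;
	}
	/* Single-threaded model: contention from another CPU is not simulated. */
	l->locked = true;
	l->owner = cpu;
	return true;
}

static void dbg_lock_release(struct dbg_lock *l)
{
	l->locked = false;
	l->owner = -1;
}

/* Stand-in for the spinlock in struct uart_port. */
static struct dbg_lock port_lock = { false, -1 };

/* Stand-in for pmz_console_write(): needs the port lock to emit text. */
static bool console_write(int cpu, const char *msg)
{
	if (!dbg_lock_acquire(&port_lock, cpu))
		return false;
	fputs(msg, stdout);
	dbg_lock_release(&port_lock);
	return true;
}

/* Stand-in for pmz_interrupt() -> pmz_receive_chars() -> pr_err(). */
static bool interrupt_handler(int cpu)
{
	bool ok;

	dbg_lock_acquire(&port_lock, cpu);
	ok = console_write(cpu, "ttyPZ0: pmz: rx irq flood !\n");
	dbg_lock_release(&port_lock);
	return ok;
}
```

Calling `console_write()` on its own succeeds. Calling it from inside `interrupt_handler()` hits the recursion check, which corresponds to the pr_err() path that the patch removes.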
Re: [PATCH v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
On 2024/3/28 2:17, Andrew Morton wrote: On Fri, 26 Jan 2024 14:44:51 +0800 Huang Shijie wrote: During kernel boot, the generic cpu_to_node() is called too early on arm64, powerpc and riscv when CONFIG_NUMA is enabled. There are at least four places in the common code where the generic cpu_to_node() is called before it is initialized: 1.) early_trace_init() in kernel/trace/trace.c 2.) sched_init() in kernel/sched/core.c 3.) init_sched_fair_class() in kernel/sched/fair.c 4.) workqueue_init_early() in kernel/workqueue.c To fix the bug, the patch introduces early_numa_node_init(), which is called after smp_prepare_boot_cpu() in start_kernel(). early_numa_node_init() initializes "numa_node" as soon as early_cpu_to_node() is ready, before cpu_to_node() is called for the first time. What are the userspace-visible runtime effects of this bug? I do not see much performance impact on userspace applications from this bug. It only pollutes the CPU caches on NUMA systems. Thanks Huang Shijie
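The fix described above fills in the per-CPU node ids from the early mapping before anything reads them. Below is a userspace sketch of that ordering. It assumes a hypothetical 4-CPU, 2-node topology. `early_map[]`, `numa_node[]` and the helper bodies are illustrative stand-ins, not the kernel's per-CPU implementation.

```c
#define NR_CPUS 4
#define NUMA_NO_NODE (-1)

/* Hypothetical firmware topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const int early_map[NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for the per-CPU "numa_node" values; zero until initialized. */
static int numa_node[NR_CPUS];

/* Stand-in for early_cpu_to_node(): already usable early in boot. */
static int early_cpu_to_node(int cpu)
{
	return early_map[cpu];
}

/* Stand-in for the generic cpu_to_node(): only reads numa_node[]. */
static int cpu_to_node(int cpu)
{
	return numa_node[cpu];
}

/* Sketch of early_numa_node_init(): copy the early mapping before first use. */
static void early_numa_node_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int node = early_cpu_to_node(cpu);

		if (node != NUMA_NO_NODE)
			numa_node[cpu] = node;
	}
}
```

Before early_numa_node_init() runs, every CPU reports node 0. This is the wrong answer that early callers such as sched_init() would see for CPUs 2 and 3.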
Re: [PATCH v2 5/6] mm/mm_init.c: remove unneeded calc_memmap_size()
On 03/27/24 at 06:21pm, Mike Rapoport wrote: > On Mon, Mar 25, 2024 at 10:56:45PM +0800, Baoquan He wrote: > > Nobody calls calc_memmap_size() now. > > > > Signed-off-by: Baoquan He > > Reviewed-by: Mike Rapoport (IBM) > > Looks like I replied to patch 6/6 twice by mistake and missed this one. Thanks for your careful review. > > > --- > > mm/mm_init.c | 20 > > 1 file changed, 20 deletions(-) > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c > > index 7f71e56e83f3..e269a724f70e 100644 > > --- a/mm/mm_init.c > > +++ b/mm/mm_init.c > > @@ -1331,26 +1331,6 @@ static void __init calculate_node_totalpages(struct > > pglist_data *pgdat, > > pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, > > realtotalpages); > > } > > > > -static unsigned long __init calc_memmap_size(unsigned long spanned_pages, > > - unsigned long present_pages) > > -{ > > - unsigned long pages = spanned_pages; > > - > > - /* > > -* Provide a more accurate estimation if there are holes within > > -* the zone and SPARSEMEM is in use. If there are holes within the > > -* zone, each populated memory region may cost us one or two extra > > -* memmap pages due to alignment because memmap pages for each > > -* populated regions may not be naturally aligned on page boundary. > > -* So the (present_pages >> 4) heuristic is a tradeoff for that. > > -*/ > > - if (spanned_pages > present_pages + (present_pages >> 4) && > > - IS_ENABLED(CONFIG_SPARSEMEM)) > > - pages = present_pages; > > - > > - return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; > > -} > > - > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > > static void pgdat_init_split_queue(struct pglist_data *pgdat) > > { > > -- > > 2.41.0 > > > > -- > Sincerely yours, > Mike. >
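For reference, you can check the arithmetic of the removed calc_memmap_size() with a userspace port. This sketch assumes 4 KiB pages (PAGE_SHIFT of 12) and a 64-byte struct page. That size is typical for 64-bit configurations but not guaranteed. The `sparsemem` parameter replaces IS_ENABLED(CONFIG_SPARSEMEM).

```c
#include <stdbool.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define STRUCT_PAGE_SIZE 64UL              /* assumption: sizeof(struct page) */

/* Userspace port of the removed helper; @sparsemem replaces IS_ENABLED(CONFIG_SPARSEMEM). */
static unsigned long calc_memmap_size(unsigned long spanned_pages,
				      unsigned long present_pages, bool sparsemem)
{
	unsigned long pages = spanned_pages;

	/* Zones with large holes are sized by present pages (1/16 slack allowed). */
	if (spanned_pages > present_pages + (present_pages >> 4) && sparsemem)
		pages = present_pages;

	/* Bytes of struct page, rounded up to whole pages. */
	return PAGE_ALIGN(pages * STRUCT_PAGE_SIZE) >> PAGE_SHIFT;
}
```

Under these assumptions, a fully populated 1 GiB zone (262144 pages) needs 262144 * 64 bytes = 16 MiB of memmap, which is 4096 pages. Under SPARSEMEM, a zone spanning twice its present pages falls back to present_pages and also yields 4096 pages instead of 8192.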
Re: [PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
On 2024-03-27 4:25 PM, Andrew Morton wrote: > On Wed, 27 Mar 2024 13:00:43 -0700 Samuel Holland > wrote: > >> Now that all previously-supported architectures select >> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead >> of the existing list of architectures. It can also take advantage of the >> common kernel-mode FPU API and method of adjusting CFLAGS. >> >> ... >> >> @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int >> line) >> WARN_ON_ONCE(!in_task()); >> preempt_disable(); >> depth = __this_cpu_inc_return(fpu_recursion_depth); >> - >> if (depth == 1) { >> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> +BUG_ON(!kernel_fpu_available()); >> kernel_fpu_begin(); > > For some reason kernel_fpu_available() was undefined in my x86_64 > allmodconfig build. I just removed the statement. This is because the include guard in asm/fpu.h conflicts with the existing one in asm/fpu/types.h (which doesn't match its filename), so the definition of kernel_fpu_available() is not seen. I can fix up the include guard in asm/fpu/types.h in the next version: diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index ace9aa3b78a3..75a3910d867a 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -2,8 +2,8 @@ /* * FPU data structures: */ -#ifndef _ASM_X86_FPU_H -#define _ASM_X86_FPU_H +#ifndef _ASM_X86_FPU_TYPES_H +#define _ASM_X86_FPU_TYPES_H #include @@ -596,4 +596,4 @@ struct fpu_state_config { /* FPU state configuration information */ extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg; -#endif /* _ASM_X86_FPU_H */ +#endif /* _ASM_X86_FPU_TYPES_H */ Regards, Samuel
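The breakage Samuel describes is ordinary preprocessor behaviour. Once the first header defines a guard macro, a second header that uses the same macro is skipped entirely. The sketch below inlines two hypothetical headers in one file to show this. It uses demo guard names (not the kernel's) so that it stays valid userspace C. If each header gets a guard that matches its own filename, as the diff does, both bodies become visible.

```c
/* --- Broken layout: two headers share one guard macro --- */

/* Body of a hypothetical "fpu/types.h" that uses the guard meant for "fpu.h". */
#ifndef DEMO_FPU_H
#define DEMO_FPU_H
struct demo_fpu_state_config { unsigned int max_size; };
#endif

/* Body of a hypothetical "fpu.h": skipped, because DEMO_FPU_H is already defined. */
#ifndef DEMO_FPU_H
#define DEMO_FPU_H
#define DEMO_BROKEN_SAW_FPU_H 1
#endif

/* --- Fixed layout: each guard matches its own filename --- */

#ifndef DEMO_FIXED_FPU_TYPES_H
#define DEMO_FIXED_FPU_TYPES_H
struct demo_fixed_fpu_state_config { unsigned int max_size; };
#endif

#ifndef DEMO_FIXED_FPU_H
#define DEMO_FIXED_FPU_H
#define DEMO_FIXED_SAW_FPU_H 1
#endif

/* Returns 1 if the second header's body was compiled in the broken layout. */
static int broken_saw_fpu_h(void)
{
#ifdef DEMO_BROKEN_SAW_FPU_H
	return 1;
#else
	return 0;
#endif
}

/* Returns 1 if the second header's body was compiled in the fixed layout. */
static int fixed_saw_fpu_h(void)
{
#ifdef DEMO_FIXED_SAW_FPU_H
	return 1;
#else
	return 0;
#endif
}
```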
Re: [PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
On Wed, 27 Mar 2024 13:00:43 -0700 Samuel Holland wrote: > Now that all previously-supported architectures select > ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead > of the existing list of architectures. It can also take advantage of the > common kernel-mode FPU API and method of adjusting CFLAGS. > > ... > > @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int > line) > WARN_ON_ONCE(!in_task()); > preempt_disable(); > depth = __this_cpu_inc_return(fpu_recursion_depth); > - > if (depth == 1) { > -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) > + BUG_ON(!kernel_fpu_available()); > kernel_fpu_begin(); For some reason kernel_fpu_available() was undefined in my x86_64 allmodconfig build. I just removed the statement.
Re: [PATCH 9/9] mmc: Convert from tasklet to BH workqueue
On Wednesday, 27 March 2024 at 17:03:14 CET, Allen Pais wrote: > The only generic interface to execute asynchronously in the BH context is > tasklet; however, it's marked deprecated and has some design flaws. To > replace tasklets, BH workqueue support was recently added. A BH workqueue > behaves similarly to regular workqueues except that the queued work items > are executed in the BH context. > > This patch converts drivers/infiniband/* from tasklet to BH workqueue. infiniband -> mmc Best regards, Jernej > > Based on the work done by Tejun Heo > Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 > > Signed-off-by: Allen Pais
Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue
On Wed, Mar 27, 2024 at 04:03:09PM +0000, Allen Pais wrote: > The only generic interface to execute asynchronously in the BH context is > tasklet; however, it's marked deprecated and has some design flaws. To > replace tasklets, BH workqueue support was recently added. A BH workqueue > behaves similarly to regular workqueues except that the queued work items > are executed in the BH context. > > This patch converts drivers/infiniband/* from tasklet to BH workqueue. > > Based on the work done by Tejun Heo > Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 > > Signed-off-by: Allen Pais > --- > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index c0e005670d67..88d8e1c366cd 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -1662,10 +1663,9 @@ static void __usb_hcd_giveback_urb(struct urb *urb) > usb_put_urb(urb); > } > > -static void usb_giveback_urb_bh(struct work_struct *work) > +static void usb_giveback_urb_bh(struct work_struct *t) > { > - struct giveback_urb_bh *bh = > - container_of(work, struct giveback_urb_bh, bh); > + struct giveback_urb_bh *bh = from_work(bh, t, bh); > struct list_head local_list; > > spin_lock_irq(&bh->lock); Is there any reason for this apparently pointless change of a local variable's name? Alan Stern
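On Alan's question: as we understand it, from_work() in include/linux/workqueue.h is a thin wrapper around container_of(). It uses its first argument only for `typeof(*var)`, so the callback parameter could have kept the name `work`. The macro does not require the rename to `t`. The userspace sketch below re-creates simplified versions of both macros to show the pointer arithmetic. It assumes no type checking and a one-field work_struct.

```c
#include <stddef.h>

/* Simplified userspace versions of the kernel macros (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define from_work(var, callback_work, work_fieldname) \
	container_of(callback_work, __typeof__(*var), work_fieldname)

struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Hypothetical container shaped like struct giveback_urb_bh. */
struct giveback_urb_bh {
	int running;
	struct work_struct bh;
};

static struct giveback_urb_bh *last_seen;

/* The parameter can be called "work" or "t"; from_work() does not care. */
static void usb_giveback_urb_bh(struct work_struct *work)
{
	/* Same result as container_of(work, struct giveback_urb_bh, bh). */
	struct giveback_urb_bh *bh = from_work(bh, work, bh);

	last_seen = bh;
}
```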
Re: [PATCH 6/9] ipmi: Convert from tasklet to BH workqueue
On Wed, Mar 27, 2024 at 04:03:11PM +0000, Allen Pais wrote: > The only generic interface to execute asynchronously in the BH context is > tasklet; however, it's marked deprecated and has some design flaws. To > replace tasklets, BH workqueue support was recently added. A BH workqueue > behaves similarly to regular workqueues except that the queued work items > are executed in the BH context. > > This patch converts drivers/infiniband/* from tasklet to BH workqueue. I think you mean drivers/char/ipmi/* here. I believe that work queue items are executed single-threaded for a work queue, so this should be good. I need to test this, though. It may be that an IPMI device can have its own work queue; it may not be important to run it in bh context. -corey > > Based on the work done by Tejun Heo > Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 > > Signed-off-by: Allen Pais > --- > drivers/char/ipmi/ipmi_msghandler.c | 30 ++--- > 1 file changed, 15 insertions(+), 15 deletions(-) > > diff --git a/drivers/char/ipmi/ipmi_msghandler.c > b/drivers/char/ipmi/ipmi_msghandler.c > index b0eedc4595b3..fce2a2dbdc82 100644 > --- a/drivers/char/ipmi/ipmi_msghandler.c > +++ b/drivers/char/ipmi/ipmi_msghandler.c > @@ -36,12 +36,13 @@ > #include > #include > #include > +#include > > #define IPMI_DRIVER_VERSION "39.2" > > static struct ipmi_recv_msg *ipmi_alloc_recv_msg(void); > static int ipmi_init_msghandler(void); > -static void smi_recv_tasklet(struct tasklet_struct *t); > +static void smi_recv_work(struct work_struct *t); > static void handle_new_recv_msgs(struct ipmi_smi *intf); > static void need_waiter(struct ipmi_smi *intf); > static int handle_one_recv_msg(struct ipmi_smi *intf, > @@ -498,13 +499,13 @@ struct ipmi_smi { > /* >* Messages queued for delivery. If delivery fails (out of memory >* for instance), They will stay in here to be processed later in a > - * periodic timer interrupt. The tasklet is for handling received > + * periodic timer interrupt.
The work is for handling received >* messages directly from the handler. >*/ > spinlock_t waiting_rcv_msgs_lock; > struct list_head waiting_rcv_msgs; > atomic_t watchdog_pretimeouts_to_deliver; > - struct tasklet_struct recv_tasklet; > + struct work_struct recv_work; > > spinlock_t xmit_msgs_lock; > struct list_head xmit_msgs; > @@ -704,7 +705,7 @@ static void clean_up_interface_data(struct ipmi_smi *intf) > struct cmd_rcvr *rcvr, *rcvr2; > struct list_head list; > > - tasklet_kill(&intf->recv_tasklet); > + cancel_work_sync(&intf->recv_work); > > free_smi_msg_list(&intf->waiting_rcv_msgs); > free_recv_msg_list(&intf->waiting_events); > @@ -1319,7 +1320,7 @@ static void free_user(struct kref *ref) > { > struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount); > > - /* SRCU cleanup must happen in task context. */ > + /* SRCU cleanup must happen in work context. */ > queue_work(remove_work_wq, &user->remove_work); > } > > @@ -3605,8 +3606,7 @@ int ipmi_add_smi(struct module *owner, > intf->curr_seq = 0; > spin_lock_init(&intf->waiting_rcv_msgs_lock); > INIT_LIST_HEAD(&intf->waiting_rcv_msgs); > - tasklet_setup(&intf->recv_tasklet, > - smi_recv_tasklet); > + INIT_WORK(&intf->recv_work, smi_recv_work); > atomic_set(&intf->watchdog_pretimeouts_to_deliver, 0); > spin_lock_init(&intf->xmit_msgs_lock); > INIT_LIST_HEAD(&intf->xmit_msgs); > @@ -4779,7 +4779,7 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf) >* To preserve message order, quit if we >* can't handle a message. Add the message >* back at the head, this is safe because this > - * tasklet is the only thing that pulls the > + * work is the only thing that pulls the >* messages. >*/ > list_add(&smi_msg->link, &intf->waiting_rcv_msgs); > @@ -4812,10 +4812,10 @@ static void handle_new_recv_msgs(struct ipmi_smi > *intf) > } > } > > -static void smi_recv_tasklet(struct tasklet_struct *t) > +static void smi_recv_work(struct work_struct *t) > { > unsigned long flags = 0; /* keep us warning-free.
*/ > - struct ipmi_smi *intf = from_tasklet(intf, t, recv_tasklet); > + struct ipmi_smi *intf = from_work(intf, t, recv_work); > int run_to_completion = intf->run_to_completion; > struct ipmi_smi_msg *newmsg = NULL; > > @@ -4866,7 +4866,7 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf, > > /* >* To preserve message order, we keep a queue and deliver from > - * a tasklet. > + * a work. >*/ > if (!run_to_completion) > spin_lock_irqsave(&intf->waiting_rcv_msgs_lock, flags); > @@ -4887,9
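The comment in handle_new_recv_msgs() relies on a simple rule to preserve order. Processing stops at the first message that cannot be handled, and that message goes back at the head of the queue. This is safe only because a single consumer pulls from that queue. Below is a userspace sketch of that rule. It assumes a hypothetical fixed-size ring and a handler that fails once (for example, on an allocation failure). It is not the IPMI code itself.

```c
#include <stdbool.h>

#define QCAP 16

/* Hypothetical ring buffer standing in for intf->waiting_rcv_msgs (no overflow checks). */
struct msgq {
	int items[QCAP];
	int head;
	int len;
};

static void q_push_tail(struct msgq *q, int v)
{
	q->items[(q->head + q->len) % QCAP] = v;
	q->len++;
}

static void q_push_head(struct msgq *q, int v)
{
	q->head = (q->head + QCAP - 1) % QCAP;
	q->items[q->head] = v;
	q->len++;
}

static int q_pop_head(struct msgq *q)
{
	int v = q->items[q->head];

	q->head = (q->head + 1) % QCAP;
	q->len--;
	return v;
}

/* Delivery log, so the order can be checked. */
static int delivered[QCAP];
static int ndelivered;

/* Hypothetical handler that fails once on message 2 (e.g. out of memory). */
static bool fail_once_on_2 = true;

static bool handle_one(int msg)
{
	if (msg == 2 && fail_once_on_2) {
		fail_once_on_2 = false;
		return false;
	}
	delivered[ndelivered++] = msg;
	return true;
}

/* Like handle_new_recv_msgs(): stop at the first failure and requeue at the head. */
static void handle_new_msgs(struct msgq *q)
{
	while (q->len > 0) {
		int msg = q_pop_head(q);

		if (!handle_one(msg)) {
			q_push_head(q, msg);	/* safe: this is the only consumer */
			break;
		}
	}
}
```

After the first pass, only message 1 has been delivered and message 2 is back at the head. The second pass delivers 2 and then 3, so the original order is kept.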
Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue
> > The only generic interface to execute asynchronously in the BH context is > > tasklet; however, it's marked deprecated and has some design flaws. To > > replace tasklets, BH workqueue support was recently added. A BH workqueue > > behaves similarly to regular workqueues except that the queued work items > > are executed in the BH context. > > > > This patch converts drivers/infiniband/* from tasklet to BH workqueue. > > No it does not, I think your changelog is wrong :( Whoops, sorry about that. I messed up the commit messages. I will fix it in v2. > > > > > Based on the work done by Tejun Heo > > Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 > > > > Signed-off-by: Allen Pais > > --- > > drivers/usb/atm/usbatm.c| 55 +++-- > > drivers/usb/atm/usbatm.h| 3 +- > > drivers/usb/core/hcd.c | 22 ++-- > > drivers/usb/gadget/udc/fsl_qe_udc.c | 21 +-- > > drivers/usb/gadget/udc/fsl_qe_udc.h | 4 +-- > > drivers/usb/host/ehci-sched.c | 2 +- > > drivers/usb/host/fhci-hcd.c | 3 +- > > drivers/usb/host/fhci-sched.c | 10 +++--- > > drivers/usb/host/fhci.h | 5 +-- > > drivers/usb/host/xhci-dbgcap.h | 3 +- > > drivers/usb/host/xhci-dbgtty.c | 15 > > include/linux/usb/cdc_ncm.h | 2 +- > > include/linux/usb/usbnet.h | 2 +- > > 13 files changed, 76 insertions(+), 71 deletions(-) > > > > diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c > > index 2da6615fbb6f..74849f24e52e 100644 > > --- a/drivers/usb/atm/usbatm.c > > +++ b/drivers/usb/atm/usbatm.c > > @@ -17,7 +17,7 @@ > > * - Removed the limit on the number of devices > > * - Module now autoloads on device plugin > > * - Merged relevant parts of sarlib > > - * - Replaced the kernel thread with a tasklet > > + * - Replaced the kernel thread with a work > > a "work"? will fix the comments. 
> > > * - New packet transmission code > > * - Changed proc file contents > > * - Fixed all known SMP races > > @@ -68,6 +68,7 @@ > > #include > > #include > > #include > > +#include > > > > #ifdef VERBOSE_DEBUG > > static int usbatm_print_packet(struct usbatm_data *instance, const > > unsigned char *data, int len); > > @@ -249,7 +250,7 @@ static void usbatm_complete(struct urb *urb) > > /* vdbg("%s: urb 0x%p, status %d, actual_length %d", > >__func__, urb, status, urb->actual_length); */ > > > > - /* Can be invoked from task context, protect against interrupts */ > > + /* Can be invoked from work context, protect against interrupts */ > > "workqueue"? This too seems wrong. > > Same for other comment changes in this patch. Thanks for the quick review, I will fix the comments and send out v2. - Alle > thanks, > > greg k-h >
Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue
On Wed, Mar 27, 2024 at 04:03:09PM +, Allen Pais wrote: > The only generic interface to execute asynchronously in the BH context is > tasklet; however, it's marked deprecated and has some design flaws. To > replace tasklets, BH workqueue support was recently added. A BH workqueue > behaves similarly to regular workqueues except that the queued work items > are executed in the BH context. > > This patch converts drivers/infiniband/* from tasklet to BH workqueue. No it does not, I think your changelog is wrong :( > > Based on the work done by Tejun Heo > Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 > > Signed-off-by: Allen Pais > --- > drivers/usb/atm/usbatm.c| 55 +++-- > drivers/usb/atm/usbatm.h| 3 +- > drivers/usb/core/hcd.c | 22 ++-- > drivers/usb/gadget/udc/fsl_qe_udc.c | 21 +-- > drivers/usb/gadget/udc/fsl_qe_udc.h | 4 +-- > drivers/usb/host/ehci-sched.c | 2 +- > drivers/usb/host/fhci-hcd.c | 3 +- > drivers/usb/host/fhci-sched.c | 10 +++--- > drivers/usb/host/fhci.h | 5 +-- > drivers/usb/host/xhci-dbgcap.h | 3 +- > drivers/usb/host/xhci-dbgtty.c | 15 > include/linux/usb/cdc_ncm.h | 2 +- > include/linux/usb/usbnet.h | 2 +- > 13 files changed, 76 insertions(+), 71 deletions(-) > > diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c > index 2da6615fbb6f..74849f24e52e 100644 > --- a/drivers/usb/atm/usbatm.c > +++ b/drivers/usb/atm/usbatm.c > @@ -17,7 +17,7 @@ > * - Removed the limit on the number of devices > * - Module now autoloads on device plugin > * - Merged relevant parts of sarlib > - * - Replaced the kernel thread with a tasklet > + * - Replaced the kernel thread with a work a "work"? 
> * - New packet transmission code > * - Changed proc file contents > * - Fixed all known SMP races > @@ -68,6 +68,7 @@ > #include > #include > #include > +#include > > #ifdef VERBOSE_DEBUG > static int usbatm_print_packet(struct usbatm_data *instance, const unsigned > char *data, int len); > @@ -249,7 +250,7 @@ static void usbatm_complete(struct urb *urb) > /* vdbg("%s: urb 0x%p, status %d, actual_length %d", >__func__, urb, status, urb->actual_length); */ > > - /* Can be invoked from task context, protect against interrupts */ > + /* Can be invoked from work context, protect against interrupts */ "workqueue"? This too seems wrong. Same for other comment changes in this patch. thanks, greg k-h
Re: [PATCH 4/9] USB: Convert from tasklet to BH workqueue
Hi Allen, the usbatm bits look very reasonable to me. Unfortunately I don't have the hardware to test any more. Still, for what it's worth: Signed-off-by: Duncan Sands
[PATCH 5/9] mailbox: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/infiniband/* from tasklet to BH workqueue. Based on the work done by Tejun Heo Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 Signed-off-by: Allen Pais --- drivers/mailbox/bcm-pdc-mailbox.c | 21 +++-- drivers/mailbox/imx-mailbox.c | 16 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/mailbox/bcm-pdc-mailbox.c b/drivers/mailbox/bcm-pdc-mailbox.c index 1768d3d5aaa0..242e7504a628 100644 --- a/drivers/mailbox/bcm-pdc-mailbox.c +++ b/drivers/mailbox/bcm-pdc-mailbox.c @@ -43,6 +43,7 @@ #include #include #include +#include #define PDC_SUCCESS 0 @@ -293,8 +294,8 @@ struct pdc_state { unsigned int pdc_irq; - /* tasklet for deferred processing after DMA rx interrupt */ - struct tasklet_struct rx_tasklet; + /* work for deferred processing after DMA rx interrupt */ + struct work_struct rx_work; /* Number of bytes of receive status prior to each rx frame */ u32 rx_status_len; @@ -952,18 +953,18 @@ static irqreturn_t pdc_irq_handler(int irq, void *data) iowrite32(intstatus, pdcs->pdc_reg_vbase + PDC_INTSTATUS_OFFSET); /* Wakeup IRQ thread */ - tasklet_schedule(&pdcs->rx_tasklet); + queue_work(system_bh_wq, &pdcs->rx_work); return IRQ_HANDLED; } /** - * pdc_tasklet_cb() - Tasklet callback that runs the deferred processing after + * pdc_work_cb() - Work callback that runs the deferred processing after * a DMA receive interrupt. Reenables the receive interrupt.
* @t: Pointer to the Altera sSGDMA channel structure */ -static void pdc_tasklet_cb(struct tasklet_struct *t) +static void pdc_work_cb(struct work_struct *t) { - struct pdc_state *pdcs = from_tasklet(pdcs, t, rx_tasklet); + struct pdc_state *pdcs = from_work(pdcs, t, rx_work); pdc_receive(pdcs); @@ -1577,8 +1578,8 @@ static int pdc_probe(struct platform_device *pdev) pdc_hw_init(pdcs); - /* Init tasklet for deferred DMA rx processing */ - tasklet_setup(&pdcs->rx_tasklet, pdc_tasklet_cb); + /* Init work for deferred DMA rx processing */ + INIT_WORK(&pdcs->rx_work, pdc_work_cb); err = pdc_interrupts_init(pdcs); if (err) @@ -1595,7 +1596,7 @@ static int pdc_probe(struct platform_device *pdev) return PDC_SUCCESS; cleanup_buf_pool: - tasklet_kill(&pdcs->rx_tasklet); + cancel_work_sync(&pdcs->rx_work); dma_pool_destroy(pdcs->rx_buf_pool); cleanup_ring_pool: @@ -1611,7 +1612,7 @@ static void pdc_remove(struct platform_device *pdev) pdc_free_debugfs(); - tasklet_kill(&pdcs->rx_tasklet); + cancel_work_sync(&pdcs->rx_work); pdc_hw_disable(pdcs); diff --git a/drivers/mailbox/imx-mailbox.c b/drivers/mailbox/imx-mailbox.c index 5c1d09cad761..933727f89431 100644 --- a/drivers/mailbox/imx-mailbox.c +++ b/drivers/mailbox/imx-mailbox.c @@ -21,6 +21,7 @@ #include #include #include +#include #include "mailbox.h" @@ -80,7 +81,7 @@ struct imx_mu_con_priv { char irq_desc[IMX_MU_CHAN_NAME_SIZE]; enum imx_mu_chan_type type; struct mbox_chan *chan; - struct tasklet_struct txdb_tasklet; + struct work_struct txdb_work; }; struct imx_mu_priv { @@ -232,7 +233,7 @@ static int imx_mu_generic_tx(struct imx_mu_priv *priv, break; case IMX_MU_TYPE_TXDB: imx_mu_xcr_rmw(priv, IMX_MU_GCR, IMX_MU_xCR_GIRn(priv->dcfg->type, cp->idx), 0); - tasklet_schedule(&cp->txdb_tasklet); + queue_work(system_bh_wq, &cp->txdb_work); break; case IMX_MU_TYPE_TXDB_V2: imx_mu_xcr_rmw(priv, IMX_MU_GCR, IMX_MU_xCR_GIRn(priv->dcfg->type, cp->idx), 0); @@ -420,7 +421,7 @@ static int imx_mu_seco_tx(struct imx_mu_priv *priv, 
struct imx_mu_con_priv *cp, } /* Simulate hack
for mbox framework */ - tasklet_schedule(&cp->txdb_tasklet); + queue_work(system_bh_wq, &cp->txdb_work); break; default: @@ -484,9 +485,9 @@ static int imx_mu_seco_rxdb(struct imx_mu_priv *priv, struct imx_mu_con_priv *cp return err; } -static void imx_mu_txdb_tasklet(unsigned long data) +static void imx_mu_txdb_work(struct work_struct *t) { - struct imx_mu_con_priv *cp = (struct imx_mu_con_priv *)data; + struct imx_mu_con_priv *cp = from_work(cp, t, txdb_work); mbox_chan_txdone(cp->chan, 0); } @@ -570,8 +571,7 @@ static int imx_mu_startup(struct mbox_chan *chan) if (cp->type == IMX_MU_TYPE_TXDB)
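Each conversion in this series follows the same mapping:

- `struct tasklet_struct` becomes `struct work_struct`.
- tasklet_setup() becomes INIT_WORK().
- tasklet_schedule() becomes queue_work(system_bh_wq, ...).
- from_tasklet() becomes from_work().
- tasklet_kill() becomes cancel_work_sync().

The sketch below models that mapping in userspace with a hypothetical mock queue (the `mock_*` names) and a driver struct shaped like struct pdc_state. It only illustrates the shape of the API, not the BH workqueue implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

/* Simplified work item: callback plus a "pending" flag. */
struct work_struct {
	work_func_t func;
	bool pending;
};

#define INIT_WORK(w, f) do { (w)->func = (f); (w)->pending = false; } while (0)

/* Hypothetical mock standing in for system_bh_wq. */
#define MOCK_QLEN 8
static struct work_struct *mock_bh_wq[MOCK_QLEN];
static int mock_nqueued;

/* Like queue_work(): returns false if the item is already pending. */
static bool mock_queue_work(struct work_struct *w)
{
	if (w->pending || mock_nqueued == MOCK_QLEN)
		return false;
	w->pending = true;
	mock_bh_wq[mock_nqueued++] = w;
	return true;
}

/* Runs everything queued, as the BH context eventually would.
 * Callbacks must not requeue themselves in this mock. */
static void mock_run_bh(void)
{
	int n = mock_nqueued;

	mock_nqueued = 0;
	for (int i = 0; i < n; i++) {
		struct work_struct *w = mock_bh_wq[i];

		w->pending = false;
		w->func(w);
	}
}

/* Like cancel_work_sync() in this single-threaded mock: drop it if still queued. */
static void mock_cancel_work_sync(struct work_struct *w)
{
	int j = 0;

	for (int i = 0; i < mock_nqueued; i++)
		if (mock_bh_wq[i] != w)
			mock_bh_wq[j++] = mock_bh_wq[i];
	mock_nqueued = j;
	w->pending = false;
}

/* Hypothetical driver state shaped like struct pdc_state. */
struct demo_pdc_state {
	int rx_handled;
	struct work_struct rx_work;	/* was: struct tasklet_struct rx_tasklet */
};

/* Was pdc_tasklet_cb(); container_of() plays the role of from_work(). */
static void demo_pdc_work_cb(struct work_struct *t)
{
	struct demo_pdc_state *pdcs = container_of(t, struct demo_pdc_state, rx_work);

	pdcs->rx_handled++;
}
```

Like queue_work(), the mock refuses to queue an item that is already pending. Two interrupts before the BH runs therefore still produce only one callback, which matches the old tasklet_schedule() behaviour.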
[PATCH 7/9] s390: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/infiniband/* from tasklet to BH workqueue. Based on the work done by Tejun Heo Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 Note: Not tested. Please test/review. Signed-off-by: Allen Pais --- drivers/s390/block/dasd.c | 42 drivers/s390/block/dasd_int.h | 10 +++--- drivers/s390/char/con3270.c| 27 drivers/s390/crypto/ap_bus.c | 24 +++--- drivers/s390/crypto/ap_bus.h | 2 +- drivers/s390/crypto/zcrypt_msgtype50.c | 2 +- drivers/s390/crypto/zcrypt_msgtype6.c | 4 +-- drivers/s390/net/ctcm_fsms.c | 4 +-- drivers/s390/net/ctcm_main.c | 15 - drivers/s390/net/ctcm_main.h | 5 +-- drivers/s390/net/ctcm_mpc.c| 12 +++ drivers/s390/net/ctcm_mpc.h| 7 ++-- drivers/s390/net/lcs.c | 26 +++ drivers/s390/net/lcs.h | 2 +- drivers/s390/net/qeth_core_main.c | 2 +- drivers/s390/scsi/zfcp_qdio.c | 45 +- drivers/s390/scsi/zfcp_qdio.h | 9 +++--- 17 files changed, 117 insertions(+), 121 deletions(-) diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c index 0a97cfedd706..c6f9910f0a98 100644 --- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -54,8 +54,8 @@ MODULE_LICENSE("GPL"); * SECTION: prototypes for static functions of dasd.c */ static int dasd_flush_block_queue(struct dasd_block *); -static void dasd_device_tasklet(unsigned long); -static void dasd_block_tasklet(unsigned long); +static void dasd_device_work(struct work_struct *); +static void dasd_block_work(struct work_struct *); static void do_kick_device(struct work_struct *); static void do_reload_device(struct work_struct *); static void do_requeue_requests(struct work_struct *); @@ -114,9 +114,8 @@ struct dasd_device 
*dasd_alloc_device(void) dasd_init_chunklist(&device->erp_chunks, device->erp_mem, PAGE_SIZE); dasd_init_chunklist(&device->ese_chunks, device->ese_mem, PAGE_SIZE * 2); spin_lock_init(&device->mem_lock); - atomic_set(&device->tasklet_scheduled, 0); - tasklet_init(&device->tasklet, dasd_device_tasklet, -(unsigned long) device); + atomic_set(&device->work_scheduled, 0); + INIT_WORK(&device->bh, dasd_device_work); INIT_LIST_HEAD(&device->ccw_queue); timer_setup(&device->timer, dasd_device_timeout, 0); INIT_WORK(&device->kick_work, do_kick_device); @@ -154,9 +153,8 @@ struct dasd_block *dasd_alloc_block(void) /* open_count = 0 means device online but not in use */ atomic_set(&block->open_count, -1); - atomic_set(&block->tasklet_scheduled, 0); - tasklet_init(&block->tasklet, dasd_block_tasklet, -(unsigned long) block); + atomic_set(&block->work_scheduled, 0); + INIT_WORK(&block->bh, dasd_block_work); INIT_LIST_HEAD(&block->ccw_queue); spin_lock_init(&block->queue_lock); INIT_LIST_HEAD(&block->format_list); @@ -2148,12 +2146,12 @@ EXPORT_SYMBOL_GPL(dasd_flush_device_queue); /* * Acquire the device lock and process queues for the device. */ -static void dasd_device_tasklet(unsigned long data) +static void dasd_device_work(struct work_struct *t) { - struct dasd_device *device = (struct dasd_device *) data; + struct dasd_device *device = from_work(device, t, bh); struct list_head final_queue; - atomic_set (&device->tasklet_scheduled, 0); + atomic_set (&device->work_scheduled, 0); INIT_LIST_HEAD(&final_queue); spin_lock_irq(get_ccwdev_lock(device->cdev)); /* Check expire time of first request on the ccw queue. */ @@ -2174,15 +2172,15 @@ static void dasd_device_tasklet(unsigned long data) } /* - * Schedules a call to dasd_tasklet over the device tasklet. + * Schedules a call to dasd_work over the device wq. */ void dasd_schedule_device_bh(struct dasd_device *device) { /* Protect against rescheduling.
*/ - if (atomic_cmpxchg (&device->tasklet_scheduled, 0, 1) != 0) + if (atomic_cmpxchg (&device->work_scheduled, 0, 1) != 0) return; dasd_get_device(device); - tasklet_hi_schedule(&device->tasklet); + queue_work(system_bh_highpri_wq, &device->bh); } EXPORT_SYMBOL(dasd_schedule_device_bh); @@ -2595,7 +2593,7 @@ int dasd_sleep_on_immediatly(struct dasd_ccw_req *cqr) else rc = -EIO; - /* kick tasklets */ + /* kick works */ dasd_schedule_device_bh(device); if (device->block) dasd_schedule_block_bh(device->block); @@ -2891,15 +2889,15 @@ static void __dasd_block_start_head(struct dasd_block
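dasd_schedule_device_bh() calls atomic_cmpxchg() on `work_scheduled`, so that only the caller that flips it from 0 to 1 queues the work. dasd_device_work() clears the flag again before it processes the queues. A C11 sketch of that pattern follows. `queued_count` and the function names are hypothetical, and the sketch leaves out the device reference counting (dasd_get_device()).

```c
#include <stdatomic.h>

/* Stand-in for device->work_scheduled. */
static atomic_int work_scheduled;
/* Hypothetical counter standing in for queue_work(system_bh_highpri_wq, &device->bh). */
static int queued_count;

static void queue_bh(void)
{
	queued_count++;
}

/* Mirrors dasd_schedule_device_bh(): only the 0 -> 1 transition queues the work. */
static void schedule_device_bh(void)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&work_scheduled, &expected, 1))
		return;		/* already scheduled */
	queue_bh();
}

/* Mirrors the start of dasd_device_work(): clear the flag, then process queues. */
static void device_work(void)
{
	atomic_store(&work_scheduled, 0);
	/* ... process the device's ccw queue here ... */
}
```

Clearing the flag at the start of the work function, rather than at the end, means a request that arrives while the queues are being processed schedules another run instead of being missed.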
[PATCH 9/9] mmc: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/infiniband/* from tasklet to BH workqueue. Based on the work done by Tejun Heo Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 Signed-off-by: Allen Pais --- drivers/mmc/host/atmel-mci.c | 35 - drivers/mmc/host/au1xmmc.c| 37 - drivers/mmc/host/cb710-mmc.c | 15 ++-- drivers/mmc/host/cb710-mmc.h | 3 +- drivers/mmc/host/dw_mmc.c | 25 --- drivers/mmc/host/dw_mmc.h | 9 ++- drivers/mmc/host/omap.c | 17 +++-- drivers/mmc/host/renesas_sdhi.h | 3 +- drivers/mmc/host/renesas_sdhi_internal_dmac.c | 24 +++--- drivers/mmc/host/renesas_sdhi_sys_dmac.c | 9 +-- drivers/mmc/host/sdhci-bcm-kona.c | 2 +- drivers/mmc/host/tifm_sd.c| 15 ++-- drivers/mmc/host/tmio_mmc.h | 3 +- drivers/mmc/host/tmio_mmc_core.c | 4 +- drivers/mmc/host/uniphier-sd.c| 13 ++-- drivers/mmc/host/via-sdmmc.c | 25 --- drivers/mmc/host/wbsd.c | 75 ++- drivers/mmc/host/wbsd.h | 10 +-- 18 files changed, 167 insertions(+), 157 deletions(-) diff --git a/drivers/mmc/host/atmel-mci.c b/drivers/mmc/host/atmel-mci.c index dba826db739a..0a92a7fd020f 100644 --- a/drivers/mmc/host/atmel-mci.c +++ b/drivers/mmc/host/atmel-mci.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include @@ -284,12 +285,12 @@ struct atmel_mci_dma { * EVENT_DATA_ERROR is pending. * @stop_cmdr: Value to be loaded into CMDR when the stop command is * to be sent. - * @tasklet: Tasklet running the request state machine. + * @work: Work running the request state machine. * @pending_events: Bitmask of events flagged by the interrupt handler - * to be processed by the tasklet. + * to be processed by the work. 
* @completed_events: Bitmask of events which the state machine has * processed. - * @state: Tasklet state. + * @state: Work state. * @queue: List of slots waiting for access to the controller. * @need_clock_update: Update the clock rate before the next request. * @need_reset: Reset controller before next request. @@ -363,7 +364,7 @@ struct atmel_mci { u32 data_status; u32 stop_cmdr; - struct tasklet_struct tasklet; + struct work_struct work; unsigned long pending_events; unsigned long completed_events; enum atmel_mci_state state; @@ -761,7 +762,7 @@ static void atmci_timeout_timer(struct timer_list *t) host->need_reset = 1; host->state = STATE_END_REQUEST; smp_wmb(); - tasklet_schedule(&host->tasklet); + queue_work(system_bh_wq, &host->work); } static inline unsigned int atmci_ns_to_clocks(struct atmel_mci *host, @@ -983,7 +984,7 @@ static void atmci_pdc_complete(struct atmel_mci *host) dev_dbg(&host->pdev->dev, "(%s) set pending xfer complete\n", __func__); atmci_set_pending(host, EVENT_XFER_COMPLETE); - tasklet_schedule(&host->tasklet); + queue_work(system_bh_wq, &host->work); } static void atmci_dma_cleanup(struct atmel_mci *host) @@ -997,7 +998,7 @@ static void atmci_dma_cleanup(struct atmel_mci *host) } /* - * This function is called by the DMA driver from tasklet context. + * This function is called by the DMA driver from work context. */ static void atmci_dma_complete(void *arg) { @@ -1020,7 +1021,7 @@ static void atmci_dma_complete(void *arg) dev_dbg(&host->pdev->dev, "(%s) set pending xfer complete\n", __func__); atmci_set_pending(host, EVENT_XFER_COMPLETE); - tasklet_schedule(&host->tasklet); + queue_work(system_bh_wq, &host->work); /* * Regardless of what the documentation says, we have @@ -1033,7 +1034,7 @@ static void atmci_dma_complete(void *arg) * haven't seen all the potential error bits yet. * * The interrupt handler will schedule a different -* tasklet to finish things up when the data transfer +* work to finish things up when the data transfer * is completely done.
 	 *
 	 * We may not complete the mmc request here anyway
@@ -1765,9 +1766,9 @@ static void atmci_detect_change(struct timer_list *t)
 	}
 }
 
-static void atmci_tasklet_func(struct tasklet_struct *t)
+static
[PATCH 8/9] drivers/media/*: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context.

This patch converts drivers/media/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais
---
 drivers/media/pci/bt8xx/bt878.c | 8 ++--
 drivers/media/pci/bt8xx/bt878.h | 3 +-
 drivers/media/pci/bt8xx/dvb-bt8xx.c | 9 ++--
 drivers/media/pci/ddbridge/ddbridge.h | 3 +-
 drivers/media/pci/mantis/hopper_cards.c | 2 +-
 drivers/media/pci/mantis/mantis_cards.c | 2 +-
 drivers/media/pci/mantis/mantis_common.h | 3 +-
 drivers/media/pci/mantis/mantis_dma.c | 5 ++-
 drivers/media/pci/mantis/mantis_dma.h | 2 +-
 drivers/media/pci/mantis/mantis_dvb.c | 12 +++---
 drivers/media/pci/ngene/ngene-core.c | 23 ++-
 drivers/media/pci/ngene/ngene.h | 5 ++-
 drivers/media/pci/smipcie/smipcie-main.c | 18
 drivers/media/pci/smipcie/smipcie.h | 3 +-
 drivers/media/pci/ttpci/budget-av.c | 3 +-
 drivers/media/pci/ttpci/budget-ci.c | 27 ++--
 drivers/media/pci/ttpci/budget-core.c | 10 ++---
 drivers/media/pci/ttpci/budget.h | 5 ++-
 drivers/media/pci/tw5864/tw5864-core.c | 2 +-
 drivers/media/pci/tw5864/tw5864-video.c | 13 +++---
 drivers/media/pci/tw5864/tw5864.h | 7 ++--
 drivers/media/platform/intel/pxa_camera.c | 15 +++
 drivers/media/platform/marvell/mcam-core.c | 11 ++---
 drivers/media/platform/marvell/mcam-core.h | 3 +-
 .../st/sti/c8sectpfe/c8sectpfe-core.c | 15 +++
 .../st/sti/c8sectpfe/c8sectpfe-core.h | 2 +-
 drivers/media/radio/wl128x/fmdrv.h | 7 ++--
 drivers/media/radio/wl128x/fmdrv_common.c | 41 ++-
 drivers/media/rc/mceusb.c | 2 +-
 drivers/media/usb/ttusb-dec/ttusb_dec.c | 21 +-
 30 files changed, 151 insertions(+), 131 deletions(-)

diff --git a/drivers/media/pci/bt8xx/bt878.c b/drivers/media/pci/bt8xx/bt878.c
index 90972d6952f1..983ec29108f0 100644
--- a/drivers/media/pci/bt8xx/bt878.c
+++ b/drivers/media/pci/bt8xx/bt878.c
@@ -300,8 +300,8 @@ static irqreturn_t bt878_irq(int irq, void *dev_id)
 		}
 		if (astat & BT878_ARISCI) {
 			bt->finished_block = (stat & BT878_ARISCS) >> 28;
-			if (bt->tasklet.callback)
-				tasklet_schedule(&bt->tasklet);
+			if (bt->work.func)
+				queue_work(system_bh_wq, &bt->work);
 			break;
 		}
 		count++;
@@ -478,8 +478,8 @@ static int bt878_probe(struct pci_dev *dev, const struct pci_device_id *pci_id)
 	btwrite(0, BT878_AINT_MASK);
 	bt878_num++;
 
-	if (!bt->tasklet.func)
-		tasklet_disable(&bt->tasklet);
+	if (!bt->work.func)
+		disable_work_sync(&bt->work);
 
 	return 0;
 
diff --git a/drivers/media/pci/bt8xx/bt878.h b/drivers/media/pci/bt8xx/bt878.h
index fde8db293c54..b9ce78e5116b 100644
--- a/drivers/media/pci/bt8xx/bt878.h
+++ b/drivers/media/pci/bt8xx/bt878.h
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 
 #include "bt848.h"
 #include "bttv.h"
@@ -120,7 +121,7 @@ struct bt878 {
 	dma_addr_t risc_dma;
 	u32 risc_pos;
 
-	struct tasklet_struct tasklet;
+	struct work_struct work;
 	int shutdown;
 };
 
diff --git a/drivers/media/pci/bt8xx/dvb-bt8xx.c b/drivers/media/pci/bt8xx/dvb-bt8xx.c
index 390cbba6c065..8c0e1fa764a4 100644
--- a/drivers/media/pci/bt8xx/dvb-bt8xx.c
+++ b/drivers/media/pci/bt8xx/dvb-bt8xx.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -39,9 +40,9 @@ DVB_DEFINE_MOD_OPT_ADAPTER_NR(adapter_nr);
 
 #define IF_FREQUENCYx6 217 /* 6 * 36.167MHz */
 
-static void dvb_bt8xx_task(struct tasklet_struct *t)
+static void dvb_bt8xx_task(struct work_struct *t)
 {
-	struct bt878 *bt = from_tasklet(bt, t, tasklet);
+	struct bt878 *bt = from_work(bt, t, work);
 	struct dvb_bt8xx_card *card = dev_get_drvdata(&bt->adapter->dev);
 
 	dprintk("%d\n", card->bt->finished_block);
@@ -782,7 +783,7 @@ static int dvb_bt8xx_load_card(struct dvb_bt8xx_card *card, u32 type)
 		goto err_disconnect_frontend;
 	}
 
-	tasklet_setup(&card->bt->tasklet, dvb_bt8xx_task);
+	INIT_WORK(&card->bt->work, dvb_bt8xx_task);
 
 	frontend_init(card, type);
 
@@ -922,7 +923,7 @@ static void dvb_bt8xx_remove(struct bttv_sub_device *sub)
 	dprintk("dvb_bt8xx: unloading
[PATCH 6/9] ipmi: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context.

This patch converts drivers/char/ipmi/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais
---
 drivers/char/ipmi/ipmi_msghandler.c | 30 ++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
index b0eedc4595b3..fce2a2dbdc82 100644
--- a/drivers/char/ipmi/ipmi_msghandler.c
+++ b/drivers/char/ipmi/ipmi_msghandler.c
@@ -36,12 +36,13 @@
 #include
 #include
 #include
+#include
 
 #define IPMI_DRIVER_VERSION "39.2"
 
 static struct ipmi_recv_msg *ipmi_alloc_recv_msg(void);
 static int ipmi_init_msghandler(void);
-static void smi_recv_tasklet(struct tasklet_struct *t);
+static void smi_recv_work(struct work_struct *t);
 static void handle_new_recv_msgs(struct ipmi_smi *intf);
 static void need_waiter(struct ipmi_smi *intf);
 static int handle_one_recv_msg(struct ipmi_smi *intf,
@@ -498,13 +499,13 @@ struct ipmi_smi {
 	/*
 	 * Messages queued for delivery. If delivery fails (out of memory
 	 * for instance), They will stay in here to be processed later in a
-	 * periodic timer interrupt. The tasklet is for handling received
+	 * periodic timer interrupt. The work is for handling received
 	 * messages directly from the handler.
 	 */
 	spinlock_t waiting_rcv_msgs_lock;
 	struct list_head waiting_rcv_msgs;
 	atomic_t watchdog_pretimeouts_to_deliver;
-	struct tasklet_struct recv_tasklet;
+	struct work_struct recv_work;
 
 	spinlock_t xmit_msgs_lock;
 	struct list_head xmit_msgs;
@@ -704,7 +705,7 @@ static void clean_up_interface_data(struct ipmi_smi *intf)
 	struct cmd_rcvr *rcvr, *rcvr2;
 	struct list_head list;
 
-	tasklet_kill(&intf->recv_tasklet);
+	cancel_work_sync(&intf->recv_work);
 
 	free_smi_msg_list(&intf->waiting_rcv_msgs);
 	free_recv_msg_list(&intf->waiting_events);
@@ -1319,7 +1320,7 @@ static void free_user(struct kref *ref)
 {
 	struct ipmi_user *user = container_of(ref, struct ipmi_user, refcount);
 
-	/* SRCU cleanup must happen in task context. */
+	/* SRCU cleanup must happen in work context. */
 	queue_work(remove_work_wq, &user->remove_work);
 }
 
@@ -3605,8 +3606,7 @@ int ipmi_add_smi(struct module *owner,
 	intf->curr_seq = 0;
 	spin_lock_init(&intf->waiting_rcv_msgs_lock);
 	INIT_LIST_HEAD(&intf->waiting_rcv_msgs);
-	tasklet_setup(&intf->recv_tasklet,
-		     smi_recv_tasklet);
+	INIT_WORK(&intf->recv_work, smi_recv_work);
 	atomic_set(&intf->watchdog_pretimeouts_to_deliver, 0);
 	spin_lock_init(&intf->xmit_msgs_lock);
 	INIT_LIST_HEAD(&intf->xmit_msgs);
@@ -4779,7 +4779,7 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf)
 			 * To preserve message order, quit if we
 			 * can't handle a message. Add the message
 			 * back at the head, this is safe because this
-			 * tasklet is the only thing that pulls the
+			 * work is the only thing that pulls the
 			 * messages.
 			 */
 			list_add(&smi_msg->link, &intf->waiting_rcv_msgs);
@@ -4812,10 +4812,10 @@ static void handle_new_recv_msgs(struct ipmi_smi *intf)
 	}
 }
 
-static void smi_recv_tasklet(struct tasklet_struct *t)
+static void smi_recv_work(struct work_struct *t)
 {
 	unsigned long flags = 0; /* keep us warning-free. */
-	struct ipmi_smi *intf = from_tasklet(intf, t, recv_tasklet);
+	struct ipmi_smi *intf = from_work(intf, t, recv_work);
 	int run_to_completion = intf->run_to_completion;
 	struct ipmi_smi_msg *newmsg = NULL;
 
@@ -4866,7 +4866,7 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf,
 	/*
 	 * To preserve message order, we keep a queue and deliver from
-	 * a tasklet.
+	 * a work.
 	 */
 	if (!run_to_completion)
 		spin_lock_irqsave(&intf->waiting_rcv_msgs_lock, flags);
@@ -4887,9 +4887,9 @@ void ipmi_smi_msg_received(struct ipmi_smi *intf,
 		spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags);
 
 	if (run_to_completion)
-		smi_recv_tasklet(&intf->recv_tasklet);
+		smi_recv_work(&intf->recv_work);
 	else
-		tasklet_schedule(&intf->recv_tasklet);
+		queue_work(system_bh_wq, &intf->recv_work);
 }
 EXPORT_SYMBOL(ipmi_smi_msg_received);
 
@@ -4899,7 +4899,7 @@ void ipmi_smi_watchdog_pretimeout(struct
[PATCH 4/9] USB: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context.

This patch converts drivers/usb/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais
---
 drivers/usb/atm/usbatm.c | 55 +++--
 drivers/usb/atm/usbatm.h | 3 +-
 drivers/usb/core/hcd.c | 22 ++--
 drivers/usb/gadget/udc/fsl_qe_udc.c | 21 +--
 drivers/usb/gadget/udc/fsl_qe_udc.h | 4 +--
 drivers/usb/host/ehci-sched.c | 2 +-
 drivers/usb/host/fhci-hcd.c | 3 +-
 drivers/usb/host/fhci-sched.c | 10 +++---
 drivers/usb/host/fhci.h | 5 +--
 drivers/usb/host/xhci-dbgcap.h | 3 +-
 drivers/usb/host/xhci-dbgtty.c | 15
 include/linux/usb/cdc_ncm.h | 2 +-
 include/linux/usb/usbnet.h | 2 +-
 13 files changed, 76 insertions(+), 71 deletions(-)

diff --git a/drivers/usb/atm/usbatm.c b/drivers/usb/atm/usbatm.c
index 2da6615fbb6f..74849f24e52e 100644
--- a/drivers/usb/atm/usbatm.c
+++ b/drivers/usb/atm/usbatm.c
@@ -17,7 +17,7 @@
  * - Removed the limit on the number of devices
  * - Module now autoloads on device plugin
  * - Merged relevant parts of sarlib
- * - Replaced the kernel thread with a tasklet
+ * - Replaced the kernel thread with a work
  * - New packet transmission code
  * - Changed proc file contents
  * - Fixed all known SMP races
@@ -68,6 +68,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef VERBOSE_DEBUG
 static int usbatm_print_packet(struct usbatm_data *instance, const unsigned char *data, int len);
@@ -249,7 +250,7 @@ static void usbatm_complete(struct urb *urb)
 	/* vdbg("%s: urb 0x%p, status %d, actual_length %d",
 	     __func__, urb, status, urb->actual_length); */
 
-	/* Can be invoked from task context, protect against interrupts */
+	/* Can be invoked from work context, protect against interrupts */
 	spin_lock_irqsave(&channel->lock, flags);
 
 	/* must add to the back when receiving; doesn't matter when sending */
@@ -269,7 +270,7 @@ static void usbatm_complete(struct urb *urb)
 		/* throttle processing in case of an error */
 		mod_timer(&channel->delay, jiffies + msecs_to_jiffies(THROTTLE_MSECS));
 	} else
-		tasklet_schedule(&channel->tasklet);
+		queue_work(system_bh_wq, &channel->work);
 }
 
 
@@ -511,10 +512,10 @@ static unsigned int usbatm_write_cells(struct usbatm_data *instance,
 ** receive **
 **/
 
-static void usbatm_rx_process(struct tasklet_struct *t)
+static void usbatm_rx_process(struct work_struct *t)
 {
-	struct usbatm_data *instance = from_tasklet(instance, t,
-						    rx_channel.tasklet);
+	struct usbatm_data *instance = from_work(instance, t,
+						 rx_channel.work);
 	struct urb *urb;
 
 	while ((urb = usbatm_pop_urb(&instance->rx_channel))) {
@@ -565,10 +566,10 @@ static void usbatm_rx_process(struct tasklet_struct *t)
 ** send **
 ***/
 
-static void usbatm_tx_process(struct tasklet_struct *t)
+static void usbatm_tx_process(struct work_struct *t)
 {
-	struct usbatm_data *instance = from_tasklet(instance, t,
-						    tx_channel.tasklet);
+	struct usbatm_data *instance = from_work(instance, t,
+						 tx_channel.work);
 	struct sk_buff *skb = instance->current_skb;
 	struct urb *urb = NULL;
 	const unsigned int buf_size = instance->tx_channel.buf_size;
@@ -632,13 +633,13 @@ static void usbatm_cancel_send(struct usbatm_data *instance,
 	}
 	spin_unlock_irq(&instance->sndqueue.lock);
 
-	tasklet_disable(&instance->tx_channel.tasklet);
+	disable_work_sync(&instance->tx_channel.work);
 	if ((skb = instance->current_skb) && (UDSL_SKB(skb)->atm.vcc == vcc)) {
 		atm_dbg(instance, "%s: popping current skb (0x%p)\n", __func__, skb);
 		instance->current_skb = NULL;
 		usbatm_pop(vcc, skb);
 	}
-	tasklet_enable(&instance->tx_channel.tasklet);
+	enable_and_queue_work(system_bh_wq, &instance->tx_channel.work);
 }
 
 static int usbatm_atm_send(struct atm_vcc *vcc, struct sk_buff *skb)
@@ -677,7 +678,7 @@ static int usbatm_atm_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	ctrl->crc = crc32_be(~0, skb->data, skb->len);
 
 	skb_queue_tail(&instance->sndqueue, skb);
-	tasklet_schedule(&instance->tx_channel.tasklet);
+	queue_work(system_bh_wq,
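The USB hunk above replaces the tasklet_disable()/tasklet_enable() pair in usbatm_cancel_send() with disable_work_sync()/enable_and_queue_work(). The two pairs are not obviously equivalent, so here is a small userspace model of the bookkeeping, written in plain C rather than kernel code; every `model_*` name is hypothetical. It assumes the semantics as we read the workqueue documentation: disabling a work item cancels a pending instance and makes queue attempts fail, and enable_and_queue_work() queues the item whenever the disable depth drops back to zero. A tasklet, by contrast, keeps its scheduled state while disabled, and tasklet_enable() schedules nothing by itself. Concurrency and real BH execution are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a BH work item: counters only, no concurrency. */
struct model_work {
	int disable_depth;	/* > 0: queueing attempts fail */
	bool pending;		/* queued, handler not yet run */
	int runs;		/* handler invocations so far */
};

/* Like queue_work(): fails while disabled, coalesces if already pending. */
static bool model_queue_work(struct model_work *w)
{
	if (w->disable_depth > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Like disable_work_sync(): bump the depth and cancel a pending instance. */
static void model_disable_work_sync(struct model_work *w)
{
	w->disable_depth++;
	w->pending = false;
}

/* Like enable_and_queue_work(): drop the depth; queue whenever it hits 0. */
static bool model_enable_and_queue_work(struct model_work *w)
{
	assert(w->disable_depth > 0);
	if (--w->disable_depth > 0)
		return false;
	model_queue_work(w);
	return true;
}

/* Stand-in for the BH eventually running the item. */
static void model_run_work(struct model_work *w)
{
	if (w->pending) {
		w->pending = false;
		w->runs++;
	}
}

/* For comparison: a tasklet keeps its scheduled state while disabled. */
struct model_tasklet {
	int disable_count;
	bool scheduled;
	int runs;
};

static void model_tasklet_schedule(struct model_tasklet *t)
{
	t->scheduled = true;
}

static void model_tasklet_disable(struct model_tasklet *t)
{
	t->disable_count++;
}

/* tasklet_enable() only drops the count; it schedules nothing itself. */
static void model_tasklet_enable(struct model_tasklet *t)
{
	assert(t->disable_count > 0);
	t->disable_count--;
}

static void model_run_tasklet(struct model_tasklet *t)
{
	if (t->disable_count == 0 && t->scheduled) {
		t->scheduled = false;
		t->runs++;
	}
}
```

Under these assumptions, a work item that was pending when disabled is dropped and then requeued on enable, while one that was idle gets an extra run. A handler that tolerates spurious invocations is unaffected, but that is worth confirming per driver.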
[PATCH 3/9] IB: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context.

This patch converts drivers/infiniband/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais
---
 drivers/infiniband/hw/bnxt_re/bnxt_re.h | 3 +-
 drivers/infiniband/hw/bnxt_re/qplib_fp.c | 21 ++--
 drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +-
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 25 ---
 drivers/infiniband/hw/bnxt_re/qplib_rcfw.h | 2 +-
 drivers/infiniband/hw/erdma/erdma.h | 3 +-
 drivers/infiniband/hw/erdma/erdma_eq.c | 11 ---
 drivers/infiniband/hw/hfi1/rc.c | 2 +-
 drivers/infiniband/hw/hfi1/sdma.c | 37 +++---
 drivers/infiniband/hw/hfi1/sdma.h | 9 +++---
 drivers/infiniband/hw/hfi1/tid_rdma.c | 6 ++--
 drivers/infiniband/hw/irdma/ctrl.c | 2 +-
 drivers/infiniband/hw/irdma/hw.c | 24 +++---
 drivers/infiniband/hw/irdma/main.h | 5 +--
 drivers/infiniband/hw/qib/qib.h | 7 ++--
 drivers/infiniband/hw/qib/qib_iba7322.c | 9 +++---
 drivers/infiniband/hw/qib/qib_rc.c | 16 +-
 drivers/infiniband/hw/qib/qib_ruc.c | 4 +--
 drivers/infiniband/hw/qib/qib_sdma.c | 11 ---
 drivers/infiniband/sw/rdmavt/qp.c | 2 +-
 20 files changed, 106 insertions(+), 95 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index 9dca451ed522..f511c8415806 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -42,6 +42,7 @@
 #include
 #include "hw_counters.h"
 #include
+#include
 #define ROCE_DRV_MODULE_NAME "bnxt_re"
 
 #define BNXT_RE_DESC "Broadcom NetXtreme-C/E RoCE Driver"
@@ -162,7 +163,7 @@ struct bnxt_re_dev {
 	u8 cur_prio_map;
 
 	/* FP Notification Queue (CQ & SRQ) */
-	struct tasklet_struct nq_task;
+	struct work_struct nq_work;
 
 	/* RCFW Channel */
 	struct bnxt_qplib_rcfw rcfw;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index 439d0c7c5d0c..052906982cdf 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -46,6 +46,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "roce_hsi.h"
@@ -294,9 +295,9 @@ static void __wait_for_all_nqes(struct bnxt_qplib_cq *cq, u16 cnq_events)
 	}
 }
 
-static void bnxt_qplib_service_nq(struct tasklet_struct *t)
+static void bnxt_qplib_service_nq(struct work_struct *t)
 {
-	struct bnxt_qplib_nq *nq = from_tasklet(nq, t, nq_tasklet);
+	struct bnxt_qplib_nq *nq = from_work(nq, t, nq_work);
 	struct bnxt_qplib_hwq *hwq = &nq->hwq;
 	struct bnxt_qplib_cq *cq;
 	int budget = nq->budget;
@@ -394,7 +395,7 @@ void bnxt_re_synchronize_nq(struct bnxt_qplib_nq *nq)
 	int budget = nq->budget;
 
 	nq->budget = nq->hwq.max_elements;
-	bnxt_qplib_service_nq(&nq->nq_tasklet);
+	bnxt_qplib_service_nq(&nq->nq_work);
 	nq->budget = budget;
 }
 
@@ -409,7 +410,7 @@ static irqreturn_t bnxt_qplib_nq_irq(int irq, void *dev_instance)
 	prefetch(bnxt_qplib_get_qe(hwq, sw_cons, NULL));
 
 	/* Fan out to CPU affinitized kthreads? */
-	tasklet_schedule(&nq->nq_tasklet);
+	queue_work(system_bh_wq, &nq->nq_work);
 
 	return IRQ_HANDLED;
 }
@@ -430,8 +431,8 @@ void bnxt_qplib_nq_stop_irq(struct bnxt_qplib_nq *nq, bool kill)
 	nq->name = NULL;
 
 	if (kill)
-		tasklet_kill(&nq->nq_tasklet);
-	tasklet_disable(&nq->nq_tasklet);
+		cancel_work_sync(&nq->nq_work);
+	disable_work_sync(&nq->nq_work);
 }
 
 void bnxt_qplib_disable_nq(struct bnxt_qplib_nq *nq)
@@ -465,9 +466,9 @@ int bnxt_qplib_nq_start_irq(struct bnxt_qplib_nq *nq, int nq_indx,
 	nq->msix_vec = msix_vector;
 	if (need_init)
-		tasklet_setup(&nq->nq_tasklet, bnxt_qplib_service_nq);
+		INIT_WORK(&nq->nq_work, bnxt_qplib_service_nq);
 	else
-		tasklet_enable(&nq->nq_tasklet);
+		enable_and_queue_work(system_bh_wq, &nq->nq_work);
 
 	nq->name = kasprintf(GFP_KERNEL, "bnxt_re-nq-%d@pci:%s",
 			     nq_indx, pci_name(res->pdev));
@@ -477,7 +478,7 @@ int bnxt_qplib_nq_start_irq(struct bnxt_qplib_nq *nq, int nq_indx,
 	if (rc) {
 		kfree(nq->name);
 		nq->name = NULL;
-		tasklet_disable(&nq->nq_tasklet);
+		disable_work_sync(&nq->nq_work);
[PATCH 2/9] dma: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context.

This patch converts drivers/dma/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo
Branch: git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais
---
 drivers/dma/altera-msgdma.c | 15
 drivers/dma/apple-admac.c | 15
 drivers/dma/at_hdmac.c | 2 +-
 drivers/dma/at_xdmac.c | 15
 drivers/dma/bcm2835-dma.c | 2 +-
 drivers/dma/dma-axi-dmac.c | 2 +-
 drivers/dma/dma-jz4780.c | 2 +-
 .../dma/dw-axi-dmac/dw-axi-dmac-platform.c | 2 +-
 drivers/dma/dw-edma/dw-edma-core.c | 2 +-
 drivers/dma/dw/core.c | 13 +++
 drivers/dma/dw/regs.h | 3 +-
 drivers/dma/ep93xx_dma.c | 15
 drivers/dma/fsl-edma-common.c | 2 +-
 drivers/dma/fsl-qdma.c | 2 +-
 drivers/dma/fsl_raid.c | 11 +++---
 drivers/dma/fsl_raid.h | 2 +-
 drivers/dma/fsldma.c | 15
 drivers/dma/fsldma.h | 3 +-
 drivers/dma/hisi_dma.c | 2 +-
 drivers/dma/hsu/hsu.c | 2 +-
 drivers/dma/idma64.c | 4 +--
 drivers/dma/img-mdc-dma.c | 2 +-
 drivers/dma/imx-dma.c | 27 +++---
 drivers/dma/imx-sdma.c | 6 ++--
 drivers/dma/ioat/dma.c | 17 -
 drivers/dma/ioat/dma.h | 5 +--
 drivers/dma/ioat/init.c | 2 +-
 drivers/dma/k3dma.c | 19 +-
 drivers/dma/mediatek/mtk-cqdma.c | 35 ++-
 drivers/dma/mediatek/mtk-hsdma.c | 2 +-
 drivers/dma/mediatek/mtk-uart-apdma.c | 4 +--
 drivers/dma/mmp_pdma.c | 13 +++
 drivers/dma/mmp_tdma.c | 11 +++---
 drivers/dma/mpc512x_dma.c | 17 -
 drivers/dma/mv_xor.c | 13 +++
 drivers/dma/mv_xor.h | 5 +--
 drivers/dma/mv_xor_v2.c | 23 ++--
 drivers/dma/mxs-dma.c | 13 +++
 drivers/dma/nbpfaxi.c | 15
 drivers/dma/owl-dma.c | 2 +-
 drivers/dma/pch_dma.c | 17 -
 drivers/dma/pl330.c | 31
 drivers/dma/plx_dma.c | 13 +++
 drivers/dma/ppc4xx/adma.c | 17 -
 drivers/dma/ppc4xx/adma.h | 5 +--
 drivers/dma/pxa_dma.c | 2 +-
 drivers/dma/qcom/bam_dma.c | 35 ++-
 drivers/dma/qcom/gpi.c | 18 +-
 drivers/dma/qcom/hidma.c | 11 +++---
 drivers/dma/qcom/hidma.h | 5 +--
 drivers/dma/qcom/hidma_ll.c | 11 +++---
 drivers/dma/qcom/qcom_adm.c | 2 +-
 drivers/dma/sa11x0-dma.c | 27 +++---
 drivers/dma/sf-pdma/sf-pdma.c | 23 ++--
 drivers/dma/sf-pdma/sf-pdma.h | 5 +--
 drivers/dma/sprd-dma.c | 2 +-
 drivers/dma/st_fdma.c | 2 +-
 drivers/dma/ste_dma40.c | 17 -
 drivers/dma/sun6i-dma.c | 33 -
 drivers/dma/tegra186-gpc-dma.c | 2 +-
 drivers/dma/tegra20-apb-dma.c | 19 +-
 drivers/dma/tegra210-adma.c | 2 +-
 drivers/dma/ti/edma.c | 2 +-
 drivers/dma/ti/k3-udma.c | 11 +++---
 drivers/dma/ti/omap-dma.c | 2 +-
 drivers/dma/timb_dma.c | 23 ++--
 drivers/dma/txx9dmac.c | 29 +++
 drivers/dma/txx9dmac.h | 5 +--
 drivers/dma/virt-dma.c | 9 ++---
 drivers/dma/virt-dma.h | 9 ++---
 drivers/dma/xgene-dma.c | 21 +--
 drivers/dma/xilinx/xilinx_dma.c | 23 ++--
 drivers/dma/xilinx/xilinx_dpdma.c | 21 +--
 drivers/dma/xilinx/zynqmp_dma.c | 21 +--
 74 files changed, 442 insertions(+), 395 deletions(-)

diff --git
[PATCH 1/9] hyperv: Convert from tasklet to BH workqueue
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context.

This patch converts drivers/hv/* from tasklet to BH workqueue.

Based on the work done by Tejun Heo
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais
---
 drivers/hv/channel.c | 8
 drivers/hv/channel_mgmt.c | 5 ++---
 drivers/hv/connection.c | 9 +
 drivers/hv/hv.c | 3 +--
 drivers/hv/hv_balloon.c | 4 ++--
 drivers/hv/hv_fcopy.c | 8
 drivers/hv/hv_kvp.c | 8
 drivers/hv/hv_snapshot.c | 8
 drivers/hv/hyperv_vmbus.h | 9 +
 drivers/hv/vmbus_drv.c | 19 ++-
 include/linux/hyperv.h | 2 +-
 11 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index adbf674355b2..876d78eb4dce 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -859,7 +859,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 	unsigned long flags;
 
 	/*
-	 * vmbus_on_event(), running in the per-channel tasklet, can race
+	 * vmbus_on_event(), running in the per-channel work, can race
 	 * with vmbus_close_internal() in the case of SMP guest, e.g., when
 	 * the former is accessing channel->inbound.ring_buffer, the latter
 	 * could be freeing the ring_buffer pages, so here we must stop it
@@ -871,7 +871,7 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 	 * and that the channel ring buffer is no longer being accessed, cf.
 	 * the calls to napi_disable() in netvsc_device_remove().
 	 */
-	tasklet_disable(&channel->callback_event);
+	disable_work_sync(&channel->callback_event);
 
 	/* See the inline comments in vmbus_chan_sched(). */
 	spin_lock_irqsave(&channel->sched_lock, flags);
@@ -880,8 +880,8 @@ void vmbus_reset_channel_cb(struct vmbus_channel *channel)
 
 	channel->sc_creation_callback = NULL;
 
-	/* Re-enable tasklet for use on re-open */
-	tasklet_enable(&channel->callback_event);
+	/* Re-enable work for use on re-open */
+	enable_and_queue_work(system_bh_wq, &channel->callback_event);
 }
 
 static int vmbus_close_internal(struct vmbus_channel *channel)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 2f4d09ce027a..58397071a0de 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -353,8 +353,7 @@ static struct vmbus_channel *alloc_channel(void)
 
 	INIT_LIST_HEAD(&channel->sc_list);
 
-	tasklet_init(&channel->callback_event,
-		     vmbus_on_event, (unsigned long)channel);
+	INIT_WORK(&channel->callback_event, vmbus_on_event);
 
 	hv_ringbuffer_pre_init(channel);
 
@@ -366,7 +365,7 @@ static struct vmbus_channel *alloc_channel(void)
  */
 static void free_channel(struct vmbus_channel *channel)
 {
-	tasklet_kill(&channel->callback_event);
+	cancel_work_sync(&channel->callback_event);
 	vmbus_remove_channel_attr_group(channel);
 
 	kobject_put(&channel->kobj);
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 3cabeeabb1ca..f2a3394a8303 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -372,12 +372,13 @@ struct vmbus_channel *relid2channel(u32 relid)
  * 3. Once we return, enable signaling from the host. Once this
  *    state is set we check to see if additional packets are
  *    available to read. In this case we repeat the process.
- *    If this tasklet has been running for a long time
+ *    If this work has been running for a long time
  *    then reschedule ourselves.
  */
-void vmbus_on_event(unsigned long data)
+void vmbus_on_event(struct work_struct *t)
 {
-	struct vmbus_channel *channel = (void *) data;
+	struct vmbus_channel *channel = from_work(channel, t,
+						  callback_event);
 	void (*callback_fn)(void *context);
 
 	trace_vmbus_on_event(channel);
@@ -401,7 +402,7 @@ void vmbus_on_event(unsigned long data)
 		return;
 
 	hv_begin_read(&channel->inbound);
-	tasklet_schedule(&channel->callback_event);
+	queue_work(system_bh_wq, &channel->callback_event);
 }
 
 /*
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a8ad728354cb..2af92f08f9ce 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -119,8 +119,7 @@ int hv_synic_alloc(void)
 	for_each_present_cpu(cpu) {
 		hv_cpu = per_cpu_ptr(hv_context.cpu_context, cpu);
 
-		tasklet_init(&hv_cpu->msg_dpc,
-			     vmbus_on_msg_dpc, (unsigned long) hv_cpu);
+		INIT_WORK(&hv_cpu->msg_dpc, vmbus_on_msg_dpc);
 
 		if (ms_hyperv.paravisor_present && hv_isolation_type_tdx()) {
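The connection.c hunk changes vmbus_on_event() from taking an `unsigned long data` cookie (cast back to the channel pointer) to taking the `struct work_struct *` itself and recovering the channel with from_work(), which in the kernel is a container_of() wrapper. The sketch below reproduces that pattern in userspace with minimal stand-in types. `struct demo_channel`, the `demo_*` functions and the local container_of()/from_work() definitions are illustrative assumptions, not the kernel headers, and the code relies on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in; the kernel's struct work_struct has more fields. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Local copies of the usual definitions, for userspace only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define from_work(var, callback_work, work_fieldname) \
	container_of(callback_work, __typeof__(*(var)), work_fieldname)

/* Hypothetical, cut-down channel; the work item is embedded, not pointed to. */
struct demo_channel {
	unsigned int relid;
	struct work_struct callback_event;
	int events_handled;
};

/* Old tasklet_init() style: the context travels as an unsigned long cookie. */
static void demo_on_event_legacy(unsigned long data)
{
	struct demo_channel *channel = (struct demo_channel *)data;

	channel->events_handled++;
}

/* Converted style: step back from the embedded member to its container. */
static void demo_on_event(struct work_struct *t)
{
	struct demo_channel *channel = from_work(channel, t, callback_event);

	channel->events_handled++;
}

/* Stand-in for INIT_WORK(): just records the handler. */
static void demo_init_work(struct work_struct *w,
			   void (*fn)(struct work_struct *))
{
	w->func = fn;
}
```

Because the work item is embedded in the channel, no separate context pointer is needed; the offset of `callback_event` inside the structure is enough to get back to the channel.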
[PATCH 0/9] Convert Tasklets to BH Workqueues
This patch series represents a significant shift in how asynchronous execution in the bottom half (BH) context is handled within the kernel. Traditionally, tasklets have been the go-to mechanism for such operations. This series converts existing tasklet users to the newly supported BH workqueues, changing how these deferred tasks are managed and executed.

Background and Motivation:
Tasklets have served as the kernel's lightweight mechanism for scheduling bottom-half processing, providing a simple interface for deferring work from interrupt context. However, tasklets are now marked deprecated and have known design flaws, and there is growing interest in eventually removing them in favor of more modern and flexible mechanisms.

Introduction of BH Workqueues:
BH workqueues behave similarly to regular workqueues, except that the queued work items are executed in the BH context.

Conversion Details:
The conversion process involved identifying tasklet users in the subsystems covered by this series and replacing them with BH workqueue implementations. This patch series is a first step toward broader adoption of BH workqueues across the kernel; other subsystems that use tasklets will undergo a similar transition, and the groundwork laid here could serve as a blueprint for those conversions.

Testing Request:
In addition to a thorough review of these changes, I kindly request that reviewers perform both functional and performance testing of this patch series, in particular benchmarks that measure interrupt handling efficiency, latency, and throughput.

I welcome your feedback, suggestions, and any further discussion on this patch series.
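As a quick reference, the table below collects the tasklet-to-BH-workqueue substitutions that appear in the diffs quoted in this thread, expressed as a small C lookup table. It is compiled from those patches only, so it is neither exhaustive nor an official mapping; `struct api_subst` and `bh_wq_replacement_for()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One substitution as it appears in the patches (hypothetical type). */
struct api_subst {
	const char *tasklet_api;
	const char *bh_wq_replacement;
};

/* Collected from the diffs in this thread; not an exhaustive or official list. */
static const struct api_subst conversions[] = {
	{ "struct tasklet_struct", "struct work_struct" },
	{ "tasklet_setup",         "INIT_WORK" },
	{ "tasklet_init",          "INIT_WORK" },
	{ "from_tasklet",          "from_work" },
	{ "tasklet_schedule",      "queue_work(system_bh_wq, ...)" },
	{ "tasklet_kill",          "cancel_work_sync" },
	{ "tasklet_disable",       "disable_work_sync" },
	{ "tasklet_enable",        "enable_and_queue_work(system_bh_wq, ...)" },
};

/* Returns the replacement used in this series, or NULL if none was seen. */
static const char *bh_wq_replacement_for(const char *tasklet_api)
{
	for (size_t i = 0; i < sizeof(conversions) / sizeof(conversions[0]); i++)
		if (strcmp(conversions[i].tasklet_api, tasklet_api) == 0)
			return conversions[i].bh_wq_replacement;
	return NULL;
}
```

Note that tasklet_init() callbacks take an `unsigned long` cookie, so replacing it with INIT_WORK() also changes the handler signature, as the hyperv patch does for vmbus_on_event().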
Additional Info:
Based on the work done by Tejun Heo
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Allen Pais (9):
  hyperv: Convert from tasklet to BH workqueue
  dma: Convert from tasklet to BH workqueue
  IB: Convert from tasklet to BH workqueue
  USB: Convert from tasklet to BH workqueue
  mailbox: Convert from tasklet to BH workqueue
  ipmi: Convert from tasklet to BH workqueue
  s390: Convert from tasklet to BH workqueue
  drivers/media/*: Convert from tasklet to BH workqueue
  mmc: Convert from tasklet to BH workqueue

 drivers/char/ipmi/ipmi_msghandler.c | 30
 drivers/dma/altera-msgdma.c | 15 ++--
 drivers/dma/apple-admac.c | 15 ++--
 drivers/dma/at_hdmac.c | 2 +-
 drivers/dma/at_xdmac.c | 15 ++--
 drivers/dma/bcm2835-dma.c | 2 +-
 drivers/dma/dma-axi-dmac.c | 2 +-
 drivers/dma/dma-jz4780.c | 2 +-
 .../dma/dw-axi-dmac/dw-axi-dmac-platform.c | 2 +-
 drivers/dma/dw-edma/dw-edma-core.c | 2 +-
 drivers/dma/dw/core.c | 13 ++--
 drivers/dma/dw/regs.h | 3 +-
 drivers/dma/ep93xx_dma.c | 15 ++--
 drivers/dma/fsl-edma-common.c | 2 +-
 drivers/dma/fsl-qdma.c | 2 +-
 drivers/dma/fsl_raid.c | 11 +--
 drivers/dma/fsl_raid.h | 2 +-
 drivers/dma/fsldma.c | 15 ++--
 drivers/dma/fsldma.h | 3 +-
 drivers/dma/hisi_dma.c | 2 +-
 drivers/dma/hsu/hsu.c | 2 +-
 drivers/dma/idma64.c | 4 +-
 drivers/dma/img-mdc-dma.c | 2 +-
 drivers/dma/imx-dma.c | 27 +++
 drivers/dma/imx-sdma.c | 6 +-
 drivers/dma/ioat/dma.c | 17 +++--
 drivers/dma/ioat/dma.h | 5 +-
 drivers/dma/ioat/init.c | 2 +-
 drivers/dma/k3dma.c | 19 ++---
 drivers/dma/mediatek/mtk-cqdma.c | 35 -
 drivers/dma/mediatek/mtk-hsdma.c | 2 +-
 drivers/dma/mediatek/mtk-uart-apdma.c | 4 +-
 drivers/dma/mmp_pdma.c | 13 ++--
 drivers/dma/mmp_tdma.c | 11 +--
 drivers/dma/mpc512x_dma.c | 17 +++--
 drivers/dma/mv_xor.c | 13 ++--
 drivers/dma/mv_xor.h | 5 +-
 drivers/dma/mv_xor_v2.c | 23 +++---
 drivers/dma/mxs-dma.c | 13 ++--
 drivers/dma/nbpfaxi.c | 15 ++--
 drivers/dma/owl-dma.c | 2 +-
 drivers/dma/pch_dma.c | 17 +++--
 drivers/dma/pl330.c | 31
 drivers/dma/plx_dma.c | 13 ++--
 drivers/dma/ppc4xx/adma.c | 17 +++--
 drivers/dma/ppc4xx/adma.h | 5 +-
 drivers/dma/pxa_dma.c | 2 +-
 drivers/dma/qcom/bam_dma.c | 35 -
Re: [PATCH v4] arch/powerpc/kvm: Add support for reading VPA counters for pseries guests
On 3/26/24 3:10 PM, Gautam Menghani wrote:
PAPR hypervisor has introduced three new counters in the VPA area of LPAR CPUs for KVM L2 guest (see [1] for terminology) observability: two for context switches from host to guest and vice versa, and one for the total time spent inside the KVM guest. Add a tracepoint that enables reading the counters for use by ftrace/perf. Note that this tracepoint is only available for the nestedv2 API (i.e., KVM on PowerVM).

Also maintain an aggregation of the context switch times in vcpu->arch. This will be useful for obtaining the aggregate times with a PMU driver, which will be upstreamed in the near future.

[1] Terminology:
a. L1 refers to the VM (LPAR) booted on top of PAPR hypervisor
b. L2 refers to the KVM guest booted on top of L1.

Signed-off-by: Vaibhav Jain
Signed-off-by: Gautam Menghani
---
v1 -> v2:
1. Fix the build error due to invalid struct member reference.

v2 -> v3:
1. Move the counter disabling and zeroing code to a different function.
2. Move the get_lppaca() inside the tracepoint_enabled() branch.
3. Add the aggregation logic to maintain total context switch time.

v3 -> v4:
1. After vcpu_run, check the VPA flag instead of checking for tracepoint being enabled for disabling the cs time accumulation.
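The patch reads three big-endian timebase counters from the VPA, converts them to nanoseconds with tb_to_ns(), zeroes them, and adds the results to per-vCPU aggregates. A userspace sketch of that arithmetic follows. It assumes a 512 MHz timebase (the kernel uses the frequency it discovers at boot) and uses hypothetical `demo_*` names in place of get_lppaca(), be64_to_cpu() and tb_to_ns(); the tracepoint itself is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 512 MHz timebase. The kernel uses the boot-time value. */
#define DEMO_TB_HZ 512000000ULL

/* Stand-in for be64_to_cpu() on a raw big-endian byte field. */
static uint64_t demo_be64_to_cpu(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/* Stand-in for tb_to_ns(); split into seconds + remainder to avoid overflow. */
static uint64_t demo_tb_to_ns(uint64_t ticks)
{
	uint64_t secs = ticks / DEMO_TB_HZ;
	uint64_t rem = ticks % DEMO_TB_HZ;

	return secs * 1000000000ULL + rem * 1000000000ULL / DEMO_TB_HZ;
}

/* Hypothetical mirrors of the three new VPA counters and vcpu->arch totals. */
struct demo_vpa_counters {
	uint8_t l1_to_l2_cs_tb[8];
	uint8_t l2_to_l1_cs_tb[8];
	uint8_t l2_runtime_tb[8];
};

struct demo_vcpu_agg {
	uint64_t l1_to_l2_cs_agg;
	uint64_t l2_to_l1_cs_agg;
	uint64_t l2_runtime_agg;
};

/* Read, convert, zero, aggregate, as do_trace_nested_cs_time() does. */
static void demo_accumulate(struct demo_vcpu_agg *agg,
			    struct demo_vpa_counters *vpa)
{
	uint64_t l1_to_l2_ns = demo_tb_to_ns(demo_be64_to_cpu(vpa->l1_to_l2_cs_tb));
	uint64_t l2_to_l1_ns = demo_tb_to_ns(demo_be64_to_cpu(vpa->l2_to_l1_cs_tb));
	uint64_t l2_runtime_ns = demo_tb_to_ns(demo_be64_to_cpu(vpa->l2_runtime_tb));

	memset(vpa, 0, sizeof(*vpa));	/* counters restart from zero */

	agg->l1_to_l2_cs_agg += l1_to_l2_ns;
	agg->l2_to_l1_cs_agg += l2_to_l1_ns;
	agg->l2_runtime_agg += l2_runtime_ns;
}
```

Zeroing after each read means every call adds only the time accrued since the previous read, so the aggregates never double-count.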
 arch/powerpc/include/asm/kvm_host.h | 5 +
 arch/powerpc/include/asm/lppaca.h | 11 ---
 arch/powerpc/kvm/book3s_hv.c | 30 +
 arch/powerpc/kvm/trace_hv.h | 25
 4 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8abac5321..d953b32dd 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -847,6 +847,11 @@ struct kvm_vcpu_arch {
 	gpa_t nested_io_gpr;
 	/* For nested APIv2 guests*/
 	struct kvmhv_nestedv2_io nestedv2_io;
+
+	/* Aggregate context switch and guest run time info (in ns) */
+	u64 l1_to_l2_cs_agg;
+	u64 l2_to_l1_cs_agg;
+	u64 l2_runtime_agg;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 61ec2447d..bda6b86b9 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -62,7 +62,8 @@ struct lppaca {
 	u8 donate_dedicated_cpu; /* Donate dedicated CPU cycles */
 	u8 fpregs_in_use;
 	u8 pmcregs_in_use;
-	u8 reserved8[28];
+	u8 l2_accumul_cntrs_enable; /* Enable usage of counters for KVM guest */
+	u8 reserved8[27];
 	__be64 wait_state_cycles; /* Wait cycles for this proc */
 	u8 reserved9[28];
 	__be16 slb_count; /* # of SLBs to maintain */
@@ -92,9 +93,13 @@ struct lppaca {
 	/* cacheline 4-5 */
 
 	__be32 page_ins; /* CMO Hint - # page ins by OS */
-	u8 reserved12[148];
+	u8 reserved12[28];
+	volatile __be64 l1_to_l2_cs_tb;
+	volatile __be64 l2_to_l1_cs_tb;
+	volatile __be64 l2_runtime_tb;
+	u8 reserved13[96];
 	volatile __be64 dtl_idx; /* Dispatch Trace Log head index */
-	u8 reserved13[96];
+	u8 reserved14[96];
 } ____cacheline_aligned;
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8e86eb577..dcd6edd3b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4108,6 +4108,27 @@ static void vcpu_vpa_increment_dispatch(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void do_trace_nested_cs_time(struct kvm_vcpu *vcpu)
+{
+	struct lppaca *lp = get_lppaca();
+	u64 l1_to_l2_ns, l2_to_l1_ns, l2_runtime_ns;
+
+	l1_to_l2_ns = tb_to_ns(be64_to_cpu(lp->l1_to_l2_cs_tb));
+	l2_to_l1_ns = tb_to_ns(be64_to_cpu(lp->l2_to_l1_cs_tb));
+	l2_runtime_ns = tb_to_ns(be64_to_cpu(lp->l2_runtime_tb));
+	trace_kvmppc_vcpu_exit_cs_time(vcpu, l1_to_l2_ns, l2_to_l1_ns,
+					l2_runtime_ns);
+	lp->l1_to_l2_cs_tb = 0;
+	lp->l2_to_l1_cs_tb = 0;
+	lp->l2_runtime_tb = 0;
+	lp->l2_accumul_cntrs_enable = 0;
+
+	// Maintain an aggregate of context switch times
+	vcpu->arch.l1_to_l2_cs_agg += l1_to_l2_ns;
+	vcpu->arch.l2_to_l1_cs_agg += l2_to_l1_ns;
+	vcpu->arch.l2_runtime_agg += l2_runtime_ns;
+}
+
 static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 time_limit,
 				     unsigned long lpcr, u64 *tb)
 {
@@ -4130,6 +4151,11 @@ static int kvmhv_vcpu_entry_nestedv2(struct kvm_vcpu *vcpu, u64 time_limit,
 	kvmppc_gse_put_u64(io->vcpu_run_input, KVMPPC_GSID_LPCR, lpcr);
 
 	accumulate_time(vcpu, &vcpu->arch.in_guest);
+
+	/* Enable the guest host context switch time tracking */
+	if
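The lppaca.h hunk carves the new fields out of reserved padding, so the surrounding offsets must not move: 1 + 27 bytes replace reserved8[28], and 28 + 3 * 8 + 96 = 148 bytes replace reserved12[148]. The mock below checks this arithmetic with C11 static_assert. It copies only the fields the hunk touches and assumes the "cacheline 4-5" region starts at an 8-byte-aligned offset, so it is a sanity check of the arithmetic, not of the real struct lppaca; `END_OF` and the `vpa_*` structs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte just past the end of a member. */
#define END_OF(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Mock of the first edited region (the real struct has many more fields). */
struct vpa_a_old {
	uint8_t pmcregs_in_use;
	uint8_t reserved8[28];
	uint64_t wait_state_cycles;
};

struct vpa_a_new {
	uint8_t pmcregs_in_use;
	uint8_t l2_accumul_cntrs_enable;
	uint8_t reserved8[27];
	uint64_t wait_state_cycles;
};

/* Mock of the "cacheline 4-5" region, assumed to start 8-byte aligned. */
struct vpa_b_old {
	uint32_t page_ins;
	uint8_t reserved12[148];
	uint64_t dtl_idx;
	uint8_t reserved13[96];
};

struct vpa_b_new {
	uint32_t page_ins;
	uint8_t reserved12[28];
	uint64_t l1_to_l2_cs_tb;
	uint64_t l2_to_l1_cs_tb;
	uint64_t l2_runtime_tb;
	uint8_t reserved13[96];
	uint64_t dtl_idx;
	uint8_t reserved14[96];
};

/* The carved-out bytes must end exactly where the old padding ended. */
static_assert(END_OF(struct vpa_a_old, reserved8) ==
	      END_OF(struct vpa_a_new, reserved8), "region A shifted");
/* No compiler padding may sneak in before the new 64-bit counters. */
static_assert(offsetof(struct vpa_b_new, l1_to_l2_cs_tb) ==
	      END_OF(struct vpa_b_new, reserved12), "padding inserted");
static_assert(offsetof(struct vpa_b_old, dtl_idx) ==
	      offsetof(struct vpa_b_new, dtl_idx), "dtl_idx moved");
static_assert(sizeof(struct vpa_b_old) == sizeof(struct vpa_b_new),
	      "region B size changed");
```

If any of the byte counts were off, compilation would fail rather than silently shifting dtl_idx, which the hypervisor and the kernel must agree on.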
[PATCH v2 3/3] arch: Rename fbdev header and source files
The per-architecture fbdev code has no dependencies on fbdev and can be used for any video-related subsystem. Rename the files to 'video'. On parisc, name the source file video-sti.c, as it depends on CONFIG_STI_CORE. Further update all include statements, include guards, and Makefiles. Also update a few strings and comments to refer to video instead of fbdev. Signed-off-by: Thomas Zimmermann Cc: Vineet Gupta Cc: Catalin Marinas Cc: Will Deacon Cc: Huacai Chen Cc: WANG Xuerui Cc: Geert Uytterhoeven Cc: Thomas Bogendoerfer Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Yoshinori Sato Cc: Rich Felker Cc: John Paul Adrian Glaubitz Cc: "David S. Miller" Cc: Andreas Larsson Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H. Peter Anvin" --- arch/arc/include/asm/fb.h| 8 arch/arc/include/asm/video.h | 8 arch/arm/include/asm/fb.h| 6 -- arch/arm/include/asm/video.h | 6 ++ arch/arm64/include/asm/fb.h | 10 -- arch/arm64/include/asm/video.h | 10 ++ arch/loongarch/include/asm/{fb.h => video.h} | 8 arch/m68k/include/asm/{fb.h => video.h} | 8 arch/mips/include/asm/{fb.h => video.h} | 12 ++-- arch/parisc/include/asm/{fb.h => video.h}| 8 arch/parisc/video/Makefile | 2 +- arch/parisc/video/{fbdev.c => video-sti.c} | 2 +- arch/powerpc/include/asm/{fb.h => video.h} | 8 arch/powerpc/kernel/pci-common.c | 2 +- arch/sh/include/asm/fb.h | 7 --- arch/sh/include/asm/video.h | 7 +++ arch/sparc/include/asm/{fb.h => video.h} | 8 arch/sparc/video/Makefile| 2 +- arch/sparc/video/{fbdev.c => video.c}| 4 ++-- arch/x86/include/asm/{fb.h => video.h} | 8 arch/x86/video/Makefile | 2 +- arch/x86/video/{fbdev.c => video.c} | 3 ++- include/asm-generic/Kbuild | 2 +- include/asm-generic/{fb.h => video.h}| 6 +++--- include/linux/fb.h | 2 +- 25 files changed, 75 insertions(+), 74 deletions(-) delete mode 100644 arch/arc/include/asm/fb.h create mode 100644 arch/arc/include/asm/video.h delete mode 100644
arch/arm/include/asm/fb.h create mode 100644 arch/arm/include/asm/video.h delete mode 100644 arch/arm64/include/asm/fb.h create mode 100644 arch/arm64/include/asm/video.h rename arch/loongarch/include/asm/{fb.h => video.h} (86%) rename arch/m68k/include/asm/{fb.h => video.h} (86%) rename arch/mips/include/asm/{fb.h => video.h} (76%) rename arch/parisc/include/asm/{fb.h => video.h} (68%) rename arch/parisc/video/{fbdev.c => video-sti.c} (96%) rename arch/powerpc/include/asm/{fb.h => video.h} (76%) delete mode 100644 arch/sh/include/asm/fb.h create mode 100644 arch/sh/include/asm/video.h rename arch/sparc/include/asm/{fb.h => video.h} (89%) rename arch/sparc/video/{fbdev.c => video.c} (86%) rename arch/x86/include/asm/{fb.h => video.h} (77%) rename arch/x86/video/{fbdev.c => video.c} (97%) rename include/asm-generic/{fb.h => video.h} (96%) diff --git a/arch/arc/include/asm/fb.h b/arch/arc/include/asm/fb.h deleted file mode 100644 index 9c2383d29cbb9..0 --- a/arch/arc/include/asm/fb.h +++ /dev/null @@ -1,8 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ - -#ifndef _ASM_FB_H_ -#define _ASM_FB_H_ - -#include - -#endif /* _ASM_FB_H_ */ diff --git a/arch/arc/include/asm/video.h b/arch/arc/include/asm/video.h new file mode 100644 index 0..8ff7263727fe7 --- /dev/null +++ b/arch/arc/include/asm/video.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _ASM_VIDEO_H_ +#define _ASM_VIDEO_H_ + +#include + +#endif /* _ASM_VIDEO_H_ */ diff --git a/arch/arm/include/asm/fb.h b/arch/arm/include/asm/fb.h deleted file mode 100644 index ce20a43c30339..0 --- a/arch/arm/include/asm/fb.h +++ /dev/null @@ -1,6 +0,0 @@ -#ifndef _ASM_FB_H_ -#define _ASM_FB_H_ - -#include - -#endif /* _ASM_FB_H_ */ diff --git a/arch/arm/include/asm/video.h b/arch/arm/include/asm/video.h new file mode 100644 index 0..f570565366e67 --- /dev/null +++ b/arch/arm/include/asm/video.h @@ -0,0 +1,6 @@ +#ifndef _ASM_VIDEO_H_ +#define _ASM_VIDEO_H_ + +#include + +#endif /* _ASM_VIDEO_H_ */ diff --git 
a/arch/arm64/include/asm/fb.h b/arch/arm64/include/asm/fb.h deleted file mode 100644 index 1a495d8fb2ce0..0 --- a/arch/arm64/include/asm/fb.h +++ /dev/null @@ -1,10 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (C) 2012 ARM Ltd. - */ -#ifndef __ASM_FB_H_ -#define __ASM_FB_H_ - -#include - -#endif /* __ASM_FB_H_ */ diff --git a/arch/arm64/include/asm/video.h b/arch/arm64/include/asm/video.h new file mode 100644 index
[PATCH v2 1/3] arch: Select fbdev helpers with CONFIG_VIDEO
Various Kconfig options selected the per-architecture helpers for fbdev, but none of the contained code depends on fbdev. Standardize on CONFIG_VIDEO, which will allow adding more general helpers for video functionality. CONFIG_VIDEO guards each architecture's video/ directory, which allows finer-grained control over each directory's files, such as the use of CONFIG_STI_CORE on parisc. v2: - sparc: rebased onto Makefile changes Signed-off-by: Thomas Zimmermann Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: "David S. Miller" Cc: Andreas Larsson Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H. Peter Anvin" --- arch/parisc/Makefile | 2 +- arch/sparc/Makefile | 4 ++-- arch/sparc/video/Makefile | 2 +- arch/x86/Makefile | 2 +- arch/x86/video/Makefile | 3 ++- 5 files changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile index 316f84f1d15c8..21b8166a68839 100644 --- a/arch/parisc/Makefile +++ b/arch/parisc/Makefile @@ -119,7 +119,7 @@ export LIBGCC libs-y += arch/parisc/lib/ $(LIBGCC) -drivers-y += arch/parisc/video/ +drivers-$(CONFIG_VIDEO) += arch/parisc/video/ boot := arch/parisc/boot diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile index 2a03daa68f285..757451c3ea1df 100644 --- a/arch/sparc/Makefile +++ b/arch/sparc/Makefile @@ -59,8 +59,8 @@ endif libs-y += arch/sparc/prom/ libs-y += arch/sparc/lib/ -drivers-$(CONFIG_PM) += arch/sparc/power/ -drivers-$(CONFIG_FB_CORE) += arch/sparc/video/ +drivers-$(CONFIG_PM)+= arch/sparc/power/ +drivers-$(CONFIG_VIDEO) += arch/sparc/video/ boot := arch/sparc/boot diff --git a/arch/sparc/video/Makefile b/arch/sparc/video/Makefile index d4d83f1702c61..9dd82880a027a 100644 --- a/arch/sparc/video/Makefile +++ b/arch/sparc/video/Makefile @@ -1,3 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_FB_CORE) += fbdev.o +obj-y += fbdev.o diff --git a/arch/x86/Makefile b/arch/x86/Makefile index
15a5f4f2ff0aa..c0ea612c62ebe 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -265,7 +265,7 @@ drivers-$(CONFIG_PCI)+= arch/x86/pci/ # suspend and hibernation support drivers-$(CONFIG_PM) += arch/x86/power/ -drivers-$(CONFIG_FB_CORE) += arch/x86/video/ +drivers-$(CONFIG_VIDEO) += arch/x86/video/ # boot loader support. Several targets are kept for legacy purposes diff --git a/arch/x86/video/Makefile b/arch/x86/video/Makefile index 5ebe48752ffc4..9dd82880a027a 100644 --- a/arch/x86/video/Makefile +++ b/arch/x86/video/Makefile @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_FB_CORE) += fbdev.o + +obj-y += fbdev.o -- 2.44.0
[PATCH v2 2/3] arch: Remove struct fb_info from video helpers
The per-architecture video helpers do not depend on struct fb_info or anything else from fbdev. Remove it from the interface and replace fb_is_primary_device() with video_is_primary_device(). The new helper is similar in functionality, but can operate on non-fbdev devices. Signed-off-by: Thomas Zimmermann Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: "David S. Miller" Cc: Andreas Larsson Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: x...@kernel.org Cc: "H. Peter Anvin" --- arch/parisc/include/asm/fb.h | 8 +--- arch/parisc/video/fbdev.c| 9 + arch/sparc/include/asm/fb.h | 7 --- arch/sparc/video/fbdev.c | 17 - arch/x86/include/asm/fb.h| 8 +--- arch/x86/video/fbdev.c | 18 +++--- drivers/video/fbdev/core/fbcon.c | 2 +- include/asm-generic/fb.h | 11 ++- 8 files changed, 41 insertions(+), 39 deletions(-) diff --git a/arch/parisc/include/asm/fb.h b/arch/parisc/include/asm/fb.h index 658a8a7dc5312..ed2a195a3e762 100644 --- a/arch/parisc/include/asm/fb.h +++ b/arch/parisc/include/asm/fb.h @@ -2,11 +2,13 @@ #ifndef _ASM_FB_H_ #define _ASM_FB_H_ -struct fb_info; +#include + +struct device; #if defined(CONFIG_STI_CORE) -int fb_is_primary_device(struct fb_info *info); -#define fb_is_primary_device fb_is_primary_device +bool video_is_primary_device(struct device *dev); +#define video_is_primary_device video_is_primary_device #endif #include diff --git a/arch/parisc/video/fbdev.c b/arch/parisc/video/fbdev.c index e4f8ac99fc9e0..540fa0c919d59 100644 --- a/arch/parisc/video/fbdev.c +++ b/arch/parisc/video/fbdev.c @@ -5,12 +5,13 @@ * Copyright (C) 2001-2002 Thomas Bogendoerfer */ -#include #include #include -int fb_is_primary_device(struct fb_info *info) +#include + +bool video_is_primary_device(struct device *dev) { struct sti_struct *sti; @@ -21,6 +22,6 @@ int fb_is_primary_device(struct fb_info *info) return true; /* return true if it's the default built-in framebuffer driver */ - return (sti->dev == info->device); + return (sti->dev == dev); } 
-EXPORT_SYMBOL(fb_is_primary_device); +EXPORT_SYMBOL(video_is_primary_device); diff --git a/arch/sparc/include/asm/fb.h b/arch/sparc/include/asm/fb.h index 24440c0fda490..07f0325d6921c 100644 --- a/arch/sparc/include/asm/fb.h +++ b/arch/sparc/include/asm/fb.h @@ -3,10 +3,11 @@ #define _SPARC_FB_H_ #include +#include #include -struct fb_info; +struct device; #ifdef CONFIG_SPARC32 static inline pgprot_t pgprot_framebuffer(pgprot_t prot, @@ -18,8 +19,8 @@ static inline pgprot_t pgprot_framebuffer(pgprot_t prot, #define pgprot_framebuffer pgprot_framebuffer #endif -int fb_is_primary_device(struct fb_info *info); -#define fb_is_primary_device fb_is_primary_device +bool video_is_primary_device(struct device *dev); +#define video_is_primary_device video_is_primary_device static inline void fb_memcpy_fromio(void *to, const volatile void __iomem *from, size_t n) { diff --git a/arch/sparc/video/fbdev.c b/arch/sparc/video/fbdev.c index bff66dd1909a4..e46f0499c2774 100644 --- a/arch/sparc/video/fbdev.c +++ b/arch/sparc/video/fbdev.c @@ -1,26 +1,25 @@ // SPDX-License-Identifier: GPL-2.0 #include -#include +#include #include +#include #include -int fb_is_primary_device(struct fb_info *info) +bool video_is_primary_device(struct device *dev) { - struct device *dev = info->device; - struct device_node *node; + struct device_node *node = dev->of_node; if (console_set_on_cmdline) - return 0; + return false; - node = dev->of_node; if (node && node == of_console_device) - return 1; + return true; - return 0; + return false; } -EXPORT_SYMBOL(fb_is_primary_device); +EXPORT_SYMBOL(video_is_primary_device); MODULE_DESCRIPTION("Sparc fbdev helpers"); MODULE_LICENSE("GPL"); diff --git a/arch/x86/include/asm/fb.h b/arch/x86/include/asm/fb.h index c3b9582de7efd..999db33792869 100644 --- a/arch/x86/include/asm/fb.h +++ b/arch/x86/include/asm/fb.h @@ -2,17 +2,19 @@ #ifndef _ASM_X86_FB_H #define _ASM_X86_FB_H +#include + #include -struct fb_info; +struct device; pgprot_t 
pgprot_framebuffer(pgprot_t prot, unsigned long vm_start, unsigned long vm_end, unsigned long offset); #define pgprot_framebuffer pgprot_framebuffer -int fb_is_primary_device(struct fb_info *info); -#define fb_is_primary_device fb_is_primary_device +bool video_is_primary_device(struct device *dev); +#define video_is_primary_device video_is_primary_device #include diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c index 1dd6528cc947c..4d87ce8e257fe 100644 --- a/arch/x86/video/fbdev.c +++ b/arch/x86/video/fbdev.c @@ -7,7 +7,6 @@ * */ -#include #include
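The sparc variant of video_is_primary_device() needs only a struct device, and that is what makes it usable outside fbdev. The sketch below reproduces its decision logic in userspace. struct device, struct device_node and the two globals are minimal hypothetical stand-ins for the kernel's types and for console_set_on_cmdline and of_console_device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal hypothetical stand-ins for the kernel types. */
struct device_node { const char *name; };
struct device { struct device_node *of_node; };

/* Stand-ins for the sparc globals of the same names. */
static bool console_set_on_cmdline;
static struct device_node *of_console_device;

static bool video_is_primary_device(struct device *dev)
{
	struct device_node *node = dev->of_node;

	/* A console chosen on the kernel command line overrides firmware. */
	if (console_set_on_cmdline)
		return false;

	/* Primary only if this device is the firmware (OF) console device. */
	return node && node == of_console_device;
}
```

A caller that still holds a struct fb_info would pass info->device. The parisc hunk, which changes `sti->dev == info->device` to `sti->dev == dev`, suggests exactly this.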
[PATCH v2 0/3] arch: Remove fbdev dependency from video helpers
Make architecture helpers for display functionality depend on general video functionality instead of fbdev. This avoids the dependency on fbdev and makes the functionality available for non-fbdev code. Patch 1 replaces the variety of Kconfig options that control the Makefiles with CONFIG_VIDEO. Finer-grained control of the build can then be done within each video/ directory; see parisc for an example. Patch 2 replaces fb_is_primary_device() with video_is_primary_device(), which has no dependencies on fbdev. The implementation remains identical on all affected platforms. There's one minor change in fbcon, which is the only caller of fb_is_primary_device(). Patch 3 renames the source and header files from fbdev to video. v2: - improve cover letter - rebase onto v6.9-rc1 Thomas Zimmermann (3): arch: Select fbdev helpers with CONFIG_VIDEO arch: Remove struct fb_info from video helpers arch: Rename fbdev header and source files arch/arc/include/asm/fb.h| 8 -- arch/arc/include/asm/video.h | 8 ++ arch/arm/include/asm/fb.h| 6 - arch/arm/include/asm/video.h | 6 + arch/arm64/include/asm/fb.h | 10 arch/arm64/include/asm/video.h | 10 arch/loongarch/include/asm/{fb.h => video.h} | 8 +++--- arch/m68k/include/asm/{fb.h => video.h} | 8 +++--- arch/mips/include/asm/{fb.h => video.h} | 12 - arch/parisc/Makefile | 2 +- arch/parisc/include/asm/fb.h | 14 --- arch/parisc/include/asm/video.h | 16 arch/parisc/video/Makefile | 2 +- arch/parisc/video/{fbdev.c => video-sti.c} | 9 --- arch/powerpc/include/asm/{fb.h => video.h} | 8 +++--- arch/powerpc/kernel/pci-common.c | 2 +- arch/sh/include/asm/fb.h | 7 -- arch/sh/include/asm/video.h | 7 ++ arch/sparc/Makefile | 4 +-- arch/sparc/include/asm/{fb.h => video.h} | 15 +-- arch/sparc/video/Makefile| 2 +- arch/sparc/video/fbdev.c | 26 arch/sparc/video/video.c | 25 +++ arch/x86/Makefile| 2 +- arch/x86/include/asm/fb.h| 19 -- arch/x86/include/asm/video.h | 21 arch/x86/video/Makefile | 3 ++- arch/x86/video/{fbdev.c => video.c} | 21 +++-
drivers/video/fbdev/core/fbcon.c | 2 +- include/asm-generic/Kbuild | 2 +- include/asm-generic/{fb.h => video.h}| 17 +++-- include/linux/fb.h | 2 +- 32 files changed, 154 insertions(+), 150 deletions(-) delete mode 100644 arch/arc/include/asm/fb.h create mode 100644 arch/arc/include/asm/video.h delete mode 100644 arch/arm/include/asm/fb.h create mode 100644 arch/arm/include/asm/video.h delete mode 100644 arch/arm64/include/asm/fb.h create mode 100644 arch/arm64/include/asm/video.h rename arch/loongarch/include/asm/{fb.h => video.h} (86%) rename arch/m68k/include/asm/{fb.h => video.h} (86%) rename arch/mips/include/asm/{fb.h => video.h} (76%) delete mode 100644 arch/parisc/include/asm/fb.h create mode 100644 arch/parisc/include/asm/video.h rename arch/parisc/video/{fbdev.c => video-sti.c} (78%) rename arch/powerpc/include/asm/{fb.h => video.h} (76%) delete mode 100644 arch/sh/include/asm/fb.h create mode 100644 arch/sh/include/asm/video.h rename arch/sparc/include/asm/{fb.h => video.h} (75%) delete mode 100644 arch/sparc/video/fbdev.c create mode 100644 arch/sparc/video/video.c delete mode 100644 arch/x86/include/asm/fb.h create mode 100644 arch/x86/include/asm/video.h rename arch/x86/video/{fbdev.c => video.c} (66%) rename include/asm-generic/{fb.h => video.h} (89%) -- 2.44.0
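In this series, each arch header announces its override with `#define video_is_primary_device video_is_primary_device` before it includes the generic header. The single-file sketch below shows how that idiom chooses between an arch helper and a generic fallback. The fallback body (returning false) and the placeholder arch policy are assumptions made for illustration. They are not copies of include/asm-generic/video.h.

```c
#include <stdbool.h>
#include <stddef.h>

struct device;  /* opaque, as in the arch headers */

/* ---- "arch" header: provides its own helper and announces it ---- */
static inline bool video_is_primary_device(struct device *dev)
{
	return dev != NULL;  /* placeholder policy for this sketch only */
}
#define video_is_primary_device video_is_primary_device

/* ---- "generic" header: fallback compiled only without an override ---- */
#ifndef video_is_primary_device
static inline bool video_is_primary_device(struct device *dev)
{
	(void)dev;
	return false;
}
#endif
```

The macro expands to its own name, so call sites are unaffected. Its only purpose is to let the generic header's `#ifndef` detect that an override exists. If the two "arch" items are deleted, the fallback is compiled instead.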
[PATCH v3 14/14] selftests/fpu: Allow building on other architectures
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile and run floating-point code, this test is no longer x86-specific. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) lib/Kconfig.debug | 2 +- lib/Makefile| 25 ++--- lib/test_fpu_glue.c | 5 - 3 files changed, 7 insertions(+), 25 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c63a5fbf1f1c..f93e778e0405 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES config TEST_FPU tristate "Test floating point operations in kernel space" - depends on X86 && !KCOV_INSTRUMENT_ALL + depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL help Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu which will trigger a sequence of floating point operations. This is used diff --git a/lib/Makefile b/lib/Makefile index fcb35bf50979..e44ad11f77b5 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o -# -# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns -# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS -# get appended last to CFLAGS and thus override those previous compiler options. -# -FPU_CFLAGS := -msse -msse2 -ifdef CONFIG_CC_IS_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 -# -# The "-msse" in the first argument is there so that the -# -mpreferred-stack-boundary=3 build error: -# -# -mpreferred-stack-boundary=3 is not between 4 and 12 -# -# can be triggered. Otherwise gcc doesn't complain. 
-FPU_CFLAGS += -mhard-float -FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) -endif - obj-$(CONFIG_TEST_FPU) += test_fpu.o test_fpu-y := test_fpu_glue.o test_fpu_impl.o -CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) +CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c index 85963d7be826..eef282a2715f 100644 --- a/lib/test_fpu_glue.c +++ b/lib/test_fpu_glue.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include "test_fpu.h" @@ -38,6 +38,9 @@ static struct dentry *selftest_dir; static int __init test_fpu_init(void) { + if (!kernel_fpu_available()) + return -EINVAL; + selftest_dir = debugfs_create_dir("selftest_helpers", NULL); if (!selftest_dir) return -ENOMEM; -- 2.43.1
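Now that kernel_fpu_available() is part of the common API, the glue file can refuse to load when no usable FPU is present, and it can bracket calls into the FP-only translation unit with kernel_fpu_begin()/kernel_fpu_end(). The userspace sketch below shows that control flow. Every fake_* name is a hypothetical stub, and the bracketing in glue_run() assumes the debugfs read handler wraps test_fpu() this way. The real availability check differs by architecture: it is a constant true on x86 and has_fpu() on riscv.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stubs standing in for <linux/fpu.h>. */
static bool fake_fpu_present = true;
static int fake_begin_calls, fake_end_calls;

static bool fake_kernel_fpu_available(void) { return fake_fpu_present; }
static void fake_kernel_fpu_begin(void) { fake_begin_calls++; }
static void fake_kernel_fpu_end(void) { fake_end_calls++; }

/* Stand-in for test_fpu(), which lives in the FP-enabled translation unit. */
static int fake_test_fpu(void) { return 0; }

/* Mirrors the check the patch adds at the top of test_fpu_init(). */
static int glue_init(void)
{
	if (!fake_kernel_fpu_available())
		return -EINVAL;
	return 0;  /* the real init goes on to create the debugfs entries */
}

/* FP code runs only between begin and end. */
static int glue_run(void)
{
	int err;

	fake_kernel_fpu_begin();
	err = fake_test_fpu();
	fake_kernel_fpu_end();
	return err;
}
```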
[PATCH v3 13/14] selftests/fpu: Move FP code to a separate translation unit
This ensures no compiler-generated floating-point code can appear outside kernel_fpu_{begin,end}() sections, and some architectures enforce this separation. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Declare test_fpu() in a header lib/Makefile| 3 ++- lib/test_fpu.h | 8 +++ lib/{test_fpu.c => test_fpu_glue.c} | 32 + lib/test_fpu_impl.c | 37 + 4 files changed, 48 insertions(+), 32 deletions(-) create mode 100644 lib/test_fpu.h rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create mode 100644 lib/test_fpu_impl.c diff --git a/lib/Makefile b/lib/Makefile index ffc6b2341b45..fcb35bf50979 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st endif obj-$(CONFIG_TEST_FPU) += test_fpu.o -CFLAGS_test_fpu.o += $(FPU_CFLAGS) +test_fpu-y := test_fpu_glue.o test_fpu_impl.o +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu.h b/lib/test_fpu.h new file mode 100644 index ..4459807084bc --- /dev/null +++ b/lib/test_fpu.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _LIB_TEST_FPU_H +#define _LIB_TEST_FPU_H + +int test_fpu(void); + +#endif diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c similarity index 71% rename from lib/test_fpu.c rename to lib/test_fpu_glue.c index e82db19fed84..85963d7be826 100644 --- a/lib/test_fpu.c +++ b/lib/test_fpu_glue.c @@ -19,37 +19,7 @@ #include #include -static int test_fpu(void) -{ - /* -* This sequence of operations tests that rounding mode is -* to nearest and that denormal numbers are supported. -* Volatile variables are used to avoid compiler optimizing -* the calculations away. 
-*/ - volatile double a, b, c, d, e, f, g; - - a = 4.0; - b = 1e-15; - c = 1e-310; - - /* Sets precision flag */ - d = a + b; - - /* Result depends on rounding mode */ - e = a + b / 2; - - /* Denormal and very large values */ - f = b / c; - - /* Depends on denormal support */ - g = a + c * f; - - if (d > a && e > a && g > a) - return 0; - else - return -EINVAL; -} +#include "test_fpu.h" static int test_fpu_get(void *data, u64 *val) { diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c new file mode 100644 index ..777894dbbe86 --- /dev/null +++ b/lib/test_fpu_impl.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include + +#include "test_fpu.h" + +int test_fpu(void) +{ + /* +* This sequence of operations tests that rounding mode is +* to nearest and that denormal numbers are supported. +* Volatile variables are used to avoid compiler optimizing +* the calculations away. +*/ + volatile double a, b, c, d, e, f, g; + + a = 4.0; + b = 1e-15; + c = 1e-310; + + /* Sets precision flag */ + d = a + b; + + /* Result depends on rounding mode */ + e = a + b / 2; + + /* Denormal and very large values */ + f = b / c; + + /* Depends on denormal support */ + g = a + c * f; + + if (d > a && e > a && g > a) + return 0; + else + return -EINVAL; +} -- 2.43.1
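The checks in test_fpu() depend on the spacing of doubles near 4.0, which is 4 * DBL_EPSILON = 2^-50 (about 8.9e-16).

- The value 1e-15 is about 1.13 of those units, so a + b is inexact but lands above a in every rounding mode.
- b / 2 is about 0.56 units. a + b / 2 therefore rounds up to the next double under round-to-nearest, but collapses back to a under round-toward-zero.
- 1e-310 is below DBL_MIN, so g exceeds a only if the subnormal c is not treated as zero.

The sketch below is a userspace copy of the same sequence (userspace needs no kernel_fpu_begin()), together with the spacing constant. The function and macro names are hypothetical.

```c
#include <errno.h>
#include <float.h>

/* Spacing between adjacent doubles in [4, 8): 4 * 2^-52 = 2^-50. */
#define ULP_AT_4 (4.0 * DBL_EPSILON)

static int fpu_selftest_user(void)
{
	volatile double a, b, c, d, e, f, g;

	a = 4.0;
	b = 1e-15;     /* ~1.13 ULP_AT_4 */
	c = 1e-310;    /* subnormal: below DBL_MIN (~2.2e-308) */

	d = a + b;     /* inexact, but > a in every rounding mode */
	e = a + b / 2; /* ~0.56 ULP: > a if round-to-nearest, == a if toward zero */
	f = b / c;     /* ~1e295, or inf if c is treated as zero */
	g = a + c * f; /* ~a + b, or NaN (0 * inf) if c is treated as zero */

	if (d > a && e > a && g > a)
		return 0;
	return -EINVAL;
}
```

On a typical userspace FP environment (round-to-nearest, subnormals enabled), this function is expected to return 0.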
[PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Now that all previously-supported architectures select ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead of the existing list of architectures. It can also take advantage of the common kernel-mode FPU API and method of adjusting CFLAGS. Acked-by: Alex Deucher Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Split altivec removal to a separate patch - Use linux/fpu.h instead of asm/fpu.h in consumers drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 27 ++ drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++- 4 files changed, 7 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 901d1961b739..5fcd4f778dc3 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -8,7 +8,7 @@ config DRM_AMD_DC depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64 select SND_HDA_COMPONENT if SND_HDA_CORE # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) + select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG) help Choose this option if you want to use the new display engine support for AMDGPU. 
This adds required support for Vega and diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 0de16796466b..e46f8ce41d87 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -26,16 +26,7 @@ #include "dc_trace.h" -#if defined(CONFIG_X86) -#include -#elif defined(CONFIG_PPC64) -#include -#include -#elif defined(CONFIG_ARM64) -#include -#elif defined(CONFIG_LOONGARCH) -#include -#endif +#include /** * DOC: DC FPU manipulation overview @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line) WARN_ON_ONCE(!in_task()); preempt_disable(); depth = __this_cpu_inc_return(fpu_recursion_depth); - if (depth == 1) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) + BUG_ON(!kernel_fpu_available()); kernel_fpu_begin(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - enable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_begin(); -#endif } TRACE_DCN_FPU(true, function_name, line, depth); @@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line) depth = __this_cpu_dec_return(fpu_recursion_depth); if (depth == 0) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - disable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_end(); -#endif } else { WARN_ON_ONCE(depth < 0); } diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index 59d3972341d2..a94b6d546cd1 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -25,40 +25,8 @@ # It provides the general basic services required by other DAL # subcomponents. 
-ifdef CONFIG_X86 -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml_ccflags := $(dml_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -endif - -ifdef CONFIG_ARM64 -dml_rcflags := -mgeneral-regs-only -endif - -ifdef CONFIG_LOONGARCH -dml_ccflags := -mfpu=64 -dml_rcflags := -msoft-float -endif - -ifdef CONFIG_CC_IS_GCC -ifneq ($(call gcc-min-version, 70100),y) -IS_OLD_GCC = 1 -endif -endif - -ifdef CONFIG_X86 -ifdef IS_OLD_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -dml_ccflags += -mpreferred-stack-boundary=4 -else -dml_ccflags += -msse2 -endif -endif +dml_ccflags := $(CC_FLAGS_FPU) +dml_rcflags := $(CC_FLAGS_NO_FPU) ifneq ($(CONFIG_FRAME_WARN),0) ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index 7b51364084b5..4f6c804a26ad 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -24,40 +24,8 @@ # # Makefile for dml2. -ifdef CONFIG_X86 -dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml2_ccflags := $(dml2_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml2_ccflags :=
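dc_fpu_begin()/dc_fpu_end() maintain a per-CPU recursion depth, so nested DC calls enable the FPU only once. After this patch, the outermost level calls the common kernel_fpu_begin()/kernel_fpu_end() on every architecture. The single-threaded userspace sketch below shows that counting. The fake_* functions are hypothetical stubs, and a plain int stands in for the per-CPU variable. Preemption control, the BUG_ON(!kernel_fpu_available()) check and tracing are all left out.

```c
#include <stdbool.h>

/* Hypothetical stubs that count how often the FPU is really toggled. */
static int fake_begin_calls, fake_end_calls;
static void fake_kernel_fpu_begin(void) { fake_begin_calls++; }
static void fake_kernel_fpu_end(void) { fake_end_calls++; }

/* Stand-in for the per-CPU fpu_recursion_depth counter. */
static int fpu_recursion_depth;

static void sketch_dc_fpu_begin(void)
{
	/* Only the outermost caller actually enables the FPU. */
	if (++fpu_recursion_depth == 1)
		fake_kernel_fpu_begin();
}

/* Returns false on an unbalanced end (the kernel uses WARN_ON_ONCE). */
static bool sketch_dc_fpu_end(void)
{
	int depth = --fpu_recursion_depth;

	if (depth == 0) {
		fake_kernel_fpu_end();   /* last user releases the FPU */
		return true;
	}
	return depth > 0;
}
```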
[PATCH v3 11/14] drm/amd/display: Only use hard-float, not altivec on powerpc
From: Michael Ellerman The compiler flags enable altivec, but that is not required; hard-float is sufficient for the code to build and function. Drop altivec from the compiler flags and adjust the enable/disable code to only enable FPU use. Signed-off-by: Michael Ellerman Acked-by: Alex Deucher Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - New patch for v2 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++-- drivers/gpu/drm/amd/display/dc/dml/Makefile| 2 +- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 2 +- 3 files changed, 4 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 4ae4720535a5..0de16796466b 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line) #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_begin(); #elif defined(CONFIG_PPC64) - if (cpu_has_feature(CPU_FTR_VSX_COMP)) - enable_kernel_vsx(); - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) - enable_kernel_altivec(); - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) enable_kernel_fp(); #elif defined(CONFIG_ARM64) kernel_neon_begin(); @@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line) #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); #elif defined(CONFIG_PPC64) - if (cpu_has_feature(CPU_FTR_VSX_COMP)) - disable_kernel_vsx(); - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) - disable_kernel_altivec(); - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) disable_kernel_fp(); #elif defined(CONFIG_ARM64) kernel_neon_end(); diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index c4a5efd2dda5..59d3972341d2 100644 --- 
a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse endif ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -maltivec +dml_ccflags := -mhard-float endif ifdef CONFIG_ARM64 diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index acff3449b8d7..7b51364084b5 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse endif ifdef CONFIG_PPC64 -dml2_ccflags := -mhard-float -maltivec +dml2_ccflags := -mhard-float endif ifdef CONFIG_ARM64 -- 2.43.1
[PATCH v3 10/14] riscv: Add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point code to support recent hardware. That code is not performance-critical, so only provide a minimal non-preemptible implementation for now. Support is limited to riscv64 because riscv32 requires runtime (libgcc) assistance to convert between doubles and 64-bit integers. Acked-by: Palmer Dabbelt Reviewed-by: Palmer Dabbelt Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v3: - Rebase on v6.9-rc1 - Limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT Changes in v2: - Remove RISC-V architecture-specific preprocessor check arch/riscv/Kconfig | 1 + arch/riscv/Makefile | 3 +++ arch/riscv/include/asm/fpu.h| 16 arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 5 files changed, 49 insertions(+) create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index be09c8836d56..3bcd0d250810 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -27,6 +27,7 @@ config RISCV select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MMIOWB diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 252d63942f34..76ff4033c854 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i KBUILD_AFLAGS += -march=$(riscv-march-y) +# For C code built with floating-point support, exclude V but keep F and D. 
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/') + KBUILD_CFLAGS += -mno-save-restore KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET) diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h new file mode 100644 index ..91c04c244e12 --- /dev/null +++ b/arch/riscv/include/asm/fpu.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_RISCV_FPU_H +#define _ASM_RISCV_FPU_H + +#include + +#define kernel_fpu_available() has_fpu() + +void kernel_fpu_begin(void); +void kernel_fpu_end(void); + +#endif /* ! _ASM_RISCV_FPU_H */ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 81d94a8ee10f..5b243d46f4b1 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED)+= unaligned_access_speed.o obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o obj-$(CONFIG_FPU) += fpu.o +obj-$(CONFIG_FPU) += kernel_mode_fpu.o obj-$(CONFIG_RISCV_ISA_V) += vector.o obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o obj-$(CONFIG_SMP) += smpboot.o diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c new file mode 100644 index ..0ac8348876c4 --- /dev/null +++ b/arch/riscv/kernel/kernel_mode_fpu.c @@ -0,0 +1,28 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2023 SiFive + */ + +#include +#include + +#include +#include +#include +#include + +void kernel_fpu_begin(void) +{ + preempt_disable(); + fstate_save(current, task_pt_regs(current)); + csr_set(CSR_SSTATUS, SR_FS); +} +EXPORT_SYMBOL_GPL(kernel_fpu_begin); + +void kernel_fpu_end(void) +{ + csr_clear(CSR_SSTATUS, SR_FS); + fstate_restore(current, task_pt_regs(current)); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(kernel_fpu_end); -- 2.43.1
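The new CC_FLAGS_FPU line reuses riscv-march-y but removes a single 'v' from the run of single-letter extensions that follows rv32ima/rv64ima, keeping F and D. The C sketch below mirrors that sed substitution for strings that begin with the base ISA, as riscv-march-y does. The function name is hypothetical. One difference from sed: sed leaves a non-matching string unchanged, whereas this sketch returns -1 for such input.

```c
#include <stddef.h>
#include <string.h>

/*
 * Mirrors sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/' for strings that
 * start with the base ISA prefix. Returns 0 on success, -1 if the prefix
 * is missing or out is too small.
 */
static int strip_vector_ext(const char *march, char *out, size_t outlen)
{
	size_t len = strlen(march);
	size_t i = 7, o = 7;

	if (len + 1 > outlen)
		return -1;
	if (strncmp(march, "rv32ima", 7) != 0 && strncmp(march, "rv64ima", 7) != 0)
		return -1;

	/* \1: copy the base ISA prefix */
	memcpy(out, march, 7);
	/* \2: copy single-letter extensions up to 'v' or '_' */
	while (march[i] != '\0' && march[i] != 'v' && march[i] != '_')
		out[o++] = march[i++];
	/* v?: drop at most one 'v' */
	if (march[i] == 'v')
		i++;
	/* the remainder (e.g. _zicsr_zifencei) is copied unchanged */
	strcpy(out + o, march + i);
	return 0;
}
```

For example, "rv64imafdcv_zicsr_zifencei" becomes "rv64imafdc_zicsr_zifencei", while a string without 'v' passes through unchanged.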
[PATCH v3 09/14] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a different header. Add a wrapper header, and export the CFLAGS adjustments as found in lib/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 arch/x86/include/asm/fpu.h | 13 + 3 files changed, 34 insertions(+) create mode 100644 arch/x86/include/asm/fpu.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 39886bab943a..7c9d032ee675 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -83,6 +83,7 @@ config X86 select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOVif X86_64 + select ARCH_HAS_KERNEL_FPU_SUPPORT select ARCH_HAS_MEM_ENCRYPT select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 662d9d4033e6..5a5f5999c505 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 +# +# CFLAGS for compiling floating point code inside the kernel. +# +CC_FLAGS_FPU := -msse -msse2 +ifdef CONFIG_CC_IS_GCC +# Stack alignment mismatch, proceed with caution. +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 +# (8B stack alignment). +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 +# +# The "-msse" in the first argument is there so that the +# -mpreferred-stack-boundary=3 build error: +# +# -mpreferred-stack-boundary=3 is not between 4 and 12 +# +# can be triggered. Otherwise gcc doesn't complain. 
+CC_FLAGS_FPU += -mhard-float +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) +endif + ifeq ($(CONFIG_X86_KERNEL_IBT),y) # # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h new file mode 100644 index ..b2743fe19339 --- /dev/null +++ b/arch/x86/include/asm/fpu.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_X86_FPU_H +#define _ASM_X86_FPU_H + +#include + +#define kernel_fpu_available() true + +#endif /* ! _ASM_X86_FPU_H */ -- 2.43.1
[PATCH v3 08/14] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
PowerPC provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. The PowerPC API also requires a non-preemptible context. Add a wrapper header, and export the CFLAGS adjustments. Acked-by: Michael Ellerman (powerpc) Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 - arch/powerpc/include/asm/fpu.h | 28 3 files changed, 33 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/fpu.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..c42a57b6839d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -137,6 +137,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_HUGEPD if HUGETLB_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 65261cbe5bfd..93d89f055b70 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD)) CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata) +CC_FLAGS_FPU := $(call cc-option,-mhard-float) +CC_FLAGS_NO_FPU:= $(call cc-option,-msoft-float) + ifdef CONFIG_FUNCTION_TRACER ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY @@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1) KBUILD_CPPFLAGS+= -I $(srctree)/arch/powerpc $(asinstr) KBUILD_AFLAGS += $(AFLAGS-y) -KBUILD_CFLAGS += $(call cc-option,-msoft-float) +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) KBUILD_CFLAGS += $(CFLAGS-y) CPP= $(CC) -E $(KBUILD_CFLAGS) diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h new file mode 100644 index ..ca584e4bc40f --- /dev/null 
+++ b/arch/powerpc/include/asm/fpu.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_POWERPC_FPU_H +#define _ASM_POWERPC_FPU_H + +#include + +#include +#include + +#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + +static inline void kernel_fpu_begin(void) +{ + preempt_disable(); + enable_kernel_fp(); +} + +static inline void kernel_fpu_end(void) +{ + disable_kernel_fp(); + preempt_enable(); +} + +#endif /* ! _ASM_POWERPC_FPU_H */ -- 2.43.1
[PATCH v3 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in asm/fpu.h, so it only needs to add kernel_fpu_available() and export the CFLAGS adjustments. Acked-by: WANG Xuerui Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v3: - Rebase on v6.9-rc1 arch/loongarch/Kconfig | 1 + arch/loongarch/Makefile | 5 - arch/loongarch/include/asm/fpu.h | 1 + 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index a5f300ec6f28..2266c6c41c38 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -18,6 +18,7 @@ config LOONGARCH select ARCH_HAS_CURRENT_STACK_POINTER select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile index df6caf79537a..efb5440a43ec 100644 --- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -26,6 +26,9 @@ endif 32bit-emul = elf32loongarch 64bit-emul = elf64loongarch +CC_FLAGS_FPU := -mfpu=64 +CC_FLAGS_NO_FPU:= -msoft-float + ifdef CONFIG_UNWINDER_ORC orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h orc_hash_sh := $(srctree)/scripts/orc_hash.sh @@ -59,7 +62,7 @@ ld-emul = $(64bit-emul) cflags-y += -mabi=lp64s endif -cflags-y += -pipe -msoft-float +cflags-y += -pipe $(CC_FLAGS_NO_FPU) LDFLAGS_vmlinux+= -static -n -nostdlib # When the assembler supports explicit relocation hint, we must use it. diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h index c2d8962fda00..3177674228f8 100644 --- a/arch/loongarch/include/asm/fpu.h +++ b/arch/loongarch/include/asm/fpu.h @@ -21,6 +21,7 @@ struct sigcontext; +#define kernel_fpu_available() cpu_has_fpu extern void kernel_fpu_begin(void); extern void kernel_fpu_end(void); -- 2.43.1
[PATCH v3 06/14] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) lib/raid6/Makefile | 31 --- 1 file changed, 8 insertions(+), 23 deletions(-) diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile index 385a94aa0b99..c71984e04c4d 100644 --- a/lib/raid6/Makefile +++ b/lib/raid6/Makefile @@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float endif endif -# The GCC option -ffreestanding is required in order to compile code containing -# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) -ifeq ($(CONFIG_KERNEL_MODE_NEON),y) -NEON_FLAGS := -ffreestanding -# Enable -NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include) -ifeq ($(ARCH),arm) -NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon -endif -CFLAGS_recov_neon_inner.o += $(NEON_FLAGS) -ifeq ($(ARCH),arm64) -CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only -endif -endif - quiet_cmd_unroll = UNROLL $@ cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@ @@ -75,10 +56,14 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -CFLAGS_neon1.o += $(NEON_FLAGS) -CFLAGS_neon2.o += $(NEON_FLAGS) -CFLAGS_neon4.o += $(NEON_FLAGS) -CFLAGS_neon8.o += $(NEON_FLAGS) +CFLAGS_neon1.o += $(CC_FLAGS_FPU) +CFLAGS_neon2.o += $(CC_FLAGS_FPU) +CFLAGS_neon4.o += $(CC_FLAGS_FPU) +CFLAGS_neon8.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU) targets += neon1.c neon2.c neon4.c neon8.c $(obj)/neon%.c: 
$(src)/neon.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -- 2.43.1
[PATCH v3 05/14] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - New patch for v2 arch/arm64/lib/Makefile | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile index 29490be2546b..13e6a2829116 100644 --- a/arch/arm64/lib/Makefile +++ b/arch/arm64/lib/Makefile @@ -7,10 +7,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \ ifeq ($(CONFIG_KERNEL_MODE_NEON), y) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o -CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only -CFLAGS_xor-neon.o += -ffreestanding -# Enable -CFLAGS_xor-neon.o += -isystem $(shell $(CC) -print-file-name=include) +CFLAGS_xor-neon.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU) endif lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o -- 2.43.1
[PATCH v3 04/14] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Remove file name from header comment arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 9 - arch/arm64/include/asm/fpu.h | 15 +++ 3 files changed, 24 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/fpu.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b11c98b3e84..67f0d3b5b7df 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -30,6 +30,7 @@ config ARM64 select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 0e075d3c546b..3e863e5b0169 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y) $(warning Detected assembler with broken .inst; disassembly will be unreliable) endif -KBUILD_CFLAGS += -mgeneral-regs-only \ +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_NO_FPU:= -mgeneral-regs-only + +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \ $(compat_vdso) $(cc_has_k_constraint) KBUILD_CFLAGS += $(call cc-disable-warning, psabi) KBUILD_AFLAGS += $(compat_vdso) diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm64/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 
2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + +#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.43.1
[PATCH v3 03/14] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/arm/lib/Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index 650404be6768..0ca5aae1bcc3 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S ifeq ($(CONFIG_KERNEL_MODE_NEON),y) - NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon - CFLAGS_xor-neon.o+= $(NEON_FLAGS) + CFLAGS_xor-neon.o+= $(CC_FLAGS_FPU) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o endif -- 2.43.1
[PATCH v3 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
Several architectures provide an API to enable the FPU and run floating-point SIMD code in kernel space. However, the function names, header locations, and semantics are inconsistent across architectures, and FPU support may be gated behind other Kconfig options. Provide a standard way for architectures to declare that kernel space FPU support is available. Architectures selecting this option must implement what is currently the most common API (kernel_fpu_begin() and kernel_fpu_end(), plus a new function kernel_fpu_available()) and provide the appropriate CFLAGS for compiling floating-point C code. Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Add documentation explaining the build-time and runtime APIs - Add a linux/fpu.h header for generic isolation enforcement Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ include/linux/fpu.h | 12 5 files changed, 102 insertions(+) create mode 100644 Documentation/core-api/floating-point.rst create mode 100644 include/linux/fpu.h diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst new file mode 100644 index ..a8d0d4b05052 --- /dev/null +++ b/Documentation/core-api/floating-point.rst @@ -0,0 +1,78 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Floating-point API +== + +Kernel code is normally prohibited from using floating-point (FP) registers or +instructions, including the C float and double data types. This rule reduces +system call overhead, because the kernel does not need to save and restore the +userspace floating-point register state. + +However, occasionally drivers or library functions may need to include FP code. +This is supported by isolating the functions containing FP code to a separate +translation unit (a separate source file), and saving/restoring the FP register +state around calls to those functions.
This creates "critical sections" of +floating-point usage. + +The reason for this isolation is to prevent the compiler from generating code +touching the FP registers outside these critical sections. Compilers sometimes +use FP registers to optimize inlined ``memcpy`` or variable assignment, as +floating-point registers may be wider than general-purpose registers. + +Usability of floating-point code within the kernel is architecture-specific. +Additionally, because a single kernel may be configured to support platforms +both with and without a floating-point unit, FPU availability must be checked +both at build time and at run time. + +Several architectures implement the generic kernel floating-point API from +``linux/fpu.h``, as described below. Some other architectures implement their +own unique APIs, which are documented separately. + +Build-time API +-- + +Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT`` +is enabled. For C code, such code must be placed in a separate file, and that +file must have its compilation flags adjusted using the following pattern:: + +CFLAGS_foo.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU) + +Architectures are expected to define one or both of these variables in their +top-level Makefile as needed. For example:: + +CC_FLAGS_FPU := -mhard-float + +or:: + +CC_FLAGS_NO_FPU := -msoft-float + +Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``. + +Runtime API +--- + +The runtime API is provided in ``linux/fpu.h``. This header cannot be included +from files implementing FP code (those with their compilation flags adjusted as +above). Instead, it must be included when defining the FP critical sections. + +.. c:function:: bool kernel_fpu_available( void ) + +This function reports if floating-point code can be used on this CPU or +platform. 
The value returned by this function is not expected to change +at runtime, so it only needs to be called once, not before every +critical section. + +.. c:function:: void kernel_fpu_begin( void ) +void kernel_fpu_end( void ) + +These functions create a floating-point critical section. It is only +valid to call ``kernel_fpu_begin()`` after a previous call to +``kernel_fpu_available()`` returned ``true``. These functions are only +guaranteed to be callable from (preemptible or non-preemptible) process +context. + +Preemption may be disabled inside critical sections, so their size +should be minimized. They are *not* required to be reentrant. If the +caller expects to nest critical sections, it must implement its own +reference counting. diff --git a/Documentation/core-api/index.rst
[PATCH v3 00/14] Unified cross-architecture kernel-mode FPU API
This series unifies the kernel-mode FPU API across several architectures by wrapping the existing functions (where needed) in consistently-named functions placed in a consistent header location, with mostly the same semantics: they can be called from preemptible or non-preemptible task context, and are not assumed to be reentrant. Architectures are also expected to provide CFLAGS adjustments for compiling FPU-dependent code. For the moment, SIMD/vector units are out of scope for this common API. This allows us to remove the ifdeffery and duplicated Makefile logic at each FPU user. It then implements the common API on RISC-V, and converts a couple of users to the new API: the AMDGPU DRM driver, and the FPU self test. The underlying goal of this series is to allow using newer AMD GPUs (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode FPU support. Previous versions: v2: https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holl...@sifive.com/ v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holl...@sifive.com/ v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holl...@sifive.com/ Changes in v3: - Rebase on v6.9-rc1 - Limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT Changes in v2: - Add documentation explaining the build-time and runtime APIs - Add a linux/fpu.h header for generic isolation enforcement - Remove file name from header comment - Clean up arch/arm64/lib/Makefile, like for arch/arm - Remove RISC-V architecture-specific preprocessor check - Split altivec removal to a separate patch - Use linux/fpu.h instead of asm/fpu.h in consumers - Declare test_fpu() in a header Michael Ellerman (1): drm/amd/display: Only use hard-float, not altivec on powerpc Samuel Holland (13): arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS arm64: Implement
ARCH_HAS_KERNEL_FPU_SUPPORT arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT riscv: Add support for kernel-mode FPU drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT selftests/fpu: Move FP code to a separate translation unit selftests/fpu: Allow building on other architectures Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 ++ arch/arm/include/asm/fpu.h| 15 arch/arm/lib/Makefile | 3 +- arch/arm64/Kconfig| 1 + arch/arm64/Makefile | 9 ++- arch/arm64/include/asm/fpu.h | 15 arch/arm64/lib/Makefile | 6 +- arch/loongarch/Kconfig| 1 + arch/loongarch/Makefile | 5 +- arch/loongarch/include/asm/fpu.h | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 +- arch/powerpc/include/asm/fpu.h| 28 +++ arch/riscv/Kconfig| 1 + arch/riscv/Makefile | 3 + arch/riscv/include/asm/fpu.h | 16 arch/riscv/kernel/Makefile| 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 +++ arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 + arch/x86/include/asm/fpu.h| 13 drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 35 + drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 + drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 + include/linux/fpu.h | 12 +++ lib/Kconfig.debug | 2 +- lib/Makefile | 26 +-- lib/raid6/Makefile| 31 ++-- lib/test_fpu.h| 8 ++ lib/{test_fpu.c => test_fpu_glue.c} | 37 ++--- lib/test_fpu_impl.c | 37 + 37 files changed, 343 insertions(+), 190 deletions(-) create mode 100644 Documentation/core-api/floating-point.rst create mode 100644 arch/arm/include/asm/fpu.h create mode 100644 arch/arm64/include/asm/fpu.h create mode 100644 arch/powerpc/include/asm/fpu.h create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c 
create mode
[PATCH v3 02/14] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Remove file name from header comment arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 +++ arch/arm/include/asm/fpu.h | 15 +++ 3 files changed, 23 insertions(+) create mode 100644 arch/arm/include/asm/fpu.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index b14aed3a17ab..b1751c2cab87 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -15,6 +15,7 @@ config ARM select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL if ARM_LPAE diff --git a/arch/arm/Makefile b/arch/arm/Makefile index d82908b1b1bb..71afdd98ddf2 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -130,6 +130,13 @@ endif # Accept old syntax despite ".syntax unified" AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W) +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon + ifeq ($(CONFIG_THUMB2_KERNEL),y) CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN) AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + 
+#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.43.1
Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces
On Wed, Mar 27, 2024 at 08:10:51AM -0700, Guenter Roeck wrote: > On 3/27/24 07:44, Simon Horman wrote: > > On Mon, Mar 25, 2024 at 10:52:46AM -0700, Guenter Roeck wrote: > > > Add name of functions triggering warning backtraces to the __bug_table > > > object section to enable support for suppressing WARNING backtraces. > > > > > > To limit image size impact, the pointer to the function name is only added > > > to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and > > > CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly > > > parameter is replaced with a (dummy) NULL parameter to avoid an image size > > > increase due to unused __func__ entries (this is necessary because > > > __func__ > > > is not a define but a virtual variable). > > > > > > Tested-by: Linux Kernel Functional Testing > > > Acked-by: Dan Carpenter > > > Signed-off-by: Guenter Roeck > > > --- > > > - Rebased to v6.9-rc1 > > > - Added Tested-by:, Acked-by:, and Reviewed-by: tags > > > - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option > > > > > > arch/sh/include/asm/bug.h | 26 ++ > > > 1 file changed, 22 insertions(+), 4 deletions(-) > > > > > > diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h > > > index 05a485c4fabc..470ce6567d20 100644 > > > --- a/arch/sh/include/asm/bug.h > > > +++ b/arch/sh/include/asm/bug.h > > > @@ -24,21 +24,36 @@ > > >* The offending file and line are encoded in the __bug_table section. > > >*/ > > > #ifdef CONFIG_DEBUG_BUGVERBOSE > > > + > > > +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE > > > +# define HAVE_BUG_FUNCTION > > > +# define __BUG_FUNC_PTR "\t.long %O2\n" > > > +#else > > > +# define __BUG_FUNC_PTR > > > +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */ > > > + > > > > Hi Guenter, > > > > a minor nit from my side: this change results in a Kernel doc warning. > > > > .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). 
> > Prototype was for HAVE_BUG_FUNCTION() instead > > > > Perhaps either the new code should be placed above the Kernel doc, > > or scripts/kernel-doc should be enhanced? > > > > Thanks a lot for the feedback. > > The definition block needs to be inside CONFIG_DEBUG_BUGVERBOSE, > so it would be a bit odd to move it above the documentation > just to make kerneldoc happy. I am not really sure what to do > about it. FWIW, I agree that would be odd. But perhaps the #ifdef could also move above the Kernel doc? Maybe not a great idea, but the best one I've had so far. > I'll wait for comments from others before making any changes. > > Thanks, > Guenter > > > > #define _EMIT_BUG_ENTRY \ > > > "\t.pushsection __bug_table,\"aw\"\n" \ > > > "2:\t.long 1b, %O1\n" \ > > > - "\t.short %O2, %O3\n" \ > > > - "\t.org 2b+%O4\n" \ > > > + __BUG_FUNC_PTR \ > > > + "\t.short %O3, %O4\n" \ > > > + "\t.org 2b+%O5\n" \ > > > "\t.popsection\n" > > > #else > > > #define _EMIT_BUG_ENTRY \ > > > "\t.pushsection __bug_table,\"aw\"\n" \ > > > "2:\t.long 1b\n"\ > > > - "\t.short %O3\n"\ > > > - "\t.org 2b+%O4\n" \ > > > + "\t.short %O4\n"\ > > > + "\t.org 2b+%O5\n" \ > > > "\t.popsection\n" > > > #endif > > > +#ifdef HAVE_BUG_FUNCTION > > > +# define __BUG_FUNC __func__ > > > +#else > > > +# define __BUG_FUNC NULL > > > +#endif > > > + > > > #define BUG() \ > > > do {\ > > > __asm__ __volatile__ ( \ > > > > ... >
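The patch uses a two-level conditional. HAVE_BUG_FUNCTION is defined only when both CONFIG_DEBUG_BUGVERBOSE and CONFIG_KUNIT_SUPPRESS_BACKTRACE are set. __BUG_FUNC then expands either to __func__ or to a NULL placeholder, so unused function-name strings stay out of the image. The userspace sketch below reproduces just that selection logic. The CONFIG_ symbols are defined by hand because there is no Kconfig here, and bug_site_name() is a hypothetical helper standing in for a site that would emit a bug-table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Set by hand for this sketch; normally generated by Kconfig. */
#define CONFIG_DEBUG_BUGVERBOSE 1
#define CONFIG_KUNIT_SUPPRESS_BACKTRACE 1

#ifdef CONFIG_DEBUG_BUGVERBOSE
#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
# define HAVE_BUG_FUNCTION
#endif
#endif

#ifdef HAVE_BUG_FUNCTION
# define __BUG_FUNC __func__	/* record the enclosing function's name */
#else
# define __BUG_FUNC NULL	/* placeholder: no string emitted */
#endif

/* Hypothetical site; __func__ has static storage, so returning it is safe. */
static const char *bug_site_name(void)
{
	return __BUG_FUNC;
}
```

Removing either CONFIG_ define makes bug_site_name() return NULL. This mirrors why __func__ needs a macro guard: it is an implicitly declared variable, not a preprocessor define.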
Re: [PATCH v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
On Fri, 26 Jan 2024 14:44:51 +0800 Huang Shijie wrote: > During kernel boot, the generic cpu_to_node() is called too early in > arm64, powerpc and riscv when CONFIG_NUMA is enabled. > > There are at least four places in the common code where > the generic cpu_to_node() is called before it is initialized: > 1.) early_trace_init() in kernel/trace/trace.c > 2.) sched_init() in kernel/sched/core.c > 3.) init_sched_fair_class() in kernel/sched/fair.c > 4.) workqueue_init_early() in kernel/workqueue.c > > In order to fix the bug, the patch introduces early_numa_node_init() > which is called after smp_prepare_boot_cpu() in start_kernel. > early_numa_node_init will initialize the "numa_node" as soon as > the early_cpu_to_node() is ready, before the cpu_to_node() is called > for the first time. What are the userspace-visible runtime effects of this bug?
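The bug described in the quoted patch is an initialization-order problem. The per-CPU numa_node value that the generic cpu_to_node() reads is still zero when early boot code queries it, even though early_cpu_to_node() already knows the real topology. The userspace model below is a sketch under stated assumptions. The four-CPU topology, the numa_node array and the early_map table are all hypothetical. Only the name early_numa_node_init() comes from the patch description, and its body here is an illustration rather than the posted patch.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical: per-CPU node ids start zeroed, as before initialization. */
static int numa_node[NR_CPUS];
/* Hypothetical firmware-derived topology: CPUs 2 and 3 live on node 1. */
static const int early_map[NR_CPUS] = { 0, 0, 1, 1 };

static int early_cpu_to_node(int cpu) { return early_map[cpu]; }
static int cpu_to_node(int cpu) { return numa_node[cpu]; }

/* Copy the early mapping in as soon as it is usable, before any of the
 * early callers listed in the patch run. */
static void early_numa_node_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		numa_node[cpu] = early_cpu_to_node(cpu);
}
```

Before the init call, every CPU reports node 0. After it, CPU 3 reports node 1, which is the value early callers such as workqueue_init_early() would have needed.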
[PATCH] usb: phy: MAINTAINERS: mark Freescale USB PHY as orphaned
Emails to the only maintainer bounce: : host nxp-com.mail.protection.outlook.com[52.101.68.39] said: 550 5.4.1 Recipient address rejected: Access denied. Signed-off-by: Krzysztof Kozlowski --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 51d5a64a5a36..b66812e99caf 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8760,10 +8760,9 @@ S: Maintained F: drivers/usb/gadget/udc/fsl* FREESCALE USB PHY DRIVER -M: Ran Wang L: linux-...@vger.kernel.org L: linuxppc-dev@lists.ozlabs.org -S: Maintained +S: Orphan F: drivers/usb/phy/phy-fsl-usb* FREEVXFS FILESYSTEM -- 2.34.1
[PATCH 2/2] usb: typec: nvidia: drop driver owner assignment
Core in typec_altmode_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski --- drivers/usb/typec/altmodes/nvidia.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/usb/typec/altmodes/nvidia.c b/drivers/usb/typec/altmodes/nvidia.c index c36769736405..fe70b36f078f 100644 --- a/drivers/usb/typec/altmodes/nvidia.c +++ b/drivers/usb/typec/altmodes/nvidia.c @@ -35,7 +35,6 @@ static struct typec_altmode_driver nvidia_altmode_driver = { .remove = nvidia_altmode_remove, .driver = { .name = "typec_nvidia", - .owner = THIS_MODULE, }, }; module_typec_altmode_driver(nvidia_altmode_driver); -- 2.34.1
[PATCH 1/2] usb: phy: fsl-usb: drop driver owner assignment
Core in platform_driver_register() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski --- drivers/usb/phy/phy-fsl-usb.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/usb/phy/phy-fsl-usb.c b/drivers/usb/phy/phy-fsl-usb.c index 79617bb0a70e..1ebbf189a535 100644 --- a/drivers/usb/phy/phy-fsl-usb.c +++ b/drivers/usb/phy/phy-fsl-usb.c @@ -1005,7 +1005,6 @@ struct platform_driver fsl_otg_driver = { .remove_new = fsl_otg_remove, .driver = { .name = driver_name, - .owner = THIS_MODULE, }, }; -- 2.34.1
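Both owner-removal patches rely on the registration helper filling in .owner. In the kernel, platform_driver_register() is a macro that passes THIS_MODULE to __platform_driver_register(), and that function stores the pointer in drv->driver.owner. The userspace model below mimics this with cut-down struct definitions. The demo names and the stand-in THIS_MODULE are hypothetical, and the structs are far smaller than the real ones.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical versions of the kernel structures. */
struct module { const char *name; };
struct device_driver { const char *name; struct module *owner; };
struct platform_driver { struct device_driver driver; };

/* Stand-in for THIS_MODULE, which in the kernel names the module being built. */
static struct module demo_module = { "demo_module" };
#define THIS_MODULE (&demo_module)

static int __platform_driver_register(struct platform_driver *drv,
				      struct module *owner)
{
	drv->driver.owner = owner;	/* the core sets .owner for the driver */
	return 0;
}

/* Because the macro captures THIS_MODULE at the call site, the driver's
 * own ".owner = THIS_MODULE" initializer is redundant. */
#define platform_driver_register(drv) \
	__platform_driver_register(drv, THIS_MODULE)
```

A driver that leaves .owner unset still ends up with the correct owner after registration.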
Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()
On Wed, Mar 27, 2024 at 09:58:35AM +, Christophe Leroy wrote: > > Just general remarks on the ones with huge pages: > > > > hash 64k and hugepage 16M/16G > > radix 64k/radix hugepage 2M/1G > > radix 4k/radix hugepage 2M/1G > > nohash 32 > >- I think this is just a normal x86 like scheme? PMD/PUD can be a > > leaf with the same size as a next level table. > > > > Do any of these cases need to know the higher level to parse the > > lower? eg is there a 2M bit in the PUD indicating that the PMD > > is a table of 2M leafs or does each PMD entry have a bit > > indicating it is a leaf? > > For hash and radix there is a bit that tells it is leaf (_PAGE_PTE) > > For nohash32/e500 I think the drawing is not fully right, there is a huge > page directory (hugepd) with a single entry. I think it should be > possible to change it to a leaf entry, it seems we have bit _PAGE_SW1 > available in the PTE. It sounds to me like PPC breaks down into only a couple of fundamental behaviors - x86 like leaf in many page levels. Use the pgd/pud/pmd_leaf() and related to implement it - ARM like contig PTE within a single page table level. Use the contig stuff to implement it - Contig PTE across two page table levels with a bit in the PMD. Needs new support like you showed - Page table levels with a variable page size. Ie a PUD can point to a directory of 8 pages or 512 pages of different size. Probably needs some new core support, but I think your changes to the *_offset go a long way already. > > > > hash 4k and hugepage 16M/16G > > nohash 64 > >- How does this work? I guess since 8xx explicitly calls out > > consecutive this is actually the pgd can point to 512 256M > > entries or 8 16G entries? Ie the table size at each level is > > variable? Or is it the same and the table size is still 512 and > > each 16G entry is replicated 64 times? > > For those it is using the huge page directory (hugepd) which can be > hooked at any level and is a directory of huge pages on its own.
There > is no consecutive entries involved here I think, although I'm not > completely sure. > > For hash4k I'm not sure how it works, this was changed by commit > e2b3d202d1db ("powerpc: Switch 16GB and 16MB explicit hugepages to a > different page table format") > > For the nohash/64, a PGD entry points either to a regular PUD directory > or to a HUGEPD directory. The size of the HUGEPD directory is encoded in > the 6 lower bits of the PGD entry. If it is a software walker there might be value in just aligning to the contig pte scheme in all levels and forgetting about the variable size page table levels. That quarter page stuff is a PITA to manage the memory allocation for on PPC anyhow. Jason
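Jason's second reading of the layout (each table keeps 512 entries and a 16G page is replicated across 64 consecutive 256M entries) is what a contig-PTE style software walker would have to handle at each level. A minimal sketch of the index arithmetic, assuming power-of-two sizes; the helper names are hypothetical and not taken from the kernel:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers illustrating a contig-PTE style layout: a huge page
 * larger than the area mapped by one entry is stored as the same leaf
 * replicated in N consecutive entries of that level.
 */

/* Number of consecutive entries one huge page occupies at a level. */
static uint64_t contig_nr_entries(uint64_t page_size, uint64_t entry_size)
{
	assert(entry_size != 0 && page_size % entry_size == 0);
	return page_size / entry_size;
}

/* Index of the first entry in the contiguous run that contains idx.
 * Assumes nr_entries is a power of two. */
static uint64_t contig_first_index(uint64_t idx, uint64_t nr_entries)
{
	return idx & ~(nr_entries - 1);
}
```

With 256M per entry, contig_nr_entries(16ULL << 30, 256ULL << 20) gives 64, and any index in 128..191 maps back to entry 128, which is where a walker would find the head of the run.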
Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast
On Wed, Mar 27, 2024, at 16:39, David Hildenbrand wrote: > On 27.03.24 16:21, Peter Xu wrote: >> On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote: >> >> I'm not sure what config you tried there; as I am doing some build tests >> recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could >> avoid a lot of issues, I think it's due to libc missing. But maybe not the >> case there. > > CCing Arnd; I use some of his compiler chains, others from Fedora directly. For > example for alpha and arc, the Fedora gcc is "13.2.1". > > But there is other stuff like (arc): > > ./arch/arc/include/asm/mmu-arcv2.h: In function 'mmu_setup_asid': > ./arch/arc/include/asm/mmu-arcv2.h:82:9: error: implicit declaration of > function 'write_aux_reg' [-Werror=implicit-function-declaration] > 82 | write_aux_reg(ARC_REG_PID, asid | MMU_ENABLE); >| ^ Seems to be missing an #include of soc/arc/aux.h, but I can't tell when this first broke without bisecting. > or (alpha) > > WARNING: modpost: "saved_config" [vmlinux] is COMMON symbol > ERROR: modpost: "memcpy" [fs/reiserfs/reiserfs.ko] undefined! > ERROR: modpost: "memcpy" [fs/nfs/nfs.ko] undefined! > ERROR: modpost: "memcpy" [fs/nfs/nfsv3.ko] undefined! > ERROR: modpost: "memcpy" [fs/nfsd/nfsd.ko] undefined! > ERROR: modpost: "memcpy" [fs/lockd/lockd.ko] undefined! > ERROR: modpost: "memcpy" [crypto/crypto.ko] undefined! > ERROR: modpost: "memcpy" [crypto/crypto_algapi.ko] undefined! > ERROR: modpost: "memcpy" [crypto/aead.ko] undefined! > ERROR: modpost: "memcpy" [crypto/crypto_skcipher.ko] undefined! > ERROR: modpost: "memcpy" [crypto/seqiv.ko] undefined! Al did a series to fix various build problems on alpha, see https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=work.alpha Not sure if he still has to send them to Matt, or if Matt just needs to apply them. I also have some alpha patches that I should send upstream. Arnd
Re: [PATCH v2 5/6] mm/mm_init.c: remove unneeded calc_memmap_size()
On Mon, Mar 25, 2024 at 10:56:45PM +0800, Baoquan He wrote: > Nobody calls calc_memmap_size() now. > > Signed-off-by: Baoquan He Reviewed-by: Mike Rapoport (IBM) Looks like I replied to patch 6/6 twice by mistake and missed this one. > --- > mm/mm_init.c | 20 > 1 file changed, 20 deletions(-) > > diff --git a/mm/mm_init.c b/mm/mm_init.c > index 7f71e56e83f3..e269a724f70e 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -1331,26 +1331,6 @@ static void __init calculate_node_totalpages(struct > pglist_data *pgdat, > pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, > realtotalpages); > } > > -static unsigned long __init calc_memmap_size(unsigned long spanned_pages, > - unsigned long present_pages) > -{ > - unsigned long pages = spanned_pages; > - > - /* > - * Provide a more accurate estimation if there are holes within > - * the zone and SPARSEMEM is in use. If there are holes within the > - * zone, each populated memory region may cost us one or two extra > - * memmap pages due to alignment because memmap pages for each > - * populated regions may not be naturally aligned on page boundary. > - * So the (present_pages >> 4) heuristic is a tradeoff for that. > - */ > - if (spanned_pages > present_pages + (present_pages >> 4) && > - IS_ENABLED(CONFIG_SPARSEMEM)) > - pages = present_pages; > - > - return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; > -} > - > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > static void pgdat_init_split_queue(struct pglist_data *pgdat) > { > -- > 2.41.0 > -- Sincerely yours, Mike.
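For reference, the heuristic that this patch deletes can be modelled in user space as below. This is only a sketch: the 4K page size and the 64-byte sizeof(struct page) are assumed values, and a bool stands in for IS_ENABLED(CONFIG_SPARSEMEM).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; the real ones depend on arch and config. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_STRUCT_PAGE	64UL	/* assumed sizeof(struct page) */
#define SKETCH_PAGE_ALIGN(x)	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Mirrors the removed calc_memmap_size(): pages of memmap for one zone. */
static unsigned long sketch_calc_memmap_size(unsigned long spanned_pages,
					     unsigned long present_pages,
					     bool sparsemem)
{
	unsigned long pages = spanned_pages;

	/* Holes larger than present/16: size the memmap by present pages. */
	if (spanned_pages > present_pages + (present_pages >> 4) && sparsemem)
		pages = present_pages;

	return SKETCH_PAGE_ALIGN(pages * SKETCH_STRUCT_PAGE) >> SKETCH_PAGE_SHIFT;
}
```

A 1G zone with no holes (262144 pages) needs 4096 memmap pages under these assumptions; with half of it missing and SPARSEMEM enabled the estimate drops to 2048, while without SPARSEMEM it stays at 4096.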
Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast
On 27.03.24 16:46, Ryan Roberts wrote: Some of them look like mm-unstable issue, For example, arm64 fails with CC arch/arm64/mm/extable.o In file included from ./include/linux/hugetlb.h:828, from security/commoncap.c:19: ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of 'arch_clear_hugetlb_flags' 25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags | ^~~~ ./include/linux/hugetlb.h:840:20: note: in expansion of macro 'arch_clear_hugetlb_flags' 840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { } | ^~~~ ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of 'arch_clear_hugetlb_flags' with type 'void(struct folio *)' 21 | static inline void arch_clear_hugetlb_flags(struct folio *folio) | ^~~~ In file included from ./include/linux/hugetlb.h:828, from mm/filemap.c:37: ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of 'arch_clear_hugetlb_flags' 25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags | ^~~~ ./include/linux/hugetlb.h:840:20: note: in expansion of macro 'arch_clear_hugetlb_flags' 840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { } | ^~~~ ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of 'arch_clear_hugetlb_flags' with type 'void(struct folio *)' 21 | static inline void arch_clear_hugetlb_flags(struct folio *folio) see: https://lore.kernel.org/linux-mm/zgqvnkgdldkwh...@casper.infradead.org/ Yes, besides the other failures I see (odd targets), I was expecting that someone else had already noticed that :) thanks! -- Cheers, David / dhildenb
Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast
> > Some of them look like mm-unstable issue, For example, arm64 fails with > > CC arch/arm64/mm/extable.o > In file included from ./include/linux/hugetlb.h:828, > from security/commoncap.c:19: > ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of > 'arch_clear_hugetlb_flags' > 25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags > | ^~~~ > ./include/linux/hugetlb.h:840:20: note: in expansion of macro > 'arch_clear_hugetlb_flags' > 840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { } > | ^~~~ > ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of > 'arch_clear_hugetlb_flags' with t > ype 'void(struct folio *)' > 21 | static inline void arch_clear_hugetlb_flags(struct folio *folio) > | ^~~~ > In file included from ./include/linux/hugetlb.h:828, > from mm/filemap.c:37: > ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of > 'arch_clear_hugetlb_flags' > 25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags > | ^~~~ > ./include/linux/hugetlb.h:840:20: note: in expansion of macro > 'arch_clear_hugetlb_flags' > 840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { } > | ^~~~ > ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of > 'arch_clear_hugetlb_flags' with type 'void(struct folio *)' > 21 | static inline void arch_clear_hugetlb_flags(struct folio *folio) see: https://lore.kernel.org/linux-mm/zgqvnkgdldkwh...@casper.infradead.org/
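The error shows the generic stub at include/linux/hugetlb.h:840 being compiled even though the arm64 header already defines arch_clear_hugetlb_flags() and the matching #define. The usual kernel idiom for arch overrides is sketched below with hypothetical names. This thread does not confirm that a missing or misplaced #ifndef guard is the actual cause, or what the fix in the linked thread was.

```c
#include <assert.h>

/*
 * Hypothetical names; this only illustrates the "#define foo foo" override
 * idiom, not the real hugetlb headers.
 */

/* ---- arch header: provide an override and advertise it ---- */
static int arch_override_calls;

static inline void arch_clear_flags(unsigned long *flags)
{
	*flags = 0;
	arch_override_calls++;
}
#define arch_clear_flags arch_clear_flags

/* ---- generic header: supply a no-op stub only if no override exists ---- */
#ifndef arch_clear_flags
static inline void arch_clear_flags(unsigned long *flags)
{
	(void)flags;
}
#endif
```

If the generic stub were emitted without the #ifndef, or if the stub's guard were evaluated before the arch header's #define, the compiler would report a redefinition error like the one quoted above.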
Re: [PATCH v2 6/6] mm/mm_init.c: remove arch_reserved_kernel_pages()
On Mon, Mar 25, 2024 at 10:56:46PM +0800, Baoquan He wrote: > Since the current calculation of calc_nr_kernel_pages() has taken into > consideration of kernel reserved memory, no need to have > arch_reserved_kernel_pages() any more. > > Signed-off-by: Baoquan He Reviewed-by: Mike Rapoport (IBM) > --- > arch/powerpc/include/asm/mmu.h | 4 > arch/powerpc/kernel/fadump.c | 5 - > include/linux/mm.h | 3 --- > mm/mm_init.c | 12 > 4 files changed, 24 deletions(-) >
Re: [PATCH v2 4/6] mm/mm_init.c: remove meaningless calculation of zone->managed_pages in free_area_init_core()
On Mon, Mar 25, 2024 at 10:56:44PM +0800, Baoquan He wrote: > Currently, in free_area_init_core(), when initializing the zone's fields, a > rough value is set to zone->managed_pages. That value is calculated by > (zone->present_pages - memmap_pages). > > In the meantime, add the value to nr_all_pages and nr_kernel_pages which > represent all free pages of system (only low memory or including HIGHMEM > memory separately). Both of them are going to be used in > alloc_large_system_hash(). > > However, the rough calculation and setting of zone->managed_pages is > meaningless because > a) memmap pages are allocated on units of node in sparse_init() or > alloc_node_mem_map(pgdat); The simple (zone->present_pages - > memmap_pages) is too rough to make sense for zone; > b) the set zone->managed_pages will be zeroed out and reset with > the actual value in mem_init() via memblock_free_all(). Before the > resetting, no buddy allocation request is issued. > > Here, remove the meaningless and complicated calculation of > (zone->present_pages - memmap_pages), initialize zone->managed_pages as 0 > which reflects its actual value because no page has been added to the buddy > system yet. It will be reset in mem_init(). > > And also remove the assignment of nr_all_pages and nr_kernel_pages in > free_area_init_core(). Instead, call the newly added calc_nr_kernel_pages() > to count up all free but not reserved memory in memblock and assign to > nr_all_pages and nr_kernel_pages. The counting excludes memmap_pages, > and other kernel used data, which is more accurate than the old way and > simpler, and can also cover the ppc required arch_reserved_kernel_pages() > case. > > And also clean up the outdated code comment above free_area_init_core(). > And free_area_init_core() is easy to understand now, no need to add > words to explain. > > Signed-off-by: Baoquan He Reviewed-by: Mike Rapoport (IBM) > --- > mm/mm_init.c | 46 +- > 1 file changed, 5 insertions(+), 41 deletions(-)
Re: [PATCH v2 3/6] mm/mm_init.c: add new function calc_nr_all_pages()
On Mon, Mar 25, 2024 at 10:56:43PM +0800, Baoquan He wrote: > This is a preparation to calculate nr_kernel_pages and nr_all_pages, > both of which will be used later in alloc_large_system_hash(). > > nr_all_pages counts up all free but not reserved memory in memblock > allocator, including HIGHMEM memory. While nr_kernel_pages counts up > all free but not reserved low memory in memblock allocator, excluding > HIGHMEM memory. > > Signed-off-by: Baoquan He Reviewed-by: Mike Rapoport (IBM) > --- > mm/mm_init.c | 24 > 1 file changed, 24 insertions(+) > > diff --git a/mm/mm_init.c b/mm/mm_init.c > index 153fb2dc666f..c57a7fc97a16 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -1264,6 +1264,30 @@ static void __init > reset_memoryless_node_totalpages(struct pglist_data *pgdat) > pr_debug("On node %d totalpages: 0\n", pgdat->node_id); > } > > +static void __init calc_nr_kernel_pages(void) > +{ > + unsigned long start_pfn, end_pfn; > + phys_addr_t start_addr, end_addr; > + u64 u; > +#ifdef CONFIG_HIGHMEM > + unsigned long high_zone_low = > arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]; > +#endif > + > + for_each_free_mem_range(u, NUMA_NO_NODE, MEMBLOCK_NONE, &start_addr, > &end_addr, NULL) { > + start_pfn = PFN_UP(start_addr); > + end_pfn = PFN_DOWN(end_addr); > + > + if (start_pfn < end_pfn) { > + nr_all_pages += end_pfn - start_pfn; > +#ifdef CONFIG_HIGHMEM > + start_pfn = clamp(start_pfn, 0, high_zone_low); > + end_pfn = clamp(end_pfn, 0, high_zone_low); > +#endif > + nr_kernel_pages += end_pfn - start_pfn; > + } > + } > +} > + > static void __init calculate_node_totalpages(struct pglist_data *pgdat, > unsigned long node_start_pfn, > unsigned long node_end_pfn) > -- > 2.41.0 > -- Sincerely yours, Mike.
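The counting done by calc_nr_kernel_pages() can be tried out in user space with the sketch below. It assumes 4K pages. A plain array of [start, end) byte ranges stands in for for_each_free_mem_range(), and the sk_ names are hypothetical. Passing UINT64_MAX as the HIGHMEM boundary models a !CONFIG_HIGHMEM build, where no clamping happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12
#define SK_PAGE_SIZE	(1ULL << SK_PAGE_SHIFT)
#define SK_PFN_UP(x)	(((x) + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT)
#define SK_PFN_DOWN(x)	((x) >> SK_PAGE_SHIFT)

struct sk_range { uint64_t start, end; };	/* free range, [start, end) bytes */

static uint64_t sk_clamp(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

static void sk_calc_nr_kernel_pages(const struct sk_range *free, size_t n,
				    uint64_t high_zone_low,
				    uint64_t *nr_all, uint64_t *nr_kernel)
{
	*nr_all = *nr_kernel = 0;
	for (size_t i = 0; i < n; i++) {
		/* Only whole pages inside the free range count. */
		uint64_t start_pfn = SK_PFN_UP(free[i].start);
		uint64_t end_pfn = SK_PFN_DOWN(free[i].end);

		if (start_pfn >= end_pfn)
			continue;
		*nr_all += end_pfn - start_pfn;
		/* Low memory only: drop the part at or above HIGHMEM. */
		start_pfn = sk_clamp(start_pfn, 0, high_zone_low);
		end_pfn = sk_clamp(end_pfn, 0, high_zone_low);
		*nr_kernel += end_pfn - start_pfn;
	}
}
```

For free ranges 0x1000-0x5000 and 0x5800-0x9000 with HIGHMEM starting at PFN 7, this gives nr_all = 7 and nr_kernel = 5. The partial page at 0x5800 is rounded away, and PFNs 7 and 8 count only towards nr_all.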
Re: [PATCH v7 6/6] docs: trusted-encrypted: add DCP as new trust source
On Wed Mar 27, 2024 at 10:24 AM EET, David Gstir wrote: > Update the documentation for trusted and encrypted KEYS with DCP as new > trust source: > > - Describe security properties of DCP trust source > - Describe key usage > - Document blob format > > Co-developed-by: Richard Weinberger > Signed-off-by: Richard Weinberger > Co-developed-by: David Oberhollenzer > Signed-off-by: David Oberhollenzer > Signed-off-by: David Gstir > --- > .../security/keys/trusted-encrypted.rst | 85 +++ > 1 file changed, 85 insertions(+) > > diff --git a/Documentation/security/keys/trusted-encrypted.rst > b/Documentation/security/keys/trusted-encrypted.rst > index e989b9802f92..81fb3540bb20 100644 > --- a/Documentation/security/keys/trusted-encrypted.rst > +++ b/Documentation/security/keys/trusted-encrypted.rst > @@ -42,6 +42,14 @@ safe. > randomly generated and fused into each SoC at manufacturing time. > Otherwise, a common fixed test key is used instead. > > + (4) DCP (Data Co-Processor: crypto accelerator of various i.MX SoCs) > + > + Rooted to a one-time programmable key (OTP) that is generally burnt > + in the on-chip fuses and is accessible to the DCP encryption engine > only. > + DCP provides two keys that can be used as root of trust: the OTP key > + and the UNIQUE key. Default is to use the UNIQUE key, but selecting > + the OTP key can be done via a module parameter (dcp_use_otp_key). > + >* Execution isolation > > (1) TPM > @@ -57,6 +65,12 @@ safe. > > Fixed set of operations running in isolated execution environment. > > + (4) DCP > + > + Fixed set of cryptographic operations running in isolated execution > + environment. Only basic blob key encryption is executed there. > + The actual key sealing/unsealing is done on main processor/kernel > space. > + >* Optional binding to platform integrity state > > (1) TPM > @@ -79,6 +93,11 @@ safe. > Relies on the High Assurance Boot (HAB) mechanism of NXP SoCs > for platform integrity. 
> > + (4) DCP > + > + Relies on Secure/Trusted boot process (called HAB by vendor) for > + platform integrity. > + >* Interfaces and APIs > > (1) TPM > @@ -94,6 +113,11 @@ safe. > > Interface is specific to silicon vendor. > > + (4) DCP > + > + Vendor-specific API that is implemented as part of the DCP crypto > driver in > + ``drivers/crypto/mxs-dcp.c``. > + >* Threat model > > The strength and appropriateness of a particular trust source for a > given > @@ -129,6 +153,13 @@ selected trust source: > CAAM HWRNG, enable CRYPTO_DEV_FSL_CAAM_RNG_API and ensure the device > is probed. > > + * DCP (Data Co-Processor: crypto accelerator of various i.MX SoCs) > + > + The DCP hardware device itself does not provide a dedicated RNG > interface, > + so the kernel default RNG is used. SoCs with DCP like the i.MX6ULL do > have > + a dedicated hardware RNG that is independent from DCP which can be > enabled > + to back the kernel RNG. > + > Users may override this by specifying ``trusted.rng=kernel`` on the kernel > command-line to override the used RNG with the kernel's random number pool. > > @@ -231,6 +262,19 @@ Usage:: > CAAM-specific format. The key length for new keys is always in bytes. > Trusted Keys can be 32 - 128 bytes (256 - 1024 bits). > > +Trusted Keys usage: DCP > +--- > + > +Usage:: > + > +keyctl add trusted name "new keylen" ring > +keyctl add trusted name "load hex_blob" ring > +keyctl print keyid > + > +"keyctl print" returns an ASCII hex copy of the sealed key, which is in > format > +specific to this DCP key-blob implementation. The key length for new keys is > +always in bytes. Trusted Keys can be 32 - 128 bytes (256 - 1024 bits). > + > Encrypted Keys usage > > > @@ -426,3 +470,44 @@ string length. > privkey is the binary representation of TPM2B_PUBLIC excluding the > initial TPM2B header which can be reconstructed from the ASN.1 octed > string length. 
> + > +DCP Blob Format > +--- > + > +The Data Co-Processor (DCP) provides hardware-bound AES keys using its > +AES encryption engine only. It does not provide direct key sealing/unsealing. > +To make DCP hardware encryption keys usable as trust source, we define > +our own custom format that uses a hardware-bound key to secure the sealing > +key stored in the key blob. > + > +Whenever a new trusted key using DCP is generated, we generate a random > 128-bit > +blob encryption key (BEK) and 128-bit nonce. The BEK and nonce are used to > +encrypt the trusted key payload using AES-128-GCM. > + > +The BEK itself is encrypted using the hardware-bound key using the DCP's AES > +encryption engine with AES-128-ECB. The encrypted BEK, generated nonce, >
Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast
On 27.03.24 16:21, Peter Xu wrote: On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote: Some cleanups around function names, comments and the config option of "GUP-fast" -- GUP without "lock" safety belts on. With this cleanup it's easy to judge which functions are GUP-fast specific. We now consistently call it "GUP-fast", avoiding mixing it with "fast GUP", "lockless", or simply "gup" (which I always considered confusing in the code). So the magic now happens in functions that contain "gup_fast", whereby gup_fast() is the entry point into that magic. Comments consistently reference either "GUP-fast" or "gup_fast()". Based on mm-unstable from today. I won't CC arch maintainers, but only arch mailing lists, to reduce noise. Tested on x86_64, cross compiled on a bunch of archs, whereby some of them don't properly even compile on mm-unstable anymore in my usual setup (alpha, arc, parisc64, sh) ... maybe the cross compilers are outdated, but there are no new ones around. Hm. I'm not sure what config you tried there; as I am doing some build tests recently, I found turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS could avoid a lot of issues, I think it's due to libc missing. But maybe not the case there. CCing Arnd; I use some of his compiler chains, others from Fedora directly. For example for alpha and arc, the Fedora gcc is "13.2.1". I compile quite a few targets, usually with defconfig.
From my compile script:

# COMPILER NAME ARCH CROSS_COMPILE CONFIG(if different from defconfig)
compile_gcc "alpha" "alpha" "alpha-linux-gnu-"
compile_gcc "arc" "arc" "arc-linux-gnu-"
compile_gcc "arm" "arm" "arm-linux-gnu-" "axm55xx_defconfig"
compile_gcc "arm-nommu" "arm" "arm-linux-gnu-" "imxrt_defconfig"
compile_gcc "arm64" "arm64" "aarch64-linux-gnu-"
compile_gcc "csky" "csky" "../cross/gcc-13.2.0-nolibc/csky-linux/bin/csky-linux-"
compile_gcc "loongarch" "loongarch" "../cross/gcc-13.2.0-nolibc/loongarch64-linux/bin/loongarch64-linux-"
compile_gcc "m68k-nommu" "m68k" "m68k-linux-gnu-" "amcore_defconfig"
compile_gcc "m68k-sun3" "m68k" "m68k-linux-gnu-" "sun3_defconfig"
compile_gcc "m68k-coldfire" "m68k" "m68k-linux-gnu-" "m5475evb_defconfig"
compile_gcc "m68k-virt" "m68k" "m68k-linux-gnu-" "virt_defconfig"
compile_gcc "microblaze" "microblaze" "microblaze-linux-gnu-"
compile_gcc "mips64" "mips" "mips64-linux-gnu-" "bigsur_defconfig"
compile_gcc "mips32-xpa" "mips" "mips64-linux-gnu-" "maltaup_xpa_defconfig"
compile_gcc "mips32-alchemy" "mips" "mips64-linux-gnu-" "gpr_defconfig"
compile_gcc "mips32" "mips" "mips64-linux-gnu-"
compile_gcc "nios2" "nios2" "nios2-linux-gnu-" "3c120_defconfig"
compile_gcc "openrisc" "openrisc" "../cross/gcc-13.2.0-nolibc/or1k-linux/bin/or1k-linux-" "virt_defconfig"
compile_gcc "parisc32" "parisc" "hppa-linux-gnu-" "generic-32bit_defconfig"
compile_gcc "parisc64" "parisc" "hppa64-linux-gnu-" "generic-64bit_defconfig"
compile_gcc "riscv32" "riscv" "riscv64-linux-gnu-" "32-bit.config"
compile_gcc "riscv64" "riscv" "riscv64-linux-gnu-" "64-bit.config"
compile_gcc "riscv64-nommu" "riscv" "riscv64-linux-gnu-" "nommu_virt_defconfig"
compile_gcc "s390x" "s390" "s390x-linux-gnu-"
compile_gcc "sh" "sh" "../cross/gcc-13.2.0-nolibc/sh4-linux/bin/sh4-linux-"
compile_gcc "sparc32" "sparc" "../cross/gcc-13.2.0-nolibc/sparc-linux/bin/sparc-linux-" "sparc32_defconfig"
compile_gcc "sparc64" "sparc" "../cross/gcc-13.2.0-nolibc/sparc64-linux/bin/sparc64-linux-" "sparc64_defconfig"
compile_gcc "uml64" "um" "" "x86_64_defconfig"
compile_gcc "x86" "x86" "" "i386_defconfig"
compile_gcc "x86-pae" "x86" "" "i386_defconfig"
compile_gcc "x86_64" "x86" ""
compile_gcc "xtensa" "xtensa" "../cross/gcc-13.2.0-nolibc/xtensa-linux/bin/xtensa-linux-" "virt_defconfig"
compile_gcc "powernv" "powerpc" "../cross/gcc-13.2.0-nolibc/powerpc64-linux/bin/powerpc64-linux-" "powernv_defconfig"
compile_gcc "pseries" "powerpc" "../cross/gcc-13.2.0-nolibc/powerpc64-linux/bin/powerpc64-linux-" "pseries_defconfig"

Some of them look like mm-unstable issue, For example, arm64 fails with CC arch/arm64/mm/extable.o In file included from ./include/linux/hugetlb.h:828, from security/commoncap.c:19: ./arch/arm64/include/asm/hugetlb.h:25:34: error: redefinition of 'arch_clear_hugetlb_flags' 25 | #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags | ^~~~ ./include/linux/hugetlb.h:840:20: note: in expansion of macro 'arch_clear_hugetlb_flags' 840 | static inline void arch_clear_hugetlb_flags(struct folio *folio) { } |^~~~ ./arch/arm64/include/asm/hugetlb.h:21:20: note: previous definition of 'arch_clear_hugetlb_flags' with type 'void(struct folio *)' 21 | static inline void arch_clear_hugetlb_flags(struct folio *folio) |^~~~ In file included from ./include/linux/hugetlb.h:828, from mm/filemap.c:37:
Re: [PATCH v7 5/6] docs: document DCP-backed trusted keys kernel params
On Wed Mar 27, 2024 at 10:24 AM EET, David Gstir wrote: > Document the kernel parameters trusted.dcp_use_otp_key > and trusted.dcp_skip_zk_test for DCP-backed trusted keys. > > Co-developed-by: Richard Weinberger > Signed-off-by: Richard Weinberger > Co-developed-by: David Oberhollenzer > Signed-off-by: David Oberhollenzer > Signed-off-by: David Gstir > --- > Documentation/admin-guide/kernel-parameters.txt | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt > b/Documentation/admin-guide/kernel-parameters.txt > index 24c02c704049..b6944e57768a 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -6698,6 +6698,7 @@ > - "tpm" > - "tee" > - "caam" > + - "dcp" > If not specified then it defaults to iterating through > the trust source list starting with TPM and assigns the > first trust source as a backend which is initialized > @@ -6713,6 +6714,18 @@ > If not specified, "default" is used. In this case, > the RNG's choice is left to each individual trust > source. > > + trusted.dcp_use_otp_key > + This is intended to be used in combination with > + trusted.source=dcp and will select the DCP OTP key > + instead of the DCP UNIQUE key blob encryption. > + > + trusted.dcp_skip_zk_test > + This is intended to be used in combination with > + trusted.source=dcp and will disable the check if all > + the blob key is zero'ed. This is helpful for situations > where > + having this key zero'ed is acceptable. E.g. in testing > + scenarios. > + > tsc=Disable clocksource stability checks for TSC. > Format: > [x86] reliable: mark tsc clocksource as reliable, this Nicely documented, i.e. even I can understand what is said here :-) Reviewed-by: Jarkko Sakkinen BR, Jarkko
Re: [PATCH v7 2/6] KEYS: trusted: improve scalability of trust source config
On Wed Mar 27, 2024 at 10:24 AM EET, David Gstir wrote: > Enabling trusted keys requires at least one trust source implementation > (currently TPM, TEE or CAAM) to be enabled. Currently, this is > done by checking each trust source's config option individually. > This does not scale when more trust sources like the one for DCP > are added, because the condition will get long and hard to read. > > Add config HAVE_TRUSTED_KEYS which is set to true by each trust source > once its enabled and adapt the check for having at least one active trust > source to use this option. Whenever a new trust source is added, it now > needs to select HAVE_TRUSTED_KEYS. > > Signed-off-by: David Gstir > --- > security/keys/trusted-keys/Kconfig | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/security/keys/trusted-keys/Kconfig > b/security/keys/trusted-keys/Kconfig > index dbfdd8536468..553dc117f385 100644 > --- a/security/keys/trusted-keys/Kconfig > +++ b/security/keys/trusted-keys/Kconfig > @@ -1,3 +1,6 @@ > +config HAVE_TRUSTED_KEYS > + bool > + > config TRUSTED_KEYS_TPM > bool "TPM-based trusted keys" > depends on TCG_TPM >= TRUSTED_KEYS > @@ -9,6 +12,7 @@ config TRUSTED_KEYS_TPM > select ASN1_ENCODER > select OID_REGISTRY > select ASN1 > + select HAVE_TRUSTED_KEYS > help > Enable use of the Trusted Platform Module (TPM) as trusted key > backend. Trusted keys are random number symmetric keys, > @@ -20,6 +24,7 @@ config TRUSTED_KEYS_TEE > bool "TEE-based trusted keys" > depends on TEE >= TRUSTED_KEYS > default y > + select HAVE_TRUSTED_KEYS > help > Enable use of the Trusted Execution Environment (TEE) as trusted > key backend. > @@ -29,10 +34,11 @@ config TRUSTED_KEYS_CAAM > depends on CRYPTO_DEV_FSL_CAAM_JR >= TRUSTED_KEYS > select CRYPTO_DEV_FSL_CAAM_BLOB_GEN > default y > + select HAVE_TRUSTED_KEYS > help > Enable use of NXP's Cryptographic Accelerator and Assurance Module > (CAAM) as trusted key backend. 
> > -if !TRUSTED_KEYS_TPM && !TRUSTED_KEYS_TEE && !TRUSTED_KEYS_CAAM > -comment "No trust source selected!" > +if !HAVE_TRUSTED_KEYS > + comment "No trust source selected!" > endif Tested-by: Jarkko Sakkinen # for TRUSTED_KEYS_TPM Reviewed-by: Jarkko Sakkinen BR, Jarkko
[PATCH v4 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code
From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and across all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in 4849807114b8. There may be a slight difference in how the loops run when processing slow GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: with the patch applied, each loop of __get_user_pages() will resolve one pgtable entry, rather than relying on the size of the hugetlb hstate, which may cover multiple entries in one loop. A quick performance test on an aarch64 VM on an M1 chip shows a 15% degradation over a tight loop of slow gup after the path switch. That shouldn't be a problem because slow-gup should not be a hot path for GUP in general: when the page is present (the common case), fast-gup will already succeed, and when the page is indeed missing and requires a follow-up page fault, the slow gup degradation will probably be buried in the fault paths anyway. It also explains why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") landed; that commit was not part of a performance analysis, the speedup was a side benefit. If performance becomes a concern, we can consider handling CONT_PTE in follow_page(). Until that is shown to be necessary, keep everything clean and simple.
Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 7 mm/gup.c| 15 +++-- mm/hugetlb.c| 71 - 3 files changed, 5 insertions(+), 88 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 294c78b3549f..a546140f89cd 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -328,13 +328,6 @@ static inline void hugetlb_zap_end( { } -static inline struct page *hugetlb_follow_page_mask( -struct vm_area_struct *vma, unsigned long address, unsigned int flags, -unsigned int *page_mask) -{ - BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/ -} - static inline int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *dst_vma, diff --git a/mm/gup.c b/mm/gup.c index a02463c9420e..c803d0b0f358 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1135,18 +1135,11 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, { pgd_t *pgd; struct mm_struct *mm = vma->vm_mm; + struct page *page; - ctx->page_mask = 0; - - /* -* Call hugetlb_follow_page_mask for hugetlb vmas as it will use -* special hugetlb page table walking code. This eliminates the -* need to check for hugetlb entries in the general walking code. 
-*/ - if (is_vm_hugetlb_page(vma)) - return hugetlb_follow_page_mask(vma, address, flags, - &ctx->page_mask); + vma_pgtable_walk_begin(vma); + ctx->page_mask = 0; pgd = pgd_offset(mm, address); if (unlikely(is_hugepd(__hugepd(pgd_val(*pgd) @@ -1157,6 +1150,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, else page = follow_p4d_mask(vma, address, pgd, flags, ctx); + vma_pgtable_walk_end(vma); + return page; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 65b9c9a48fd2..cc79891a3597 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6870,77 +6870,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, } #endif /* CONFIG_USERFAULTFD */ -struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, - unsigned long address, unsigned int flags, - unsigned int *page_mask) -{ - struct hstate *h = hstate_vma(vma); - struct mm_struct *mm = vma->vm_mm; - unsigned long haddr = address & huge_page_mask(h); - struct page *page = NULL; - spinlock_t *ptl; - pte_t *pte, entry; - int ret; - - hugetlb_vma_lock_read(vma); - pte = hugetlb_walk(vma, haddr, huge_page_size(h)); - if (!pte) - goto out_unlock; - - ptl = huge_pte_lock(h, mm, pte); - entry = huge_ptep_get(pte); - if (pte_present(entry)) { - page = pte_page(entry); - - if (!huge_pte_write(entry)) { - if (flags & FOLL_WRITE) { - page = NULL; - goto out; - } - - if (gup_must_unshare(vma, flags, page)) { - /* Tell the caller to do unsharing */ - page = ERR_PTR(-EMLINK); - goto out; - } - } - - page = nth_page(page,
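The difference in loop count that Peter describes comes from how __get_user_pages() advances after each lookup: it steps forward by 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask) pages, capped at the pages remaining. The user-space model below reproduces only that stepping rule, with no faults and no locking. It assumes 4K base pages and an arm64-style 64K cont-PTE hugetlb page. Before the patch, the hstate-based lookup would report a page_mask of 15; after it, each PTE lookup reports 0. The sk_ name is hypothetical.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12

/* Iterations needed to cover nr_pages from start when every lookup reports
 * the given page_mask (simplified model of the __get_user_pages() loop). */
static unsigned long sk_gup_iterations(unsigned long start,
				       unsigned long nr_pages,
				       unsigned long page_mask)
{
	unsigned long iters = 0;

	while (nr_pages) {
		/* Pages left until the end of the current (page_mask + 1) block. */
		unsigned long page_increm =
			1 + (~(start >> SK_PAGE_SHIFT) & page_mask);

		if (page_increm > nr_pages)
			page_increm = nr_pages;
		start += page_increm << SK_PAGE_SHIFT;
		nr_pages -= page_increm;
		iters++;
	}
	return iters;
}
```

Covering an aligned 2M range (512 pages) takes 32 iterations with page_mask 15 and 512 iterations with page_mask 0. This is the per-entry cost the commit message accepts in exchange for the simpler code path.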
[PATCH v4 11/13] mm/gup: Handle huge pmd for follow_pmd_mask()
From: Peter Xu Replace pmd_trans_huge() with pmd_leaf() to also cover pmd_huge() when it is enabled. FOLL_TOUCH and FOLL_SPLIT_PMD only apply to THP, not (yet) to hugetlb. Since follow_trans_huge_pmd() can now process hugetlb pages, rename it to follow_huge_pmd() to match what it does. Move it into gup.c so that it does not depend on CONFIG_THP. While at it, move the ctx->page_mask setup into follow_huge_pmd() and only set it when the page is valid. It was not a bug to set it before even if GUP failed (page==NULL), because follow_page_mask() callers always ignore page_mask in that case. But doing so makes the code cleaner. Reviewed-by: Jason Gunthorpe Signed-off-by: Peter Xu --- mm/gup.c | 107 --- mm/huge_memory.c | 86 + mm/internal.h| 5 +-- 3 files changed, 105 insertions(+), 93 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 1e5d42211bb4..a81184b01276 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -580,6 +580,93 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma, return page; } + +/* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */ +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page, + struct vm_area_struct *vma, + unsigned int flags) +{ + /* If the pmd is writable, we can write to the page. */ + if (pmd_write(pmd)) + return true; + + /* Maybe FOLL_FORCE is set to override it? */ + if (!(flags & FOLL_FORCE)) + return false; + + /* But FOLL_FORCE has no effect on shared mappings */ + if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED)) + return false; + + /* ... or read-only private ones */ + if (!(vma->vm_flags & VM_MAYWRITE)) + return false; + + /* ... or already writable ones that just need to take a write fault */ + if (vma->vm_flags & VM_WRITE) + return false; + + /* +* See can_change_pte_writable(): we broke COW and could map the page +* writable if we have an exclusive anonymous page ... +*/ + if (!page || !PageAnon(page) || !PageAnonExclusive(page)) + return false; + + /* ... and a write-fault isn't required for other reasons.
*/
+	if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
+		return false;
+	return !userfaultfd_huge_pmd_wp(vma, pmd);
+}
+
+static struct page *follow_huge_pmd(struct vm_area_struct *vma,
+				    unsigned long addr, pmd_t *pmd,
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	pmd_t pmdval = *pmd;
+	struct page *page;
+	int ret;
+
+	assert_spin_locked(pmd_lockptr(mm, pmd));
+
+	page = pmd_page(pmdval);
+	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
+
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pmd(pmdval, page, vma, flags))
+		return NULL;
+
+	/* Avoid dumping huge zero page */
+	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(pmdval))
+		return ERR_PTR(-EFAULT);
+
+	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
+		return NULL;
+
+	if (!pmd_write(pmdval) && gup_must_unshare(vma, flags, page))
+		return ERR_PTR(-EMLINK);
+
+	VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
+			!PageAnonExclusive(page), page);
+
+	ret = try_grab_page(page, flags);
+	if (ret)
+		return ERR_PTR(ret);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (pmd_trans_huge(pmdval) && (flags & FOLL_TOUCH))
+		touch_pmd(vma, addr, pmd, flags & FOLL_WRITE);
+#endif	/* CONFIG_TRANSPARENT_HUGEPAGE */
+
+	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
+	ctx->page_mask = HPAGE_PMD_NR - 1;
+	VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
+
+	return page;
+}
+
 #else	/* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
 				    unsigned long addr, pud_t *pudp,
@@ -587,6 +674,14 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 {
 	return NULL;
 }
+
+static struct page *follow_huge_pmd(struct vm_area_struct *vma,
+				    unsigned long addr, pmd_t *pmd,
+				    unsigned int flags,
+				    struct follow_page_context *ctx)
+{
+	return NULL;
+}
 #endif	/* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
 
 static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
@@ -784,31 +879,31 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
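A note on the ctx->page_mask assignment in follow_huge_pmd(): as I read __get_user_pages(), the mask reported by a lookup is what lets the slow-GUP loop step over the rest of a huge leaf after a single walk, via page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask), capped at the pages still requested. The userspace sketch below reproduces only that arithmetic. The SKETCH_* constants (4 KiB pages, 2 MiB PMD leaves) and the function name are assumptions for illustration, not kernel API.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD leaves. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_HPAGE_PMD_NR 512UL

/*
 * Number of base pages the slow-GUP loop may advance after a lookup at
 * 'start' reported 'page_mask' (0 for a normal PTE, HPAGE_PMD_NR - 1 for
 * a PMD leaf), capped by the number of pages the caller still wants.
 */
static unsigned long sketch_page_increm(unsigned long start,
					unsigned long page_mask,
					unsigned long nr_pages)
{
	unsigned long increm = 1 + (~(start >> SKETCH_PAGE_SHIFT) & page_mask);

	assert(nr_pages > 0);
	return increm > nr_pages ? nr_pages : increm;
}
```

Starting 3 pages into a 2 MiB leaf, one walk covers the remaining 509 pages; with page_mask = 0 every page needs its own walk.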
[PATCH v4 12/13] mm/gup: Handle hugepd for follow_page()
From: Peter Xu

Hugepd is so far only used on PowerPC, on 4K page size kernels where the hash MMU is used. follow_page_mask() used to leverage hugetlb APIs to access hugepd entries. Teach follow_page_mask() itself about hugepd.

With the previous refactors of the fast-gup gup_huge_pd(), most of the code can be reused. Some of it is not needed for follow_page(): for example, gup_hugepte() tries to detect a pgtable entry change, which will never happen with slow gup (which holds the pgtable lock), but the check is harmless.

Since follow_page() only ever fetches one page, setting end to "address + PAGE_SIZE" should suffice. We will still do the pgtable walk only once for each hugetlb page, by setting ctx->page_mask properly.

One thing worth mentioning is that the _bad() helpers of some pgtable levels report is_hugepd() entries as TRUE on Power8 hash MMUs. I think this at least applies to PUD on Power8 with a 4K page size. It means that feeding a hugepd entry to pud_bad() will report a false positive. Let's leave that for now, because it can be arch-specific and I am a bit reluctant to touch it. In this patch it is not a problem, as long as hugepd is detected before any bad pgtable entries.

To allow slow gup, such as follow_*_page(), to access the hugepd helpers, the hugepd code is moved to the top. Besides that, the helper record_subpages() will now be used by either hugepd or fast-gup. To avoid "unused function" warnings we must, unfortunately, provide an "#ifdef" for it.
Signed-off-by: Peter Xu
---
 mm/gup.c | 269 +--
 1 file changed, 163 insertions(+), 106 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a81184b01276..a02463c9420e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -500,6 +500,149 @@ static inline void mm_set_has_pinned_flag(unsigned long *mm_flags)
 }
 
 #ifdef CONFIG_MMU
+
+#if defined(CONFIG_ARCH_HAS_HUGEPD) || defined(CONFIG_HAVE_FAST_GUP)
+static int record_subpages(struct page *page, unsigned long sz,
+			   unsigned long addr, unsigned long end,
+			   struct page **pages)
+{
+	struct page *start_page;
+	int nr;
+
+	start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
+	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
+		pages[nr] = nth_page(start_page, nr);
+
+	return nr;
+}
+#endif	/* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_FAST_GUP */
+
+#ifdef CONFIG_ARCH_HAS_HUGEPD
+static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
+				      unsigned long sz)
+{
+	unsigned long __boundary = (addr + sz) & ~(sz-1);
+	return (__boundary - 1 < end - 1) ? __boundary : end;
+}
+
+static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
+		       unsigned long end, unsigned int flags,
+		       struct page **pages, int *nr)
+{
+	unsigned long pte_end;
+	struct page *page;
+	struct folio *folio;
+	pte_t pte;
+	int refs;
+
+	pte_end = (addr + sz) & ~(sz-1);
+	if (pte_end < end)
+		end = pte_end;
+
+	pte = huge_ptep_get(ptep);
+
+	if (!pte_access_permitted(pte, flags & FOLL_WRITE))
+		return 0;
+
+	/* hugepages are never "special" */
+	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+
+	page = pte_page(pte);
+	refs = record_subpages(page, sz, addr, end, pages + *nr);
+
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
+		return 0;
+
+	if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
+		gup_put_folio(folio, refs, flags);
+		return 0;
+	}
+
+	if (!pte_write(pte) && gup_must_unshare(NULL, flags, &folio->page)) {
+		gup_put_folio(folio, refs, flags);
+		return 0;
+	}
+
+	*nr += refs;
+	folio_set_referenced(folio);
+	return 1;
+}
+
+/*
+ * NOTE: currently GUP for a hugepd is only possible on hugetlbfs file
+ * systems on Power, which does not have issue with folio writeback against
+ * GUP updates. When hugepd will be extended to support non-hugetlbfs or
+ * even anonymous memory, we need to do extra check as what we do with most
+ * of the other folios. See writable_file_mapping_allowed() and
+ * folio_fast_pin_allowed() for more information.
+ */
+static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
+		unsigned int pdshift, unsigned long end, unsigned int flags,
+		struct page **pages, int *nr)
+{
+	pte_t *ptep;
+	unsigned long sz = 1UL << hugepd_shift(hugepd);
+	unsigned long next;
+
+	ptep = hugepte_offset(hugepd, addr, pdshift);
+	do {
+		next = hugepte_addr_end(addr, end, sz);
+		if (!gup_hugepte(ptep, sz, addr, end, flags, pages, nr))
+			return 0;
+	} while (ptep++, addr = next, addr != end);
+
+	return 1;
+}
+
+static struct page
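The commit message says that passing end = address + PAGE_SIZE is enough for follow_page(). The sketch below copies the hugepte_addr_end() arithmetic from the patch into userspace, plus a page count standing in for what record_subpages() would fill, to show that such a call records exactly one subpage, while a fast-gup style range is clamped at the hugepd page boundary. The 16 MiB page size, 4 KiB base page and the sketch_* names are assumptions chosen for illustration.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed 4 KiB base pages */

/*
 * Same arithmetic as hugepte_addr_end() in the patch: stop at the next
 * sz-aligned boundary above addr, or at end if that comes first. Comparing
 * "boundary - 1" with "end - 1" (as the generic p?d_addr_end() helpers do)
 * keeps the result right if the boundary wraps to 0 at the top of memory.
 */
static unsigned long sketch_hugepte_addr_end(unsigned long addr,
					     unsigned long end,
					     unsigned long sz)
{
	unsigned long boundary = (addr + sz) & ~(sz - 1);

	assert(sz && !(sz & (sz - 1)));	/* sz must be a power of two */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Entries record_subpages() would fill for the page-aligned range [addr, end). */
static unsigned long sketch_nr_subpages(unsigned long addr, unsigned long end)
{
	return (end - addr) / SKETCH_PAGE_SIZE;
}
```

With a 16 MiB page, a follow_page()-style call at 0x10005000 stays at one subpage, while a longer range starting at the same address is cut at 0x11000000, i.e. after the remaining 4091 subpages of that huge page.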
[PATCH v4 10/13] mm/gup: Handle huge pud for follow_pud_mask()
From: Peter Xu

Teach follow_pud_mask() to handle normal PUD pages, such as hugetlb ones.

Rename follow_devmap_pud() to follow_huge_pud() so that it can process either huge devmap or hugetlb. Move it out of TRANSPARENT_HUGEPAGE_PUD and out of huge_memory.c (which relies on CONFIG_THP). Switch to pud_leaf() to detect both cases in the slow gup.

In the new follow_huge_pud(), take care of a possible CoR for hugetlb if necessary. touch_pud() needs to be moved out of huge_memory.c to be accessible from gup.c even if !THP.

While at it, optimize the non-present check by adding an early pud_present() check before taking the pgtable lock, failing follow_page() early if the PUD is not present: that is required by both devmap and hugetlb. Use pud_huge() to also cover the pud_devmap() case.

One more trivial thing to mention: introduce "pud_t pud" in the code paths along the way, so the code doesn't dereference *pudp multiple times. Not only because that looks less straightforward, but also because if the dereference really happened, it's not clear whether a race could observe different *pudp values while the entry is being modified at the same time.

Set ctx->page_mask properly for a PUD entry. As a side effect, this patch should also let devmap GUP on a PUD jump over the whole PUD range, but that is not yet verified. Hugetlb could already do so prior to this patch.
Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h |  8 -
 mm/gup.c                | 70 +++--
 mm/huge_memory.c        | 47 ++-
 mm/internal.h           |  2 ++
 4 files changed, 71 insertions(+), 56 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d3bb25c39482..3f36511bdc02 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -351,8 +351,6 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
 
 struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t *pmd, int flags, struct dev_pagemap **pgmap);
-struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
-		pud_t *pud, int flags, struct dev_pagemap **pgmap);
 
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
@@ -507,12 +505,6 @@ static inline struct page *follow_devmap_pmd(struct vm_area_struct *vma,
 	return NULL;
 }
 
-static inline struct page *follow_devmap_pud(struct vm_area_struct *vma,
-	unsigned long addr, pud_t *pud, int flags, struct dev_pagemap **pgmap)
-{
-	return NULL;
-}
-
 static inline bool thp_migration_supported(void)
 {
 	return false;
diff --git a/mm/gup.c b/mm/gup.c
index 26b8cca24077..1e5d42211bb4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -525,6 +525,70 @@ static struct page *no_page_table(struct vm_area_struct *vma,
 	return NULL;
 }
 
+#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+static struct page *follow_huge_pud(struct vm_area_struct *vma,
+				    unsigned long addr, pud_t *pudp,
+				    int flags, struct follow_page_context *ctx)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct page *page;
+	pud_t pud = *pudp;
+	unsigned long pfn = pud_pfn(pud);
+	int ret;
+
+	assert_spin_locked(pud_lockptr(mm, pudp));
+
+	if ((flags & FOLL_WRITE) && !pud_write(pud))
+		return NULL;
+
+	if (!pud_present(pud))
+		return NULL;
+
+	pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
+
+	if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
+	    pud_devmap(pud)) {
+		/*
+		 * device mapped pages can only be returned if the caller
+		 * will manage the page reference count.
+		 *
+		 * At least one of FOLL_GET | FOLL_PIN must be set, so
+		 * assert that here:
+		 */
+		if (!(flags & (FOLL_GET | FOLL_PIN)))
+			return ERR_PTR(-EEXIST);
+
+		if (flags & FOLL_TOUCH)
+			touch_pud(vma, addr, pudp, flags & FOLL_WRITE);
+
+		ctx->pgmap = get_dev_pagemap(pfn, ctx->pgmap);
+		if (!ctx->pgmap)
+			return ERR_PTR(-EFAULT);
+	}
+
+	page = pfn_to_page(pfn);
+
+	if (!pud_devmap(pud) && !pud_write(pud) &&
+	    gup_must_unshare(vma, flags, page))
+		return ERR_PTR(-EMLINK);
+
+	ret = try_grab_page(page, flags);
+	if (ret)
+		page = ERR_PTR(ret);
+	else
+		ctx->page_mask = HPAGE_PUD_NR - 1;
+
+	return page;
+}
+#else	/* CONFIG_PGTABLE_HAS_HUGE_LEAVES */
+static struct page *follow_huge_pud(struct vm_area_struct *vma,
+				    unsigned long addr, pud_t *pudp,
+				    int flags, struct follow_page_context *ctx)
+{
+	return NULL;
+}
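follow_huge_pud() finds the base page for addr by adding the offset inside the PUD leaf to the leaf's pfn, and reports HPAGE_PUD_NR - 1 as the page mask. A userspace sketch of that arithmetic, assuming x86_64-like geometry (4 KiB pages, 1 GiB PUD leaves); the constants and the function name are hypothetical.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB base pages, 1 GiB PUD leaves. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PUD_SHIFT    30
#define SKETCH_PUD_MASK     (~((1ULL << SKETCH_PUD_SHIFT) - 1))
#define SKETCH_HPAGE_PUD_NR (1ULL << (SKETCH_PUD_SHIFT - SKETCH_PAGE_SHIFT))

/*
 * pfn of the base page that maps 'addr', given the pfn of the first page
 * of the PUD leaf; same formula as "pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT".
 */
static unsigned long long sketch_pud_subpage_pfn(unsigned long long leaf_pfn,
						 unsigned long long addr)
{
	/* A PUD leaf starts on a HPAGE_PUD_NR-aligned pfn. */
	assert((leaf_pfn & (SKETCH_HPAGE_PUD_NR - 1)) == 0);
	return leaf_pfn + ((addr & ~SKETCH_PUD_MASK) >> SKETCH_PAGE_SHIFT);
}
```

Under these assumptions the page mask reported for a PUD leaf is 262143, so one successful lookup can cover up to 2^18 base pages.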
[PATCH v4 09/13] mm/gup: Cache *pudp in follow_pud_mask()
From: Peter Xu

Introduce "pud_t pud" in the function, so the code won't dereference *pudp multiple times. Not only because that looks less straightforward, but also because if the dereference really happened, it's not clear whether a race could observe different *pudp values if the entry is being modified at the same time.

Acked-by: James Houghton
Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 mm/gup.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index ef46a7053e16..26b8cca24077 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -753,26 +753,27 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 				    unsigned int flags,
 				    struct follow_page_context *ctx)
 {
-	pud_t *pud;
+	pud_t *pudp, pud;
 	spinlock_t *ptl;
 	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
 
-	pud = pud_offset(p4dp, address);
-	if (pud_none(*pud))
+	pudp = pud_offset(p4dp, address);
+	pud = READ_ONCE(*pudp);
+	if (pud_none(pud))
 		return no_page_table(vma, flags, address);
-	if (pud_devmap(*pud)) {
-		ptl = pud_lock(mm, pud);
-		page = follow_devmap_pud(vma, address, pud, flags, &ctx->pgmap);
+	if (pud_devmap(pud)) {
+		ptl = pud_lock(mm, pudp);
+		page = follow_devmap_pud(vma, address, pudp, flags, &ctx->pgmap);
 		spin_unlock(ptl);
 		if (page)
 			return page;
 		return no_page_table(vma, flags, address);
 	}
-	if (unlikely(pud_bad(*pud)))
+	if (unlikely(pud_bad(pud)))
 		return no_page_table(vma, flags, address);
 
-	return follow_pmd_mask(vma, address, pud, flags, ctx);
+	return follow_pmd_mask(vma, address, pudp, flags, ctx);
 }
 
 static struct page *follow_p4d_mask(struct vm_area_struct *vma,
-- 
2.44.0
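The point of this change is to take one snapshot of the entry and make every decision on that snapshot. A userspace sketch of the pattern, using a simplified READ_ONCE() stand-in built on a single volatile access (the real kernel macro does more). The entry layout and bit values are invented for illustration and do not match any architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): a single volatile load that the
 * compiler may not repeat or split. Uses the GNU __typeof__ extension. */
#define SKETCH_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

typedef struct { uint64_t val; } sketch_pud_t;

/* Invented bit layout, for illustration only. */
#define SKETCH_PUD_PRESENT 0x1ULL
#define SKETCH_PUD_LEAF    0x80ULL

enum sketch_pud_kind { SKETCH_PUD_NONE, SKETCH_PUD_TABLE, SKETCH_PUD_HUGE };

/* Read the entry once; every check below uses the local copy, so a
 * concurrent update cannot make the checks disagree with each other. */
static enum sketch_pud_kind sketch_classify_pud(sketch_pud_t *pudp)
{
	sketch_pud_t pud;

	assert(pudp);
	pud = SKETCH_READ_ONCE(*pudp);
	if (!(pud.val & SKETCH_PUD_PRESENT))
		return SKETCH_PUD_NONE;
	if (pud.val & SKETCH_PUD_LEAF)
		return SKETCH_PUD_HUGE;
	return SKETCH_PUD_TABLE;
}
```

Rereading *pudp for each test, as the old code did with pud_none(*pud), pud_devmap(*pud) and pud_bad(*pud), would allow each test to see a different value.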
[PATCH v4 08/13] mm/gup: Handle hugetlb for no_page_table()
From: Peter Xu

no_page_table() is not yet used for hugetlb code paths; prepare it for that.

The major difference here is that hugetlb returns -EFAULT whenever the page cache does not exist, even if VM_SHARED. See hugetlb_follow_page_mask().

Also pass "address" into no_page_table(), as hugetlb will need it.

Reviewed-by: Christoph Hellwig
Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 mm/gup.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index c2881772216b..ef46a7053e16 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -501,19 +501,27 @@ static inline void mm_set_has_pinned_flag(unsigned long *mm_flags)
 
 #ifdef CONFIG_MMU
 static struct page *no_page_table(struct vm_area_struct *vma,
-		unsigned int flags)
+				  unsigned int flags, unsigned long address)
 {
+	if (!(flags & FOLL_DUMP))
+		return NULL;
+
 	/*
-	 * When core dumping an enormous anonymous area that nobody
-	 * has touched so far, we don't want to allocate unnecessary pages or
+	 * When core dumping, we don't want to allocate unnecessary pages or
 	 * page tables. Return error instead of NULL to skip handle_mm_fault,
 	 * then get_dump_page() will return NULL to leave a hole in the dump.
 	 * But we can only make this optimization where a hole would surely
 	 * be zero-filled if handle_mm_fault() actually did handle it.
 	 */
-	if ((flags & FOLL_DUMP) &&
-			(vma_is_anonymous(vma) || !vma->vm_ops->fault))
+	if (is_vm_hugetlb_page(vma)) {
+		struct hstate *h = hstate_vma(vma);
+
+		if (!hugetlbfs_pagecache_present(h, vma, address))
+			return ERR_PTR(-EFAULT);
+	} else if ((vma_is_anonymous(vma) || !vma->vm_ops->fault)) {
 		return ERR_PTR(-EFAULT);
+	}
+
 	return NULL;
 }
 
@@ -593,7 +601,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
 	if (!ptep)
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	pte = ptep_get(ptep);
 	if (!pte_present(pte))
 		goto no_page;
@@ -685,7 +693,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	pte_unmap_unlock(ptep, ptl);
 	if (!pte_none(pte))
 		return NULL;
-	return no_page_table(vma, flags);
+	return no_page_table(vma, flags, address);
 }
 
 static struct page *follow_pmd_mask(struct vm_area_struct *vma,
@@ -701,27 +709,27 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	pmd = pmd_offset(pudp, address);
 	pmdval = pmdp_get_lockless(pmd);
 	if (pmd_none(pmdval))
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	if (!pmd_present(pmdval))
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	if (pmd_devmap(pmdval)) {
 		ptl = pmd_lock(mm, pmd);
 		page = follow_devmap_pmd(vma, address, pmd, flags, &ctx->pgmap);
 		spin_unlock(ptl);
 		if (page)
 			return page;
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	}
 	if (likely(!pmd_trans_huge(pmdval)))
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 
 	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_present(*pmd))) {
 		spin_unlock(ptl);
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	}
 	if (unlikely(!pmd_trans_huge(*pmd))) {
 		spin_unlock(ptl);
@@ -752,17 +760,17 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
 
 	pud = pud_offset(p4dp, address);
 	if (pud_none(*pud))
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	if (pud_devmap(*pud)) {
 		ptl = pud_lock(mm, pud);
 		page = follow_devmap_pud(vma, address, pud, flags, &ctx->pgmap);
 		spin_unlock(ptl);
 		if (page)
 			return page;
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 	}
 	if (unlikely(pud_bad(*pud)))
-		return no_page_table(vma, flags);
+		return no_page_table(vma, flags, address);
 
 	return follow_pmd_mask(vma, address, pud, flags, ctx);
 }
@@ -777,10 +785,10 @@ static
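For review purposes, the decision made by the patched no_page_table() can be modelled as a pure function. In this userspace sketch, the struct fields stand in for is_vm_hugetlb_page(), hugetlbfs_pagecache_present(), vma_is_anonymous() and the presence of vm_ops->fault; the flag value and all names are hypothetical, and 0 / -EFAULT stand for NULL / ERR_PTR(-EFAULT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_FOLL_DUMP 0x1u	/* hypothetical flag value */

/* Hypothetical summary of what no_page_table() inspects. */
struct dump_vma {
	bool hugetlb;		/* is_vm_hugetlb_page(vma) */
	bool pagecache_present;	/* hugetlbfs_pagecache_present(h, vma, addr) */
	bool anonymous;		/* vma_is_anonymous(vma) */
	bool has_fault_op;	/* vma->vm_ops->fault != NULL */
};

/* 0 models a NULL return, -EFAULT models ERR_PTR(-EFAULT). */
static int sketch_no_page_table(const struct dump_vma *vma, unsigned int flags)
{
	assert(vma);
	if (!(flags & SKETCH_FOLL_DUMP))
		return 0;
	if (vma->hugetlb)	/* hugetlb: leave a hole only if no page cache */
		return vma->pagecache_present ? 0 : -EFAULT;
	if (vma->anonymous || !vma->has_fault_op)
		return -EFAULT;
	return 0;
}
```

Note how a shared hugetlb mapping without page cache now yields -EFAULT, matching what hugetlb_follow_page_mask() did.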
[PATCH v4 07/13] mm/gup: Refactor record_subpages() to find 1st small page
From: Peter Xu

All the fast-gup functions take a tail page to operate on, and always need to do page mask calculations before feeding it into record_subpages(). Merge that logic into record_subpages(), so that it does the nth_page() calculation itself.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 mm/gup.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index db35b056fc9a..c2881772216b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2779,13 +2779,16 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
 }
 #endif
 
-static int record_subpages(struct page *page, unsigned long addr,
-			   unsigned long end, struct page **pages)
+static int record_subpages(struct page *page, unsigned long sz,
+			   unsigned long addr, unsigned long end,
+			   struct page **pages)
 {
+	struct page *start_page;
 	int nr;
 
+	start_page = nth_page(page, (addr & (sz - 1)) >> PAGE_SHIFT);
 	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
-		pages[nr] = nth_page(page, nr);
+		pages[nr] = nth_page(start_page, nr);
 
 	return nr;
 }
@@ -2820,8 +2823,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pte_page(pte);
+	refs = record_subpages(page, sz, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2894,8 +2897,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pmd_page(orig);
+	refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2938,8 +2941,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pud_page(orig);
+	refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2978,8 +2981,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 
 	BUILD_BUG_ON(pgd_devmap(orig));
 
-	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pgd_page(orig);
+	refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
-- 
2.44.0
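The refactor relies on (addr & (sz - 1)) giving the same offset as the old (addr & ~PMD_MASK) when sz is PMD_SIZE (and likewise for PUD_SIZE and PGDIR_SIZE). A userspace sketch that checks this, with integer page indexes standing in for struct page pointers and nth_page(); the 4 KiB / 2 MiB sizes and all names are assumptions.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD leaves. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PMD_SIZE   (1UL << 21)
#define SKETCH_PMD_MASK   (~(SKETCH_PMD_SIZE - 1))

/* New style: offset of the first subpage, derived from the leaf size. */
static unsigned long sketch_first_subpage(unsigned long addr, unsigned long sz)
{
	assert(sz && !(sz & (sz - 1)));	/* leaf sizes are powers of two */
	return (addr & (sz - 1)) >> SKETCH_PAGE_SHIFT;
}

/* Old style, as gup_huge_pmd() computed it before the patch. */
static unsigned long sketch_first_subpage_old_pmd(unsigned long addr)
{
	return (addr & ~SKETCH_PMD_MASK) >> SKETCH_PAGE_SHIFT;
}

/* Mirrors the refactored record_subpages(): start from the subpage that
 * maps 'addr' and record one entry per base page up to 'end'. */
static int sketch_record_subpages(unsigned long head_idx, unsigned long sz,
				  unsigned long addr, unsigned long end,
				  unsigned long *idx)
{
	unsigned long start = head_idx + sketch_first_subpage(addr, sz);
	int nr;

	for (nr = 0; addr != end; nr++, addr += SKETCH_PAGE_SIZE)
		idx[nr] = start + nr;
	return nr;
}
```

Passing the head page plus the leaf size lets every caller share one offset computation instead of repeating a level-specific mask.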
[PATCH v4 06/13] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
From: Peter Xu

The hugepd format for GUP is only used on PowerPC with hugetlbfs. There are some kernel usages of hugepd (see hugepd_populate_kernel() for PPC_8XX); however, those pages are not candidates for GUP.

Commit a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings") added a check to fail gup-fast if there is a potential risk of violating GUP over writeback file systems. That should never apply to hugepd. Considering that hugepd is an old format (and even software-only), there is no plan to extend hugepd to other file-backed memory types that are prone to the same issue.

Drop that check, not only because it will never be true for hugepd per any known plan, but also because it paves the way for reusing the function outside fast-gup.

To make sure we still remember this issue in case hugepd is ever extended to support non-hugetlbfs memory, add a detailed comment above gup_huge_pd() explaining the issue with proper references.

Cc: Christoph Hellwig
Cc: Lorenzo Stoakes
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Peter Xu
---
 mm/gup.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index e7510b6ce765..db35b056fc9a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2832,11 +2832,6 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		return 0;
 	}
 
-	if (!folio_fast_pin_allowed(folio, flags)) {
-		gup_put_folio(folio, refs, flags);
-		return 0;
-	}
-
 	if (!pte_write(pte) && gup_must_unshare(NULL, flags, &folio->page)) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
@@ -2847,6 +2842,14 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	return 1;
 }
 
+/*
+ * NOTE: currently GUP for a hugepd is only possible on hugetlbfs file
+ * systems on Power, which does not have issue with folio writeback against
+ * GUP updates. When hugepd will be extended to support non-hugetlbfs or
+ * even anonymous memory, we need to do extra check as what we do with most
+ * of the other folios. See writable_file_mapping_allowed() and
+ * folio_fast_pin_allowed() for more information.
+ */
 static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
 		unsigned int pdshift, unsigned long end, unsigned int flags,
 		struct page **pages, int *nr)
-- 
2.44.0
[PATCH v4 05/13] mm/arch: Provide pud_pfn() fallback
From: Peter Xu

The comment in the code explains the reasons. We took a different approach compared to pmd_pfn(), by providing a fallback function.

Another option would be to provide some lower-level config options (compared to HUGETLB_PAGE or THP) to identify which levels an arch supports for such huge mappings. However, that would be overkill.

Cc: Mike Rapoport (IBM)
Cc: Matthew Wilcox
Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 arch/riscv/include/asm/pgtable.h    |  1 +
 arch/s390/include/asm/pgtable.h     |  1 +
 arch/sparc/include/asm/pgtable_64.h |  1 +
 arch/x86/include/asm/pgtable.h      |  1 +
 include/linux/pgtable.h             | 10 ++
 5 files changed, 14 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 20242402fc11..0ca28cc8e3fa 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -646,6 +646,7 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
 
 #define __pud_to_phys(pud)  (__page_val_to_pfn(pud_val(pud)) << PAGE_SHIFT)
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
 	return ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1a71cb19c089..6cbbe473f680 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1414,6 +1414,7 @@ static inline unsigned long pud_deref(pud_t pud)
 	return (unsigned long)__va(pud_val(pud) & origin_mask);
 }
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
 	return __pa(pud_deref(pud)) >> PAGE_SHIFT;
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 4d1bafaba942..26efc9bb644a 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -875,6 +875,7 @@ static inline bool pud_leaf(pud_t pud)
 	return pte_val(pte) & _PAGE_PMD_HUGE;
 }
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
 	pte_t pte = __pte(pud_val(pud));
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index cefc7a84f7a4..273f7557218c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -234,6 +234,7 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
 	return (pfn & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
+#define pud_pfn pud_pfn
 static inline unsigned long pud_pfn(pud_t pud)
 {
 	phys_addr_t pfn = pud_val(pud);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 600e17d03659..75fe309a4e10 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1817,6 +1817,16 @@ typedef unsigned int pgtbl_mod_mask;
 #define pte_leaf_size(x) PAGE_SIZE
 #endif
 
+/*
+ * We always define pmd_pfn for all archs as it's used in lots of generic
+ * code. Now it happens too for pud_pfn (and can happen for larger
+ * mappings too in the future; we're not there yet). Instead of defining
+ * it for all archs (like pmd_pfn), provide a fallback.
+ */
+#ifndef pud_pfn
+#define pud_pfn(x) ({ BUILD_BUG(); 0; })
+#endif
+
 /*
  * Some architectures have MMUs that are configurable or selectable at boot
  * time. These lead to variable PTRS_PER_x. For statically allocated arrays it
-- 
2.44.0
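The "#define pud_pfn pud_pfn" lines work together with the "#ifndef pud_pfn" fallback: an arch that implements the helper also defines a macro of the same name, so the generic header skips its fallback, while any use on an arch without the helper fails to build. A userspace sketch of the idiom, assuming a made-up pud_t layout. The fallback mirrors the patch's ({ BUILD_BUG(); 0; }) only in spirit, using a GNU statement expression with _Static_assert; it is never expanded here because the "arch" part provides pud_pfn().

```c
#include <assert.h>
#include <stdint.h>

/* --- "arch" header: provides pud_pfn() and advertises it --- */
typedef struct { uint64_t val; } pud_t;	/* made-up layout */
#define SKETCH_PAGE_SHIFT 12

static_assert(sizeof(pud_t) == sizeof(uint64_t), "pud_t wraps one word");

#define pud_pfn pud_pfn	/* tells the generic header not to add a fallback */
static inline unsigned long pud_pfn(pud_t pud)
{
	return (unsigned long)(pud.val >> SKETCH_PAGE_SHIFT);
}

/* --- "generic" header: fallback only for archs that lack pud_pfn() --- */
#ifndef pud_pfn
/* Any use would fail to compile, like the kernel's BUILD_BUG() fallback. */
#define pud_pfn(x) ({ _Static_assert(0, "pud_pfn() not supported"); 0UL; })
#endif
```

The self-referencing macro expands to its own name exactly once (the preprocessor never re-expands a macro inside itself), so calls still reach the inline function.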
[PATCH v4 04/13] mm: Introduce vma_pgtable_walk_{begin|end}()
From: Peter Xu

Introduce per-vma begin()/end() helpers for pgtable walks. This is preparation work for merging the hugetlb pgtable walkers into generic mm.

The helpers need to be called before and after a pgtable walk, and will become necessary once the pgtable walker code supports hugetlb pages. It's a hook point for any type of VMA, but for now only hugetlb uses it, to keep the pgtable pages stable and prevent them from going away (due to possible pmd unsharing).

Reviewed-by: Christoph Hellwig
Reviewed-by: Muchun Song
Signed-off-by: Peter Xu
---
 include/linux/mm.h |  3 +++
 mm/memory.c        | 12
 2 files changed, 15 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index afe27ff3fa94..d8f78017d271 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4233,4 +4233,7 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
 	return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
 }
 
+void vma_pgtable_walk_begin(struct vm_area_struct *vma);
+void vma_pgtable_walk_end(struct vm_area_struct *vma);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 3d0c0cc33c57..27d173f9a521 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6438,3 +6438,15 @@ void ptlock_free(struct ptdesc *ptdesc)
 	kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
 }
 #endif
+
+void vma_pgtable_walk_begin(struct vm_area_struct *vma)
+{
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_lock_read(vma);
+}
+
+void vma_pgtable_walk_end(struct vm_area_struct *vma)
+{
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_unlock_read(vma);
+}
-- 
2.44.0
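The intended calling convention is that every walker brackets its page-table access with the two helpers, and only hugetlb VMAs pay for it. A userspace sketch with a plain counter standing in for hugetlb_vma_lock_read() / hugetlb_vma_unlock_read(); the struct and function names are hypothetical, and no real locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VMA model: 'readers' stands in for the hugetlb VMA lock. */
struct walk_vma {
	bool hugetlb;
	int readers;
};

static void sketch_walk_begin(struct walk_vma *vma)
{
	if (vma->hugetlb)
		vma->readers++;		/* hugetlb_vma_lock_read(vma) */
}

static void sketch_walk_end(struct walk_vma *vma)
{
	if (vma->hugetlb) {
		assert(vma->readers > 0);
		vma->readers--;		/* hugetlb_vma_unlock_read(vma) */
	}
}

/* A generic walker brackets its page-table access with begin/end, so that
 * for hugetlb the page tables cannot be unshared while it walks them.
 * Returns how many "read locks" were held during the walk. */
static int sketch_walk(struct walk_vma *vma)
{
	int held;

	sketch_walk_begin(vma);
	held = vma->readers;	/* ... walk the page tables here ... */
	sketch_walk_end(vma);
	return held;
}
```

For non-hugetlb VMAs both helpers are no-ops, which keeps the hook cheap for the common case.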
[PATCH v4 03/13] mm: Make HPAGE_PXD_* macros even if !THP
From: Peter Xu

These macros can be helpful when we merge hugetlb code into generic code. Move them out and define them whenever PGTABLE_HAS_HUGE_LEAVES is selected, because there are systems that only define HUGETLB_PAGE, not THP.

One note here is that HPAGE_PMD_SHIFT must be defined even if PMD_SHIFT is not defined (e.g. in the !CONFIG_MMU case); it (or derived forms, like HPAGE_PMD_NR) is already used in lots of common code without ifdef guards. Use the old trick to let compilation work.

Here we only need to differentiate the HPAGE_PXD_SHIFT definitions. All the remaining macros will be defined based on it. While at it, move HPAGE_PMD_NR / HPAGE_PMD_ORDER over together.

Signed-off-by: Peter Xu
---
 include/linux/huge_mm.h | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7576025db55d..d3bb25c39482 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -64,9 +64,6 @@ ssize_t single_hugepage_flag_show(struct kobject *kobj,
 				  enum transparent_hugepage_flag flag);
 extern struct kobj_attribute shmem_enabled_attr;
 
-#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
-#define HPAGE_PMD_NR (1<
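To illustrate how "all the remaining macros" hang off the shift: once HPAGE_PMD_SHIFT is known, the order, page count, size and mask follow mechanically. The sketch assumes 4 KiB pages and 2 MiB PMD leaves; the derived definitions are modelled on those in include/linux/huge_mm.h, but the diff in this archive is cut off, so treat them as an approximation rather than the patch's exact text.

```c
#include <assert.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD leaves. */
#define PAGE_SHIFT      12
#define PMD_SHIFT       21

/* The only per-configuration definition... */
#define HPAGE_PMD_SHIFT PMD_SHIFT
/* ...everything else is derived from it. */
#define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT - PAGE_SHIFT)
#define HPAGE_PMD_NR    (1 << HPAGE_PMD_ORDER)
#define HPAGE_PMD_SIZE  ((1UL) << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

static_assert(HPAGE_PMD_SHIFT > PAGE_SHIFT, "a PMD leaf spans several pages");

/* Start of the PMD-sized huge page that contains addr. */
static unsigned long hpage_pmd_align_down(unsigned long addr)
{
	return addr & HPAGE_PMD_MASK;
}
```

With these values the order is 9, so one PMD leaf covers 512 base pages.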
[PATCH v4 02/13] mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static
From: Peter Xu

It will be used outside hugetlb.c soon.

Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h | 9 +
 mm/hugetlb.c            | 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d748628efc5e..294c78b3549f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -174,6 +174,9 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud);
+bool hugetlbfs_pagecache_present(struct hstate *h,
+				 struct vm_area_struct *vma,
+				 unsigned long address);
 
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
@@ -1228,6 +1231,12 @@ static inline void hugetlb_register_node(struct node *node)
 static inline void hugetlb_unregister_node(struct node *node)
 {
 }
+
+static inline bool hugetlbfs_pagecache_present(
+    struct hstate *h, struct vm_area_struct *vma, unsigned long address)
+{
+	return false;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f9640a81226e..65b9c9a48fd2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6110,8 +6110,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 /*
  * Return whether there is a pagecache page to back given address within VMA.
  */
-static bool hugetlbfs_pagecache_present(struct hstate *h,
-			struct vm_area_struct *vma, unsigned long address)
+bool hugetlbfs_pagecache_present(struct hstate *h,
+				 struct vm_area_struct *vma, unsigned long address)
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = linear_page_index(vma, address);
-- 
2.44.0
[PATCH v4 01/13] mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES
From: Peter Xu

Introduce a config option that is selected whenever huge leaves are involved in page tables (THP or hugetlbfs). It is useful for marking any code that can process either hugetlb or THP pages at any level higher than the pte level.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Peter Xu
---
 mm/Kconfig | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index b924f4a5a3ef..497cdf4d8ebf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -850,6 +850,12 @@ config READ_ONLY_THP_FOR_FS
 
 endif # TRANSPARENT_HUGEPAGE
 
+#
+# The architecture supports pgtable leaves that is larger than PAGE_SIZE
+#
+config PGTABLE_HAS_HUGE_LEAVES
+	def_bool TRANSPARENT_HUGEPAGE || HUGETLB_PAGE
+
 #
 # UP and nommu archs use km based percpu allocator
 #
-- 
2.44.0
[PATCH v4 00/13] mm/gup: Unify hugetlb, part 2
From: Peter Xu

v4:
- Fix build issues, tested on more archs/configs ([x86_64, i386, arm, arm64, powerpc, riscv, s390] x [allno, alldef, allmod]).
- Squashed the fixup series into v3, touched up commit messages [1]
- Added the patch to fix pud_pfn() into the series [2]
- Fixed one more build issue on arm+alldefconfig, where pgd_t is a two-item array.
- Manage R-bs: add some, remove some (due to the squashes above)
- Rebase to latest mm-unstable (2f6182cd23a7, March 26th)

rfc: https://lore.kernel.org/r/20231116012908.392077-1-pet...@redhat.com
v1:  https://lore.kernel.org/r/20231219075538.414708-1-pet...@redhat.com
v2:  https://lore.kernel.org/r/20240103091423.400294-1-pet...@redhat.com
v3:  https://lore.kernel.org/r/20240321220802.679544-1-pet...@redhat.com

The series removes the hugetlb slow gup path after a previous refactor work [1], so that slow gup now uses the exact same path to process all kinds of memory, including hugetlb.

In the long term, we may want to remove most, if not all, call sites of huge_pte_offset(). Ideally, that API could be dropped from the arch hugetlb API completely. This series is one small step towards merging hugetlb-specific code into generic mm paths. From that POV, this series removes one reference to huge_pte_offset() out of many others.

One goal of such a route is that we can reconsider merging hugetlb features like High Granularity Mapping (HGM). It was not accepted in the past because it may add lots of hugetlb-specific code and make the mm code even harder to maintain. With a merged codeset, features like HGM can hopefully share some code with THP, legacy (PMD+) or modern (contiguous PTEs).

To make it work, the generic slow gup code will need to at least understand hugepd, as fast-gup already does.
Because hugepd is a software-only solution (no hardware recognizes the hugepd format, so these are purely artificial structures), there's a chance we can merge some or all hugepd formats with cont_pte in the future. That question has not yet been settled with an acknowledgement from the Power side. For now, this series keeps the hugepd handling, because we may still need it until there is a clearer picture of the future of hugepd. The other reason is simply that fast-gup already does this and most of that code is still around to be reused. It makes more sense to keep slow and fast gup behaving the same until a decision is made to remove hugepd.

There's one major difference for slow gup in cont_pte / cont_pmd handling, which is currently supported on three architectures (aarch64, riscv, ppc). Before the series, slow gup was able to recognize e.g. cont_pte entries with the help of huge_pte_offset() when an hstate is around. Now that is gone, but slow gup still works, by looking up the pgtable entries one by one.

This is not ideal, but hopefully the change should not affect major workloads for now. There's some more information in the commit message of the last patch. If this turns out to be a concern, we can consider teaching slow gup to recognize cont pte/pmd entries, which should recover the lost performance. But I doubt it is necessary for now, so I kept it as simple as possible.

Test Done
=========

For x86_64, tested the full gup_test matrix over 2MB huge pages. For aarch64, tested the same over 64KB cont_pte huge pages.

One note is that this v3 didn't go through any ppc test anymore, as finding such a system always takes time. This relies on the fact that it was tested in previous versions, and this version should have zero change regarding the hugepd sections.

If anyone (Christophe?) wants to give it a shot on PowerPC, please do and I would appreciate it: "./run_vmtests.sh -a -t gup_test" should do well enough (please consider [2] applied if hugepd is <1MB), as long as we're sure the hugepd pages are touched as expected.

Patch layout
============

Patch 1-8:    Preparation works, or cleanups in relevant code paths
Patch 9-11:   Teach slow gup about all kinds of huge entries (pXd, hugepd)
Patch 12:     Drop hugetlb_follow_page_mask()

More information can be found in the commit messages of each patch. Any comments are welcome. Thanks.

[1] https://lore.kernel.org/all/20230628215310.73782-1-pet...@redhat.com
[2] https://lore.kernel.org/r/20240321215047.678172-1-pet...@redhat.com

Peter Xu (13):
  mm/Kconfig: CONFIG_PGTABLE_HAS_HUGE_LEAVES
  mm/hugetlb: Declare hugetlbfs_pagecache_present() non-static
  mm: Make HPAGE_PXD_* macros even if !THP
  mm: Introduce vma_pgtable_walk_{begin|end}()
  mm/arch: Provide pud_pfn() fallback
  mm/gup: Drop folio_fast_pin_allowed() in hugepd processing
  mm/gup: Refactor record_subpages() to find 1st small page
  mm/gup: Handle hugetlb for no_page_table()
  mm/gup: Cache *pudp in follow_pud_mask()
  mm/gup: Handle huge pud for follow_pud_mask()
  mm/gup: Handle huge pmd for follow_pmd_mask()
  mm/gup: Handle hugepd for follow_page()
  mm/gup: Handle hugetlb in the generic follow_page_mask code
Re: [PATCH RFC 0/3] mm/gup: consistently call it GUP-fast
On Wed, Mar 27, 2024 at 02:05:35PM +0100, David Hildenbrand wrote:
> Some cleanups around function names, comments and the config option of
> "GUP-fast" -- GUP without "lock" safety belts on.
>
> With this cleanup it's easy to judge which functions are GUP-fast specific.
> We now consistently call it "GUP-fast", avoiding mixing it with "fast GUP",
> "lockless", or simply "gup" (which I always considered confusing in the
> code).
>
> So the magic now happens in functions that contain "gup_fast", whereby
> gup_fast() is the entry point into that magic. Comments consistently
> reference either "GUP-fast" or "gup_fast()".
>
> Based on mm-unstable from today. I won't CC arch maintainers, but only
> arch mailing lists, to reduce noise.
>
> Tested on x86_64, cross compiled on a bunch of archs, whereby some of them
> don't properly even compile on mm-unstable anymore in my usual setup
> (alpha, arc, parisc64, sh) ... maybe the cross compilers are outdated,
> but there are no new ones around. Hm.

I'm not sure what config you tried there; as I have been doing some build
tests recently, I found that turning off CONFIG_SAMPLES + CONFIG_GCC_PLUGINS
avoids a lot of issues. I think it's due to missing libc. But maybe that's
not the case there.

The series makes sense to me; the naming is confusing. Btw, thanks for
posting this as RFC. This definitely conflicts with the other gup series
that I had; I'll post v4 of that shortly.

--
Peter Xu
Re: [PATCH] Add static_key_feature_checks_initialized flag
On 27/03/2024 at 05:59, Nicholas Miehlbradt wrote:
> JUMP_LABEL_FEATURE_CHECK_DEBUG used static_key_initialized to determine
> whether {cpu,mmu}_has_feature() was used before static keys were
> initialized. However, {cpu,mmu}_has_feature() should not be used before
> setup_feature_keys() is called. As static_key_initialized is set much
> earlier during boot there is a window in which JUMP_LABEL_FEATURE_CHECK_DEBUG
> will not report errors. Add a flag specifically to indicate when
> {cpu,mmu}_has_feature() is safe to use.

What do you mean by "much earlier"? As far as I can see,
static_key_initialized is set by jump_label_init(), and
cpu_feature_keys_init() and mmu_feature_keys_init() are called immediately
after. I don't think it is possible to do anything in between.

Or maybe you mean the problem is the call to jump_label_init() in
early_init_devtree()? You should make that explicit in the message, and see
whether it would be better to also call cpu_feature_keys_init() and
mmu_feature_keys_init() in early_init_devtree() in that case.

Christophe
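A minimal sketch of the pattern under discussion, assuming (as Christophe asks) that jump_label_init() can run before setup_feature_keys(), for example via early_init_devtree(). This is a userspace model: the flag names follow the thread, but every function body is hypothetical, and the debug check simply counts warnings instead of using the real JUMP_LABEL_FEATURE_CHECK_DEBUG machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model: flag names follow the thread, bodies are hypothetical. */
static bool static_key_initialized;			/* set by jump_label_init() */
static bool static_key_feature_checks_initialized;	/* the proposed flag */
static unsigned long cpu_features = 0x1;		/* pretend feature bits */
static int early_use_warnings;

static void jump_label_init(void)
{
	static_key_initialized = true;
}

static void setup_feature_keys(void)
{
	/* Modeled ordering: feature keys are static keys, so jump labels come first. */
	assert(static_key_initialized);
	/* cpu_feature_keys_init() and mmu_feature_keys_init() would run here. */
	static_key_feature_checks_initialized = true;
}

static bool cpu_has_feature(unsigned long feature)
{
	/* Debug check keyed off the new flag, not static_key_initialized. */
	if (!static_key_feature_checks_initialized) {
		fprintf(stderr, "cpu_has_feature(0x%lx) used before setup_feature_keys()\n",
			feature);
		early_use_warnings++;
	}
	return (cpu_features & feature) != 0;
}
```

In this model, a check keyed off static_key_initialized would stay silent for a call made between jump_label_init() and setup_feature_keys(), because that flag is already true; the dedicated flag reports it.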
Re: FAILED: Patch "powerpc: xor_vmx: Add '-mhard-float' to CFLAGS" failed to apply to 5.10-stable tree
On Wed, Mar 27, 2024 at 08:20:07AM -0400, Sasha Levin wrote:
> The patch below does not apply to the 5.10-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to .
...
> -- original commit in Linus's tree --
>
> From 35f20786c481d5ced9283ff42de5c69b65e5ed13 Mon Sep 17 00:00:00 2001
> From: Nathan Chancellor
> Date: Sat, 27 Jan 2024 11:07:43 -0700
> Subject: [PATCH] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS

I have attached a backport that will work for 5.15 and earlier. I think
you worked around this conflict in 5.15 by taking 04e85bbf71c9, but I am
not sure that is a smart idea. I think it might just be better to drop
that dependency and apply this version in 5.15.

Cheers,
Nathan

From c6cb80d94871cbb4ff151f7eb2586cadeb364ef7 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor
Date: Sat, 27 Jan 2024 11:07:43 -0700
Subject: [PATCH 4.19 to 5.15] powerpc: xor_vmx: Add '-mhard-float' to CFLAGS

commit 35f20786c481d5ced9283ff42de5c69b65e5ed13 upstream.

arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main
powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an
error when building with clang after a recent change in main:

  error: option '-msoft-float' cannot be specified with '-maltivec'
  make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1

Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS
to override the previous inclusion of '-msoft-float' (as the last option
wins), which matches how other areas of the kernel use '-maltivec', such
as AMDGPU.

Cc: sta...@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1986
Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873be3c28556c
Signed-off-by: Nathan Chancellor
Signed-off-by: Michael Ellerman
Link: https://msgid.link/20240127-ppc-xor_vmx-drop-msoft-float-v1-1-f24140e81...@kernel.org
[nathan: Fixed conflicts due to lack of 04e85bbf71c9 in older trees]
Signed-off-by: Nathan Chancellor
---
 arch/powerpc/lib/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 321cab5c3ea0..bd5012aa94e3 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -67,6 +67,6 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
 obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
 
 obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o xor_vmx_glue.o
-CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
 
 obj-$(CONFIG_PPC64) += $(obj64-y)
-- 
2.44.0
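The "last option wins" reasoning in the commit message can be modeled as a tiny flag resolver. This is a toy sketch, not clang's driver logic: it only tracks the two float flags, and it reports a conflict when the effective setting is soft-float while -maltivec is present. The flag order assumes the main powerpc Makefile's '-msoft-float' lands before the per-object CFLAGS_xor_vmx.o, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum float_abi { FLOAT_DEFAULT, FLOAT_SOFT, FLOAT_HARD };

/* Toy resolver: of -msoft-float / -mhard-float, the last one given wins. */
static enum float_abi effective_float_abi(const char *const *argv, int argc)
{
	enum float_abi abi = FLOAT_DEFAULT;

	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "-msoft-float") == 0)
			abi = FLOAT_SOFT;
		else if (strcmp(argv[i], "-mhard-float") == 0)
			abi = FLOAT_HARD;
	}
	return abi;
}

/* True if the flags would hit the soft-float + AltiVec rejection. */
static bool altivec_float_conflict(const char *const *argv, int argc)
{
	bool altivec = false;

	assert(argc >= 0);
	for (int i = 0; i < argc; i++)
		if (strcmp(argv[i], "-maltivec") == 0)
			altivec = true;
	return altivec && effective_float_abi(argv, argc) == FLOAT_SOFT;
}
```

With the old flags ("-msoft-float -maltivec") the model reports a conflict; inserting "-mhard-float" before "-maltivec" flips the effective setting to hard-float and the conflict disappears, without touching the global '-msoft-float'.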
Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces
On 3/27/24 07:44, Simon Horman wrote:
> On Mon, Mar 25, 2024 at 10:52:46AM -0700, Guenter Roeck wrote:
>> Add name of functions triggering warning backtraces to the __bug_table
>> object section to enable support for suppressing WARNING backtraces.
>>
>> To limit image size impact, the pointer to the function name is only added
>> to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
>> CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
>> parameter is replaced with a (dummy) NULL parameter to avoid an image size
>> increase due to unused __func__ entries (this is necessary because __func__
>> is not a define but a virtual variable).
>>
>> Tested-by: Linux Kernel Functional Testing
>> Acked-by: Dan Carpenter
>> Signed-off-by: Guenter Roeck
>> ---
>> - Rebased to v6.9-rc1
>> - Added Tested-by:, Acked-by:, and Reviewed-by: tags
>> - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option
>>
>>  arch/sh/include/asm/bug.h | 26 ++++++++++++++++++++++----
>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
>> index 05a485c4fabc..470ce6567d20 100644
>> --- a/arch/sh/include/asm/bug.h
>> +++ b/arch/sh/include/asm/bug.h
>> @@ -24,21 +24,36 @@
>>   * The offending file and line are encoded in the __bug_table section.
>>   */
>>  #ifdef CONFIG_DEBUG_BUGVERBOSE
>> +
>> +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
>> +# define HAVE_BUG_FUNCTION
>> +# define __BUG_FUNC_PTR	"\t.long %O2\n"
>> +#else
>> +# define __BUG_FUNC_PTR
>> +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
>> +
>
> Hi Guenter,
>
> a minor nit from my side: this change results in a Kernel doc warning.
>
>      .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). Prototype was for HAVE_BUG_FUNCTION() instead
>
> Perhaps either the new code should be placed above the Kernel doc,
> or scripts/kernel-doc should be enhanced?
>

Thanks a lot for the feedback.

The definition block needs to be inside CONFIG_DEBUG_BUGVERBOSE, so it
would be a bit odd to move it above the documentation just to make
kerneldoc happy. I am not really sure what to do about it.

I'll wait for comments from others before making any changes.

Thanks,
Guenter

>>  #define _EMIT_BUG_ENTRY				\
>>  	"\t.pushsection __bug_table,\"aw\"\n"	\
>>  	"2:\t.long 1b, %O1\n"			\
>> -	"\t.short %O2, %O3\n"			\
>> -	"\t.org 2b+%O4\n"			\
>> +	__BUG_FUNC_PTR				\
>> +	"\t.short %O3, %O4\n"			\
>> +	"\t.org 2b+%O5\n"			\
>>  	"\t.popsection\n"
>>  #else
>>  #define _EMIT_BUG_ENTRY				\
>>  	"\t.pushsection __bug_table,\"aw\"\n"	\
>>  	"2:\t.long 1b\n"			\
>> -	"\t.short %O3\n"			\
>> -	"\t.org 2b+%O4\n"			\
>> +	"\t.short %O4\n"			\
>> +	"\t.org 2b+%O5\n"			\
>>  	"\t.popsection\n"
>>  #endif
>>
>> +#ifdef HAVE_BUG_FUNCTION
>> +# define __BUG_FUNC	__func__
>> +#else
>> +# define __BUG_FUNC	NULL
>> +#endif
>> +
>>  #define BUG()					\
>>  do {						\
>>  	__asm__ __volatile__ (			\
>> ...
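In the verbose case, the patch emits the function-name pointer as operand `%O2`, so the line/flags shorts move to `%O3`/`%O4` and the entry size to `%O5`; when the option is off, `__BUG_FUNC` is NULL. The sketch below is a simplified C view of such an entry plus a hypothetical name-based suppression check. The struct layout, `suppressed_functions` and `backtrace_suppressed()` are illustrative assumptions, not the series' actual KUnit API, which this thread does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified view of one __bug_table entry; the real layout is arch-specific. */
struct toy_bug_entry {
	unsigned long addr;	/* "1b": address of the trapping instruction */
	const char *file;	/* %O1 */
	const char *function;	/* %O2: NULL when __BUG_FUNC is NULL */
	unsigned short line;	/* %O3 */
	unsigned short flags;	/* %O4 */
};

/* Hypothetical list of functions whose WARNING backtraces a test expects. */
static const char *const suppressed_functions[] = {
	"test_func_expected_to_warn",
};

static int backtrace_suppressed(const struct toy_bug_entry *bug)
{
	size_t n = sizeof(suppressed_functions) / sizeof(suppressed_functions[0]);

	assert(bug != NULL);
	/* Without a recorded name there is nothing to match against. */
	if (!bug->function)
		return 0;
	for (size_t i = 0; i < n; i++)
		if (strcmp(bug->function, suppressed_functions[i]) == 0)
			return 1;
	return 0;
}
```

This also shows why the NULL fallback is safe in such a design: an entry built without the function pointer simply never matches, so backtraces are printed as before.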
Re: [PATCH RFC 3/3] mm: use "GUP-fast" instead "fast GUP" in remaining comments
On Wed, Mar 27, 2024 at 02:05:38PM +0100, David Hildenbrand wrote:
> Let's fixup the remaining comments to consistently call that thing
> "GUP-fast". With this change, we consistently call it "GUP-fast".
>
> Signed-off-by: David Hildenbrand

Reviewed-by: Mike Rapoport (IBM)

> ---
>  mm/filemap.c    | 2 +-
>  mm/khugepaged.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 387b394754fa..c668e11cd6ef 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1810,7 +1810,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
>   * C. Return the page to the page allocator
>   *
>   * This means that any page may have its reference count temporarily
> - * increased by a speculative page cache (or fast GUP) lookup as it can
> + * increased by a speculative page cache (or GUP-fast) lookup as it can
>   * be allocated by another user before the RCU grace period expires.
>   * Because the refcount temporarily acquired here may end up being the
>   * last refcount on the page, any page allocation must be freeable by
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 38830174608f..6972fa05132e 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1169,7 +1169,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>  	 * huge and small TLB entries for the same virtual address to
>  	 * avoid the risk of CPU bugs in that area.
>  	 *
> -	 * Parallel fast GUP is fine since fast GUP will back off when
> +	 * Parallel GUP-fast is fine since GUP-fast will back off when
>  	 * it detects PMD is changed.
>  	 */
>  	_pmd = pmdp_collapse_flush(vma, address, pmd);
> --
> 2.43.2
>

--
Sincerely yours,
Mike.
Re: [PATCH RFC 2/3] mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST
On Wed, Mar 27, 2024 at 02:05:37PM +0100, David Hildenbrand wrote:
> Nowadays, we call it "GUP-fast", the external interface includes
> functions like "get_user_pages_fast()", and we renamed all internal
> functions to reflect that as well.
>
> Let's make the config option reflect that.
>
> Signed-off-by: David Hildenbrand

Reviewed-by: Mike Rapoport (IBM)

> ---
>  arch/arm/Kconfig       | 2 +-
>  arch/arm64/Kconfig     | 2 +-
>  arch/loongarch/Kconfig | 2 +-
>  arch/mips/Kconfig      | 2 +-
>  arch/powerpc/Kconfig   | 2 +-
>  arch/s390/Kconfig      | 2 +-
>  arch/sh/Kconfig        | 2 +-
>  arch/x86/Kconfig       | 2 +-
>  include/linux/rmap.h   | 8 ++++----
>  kernel/events/core.c   | 4 ++--
>  mm/Kconfig             | 2 +-
>  mm/gup.c               | 6 +++---
>  mm/internal.h          | 2 +-
>  13 files changed, 19 insertions(+), 19 deletions(-)
>
Re: [PATCH v2 12/14] sh: Add support for suppressing warning backtraces
On Mon, Mar 25, 2024 at 10:52:46AM -0700, Guenter Roeck wrote:
> Add name of functions triggering warning backtraces to the __bug_table
> object section to enable support for suppressing WARNING backtraces.
>
> To limit image size impact, the pointer to the function name is only added
> to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
> CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
> parameter is replaced with a (dummy) NULL parameter to avoid an image size
> increase due to unused __func__ entries (this is necessary because __func__
> is not a define but a virtual variable).
>
> Tested-by: Linux Kernel Functional Testing
> Acked-by: Dan Carpenter
> Signed-off-by: Guenter Roeck
> ---
> - Rebased to v6.9-rc1
> - Added Tested-by:, Acked-by:, and Reviewed-by: tags
> - Introduced KUNIT_SUPPRESS_BACKTRACE configuration option
>
>  arch/sh/include/asm/bug.h | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/arch/sh/include/asm/bug.h b/arch/sh/include/asm/bug.h
> index 05a485c4fabc..470ce6567d20 100644
> --- a/arch/sh/include/asm/bug.h
> +++ b/arch/sh/include/asm/bug.h
> @@ -24,21 +24,36 @@
>   * The offending file and line are encoded in the __bug_table section.
>   */
>  #ifdef CONFIG_DEBUG_BUGVERBOSE
> +
> +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> +# define HAVE_BUG_FUNCTION
> +# define __BUG_FUNC_PTR	"\t.long %O2\n"
> +#else
> +# define __BUG_FUNC_PTR
> +#endif /* CONFIG_KUNIT_SUPPRESS_BACKTRACE */
> +

Hi Guenter,

a minor nit from my side: this change results in a Kernel doc warning.

     .../bug.h:29: warning: expecting prototype for _EMIT_BUG_ENTRY(). Prototype was for HAVE_BUG_FUNCTION() instead

Perhaps either the new code should be placed above the Kernel doc,
or scripts/kernel-doc should be enhanced?

>  #define _EMIT_BUG_ENTRY				\
>  	"\t.pushsection __bug_table,\"aw\"\n"	\
>  	"2:\t.long 1b, %O1\n"			\
> -	"\t.short %O2, %O3\n"			\
> -	"\t.org 2b+%O4\n"			\
> +	__BUG_FUNC_PTR				\
> +	"\t.short %O3, %O4\n"			\
> +	"\t.org 2b+%O5\n"			\
>  	"\t.popsection\n"
>  #else
>  #define _EMIT_BUG_ENTRY				\
>  	"\t.pushsection __bug_table,\"aw\"\n"	\
>  	"2:\t.long 1b\n"			\
> -	"\t.short %O3\n"			\
> -	"\t.org 2b+%O4\n"			\
> +	"\t.short %O4\n"			\
> +	"\t.org 2b+%O5\n"			\
>  	"\t.popsection\n"
>  #endif
>
> +#ifdef HAVE_BUG_FUNCTION
> +# define __BUG_FUNC	__func__
> +#else
> +# define __BUG_FUNC	NULL
> +#endif
> +
>  #define BUG()					\
>  do {						\
>  	__asm__ __volatile__ (			\
> ...
Re: [PATCH v2 2/5] arm64, powerpc, riscv, s390, x86: ptdump: Refactor CONFIG_DEBUG_WX
On Tue, 30 Jan 2024 02:34:33 PST (-0800), christophe.le...@csgroup.eu wrote:
> All architectures using the core ptdump functionality also implement
> CONFIG_DEBUG_WX, and they all do it more or less the same way, with a
> function called debug_checkwx() that is called by mark_rodata_ro(),
> which is a substitute for ptdump_check_wx() when CONFIG_DEBUG_WX is set
> and a no-op otherwise.
>
> Refactor by centrally defining debug_checkwx() in linux/ptdump.h and
> call debug_checkwx() immediately after calling mark_rodata_ro()
> instead of calling it at the end of every mark_rodata_ro().
>
> On x86_32, mark_rodata_ro() first checks __supported_pte_mask has
> _PAGE_NX before calling debug_checkwx(). Now the check is inside the
> callee ptdump_walk_pgd_level_checkwx().
>
> On powerpc_64, mark_rodata_ro() bails out early before calling
> ptdump_check_wx() when the MMU doesn't have KERNEL_RO feature. The
> check is now also done in ptdump_check_wx() as it is called outside
> mark_rodata_ro().
>
> Signed-off-by: Christophe Leroy
> Reviewed-by: Alexandre Ghiti
> ---
> v2: For x86 change macro ptdump_check_wx() to ptdump_check_wx
> ---
>  arch/arm64/include/asm/ptdump.h |  7 -------
>  arch/arm64/mm/mmu.c             |  2 --
>  arch/powerpc/mm/mmu_decl.h      |  6 ------
>  arch/powerpc/mm/pgtable_32.c    |  4 ----
>  arch/powerpc/mm/pgtable_64.c    |  3 ---
>  arch/powerpc/mm/ptdump/ptdump.c |  3 +++
>  arch/riscv/include/asm/ptdump.h | 22 ----------------------
>  arch/riscv/mm/init.c            |  3 ---
>  arch/riscv/mm/ptdump.c          |  1 -
>  arch/s390/include/asm/ptdump.h  | 14 --------------
>  arch/s390/mm/dump_pagetables.c  |  1 -
>  arch/s390/mm/init.c             |  2 --
>  arch/x86/include/asm/pgtable.h  |  3 +--
>  arch/x86/mm/dump_pagetables.c   |  3 +++
>  arch/x86/mm/init_32.c           |  2 --
>  arch/x86/mm/init_64.c           |  2 --
>  include/linux/ptdump.h          |  7 +++++++
>  init/main.c                     |  2 ++
>  18 files changed, 16 insertions(+), 71 deletions(-)
>  delete mode 100644 arch/riscv/include/asm/ptdump.h
>  delete mode 100644 arch/s390/include/asm/ptdump.h
>
> diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
> index 581caac525b0..5b1701c76d1c 100644
> --- a/arch/arm64/include/asm/ptdump.h
> +++ b/arch/arm64/include/asm/ptdump.h
> @@ -29,13 +29,6 @@ void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
>  static inline void ptdump_debugfs_register(struct ptdump_info *info,
>  					   const char *name) { }
>  #endif
> -void ptdump_check_wx(void);
>  #endif /* CONFIG_PTDUMP_CORE */
>
> -#ifdef CONFIG_DEBUG_WX
> -#define debug_checkwx()	ptdump_check_wx()
> -#else
> -#define debug_checkwx()	do { } while (0)
> -#endif
> -
>  #endif /* __ASM_PTDUMP_H */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 1ac7467d34c9..3a27d887f7dd 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -632,8 +632,6 @@ void mark_rodata_ro(void)
>  	section_size = (unsigned long)__init_begin - (unsigned long)__start_rodata;
>  	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
>  			    section_size, PAGE_KERNEL_RO);
> -
> -	debug_checkwx();
>  }
>
>  static void __init map_kernel_segment(pgd_t *pgdp, void *va_start, void *va_end,
> diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
> index 72341b9fb552..90dcc2844056 100644
> --- a/arch/powerpc/mm/mmu_decl.h
> +++ b/arch/powerpc/mm/mmu_decl.h
> @@ -171,12 +171,6 @@ static inline void mmu_mark_rodata_ro(void) { }
>  void __init mmu_mapin_immr(void);
>  #endif
>
> -#ifdef CONFIG_DEBUG_WX
> -void ptdump_check_wx(void);
> -#else
> -static inline void ptdump_check_wx(void) { }
> -#endif
> -
>  static inline bool debug_pagealloc_enabled_or_kfence(void)
>  {
>  	return IS_ENABLED(CONFIG_KFENCE) || debug_pagealloc_enabled();
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 5c02fd08d61e..12498017da8e 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -153,7 +153,6 @@ void mark_rodata_ro(void)
>
>  	if (v_block_mapped((unsigned long)_stext + 1)) {
>  		mmu_mark_rodata_ro();
> -		ptdump_check_wx();
>  		return;
>  	}
>
> @@ -166,9 +165,6 @@ void mark_rodata_ro(void)
>  		   PFN_DOWN((unsigned long)_stext);
>
>  	set_memory_ro((unsigned long)_stext, numpages);
> -
> -	// mark_initmem_nx() should have already run by now
> -	ptdump_check_wx();
>  }
>  #endif
>
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 5ac1fd30341b..1b366526f4f2 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -150,9 +150,6 @@ void mark_rodata_ro(void)
>  		radix__mark_rodata_ro();
>  	else
>  		hash__mark_rodata_ro();
> -
> -	// mark_initmem_nx() should have already run by now
> -	ptdump_check_wx();
>  }
>
>  void mark_initmem_nx(void)
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index
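The refactor described in the commit message can be sketched as a userspace model: one generic debug_checkwx(), invoked once right after mark_rodata_ro(), with arch preconditions (such as the powerpc64 KERNEL_RO check) moved into ptdump_check_wx() itself. This is not the actual include/linux/ptdump.h content; `DEBUG_WX_ENABLED` stands in for IS_ENABLED(CONFIG_DEBUG_WX), and the generic caller is modeled as mark_readonly() on the assumption that this is where init/main.c invokes mark_rodata_ro().

```c
#include <assert.h>
#include <stdbool.h>

/* Models IS_ENABLED(CONFIG_DEBUG_WX); set to 0 for a kernel without it. */
#define DEBUG_WX_ENABLED 1

static bool mmu_has_kernel_ro = true;	/* stand-in for the powerpc64 MMU check */
static int wx_checks_run;

static void ptdump_check_wx(void)
{
	/* Per the commit message, arch preconditions now live in the callee. */
	if (!mmu_has_kernel_ro)
		return;
	wx_checks_run++;	/* the real function walks the kernel page tables */
}

/* One generic definition instead of a copy in every arch header. */
static inline void debug_checkwx(void)
{
	if (DEBUG_WX_ENABLED)
		ptdump_check_wx();
}

static void mark_rodata_ro(void)
{
	/* Arch-specific: make rodata read-only; no W^X check here any more. */
}

/* Generic caller (assumed name): check once, right after mark_rodata_ro(). */
static void mark_readonly(void)
{
	mark_rodata_ro();
	debug_checkwx();
}
```

Moving the early-bailout condition into the callee is what lets every architecture share the single call site, since the generic code no longer knows about per-arch reasons to skip the check.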
Re: [PATCH v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
On Thu, 25 Jan 2024 22:44:51 PST (-0800), shi...@os.amperecomputing.com wrote:
> During the kernel booting, the generic cpu_to_node() is called too early in
> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
>
> There are at least four places in the common code where
> the generic cpu_to_node() is called before it is initialized:
>	   1.) early_trace_init()         in kernel/trace/trace.c
>	   2.) sched_init()               in kernel/sched/core.c
>	   3.) init_sched_fair_class()    in kernel/sched/fair.c
>	   4.) workqueue_init_early()     in kernel/workqueue.c
>
> In order to fix the bug, the patch introduces early_numa_node_init()
> which is called after smp_prepare_boot_cpu() in start_kernel.
> early_numa_node_init will initialize the "numa_node" as soon as
> the early_cpu_to_node() is ready, before the cpu_to_node() is called
> for the first time.
>
> Signed-off-by: Huang Shijie
> ---
> v2 --> v3:
>	Do not change the cpu_to_node to function pointer.
>	Introduce early_numa_node_init() which initialize
>	the numa_node at an early stage.
>
> v2: https://lore.kernel.org/all/20240123045843.75969-1-shi...@os.amperecomputing.com/
>
> v1 --> v2:
>	In order to fix the x86 compiling error, move the cpu_to_node()
>	from drivers/base/arch_numa.c to drivers/base/node.c.
>
> v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/896160.html
>
> An old different title patch:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/895963.html
>
> ---
>  init/main.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/init/main.c b/init/main.c
> index e24b0780fdff..39efe5ed58a0 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -870,6 +870,19 @@ static void __init print_unknown_bootoptions(void)
>  	memblock_free(unknown_options, len);
>  }
>
> +static void __init early_numa_node_init(void)
> +{
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +#ifndef cpu_to_node
> +	int cpu;
> +
> +	/* The early_cpu_to_node() should be ready here. */
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
> +#endif
> +#endif
> +}
> +
>  asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
>  void start_kernel(void)
>  {
> @@ -900,6 +913,7 @@ void start_kernel(void)
>  	setup_nr_cpu_ids();
>  	setup_per_cpu_areas();
>  	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
> +	early_numa_node_init();
>  	boot_cpu_hotplug_init();
>
>  	pr_notice("Kernel command line: %s\n", saved_command_line);

Acked-by: Palmer Dabbelt # RISC-V

I don't really understand the init/main.c stuff all that well; I'm adding
Andrew as it looks like he's been merging stuff here.
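A userspace model of the bug and the fix: the per-CPU numa_node starts zero-initialized, so any cpu_to_node() call before it is populated returns 0, and early_numa_node_init() copies the early (firmware-derived) mapping in before the first real use. The node map, NR_CPUS value and all function bodies are hypothetical stand-ins; only the function names mirror the patch, and the loop stands in for for_each_possible_cpu().

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical firmware-provided CPU-to-node map read by early_cpu_to_node(). */
static const int early_node_map[NR_CPUS] = { 0, 0, 1, 1 };

/* Per-CPU numa_node; zero-initialized, so early readers all see node 0. */
static int numa_node[NR_CPUS];

static int early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return early_node_map[cpu];
}

static void set_cpu_numa_node(int cpu, int node) { numa_node[cpu] = node; }
static int cpu_to_node(int cpu) { return numa_node[cpu]; }

/* Mirrors the patch: copy the early mapping into the per-CPU variable. */
static void early_numa_node_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)	/* for_each_possible_cpu() */
		set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
}
```

Before early_numa_node_init() runs, cpu_to_node(3) returns 0 in this model (the symptom in the subject line); afterwards it returns 1, so callers such as sched_init() that run later in start_kernel() would see the correct node.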
Re: [PATCH RFC 1/3] mm/gup: consistently name GUP-fast functions
On 27.03.24 14:52, Jason Gunthorpe wrote:
> On Wed, Mar 27, 2024 at 02:05:36PM +0100, David Hildenbrand wrote:
>> Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename
>> all relevant internal functions to start with "gup_fast", to make it
>> clearer that this is not ordinary GUP. The current mixture of
>> "lockless", "gup" and "gup_fast" is confusing.
>>
>> Further, avoid the term "huge" when talking about a "leaf" -- for
>> example, we nowadays check pmd_leaf() because pmd_huge() is gone. For the
>> "hugepd"/"hugepte" stuff, it's part of the name ("is_hugepd"), so that
>> stays.
>>
>> What remains is the "external" interface:
>> * get_user_pages_fast_only()
>> * get_user_pages_fast()
>> * pin_user_pages_fast()
>>
>> And the "internal" interface that handles GUP-fast + fallback:
>> * internal_get_user_pages_fast()
>
> This would like a better name too. How about gup_fast_fallback() ?

Yes, I was not able to come up with something I liked. But I do like your
proposal, so I'll do that!

[...]

> I think it is a great idea, it always takes a moment to figure out if a
> function is part of the fast callchain or not..
>
> (even better would be to shift the fast stuff into its own file, but I
> expect that is too much)

Yes, one step at a time :)

> Reviewed-by: Jason Gunthorpe

Thanks Jason!

--
Cheers,

David / dhildenb
Re: [PATCH RFC 1/3] mm/gup: consistently name GUP-fast functions
On Wed, Mar 27, 2024 at 02:05:36PM +0100, David Hildenbrand wrote:
> Let's consistently call the "fast-only" part of GUP "GUP-fast" and rename
> all relevant internal functions to start with "gup_fast", to make it
> clearer that this is not ordinary GUP. The current mixture of
> "lockless", "gup" and "gup_fast" is confusing.
>
> Further, avoid the term "huge" when talking about a "leaf" -- for
> example, we nowadays check pmd_leaf() because pmd_huge() is gone. For the
> "hugepd"/"hugepte" stuff, it's part of the name ("is_hugepd"), so that
> stays.
>
> What remains is the "external" interface:
> * get_user_pages_fast_only()
> * get_user_pages_fast()
> * pin_user_pages_fast()
>
> And the "internal" interface that handles GUP-fast + fallback:
> * internal_get_user_pages_fast()

This would like a better name too. How about gup_fast_fallback() ?

> The high-level internal function for GUP-fast is now:
> * gup_fast()
>
> The basic GUP-fast walker functions:
> * gup_pgd_range() -> gup_fast_pgd_range()
> * gup_p4d_range() -> gup_fast_p4d_range()
> * gup_pud_range() -> gup_fast_pud_range()
> * gup_pmd_range() -> gup_fast_pmd_range()
> * gup_pte_range() -> gup_fast_pte_range()
> * gup_huge_pgd() -> gup_fast_pgd_leaf()
> * gup_huge_pud() -> gup_fast_pud_leaf()
> * gup_huge_pmd() -> gup_fast_pmd_leaf()
>
> The weird hugepd stuff:
> * gup_huge_pd() -> gup_fast_hugepd()
> * gup_hugepte() -> gup_fast_hugepte()
>
> The weird devmap stuff:
> * __gup_device_huge_pud() -> gup_fast_devmap_pud_leaf()
> * __gup_device_huge_pmd -> gup_fast_devmap_pmd_leaf()
> * __gup_device_huge() -> gup_fast_devmap_leaf()
>
> Helper functions:
> * unpin_user_pages_lockless() -> gup_fast_unpin_user_pages()
> * gup_fast_folio_allowed() is already properly named
> * gup_fast_permitted() is already properly named
>
> With "gup_fast()", we now even have a function that is referred to in a
> comment in mm/mmu_gather.c.
>
> Signed-off-by: David Hildenbrand
> ---
>  mm/gup.c | 164 ---
>  1 file changed, 84 insertions(+), 80 deletions(-)

I think it is a great idea, it always takes a moment to figure out if a
function is part of the fast callchain or not..

(even better would be to shift the fast stuff into its own file, but I
expect that is too much)

Reviewed-by: Jason Gunthorpe

Jason
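The "GUP-fast + fallback" split discussed above can be sketched as a toy model: try the lockless walk first, and only if it stops early (and the caller did not ask for fast-only behavior, as get_user_pages_fast_only() does) hand the remainder to the slow path. Everything here is a userspace stand-in; the page array, `slow_gup()` and the `fast_only` flag are illustrative assumptions, and the real function also handles flags, errors and pinning semantics that this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 8

static bool fast_ok[NR_PAGES];	/* true if GUP-fast can grab the page lock-free */
static int slow_calls;

/* GUP-fast stand-in: stops at the first page it cannot handle without locks. */
static long gup_fast(long start, long nr)
{
	long i;

	assert(start >= 0 && start + nr <= NR_PAGES);
	for (i = 0; i < nr; i++)
		if (!fast_ok[start + i])
			break;
	return i;
}

/* Slow GUP stand-in: may fault pages in, so it always succeeds here. */
static long slow_gup(long start, long nr)
{
	(void)start;
	slow_calls++;
	return nr;
}

/* Shape of gup_fast_fallback(): fast path first, slow path for the rest. */
long gup_fast_fallback(long start, long nr, bool fast_only)
{
	long nr_pinned = gup_fast(start, nr);

	if (nr_pinned == nr || fast_only)
		return nr_pinned;
	return nr_pinned + slow_gup(start + nr_pinned, nr - nr_pinned);
}
```

With the first four pages fast-pinnable, a request for eight pages pins four lock-free and the remaining four via one slow-path call, while a fast-only request returns just the four; this is the fast/fallback boundary that the proposed name makes explicit.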
[PATCH RFC 3/3] mm: use "GUP-fast" instead "fast GUP" in remaining comments
Let's fixup the remaining comments to consistently call that thing
"GUP-fast". With this change, we consistently call it "GUP-fast".

Signed-off-by: David Hildenbrand
---
 mm/filemap.c    | 2 +-
 mm/khugepaged.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 387b394754fa..c668e11cd6ef 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1810,7 +1810,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
  * C. Return the page to the page allocator
  *
  * This means that any page may have its reference count temporarily
- * increased by a speculative page cache (or fast GUP) lookup as it can
+ * increased by a speculative page cache (or GUP-fast) lookup as it can
  * be allocated by another user before the RCU grace period expires.
  * Because the refcount temporarily acquired here may end up being the
  * last refcount on the page, any page allocation must be freeable by
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 38830174608f..6972fa05132e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1169,7 +1169,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * huge and small TLB entries for the same virtual address to
 	 * avoid the risk of CPU bugs in that area.
 	 *
-	 * Parallel fast GUP is fine since fast GUP will back off when
+	 * Parallel GUP-fast is fine since GUP-fast will back off when
 	 * it detects PMD is changed.
 	 */
 	_pmd = pmdp_collapse_flush(vma, address, pmd);
-- 
2.43.2