Re: [PATCH v2 12/14] mm/treewide: Remove pXd_huge()

2024-05-26 Thread Christophe Leroy


On 18/03/2024 at 21:04, pet...@redhat.com wrote:
> From: Peter Xu 
> 
> This API is not used anymore, drop it for the whole tree.

Some documentation remains in v6.10-rc1:

$ git grep -w p.d_huge
Documentation/mm/arch_pgtable_helpers.rst:| pmd_huge | Tests a HugeTLB mapped PMD |
Documentation/mm/arch_pgtable_helpers.rst:| pud_huge | Tests a HugeTLB mapped PUD |
arch/x86/mm/pat/set_memory.c:* otherwise pmd_present/pmd_huge will return true
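
A follow-up along these lines would clear the leftovers (a sketch, not a
posted patch; it assumes pmd_leaf() is the intended replacement in the
set_memory.c comment):

--- a/Documentation/mm/arch_pgtable_helpers.rst
+++ b/Documentation/mm/arch_pgtable_helpers.rst
(drop the two stale pmd_huge/pud_huge rows)
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
-	 * otherwise pmd_present/pmd_huge will return true
+	 * otherwise pmd_present/pmd_leaf will return true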


Christophe


Re: [RFC PATCH v3 05/16] powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries

2024-05-26 Thread Christophe Leroy


On 27/05/2024 at 06:55, Oscar Salvador wrote:
> On Sun, May 26, 2024 at 11:22:25AM +0200, Christophe Leroy wrote:
>> Building on 32 bits with pmd_leaf() not always returning false leads
>> to the following error:
>>
>>CC  arch/powerpc/mm/pgtable.o
>> arch/powerpc/mm/pgtable.c: In function '__find_linux_pte':
>> arch/powerpc/mm/pgtable.c:506:1: error: function may return address of local variable [-Werror=return-local-addr]
>>506 | }
>>| ^
>> arch/powerpc/mm/pgtable.c:394:15: note: declared here
>>394 | pud_t pud, *pudp;
>>|   ^~~
>>
>> This is due to pmd_offset() being a no-op in that case: it hands back
>> its input pointer, so the function ends up returning the address of
>> the on-stack pud copy.
>>
>> So rework it for powerpc/32 so that pXd_offset() is used on real
>> pointers and not on on-stack copies.
>>
>> Signed-off-by: Christophe Leroy 
> 
> Maybe this could be folded into the patch that makes pmd_leaf() not
> always return false, but no strong feelings:

I prefer to keep it separate, the patch introducing pmd_leaf() is 
already big enough.
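
For context, the failure mode reduces to this pattern (a simplified,
self-contained sketch, not the actual powerpc code):

	/* Simplified model of a folded PMD level (illustrative only). */
	typedef struct { unsigned long pud; } pud_t;
	typedef struct { unsigned long pmd; } pmd_t;

	static pmd_t *pmd_offset(pud_t *pudp, unsigned long addr)
	{
		return (pmd_t *)pudp;	/* no-op cast when the PMD level is folded */
	}

	static pmd_t *find_pmd(pud_t *pud_table, unsigned long addr)
	{
		pud_t pud = *pud_table;		/* on-stack copy of the entry */

		/* With the no-op pmd_offset() this is &pud: a local's address */
		return pmd_offset(&pud, addr);
	}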

> 
> Reviewed-by: Oscar Salvador 
> 
> 


Re: [RFC PATCH v3 07/16] powerpc/8xx: Fix size given to set_huge_pte_at()

2024-05-26 Thread Oscar Salvador
On Sun, May 26, 2024 at 11:22:27AM +0200, Christophe Leroy wrote:
> set_huge_pte_at() expects the size of the hugepage in bytes, not the
> psize, which is the index of the page definition in the mmu_psize_defs[]
> table; 1UL << mmu_psize_to_shift(psize) performs that conversion.
> 
> Fixes: 935d4f0c6dc8 ("mm: hugetlb: add huge page size param to set_huge_pte_at()")
> Signed-off-by: Christophe Leroy 

Reviewed-by: Oscar Salvador 

> ---
>  arch/powerpc/mm/nohash/8xx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
> index 43d4842bb1c7..d93433e26ded 100644
> --- a/arch/powerpc/mm/nohash/8xx.c
> +++ b/arch/powerpc/mm/nohash/8xx.c
> @@ -94,7 +94,8 @@ static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa,
>   return -EINVAL;
>  
>   set_huge_pte_at(&init_mm, va, ptep,
> - pte_mkhuge(pfn_pte(pa >> PAGE_SHIFT, prot)), psize);
> + pte_mkhuge(pfn_pte(pa >> PAGE_SHIFT, prot)),
> + 1UL << mmu_psize_to_shift(psize));
>  
>   return 0;
>  }
> -- 
> 2.44.0
> 

-- 
Oscar Salvador
SUSE Labs


Re: [RFC PATCH v3 05/16] powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries

2024-05-26 Thread Oscar Salvador
On Sun, May 26, 2024 at 11:22:25AM +0200, Christophe Leroy wrote:
> Building on 32 bits with pmd_leaf() not always returning false leads
> to the following error:
> 
>   CC  arch/powerpc/mm/pgtable.o
> arch/powerpc/mm/pgtable.c: In function '__find_linux_pte':
> arch/powerpc/mm/pgtable.c:506:1: error: function may return address of local variable [-Werror=return-local-addr]
>   506 | }
>   | ^
> arch/powerpc/mm/pgtable.c:394:15: note: declared here
>   394 | pud_t pud, *pudp;
>   |   ^~~
> 
> This is due to pmd_offset() being a no-op in that case: it hands back
> its input pointer, so the function ends up returning the address of
> the on-stack pud copy.
> 
> So rework it for powerpc/32 so that pXd_offset() is used on real
> pointers and not on on-stack copies.
> 
> Signed-off-by: Christophe Leroy 

Maybe this could be folded into the patch that makes pmd_leaf() not
always return false, but no strong feelings:

Reviewed-by: Oscar Salvador 


-- 
Oscar Salvador
SUSE Labs


Re: [RFC PATCH v3 02/16] mm: Define __pte_leaf_size() to also take a PMD entry

2024-05-26 Thread Oscar Salvador
On Sun, May 26, 2024 at 11:22:22AM +0200, Christophe Leroy wrote:
> On powerpc 8xx, when a page is 8M size, the information is in the PMD
> entry. So allow architectures to provide __pte_leaf_size() instead of
> pte_leaf_size() and provide the PMD entry to that function.
> 
> When __pte_leaf_size() is not defined, define it as pte_leaf_size()
> so that architectures not interested in the PMD argument are not
> impacted.
> 
> Only define a default pte_leaf_size() when __pte_leaf_size() is not
> defined to make sure nobody adds new calls to pte_leaf_size() in the
> core.
> 
> Signed-off-by: Christophe Leroy 

Thanks, this looks much cleaner.

Reviewed-by: Oscar Salvador 


-- 
Oscar Salvador
SUSE Labs


Re: [RFC PATCH v3 00/16] Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)

2024-05-26 Thread Oscar Salvador
On Sun, May 26, 2024 at 11:22:20AM +0200, Christophe Leroy wrote:
> This is the continuation of the RFC v1 series "Reimplement huge pages
> without hugepd on powerpc 8xx". It now get rid of hugepd completely
> after handling also e500 and book3s/64
> 
> Also see https://github.com/linuxppc/issues/issues/483
> 
> Unlike most architectures, powerpc 8xx HW requires a two-level
> pagetable topology for all page sizes. So a leaf PMD-contig approach
> is not feasible as such.
> 
> Possible sizes on 8xx are 4k, 16k, 512k and 8M.
> 
> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
> must point to a single entry level-2 page table. Until now that was
> done using hugepd. This series changes it to use standard page tables
> where the entry is replicated 1024 times on each of the two pagetables
> referred to by the two associated PMD entries for that 8M page.
> 
> For e500 and book3s/64 there are fewer constraints because it is not
> tied to the HW assisted tablewalk like on 8xx, so it is easier to use
> leaf PMDs (and PUDs).
> 
> On e500 the supported page sizes are 4M, 16M, 64M, 256M and 1G. All are
> at PMD level on e500/32 (mpc85xx) and a mix of PMD and PUD for e500/64.
> We encode the page size with 4 available bits in PTE entries. On e500/32,
> PGD entry size is increased to 64 bits in order to allow leaf-PMD entries
> because PTEs are 64 bits on e500.
> 
> On book3s/64 only the hash-4k mode is concerned. It supports 16M pages
> as cont-PMD and 16G pages as cont-PUD. In other modes (radix-4k, radix-64k
> and hash-64k) the sizes match with PMD and PUD sizes so that's just leaf
> entries. The hash processing makes things a bit more complex. To ease
> things, __hash_page_huge() is modified to bail out when DIRTY or ACCESSED
> bits are missing, leaving it to mm core to fix it.
> 
> Global changes in v3:
> - Removed patches 1 and 2
> - Squashed patch 11 into patch 5
> - Replaced patches 12 and 13 with a series from Michael
> - Reordered patches a bit to have more general patches up front
> 
> For more details on changes, see in each patch.
> 
> Christophe Leroy (15):
>   mm: Define __pte_leaf_size() to also take a PMD entry
>   mm: Provide mm_struct and address to huge_ptep_get()
>   powerpc/mm: Remove _PAGE_PSIZE
>   powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries
>   powerpc/mm: Allow hugepages without hugepd
>   powerpc/8xx: Fix size given to set_huge_pte_at()
>   powerpc/8xx: Rework support for 8M pages using contiguous PTE entries
>   powerpc/8xx: Simplify struct mmu_psize_def
>   powerpc/e500: Remove enc and ind fields from struct mmu_psize_def
>   powerpc/e500: Switch to 64 bits PGD on 85xx (32 bits)
>   powerpc/e500: Encode hugepage size in PTE bits
>   powerpc/e500: Use contiguous PMD instead of hugepd
>   powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD
>   powerpc/mm: Remove hugepd leftovers
>   mm: Remove CONFIG_ARCH_HAS_HUGEPD

I glanced over it and it looks much better; not having to fiddle with other arch
code and generic declarations is a big plus.
I plan to do a proper review tomorrow.

Thanks for working on this Christophe!


-- 
Oscar Salvador
SUSE Labs


Re: [PATCH 0/6] ipmi: Convert to platform remove callback returning void

2024-05-26 Thread Uwe Kleine-König
Hello Corey,

On Sat, May 25, 2024 at 09:39:36AM -0500, Corey Minyard wrote:
> On Sat, May 25, 2024 at 12:10:38PM +0200, Uwe Kleine-König wrote:
> > These changes have been in next for a while but didn't land in Linus' tree
> > for v6.10-rc1. I intend to send a PR to Greg early next week changing
> > platform_driver::remove to match remove_new. If these commits don't make
> > it in in time, I'll be so bold and just include the commits from your
> > for-next branch in my PR.
> 
> I sent them to Linus right after 6.9 dropped, let me resend...

That worked, they have now landed in Linus' tree. Thanks, that makes it a bit
less ugly for me.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | https://www.pengutronix.de/ |




Re: [RFC PATCH 4/8] mm: Provide mm_struct and address to huge_ptep_get()

2024-05-26 Thread Christophe Leroy


On 25/03/2024 at 17:35, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2024 at 03:55:57PM +0100, Christophe Leroy wrote:
> 
>>   arch/arm64/include/asm/hugetlb.h |  2 +-
>>   fs/hugetlbfs/inode.c |  2 +-
>>   fs/proc/task_mmu.c   |  8 +++---
>>   fs/userfaultfd.c |  2 +-
>>   include/asm-generic/hugetlb.h|  2 +-
>>   include/linux/swapops.h  |  2 +-
>>   mm/damon/vaddr.c |  6 ++---
>>   mm/gup.c |  2 +-
>>   mm/hmm.c |  2 +-
>>   mm/hugetlb.c | 46 
>>   mm/memory-failure.c  |  2 +-
>>   mm/mempolicy.c   |  2 +-
>>   mm/migrate.c |  4 +--
>>   mm/mincore.c |  2 +-
>>   mm/userfaultfd.c |  2 +-
>>   15 files changed, 43 insertions(+), 43 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
>> index 2ddc33d93b13..1af39a74e791 100644
>> --- a/arch/arm64/include/asm/hugetlb.h
>> +++ b/arch/arm64/include/asm/hugetlb.h
>> @@ -46,7 +46,7 @@ extern pte_t huge_ptep_clear_flush(struct vm_area_struct 
>> *vma,
>>   extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
>> pte_t *ptep, unsigned long sz);
>>   #define __HAVE_ARCH_HUGE_PTEP_GET
>> -extern pte_t huge_ptep_get(pte_t *ptep);
>> +extern pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
> 
> The header changed but not the implementation? This will need to do
> riscv and s390 too.

It is now fixed.

> 
> Though, really, I think the right path is to work toward removing
> huge_ptep_get() from the arch code..
> 
> riscv and arm are doing the same thing - propagating dirty/young bits
> from the contig PTEs to the results. The core code can do this, maybe
> with a ARCH #define opt in.
> 
> s390.. Ouchy - is this because hugetlb wants to pretend that every
> level is encoded as a PTE so it takes the PGD and recodes the flags to
> the PTE layout??
> 
> Jason


[RFC PATCH v3 15/16] powerpc/mm: Remove hugepd leftovers

2024-05-26 Thread Christophe Leroy
All targets have now opted out of CONFIG_ARCH_HAS_HUGEPD so
remove the leftover code.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h  |   7 -
 arch/powerpc/include/asm/page.h |   6 -
 arch/powerpc/include/asm/pgtable-be-types.h |  10 -
 arch/powerpc/include/asm/pgtable-types.h|   9 -
 arch/powerpc/mm/hugetlbpage.c   | 412 
 arch/powerpc/mm/init-common.c   |   8 +-
 arch/powerpc/mm/pgtable.c   |  27 +-
 7 files changed, 3 insertions(+), 476 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index e959c26c0b52..18a3028ac3b6 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -30,13 +30,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 }
 #define is_hugepage_only_range is_hugepage_only_range
 
-#ifdef CONFIG_ARCH_HAS_HUGEPD
-#define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
-void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
-   unsigned long end, unsigned long floor,
-   unsigned long ceiling);
-#endif
-
 #define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 pte_t pte, unsigned long sz);
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index c0af246a64ff..83d0a4fc5f75 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -269,12 +269,6 @@ static inline const void *pfn_to_kaddr(unsigned long pfn)
 #define is_kernel_addr(x)  ((x) >= TASK_SIZE)
 #endif
 
-/*
- * Some number of bits at the level of the page table that points to
- * a hugepte are used to encode the size.  This masks those bits.
- */
-#define HUGEPD_SHIFT_MASK 0x3f
-
 #ifndef __ASSEMBLY__
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/pgtable-be-types.h b/arch/powerpc/include/asm/pgtable-be-types.h
index 82633200b500..6bd8f89b25dc 100644
--- a/arch/powerpc/include/asm/pgtable-be-types.h
+++ b/arch/powerpc/include/asm/pgtable-be-types.h
@@ -101,14 +101,4 @@ static inline bool pmd_xchg(pmd_t *pmdp, pmd_t old, pmd_t new)
return pmd_raw(old) == prev;
 }
 
-#ifdef CONFIG_ARCH_HAS_HUGEPD
-typedef struct { __be64 pdbe; } hugepd_t;
-#define __hugepd(x) ((hugepd_t) { cpu_to_be64(x) })
-
-static inline unsigned long hpd_val(hugepd_t x)
-{
-   return be64_to_cpu(x.pdbe);
-}
-#endif
-
 #endif /* _ASM_POWERPC_PGTABLE_BE_TYPES_H */
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index db965d98e0ae..7b3d4c592a10 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -87,13 +87,4 @@ static inline bool pte_xchg(pte_t *ptep, pte_t old, pte_t new)
 }
 #endif
 
-#ifdef CONFIG_ARCH_HAS_HUGEPD
-typedef struct { unsigned long pd; } hugepd_t;
-#define __hugepd(x) ((hugepd_t) { (x) })
-static inline unsigned long hpd_val(hugepd_t x)
-{
-   return x.pd;
-}
-#endif
-
 #endif /* _ASM_POWERPC_PGTABLE_TYPES_H */
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 1fe2843f5b12..76846c6014e4 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -28,8 +28,6 @@
 
 bool hugetlb_disabled = false;
 
-#define hugepd_none(hpd)   (hpd_val(hpd) == 0)
-
 #define PTE_T_ORDER(__builtin_ffs(sizeof(pte_basic_t)) - \
 __builtin_ffs(sizeof(void *)))
 
@@ -42,156 +40,6 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long s
return __find_linux_pte(mm->pgd, addr, NULL, NULL);
 }
 
-#ifdef CONFIG_ARCH_HAS_HUGEPD
-static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
-  unsigned long address, unsigned int pdshift,
-  unsigned int pshift, spinlock_t *ptl)
-{
-   struct kmem_cache *cachep;
-   pte_t *new;
-   int i;
-   int num_hugepd;
-
-   if (pshift >= pdshift) {
-   cachep = PGT_CACHE(PTE_T_ORDER);
-   num_hugepd = 1 << (pshift - pdshift);
-   } else {
-   cachep = PGT_CACHE(pdshift - pshift);
-   num_hugepd = 1;
-   }
-
-   if (!cachep) {
-   WARN_ONCE(1, "No page table cache created for hugetlb tables");
-   return -ENOMEM;
-   }
-
-   new = kmem_cache_alloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
-
-   BUG_ON(pshift > HUGEPD_SHIFT_MASK);
-   BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
-
-   if (!new)
-   return -ENOMEM;
-
-   /*
-* Make sure other cpus find the hugepd set only after a
-* properly initialized page table is visible to them.
-* For more details look for comment in __pte_alloc().
-*/
-   smp_wmb();
-
-   spin_lock(ptl);
-   /*
-* We have mu

[RFC PATCH v3 16/16] mm: Remove CONFIG_ARCH_HAS_HUGEPD

2024-05-26 Thread Christophe Leroy
powerpc was the only user of CONFIG_ARCH_HAS_HUGEPD and doesn't
use it anymore, so remove all related code.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/hugetlbpage.c |   1 -
 include/linux/hugetlb.h   |   6 --
 mm/Kconfig|  10 
 mm/gup.c  | 105 +-
 mm/pagewalk.c |  57 ++
 5 files changed, 5 insertions(+), 174 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 76846c6014e4..6b043180220a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -78,7 +78,6 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct 
vm_area_struct *vma,
 
return pte_alloc_huge(mm, pmd, addr);
 }
-#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /*
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 77b30a8c6076..f6a509487773 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -20,12 +20,6 @@ struct user_struct;
 struct mmu_gather;
 struct node;
 
-#ifndef CONFIG_ARCH_HAS_HUGEPD
-typedef struct { unsigned long pd; } hugepd_t;
-#define is_hugepd(hugepd) (0)
-#define __hugepd(x) ((hugepd_t) { (x) })
-#endif
-
 void free_huge_folio(struct folio *folio);
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/Kconfig b/mm/Kconfig
index b1448aa81e15..a52f8e3224fb 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1114,16 +1114,6 @@ config DMAPOOL_TEST
 config ARCH_HAS_PTE_SPECIAL
bool
 
-#
-# Some architectures require a special hugepage directory format that is
-# required to support multiple hugepage sizes. For example a4fe3ce76
-# "powerpc/mm: Allow more flexible layouts for hugepage pagetables"
-# introduced it on powerpc.  This allows for a more flexible hugepage
-# pagetable layouts.
-#
-config ARCH_HAS_HUGEPD
-   bool
-
 config MAPPING_DIRTY_HELPERS
 bool
 
diff --git a/mm/gup.c b/mm/gup.c
index 86b5105b82a1..95f121223f04 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2790,89 +2790,6 @@ static int record_subpages(struct page *page, unsigned long addr,
return nr;
 }
 
-#ifdef CONFIG_ARCH_HAS_HUGEPD
-static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
- unsigned long sz)
-{
-   unsigned long __boundary = (addr + sz) & ~(sz-1);
-   return (__boundary - 1 < end - 1) ? __boundary : end;
-}
-
-static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
-  unsigned long end, unsigned int flags,
-  struct page **pages, int *nr)
-{
-   unsigned long pte_end;
-   struct page *page;
-   struct folio *folio;
-   pte_t pte;
-   int refs;
-
-   pte_end = (addr + sz) & ~(sz-1);
-   if (pte_end < end)
-   end = pte_end;
-
-   pte = huge_ptep_get(NULL, addr, ptep);
-
-   if (!pte_access_permitted(pte, flags & FOLL_WRITE))
-   return 0;
-
-   /* hugepages are never "special" */
-   VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
-
-   page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
-   refs = record_subpages(page, addr, end, pages + *nr);
-
-   folio = try_grab_folio(page, refs, flags);
-   if (!folio)
-   return 0;
-
-   if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
-   gup_put_folio(folio, refs, flags);
-   return 0;
-   }
-
-   if (!folio_fast_pin_allowed(folio, flags)) {
-   gup_put_folio(folio, refs, flags);
-   return 0;
-   }
-
-   if (!pte_write(pte) && gup_must_unshare(NULL, flags, &folio->page)) {
-   gup_put_folio(folio, refs, flags);
-   return 0;
-   }
-
-   *nr += refs;
-   folio_set_referenced(folio);
-   return 1;
-}
-
-static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
-   unsigned int pdshift, unsigned long end, unsigned int flags,
-   struct page **pages, int *nr)
-{
-   pte_t *ptep;
-   unsigned long sz = 1UL << hugepd_shift(hugepd);
-   unsigned long next;
-
-   ptep = hugepte_offset(hugepd, addr, pdshift);
-   do {
-   next = hugepte_addr_end(addr, end, sz);
-   if (!gup_hugepte(ptep, sz, addr, end, flags, pages, nr))
-   return 0;
-   } while (ptep++, addr = next, addr != end);
-
-   return 1;
-}
-#else
-static inline int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
-   unsigned int pdshift, unsigned long end, unsigned int flags,
-   struct page **pages, int *nr)
-{
-   return 0;
-}
-#endif /* CONFIG_ARCH_HAS_HUGEPD */
-
 static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
unsigned long end, unsigned int flags,
struct page **pages, int *nr)
@@ -3026,14 +2943,6 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo

[RFC PATCH v3 13/16] powerpc/e500: Use contiguous PMD instead of hugepd

2024-05-26 Thread Christophe Leroy
e500 supports many page sizes, among which the following sizes are
implemented in the kernel at the time being: 4M, 16M, 64M, 256M, 1G.

On e500, TLB miss for hugepages is exclusively handled by SW even
on e6500 which has HW assistance for 4k pages, so there are no
constraints like on the 8xx.

On e500/32, all are at PGD/PMD level and can be handled as
cont-PMD.

On e500/64, smaller ones are on PMD while bigger ones are on PUD.
Again, they can easily be handled as cont-PMD and cont-PUD instead
of hugepd.

Signed-off-by: Christophe Leroy 
---
v3: Add missing pmd_leaf_size() and pud_leaf_size()
---
 .../powerpc/include/asm/nohash/hugetlb-e500.h | 32 +-
 arch/powerpc/include/asm/nohash/pgalloc.h |  2 -
 arch/powerpc/include/asm/nohash/pgtable.h | 43 +
 arch/powerpc/include/asm/nohash/pte-e500.h| 28 +
 arch/powerpc/include/asm/page.h   | 15 +
 arch/powerpc/kernel/head_85xx.S   | 23 +++
 arch/powerpc/mm/hugetlbpage.c |  2 -
 arch/powerpc/mm/nohash/tlb_low_64e.S  | 63 +++
 arch/powerpc/mm/pgtable.c | 31 +
 arch/powerpc/platforms/Kconfig.cputype|  1 -
 10 files changed, 144 insertions(+), 96 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/hugetlb-e500.h b/arch/powerpc/include/asm/nohash/hugetlb-e500.h
index d8e51a3f8557..d30e2a3f129d 100644
--- a/arch/powerpc/include/asm/nohash/hugetlb-e500.h
+++ b/arch/powerpc/include/asm/nohash/hugetlb-e500.h
@@ -2,38 +2,12 @@
 #ifndef _ASM_POWERPC_NOHASH_HUGETLB_E500_H
 #define _ASM_POWERPC_NOHASH_HUGETLB_E500_H
 
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
-   if (WARN_ON(!hugepd_ok(hpd)))
-   return NULL;
-
-   return (pte_t *)((hpd_val(hpd) & ~HUGEPD_SHIFT_MASK) | PD_HUGE);
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-   return hpd_val(hpd) & HUGEPD_SHIFT_MASK;
-}
-
-static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
-   unsigned int pdshift)
-{
-   /*
-* On FSL BookE, we have multiple higher-level table entries that
-* point to the same hugepte.  Just use the first one since they're all
-* identical.  So for that case, idx=0.
-*/
-   return hugepd_page(hpd);
-}
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+pte_t pte, unsigned long sz);
 
 void flush_hugetlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 
-static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshift)
-{
-   /* We use the old format for PPC_E500 */
-   *hpdp = __hugepd(((unsigned long)new & ~PD_HUGE) | pshift);
-}
-
 static inline int check_and_get_huge_psize(int shift)
 {
if (shift & 1)  /* Not a power of 4 */
diff --git a/arch/powerpc/include/asm/nohash/pgalloc.h b/arch/powerpc/include/asm/nohash/pgalloc.h
index 4b62376318e1..d06efac6d7aa 100644
--- a/arch/powerpc/include/asm/nohash/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/pgalloc.h
@@ -44,8 +44,6 @@ static inline void pgtable_free(void *table, int shift)
}
 }
 
-#define get_hugepd_cache_index(x)  (x)
-
 static inline void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int 
shift)
 {
unsigned long pgf = (unsigned long)table;
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index c4be7754e96f..28ecb2c8b433 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -52,11 +52,36 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, p
 {
pte_basic_t old = pte_val(*p);
pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
+   unsigned long sz;
+   unsigned long pdsize;
+   int i;
 
if (new == old)
return old;
 
-   *p = __pte(new);
+#ifdef CONFIG_PPC_E500
+   if (huge)
+   sz = 1UL << (((old & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);
+   else
+#endif
+   sz = PAGE_SIZE;
+
+   if (!huge || sz < PMD_SIZE)
+   pdsize = PAGE_SIZE;
+   else if (sz < PUD_SIZE)
+   pdsize = PMD_SIZE;
+   else if (sz < P4D_SIZE)
+   pdsize = PUD_SIZE;
+   else if (sz < PGDIR_SIZE)
+   pdsize = P4D_SIZE;
+   else
+   pdsize = PGDIR_SIZE;
+
+   for (i = 0; i < sz / pdsize; i++, p++) {
+   *p = __pte(new);
+   if (new)
+   new += (unsigned long long)(pdsize / PAGE_SIZE) << PTE_RPN_SHIFT;
+   }
 
	if (IS_ENABLED(CONFIG_44x) && !is_kernel_addr(addr) && (old & _PAGE_EXEC))
icache_44x_need_flush = 1;
@@ -340,25 +365,19 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 
 #define pgprot_writecombine pgprot_noncached_wc
 
-#ifdef 

Re: [RFC PATCH 1/8] mm: Provide pagesize to pmd_populate()

2024-05-26 Thread Christophe Leroy


On 25/03/2024 at 17:19, Jason Gunthorpe wrote:
> On Mon, Mar 25, 2024 at 03:55:54PM +0100, Christophe Leroy wrote:
>> Unlike many architectures, powerpc 8xx hardware tablewalk requires
>> a two level process for all page sizes, although the second level only
>> has one entry when the page size is 8M.
>>
>> To fit with Linux page table topology and without requiring special
>> page directory layout like hugepd, the page entry will be replicated
>> 1024 times in the standard page table. However for large pages it is
>> necessary to set bits in the level-1 (PMD) entry. At the time being,
>> for 512k pages the flag is kept in the PTE and inserted in the PMD
>> entry at TLB miss exception, that is necessary because we can have
>> pages of different sizes in a page table. However the 12 PTE bits are
>> fully used and there is no room for an additional bit for page size.
>>
>> For 8M pages, there will be only one page per PMD entry, it is
>> therefore possible to flag the pagesize in the PMD entry, with the
>> advantage that the information will already be at the right place for
>> the hardware.
>>
>> To do so, add a new helper called pmd_populate_size() which takes the
>> page size as an additional argument, and modify __pte_alloc() to also
>> take that argument. pte_alloc() is left unmodified in order to
>> reduce churn on callers, and a pte_alloc_size() is added for use by
>> pte_alloc_huge().
>>
>> When an architecture doesn't provide pmd_populate_size(),
>> pmd_populate() is used as a fallback.
> 
> I think it would be a good idea to document what the semantics are
> supposed to be for sz?
> 
> Just a general remark, probably nothing for this, but with these new
> arguments the historical naming seems pretty tortured for
> pte_alloc_size().. Something like pmd_populate_leaf(size) as a naming
> scheme would make this more intuitive. Ie pmd_populate_leaf() gives
> you a PMD entry where the entry points to a leaf page table able to
> store folios of at least size.

I removed patches 1 and 2 and now set the _PMD_PAGE_8M bit in the PMD
entry afterwards, in set_huge_pte_at()

> 
> Anyhow, I thought the edits to the mm helpers were fine, certainly
> much nicer than hugepd. Do you see a path to remove hugepd entirely
> from here?
> 
> Thanks,
> Jason


[RFC PATCH v3 14/16] powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD

2024-05-26 Thread Christophe Leroy
On book3s/64, the only user of hugepd is hash in 4k mode.

All other setups (hash-64, radix-4, radix-64) use leaf PMD/PUD.

Rework hash-4k to use contiguous PMD and PUD instead.

In that setup there are only two huge page sizes: 16M and 16G.

16M sits at PMD level and 16G at PUD level.

pte_update() doesn't know the page size, so let's use the same trick as
hpte_need_flush() to get the page size from segment properties. That's
not the most efficient way but let's do that until callers of
pte_update() provide the page size instead of just a huge flag.
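
The trick in question boils down to (helpers as used in the diff below):

	unsigned int psize = get_slice_psize(mm, addr);		/* per-segment page size index */
	unsigned long sz = 1UL << mmu_psize_to_shift(psize);	/* back to bytes */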

Signed-off-by: Christophe Leroy 
---
v3:
- Add missing pmd_leaf_size() and pud_leaf_size()
- More cleanup in hugetlbpage_init()
- Take a page fault when DIRTY or ACCESSED is missing on hash-4 hugepage
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  | 15 
 arch/powerpc/include/asm/book3s/64/hash.h | 38 +++
 arch/powerpc/include/asm/book3s/64/hugetlb.h  | 38 ---
 .../include/asm/book3s/64/pgtable-4k.h| 34 -
 .../include/asm/book3s/64/pgtable-64k.h   | 20 --
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 18 +
 arch/powerpc/include/asm/hugetlb.h|  4 ++
 .../powerpc/include/asm/nohash/hugetlb-e500.h |  4 --
 arch/powerpc/include/asm/page.h   |  8 
 arch/powerpc/mm/book3s64/hash_utils.c | 11 --
 arch/powerpc/mm/book3s64/hugetlbpage.c| 10 +
 arch/powerpc/mm/book3s64/pgtable.c| 12 --
 arch/powerpc/mm/hugetlbpage.c | 27 -
 arch/powerpc/mm/pgtable.c |  2 +-
 arch/powerpc/platforms/Kconfig.cputype|  1 -
 15 files changed, 71 insertions(+), 171 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 6472b08fa1b0..c654c376ef8b 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -74,21 +74,6 @@
 #define remap_4k_pfn(vma, addr, pfn, prot) \
remap_pfn_range((vma), (addr), (pfn), PAGE_SIZE, (prot))
 
-#ifdef CONFIG_HUGETLB_PAGE
-static inline int hash__hugepd_ok(hugepd_t hpd)
-{
-   unsigned long hpdval = hpd_val(hpd);
-   /*
-* if it is not a pte and have hugepd shift mask
-* set, then it is a hugepd directory pointer
-*/
-   if (!(hpdval & _PAGE_PTE) && (hpdval & _PAGE_PRESENT) &&
-   ((hpdval & HUGEPD_SHIFT_MASK) != 0))
-   return true;
-   return false;
-}
-#endif
-
 /*
  * 4K PTE format is different from 64K PTE format. Saving the hash_slot is just
  * a matter of returning the PTE bits that need to be modified. On 64K PTE,
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index faf3e3b4e4b2..8202c27afe23 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -4,6 +4,7 @@
 #ifdef __KERNEL__
 
 #include 
+#include 
 
 /*
  * Common bits between 4K and 64K pages in a linux-style PTE.
@@ -161,14 +162,10 @@ extern void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned long pte, int huge);
 unsigned long htab_convert_pte_flags(unsigned long pteflags, unsigned long 
flags);
 /* Atomic PTE updates */
-static inline unsigned long hash__pte_update(struct mm_struct *mm,
-unsigned long addr,
-pte_t *ptep, unsigned long clr,
-unsigned long set,
-int huge)
+static inline unsigned long hash__pte_update_one(pte_t *ptep, unsigned long clr,
+unsigned long set)
 {
__be64 old_be, tmp_be;
-   unsigned long old;
 
__asm__ __volatile__(
"1: ldarx   %0,0,%3 # pte_update\n\
@@ -182,11 +179,38 @@ static inline unsigned long hash__pte_update(struct mm_struct *mm,
: "r" (ptep), "r" (cpu_to_be64(clr)), "m" (*ptep),
  "r" (cpu_to_be64(H_PAGE_BUSY)), "r" (cpu_to_be64(set))
: "cc" );
+
+   return be64_to_cpu(old_be);
+}
+
+static inline unsigned long hash__pte_update(struct mm_struct *mm,
+unsigned long addr,
+pte_t *ptep, unsigned long clr,
+unsigned long set,
+int huge)
+{
+   unsigned long old;
+
+   old = hash__pte_update_one(ptep, clr, set);
+
+   if (IS_ENABLED(CONFIG_PPC_4K_PAGES) && huge) {
+   unsigned int psize = get_slice_psize(mm, addr);
+   int nb, i;
+
+   if (psize == MMU_PAGE_16M)
+   nb = SZ_16M / PMD_SIZE;
+   else if (psize == MMU_PAGE_16G)
+   nb = SZ_16G / PUD_SIZE;
+   else
+   

[RFC PATCH v3 11/16] powerpc/e500: Switch to 64 bits PGD on 85xx (32 bits)

2024-05-26 Thread Christophe Leroy
At the time being when CONFIG_PTE_64BIT is selected, PTE entries are
64 bits but PGD entries are still 32 bits.

In order to allow leaf PMD entries, switch the PGD to 64-bit entries.
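
The head_85xx.S change below implements this address math (a sketch in
C; PGDIR_SHIFT is 21 here since each 64-bit-PTE page table maps 2M):

	/* 32-bit PGD entries: offset = index * 4, entry loaded directly.
	 * 64-bit PGD entries: offset = index * 8; being big-endian, the
	 * 32-bit word holding the page table pointer is at offset +4. */
	unsigned long idx = addr >> PGDIR_SHIFT;
	u32 entry_lo = *(u32 *)((char *)pgd_base + idx * 8 + 4);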

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/pgtable-types.h |  4 
 arch/powerpc/kernel/head_85xx.S  | 10 ++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index 082c85cc09b1..db965d98e0ae 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -49,7 +49,11 @@ static inline unsigned long pud_val(pud_t x)
 #endif /* CONFIG_PPC64 */
 
 /* PGD level */
+#if defined(CONFIG_PPC_E500) && defined(CONFIG_PTE_64BIT)
+typedef struct { unsigned long long pgd; } pgd_t;
+#else
 typedef struct { unsigned long pgd; } pgd_t;
+#endif
 #define __pgd(x)   ((pgd_t) { (x) })
 static inline unsigned long pgd_val(pgd_t x)
 {
diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S
index 39724ff5ae1f..a305244afc9f 100644
--- a/arch/powerpc/kernel/head_85xx.S
+++ b/arch/powerpc/kernel/head_85xx.S
@@ -307,8 +307,9 @@ set_ivor:
 #ifdef CONFIG_PTE_64BIT
 #ifdef CONFIG_HUGETLB_PAGE
 #define FIND_PTE   \
-   rlwinm  r12, r10, 13, 19, 29;   /* Compute pgdir/pmd offset */  \
-   lwzxr11, r12, r11;  /* Get pgd/pmd entry */ \
+   rlwinm  r12, r10, 14, 18, 28;   /* Compute pgdir/pmd offset */  \
+   add r12, r11, r12;  \
+   lwz r11, 4(r12);/* Get pgd/pmd entry */ \
rlwinm. r12, r11, 0, 0, 20; /* Extract pt base address */   \
blt 1000f;  /* Normal non-huge page */  \
beq 2f; /* Bail if no table */  \
@@ -321,8 +322,9 @@ set_ivor:
 1001:  lwz r11, 4(r12);/* Get pte entry */
 #else
 #define FIND_PTE   \
-   rlwinm  r12, r10, 13, 19, 29;   /* Compute pgdir/pmd offset */  \
-   lwzxr11, r12, r11;  /* Get pgd/pmd entry */ \
+   rlwinm  r12, r10, 14, 18, 28;   /* Compute pgdir/pmd offset */  \
+   add r12, r11, r12;  \
+   lwz r11, 4(r12);/* Get pgd/pmd entry */ \
rlwinm. r12, r11, 0, 0, 20; /* Extract pt base address */   \
beq 2f; /* Bail if no table */  \
rlwimi  r12, r10, 23, 20, 28;   /* Compute pte address */   \
-- 
2.44.0



[RFC PATCH v3 10/16] powerpc/e500: Remove enc and ind fields from struct mmu_psize_def

2024-05-26 Thread Christophe Leroy
The enc field is hidden behind the BOOK3E_PAGESZ_XX macros, and when you
look closer you realise that this field is nothing other than the value
of shift minus ten.

So remove the enc field and calculate tsize from the shift field.

Also remove the ind field, which is unused.
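
A quick sanity check of the shift-minus-ten identity (Book3E TSIZE
encodes a page of 2^tsize Kbytes):

	shift 12 (4K)  ->  tsize  2  == BOOK3E_PAGESZ_4K
	shift 22 (4M)  ->  tsize 12  == BOOK3E_PAGESZ_4M
	shift 30 (1G)  ->  tsize 20  == BOOK3E_PAGESZ_1GB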

Signed-off-by: Christophe Leroy 
Reviewed-by: Oscar Salvador 
---
 arch/powerpc/include/asm/nohash/mmu-e500.h | 3 ---
 arch/powerpc/mm/nohash/book3e_pgtable.c| 4 ++--
 arch/powerpc/mm/nohash/tlb.c   | 9 +
 arch/powerpc/mm/nohash/tlb_64e.c   | 2 +-
 4 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/mmu-e500.h b/arch/powerpc/include/asm/nohash/mmu-e500.h
index 7dc24b8632d7..b281d9eeaf1e 100644
--- a/arch/powerpc/include/asm/nohash/mmu-e500.h
+++ b/arch/powerpc/include/asm/nohash/mmu-e500.h
@@ -244,14 +244,11 @@ typedef struct {
 /* Page size definitions, common between 32 and 64-bit
  *
  *shift : is the "PAGE_SHIFT" value for that page size
- *penc  : is the pte encoding mask
  *
  */
 struct mmu_psize_def
 {
unsigned intshift;  /* number of bits */
-   unsigned intenc;/* PTE encoding */
-   unsigned intind;/* Corresponding indirect page size shift */
unsigned intflags;
 #define MMU_PAGE_SIZE_DIRECT   0x1 /* Supported as a direct size */
 #define MMU_PAGE_SIZE_INDIRECT 0x2 /* Supported as an indirect size */
diff --git a/arch/powerpc/mm/nohash/book3e_pgtable.c b/arch/powerpc/mm/nohash/book3e_pgtable.c
index 1c5e4ecbebeb..ad2a7c26f2a0 100644
--- a/arch/powerpc/mm/nohash/book3e_pgtable.c
+++ b/arch/powerpc/mm/nohash/book3e_pgtable.c
@@ -29,10 +29,10 @@ int __meminit vmemmap_create_mapping(unsigned long start,
_PAGE_KERNEL_RW;
 
/* PTEs only contain page size encodings up to 32M */
-   BUG_ON(mmu_psize_defs[mmu_vmemmap_psize].enc > 0xf);
+   BUG_ON(mmu_psize_defs[mmu_vmemmap_psize].shift - 10 > 0xf);
 
/* Encode the size in the PTE */
-   flags |= mmu_psize_defs[mmu_vmemmap_psize].enc << 8;
+   flags |= (mmu_psize_defs[mmu_vmemmap_psize].shift - 10) << 8;
 
/* For each PTE for that area, map things. Note that we don't
 * increment phys because all PTEs are of the large size and
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index f57dc721d063..b653a7be4cb1 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -53,37 +53,30 @@
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
[MMU_PAGE_4K] = {
.shift  = 12,
-   .enc= BOOK3E_PAGESZ_4K,
},
[MMU_PAGE_2M] = {
.shift  = 21,
-   .enc= BOOK3E_PAGESZ_2M,
},
[MMU_PAGE_4M] = {
.shift  = 22,
-   .enc= BOOK3E_PAGESZ_4M,
},
[MMU_PAGE_16M] = {
.shift  = 24,
-   .enc= BOOK3E_PAGESZ_16M,
},
[MMU_PAGE_64M] = {
.shift  = 26,
-   .enc= BOOK3E_PAGESZ_64M,
},
[MMU_PAGE_256M] = {
.shift  = 28,
-   .enc= BOOK3E_PAGESZ_256M,
},
[MMU_PAGE_1G] = {
.shift  = 30,
-   .enc= BOOK3E_PAGESZ_1GB,
},
 };
 
 static inline int mmu_get_tsize(int psize)
 {
-   return mmu_psize_defs[psize].enc;
+   return mmu_psize_defs[psize].shift - 10;
 }
 #else
 static inline int mmu_get_tsize(int psize)
diff --git a/arch/powerpc/mm/nohash/tlb_64e.c b/arch/powerpc/mm/nohash/tlb_64e.c
index 053128a5636c..7988238496d7 100644
--- a/arch/powerpc/mm/nohash/tlb_64e.c
+++ b/arch/powerpc/mm/nohash/tlb_64e.c
@@ -53,7 +53,7 @@ int extlb_level_exc;
  */
 void tlb_flush_pgtable(struct mmu_gather *tlb, unsigned long address)
 {
-   int tsize = mmu_psize_defs[mmu_pte_psize].enc;
+   int tsize = mmu_psize_defs[mmu_pte_psize].shift - 10;
 
if (book3e_htw_mode != PPC_HTW_NONE) {
unsigned long start = address & PMD_MASK;
-- 
2.44.0



[RFC PATCH v3 12/16] powerpc/e500: Encode hugepage size in PTE bits

2024-05-26 Thread Christophe Leroy
Use the U0-U3 bits to encode the hugepage size, more exactly the page shift.

As we start using hugepages at shift 21 (2Mbytes), subtract 20
so that it fits into 4 bits. That may change in the future if
we want to use smaller hugepages.
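
The round trip then looks like this (the decode line matches what
pte_update() does in the following patch):

	/* encode at PTE creation: shift 21 (2M) -> 1 ... shift 30 (1G) -> 10 */
	entry = __pte(pte_val(entry) | (_PAGE_U3 * (shift - 20)));

	/* decode: recover the size in bytes from the U0..U3 field */
	sz = 1UL << (((pte_val(entry) & _PAGE_HSIZE_MSK) >> _PAGE_HSIZE_SHIFT) + 20);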

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/hugetlb-e500.h | 6 ++
 arch/powerpc/include/asm/nohash/pte-e500.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/hugetlb-e500.h b/arch/powerpc/include/asm/nohash/hugetlb-e500.h
index 8f04ad20e040..d8e51a3f8557 100644
--- a/arch/powerpc/include/asm/nohash/hugetlb-e500.h
+++ b/arch/powerpc/include/asm/nohash/hugetlb-e500.h
@@ -42,4 +42,10 @@ static inline int check_and_get_huge_psize(int shift)
return shift_to_mmu_psize(shift);
 }
 
+static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
+{
+   return __pte(pte_val(entry) | (_PAGE_U3 * (shift - 20)));
+}
+#define arch_make_huge_pte arch_make_huge_pte
+
 #endif /* _ASM_POWERPC_NOHASH_HUGETLB_E500_H */
diff --git a/arch/powerpc/include/asm/nohash/pte-e500.h b/arch/powerpc/include/asm/nohash/pte-e500.h
index 975facc7e38e..091e4bff1fba 100644
--- a/arch/powerpc/include/asm/nohash/pte-e500.h
+++ b/arch/powerpc/include/asm/nohash/pte-e500.h
@@ -46,6 +46,9 @@
 #define _PAGE_NO_CACHE 0x40 /* I: cache inhibit */
 #define _PAGE_WRITETHRU0x80 /* W: cache write-through */
 
+#define _PAGE_HSIZE_MSK (_PAGE_U0 | _PAGE_U1 | _PAGE_U2 | _PAGE_U3)
+#define _PAGE_HSIZE_SHIFT  14
+
 /* "Higher level" linux bit combinations */
 #define _PAGE_EXEC (_PAGE_BAP_SX | _PAGE_BAP_UX) /* .. and was 
cache cleaned */
 #define _PAGE_READ (_PAGE_BAP_SR | _PAGE_BAP_UR) /* User read 
permission */
-- 
2.44.0



[RFC PATCH v3 02/16] mm: Define __pte_leaf_size() to also take a PMD entry

2024-05-26 Thread Christophe Leroy
On powerpc 8xx, when a page is 8M size, the information is in the PMD
entry. So allow architectures to provide __pte_leaf_size() instead of
pte_leaf_size() and provide the PMD entry to that function.

When __pte_leaf_size() is not defined, define it as pte_leaf_size()
so that architectures not interested in the PMD argument are not
impacted.

Only define a default pte_leaf_size() when __pte_leaf_size() is not
defined to make sure nobody adds new calls to pte_leaf_size() in the
core.

Signed-off-by: Christophe Leroy 
---
v3: Don't change pte_leaf_size() to not impact other architectures
---
 include/linux/pgtable.h | 3 +++
 kernel/events/core.c| 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..514e05730df1 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1801,9 +1801,12 @@ typedef unsigned int pgtbl_mod_mask;
 #ifndef pmd_leaf_size
 #define pmd_leaf_size(x) PMD_SIZE
 #endif
+#ifndef __pte_leaf_size
 #ifndef pte_leaf_size
 #define pte_leaf_size(x) PAGE_SIZE
 #endif
+#define __pte_leaf_size(x,y) pte_leaf_size(y)
+#endif
 
 /*
  * Some architectures have MMUs that are configurable or selectable at boot
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 724e6d7e128f..d37512f2ebf2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7585,7 +7585,7 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr)
 
pte = ptep_get_lockless(ptep);
if (pte_present(pte))
-   size = pte_leaf_size(pte);
+   size = __pte_leaf_size(pmd, pte);
pte_unmap(ptep);
 #endif /* CONFIG_HAVE_FAST_GUP */
 
-- 
2.44.0
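
For illustration, an architecture keeping the size information at PMD
level could opt in along these lines (hypothetical sketch with a made-up
pmd_is_8m() helper; the real 8xx version comes later in the series):

	/* arch/<arch>/include/asm/pgtable.h */
	#define __pte_leaf_size(pmd, pte)	\
		(pmd_is_8m(pmd) ? SZ_8M : PAGE_SIZE)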



[RFC PATCH v3 01/16] powerpc/64e: Remove unused IBM HTW code [SQUASHED]

2024-05-26 Thread Christophe Leroy
From: Michael Ellerman 

This is a squash of a series from Michael:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20240524073141.1637736-1-...@ellerman.id.au/

The nohash HTW_IBM (Hardware Table Walk) code is unused since support
for A2 was removed in commit fb5a515704d7 ("powerpc: Remove platforms/
wsp and associated pieces") (2014).

The remaining supported CPUs use either no HTW (data_tlb_miss_bolted),
or the e6500 HTW (data_tlb_miss_e6500).

Signed-off-by: Michael Ellerman 

powerpc/64e: Split out nohash Book3E 64-bit code

A reasonable chunk of nohash/tlb.c is 64-bit only code, split it out
into a separate file.

Signed-off-by: Michael Ellerman 

powerpc/64e: Drop E500 ifdefs in 64-bit code

All 64-bit Book3E have E500=y, so drop the unneeded ifdefs.

Signed-off-by: Michael Ellerman 

powerpc/64e: Drop MMU_FTR_TYPE_FSL_E checks in 64-bit code

All 64-bit Book3E have MMU_FTR_TYPE_FSL_E, since A2 was removed, so
remove checks for it in 64-bit only code.

Signed-off-by: Michael Ellerman 

powerpc/64e: Consolidate TLB miss handler patching

The 64e TLB miss handler patching is done in setup_mmu_htw(), and then
again immediately afterward in early_init_mmu_global(). Consolidate it
into a single location.

Signed-off-by: Michael Ellerman 

powerpc/64e: Drop unused TLB miss handlers

There are two possibilities for book3e_htw_mode, PPC_HTW_E6500 or
PPC_HTW_NONE.

The TLB miss handlers are patched to use, respectively:
  - exc_[data|instruction]_tlb_miss_e6500_book3e
  - exc_[data|instruction]_tlb_miss_bolted_book3e

Which means the default handlers are never used. Remove those, and use
the bolted handlers (PPC_HTW_NONE) by default.

Signed-off-by: Michael Ellerman 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/mmu-e500.h |   3 +-
 arch/powerpc/kernel/exceptions-64e.S   |   4 +-
 arch/powerpc/kernel/setup_64.c |   6 +-
 arch/powerpc/mm/nohash/Makefile|   2 +-
 arch/powerpc/mm/nohash/tlb.c   | 398 +--
 arch/powerpc/mm/nohash/tlb_64e.c   | 314 +++
 arch/powerpc/mm/nohash/tlb_low_64e.S   | 421 -
 7 files changed, 320 insertions(+), 828 deletions(-)
 create mode 100644 arch/powerpc/mm/nohash/tlb_64e.c

diff --git a/arch/powerpc/include/asm/nohash/mmu-e500.h b/arch/powerpc/include/asm/nohash/mmu-e500.h
index 6ddced0415cb..7dc24b8632d7 100644
--- a/arch/powerpc/include/asm/nohash/mmu-e500.h
+++ b/arch/powerpc/include/asm/nohash/mmu-e500.h
@@ -303,8 +303,7 @@ extern unsigned long linear_map_top;
 extern int book3e_htw_mode;
 
 #define PPC_HTW_NONE   0
-#define PPC_HTW_IBM1
-#define PPC_HTW_E6500  2
+#define PPC_HTW_E6500  1
 
 /*
  * 64-bit booke platforms don't load the tlb in the tlb miss handler code.
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index dcf0591ad3c2..63f6b9f513a4 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -485,8 +485,8 @@ interrupt_base_book3e:  
/* fake trap */
EXCEPTION_STUB(0x160, decrementer)  /* 0x0900 */
EXCEPTION_STUB(0x180, fixed_interval)   /* 0x0980 */
EXCEPTION_STUB(0x1a0, watchdog) /* 0x09f0 */
-   EXCEPTION_STUB(0x1c0, data_tlb_miss)
-   EXCEPTION_STUB(0x1e0, instruction_tlb_miss)
+   EXCEPTION_STUB(0x1c0, data_tlb_miss_bolted)
+   EXCEPTION_STUB(0x1e0, instruction_tlb_miss_bolted)
EXCEPTION_STUB(0x200, altivec_unavailable)
EXCEPTION_STUB(0x220, altivec_assist)
EXCEPTION_STUB(0x260, perfmon)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index ae36a129789f..22f83fbbc762 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -696,11 +696,7 @@ __init u64 ppc64_bolted_size(void)
 {
 #ifdef CONFIG_PPC_BOOK3E_64
/* Freescale BookE bolts the entire linear mapping */
-   /* XXX: BookE ppc64_rma_limit setup seems to disagree? */
-   if (early_mmu_has_feature(MMU_FTR_TYPE_FSL_E))
-   return linear_map_top;
-   /* Other BookE, we assume the first GB is bolted */
-   return 1ul << 30;
+   return linear_map_top;
 #else
/* BookS radix, does not take faults on linear mapping */
if (early_radix_enabled())
diff --git a/arch/powerpc/mm/nohash/Makefile b/arch/powerpc/mm/nohash/Makefile
index b3f0498dd42f..90e846f0c46c 100644
--- a/arch/powerpc/mm/nohash/Makefile
+++ b/arch/powerpc/mm/nohash/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-y  += mmu_context.o tlb.o tlb_low.o kup.o
-obj-$(CONFIG_PPC_BOOK3E_64)+= tlb_low_64e.o book3e_pgtable.o
+obj-$(CONFIG_PPC_BOOK3E_64)+= tlb_64e.o tlb_low_64e.o book3e_pgtable.o
 obj-$(CONFIG_40x)  += 40x.o
 obj-$(CONFIG_44x)  += 44x.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
diff --git a/arch/powerpc/mm/nohash/tlb.c b

[RFC PATCH v3 03/16] mm: Provide mm_struct and address to huge_ptep_get()

2024-05-26 Thread Christophe Leroy
On powerpc 8xx huge_ptep_get() will need to know whether the given
ptep is a PTE entry or a PMD entry. This cannot be known with the
PMD entry itself because there is no easy way to know it from the
content of the entry.

So huge_ptep_get() will need to know either the size of the page
or get the pmd.

In order to be consistent with huge_ptep_get_and_clear(), give
mm and address to huge_ptep_get().
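
For architectures that don't define __HAVE_ARCH_HUGE_PTEP_GET, the
asm-generic fallback simply grows the two (unused) extra parameters,
roughly:

	static inline pte_t huge_ptep_get(struct mm_struct *mm,
					  unsigned long addr, pte_t *ptep)
	{
		return ptep_get(ptep);
	}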

Signed-off-by: Christophe Leroy 
---
v2: Add missing changes in arch implementations
v3: Fixed a comment in ARM and missing changes in S390
---
 arch/arm/include/asm/hugetlb-3level.h |  4 +--
 arch/arm64/include/asm/hugetlb.h  |  2 +-
 arch/arm64/mm/hugetlbpage.c   |  2 +-
 arch/riscv/include/asm/hugetlb.h  |  2 +-
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 arch/s390/include/asm/hugetlb.h   |  4 +--
 arch/s390/mm/hugetlbpage.c|  4 +--
 fs/hugetlbfs/inode.c  |  2 +-
 fs/proc/task_mmu.c|  8 ++---
 fs/userfaultfd.c  |  2 +-
 include/asm-generic/hugetlb.h |  2 +-
 include/linux/swapops.h   |  2 +-
 mm/damon/vaddr.c  |  6 ++--
 mm/gup.c  |  2 +-
 mm/hmm.c  |  2 +-
 mm/hugetlb.c  | 46 +--
 mm/memory-failure.c   |  2 +-
 mm/mempolicy.c|  2 +-
 mm/migrate.c  |  4 +--
 mm/mincore.c  |  2 +-
 mm/userfaultfd.c  |  2 +-
 21 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h 
b/arch/arm/include/asm/hugetlb-3level.h
index a30be5505793..87d48e2d90ad 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -13,12 +13,12 @@
 
 /*
  * If our huge pte is non-zero then mark the valid bit.
- * This allows pte_present(huge_ptep_get(ptep)) to return true for non-zero
+ * This allows pte_present(huge_ptep_get(mm,addr,ptep)) to return true for 
non-zero
  * ptes.
  * (The valid bit is automatically cleared by set_pte_at for PROT_NONE ptes).
  */
 #define __HAVE_ARCH_HUGE_PTEP_GET
-static inline pte_t huge_ptep_get(pte_t *ptep)
+static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep)
 {
pte_t retval = *ptep;
if (pte_val(retval))
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 2ddc33d93b13..1af39a74e791 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -46,7 +46,7 @@ extern pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
   pte_t *ptep, unsigned long sz);
 #define __HAVE_ARCH_HUGE_PTEP_GET
-extern pte_t huge_ptep_get(pte_t *ptep);
+extern pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t 
*ptep);
 
 void __init arm64_hugetlb_cma_reserve(void);
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index b872b003a55f..19c4abde13a3 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -141,7 +141,7 @@ static inline int num_contig_ptes(unsigned long size, 
size_t *pgsize)
return contig_ptes;
 }
 
-pte_t huge_ptep_get(pte_t *ptep)
+pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
int ncontig, i;
size_t pgsize;
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 22deb7a2a6ec..6321bca08740 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -44,7 +44,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
   pte_t pte, int dirty);
 
 #define __HAVE_ARCH_HUGE_PTEP_GET
-pte_t huge_ptep_get(pte_t *ptep);
+pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep);
 
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 5ef2a6891158..20bf499044b7 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -3,7 +3,7 @@
 #include 
 
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
-pte_t huge_ptep_get(pte_t *ptep)
+pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
unsigned long pte_num;
int i;
diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index deb198a61039..3b4835094fd5 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -19,7 +19,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte, unsigned long sz);
 void __set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 pte_t *ptep, pte_t pte);
-pte_t huge_ptep_get(pte_t *ptep);
+pte

[RFC PATCH v3 00/16] Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)

2024-05-26 Thread Christophe Leroy
This is the continuation of the RFC v1 series "Reimplement huge pages
without hugepd on powerpc 8xx". It now get rid of hugepd completely
after handling also e500 and book3s/64

Also see https://github.com/linuxppc/issues/issues/483

Unlike most architectures, powerpc 8xx HW requires a two-level
pagetable topology for all page sizes. So a leaf PMD-contig approach
is not feasible as such.

Possible sizes on 8xx are 4k, 16k, 512k and 8M.

First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD entries
must point to a single entry level-2 page table. Until now that was
done using hugepd. This series changes it to use standard page tables
where the entry is replicated 1024 times on each of the two pagetables
referred to by the two associated PMD entries for that 8M page.
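
Conceptually, inserting one 8M mapping then amounts to (a sketch with
illustrative helper names; the real code is in the 8xx patches of this
series):

	/* Each PMD entry covers 4M, so an 8M page needs two of them,
	 * each pointing to a page table filled with the same PTE. */
	for (i = 0; i < 2; i++) {
		pte_t *ptep = pte_table[i];	/* level-2 table behind PMD i */

		pmd_populate_entry(pmdp + i, ptep);	/* plus an 8M flag in the PMD */
		for (j = 0; j < 1024; j++)
			ptep[j] = pte;			/* replicated 1024 times */
	}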

For e500 and book3s/64 there are fewer constraints because it is not
tied to the HW assisted tablewalk like on 8xx, so it is easier to use
leaf PMDs (and PUDs).

On e500 the supported page sizes are 4M, 16M, 64M, 256M and 1G. All are
at PMD level on e500/32 (mpc85xx) and a mix of PMD and PUD for e500/64.
We encode the page size with 4 available bits in PTE entries. On e500/32,
PGD entry size is increased to 64 bits in order to allow leaf-PMD entries
because PTEs are 64 bits on e500.

On book3s/64 only the hash-4k mode is concerned. It supports 16M pages
as cont-PMD and 16G pages as cont-PUD. In other modes (radix-4k, radix-64k
and hash-64k) the sizes match with PMD and PUD sizes so that's just leaf
entries. The hash processing makes things a bit more complex. To ease
things, __hash_page_huge() is modified to bail out when DIRTY or ACCESSED
bits are missing, leaving it to mm core to fix it.

Global changes in v3:
- Removed patches 1 and 2
- Squashed patch 11 into patch 5
- Replaced patches 12 and 13 with a series from Michael
- Reordered patches a bit to have more general patches up front

For more details on changes, see in each patch.

Christophe Leroy (15):
  mm: Define __pte_leaf_size() to also take a PMD entry
  mm: Provide mm_struct and address to huge_ptep_get()
  powerpc/mm: Remove _PAGE_PSIZE
  powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries
  powerpc/mm: Allow hugepages without hugepd
  powerpc/8xx: Fix size given to set_huge_pte_at()
  powerpc/8xx: Rework support for 8M pages using contiguous PTE entries
  powerpc/8xx: Simplify struct mmu_psize_def
  powerpc/e500: Remove enc and ind fields from struct mmu_psize_def
  powerpc/e500: Switch to 64 bits PGD on 85xx (32 bits)
  powerpc/e500: Encode hugepage size in PTE bits
  powerpc/e500: Use contiguous PMD instead of hugepd
  powerpc/64s: Use contiguous PMD/PUD instead of HUGEPD
  powerpc/mm: Remove hugepd leftovers
  mm: Remove CONFIG_ARCH_HAS_HUGEPD

Michael Ellerman (1):
  powerpc/64e: Remove unused IBM HTW code [SQUASHED]

 arch/arm/include/asm/hugetlb-3level.h |   4 +-
 arch/arm64/include/asm/hugetlb.h  |   2 +-
 arch/arm64/mm/hugetlbpage.c   |   2 +-
 arch/powerpc/Kconfig  |   1 -
 arch/powerpc/include/asm/book3s/32/pgalloc.h  |   2 -
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  15 -
 arch/powerpc/include/asm/book3s/64/hash.h |  38 +-
 arch/powerpc/include/asm/book3s/64/hugetlb.h  |  38 --
 .../include/asm/book3s/64/pgtable-4k.h|  34 --
 .../include/asm/book3s/64/pgtable-64k.h   |  20 -
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  18 +
 arch/powerpc/include/asm/hugetlb.h|  15 +-
 .../include/asm/nohash/32/hugetlb-8xx.h   |  38 +-
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h  |   9 +-
 arch/powerpc/include/asm/nohash/32/pte-40x.h  |   3 -
 arch/powerpc/include/asm/nohash/32/pte-44x.h  |   3 -
 arch/powerpc/include/asm/nohash/32/pte-85xx.h |   3 -
 arch/powerpc/include/asm/nohash/32/pte-8xx.h  |  58 ++-
 .../powerpc/include/asm/nohash/hugetlb-e500.h |  36 +-
 arch/powerpc/include/asm/nohash/mmu-e500.h|   6 +-
 arch/powerpc/include/asm/nohash/pgalloc.h |   2 -
 arch/powerpc/include/asm/nohash/pgtable.h |  45 +-
 arch/powerpc/include/asm/nohash/pte-e500.h|  35 +-
 arch/powerpc/include/asm/page.h   |  32 --
 arch/powerpc/include/asm/pgtable-be-types.h   |  10 -
 arch/powerpc/include/asm/pgtable-types.h  |  13 +-
 arch/powerpc/include/asm/pgtable.h|   3 +
 arch/powerpc/kernel/exceptions-64e.S  |   4 +-
 arch/powerpc/kernel/head_85xx.S   |  33 +-
 arch/powerpc/kernel/head_8xx.S|  10 +-
 arch/powerpc/kernel/setup_64.c|   6 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  11 +-
 arch/powerpc/mm/book3s64/hugetlbpage.c|  10 +
 arch/powerpc/mm/book3s64/pgtable.c|  12 -
 arch/powerpc/mm/hugetlbpage.c | 455 +---
 arch/powerpc/mm/init-common.c |   8 +-
 arch/powerpc/mm/kasan/8xx.c   |  21 +-
 arch/powerpc/mm/nohash/8xx.c  |  43 +-
 arch/powerpc/mm/nohash/Makefile   |   2 +-
 arc

[RFC PATCH v3 06/16] powerpc/mm: Allow hugepages without hugepd

2024-05-26 Thread Christophe Leroy
In preparation for implementing huge pages on powerpc 8xx
without hugepd, enclose hugepd-related code inside an
ifdef CONFIG_ARCH_HAS_HUGEPD

This also allows removing some stubs.

Signed-off-by: Christophe Leroy 
---
v3:
- Prepare huge_pte_alloc() for full standard topology, not only for 2-level
- Reordered last part of huge_pte_alloc()
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h |  2 --
 arch/powerpc/include/asm/hugetlb.h   | 10 ++
 arch/powerpc/include/asm/nohash/pgtable.h|  8 +++--
 arch/powerpc/mm/hugetlbpage.c| 33 
 arch/powerpc/mm/pgtable.c|  2 ++
 5 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index dc5c039eb28e..dd4eb3063175 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -47,8 +47,6 @@ static inline void pgtable_free(void *table, unsigned index_size)
}
 }
 
-#define get_hugepd_cache_index(x)  (x)
-
 static inline void pgtable_free_tlb(struct mmu_gather *tlb,
void *table, int shift)
 {
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index ea71f7245a63..79176a499763 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -30,10 +30,12 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 }
 #define is_hugepage_only_range is_hugepage_only_range
 
+#ifdef CONFIG_ARCH_HAS_HUGEPD
 #define __HAVE_ARCH_HUGETLB_FREE_PGD_RANGE
 void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
+#endif
 
 #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
@@ -67,14 +69,6 @@ static inline void flush_hugetlb_page(struct vm_area_struct *vma,
 {
 }
 
-#define hugepd_shift(x) 0
-static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
-   unsigned pdshift)
-{
-   return NULL;
-}
-
-
 static inline void __init gigantic_hugetlb_cma_reserve(void)
 {
 }
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 427db14292c9..ac3353f7f2ac 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -340,7 +340,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 
 #define pgprot_writecombine pgprot_noncached_wc
 
-#ifdef CONFIG_HUGETLB_PAGE
+#ifdef CONFIG_ARCH_HAS_HUGEPD
 static inline int hugepd_ok(hugepd_t hpd)
 {
 #ifdef CONFIG_PPC_8xx
@@ -351,6 +351,10 @@ static inline int hugepd_ok(hugepd_t hpd)
 #endif
 }
 
+#define is_hugepd(hpd) (hugepd_ok(hpd))
+#endif
+
+#ifdef CONFIG_HUGETLB_PAGE
 static inline int pmd_huge(pmd_t pmd)
 {
return 0;
@@ -360,8 +364,6 @@ static inline int pud_huge(pud_t pud)
 {
return 0;
 }
-
-#define is_hugepd(hpd) (hugepd_ok(hpd))
 #endif
 
 int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 594a4b7b2ca2..20fad59ff9f5 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -42,6 +42,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long s
return __find_linux_pte(mm->pgd, addr, NULL, NULL);
 }
 
+#ifdef CONFIG_ARCH_HAS_HUGEPD
 static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
   unsigned long address, unsigned int pdshift,
   unsigned int pshift, spinlock_t *ptl)
@@ -193,6 +194,36 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 
return hugepte_offset(*hpdp, addr, pdshift);
 }
+#else
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long addr, unsigned long sz)
+{
+   p4d_t *p4d;
+   pud_t *pud;
+   pmd_t *pmd;
+
+   addr &= ~(sz - 1);
+
+   p4d = p4d_offset(pgd_offset(mm, addr), addr);
+   if (!mm_pud_folded(mm) && sz >= P4D_SIZE)
+   return (pte_t *)p4d;
+
+   pud = pud_alloc(mm, p4d, addr);
+   if (!pud)
+   return NULL;
+   if (!mm_pmd_folded(mm) && sz >= PUD_SIZE)
+   return (pte_t *)pud;
+
+   pmd = pmd_alloc(mm, pud, addr);
+   if (!pmd)
+   return NULL;
+
+   if (sz >= PMD_SIZE)
+   return (pte_t *)pmd;
+
+   return pte_alloc_huge(mm, pmd, addr);
+}
+#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /*
@@ -248,6 +279,7 @@ int __init alloc_bootmem_huge_page(struct hstate *h, int nid)
return __alloc_bootmem_huge_page(h, nid);
 }
 
+#ifdef CONFIG_ARCH_HAS_HUGEPD
 #ifndef CONFIG_PPC_BOOK3S_64
 #define HUGEPD_FREE

[RFC PATCH v3 08/16] powerpc/8xx: Rework support for 8M pages using contiguous PTE entries

2024-05-26 Thread Christophe Leroy
In order to fit better with the standard Linux page table layout, add
support for 8M pages using contiguous PTE entries in a standard
page table. Page tables will then be populated with 1024 similar
entries and two PMD entries will point to that page table.

The PMD entries also get a flag to indicate that they address an 8M
page; this is required for the HW tablewalk assistance.
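
For illustration, a rough sketch of the resulting layout (simplified
sketch only, not the patch code; the helper name sketch_populate_8m()
is made up, the flags come from the existing 8xx definitions):

	/*
	 * An 8M page occupies two consecutive 4M PMD slots, both flagged
	 * _PMD_PAGE_8M and both pointing to the same page table, which is
	 * filled with 1024 identical PTEs.
	 */
	static void sketch_populate_8m(pmd_t *pmdp, pte_t *ptep, pte_t entry)
	{
		int i;

		pmdp[0] = __pmd(__pa(ptep) | _PMD_PRESENT | _PMD_PAGE_8M);
		pmdp[1] = __pmd(__pa(ptep) | _PMD_PRESENT | _PMD_PAGE_8M);
		for (i = 0; i < PTRS_PER_PTE; i++)
			ptep[i] = entry;
	}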

Signed-off-by: Christophe Leroy 
---
v3:
- Move huge_ptep_get() for a more readable commit diff
- Flag PMD as 8Mbytes in set_huge_pte_at()
- Define __pte_leaf_size()
- Change pte_update() instead of all huge callers of pte_update()
- Added ptep_is_8m_pmdp() helper
- Fixed kasan early memory 8M allocation
---
 arch/powerpc/Kconfig  |  1 -
 .../include/asm/nohash/32/hugetlb-8xx.h   | 38 +++--
 arch/powerpc/include/asm/nohash/32/pte-8xx.h  | 53 ---
 arch/powerpc/include/asm/nohash/pgtable.h |  4 --
 arch/powerpc/include/asm/page.h   |  5 --
 arch/powerpc/include/asm/pgtable.h|  3 ++
 arch/powerpc/kernel/head_8xx.S| 10 +---
 arch/powerpc/mm/hugetlbpage.c | 18 ---
 arch/powerpc/mm/kasan/8xx.c   | 21 +---
 arch/powerpc/mm/nohash/8xx.c  | 40 +++---
 arch/powerpc/mm/pgtable.c | 27 +++---
 arch/powerpc/mm/pgtable_32.c  |  2 +-
 arch/powerpc/platforms/Kconfig.cputype|  2 +
 13 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a1a3b3363008..6a4ea7dad23f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -135,7 +135,6 @@ config PPC
select ARCH_HAS_DMA_MAP_DIRECT  if PPC_PSERIES
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
-   select ARCH_HAS_HUGEPD  if HUGETLB_PAGE
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
index 92df40c6cc6b..c60219269323 100644
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -4,42 +4,12 @@
 
 #define PAGE_SHIFT_8M  23
 
-static inline pte_t *hugepd_page(hugepd_t hpd)
-{
-   BUG_ON(!hugepd_ok(hpd));
-
-   return (pte_t *)__va(hpd_val(hpd) & ~HUGEPD_SHIFT_MASK);
-}
-
-static inline unsigned int hugepd_shift(hugepd_t hpd)
-{
-   return PAGE_SHIFT_8M;
-}
-
-static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
-   unsigned int pdshift)
-{
-   unsigned long idx = (addr & (SZ_4M - 1)) >> PAGE_SHIFT;
-
-   return hugepd_page(hpd) + idx;
-}
-
 static inline void flush_hugetlb_page(struct vm_area_struct *vma,
  unsigned long vmaddr)
 {
flush_tlb_page(vma, vmaddr);
 }
 
-static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int pshift)
-{
-   *hpdp = __hugepd(__pa(new) | _PMD_USER | _PMD_PRESENT | _PMD_PAGE_8M);
-}
-
-static inline void hugepd_populate_kernel(hugepd_t *hpdp, pte_t *new, unsigned int pshift)
-{
-   *hpdp = __hugepd(__pa(new) | _PMD_PRESENT | _PMD_PAGE_8M);
-}
-
 static inline int check_and_get_huge_psize(int shift)
 {
return shift_to_mmu_psize(shift);
@@ -49,6 +19,14 @@ static inline int check_and_get_huge_psize(int shift)
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 pte_t pte, unsigned long sz);
 
+#define __HAVE_ARCH_HUGE_PTEP_GET
+static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+   if (ptep_is_8m_pmdp(mm, addr, ptep))
+   ptep = pte_offset_kernel((pmd_t *)ptep, 0);
+   return ptep_get(ptep);
+}
+
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep, unsigned long sz)
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 625c31d6ce5c..54ebb91dbdcf 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -119,7 +119,7 @@ static inline pte_t pte_mkhuge(pte_t pte)
 
 #define pte_mkhuge pte_mkhuge
 
-static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
+static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 				     unsigned long clr, unsigned long set, int huge);
 
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
@@ -141,19 +141,12 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma, pte_t *pt
 }
 #define __ptep_set_access_flags __ptep_set_access_flags
 
-static inline unsi

[RFC PATCH v3 07/16] powerpc/8xx: Fix size given to set_huge_pte_at()

2024-05-26 Thread Christophe Leroy
set_huge_pte_at() expects the size of the hugepage in bytes, not the
psize, which is the index of the page definition in table mmu_psize_defs[].
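
To make the mixup concrete, a minimal sketch (illustration only; the
helper name sketch_psize_to_bytes() is made up):

	/* psize is an index into mmu_psize_defs[], not a size in bytes. */
	static unsigned long sketch_psize_to_bytes(void)
	{
		unsigned int psize = MMU_PAGE_8M;	/* small array index */

		return 1UL << mmu_psize_to_shift(psize);	/* 0x800000 */
	}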

Fixes: 935d4f0c6dc8 ("mm: hugetlb: add huge page size param to set_huge_pte_at()")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/nohash/8xx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 43d4842bb1c7..d93433e26ded 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -94,7 +94,8 @@ static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa,
return -EINVAL;
 
set_huge_pte_at(&init_mm, va, ptep,
-   pte_mkhuge(pfn_pte(pa >> PAGE_SHIFT, prot)), psize);
+   pte_mkhuge(pfn_pte(pa >> PAGE_SHIFT, prot)),
+   1UL << mmu_psize_to_shift(psize));
 
return 0;
 }
-- 
2.44.0



[RFC PATCH v3 04/16] powerpc/mm: Remove _PAGE_PSIZE

2024-05-26 Thread Christophe Leroy
The _PAGE_PSIZE macro is never used outside the place where it is
defined, and it is only relevant on 8xx and e500.

Remove the indirection: drop the macro and use its content directly.

Signed-off-by: Christophe Leroy 
Reviewed-by: Oscar Salvador 
---
 arch/powerpc/include/asm/nohash/32/pte-40x.h  | 3 ---
 arch/powerpc/include/asm/nohash/32/pte-44x.h  | 3 ---
 arch/powerpc/include/asm/nohash/32/pte-85xx.h | 3 ---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h  | 5 ++---
 arch/powerpc/include/asm/nohash/pte-e500.h| 4 +---
 5 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-40x.h b/arch/powerpc/include/asm/nohash/32/pte-40x.h
index d759cfd74754..52ed58516fa4 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-40x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-40x.h
@@ -49,9 +49,6 @@
 #define _PAGE_EXEC 0x200   /* hardware: EX permission */
 #define _PAGE_ACCESSED 0x400   /* software: R: page referenced */
 
-/* No page size encoding in the linux PTE */
-#define _PAGE_PSIZE	0
-
 /* cache related flags non existing on 40x */
 #define _PAGE_COHERENT 0
 
diff --git a/arch/powerpc/include/asm/nohash/32/pte-44x.h b/arch/powerpc/include/asm/nohash/32/pte-44x.h
index 851813725237..da0469928273 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-44x.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-44x.h
@@ -75,9 +75,6 @@
 #define _PAGE_NO_CACHE 0x0400  /* H: I bit */
 #define _PAGE_WRITETHRU0x0800  /* H: W bit */
 
-/* No page size encoding in the linux PTE */
-#define _PAGE_PSIZE	0
-
 /* TODO: Add large page lowmem mapping support */
 #define _PMD_PRESENT   0
 #define _PMD_PRESENT_MASK (PAGE_MASK)
diff --git a/arch/powerpc/include/asm/nohash/32/pte-85xx.h b/arch/powerpc/include/asm/nohash/32/pte-85xx.h
index 653a342d3b25..14d64b4f3f14 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-85xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-85xx.h
@@ -31,9 +31,6 @@
 #define _PAGE_WRITETHRU0x00400 /* H: W bit */
 #define _PAGE_SPECIAL  0x00800 /* S: Special page */
 
-/* No page size encoding in the linux PTE */
-#define _PAGE_PSIZE	0
-
 #define _PMD_PRESENT   0
 #define _PMD_PRESENT_MASK (PAGE_MASK)
 #define _PMD_BAD   (~PAGE_MASK)
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 137dc3c84e45..625c31d6ce5c 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -74,12 +74,11 @@
 #define _PTE_NONE_MASK 0
 
 #ifdef CONFIG_PPC_16K_PAGES
-#define _PAGE_PSIZE	_PAGE_SPS
+#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_SPS)
 #else
-#define _PAGE_PSIZE	0
+#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED)
 #endif
 
-#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
 #define _PAGE_BASE (_PAGE_BASE_NC)
 
 #include 
diff --git a/arch/powerpc/include/asm/nohash/pte-e500.h b/arch/powerpc/include/asm/nohash/pte-e500.h
index f516f0b5b7a8..975facc7e38e 100644
--- a/arch/powerpc/include/asm/nohash/pte-e500.h
+++ b/arch/powerpc/include/asm/nohash/pte-e500.h
@@ -65,8 +65,6 @@
 
 #define _PAGE_SPECIAL  _PAGE_SW0
 
-/* Base page size */
-#define _PAGE_PSIZE	_PAGE_PSIZE_4K
 #define	PTE_RPN_SHIFT	(24)
 
 #define PTE_WIMGE_SHIFT (19)
@@ -89,7 +87,7 @@
  * pages. We always set _PAGE_COHERENT when SMP is enabled or
  * the processor might need it for DMA coherency.
  */
-#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE)
+#define _PAGE_BASE_NC  (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_PSIZE_4K)
 #if defined(CONFIG_SMP)
 #define _PAGE_BASE (_PAGE_BASE_NC | _PAGE_COHERENT)
 #else
-- 
2.44.0



[RFC PATCH v3 05/16] powerpc/mm: Fix __find_linux_pte() on 32 bits with PMD leaf entries

2024-05-26 Thread Christophe Leroy
Building on 32 bits with pmd_leaf() not always returning false leads
to the following error:

  CC  arch/powerpc/mm/pgtable.o
arch/powerpc/mm/pgtable.c: In function '__find_linux_pte':
arch/powerpc/mm/pgtable.c:506:1: error: function may return address of local variable [-Werror=return-local-addr]
  506 | }
  | ^
arch/powerpc/mm/pgtable.c:394:15: note: declared here
  394 | pud_t pud, *pudp;
  |   ^~~
arch/powerpc/mm/pgtable.c:394:15: note: declared here

This is due to pmd_offset() being a no-op in that case.

So rework it for powerpc/32 so that the pXd_offset() helpers are used
on real pointers and not on on-stack copies.
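
The warning boils down to the following pattern, shown here as a
minimal reproduction sketch (not the kernel code):

	/*
	 * When pmd_offset() is a no-op returning its argument, the walk
	 * ends up returning the address of an on-stack copy.
	 */
	int *bad(void)
	{
		int local = 0;

		return &local;	/* -Wreturn-local-addr: dangling pointer */
	}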

Signed-off-by: Christophe Leroy 
---
v3: Removed p4dp and pudp locals for PPC32 and add a comment.
---
 arch/powerpc/mm/pgtable.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -382,8 +382,10 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
bool *is_thp, unsigned *hpage_shift)
 {
pgd_t *pgdp;
+#ifdef CONFIG_PPC64
p4d_t p4d, *p4dp;
pud_t pud, *pudp;
+#endif
pmd_t pmd, *pmdp;
pte_t *ret_pte;
hugepd_t *hpdp = NULL;
@@ -401,8 +403,12 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 * page fault or a page unmap. The return pte_t * is still not
 * stable. So should be checked there for above conditions.
 * Top level is an exception because it is folded into p4d.
+*
+* On PPC32, P4D/PUD/PMD are folded into PGD so go straight to
+* PMD level.
 */
pgdp = pgdir + pgd_index(ea);
+#ifdef CONFIG_PPC64
p4dp = p4d_offset(pgdp, ea);
p4d  = READ_ONCE(*p4dp);
pdshift = P4D_SHIFT;
@@ -444,6 +450,9 @@ pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea,
 
pdshift = PMD_SHIFT;
pmdp = pmd_offset(&pud, ea);
+#else
+   pmdp = pmd_offset(pud_offset(p4d_offset(pgdp, ea), ea), ea);
+#endif
pmd  = READ_ONCE(*pmdp);
 
/*
-- 
2.44.0



[RFC PATCH v3 09/16] powerpc/8xx: Simplify struct mmu_psize_def

2024-05-26 Thread Christophe Leroy
On 8xx, only the shift field of struct mmu_psize_def is used.

Remove the other fields and related macros.
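
The remaining users only ever read the shift field, along the lines of
this simplified sketch of the usual helper (sketch only; the name
sketch_psize_to_shift() is made up):

	static inline unsigned int sketch_psize_to_shift(unsigned int psize)
	{
		if (mmu_psize_defs[psize].shift)
			return mmu_psize_defs[psize].shift;
		BUG();
	}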

Signed-off-by: Christophe Leroy 
Reviewed-by: Oscar Salvador 
---
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 141d82e249a8..a756a1e59c54 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -189,19 +189,14 @@ typedef struct {
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff8)
 
-/* Page size definitions, common between 32 and 64-bit
+/*
+ * Page size definitions for 8xx
  *
  *shift : is the "PAGE_SHIFT" value for that page size
- *penc  : is the pte encoding mask
  *
  */
 struct mmu_psize_def {
 	unsigned int	shift;	/* number of bits */
-	unsigned int	enc;	/* PTE encoding */
-	unsigned int	ind;	/* Corresponding indirect page size shift */
-	unsigned int	flags;
-#define MMU_PAGE_SIZE_DIRECT   0x1 /* Supported as a direct size */
-#define MMU_PAGE_SIZE_INDIRECT 0x2 /* Supported as an indirect size */
 };
 
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
-- 
2.44.0



Re: [PATCH v2 1/1] x86/elf: Add a new .note section containing Xfeatures information to x86 core files

2024-05-26 Thread Borislav Petkov
On Sun, May 26, 2024 at 10:24:41AM +0530, Balasubrmanian, Vignesh wrote:
> If we can add a new enum only when we extend, then as Thomas suggested can
> we use other kernel variables as in the first version of the patch until we
> extend for other/new features?

I assume by "other kernel variables" you mean CPUID?

If so, can you change the layout of your buffer once you export it to
userspace?
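
To illustrate the concern: once the note is exported, its layout is
ABI. A purely hypothetical sketch of a self-describing header that
would leave room to grow (every name here is invented):

	struct xfeat_note_header {
		__u32 version;		/* bump on record format changes */
		__u32 nr_records;	/* number of records that follow */
		__u32 record_size;	/* per-record size, forward compat */
	};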

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette