Re: [PATCH v4 22/34] csky: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:11PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Acked-by: Guo Ren 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/csky/include/asm/pgalloc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
> index 7d57e5da0914..9c84c9012e53 100644
> --- a/arch/csky/include/asm/pgalloc.h
> +++ b/arch/csky/include/asm/pgalloc.h
> @@ -63,8 +63,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  #define __pte_free_tlb(tlb, pte, address)\
>  do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page(tlb, pte);  \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc(tlb, page_ptdesc(pte));  \
>  } while (0)
>  
>  extern void pagetable_init(void);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
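
For readers following the series: page_ptdesc() and ptdesc_page() used in the
hunk above convert between struct page and the new struct ptdesc, which simply
overlays it. A simplified sketch of the helpers (the in-tree versions use
_Generic to preserve const-ness, so treat this as illustration only):

	/* struct ptdesc shares the memory layout of struct page,
	 * so the conversion is just a pointer cast. */
	#define page_ptdesc(p)		((struct ptdesc *)(p))
	#define ptdesc_page(pt)		((struct page *)(pt))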



Re: [PATCH v4 21/34] arm64: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:10PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/arm64/include/asm/tlb.h | 14 --
>  arch/arm64/mm/mmu.c  |  7 ---
>  2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index c995d1f4594f..2c29239d05c3 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -75,18 +75,20 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> unsigned long addr)
>  {
> - pgtable_pte_page_dtor(pte);
> - tlb_remove_table(tlb, pte);
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 2
>  static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> unsigned long addr)
>  {
> - struct page *page = virt_to_page(pmdp);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
>  
> - pgtable_pmd_page_dtor(page);
> - tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  #endif
>  
> @@ -94,7 +96,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
> pmd_t *pmdp,
>  static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
> unsigned long addr)
>  {
> - tlb_remove_table(tlb, virt_to_page(pudp));
> + tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
>  }
>  #endif
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index af6bc8403ee4..5867a0e917b9 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -426,6 +426,7 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
>  static phys_addr_t pgd_pgtable_alloc(int shift)
>  {
>   phys_addr_t pa = __pgd_pgtable_alloc(shift);
> + struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
>  
>   /*
>* Call proper page table ctor in case later we need to
> @@ -433,12 +434,12 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
>* this pre-allocated page table.
>*
>* We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
> -  * folded, and if so pgtable_pmd_page_ctor() becomes nop.
> +  * folded, and if so pagetable_pte_ctor() becomes nop.
>*/
>   if (shift == PAGE_SHIFT)
> - BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
> + BUG_ON(!pagetable_pte_ctor(ptdesc));
>   else if (shift == PMD_SHIFT)
> - BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
> + BUG_ON(!pagetable_pmd_ctor(ptdesc));
>  
>   return pa;
>  }
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
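
For context, tlb_remove_ptdesc() and tlb_remove_page_ptdesc() used above are
thin wrappers over the existing mmu_gather hooks, so the arm64 hunks are
behaviour-preserving. Roughly (a simplified sketch, not a quote of the
generic header):

	static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
	{
		tlb_remove_table(tlb, pt);
	}

	static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb,
						  struct ptdesc *pt)
	{
		tlb_remove_page(tlb, ptdesc_page(pt));
	}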


Re: [PATCH v4 20/34] arm: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:09PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> late_alloc() also uses the __get_free_pages() helper function. Convert
> this to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

One comment below.

> ---
>  arch/arm/include/asm/tlb.h | 12 +++-
>  arch/arm/mm/mmu.c  |  6 +++---
>  2 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> index b8cbe03ad260..f40d06ad5d2a 100644
> --- a/arch/arm/include/asm/tlb.h
> +++ b/arch/arm/include/asm/tlb.h
> @@ -39,7 +39,9 @@ static inline void __tlb_remove_table(void *_table)
>  static inline void
>  __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
>  {
> - pgtable_pte_page_dtor(pte);
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
>  
>  #ifndef CONFIG_ARM_LPAE
>   /*
> @@ -50,17 +52,17 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
> unsigned long addr)
>   __tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
>  #endif
>  
> - tlb_remove_table(tlb, pte);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  
>  static inline void
>  __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
>  {
>  #ifdef CONFIG_ARM_LPAE
> - struct page *page = virt_to_page(pmdp);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
>  
> - pgtable_pmd_page_dtor(page);
> - tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  #endif
>  }
>  
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 22292cf3381c..294518fd0240 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -737,11 +737,11 @@ static void __init *early_alloc(unsigned long sz)
>  
>  static void *__init late_alloc(unsigned long sz)
>  {
> - void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));
> + void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, get_order(sz));
>  
> - if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
> + if (!ptdesc || !pagetable_pte_ctor(ptdesc))
>   BUG();
> - return ptr;
> + return ptdesc;

should be

return  ptdesc_to_virt(ptdesc);
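
Putting the posted hunk and this fix together, the converted helper would read
roughly as below (a sketch only, not a quote from a later revision; the local
is also retyped to struct ptdesc * to match):

	static void *__init late_alloc(unsigned long sz)
	{
		struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL,
							get_order(sz));

		if (!ptdesc || !pagetable_pte_ctor(ptdesc))
			BUG();
		return ptdesc_to_virt(ptdesc);
	}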

>  }
>  
>  static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 19/34] pgalloc: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:08PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
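
For reference, the helpers named in the commit message were added earlier in
the series: pagetable_alloc() is roughly the sketch below, pagetable_free() is
the matching wrapper around __free_pages(), and ptdesc_address() returns the
table's kernel virtual address. Simplified for context, not the exact in-tree
code:

	static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
	{
		struct page *page = alloc_pages(gfp | __GFP_COMP, order);

		return page_ptdesc(page);
	}
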
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  include/asm-generic/pgalloc.h | 62 +--
>  1 file changed, 37 insertions(+), 25 deletions(-)
> 
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index a7cf825befae..3fd6ce79e654 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -18,7 +18,11 @@
>   */
>  static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> - return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, 0);
> +
> + if (!ptdesc)
> + return NULL;
> + return ptdesc_address(ptdesc);
>  }
>  
>  #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> @@ -41,7 +45,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct 
> *mm)
>   */
>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
> - free_page((unsigned long)pte);
> + pagetable_free(virt_to_ptdesc(pte));
>  }
>  
>  /**
> @@ -49,7 +53,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
> pte_t *pte)
>   * @mm: the mm_struct of the current context
>   * @gfp: GFP flags to use for the allocation
>   *
> - * Allocates a page and runs the pgtable_pte_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pte_ctor().

Allocates memory for page table and ptdesc

>   *
>   * This function is intended for architectures that need
>   * anything beyond simple page allocation or must have custom GFP flags.

The Return: description here should be fixed up
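
One possible wording for the fixed-up kerneldoc, combining both comments
(phrasing is illustrative, not quoted from a later revision):

	 * Allocates memory for a page table and a ptdesc, then runs
	 * pagetable_pte_ctor() on it.
	 ...
	 * Return: the allocated page table, or %NULL on error.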

> @@ -58,17 +62,17 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
> pte_t *pte)
>   */
>  static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
>  {
> - struct page *pte;
> + struct ptdesc *ptdesc;
>  
> - pte = alloc_page(gfp);
> - if (!pte)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(pte)) {
> - __free_page(pte);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - return pte;
> + return ptdesc_page(ptdesc);
>  }
>  
>  #ifndef __HAVE_ARCH_PTE_ALLOC_ONE
> @@ -76,7 +80,7 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct 
> *mm, gfp_t gfp)
>   * pte_alloc_one - allocate a page for PTE-level user page table
>   * @mm: the mm_struct of the current context
>   *
> - * Allocates a page and runs the pgtable_pte_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pte_ctor().

Allocates memory for page table and ptdesc

>   *
>   * Return: `struct page` initialized as page table or %NULL on error

Return: ptdesc ...

>   */
> @@ -98,8 +102,10 @@ static inline pgtable_t pte_alloc_one(struct mm_struct 
> *mm)
>   */
>  static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
>  {
> - pgtable_pte_page_dtor(pte_page);
> - __free_page(pte_page);
> + struct ptdesc *ptdesc = page_ptdesc(pte_page);
> +
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  
> @@ -110,7 +116,7 @@ static inline void pte_free(struct mm_struct *mm, struct 
> page *pte_page)
>   * pmd_alloc_one - allocate a page for PMD-level page table
>   * @mm: the mm_struct of the current context
>   *
> - * Allocates a page and runs the pgtable_pmd_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pmd_ctor().

Allocate memory for page table and ptdesc

>   * Allocations use %GFP_PGTABLE_USER in user context and
>   * %GFP_PGTABLE_KERNEL in kernel context.
>   *
> @@ -118,28 +124,30 @@ static inline void pte_free(struct mm_struct *mm, 
> struct page *pte_page)
>   */
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   gfp_t gfp = GFP_PGTABLE_USER;
>  
>   if (mm == &init_mm)
>   gfp = GFP_PGTABLE_KERNEL;
> - page = alloc_page(gfp);
> - if (!page)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pmd_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
> - return (pmd_t *)page_address(page);
> + return ptdesc_address(ptdesc);
>  }
>  #endif
>  
>  #ifndef __HAVE_ARCH_PMD_FREE
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
>  {
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> +
> 

Re: [PATCH v4 18/34] mm: Remove page table members from struct page

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:07PM -0700, Vishal Moola (Oracle) wrote:
> The page table members are now split out into their own ptdesc struct.
> Remove them from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm_types.h | 14 --
>  include/linux/pgtable.h  |  3 ---
>  2 files changed, 17 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6161fe1ae5b8..31ffa1be21d0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -141,20 +141,6 @@ struct page {
>   struct {/* Tail pages of compound page */
>   unsigned long compound_head;/* Bit zero is set */
>   };
> - struct {/* Page table pages */
> - unsigned long _pt_pad_1;/* compound_head */
> - pgtable_t pmd_huge_pte; /* protected by page->ptl */
> - unsigned long _pt_s390_gaddr;   /* mapping */
> - union {
> - struct mm_struct *pt_mm; /* x86 pgds only */
> - atomic_t pt_frag_refcount; /* powerpc */
> - };
> -#if ALLOC_SPLIT_PTLOCKS
> - spinlock_t *ptl;
> -#else
> - spinlock_t ptl;
> -#endif
> - };
>   struct {/* ZONE_DEVICE pages */
>   /** @pgmap: Points to the hosting device page map. */
>   struct dev_pagemap *pgmap;
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c405f74d3875..33cc19d752b3 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1019,10 +1019,7 @@ struct ptdesc {
>  TABLE_MATCH(flags, __page_flags);
>  TABLE_MATCH(compound_head, pt_list);
>  TABLE_MATCH(compound_head, _pt_pad_1);
> -TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
>  TABLE_MATCH(mapping, _pt_s390_gaddr);
> -TABLE_MATCH(pt_mm, pt_mm);
> -TABLE_MATCH(ptl, ptl);
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
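
For context, the TABLE_MATCH lines dropped above are compile-time layout
checks between struct page and struct ptdesc; the macro, defined just before
this hunk in include/linux/pgtable.h, is essentially:

	#define TABLE_MATCH(pg, pt)						\
		static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))

With pmd_huge_pte, pt_mm and ptl removed from struct page, those three
assertions no longer have a page-side field to check, hence their removal.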


Re: [PATCH v4 17/34] s390: Convert various pgalloc functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:06PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/s390/include/asm/pgalloc.h |   4 +-
>  arch/s390/include/asm/tlb.h |   4 +-
>  arch/s390/mm/pgalloc.c  | 108 
>  3 files changed, 59 insertions(+), 57 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
> index 17eb618f1348..00ad9b88fda9 100644
> --- a/arch/s390/include/asm/pgalloc.h
> +++ b/arch/s390/include/asm/pgalloc.h
> @@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long vmaddr)
>   if (!table)
>   return NULL;
>   crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
> - if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
> + if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
>   crst_table_free(mm, table);
>   return NULL;
>   }
> @@ -97,7 +97,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
> *pmd)
>  {
>   if (mm_pmd_folded(mm))
>   return;
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   crst_table_free(mm, (unsigned long *) pmd);
>  }
>  
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index b91f4a9b044c..383b1f91442c 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -89,12 +89,12 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
> pmd_t *pmd,
>  {
>   if (mm_pmd_folded(tlb->mm))
>   return;
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   __tlb_adjust_range(tlb, address, PAGE_SIZE);
>   tlb->mm->context.flush_mm = 1;
>   tlb->freed_tables = 1;
>   tlb->cleared_puds = 1;
> - tlb_remove_table(tlb, pmd);
> + tlb_remove_ptdesc(tlb, pmd);
>  }
>  
>  /*
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index 6b99932abc66..eeb7c95b98cf 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -43,17 +43,17 @@ __initcall(page_table_register_sysctl);
>  
>  unsigned long *crst_table_alloc(struct mm_struct *mm)
>  {
> - struct page *page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
> - arch_set_page_dat(page, CRST_ALLOC_ORDER);
> - return (unsigned long *) page_to_virt(page);
> + arch_set_page_dat(ptdesc_page(ptdesc), CRST_ALLOC_ORDER);
> + return (unsigned long *) ptdesc_to_virt(ptdesc);
>  }
>  
>  void crst_table_free(struct mm_struct *mm, unsigned long *table)
>  {
> - free_pages((unsigned long)table, CRST_ALLOC_ORDER);
> + pagetable_free(virt_to_ptdesc(table));
>  }
>  
>  static void __crst_table_upgrade(void *arg)
> @@ -140,21 +140,21 @@ static inline unsigned int atomic_xor_bits(atomic_t *v, 
> unsigned int bits)
>  
>  struct page *page_table_alloc_pgste(struct mm_struct *mm)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   u64 *table;
>  
> - page = alloc_page(GFP_KERNEL);
> - if (page) {
> - table = (u64 *)page_to_virt(page);
> + ptdesc = pagetable_alloc(GFP_KERNEL, 0);
> + if (ptdesc) {
> + table = (u64 *)ptdesc_to_virt(ptdesc);
>   memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
>   memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
>   }
> - return page;
> + return ptdesc_page(ptdesc);
>  }
>  
>  void page_table_free_pgste(struct page *page)
>  {
> - __free_page(page);
> + pagetable_free(page_ptdesc(page));
>  }
>  
>  #endif /* CONFIG_PGSTE */
> @@ -230,7 +230,7 @@ void page_table_free_pgste(struct page *page)
>  unsigned long *page_table_alloc(struct mm_struct *mm)
>  {
>   unsigned long *table;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned int mask, bit;
>  
>   /* Try to get a fragment of a 4K page as a 2K page table */
> @@ -238,9 +238,9 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>  

Re: [PATCH v4 16/34] s390: Convert various gmap functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:05PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

With folding

ptdesc->_pt_s390_gaddr = 0;

into pagetable_free()

Acked-by: Mike Rapoport (IBM) 
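
What that folding could look like, sketched against the pagetable_free()
wrapper added earlier in the series (the field reset is the reviewer's
suggestion here, not code from the posted patch):

	static inline void pagetable_free(struct ptdesc *pt)
	{
		struct page *page = ptdesc_page(pt);

		pt->_pt_s390_gaddr = 0;	/* suggested: clear the gmap guest address on free */
		__free_pages(page, compound_order(page));
	}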


> ---
>  arch/s390/mm/gmap.c | 230 
>  1 file changed, 128 insertions(+), 102 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 81c683426b49..010e87df7299 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -34,7 +34,7 @@
>  static struct gmap *gmap_alloc(unsigned long limit)
>  {
>   struct gmap *gmap;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned long *table;
>   unsigned long etype, atype;
>  
> @@ -67,12 +67,12 @@ static struct gmap *gmap_alloc(unsigned long limit)
>   spin_lock_init(&gmap->guest_table_lock);
>   spin_lock_init(&gmap->shadow_lock);
>   refcount_set(&gmap->ref_count, 1);
> - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> - if (!page)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> + if (!ptdesc)
>   goto out_free;
> - page->_pt_s390_gaddr = 0;
> - list_add(&page->lru, &gmap->crst_list);
> - table = page_to_virt(page);
> + ptdesc->_pt_s390_gaddr = 0;
> + list_add(&ptdesc->pt_list, &gmap->crst_list);
> + table = ptdesc_to_virt(ptdesc);
>   crst_table_init(table, etype);
>   gmap->table = table;
>   gmap->asce = atype | _ASCE_TABLE_LENGTH |
> @@ -181,25 +181,25 @@ static void gmap_rmap_radix_tree_free(struct 
> radix_tree_root *root)
>   */
>  static void gmap_free(struct gmap *gmap)
>  {
> - struct page *page, *next;
> + struct ptdesc *ptdesc, *next;
>  
>   /* Flush tlb of all gmaps (if not already done for shadows) */
>   if (!(gmap_is_shadow(gmap) && gmap->removed))
>   gmap_flush_tlb(gmap);
>   /* Free all segment & region tables. */
> - list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
> - page->_pt_s390_gaddr = 0;
> - __free_pages(page, CRST_ALLOC_ORDER);
> + list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) {
> + ptdesc->_pt_s390_gaddr = 0;
> + pagetable_free(ptdesc);
>   }
>   gmap_radix_tree_free(&gmap->guest_to_host);
>   gmap_radix_tree_free(&gmap->host_to_guest);
>  
>   /* Free additional data for a shadow gmap */
>   if (gmap_is_shadow(gmap)) {
> - /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
> - page->_pt_s390_gaddr = 0;
> - page_table_free_pgste(page);
> + /* Free all ptdesc tables. */
> + list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list) 
> {
> + ptdesc->_pt_s390_gaddr = 0;
> + page_table_free_pgste(ptdesc_page(ptdesc));
>   }
>   gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
>   /* Release reference to the parent */
> @@ -308,27 +308,27 @@ EXPORT_SYMBOL_GPL(gmap_get_enabled);
>  static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
>   unsigned long init, unsigned long gaddr)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned long *new;
>  
>   /* since we dont free the gmap table until gmap_free we can unlock */
> - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> - if (!page)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> + if (!ptdesc)
>   return -ENOMEM;
> - new = page_to_virt(page);
> + new = ptdesc_to_virt(ptdesc);
>   crst_table_init(new, init);
>   spin_lock(&gmap->guest_table_lock);
>   if (*table & _REGION_ENTRY_INVALID) {
> - list_add(&page->lru, &gmap->crst_list);
> + list_add(&ptdesc->pt_list, &gmap->crst_list);
>   *table = __pa(new) | _REGION_ENTRY_LENGTH |
>   (*table & _REGION_ENTRY_TYPE_MASK);
> - page->_pt_s390_gaddr = gaddr;
> - page = NULL;
> + ptdesc->_pt_s390_gaddr = gaddr;
> + ptdesc = NULL;
>   }
>   spin_unlock(&gmap->guest_table_lock);
> - if (page) {
> - page

Re: [PATCH v4 15/34] x86: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:04PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert

Nit:   *get_free_page*()

> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.

More importantly, get_free_pages() ensures a page won't be allocated from
HIGHMEM, and on 32-bit this is a must.
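
To make that concrete, a small sketch (not from the patch; GFP_PGTABLE_USER
is only an example gfp mask without __GFP_HIGHMEM): __get_free_page() can
only return a page that already has a kernel virtual address, so it can
never hand back HIGHMEM, whereas after the conversion the same guarantee
has to come from the gfp mask the caller passes, because pagetable_alloc()
allocates whatever the mask allows.

	/* before: lowmem is implied by getting a kernel address back */
	pmd_t *pmd_old = (pmd_t *)__get_free_page(GFP_PGTABLE_USER);

	/* after: lowmem holds only while the mask has no __GFP_HIGHMEM */
	struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_USER, 0);
	pmd_t *pmd_new = ptdesc ? ptdesc_address(ptdesc) : NULL;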
 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/x86/mm/pgtable.c | 46 +--
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 15a8009a4480..6da7fd5d4782 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -52,7 +52,7 @@ early_param("userpte", setup_userpte);
>  
>  void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
>  {
> - pgtable_pte_page_dtor(pte);
> + pagetable_pte_dtor(page_ptdesc(pte));
>   paravirt_release_pte(page_to_pfn(pte));
>   paravirt_tlb_remove_table(tlb, pte);
>  }
> @@ -60,7 +60,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page 
> *pte)
>  #if CONFIG_PGTABLE_LEVELS > 2
>  void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  {
> - struct page *page = virt_to_page(pmd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>   paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
>   /*
>* NOTE! For PAE, any changes to the top page-directory-pointer-table
> @@ -69,8 +69,8 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  #ifdef CONFIG_X86_PAE
>   tlb->need_flush_all = 1;
>  #endif
> - pgtable_pmd_page_dtor(page);
> - paravirt_tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
> @@ -92,16 +92,16 @@ void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
>  
>  static inline void pgd_list_add(pgd_t *pgd)
>  {
> - struct page *page = virt_to_page(pgd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
>  
> - list_add(&page->lru, &pgd_list);
> + list_add(&ptdesc->pt_list, &pgd_list);
>  }
>  
>  static inline void pgd_list_del(pgd_t *pgd)
>  {
> - struct page *page = virt_to_page(pgd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
>  
> - list_del(&page->lru);
> + list_del(&ptdesc->pt_list);
>  }
>  
>  #define UNSHARED_PTRS_PER_PGD\
> @@ -112,12 +112,12 @@ static inline void pgd_list_del(pgd_t *pgd)
>  
>  static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
>  {
> - virt_to_page(pgd)->pt_mm = mm;
> + virt_to_ptdesc(pgd)->pt_mm = mm;
>  }
>  
>  struct mm_struct *pgd_page_get_mm(struct page *page)
>  {
> - return page->pt_mm;
> + return page_ptdesc(page)->pt_mm;
>  }
>  
>  static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
> @@ -213,11 +213,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, 
> pmd_t *pmd)
>  static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
>  {
>   int i;
> + struct ptdesc *ptdesc;
>  
>   for (i = 0; i < count; i++)
>   if (pmds[i]) {
> - pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
> - free_page((unsigned long)pmds[i]);
> + ptdesc = virt_to_ptdesc(pmds[i]);
> +
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   mm_dec_nr_pmds(mm);
>   }
>  }
> @@ -232,16 +235,21 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t 
> *pmds[], int count)
>   gfp &= ~__GFP_ACCOUNT;
>  
>   for (i = 0; i < count; i++) {
> - pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
> - if (!pmd)
> + pmd_t *pmd = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(gfp, 0);
> +
> + if (!ptdesc)
>   failed = true;
> - if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
> - free_page((unsigned long)pmd);
> - pmd = NULL;
> + if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
> + ptdesc = NULL;
>   failed = true;
>   }
> - if (pmd)
> + if (ptdesc) {
>   mm_inc_nr_pmds(mm);
> + pmd = ptdesc_address(ptdesc);
> + }
> +
>   pmds[i] = pmd;
>   }
>  
> @@ -830,7 +838,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>  
>   free_page((unsigned long)pmd_sv);
>  
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   free_page((unsigned long)pmd);
>  
>   return 1;
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 14/34] powerpc: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:03PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/powerpc/mm/book3s64/mmu_context.c | 10 +++---
>  arch/powerpc/mm/book3s64/pgtable.c | 32 +-
>  arch/powerpc/mm/pgtable-frag.c | 46 +-
>  3 files changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
> b/arch/powerpc/mm/book3s64/mmu_context.c
> index c766e4c26e42..1715b07c630c 100644
> --- a/arch/powerpc/mm/book3s64/mmu_context.c
> +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> @@ -246,15 +246,15 @@ static void destroy_contexts(mm_context_t *ctx)
>  static void pmd_frag_destroy(void *pmd_frag)
>  {
>   int count;
> - struct page *page;
> + struct ptdesc *ptdesc;
>  
> - page = virt_to_page(pmd_frag);
> + ptdesc = virt_to_ptdesc(pmd_frag);
>   /* drop all the pending references */
>   count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
>   /* We allow PTE_FRAG_NR fragments from a PTE page */
> - if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + if (atomic_sub_and_test(PMD_FRAG_NR - count, 
> &ptdesc->pt_frag_refcount)) {
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   }
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index 85c84e89e3ea..1212deeabe15 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -306,22 +306,22 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
>  static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  {
>   void *ret = NULL;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
>  
>   if (mm == &init_mm)
>   gfp &= ~__GFP_ACCOUNT;
> - page = alloc_page(gfp);
> - if (!page)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pmd_page_ctor(page)) {
> - __free_pages(page, 0);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - atomic_set(&page->pt_frag_refcount, 1);
> + atomic_set(&ptdesc->pt_frag_refcount, 1);
>  
> - ret = page_address(page);
> + ret = ptdesc_address(ptdesc);
>   /*
>* if we support only one fragment just return the
>* allocated page.
> @@ -331,12 +331,12 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  
>   spin_lock(&mm->page_table_lock);
>   /*
> -  * If we find pgtable_page set, we return
> +  * If we find ptdesc_page set, we return
>* the allocated page with single fragment
>* count.
>*/
>   if (likely(!mm->context.pmd_frag)) {
> - atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
> + atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
>   mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
>   }
>   spin_unlock(&mm->page_table_lock);
> @@ -357,15 +357,15 @@ pmd_t *pmd_fragment_alloc(struct mm_struct *mm, 
> unsigned long vmaddr)
>  
>  void pmd_fragment_free(unsigned long *pmd)
>  {
> - struct page *page = virt_to_page(pmd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>  
> - if (PageReserved(page))
> - return free_reserved_page(page);
> + if (pagetable_is_reserved(ptdesc))
> + return free_reserved_ptdesc(ptdesc);
>  
> - BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> - if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
> + if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   }
>  }
>  
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index 20652daa1d7e..8961f1540209 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -18,15 +18,15 @@
>  void pte_frag_destroy(void *pte_frag)
>  {
>   int count;
> - struct page *page;
> + struct ptdesc *ptdesc;
>  
> - page = virt_to_page(pte_frag);
> + ptdesc = virt_t

Re: [PATCH v4 13/34] mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:02PM -0700, Vishal Moola (Oracle) wrote:
> Creates pagetable_pte_ctor(), pagetable_pmd_ctor(), pagetable_pte_dtor(),
> and pagetable_pmd_dtor() and make the original pgtable
> constructor/destructors wrappers.

Nit: either "creates ... makes" or "create ... make"
I like the second form more.
 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 56 ++
>  1 file changed, 42 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a1af7983e1bd..dc211c43610b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2886,20 +2886,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) 
> { return true; }
>  static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
> -static inline bool pgtable_pte_page_ctor(struct page *page)
> +static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
>  {
> - if (!ptlock_init(page_ptdesc(page)))
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + if (!ptlock_init(ptdesc))
>   return false;
> - __SetPageTable(page);
> - inc_lruvec_page_state(page, NR_PAGETABLE);
> + __folio_set_table(folio);

This comment applies more to patch 1 ("mm: Add PAGE_TYPE_OP folio functions").

It would be better to have _pgtable here, as "table" does not necessarily
mean page table.
With PageType, SetPageTable was fine, but with folio I think it should be
more explicit.

I'd add a third parameter to PAGE_TYPE_OPS for that.
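
For illustration only, one shape that suggestion could take (a hypothetical
sketch, not what this series posts): let PAGE_TYPE_OPS() accept an explicit
function-name fragment so the folio helpers read
__folio_set_pgtable()/__folio_clear_pgtable() while the type itself stays
PG_table. The VM_BUG_ON sanity checks are left out to keep it short; page
types use inverted bits, so "set" clears the bit and "clear" restores it.

	#define PAGE_TYPE_OPS(uname, lname, fname)				\
	/* existing Page##uname()/__SetPage##uname() helpers stay as-is */	\
	static __always_inline void __folio_set_##fname(struct folio *folio)	\
	{									\
		folio->page.page_type &= ~PG_##lname;				\
	}									\
	static __always_inline void __folio_clear_##fname(struct folio *folio)	\
	{									\
		folio->page.page_type |= PG_##lname;				\
	}

	/* PG_table users would then get the more explicit folio spelling: */
	PAGE_TYPE_OPS(Table, table, pgtable)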

> + lruvec_stat_add_folio(folio, NR_PAGETABLE);
>   return true;
>  }
>  
> +static inline bool pgtable_pte_page_ctor(struct page *page)
> +{
> + return pagetable_pte_ctor(page_ptdesc(page));
> +}
> +
> +static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + ptlock_free(ptdesc);
> + __folio_clear_table(folio);
> + lruvec_stat_sub_folio(folio, NR_PAGETABLE);
> +}
> +
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
> - ptlock_free(page_ptdesc(page));
> - __ClearPageTable(page);
> - dec_lruvec_page_state(page, NR_PAGETABLE);
> + pagetable_pte_dtor(page_ptdesc(page));
>  }
>  
>  #define pte_offset_map_lock(mm, pmd, address, ptlp)  \
> @@ -2981,20 +2995,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptl;
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page)
> +static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
>  {
> - if (!pmd_ptlock_init(page_ptdesc(page)))
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + if (!pmd_ptlock_init(ptdesc))
>   return false;
> - __SetPageTable(page);
> - inc_lruvec_page_state(page, NR_PAGETABLE);
> + __folio_set_table(folio);
> + lruvec_stat_add_folio(folio, NR_PAGETABLE);
>   return true;
>  }
>  
> +static inline bool pgtable_pmd_page_ctor(struct page *page)
> +{
> + return pagetable_pmd_ctor(page_ptdesc(page));
> +}
> +
> +static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + pmd_ptlock_free(ptdesc);
> + __folio_clear_table(folio);
> + lruvec_stat_sub_folio(folio, NR_PAGETABLE);
> +}
> +
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
> - pmd_ptlock_free(page_ptdesc(page));
> - __ClearPageTable(page);
> - dec_lruvec_page_state(page, NR_PAGETABLE);
> + pagetable_pmd_dtor(page_ptdesc(page));
>  }
>  
>  /*
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 12/34] mm: Convert ptlock_free() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:01PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  mm/memory.c|  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3b54bb4c9753..a1af7983e1bd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2826,7 +2826,7 @@ static inline void pagetable_clear(void *x)
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
>  bool ptlock_alloc(struct ptdesc *ptdesc);
> -extern void ptlock_free(struct page *page);
> +void ptlock_free(struct ptdesc *ptdesc);
>  
>  static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> @@ -2842,7 +2842,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc)
>   return true;
>  }
>  
> -static inline void ptlock_free(struct page *page)
> +static inline void ptlock_free(struct ptdesc *ptdesc)
>  {
>  }
>  
> @@ -2883,7 +2883,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>  }
>  static inline void ptlock_cache_init(void) {}
>  static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
> -static inline void ptlock_free(struct page *page) {}
> +static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
>  static inline bool pgtable_pte_page_ctor(struct page *page)
> @@ -2897,7 +2897,7 @@ static inline bool pgtable_pte_page_ctor(struct page 
> *page)
>  
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
> - ptlock_free(page);
> + ptlock_free(page_ptdesc(page));
>   __ClearPageTable(page);
>   dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
> @@ -2955,7 +2955,7 @@ static inline void pmd_ptlock_free(struct ptdesc 
> *ptdesc)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
>  #endif
> - ptlock_free(ptdesc_page(ptdesc));
> + ptlock_free(ptdesc);
>  }
>  
>  #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
> diff --git a/mm/memory.c b/mm/memory.c
> index ba9579117686..d4d2ea5cf0fd 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5945,8 +5945,8 @@ bool ptlock_alloc(struct ptdesc *ptdesc)
>   return true;
>  }
>  
> -void ptlock_free(struct page *page)
> +void ptlock_free(struct ptdesc *ptdesc)
>  {
> - kmem_cache_free(page_ptl_cachep, page->ptl);
> + kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
>  }
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 11/34] mm: Convert pmd_ptlock_free() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:00PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f48e626d9c98..3b54bb4c9753 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2950,12 +2950,12 @@ static inline bool pmd_ptlock_init(struct ptdesc 
> *ptdesc)
>   return ptlock_init(ptdesc);
>  }
>  
> -static inline void pmd_ptlock_free(struct page *page)
> +static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> + VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
>  #endif
> - ptlock_free(page);
> + ptlock_free(ptdesc_page(ptdesc));
>  }
>  
>  #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
> @@ -2968,7 +2968,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>  }
>  
>  static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
> -static inline void pmd_ptlock_free(struct page *page) {}
> +static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>  
> @@ -2992,7 +2992,7 @@ static inline bool pgtable_pmd_page_ctor(struct page 
> *page)
>  
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
> - pmd_ptlock_free(page);
> + pmd_ptlock_free(page_ptdesc(page));
>   __ClearPageTable(page);
>   dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 10/34] mm: Convert ptlock_init() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:59PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index daecf1db6cf1..f48e626d9c98 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2857,7 +2857,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
> -static inline bool ptlock_init(struct page *page)
> +static inline bool ptlock_init(struct ptdesc *ptdesc)
>  {
>   /*
>* prep_new_page() initialize page->private (and therefore page->ptl)
> @@ -2866,10 +2866,10 @@ static inline bool ptlock_init(struct page *page)
>* It can happen if arch try to use slab for page table allocation:
>* slab code uses page->slab_cache, which share storage with page->ptl.
>*/
> - VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> - if (!ptlock_alloc(page_ptdesc(page)))
> + VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc));
> + if (!ptlock_alloc(ptdesc))
>   return false;
> - spin_lock_init(ptlock_ptr(page_ptdesc(page)));
> + spin_lock_init(ptlock_ptr(ptdesc));
>   return true;
>  }
>  
> @@ -2882,13 +2882,13 @@ static inline spinlock_t *pte_lockptr(struct 
> mm_struct *mm, pmd_t *pmd)
>   return &mm->page_table_lock;
>  }
>  static inline void ptlock_cache_init(void) {}
> -static inline bool ptlock_init(struct page *page) { return true; }
> +static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void ptlock_free(struct page *page) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
>  static inline bool pgtable_pte_page_ctor(struct page *page)
>  {
> - if (!ptlock_init(page))
> + if (!ptlock_init(page_ptdesc(page)))
>   return false;
>   __SetPageTable(page);
>   inc_lruvec_page_state(page, NR_PAGETABLE);
> @@ -2947,7 +2947,7 @@ static inline bool pmd_ptlock_init(struct ptdesc 
> *ptdesc)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   ptdesc->pmd_huge_pte = NULL;
>  #endif
> - return ptlock_init(ptdesc_page(ptdesc));
> + return ptlock_init(ptdesc);
>  }
>  
>  static inline void pmd_ptlock_free(struct page *page)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v4 09/34] mm: Convert pmd_ptlock_init() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:58PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bb934d51390f..daecf1db6cf1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2942,12 +2942,12 @@ static inline spinlock_t *pmd_lockptr(struct 
> mm_struct *mm, pmd_t *pmd)
>   return ptlock_ptr(pmd_ptdesc(pmd));
>  }
>  
> -static inline bool pmd_ptlock_init(struct page *page)
> +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - page->pmd_huge_pte = NULL;
> + ptdesc->pmd_huge_pte = NULL;
>  #endif
> - return ptlock_init(page);
> + return ptlock_init(ptdesc_page(ptdesc));
>  }
>  
>  static inline void pmd_ptlock_free(struct page *page)
> @@ -2967,7 +2967,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return &mm->page_table_lock;
>  }
>  
> -static inline bool pmd_ptlock_init(struct page *page) { return true; }
> +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void pmd_ptlock_free(struct page *page) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
> @@ -2983,7 +2983,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>  
>  static inline bool pgtable_pmd_page_ctor(struct page *page)
>  {
> - if (!pmd_ptlock_init(page))
> + if (!pmd_ptlock_init(page_ptdesc(page)))
>   return false;
>   __SetPageTable(page);
>   inc_lruvec_page_state(page, NR_PAGETABLE);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 08/34] mm: Convert ptlock_ptr() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:57PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/x86/xen/mmu_pv.c |  2 +-
>  include/linux/mm.h| 14 +++---
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index b3b8d289b9ab..f469862e3ef4 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -651,7 +651,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct 
> mm_struct *mm)
>   spinlock_t *ptl = NULL;
>  
>  #if USE_SPLIT_PTE_PTLOCKS
> - ptl = ptlock_ptr(page);
> + ptl = ptlock_ptr(page_ptdesc(page));
>   spin_lock_nest_lock(ptl, &mm->page_table_lock);
>  #endif
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e6f1be2a405e..bb934d51390f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2828,9 +2828,9 @@ void __init ptlock_cache_init(void);
>  bool ptlock_alloc(struct ptdesc *ptdesc);
>  extern void ptlock_free(struct page *page);
>  
> -static inline spinlock_t *ptlock_ptr(struct page *page)
> +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> - return page->ptl;
> + return ptdesc->ptl;
>  }
>  #else /* ALLOC_SPLIT_PTLOCKS */
>  static inline void ptlock_cache_init(void)
> @@ -2846,15 +2846,15 @@ static inline void ptlock_free(struct page *page)
>  {
>  }
>  
> -static inline spinlock_t *ptlock_ptr(struct page *page)
> +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> - return &page->ptl;
> + return &ptdesc->ptl;
>  }
>  #endif /* ALLOC_SPLIT_PTLOCKS */
>  
>  static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(pmd_page(*pmd));
> + return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
>  static inline bool ptlock_init(struct page *page)
> @@ -2869,7 +2869,7 @@ static inline bool ptlock_init(struct page *page)
>   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
>   if (!ptlock_alloc(page_ptdesc(page)))
>   return false;
> - spin_lock_init(ptlock_ptr(page));
> + spin_lock_init(ptlock_ptr(page_ptdesc(page)));
>   return true;
>  }
>  
> @@ -2939,7 +2939,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
>  
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
> + return ptlock_ptr(pmd_ptdesc(pmd));
>  }
>  
>  static inline bool pmd_ptlock_init(struct page *page)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.

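For context, callers of the converted accessor do not change; given an mm and a pmd entry, taking the split PTE lock still looks like the sketch below (illustrative only, not code from this series):

        spinlock_t *ptl = pte_lockptr(mm, pmd); /* lock covering the PTE page of *pmd */

        spin_lock(ptl);
        /* ... read or update PTEs in that page table ... */
        spin_unlock(ptl);

When split PTE locks are disabled, pte_lockptr() falls back to &mm->page_table_lock, as the quoted fallback shows.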

Re: [PATCH v4 07/34] mm: Convert ptlock_alloc() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:56PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 6 +++---
>  mm/memory.c| 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 088b7664f897..e6f1be2a405e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2825,7 +2825,7 @@ static inline void pagetable_clear(void *x)
>  #if USE_SPLIT_PTE_PTLOCKS
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
> -extern bool ptlock_alloc(struct page *page);
> +bool ptlock_alloc(struct ptdesc *ptdesc);
>  extern void ptlock_free(struct page *page);
>  
>  static inline spinlock_t *ptlock_ptr(struct page *page)
> @@ -2837,7 +2837,7 @@ static inline void ptlock_cache_init(void)
>  {
>  }
>  
> -static inline bool ptlock_alloc(struct page *page)
> +static inline bool ptlock_alloc(struct ptdesc *ptdesc)
>  {
>   return true;
>  }
> @@ -2867,7 +2867,7 @@ static inline bool ptlock_init(struct page *page)
>* slab code uses page->slab_cache, which share storage with page->ptl.
>*/
>   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> - if (!ptlock_alloc(page))
> + if (!ptlock_alloc(page_ptdesc(page)))
>   return false;
>   spin_lock_init(ptlock_ptr(page));
>   return true;
> diff --git a/mm/memory.c b/mm/memory.c
> index 80ce9dda2779..ba9579117686 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5934,14 +5934,14 @@ void __init ptlock_cache_init(void)
>   SLAB_PANIC, NULL);
>  }
>  
> -bool ptlock_alloc(struct page *page)
> +bool ptlock_alloc(struct ptdesc *ptdesc)
>  {
>   spinlock_t *ptl;
>  
>   ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
>   if (!ptl)
>   return false;
> - page->ptl = ptl;
> + ptdesc->ptl = ptl;
>   return true;
>  }
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 06/34] mm: Convert pmd_pgtable_page() to pmd_ptdesc()

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:55PM -0700, Vishal Moola (Oracle) wrote:
> Converts pmd_pgtable_page() to pmd_ptdesc() and all its callers. This
> removes some direct accesses to struct page, working towards splitting
> out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f184f1eba85d..088b7664f897 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2931,15 +2931,15 @@ static inline void pgtable_pte_page_dtor(struct page 
> *page)
>  
>  #if USE_SPLIT_PMD_PTLOCKS
>  
> -static inline struct page *pmd_pgtable_page(pmd_t *pmd)
> +static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
>  {
>   unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
> - return virt_to_page((void *)((unsigned long) pmd & mask));
> + return virt_to_ptdesc((void *)((unsigned long) pmd & mask));
>  }
>  
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(pmd_pgtable_page(pmd));
> + return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
>  }
>  
>  static inline bool pmd_ptlock_init(struct page *page)
> @@ -2958,7 +2958,7 @@ static inline void pmd_ptlock_free(struct page *page)
>   ptlock_free(page);
>  }
>  
> -#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte)
> +#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
>  
>  #else
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.

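The conversion above relies on page tables being naturally aligned, so masking any pmd entry pointer with ~(PTRS_PER_PMD * sizeof(pmd_t) - 1) yields the base of the containing table, whose page backs the ptdesc. Restated as a commented sketch (the function name is illustrative; it assumes the usual case where a PMD table occupies one naturally aligned block):

        static inline struct ptdesc *pmd_ptdesc_sketch(pmd_t *pmd)
        {
                /* size of a full PMD table; the table base is aligned to it */
                unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);

                /* round the entry pointer down to the table base, then map
                 * that page to its descriptor */
                return virt_to_ptdesc((void *)((unsigned long)pmd & mask));
        }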

Re: [PATCH v4 05/34] mm: add utility functions for ptdesc

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:54PM -0700, Vishal Moola (Oracle) wrote:
> Introduce utility functions setting the foundation for ptdescs. These
> will also assist in the splitting out of ptdesc from struct page.
> 
> Functions that focus on the descriptor are prefixed with ptdesc_* while
> functions that focus on the pagetable are prefixed with pagetable_*.
> 
> pagetable_alloc() is defined to allocate new ptdesc pages as compound
> pages. This is to standardize ptdescs by allowing for one allocation
> and one free function, in contrast to 2 allocation and 2 free functions.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  include/asm-generic/tlb.h | 11 +++
>  include/linux/mm.h| 61 +++
>  include/linux/pgtable.h   | 12 
>  3 files changed, 84 insertions(+)
> 
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index b46617207c93..6bade9e0e799 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -481,6 +481,17 @@ static inline void tlb_remove_page(struct mmu_gather 
> *tlb, struct page *page)
>   return tlb_remove_page_size(tlb, page, PAGE_SIZE);
>  }
>  
> +static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
> +{
> + tlb_remove_table(tlb, pt);
> +}
> +
> +/* Like tlb_remove_ptdesc, but for page-like page directories. */
> +static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct 
> ptdesc *pt)
> +{
> + tlb_remove_page(tlb, ptdesc_page(pt));
> +}
> +
>  static inline void tlb_change_page_size(struct mmu_gather *tlb,
>unsigned int page_size)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0db09639dd2d..f184f1eba85d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2766,6 +2766,62 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, 
> pud_t *pud, unsigned long a
>  }
>  #endif /* CONFIG_MMU */
>  
> +static inline struct ptdesc *virt_to_ptdesc(const void *x)
> +{
> + return page_ptdesc(virt_to_page(x));
> +}
> +
> +static inline void *ptdesc_to_virt(const struct ptdesc *pt)
> +{
> + return page_to_virt(ptdesc_page(pt));
> +}
> +
> +static inline void *ptdesc_address(const struct ptdesc *pt)
> +{
> + return folio_address(ptdesc_folio(pt));
> +}
> +
> +static inline bool pagetable_is_reserved(struct ptdesc *pt)
> +{
> + return folio_test_reserved(ptdesc_folio(pt));
> +}
> +
> +/**
> + * pagetable_alloc - Allocate pagetables
> + * @gfp:GFP flags
> + * @order:  desired pagetable order
> + *
> + * pagetable_alloc allocates a page table descriptor as well as all pages
> + * described by it.

I think the order should be switched here to emphasize that primarily this
method allocates memory for page tables. How about

 pagetable_alloc allocates memory for the page tables as well as a page
 table descriptor that describes the allocated memory

> + *
> + * Return: The ptdesc describing the allocated page tables.
> + */
> +static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
> +{
> + struct page *page = alloc_pages(gfp | __GFP_COMP, order);
> +
> + return page_ptdesc(page);
> +}
> +
> +/**
> + * pagetable_free - Free pagetables
> + * @pt:  The page table descriptor
> + *
> + * pagetable_free frees a page table descriptor as well as all page
> + * tables described by said ptdesc.

Similarly here.

> + */
> +static inline void pagetable_free(struct ptdesc *pt)
> +{
> + struct page *page = ptdesc_page(pt);
> +
> + __free_pages(page, compound_order(page));
> +}
> +
> +static inline void pagetable_clear(void *x)
> +{
> + clear_page(x);
> +}
> +
>  #if USE_SPLIT_PTE_PTLOCKS
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
> @@ -2992,6 +3048,11 @@ static inline void mark_page_reserved(struct page 
> *page)
>   adjust_managed_page_count(page, -1);
>  }
>  
> +static inline void free_reserved_ptdesc(struct ptdesc *pt)
> +{
> + free_reserved_page(ptdesc_page(pt));
> +}
> +
>  /*
>   * Default method to free all the __init memory into the buddy system.
>   * The freed pages will be poisoned with pattern "poison" if it's within
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 330de96ebfd6..c405f74d3875 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1026,6 +1026,18 @@ TABLE_MATCH(ptl, ptl);
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
>  
> +#define ptdesc_page(pt)  (_Generic((pt), 
> \
> + const struct ptdesc *:  (const struct page *)(pt),  \
> + struct ptdesc *:(struct page *)(pt)))
> +
> +#define ptdesc_folio(pt) (_Generic((pt), \
> + const struct ptdesc *:  (const struct folio *)(pt), \
> + struct ptdesc *:(struct folio *)(pt)))
> +
> +#define page_ptdesc(p)  

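To make the intended calling convention concrete, a rough sketch of an architecture helper built on the new API (illustrative only; the function names and GFP flags are assumptions, not part of this series):

        static pte_t *pte_alloc_one_kernel_sketch(struct mm_struct *mm)
        {
                struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL | __GFP_ZERO, 0);

                if (!ptdesc)
                        return NULL;
                return ptdesc_address(ptdesc);  /* kernel address of the new table */
        }

        static void pte_free_kernel_sketch(struct mm_struct *mm, pte_t *pte)
        {
                pagetable_free(virt_to_ptdesc(pte));    /* one call frees desc and pages */
        }

One allocation and one free cover both the descriptor and the page table memory, which is the standardization the commit message describes.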

Re: [PATCH v4 04/34] pgtable: Create struct ptdesc

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:53PM -0700, Vishal Moola (Oracle) wrote:
> Currently, page table information is stored within struct page. As part
> of simplifying struct page, create struct ptdesc for page table
> information.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/pgtable.h | 51 +
>  1 file changed, 51 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..330de96ebfd6 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -975,6 +975,57 @@ static inline void ptep_modify_prot_commit(struct 
> vm_area_struct *vma,
>  #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
>  #endif /* CONFIG_MMU */
>  
> +
> +/**
> + * struct ptdesc - Memory descriptor for page tables.
> + * @__page_flags: Same as page flags. Unused for page tables.
> + * @pt_list: List of used page tables. Used for s390 and x86.
> + * @_pt_pad_1: Padding that aliases with page's compound head.
> + * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
> + * @_pt_s390_gaddr: Aliases with page's mapping. Used for s390 gmap only.
> + * @pt_mm: Used for x86 pgds.
> + * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 
> only.
> + * @ptl: Lock for the page table.

Do you mind aligning the descriptions by @pt_frag_refcount? I think it'll
be more readable.

> + *
> + * This struct overlays struct page for now. Do not modify without a good
> + * understanding of the issues.
> + */
> +struct ptdesc {
> + unsigned long __page_flags;
> +
> + union {
> + struct list_head pt_list;
> + struct {
> + unsigned long _pt_pad_1;
> + pgtable_t pmd_huge_pte;
> + };
> + };
> + unsigned long _pt_s390_gaddr;
> +
> + union {
> + struct mm_struct *pt_mm;
> + atomic_t pt_frag_refcount;
> + };
> +
> +#if ALLOC_SPLIT_PTLOCKS
> + spinlock_t *ptl;
> +#else
> + spinlock_t ptl;
> +#endif
> +};
> +
> +#define TABLE_MATCH(pg, pt)  \
> + static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
> +TABLE_MATCH(flags, __page_flags);
> +TABLE_MATCH(compound_head, pt_list);
> +TABLE_MATCH(compound_head, _pt_pad_1);
> +TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
> +TABLE_MATCH(mapping, _pt_s390_gaddr);
> +TABLE_MATCH(pt_mm, pt_mm);
> +TABLE_MATCH(ptl, ptl);
> +#undef TABLE_MATCH
> +static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> +
>  /*
>   * No-op macros that just return the current protection value. Defined here
>   * because these macros can be used even if CONFIG_MMU is not defined.
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.

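For reference, the alignment being requested would make the kernel-doc block look roughly like this (same text as in the patch, with the descriptions lined up on the longest member name):

         * @__page_flags:     Same as page flags. Unused for page tables.
         * @pt_list:          List of used page tables. Used for s390 and x86.
         * @_pt_pad_1:        Padding that aliases with page's compound head.
         * @pmd_huge_pte:     Protected by ptdesc->ptl, used for THPs.
         * @_pt_s390_gaddr:   Aliases with page's mapping. Used for s390 gmap only.
         * @pt_mm:            Used for x86 pgds.
         * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
         * @ptl:              Lock for the page table.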



Re: [PATCH v4 03/34] s390: Use pt_frag_refcount for pagetables

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:52PM -0700, Vishal Moola (Oracle) wrote:
> s390 currently uses _refcount to identify fragmented page tables.
> The page table struct already has a member pt_frag_refcount used by
> powerpc, so have s390 use that instead of the _refcount field as well.
> This improves the safety for _refcount and the page table tracking.
> 
> This also allows us to simplify the tracking since we can once again use
> the lower byte of pt_frag_refcount instead of the upper byte of _refcount.
> 
> Signed-off-by: Vishal Moola (Oracle) 

One nit below, otherwise

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/s390/mm/pgalloc.c | 38 +++---
>  1 file changed, 15 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index 66ab68db9842..6b99932abc66 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -182,20 +182,17 @@ void page_table_free_pgste(struct page *page)
>   * As follows from the above, no unallocated or fully allocated parent
>   * pages are contained in mm_context_t::pgtable_list.
>   *
> - * The upper byte (bits 24-31) of the parent page _refcount is used
> + * The lower byte (bits 0-7) of the parent page pt_frag_refcount is used
>   * for tracking contained 2KB-pgtables and has the following format:
>   *
>   *   PP  AA
> - * 01234567upper byte (bits 24-31) of struct page::_refcount
> + * 01234567upper byte (bits 0-7) of struct page::pt_frag_refcount

Nit:  lower

>   *   ||  ||
>   *   ||  |+--- upper 2KB-pgtable is allocated
>   *   ||  + lower 2KB-pgtable is allocated
>   *   |+--- upper 2KB-pgtable is pending for removal
>   *   + lower 2KB-pgtable is pending for removal
>   *
> - * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
> - * using _refcount is possible).
> - *
>   * When 2KB-pgtable is allocated the corresponding AA bit is set to 1.
>   * The parent page is either:
>   *   - added to mm_context_t::pgtable_list in case the second half of the
> @@ -243,11 +240,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   if (!list_empty(&mm->context.pgtable_list)) {
>   page = list_first_entry(&mm->context.pgtable_list,
>   struct page, lru);
> - mask = atomic_read(&page->_refcount) >> 24;
> + mask = atomic_read(&page->pt_frag_refcount);
>   /*
>* The pending removal bits must also be checked.
>* Failure to do so might lead to an impossible
> -  * value of (i.e 0x13 or 0x23) written to _refcount.
> +  * value of (i.e 0x13 or 0x23) written to
> +  * pt_frag_refcount.
>* Such values violate the assumption that pending and
>* allocation bits are mutually exclusive, and the rest
>* of the code unrails as result. That could lead to
> @@ -259,8 +257,8 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   bit = mask & 1; /* =1 -> second 2K */
>   if (bit)
>   table += PTRS_PER_PTE;
> - atomic_xor_bits(&page->_refcount,
> - 0x01U << (bit + 24));
> + atomic_xor_bits(&page->pt_frag_refcount,
> + 0x01U << bit);
>   list_del(&page->lru);
>   }
>   }
> @@ -281,12 +279,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   table = (unsigned long *) page_to_virt(page);
>   if (mm_alloc_pgste(mm)) {
>   /* Return 4K page table with PGSTEs */
> - atomic_xor_bits(&page->_refcount, 0x03U << 24);
> + atomic_xor_bits(&page->pt_frag_refcount, 0x03U);
>   memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
>   memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
>   } else {
>   /* Return the first 2K fragment of the page */
> - atomic_xor_bits(&page->_refcount, 0x01U << 24);
> + atomic_xor_bits(&page->pt_frag_refcount, 0x01U);
>   memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
>   spin_lock_bh(&mm->context.lock);
>   list_add(&page->lru, &mm->context.pgtable_list);
> @@ -323,22 +321,19 @@ void page_table_free(struct mm_struct *mm, unsigned 
> long *table)
>   
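
To illustrate the layout described above, the lower byte could be tested with
helpers along these lines (a sketch for illustration only, not part of the
patch; the helper names are made up):

static inline bool pt_frag_lower_allocated(struct page *page)
{
	/* AA bit 0: lower 2KB-pgtable is allocated */
	return atomic_read(&page->pt_frag_refcount) & 0x01U;
}

static inline bool pt_frag_upper_allocated(struct page *page)
{
	/* AA bit 1: upper 2KB-pgtable is allocated */
	return atomic_read(&page->pt_frag_refcount) & 0x02U;
}

static inline bool pt_frag_pending_removal(struct page *page)
{
	/* PP bits: lower/upper 2KB-pgtable is pending for removal */
	return atomic_read(&page->pt_frag_refcount) & 0x30U;
}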

Re: [PATCH v4 02/34] s390: Use _pt_s390_gaddr for gmap address tracking

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:51PM -0700, Vishal Moola (Oracle) wrote:
> s390 uses page->index to keep track of page tables for the guest address
> space. In an attempt to consolidate the usage of page fields in s390,
> replace _pt_pad_2 with _pt_s390_gaddr to replace page->index in gmap.
> 
> This will help with the splitting of struct ptdesc from struct page, as
> well as allow s390 to use _pt_frag_refcount for fragmented page table
> tracking.
> 
> Since page->_pt_s390_gaddr aliases with mapping, ensure it's set to NULL
> before freeing the pages as well.

I'm looking at the final result and unless I've missed something, setting
of _pt_s390_gaddr to 0 is always followed by pagetable_free().
Can't we have pagetable_free() take care of zeroing _pt_s390_gaddr?
I think patch 16 ("s390: Convert various gmap functions to use ptdescs")
would be the right place for that.
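
E.g. something like this (an untested sketch of what I have in mind, presuming
the field keeps this name on the ptdesc side):

static inline void pagetable_free(struct ptdesc *pt)
{
	struct page *page = ptdesc_page(pt);

	/* _pt_s390_gaddr aliases page->mapping, clear it before freeing */
	pt->_pt_s390_gaddr = 0;
	__free_pages(page, compound_order(page));
}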

Otherwise:

Acked-by: Mike Rapoport (IBM) 
 
> This also reverts commit 7e25de77bc5ea ("s390/mm: use pmd_pgtable_page()
> helper in __gmap_segment_gaddr()") which had s390 use
> pmd_pgtable_page() to get a gmap page table, as pmd_pgtable_page()
> should be used for more generic process page tables.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/s390/mm/gmap.c  | 56 +++-
>  include/linux/mm_types.h |  2 +-
>  2 files changed, 39 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index dc90d1eb0d55..81c683426b49 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -70,7 +70,7 @@ static struct gmap *gmap_alloc(unsigned long limit)
>   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
>   if (!page)
>   goto out_free;
> - page->index = 0;
> + page->_pt_s390_gaddr = 0;
>   list_add(&page->lru, &gmap->crst_list);
>   table = page_to_virt(page);
>   crst_table_init(table, etype);
> @@ -187,16 +187,20 @@ static void gmap_free(struct gmap *gmap)
>   if (!(gmap_is_shadow(gmap) && gmap->removed))
>   gmap_flush_tlb(gmap);
>   /* Free all segment & region tables. */
> - list_for_each_entry_safe(page, next, &gmap->crst_list, lru)
> + list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
> + page->_pt_s390_gaddr = 0;
>   __free_pages(page, CRST_ALLOC_ORDER);
> + }
>   gmap_radix_tree_free(&gmap->guest_to_host);
>   gmap_radix_tree_free(&gmap->host_to_guest);
>  
>   /* Free additional data for a shadow gmap */
>   if (gmap_is_shadow(gmap)) {
>   /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
> + list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
> + }
>   gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
>   /* Release reference to the parent */
>   gmap_put(gmap->parent);
> @@ -318,12 +322,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>   list_add(&page->lru, &gmap->crst_list);
>   *table = __pa(new) | _REGION_ENTRY_LENGTH |
>   (*table & _REGION_ENTRY_TYPE_MASK);
> - page->index = gaddr;
> + page->_pt_s390_gaddr = gaddr;
>   page = NULL;
>   }
>   spin_unlock(&gmap->guest_table_lock);
> - if (page)
> + if (page) {
> + page->_pt_s390_gaddr = 0;
>   __free_pages(page, CRST_ALLOC_ORDER);
> + }
>   return 0;
>  }
>  
> @@ -336,12 +342,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>  static unsigned long __gmap_segment_gaddr(unsigned long *entry)
>  {
>   struct page *page;
> - unsigned long offset;
> + unsigned long offset, mask;
>  
>   offset = (unsigned long) entry / sizeof(unsigned long);
>   offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
> - page = pmd_pgtable_page((pmd_t *) entry);
> - return page->index + offset;
> + mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
> + page = virt_to_page((void *)((unsigned long) entry & mask));
> +
> + return page->_pt_s390_gaddr + offset;
>  }
>  
>  /**
> @@ -1351,6 +1359,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned 
> long raddr)
>   /* Free page table */
>   page = phys_to_page(pgt);
>   list_del(&page->lru);
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
>  }
>  
> @@ -1379,6 +1388,7 @@ static void __gmap_unshadow_sg

Re: [PATCH v4 01/34] mm: Add PAGE_TYPE_OP folio functions

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:50PM -0700, Vishal Moola (Oracle) wrote:
> No folio equivalents for page type operations have been defined, so
> define them for later folio conversions.
> 
> Also changes the Page##uname macros to take in const struct page* since
> we only read the memory here.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/page-flags.h | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 92a2063a0a23..e99a616b9bcd 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -908,6 +908,8 @@ static inline bool is_page_hwpoison(struct page *page)
>  
>  #define PageType(page, flag) \
>   ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +#define folio_test_type(folio, flag) \
> + ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>  
>  static inline int page_type_has_type(unsigned int page_type)
>  {
> @@ -920,20 +922,34 @@ static inline int page_has_type(struct page *page)
>  }
>  
>  #define PAGE_TYPE_OPS(uname, lname)  \
> -static __always_inline int Page##uname(struct page *page)\
> +static __always_inline int Page##uname(const struct page *page)  
> \
>  {\
>   return PageType(page, PG_##lname);  \
>  }\
> +static __always_inline int folio_test_##lname(const struct folio *folio)\
> +{\
> + return folio_test_type(folio, PG_##lname);  \
> +}\
>  static __always_inline void __SetPage##uname(struct page *page)  
> \
>  {\
>   VM_BUG_ON_PAGE(!PageType(page, 0), page);   \
>   page->page_type &= ~PG_##lname; \
>  }\
> +static __always_inline void __folio_set_##lname(struct folio *folio) \
> +{\
> + VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
> + folio->page.page_type &= ~PG_##lname;   \
> +}\
>  static __always_inline void __ClearPage##uname(struct page *page)\
>  {\
>   VM_BUG_ON_PAGE(!Page##uname(page), page);   \
>   page->page_type |= PG_##lname;  \
> -}
> +}\
> +static __always_inline void __folio_clear_##lname(struct folio *folio)   
> \
> +{\
> + VM_BUG_ON_FOLIO(!folio_test_##lname(folio), folio); \
> + folio->page.page_type |= PG_##lname;\
> +}\
>  
>  /*
>   * PageBuddy() indicates that the page is free and in the buddy system
> -- 
> 2.40.1
> 
> 
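
For instance, with PAGE_TYPE_OPS(Table, table) this generates the folio
variants alongside the existing page ones, so later conversions can do
something like (illustrative only, not from this patch):

	struct folio *folio = page_folio(page);

	if (folio_test_table(folio))		/* was: PageTable(page) */
		__folio_clear_table(folio);	/* was: __ClearPageTable(page) */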

-- 
Sincerely yours,
Mike.


Re: [PATCH v9 01/42] mm: Rename arch pte_mkwrite()'s to pte_mkwrite_novma()

2023-06-13 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 05:10:27PM -0700, Rick Edgecombe wrote:
> The x86 Shadow stack feature includes a new type of memory called shadow
> stack. This shadow stack memory has some unusual properties, which requires
> some core mm changes to function properly.
> 
> One of these unusual properties is that shadow stack memory is writable,
> but only in limited ways. These limits are applied via a specific PTE
> bit combination. Nevertheless, the memory is writable, and core mm code
> will need to apply the writable permissions in the typical paths that
> call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
> that the x86 implementation of it can know whether to create regular
> writable memory or shadow stack memory.

Nit:^ mapping?

> But there are a couple of challenges to this. Modifying the signatures of
> each arch pte_mkwrite() implementation would be error prone because some
> are generated with macros and would need to be re-implemented. Also, some
> pte_mkwrite() callers operate on kernel memory without a VMA.
> 
> So this can be done in a three step process. First pte_mkwrite() can be
> renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
> added that just calls pte_mkwrite_novma(). Next callers without a VMA can
> be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
> can be changed to take/pass a VMA.
> 
> Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and
> adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same
> pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(),
> create a new arch config HAS_HUGE_PAGE that can be used to tell if
> pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the
> compiler would not be able to find pmd_mkwrite_novma().
> 
> No functional change.
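
In other words, the wrappers described here amount to something like this in
include/linux/pgtable.h (sketching the shape only, modulo the exact ifdef-ery;
see the patch for the real hunk):

static inline pte_t pte_mkwrite(pte_t pte)
{
	return pte_mkwrite_novma(pte);
}

#ifdef CONFIG_HAS_HUGE_PAGE
static inline pmd_t pmd_mkwrite(pmd_t pmd)
{
	return pmd_mkwrite_novma(pmd);
}
#endif
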
> 
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux-al...@vger.kernel.org
> Cc: linux-snps-...@lists.infradead.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-c...@vger.kernel.org
> Cc: linux-hexa...@vger.kernel.org
> Cc: linux-i...@vger.kernel.org
> Cc: loonga...@lists.linux.dev
> Cc: linux-m...@lists.linux-m68k.org
> Cc: Michal Simek 
> Cc: Dinh Nguyen 
> Cc: linux-m...@vger.kernel.org
> Cc: openr...@lists.librecores.org
> Cc: linux-par...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ri...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: sparcli...@vger.kernel.org
> Cc: linux...@lists.infradead.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux...@kvack.org
> Suggested-by: Linus Torvalds 
> Signed-off-by: Rick Edgecombe 
> Link: 
> https://lore.kernel.org/lkml/CAHk-=wizjsu7c9sfyzb3q04108stghff2wfbokgccgw7riz...@mail.gmail.com/

Reviewed-by: Mike Rapoport (IBM) 

-- 
Sincerely yours,
Mike.


Re: [PATCH 00/13] mm: jit/text allocator

2023-06-13 Thread Mike Rapoport
On Tue, Jun 13, 2023 at 02:56:14PM -0400, Kent Overstreet wrote:
> On Thu, Jun 08, 2023 at 09:41:16PM +0300, Mike Rapoport wrote:
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland  wrote:
> > > 
> > > [...]
> > > 
> > > > > > > Can you give more detail on what parameters you need? If the only 
> > > > > > > extra
> > > > > > > parameter is just "does this allocation need to live close to 
> > > > > > > kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each 
> > > > > > caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >   jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >   jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > > 
> > > How about we start with two APIs:
> > >  jit_text_alloc(size);
> > >  jit_text_alloc_range(size, start, end);
> > > 
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >  
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of the modules area may be useful for other architectures.
> > 
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want to allocate memory for
> > the generated code.
> 
> So, the start + end just comes from the need to keep relative pointers
> under a certain size. I think this could be just a flag, I see no reason
> to expose actual addresses here.

It's the other way around. The start + end comes from the need to restrict
allocation to a certain range because of the relative addressing. I don't see
how a flag can help here.
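
To make it concrete, on x86 the restriction is literally the allocation range
(simplified from arch/x86/kernel/module.c, from memory; the real code also
adds a randomized load offset and extra vm_flags):

void *module_alloc(unsigned long size)
{
	/*
	 * Keep modules within [MODULES_VADDR, MODULES_END] so that
	 * 32-bit PC-relative relocations can reach the kernel image.
	 */
	return __vmalloc_node_range(size, MODULE_ALIGN,
				    MODULES_VADDR, MODULES_END,
				    GFP_KERNEL, PAGE_KERNEL, 0,
				    NUMA_NO_NODE,
				    __builtin_return_address(0));
}

A flag could say "needs to be close to kernel text", but somewhere the
architecture still has to turn that into an actual start/end pair.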

-- 
Sincerely yours,
Mike.


Re: [PATCH v9 02/42] mm: Move pte/pmd_mkwrite() callers with no VMA to _novma()

2023-06-13 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 05:10:28PM -0700, Rick Edgecombe wrote:
> The x86 Shadow stack feature includes a new type of memory called shadow
> stack. This shadow stack memory has some unusual properties, which requires
> some core mm changes to function properly.
> 
> One of these unusual properties is that shadow stack memory is writable,
> but only in limited ways. These limits are applied via a specific PTE
> bit combination. Nevertheless, the memory is writable, and core mm code
> will need to apply the writable permissions in the typical paths that
> call pte_mkwrite(). Future patches will make pte_mkwrite() take a VMA, so
> that the x86 implementation of it can know whether to create regular
> writable memory or shadow stack memory.

Nit:^ mappings?
 
> But there are a couple of challenges to this. Modifying the signatures of
> each arch pte_mkwrite() implementation would be error prone because some
> are generated with macros and would need to be re-implemented. Also, some
> pte_mkwrite() callers operate on kernel memory without a VMA.
> 
> So this can be done in a three step process. First pte_mkwrite() can be
> renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
> added that just calls pte_mkwrite_novma(). Next callers without a VMA can
> be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
> can be changed to take/pass a VMA.
> 
> Previous patches have done the first step, so next move the callers that
> don't have a VMA to pte_mkwrite_novma(). Also do the same for

I hear x86 maintainers asking to drop "previous patches" ;-)

Maybe
This is the second step of the conversion that moves the callers ...

> pmd_mkwrite(). This will be ok for the shadow stack feature, as these
> callers are on kernel memory which will not need to be made shadow stack,
> and the other architectures only currently support one type of memory
> in pte_mkwrite()
> 
> Cc: linux-...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-s...@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Cc: linux-a...@vger.kernel.org
> Cc: linux...@kvack.org
> Signed-off-by: Rick Edgecombe 

Reviewed-by: Mike Rapoport (IBM) 

-- 
Sincerely yours,
Mike.




Re: [PATCH 00/13] mm: jit/text allocator

2023-06-12 Thread Mike Rapoport
On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport  wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland  wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only 
> > > > > > > extra
> > > > > > > parameter is just "does this allocation need to live close to 
> > > > > > > kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each 
> > > > > > caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > >   jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > >   jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > >  jit_text_alloc(size);
> > >  jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of the modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want to allocate memory for
> > the generated code.
> 
> Yeah, this makes sense.
> 
> >
> > > > > It still can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the 
> > > > callsites
> > > > can pass that information.
> > >
> > > I think we only have two types of users: module and everything else (ftrace, 
> > > kprobe,
> > > bpf stuff). The key differences are:
> > >
> > >   1. module uses text and data; while everything else only uses text.
> > >   2. module code is generated by the compiler, and thus has stronger
> > >   requirements in address ranges; everything else are generated via some
> > >   JIT or manual written assembly, so they are more flexible with address
> > >   ranges (in JIT, we can avoid using instructions that requires a specific
> > >   address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to 
> > > further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> > struct execmem_range {
> > unsigned long   start;
> > unsigned long   end;
> > unsigned long   fallback_start;
> > unsigned long   fallback_end;
> > pgprot_t pgprot;
> > unsigned int alignment;
> > };
> >
> > struct execmem_modules_range {
> > enum execmem_module_flags flags;
> > struct execmem_range text;
> > struct execmem_range data;
> > };
> >
> > struct execmem_jit_range {
> > struct execmem_range text;
> >

Re: [PATCH 00/13] mm: jit/text allocator

2023-06-08 Thread Mike Rapoport
On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland  wrote:
> 
> [...]
> 
> > > > > Can you give more detail on what parameters you need? If the only 
> > > > > extra
> > > > > parameter is just "does this allocation need to live close to kernel
> > > > > text", that's not that big of a deal.
> > > >
> > > > My thinking was that we at least need the start + end for each caller. 
> > > > That
> > > > might be it, tbh.
> > >
> > > Do you mean that modules will have something like
> > >
> > >   jit_text_alloc(size, MODULES_START, MODULES_END);
> > >
> > > and kprobes will have
> > >
> > >   jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > ?
> >
> > Yes.
> 
> How about we start with two APIs:
>  jit_text_alloc(size);
>  jit_text_alloc_range(size, start, end);
> 
> AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> not quite convinced it is needed.
 
Right now arm64 and riscv override bpf and kprobes allocations to use the
entire vmalloc address space, but having the ability to allocate generated
code outside of the modules area may be useful for other architectures.

Still the start + end for the callers feels backwards to me because the
callers do not define the ranges, but rather the architectures, so we still
need a way for architectures to define how they want to allocate memory for
the generated code.

> > > It still can be achieved with a single jit_alloc_arch_params(), just by
> > > adding enum jit_type parameter to jit_text_alloc().
> >
> > That feels backwards to me; it centralizes a bunch of information about
> > distinct users to be able to shove that into a static array, when the 
> > callsites
> > can pass that information.
> 
> I think we only have two types of users: module and everything else (ftrace, kprobe,
> bpf stuff). The key differences are:
> 
>   1. module uses text and data; while everything else only uses text.
>   2. module code is generated by the compiler, and thus has stronger
>   requirements in address ranges; everything else are generated via some
>   JIT or manual written assembly, so they are more flexible with address
>   ranges (in JIT, we can avoid using instructions that requires a specific
>   address range).
> 
> The next question is, can we have the two types of users share the same
> address ranges? If not, we can reserve the preferred range for modules,
> and let everything else use the other range. I don't see reasons to further
> separate users in the "everything else" group.
 
I agree that we can define only two types: modules and everything else and
let the architectures define if they need different ranges for these two
types, or want the same range for everything.

With only two types we can have two API calls for alloc, and a single
structure that defines the ranges etc from the architecture side rather
than spread all over.

Like something along these lines:

struct execmem_range {
unsigned long   start;
unsigned long   end;
unsigned long   fallback_start;
unsigned long   fallback_end;
pgprot_t pgprot;
unsigned int alignment;
};

struct execmem_modules_range {
enum execmem_module_flags flags;
struct execmem_range text;
struct execmem_range data;
};

struct execmem_jit_range {
struct execmem_range text;
};

struct execmem_params {
struct execmem_modules_range modules;
struct execmem_jit_range jit;
};

struct execmem_params *execmem_arch_params(void);

void *execmem_text_alloc(size_t size);
void *execmem_data_alloc(size_t size);
void execmem_free(void *ptr);

void *jit_text_alloc(size_t size);
void jit_free(void *ptr);

Modules or anything that must live close to the kernel image can use
execmem_*_alloc() and the callers that don't generally care about relative
addressing will use jit_text_alloc(), presuming that arch will restrict jit
range if necessary, like e.g. below for arm64 jit can be anywhere in
vmalloc and for x86 and s390 it will share the modules range. 


struct execmem_params arm64_execmem = {
.modules = {
.flags = KASAN,
.text = {
.start = MODULES_VADDR,
.end = MODULES_END,
.pgprot = PAGE_KERNEL_ROX,
.fallback_start = VMALLOC_START,
.fallback_end = VMALLOC_END,
},
},
.jit = {
.text = {
.start = VMALLOC_START,
.end = VMALLOC_END,
.pgprot = 

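On the caller side the split would look roughly like this (illustrative only;
the sizes and variables are placeholders):

	/* module loader: text and data must be close to the kernel image */
	void *mod_text = execmem_text_alloc(text_size);
	void *mod_data = execmem_data_alloc(data_size);

	/* BPF/kprobes: no particular placement requirement */
	void *insn_buf = jit_text_alloc(prog_size);

	...

	jit_free(insn_buf);
	execmem_free(mod_data);
	execmem_free(mod_text);
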

Re: [PATCH 00/13] mm: jit/text allocator

2023-06-06 Thread Mike Rapoport
On Mon, Jun 05, 2023 at 11:09:34AM +0100, Mark Rutland wrote:
> On Mon, Jun 05, 2023 at 12:20:40PM +0300, Mike Rapoport wrote:
> > On Fri, Jun 02, 2023 at 10:35:09AM +0100, Mark Rutland wrote:
> >
> > It still can be achieved with a single jit_alloc_arch_params(), just by
> > adding enum jit_type parameter to jit_text_alloc().
> 
> That feels backwards to me; it centralizes a bunch of information about
> distinct users to be able to shove that into a static array, when the 
> callsites
> can pass that information. 

The goal was not to shove everything into an array, but to centralize
architecture requirements for code allocations. The callsites don't have
that information per se, they get it from the arch code, so having this
information in a single place per arch is better than spreading
MODULE_START, KPROBES_START etc all over.

I'd agree though that having types for jit_text_alloc is ugly and this
should be handled differently.
 
> What's *actually* common after separating out the ranges? Is it just the
> permissions?

On x86 everything, on arm64 apparently just the permissions.

I've started to summarize the restrictions for code placement for modules,
kprobes and bpf on different architectures; that's roughly what I've got so
far:

* x86 and s390 need everything within the modules address space because of
PC-relative addressing
* arm, arm64, loongarch, sparc64, riscv64, and some mips and powerpc32
configurations require a dedicated modules address space; the rest just use
the vmalloc address space
* all architectures that support kprobes except x86 and s390 don't use
relative jumps, so they don't care where the kprobes insn_page will live
* not sure yet about BPF. Looks like on arm and arm64 it does not use
relative jumps, so it can be anywhere; I didn't dig enough into the others.

> If we want this to be able to share allocations and so on, why can't we do 
> this
> like a kmem_cache, and have the callsite pass a pointer to the allocator data?
> That would make it easy for callsites to share an allocator or use a distinct
> one.

This may be something worth exploring.
 
> Thanks,
> Mark.

-- 
Sincerely yours,
Mike.


Re: [PATCH 12/13] x86/jitalloc: prepare to allocate executable memory as ROX

2023-06-05 Thread Mike Rapoport
On Mon, Jun 05, 2023 at 04:10:21PM +, Edgecombe, Rick P wrote:
> On Mon, 2023-06-05 at 11:11 +0300, Mike Rapoport wrote:
> > On Sun, Jun 04, 2023 at 10:52:44PM -0400, Steven Rostedt wrote:
> > > On Thu, 1 Jun 2023 16:54:36 -0700
> > > Nadav Amit  wrote:
> > > 
> > > > > The way text_poke() is used here, it is creating a new writable
> > > > > alias
> > > > > and flushing it for *each* write to the module (like for each
> > > > > write of
> > > > > an individual relocation, etc). I was just thinking it might
> > > > > warrant
> > > > > some batching or something.  
> > 
> > > > I am not advocating to do so, but if you want to have many
> > > > efficient
> > > > writes, perhaps you can just disable CR0.WP. Just saying that if
> > > > you
> > > > are about to write all over the memory, text_poke() does not
> > > > provide
> > > > too much security for the poking thread.
> > 
Heh, this is definitely an easier hack to implement :)
> 
> I don't know the details, but previously there was some strong dislike
> of CR0.WP toggling. And now there is also the problem of CET. Setting
> CR0.WP=0 will #GP if CR4.CET is 1 (as it currently is for kernel IBT).
> I guess you might get away with toggling them both in some controlled
> situation, but it might be a lot easier to hack up than to be made
> fully acceptable. It does sound much more efficient though.
 
I don't think we'd really want that, especially looking at 

WARN_ONCE(bits_missing, "CR0 WP bit went missing!?\n");

at native_write_cr0().
 
> > > Batching does exist, which is what the text_poke_queue() thing
> > > does.
> > 
> > For module loading text_poke_queue() will still be much slower than a
> > bunch
> > of memset()s for no good reason because we don't need all the
> > complexity of
> > text_poke_bp_batch() for module initialization because we are sure we
> > are
> > not patching live code.
> > 
> > What we'd need here is a new batching mode that will create a
> > writable
> > alias mapping at the beginning of apply_relocate_*() and
> > module_finalize(),
> > then it will use memcpy() to that writable alias and will tear the
> > mapping
> > down in the end.
> 
> It's probably only a tiny bit faster than keeping a separate writable
> allocation and text_poking it in at the end.

Right, but it still will be faster than text_poking every relocation.
 
> > Another option is to teach alternatives to update a writable copy
> > rather
> > than do in place changes like Song suggested. My feeling is that it
> > will be
> > more intrusive change though.
> 
> You mean keeping a separate RW allocation and then text_poking() the
> whole thing in when you are done? That is what I was trying to say at
> the beginning of this thread. The other benefit is you don't make the
> intermediate loading states of the module, executable.
> 
> I tried this technique previously [0], and I thought it was not too
> bad. In most of the callers it looks similar to what you have in
> do_text_poke(). Sometimes less, sometimes more. It might need
> enlightening of some of the stuff currently using text_poke() during
> module loading, like jump labels. So that bit is more intrusive, yea.
> But it sounds so much cleaner and well controlled. Did you have a
> particular trouble spot in mind?

Nothing in particular, except the intrusive part. Besides the changes in
modules.c we'd need to teach alternatives to deal with a writable copy.
 
> [0]
> https://lore.kernel.org/lkml/20201120202426.18009-5-rick.p.edgeco...@intel.com/

-- 
Sincerely yours,
Mike.


Re: [PATCH 00/13] mm: jit/text allocator

2023-06-05 Thread Mike Rapoport
On Fri, Jun 02, 2023 at 10:35:09AM +0100, Mark Rutland wrote:
> On Thu, Jun 01, 2023 at 02:14:56PM -0400, Kent Overstreet wrote:
> > On Thu, Jun 01, 2023 at 05:12:03PM +0100, Mark Rutland wrote:
> > > For a while I have wanted to give kprobes its own allocator so that it 
> > > can work
> > > even with CONFIG_MODULES=n, and so that it doesn't have to waste VA space 
> > > in
> > > the modules area.
> > > 
> > > Given that, I think these should have their own allocator functions that 
> > > can be
> > > provided independently, even if those happen to use common infrastructure.
> > 
> > How much memory can kprobes conceivably use? I think we also want to try
> > to push back on combinatorial new allocators, if we can.
> 
> That depends on who's using it, and how (e.g. via BPF).
> 
> To be clear, I'm not necessarily asking for entirely different allocators, but
> I do think that we want wrappers that can at least pass distinct start+end
> parameters to a common allocator, and for arm64's modules code I'd expect that
> we'd keep the range fallback logic out of the common allocator, and just call
> it twice.
> 
> > > > Several architectures override module_alloc() because of various
> > > > constraints where the executable memory can be located and this causes
> > > > additional obstacles for improvements of code allocation.
> > > > 
> > > > This set splits code allocation from modules by introducing
> > > > jit_text_alloc(), jit_data_alloc() and jit_free() APIs, replaces call
> > > > sites of module_alloc() and module_memfree() with the new APIs and
> > > > implements core text and related allocation in a central place.
> > > > 
> > > > Instead of architecture specific overrides for module_alloc(), the
> > > > architectures that require non-default behaviour for text allocation 
> > > > must
> > > > fill jit_alloc_params structure and implement jit_alloc_arch_params() 
> > > > that
> > > > returns a pointer to that structure. If an architecture does not 
> > > > implement
> > > > jit_alloc_arch_params(), the defaults compatible with the current
> > > > modules::module_alloc() are used.
> > > 
> > > As above, I suspect that each of the callsites should probably be using 
> > > common
> > > infrastructure, but I don't think that a single jit_alloc_arch_params() 
> > > makes
> > > sense, since the parameters for each case may need to be distinct.
> > 
> > I don't see how that follows. The whole point of function parameters is
> > that they may be different :)
> 
> What I mean is that jit_alloc_arch_params() tries to aggregate common
> parameters, but they aren't actually common (e.g. the actual start+end range
> for allocation).

jit_alloc_arch_params() tries to aggregate architecture constraints and
requirements for allocations of executable memory, and this is exactly what
the first 6 patches of this set do.

A while ago Thomas suggested using a structure that parametrizes
architecture constraints by the memory type used in modules [1] and Song
implemented the infrastructure for it and the x86 part [2].

I liked the idea of defining parameters in a single structure, but I
thought that approaching the problem from the arch side rather than from
the modules perspective would be a better starting point, hence these patches.

I don't see a fundamental reason why a single structure cannot describe
what is needed for different code allocation cases, be it modules, kprobes
or bpf. There is of course an assumption that the core allocations will be
the same for all the users, and it seems to me that something like 

* allocate physical memory if allocator caches are empty
* map it in vmalloc or modules address space
* return memory from the allocator cache to the caller

will work for all use cases.
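
Very roughly, and only as an illustration (jit_cache and all of its helpers
below are made-up names, not something this series defines), that flow could
look like:

struct jit_cache {
	unsigned long	start, end;	/* arch-provided address range */
	unsigned long	align;
	unsigned long	refill_size;
	pgprot_t	pgprot;
	/* free list management omitted */
};

static void *jit_cache_alloc(struct jit_cache *cache, size_t len)
{
	void *p;

	if (jit_cache_is_empty(cache)) {
		/* allocate physical memory and map it in the arch range */
		p = __vmalloc_node_range(cache->refill_size, cache->align,
					 cache->start, cache->end, GFP_KERNEL,
					 cache->pgprot, VM_FLUSH_RESET_PERMS,
					 NUMA_NO_NODE,
					 __builtin_return_address(0));
		if (!p)
			return NULL;
		jit_cache_add(cache, p, cache->refill_size);
	}

	/* hand a chunk from the cache back to the caller */
	return jit_cache_take(cache, len);
}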

We might need separate caches for different cases on different
architectures, and a way to specify what cache should be used in the
allocator API, but that does not contradict a single structure for arch
specific parameters, but only makes it more elaborate, e.g. something like

enum jit_type {
	JIT_MODULES_TEXT,
	JIT_MODULES_DATA,
	JIT_KPROBES,
	JIT_FTRACE,
	JIT_BPF,
	JIT_TYPE_MAX,
};

struct jit_alloc_params {
	struct jit_range	ranges[JIT_TYPE_MAX];
	/* ... */
};

> > Can you give more detail on what parameters you need? If the only extra
> > parameter is just "does this allocation need to live close to kernel
> > text", that's not that big of a deal.
> 
> My thinking was that we at least need the start + end for each caller. That
> might be it, tbh.

Do you mean that modules will have something like

jit_text_alloc(size, MODULES_START, MODULES_END);

and kprobes will have

jit_text_alloc(size, KPROBES_START, KPROBES_END);
?

It still can be achieved with a single jit_alloc_arch_params(), just by
adding enum jit_type parameter to jit_text_alloc().

[1] https://lore.kernel.org/linux-mm/87v8mndy3y.ffs@tglx/ 
[2] 

Re: [PATCH 12/13] x86/jitalloc: prepare to allocate executable memory as ROX

2023-06-05 Thread Mike Rapoport
On Sun, Jun 04, 2023 at 10:52:44PM -0400, Steven Rostedt wrote:
> On Thu, 1 Jun 2023 16:54:36 -0700
> Nadav Amit  wrote:
> 
> > > The way text_poke() is used here, it is creating a new writable alias
> > > and flushing it for *each* write to the module (like for each write of
> > > an individual relocation, etc). I was just thinking it might warrant
> > > some batching or something.  

> > I am not advocating to do so, but if you want to have many efficient
> > writes, perhaps you can just disable CR0.WP. Just saying that if you
> > are about to write all over the memory, text_poke() does not provide
> > too much security for the poking thread.

Heh, this is definitely an easier hack to implement :)

> Batching does exist, which is what the text_poke_queue() thing does.

For module loading text_poke_queue() will still be much slower than a bunch
of memset()s for no good reason: we don't need all the complexity of
text_poke_bp_batch() for module initialization because we are sure we are
not patching live code.

What we'd need here is a new batching mode that will create a writable
alias mapping at the beginning of apply_relocate_*() and module_finalize(),
then use memcpy() to that writable alias and tear the mapping
down at the end.
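
Something along these lines could do it (only a sketch: the helper names are
invented here and this assumes the module text is vmalloc-backed, it is not a
proposed API):

static void *text_wr_begin(void *text, unsigned int nr_pages,
			   struct page **pages)
{
	unsigned int i;

	for (i = 0; i < nr_pages; i++)
		pages[i] = vmalloc_to_page(text + i * PAGE_SIZE);

	/* temporary writable alias of the (otherwise ROX) text mapping */
	return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
}

static void text_wr_end(void *alias)
{
	vunmap(alias);
}

apply_relocate_*() and module_finalize() would then do plain memcpy()s into
the alias instead of text_poke()ing every relocation.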

Another option is to teach alternatives to update a writable copy rather
than do in-place changes like Song suggested. My feeling is that it will be
a more intrusive change though.

> -- Steve
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH 12/13] x86/jitalloc: prepare to allocate executable memory as ROX

2023-06-01 Thread Mike Rapoport
On Thu, Jun 01, 2023 at 12:30:50PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 01, 2023 at 01:12:56PM +0300, Mike Rapoport wrote:
> 
> > +static void __init_or_module do_text_poke(void *addr, const void *opcode, 
> > size_t len)
> > +{
> > +   if (system_state < SYSTEM_RUNNING) {
> > +   text_poke_early(addr, opcode, len);
> > +   } else {
> > +   mutex_lock(&text_mutex);
> > +   text_poke(addr, opcode, len);
> > +   mutex_unlock(&text_mutex);
> > +   }
> > +}
> 
> So I don't much like do_text_poke(); why?

I believe the idea was to keep memcpy for early boot before the kernel
image is protected without going and adding if (is_module_text_address())
all over the place.

I think this can be used instead without updating all the call sites of
text_poke_early():

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 91057de8e6bc..f994e63e9903 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1458,7 +1458,7 @@ void __init_or_module text_poke_early(void *addr, const 
void *opcode,
 * code cannot be running and speculative code-fetches are
 * prevented. Just change the code.
 */
-   memcpy(addr, opcode, len);
+   text_poke_copy(addr, opcode, len);
} else {
local_irq_save(flags);
memcpy(addr, opcode, len);
 
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index aa99536b824c..d50595f2c1a6 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -118,10 +118,13 @@ ftrace_modify_code_direct(unsigned long ip, const 
> > char *old_code,
> > return ret;
> >  
> > /* replace the text with the new text */
> > -   if (ftrace_poke_late)
> > +   if (ftrace_poke_late) {
> > text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
> > -   else
> > -   text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
> > +   } else {
> > +   mutex_lock(&text_mutex);
> > +   text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
> > +   mutex_unlock(&text_mutex);
> > +   }
> > return 0;
> >  }
> 
> And in the above case it's actively wrong for losing the _queue()
> thing.

-- 
Sincerely yours,
Mike.


[PATCH 13/13] x86/jitalloc: make memory allocated for code ROX

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

When STRICT_KERNEL_RWX or STRICT_MODULE_RWX is enabled, force text
allocations to use PAGE_KERNEL_ROX.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/Kconfig |  3 +++
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c |  3 ---
 arch/x86/mm/init.c   |  6 ++
 include/linux/jitalloc.h |  2 ++
 mm/jitalloc.c| 21 +
 6 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 479a7b8be191..e7c4b01307d7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1307,6 +1307,9 @@ config STRICT_MODULE_RWX
  and non-text memory will be made non-executable. This provides
  protection against certain security exploits (e.g. writing to text)
 
+config ARCH_HAS_TEXT_POKE
+   def_bool n
+
 # select if the architecture provides an asm/dma-direct.h header
 config ARCH_HAS_PHYS_TO_DMA
bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fac4add6ce16..e1a512f557de 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -96,6 +96,7 @@ config X86
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
+   select ARCH_HAS_TEXT_POKE
select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d50595f2c1a6..bd4dd8974ee6 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -313,7 +313,6 @@ create_trampoline(struct ftrace_ops *ops, unsigned int 
*tramp_size)
unsigned long call_offset;
unsigned long jmp_offset;
unsigned long offset;
-   unsigned long npages;
unsigned long size;
unsigned long *ptr;
void *trampoline;
@@ -350,7 +349,6 @@ create_trampoline(struct ftrace_ops *ops, unsigned int 
*tramp_size)
return 0;
 
*tramp_size = size + RET_SIZE + sizeof(void *);
-   npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);
 
/* Copy ftrace_caller onto the trampoline memory */
ret = text_poke_copy(trampoline, (void *)start_offset, size);
@@ -416,7 +414,6 @@ create_trampoline(struct ftrace_ops *ops, unsigned int 
*tramp_size)
/* ALLOC_TRAMP flags lets us know we created it */
ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
 
-   set_memory_rox((unsigned long)trampoline, npages);
return (unsigned long)trampoline;
 fail:
tramp_free(trampoline);
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ffaf9a3840ce..c314738991fa 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -1127,6 +1127,12 @@ struct jit_alloc_params *jit_alloc_arch_params(void)
jit_alloc_params.text.start = MODULES_VADDR + get_jit_load_offset();
jit_alloc_params.text.end = MODULES_END;
 
+   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) ||
+   IS_ENABLED(CONFIG_STRICT_MODULE_RWX)) {
+   jit_alloc_params.text.pgprot = PAGE_KERNEL_ROX;
+   jit_alloc_params.flags |= JIT_ALLOC_USE_TEXT_POKE;
+   }
+
+   return &jit_alloc_params;
 }
 #endif /* CONFIG_JIT_ALLOC */
diff --git a/include/linux/jitalloc.h b/include/linux/jitalloc.h
index 0ba5ef785a85..0e29e87acefe 100644
--- a/include/linux/jitalloc.h
+++ b/include/linux/jitalloc.h
@@ -15,9 +15,11 @@
 /**
  * enum jit_alloc_flags - options for executable memory allocations
  * @JIT_ALLOC_KASAN_SHADOW:allocate kasan shadow
+ * @JIT_ALLOC_USE_TEXT_POKE:   use text poking APIs to update memory
  */
 enum jit_alloc_flags {
JIT_ALLOC_KASAN_SHADOW  = (1 << 0),
+   JIT_ALLOC_USE_TEXT_POKE = (1 << 1),
 };
 
 /**
diff --git a/mm/jitalloc.c b/mm/jitalloc.c
index a8ae64364d56..15d1067faf3f 100644
--- a/mm/jitalloc.c
+++ b/mm/jitalloc.c
@@ -7,6 +7,26 @@
 
 static struct jit_alloc_params jit_alloc_params;
 
+#ifdef CONFIG_ARCH_HAS_TEXT_POKE
+#include 
+
+static inline void jit_text_poke_copy(void *dst, const void *src, size_t len)
+{
+   if (jit_alloc_params.flags & JIT_ALLOC_USE_TEXT_POKE)
+   text_poke_copy(dst, src, len);
+   else
+   memcpy(dst, src, len);
+}
+
+static inline void jit_text_poke_set(void *addr, int c, size_t len)
+{
+   if (jit_alloc_params.flags & JIT_ALLOC_USE_TEXT_POKE)
+   text_poke_set(addr, c, len);
+   else
+   memset(addr, c, len);
+}
+
+#else
 static inline void jit_text_poke_copy(void *dst, const void *src, size_t len)
 {
memcpy(dst, src, len);
@@ -16,6 +36,7 @@ static inline void jit_text_poke_set(void *addr, int c, 
size_t len)
 {
memset(addr, c, len);
 }
+#endif
 
 static void *jit_alloc(size_t len, unsigned int alignment, pgprot_t pgprot,
   unsigned long start, unsigned long end,
-- 
2.35.1



[PATCH 12/13] x86/jitalloc: prepare to allocate executable memory as ROX

2023-06-01 Thread Mike Rapoport
From: Song Liu 

Replace direct memory writes to memory allocated for code with text poking
to allow allocation of executable memory as ROX.

The only exception is arch_prepare_bpf_trampoline() that cannot jit
directly into module memory yet, so it uses set_memory calls to
unprotect the memory before writing to it and to protect it again at the
end.
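
For reference, that exception boils down to roughly this pattern (a
simplified sketch assuming a page-aligned image, not the actual
arch_prepare_bpf_trampoline() code):

static int write_rox_image(void *image, const void *src, size_t len)
{
	unsigned long addr = (unsigned long)image;
	int npages = DIV_ROUND_UP(len, PAGE_SIZE);
	int err;

	/* unprotect before writing */
	err = set_memory_rw(addr, npages);
	if (err)
		return err;

	memcpy(image, src, len);

	/* and protect the memory again in the end */
	return set_memory_rox(addr, npages);
}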

Signed-off-by: Song Liu 
Co-developed-by: Mike Rapoport (IBM) 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/kernel/alternative.c | 43 +++
 arch/x86/kernel/ftrace.c  | 41 +
 arch/x86/kernel/module.c  | 24 +--
 arch/x86/kernel/static_call.c | 10 
 arch/x86/kernel/unwind_orc.c  | 13 +++
 arch/x86/net/bpf_jit_comp.c   | 22 +-
 6 files changed, 91 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f615e0cb6d93..91057de8e6bc 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,6 +77,19 @@ do { 
\
}   \
 } while (0)
 
+void text_poke_early(void *addr, const void *opcode, size_t len);
+
+static void __init_or_module do_text_poke(void *addr, const void *opcode, 
size_t len)
+{
+   if (system_state < SYSTEM_RUNNING) {
+   text_poke_early(addr, opcode, len);
+   } else {
+   mutex_lock(&text_mutex);
+   text_poke(addr, opcode, len);
+   mutex_unlock(&text_mutex);
+   }
+}
+
 static const unsigned char x86nops[] =
 {
BYTES_NOP1,
@@ -108,7 +122,7 @@ static void __init_or_module add_nops(void *insns, unsigned 
int len)
unsigned int noplen = len;
if (noplen > ASM_NOP_MAX)
noplen = ASM_NOP_MAX;
-   memcpy(insns, x86_nops[noplen], noplen);
+   do_text_poke(insns, x86_nops[noplen], noplen);
insns += noplen;
len -= noplen;
}
@@ -120,7 +134,6 @@ extern s32 __cfi_sites[], __cfi_sites_end[];
 extern s32 __ibt_endbr_seal[], __ibt_endbr_seal_end[];
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
-void text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
  * Are we looking at a near JMP with a 1 or 4-byte displacement.
@@ -331,7 +344,7 @@ void __init_or_module noinline apply_alternatives(struct 
alt_instr *start,
 
DUMP_BYTES(insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
 
-   text_poke_early(instr, insn_buff, insn_buff_sz);
+   do_text_poke(instr, insn_buff, insn_buff_sz);
 
 next:
optimize_nops(instr, a->instrlen);
@@ -564,7 +577,7 @@ void __init_or_module noinline apply_retpolines(s32 *start, 
s32 *end)
optimize_nops(bytes, len);
DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-   text_poke_early(addr, bytes, len);
+   do_text_poke(addr, bytes, len);
}
}
 }
@@ -638,7 +651,7 @@ void __init_or_module noinline apply_returns(s32 *start, 
s32 *end)
if (len == insn.length) {
DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-   text_poke_early(addr, bytes, len);
+   do_text_poke(addr, bytes, len);
}
}
 }
@@ -674,7 +687,7 @@ static void poison_endbr(void *addr, bool warn)
 */
DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
-   text_poke_early(addr, &poison, 4);
+   do_text_poke(addr, &poison, 4);
 }
 
 /*
@@ -869,7 +882,7 @@ static int cfi_disable_callers(s32 *start, s32 *end)
if (!hash) /* nocfi callers */
continue;
 
-   text_poke_early(addr, jmp, 2);
+   do_text_poke(addr, jmp, 2);
}
 
return 0;
@@ -892,7 +905,7 @@ static int cfi_enable_callers(s32 *start, s32 *end)
if (!hash) /* nocfi callers */
continue;
 
-   text_poke_early(addr, mov, 2);
+   do_text_poke(addr, mov, 2);
}
 
return 0;
@@ -913,7 +926,7 @@ static int cfi_rand_preamble(s32 *start, s32 *end)
return -EINVAL;
 
hash = cfi_rehash(hash);
-   text_poke_early(addr + 1, &hash, 4);
+   do_text_poke(addr + 1, &hash, 4);
}
 
ret

[PATCH 11/13] ftrace: Add swap_func to ftrace_process_locs()

2023-06-01 Thread Mike Rapoport
From: Song Liu 

ftrace_process_locs sorts module mcount, which is inside RO memory. Add
ftrace_swap_func() so that archs can use an RO-memory-poke function to do the
sorting.
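
As an illustration, an architecture that keeps this data RO could override
the weak helper along these lines (a sketch using text_poke_copy(), not part
of this patch):

void ftrace_swap_func(void *a, void *b, int n)
{
	unsigned long t;

	WARN_ON_ONCE(n != sizeof(t));

	/* swap the two mcount entries via the RO-memory-poke helper */
	t = *(unsigned long *)a;
	text_poke_copy(a, b, sizeof(t));
	text_poke_copy(b, &t, sizeof(t));
}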

Signed-off-by: Song Liu 
---
 include/linux/ftrace.h |  2 ++
 kernel/trace/ftrace.c  | 13 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index b23bdd414394..fe443b8ed32c 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1166,4 +1166,6 @@ unsigned long arch_syscall_addr(int nr);
 
 #endif /* CONFIG_FTRACE_SYSCALLS */
 
+void ftrace_swap_func(void *a, void *b, int n);
+
 #endif /* _LINUX_FTRACE_H */
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 764668467155..f5ddc9d4cfb6 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6430,6 +6430,17 @@ static void test_is_sorted(unsigned long *start, 
unsigned long count)
 }
 #endif
 
+void __weak ftrace_swap_func(void *a, void *b, int n)
+{
+   unsigned long t;
+
+   WARN_ON_ONCE(n != sizeof(t));
+
+   t = *((unsigned long *)a);
+   *(unsigned long *)a = *(unsigned long *)b;
+   *(unsigned long *)b = t;
+}
+
 static int ftrace_process_locs(struct module *mod,
   unsigned long *start,
   unsigned long *end)
@@ -6455,7 +6466,7 @@ static int ftrace_process_locs(struct module *mod,
 */
if (!IS_ENABLED(CONFIG_BUILDTIME_MCOUNT_SORT) || mod) {
sort(start, count, sizeof(*start),
-ftrace_cmp_ips, NULL);
+ftrace_cmp_ips, ftrace_swap_func);
} else {
test_is_sorted(start, count);
}
-- 
2.35.1



[PATCH 10/13] modules, jitalloc: prepare to allocate executable memory as ROX

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Once executable memory is allocated as ROX, it won't be possible to
update it using memset() and memcpy().

Introduce jit_update_copy() and jit_update_set() APIs and use them in
modules loading code instead of memcpy() and memset().

Signed-off-by: Mike Rapoport (IBM) 
---
 include/linux/jitalloc.h |  2 ++
 kernel/module/main.c | 19 ++-
 mm/jitalloc.c| 20 
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/include/linux/jitalloc.h b/include/linux/jitalloc.h
index 7f8cafb3cfe9..0ba5ef785a85 100644
--- a/include/linux/jitalloc.h
+++ b/include/linux/jitalloc.h
@@ -55,6 +55,8 @@ struct jit_alloc_params *jit_alloc_arch_params(void);
 void jit_free(void *buf);
 void *jit_text_alloc(size_t len);
 void *jit_data_alloc(size_t len);
+void jit_update_copy(void *buf, void *new_buf, size_t len);
+void jit_update_set(void *buf, int c, size_t len);
 
 #ifdef CONFIG_JIT_ALLOC
 void jit_alloc_init(void);
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 91477aa5f671..9f0711c42aa2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1197,9 +1197,19 @@ void __weak module_arch_freeing_init(struct module *mod)
 
 static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
 {
-   if (mod_mem_type_is_data(type))
-   return jit_data_alloc(size);
-   return jit_text_alloc(size);
+   void *p;
+
+   if (mod_mem_type_is_data(type)) {
+   p = jit_data_alloc(size);
+   if (p)
+   memset(p, 0, size);
+   } else {
+   p = jit_text_alloc(size);
+   if (p)
+   jit_update_set(p, 0, size);
+   }
+
+   return p;
 }
 
 static void module_memory_free(void *ptr, enum mod_mem_type type)
@@ -2223,7 +2233,6 @@ static int move_module(struct module *mod, struct 
load_info *info)
t = type;
goto out_enomem;
}
-   memset(ptr, 0, mod->mem[type].size);
mod->mem[type].base = ptr;
}
 
@@ -2251,7 +2260,7 @@ static int move_module(struct module *mod, struct 
load_info *info)
ret = -ENOEXEC;
goto out_enomem;
}
-   memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
+   jit_update_copy(dest, (void *)shdr->sh_addr, 
shdr->sh_size);
}
/*
 * Update the userspace copy's ELF section address to point to
diff --git a/mm/jitalloc.c b/mm/jitalloc.c
index 16fd715d501a..a8ae64364d56 100644
--- a/mm/jitalloc.c
+++ b/mm/jitalloc.c
@@ -7,6 +7,16 @@
 
 static struct jit_alloc_params jit_alloc_params;
 
+static inline void jit_text_poke_copy(void *dst, const void *src, size_t len)
+{
+   memcpy(dst, src, len);
+}
+
+static inline void jit_text_poke_set(void *addr, int c, size_t len)
+{
+   memset(addr, c, len);
+}
+
 static void *jit_alloc(size_t len, unsigned int alignment, pgprot_t pgprot,
   unsigned long start, unsigned long end,
   unsigned long fallback_start, unsigned long fallback_end,
@@ -86,6 +96,16 @@ void *jit_data_alloc(size_t len)
 fallback_start, fallback_end, kasan);
 }
 
+void jit_update_copy(void *buf, void *new_buf, size_t len)
+{
+   jit_text_poke_copy(buf, new_buf, len);
+}
+
+void jit_update_set(void *addr, int c, size_t len)
+{
+   jit_text_poke_set(addr, c, len);
+}
+
 struct jit_alloc_params * __weak jit_alloc_arch_params(void)
 {
return NULL;
-- 
2.35.1



[PATCH 09/13] kprobes: remove dependency on CONFIG_MODULES

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

kprobes depended on CONFIG_MODULES because it has to allocate memory for
code.

Since code allocations are now implemented with jitalloc, kprobes can be
enabled in non-modular kernels.

Add #ifdef CONFIG_MODULES guards for the code dealing with kprobes inside
modules, make CONFIG_KPROBES select CONFIG_JIT_ALLOC and drop the
dependency of CONFIG_KPROBES on CONFIG_MODULES.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/Kconfig|  2 +-
 kernel/kprobes.c| 43 +
 kernel/trace/trace_kprobe.c | 11 ++
 3 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..479a7b8be191 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -39,9 +39,9 @@ config GENERIC_ENTRY
 
 config KPROBES
bool "Kprobes"
-   depends on MODULES
depends on HAVE_KPROBES
select KALLSYMS
+   select JIT_ALLOC
select TASKS_RCU if PREEMPTION
help
  Kprobes allows you to trap at almost any kernel address and
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 3caf3561c048..11c1cfbb11ae 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1568,6 +1568,7 @@ static int check_kprobe_address_safe(struct kprobe *p,
goto out;
}
 
+#ifdef CONFIG_MODULES
/* Check if 'p' is probing a module. */
*probed_mod = __module_text_address((unsigned long) p->addr);
if (*probed_mod) {
@@ -1591,6 +1592,8 @@ static int check_kprobe_address_safe(struct kprobe *p,
ret = -ENOENT;
}
}
+#endif
+
 out:
preempt_enable();
jump_label_unlock();
@@ -2484,24 +2487,6 @@ int kprobe_add_area_blacklist(unsigned long start, 
unsigned long end)
return 0;
 }
 
-/* Remove all symbols in given area from kprobe blacklist */
-static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
-{
-   struct kprobe_blacklist_entry *ent, *n;
-
-   list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
-   if (ent->start_addr < start || ent->start_addr >= end)
-   continue;
-   list_del(&ent->list);
-   kfree(ent);
-   }
-}
-
-static void kprobe_remove_ksym_blacklist(unsigned long entry)
-{
-   kprobe_remove_area_blacklist(entry, entry + 1);
-}
-
 int __weak arch_kprobe_get_kallsym(unsigned int *symnum, unsigned long *value,
   char *type, char *sym)
 {
@@ -2566,6 +2551,25 @@ static int __init populate_kprobe_blacklist(unsigned 
long *start,
return ret ? : arch_populate_kprobe_blacklist();
 }
 
+#ifdef CONFIG_MODULES
+/* Remove all symbols in given area from kprobe blacklist */
+static void kprobe_remove_area_blacklist(unsigned long start, unsigned long 
end)
+{
+   struct kprobe_blacklist_entry *ent, *n;
+
+   list_for_each_entry_safe(ent, n, &kprobe_blacklist, list) {
+   if (ent->start_addr < start || ent->start_addr >= end)
+   continue;
+   list_del(&ent->list);
+   kfree(ent);
+   }
+}
+
+static void kprobe_remove_ksym_blacklist(unsigned long entry)
+{
+   kprobe_remove_area_blacklist(entry, entry + 1);
+}
+
 static void add_module_kprobe_blacklist(struct module *mod)
 {
unsigned long start, end;
@@ -2667,6 +2671,7 @@ static struct notifier_block kprobe_module_nb = {
.notifier_call = kprobes_module_callback,
.priority = 0
 };
+#endif
 
 void kprobe_free_init_mem(void)
 {
@@ -2726,8 +2731,10 @@ static int __init init_kprobes(void)
err = arch_init_kprobes();
if (!err)
err = register_die_notifier(&kprobe_exceptions_nb);
+#ifdef CONFIG_MODULES
if (!err)
err = register_module_notifier(&kprobe_module_nb);
+#endif
 
kprobes_initialized = (err == 0);
kprobe_sysctls_init();
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 59cda19a9033..cf804e372554 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -111,6 +111,7 @@ static nokprobe_inline bool 
trace_kprobe_within_module(struct trace_kprobe *tk,
return strncmp(module_name(mod), name, len) == 0 && name[len] == ':';
 }
 
+#ifdef CONFIG_MODULES
 static nokprobe_inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
 {
char *p;
@@ -129,6 +130,12 @@ static nokprobe_inline bool 
trace_kprobe_module_exist(struct trace_kprobe *tk)
 
return ret;
 }
+#else
+static inline bool trace_kprobe_module_exist(struct trace_kprobe *tk)
+{
+   return false;
+}
+#endif
 
 static bool trace_kprobe_is_busy(struct dyn_event *ev)
 {
@@ -670,6 +677,7 @@ static int register_trace_kprobe(struct trace_kprobe *tk)
return ret;
 }
 
+#ifdef CONFIG_MODULES
 /* Module notifier call back, checking event on the module */
 stat

[PATCH 08/13] arch: make jitalloc setup available regardless of CONFIG_MODULES

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

jitalloc does not depend on modules; on the contrary, modules use
jitalloc.

To make jitalloc available when CONFIG_MODULES=n, for instance for
kprobes, split jit_alloc_params initialization out from
arch/*/kernel/module.c and compile it when CONFIG_JIT_ALLOC=y.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm/kernel/module.c   | 32 --
 arch/arm/mm/init.c | 35 
 arch/arm64/kernel/module.c | 40 
 arch/arm64/mm/init.c   | 42 ++
 arch/loongarch/kernel/module.c | 14 
 arch/loongarch/mm/init.c   | 16 +
 arch/mips/kernel/module.c  | 17 --
 arch/mips/mm/init.c| 19 +++
 arch/parisc/kernel/module.c| 17 --
 arch/parisc/mm/init.c  | 21 -
 arch/powerpc/kernel/module.c   | 39 ---
 arch/powerpc/mm/mem.c  | 41 +
 arch/riscv/kernel/module.c | 16 -
 arch/riscv/mm/init.c   | 18 +++
 arch/s390/kernel/module.c  | 32 --
 arch/s390/mm/init.c| 35 
 arch/sparc/kernel/module.c | 19 ---
 arch/sparc/mm/Makefile |  2 ++
 arch/sparc/mm/jitalloc.c   | 21 +
 19 files changed, 249 insertions(+), 227 deletions(-)
 create mode 100644 arch/sparc/mm/jitalloc.c

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index 83ccbf98164f..054e799e7091 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,44 +16,12 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
 #include 
 #include 
 
-#ifdef CONFIG_XIP_KERNEL
-/*
- * The XIP kernel text is mapped in the module area for modules and
- * some other stuff to work without any indirect relocations.
- * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
- * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
- */
-#undef MODULES_VADDR
-#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
-#endif
-
-#ifdef CONFIG_MMU
-static struct jit_alloc_params jit_alloc_params = {
-   .alignment  = 1,
-   .text.start = MODULES_VADDR,
-   .text.end   = MODULES_END,
-};
-
-struct jit_alloc_params *jit_alloc_arch_params(void)
-{
-   jit_alloc_params.text.pgprot = PAGE_KERNEL_EXEC;
-
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
-   jit_alloc_params.text.fallback_start = VMALLOC_START;
-   jit_alloc_params.text.fallback_end = VMALLOC_END;
-   }
-
-   return &jit_alloc_params;
-}
-#endif
-
 bool module_init_section(const char *name)
 {
return strstarts(name, ".init") ||
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index ce64bdb55a16..e492625b7f3d 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -486,3 +487,37 @@ void free_initrd_mem(unsigned long start, unsigned long 
end)
free_reserved_area((void *)start, (void *)end, -1, "initrd");
 }
 #endif
+
+#ifdef CONFIG_JIT_ALLOC
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * The XIP kernel text is mapped in the module area for modules and
+ * some other stuff to work without any indirect relocations.
+ * MODULES_VADDR is redefined here and not in asm/memory.h to avoid
+ * recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
+ */
+#undef MODULES_VADDR
+#define MODULES_VADDR  (((unsigned long)_exiprom + ~PMD_MASK) & PMD_MASK)
+#endif
+
+#ifdef CONFIG_MMU
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = 1,
+   .text.start = MODULES_VADDR,
+   .text.end   = MODULES_END,
+};
+
+struct jit_alloc_params *jit_alloc_arch_params(void)
+{
+   jit_alloc_params.text.pgprot = PAGE_KERNEL_EXEC;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   jit_alloc_params.text.fallback_start = VMALLOC_START;
+   jit_alloc_params.text.fallback_end = VMALLOC_END;
+   }
+
+   return &jit_alloc_params;
+}
+#endif
+
+#endif /* CONFIG_JIT_ALLOC */
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 91ffcff5a44c..6d09b29fe9db 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -17,51 +17,11 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 
-static struct jit_alloc_params jit_alloc_params = {
-   .alignment  = JIT_ALLOC_ALIGN,
-   .flags  = JIT_ALLOC_KASAN_SHADOW,
-};
-
-struct jit_alloc_params *jit_alloc_arch_params(void)
-{
-   u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
-
-   if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
-   IS_ENABLED(CONFIG_KASAN_SW_TAGS))
-   /* don't

[PATCH 07/13] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Dynamic ftrace must allocate memory for code and this was impossible
without CONFIG_MODULES.

With jitalloc separated from the modules code, jit_text_alloc() is
available regardless of CONFIG_MODULES.

Move jitalloc initialization to x86/mm/init.c so that it won't get
compiled away when CONFIG_MODULES=n and enable dynamic ftrace
unconditionally.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/x86/Kconfig |  1 +
 arch/x86/kernel/ftrace.c |  9 
 arch/x86/kernel/module.c | 44 --
 arch/x86/mm/init.c   | 46 
 4 files changed, 47 insertions(+), 53 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..fac4add6ce16 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -35,6 +35,7 @@ config X86_64
select SWIOTLB
select ARCH_HAS_ELFCORE_COMPAT
select ZONE_DMA32
+   select JIT_ALLOC if DYNAMIC_FTRACE
 
 config FORCE_DYNAMIC_FTRACE
def_bool y
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 157c8a799704..aa99536b824c 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -261,7 +261,6 @@ void arch_ftrace_update_code(int command)
 /* Currently only x86_64 supports dynamic trampolines */
 #ifdef CONFIG_X86_64
 
-#ifdef CONFIG_MODULES
 /* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
@@ -271,14 +270,6 @@ static inline void tramp_free(void *tramp)
 {
jit_free(tramp);
 }
-#else
-/* Trampolines can only be created if modules are supported */
-static inline void *alloc_tramp(unsigned long size)
-{
-   return NULL;
-}
-static inline void tramp_free(void *tramp) { }
-#endif
 
 /* Defined as markers to the end of the ftrace default trampolines */
 extern void ftrace_regs_caller_end(void);
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index cacca613b8bd..94a00dc103cd 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -19,7 +19,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -37,49 +36,6 @@ do { \
 } while (0)
 #endif
 
-#ifdef CONFIG_RANDOMIZE_BASE
-static unsigned long module_load_offset;
-
-/* Mutex protects the module_load_offset. */
-static DEFINE_MUTEX(module_kaslr_mutex);
-
-static unsigned long int get_module_load_offset(void)
-{
-   if (kaslr_enabled()) {
-   mutex_lock(&module_kaslr_mutex);
-   /*
-* Calculate the module_load_offset the first time this
-* code is called. Once calculated it stays the same until
-* reboot.
-*/
-   if (module_load_offset == 0)
-   module_load_offset =
-   get_random_u32_inclusive(1, 1024) * PAGE_SIZE;
-   mutex_unlock(&module_kaslr_mutex);
-   }
-   return module_load_offset;
-}
-#else
-static unsigned long int get_module_load_offset(void)
-{
-   return 0;
-}
-#endif
-
-static struct jit_alloc_params jit_alloc_params = {
-   .alignment  = JIT_ALLOC_ALIGN,
-   .flags  = JIT_ALLOC_KASAN_SHADOW,
-};
-
-struct jit_alloc_params *jit_alloc_arch_params(void)
-{
-   jit_alloc_params.text.pgprot = PAGE_KERNEL;
-   jit_alloc_params.text.start = MODULES_VADDR + get_module_load_offset();
-   jit_alloc_params.text.end = MODULES_END;
-
-   return &jit_alloc_params;
-}
-
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
   const char *strtab,
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 3cdac0f0055d..ffaf9a3840ce 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1084,3 +1085,48 @@ unsigned long arch_max_swapfile_size(void)
return pages;
 }
 #endif
+
+#ifdef CONFIG_JIT_ALLOC
+#ifdef CONFIG_RANDOMIZE_BASE
+static unsigned long jit_load_offset;
+
+/* Mutex protects the jit_load_offset. */
+static DEFINE_MUTEX(jit_kaslr_mutex);
+
+static unsigned long int get_jit_load_offset(void)
+{
+   if (kaslr_enabled()) {
+   mutex_lock(&jit_kaslr_mutex);
+   /*
+* Calculate the jit_load_offset the first time this
+* code is called. Once calculated it stays the same until
+* reboot.
+*/
+   if (jit_load_offset == 0)
+   jit_load_offset =
+   get_random_u32_inclusive(1, 1024) * PAGE_SIZE;
+   mutex_unlock(&jit_kaslr_mutex);
+   }
+   return jit_load_offset;
+}
+#else
+static unsigned long int get_jit_load_offset(void)
+{
+   return 0;
+}
+#endif
+
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = JIT_ALLOC_ALIGN,
+   .flags  = JIT_ALLOC_KA

[PATCH 06/13] mm/jitalloc: introduce jit_data_alloc()

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Data related to code allocations, such as module data sections, needs to
comply with architecture constraints for its placement, and right now its
allocation is done using jit_text_alloc().

Create a dedicated API for allocating data related to code allocations
and allow architectures to define address ranges for data allocations.

Since currently this is only relevant for powerpc variants that use the
VMALLOC address space for module data allocations, automatically reuse
address ranges defined for text unless address range for data is
explicitly defined by an architecture.
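
A sketch of that fallback (the details of the actual helper in mm/jitalloc.c
may differ):

static void jit_alloc_init_missing(struct jit_alloc_params *p)
{
	/* reuse the text range when no dedicated data range is defined */
	if (!p->data.start) {
		p->data.pgprot = PAGE_KERNEL;
		p->data.start = p->text.start;
		p->data.end = p->text.end;
		p->data.fallback_start = p->text.fallback_start;
		p->data.fallback_end = p->text.fallback_end;
	}
}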

With separation of code and data allocations, data sections of the
modules are now mapped as PAGE_KERNEL rather than PAGE_KERNEL_EXEC, which
was the default on many architectures.

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/module.c |  8 
 include/linux/jitalloc.h |  2 ++
 kernel/module/main.c | 15 +++
 mm/jitalloc.c| 36 
 4 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 83bdedc7eba0..b58af61e90c0 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -96,6 +96,10 @@ static struct jit_alloc_params jit_alloc_params = {
 
 struct jit_alloc_params *jit_alloc_arch_params(void)
 {
+   /*
+* BOOK3S_32 and 8xx define MODULES_VADDR for text allocations and
+* allow allocating data in the entire vmalloc space
+*/
 #ifdef MODULES_VADDR
pgprot_t prot = strict_module_rwx_enabled() ? PAGE_KERNEL : 
PAGE_KERNEL_EXEC;
unsigned long limit = (unsigned long)_etext - SZ_32M;
@@ -112,6 +116,10 @@ struct jit_alloc_params *jit_alloc_arch_params(void)
jit_alloc_params.text.start = MODULES_VADDR;
jit_alloc_params.text.end = MODULES_END;
}
+
+   jit_alloc_params.data.pgprot= PAGE_KERNEL;
+   jit_alloc_params.data.start = VMALLOC_START;
+   jit_alloc_params.data.end   = VMALLOC_END;
 #else
jit_alloc_params.text.start = VMALLOC_START;
jit_alloc_params.text.end = VMALLOC_END;
diff --git a/include/linux/jitalloc.h b/include/linux/jitalloc.h
index 823b13706a90..7f8cafb3cfe9 100644
--- a/include/linux/jitalloc.h
+++ b/include/linux/jitalloc.h
@@ -45,6 +45,7 @@ struct jit_address_space {
  */
 struct jit_alloc_params {
struct jit_address_spacetext;
+   struct jit_address_spacedata;
enum jit_alloc_flagsflags;
unsigned intalignment;
 };
@@ -53,6 +54,7 @@ struct jit_alloc_params *jit_alloc_arch_params(void);
 
 void jit_free(void *buf);
 void *jit_text_alloc(size_t len);
+void *jit_data_alloc(size_t len);
 
 #ifdef CONFIG_JIT_ALLOC
 void jit_alloc_init(void);
diff --git a/kernel/module/main.c b/kernel/module/main.c
index dfb7fa109f1a..91477aa5f671 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1195,25 +1195,16 @@ void __weak module_arch_freeing_init(struct module *mod)
 {
 }
 
-static bool mod_mem_use_vmalloc(enum mod_mem_type type)
-{
-   return IS_ENABLED(CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC) &&
-   mod_mem_type_is_core_data(type);
-}
-
 static void *module_memory_alloc(unsigned int size, enum mod_mem_type type)
 {
-   if (mod_mem_use_vmalloc(type))
-   return vzalloc(size);
+   if (mod_mem_type_is_data(type))
+   return jit_data_alloc(size);
return jit_text_alloc(size);
 }
 
 static void module_memory_free(void *ptr, enum mod_mem_type type)
 {
-   if (mod_mem_use_vmalloc(type))
-   vfree(ptr);
-   else
-   jit_free(ptr);
+   jit_free(ptr);
 }
 
 static void free_mod_mem(struct module *mod)
diff --git a/mm/jitalloc.c b/mm/jitalloc.c
index 221940e36b46..16fd715d501a 100644
--- a/mm/jitalloc.c
+++ b/mm/jitalloc.c
@@ -72,6 +72,20 @@ void *jit_text_alloc(size_t len)
 fallback_start, fallback_end, kasan);
 }
 
+void *jit_data_alloc(size_t len)
+{
+   unsigned int align = jit_alloc_params.alignment;
+   pgprot_t pgprot = jit_alloc_params.data.pgprot;
+   unsigned long start = jit_alloc_params.data.start;
+   unsigned long end = jit_alloc_params.data.end;
+   unsigned long fallback_start = jit_alloc_params.data.fallback_start;
+   unsigned long fallback_end = jit_alloc_params.data.fallback_end;
+   bool kasan = jit_alloc_params.flags & JIT_ALLOC_KASAN_SHADOW;
+
+   return jit_alloc(len, align, pgprot, start, end,
+fallback_start, fallback_end, kasan);
+}
+
 struct jit_alloc_params * __weak jit_alloc_arch_params(void)
 {
return NULL;
@@ -88,6 +102,23 @@ static bool jit_alloc_validate_params(struct 
jit_alloc_params *p)
return true;
 }
 
+static void jit_alloc_init_missing(struct jit_alloc_params *p)
+{
+

[PATCH 05/13] module, jitalloc: drop module_alloc

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Define default parameters for address range for code allocations
using the current values in module_alloc() and make jit_text_alloc() use
these defaults when an architecture does not supply its specific
parameters.

With this, jit_text_alloc() implements memory allocation in a way
compatible with module_alloc() and can be used as a replacement for
module_alloc().

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm64/kernel/module.c   |  2 +-
 arch/s390/kernel/module.c|  2 +-
 arch/x86/kernel/module.c |  2 +-
 include/linux/jitalloc.h |  8 
 include/linux/moduleloader.h | 12 
 kernel/module/main.c |  7 ---
 mm/jitalloc.c| 31 +--
 7 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index ecf1f4030317..91ffcff5a44c 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -24,7 +24,7 @@
 #include 
 
 static struct jit_alloc_params jit_alloc_params = {
-   .alignment  = MODULE_ALIGN,
+   .alignment  = JIT_ALLOC_ALIGN,
.flags  = JIT_ALLOC_KASAN_SHADOW,
 };
 
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index 0986a1a1b261..3f85cf1e7c4e 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -56,7 +56,7 @@ static unsigned long get_module_load_offset(void)
 }
 
 static struct jit_alloc_params jit_alloc_params = {
-   .alignment  = MODULE_ALIGN,
+   .alignment  = JIT_ALLOC_ALIGN,
.flags  = JIT_ALLOC_KASAN_SHADOW,
.text.pgprot= PAGE_KERNEL,
 };
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index cce84b61a036..cacca613b8bd 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -67,7 +67,7 @@ static unsigned long int get_module_load_offset(void)
 #endif
 
 static struct jit_alloc_params jit_alloc_params = {
-   .alignment  = MODULE_ALIGN,
+   .alignment  = JIT_ALLOC_ALIGN,
.flags  = JIT_ALLOC_KASAN_SHADOW,
 };
 
diff --git a/include/linux/jitalloc.h b/include/linux/jitalloc.h
index 34ee57795a18..823b13706a90 100644
--- a/include/linux/jitalloc.h
+++ b/include/linux/jitalloc.h
@@ -4,6 +4,14 @@
 
 #include 
 
+#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
+   !defined(CONFIG_KASAN_VMALLOC)
+#include 
+#define JIT_ALLOC_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
+#else
+#define JIT_ALLOC_ALIGN PAGE_SIZE
+#endif
+
 /**
  * enum jit_alloc_flags - options for executable memory allocations
  * @JIT_ALLOC_KASAN_SHADOW:allocate kasan shadow
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index b3374342f7af..4321682fe849 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -25,10 +25,6 @@ int module_frob_arch_sections(Elf_Ehdr *hdr,
 /* Additional bytes needed by arch in front of individual sections */
 unsigned int arch_mod_section_prepend(struct module *mod, unsigned int 
section);
 
-/* Allocator used for allocating struct module, core sections and init
-   sections.  Returns NULL on failure. */
-void *module_alloc(unsigned long size);
-
 /* Determines if the section name is an init section (that is only used during
  * module loading).
  */
@@ -113,12 +109,4 @@ void module_arch_cleanup(struct module *mod);
 /* Any cleanup before freeing mod->module_init */
 void module_arch_freeing_init(struct module *mod);
 
-#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
-   !defined(CONFIG_KASAN_VMALLOC)
-#include 
-#define MODULE_ALIGN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)
-#else
-#define MODULE_ALIGN PAGE_SIZE
-#endif
-
 #endif
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 51278c571bcb..dfb7fa109f1a 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1600,13 +1600,6 @@ static void free_modinfo(struct module *mod)
}
 }
 
-void * __weak module_alloc(unsigned long size)
-{
-   return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, VM_FLUSH_RESET_PERMS,
-   NUMA_NO_NODE, __builtin_return_address(0));
-}
-
 bool __weak module_init_section(const char *name)
 {
return strstarts(name, ".init");
diff --git a/mm/jitalloc.c b/mm/jitalloc.c
index 4e10af7803f7..221940e36b46 100644
--- a/mm/jitalloc.c
+++ b/mm/jitalloc.c
@@ -60,20 +60,16 @@ void jit_free(void *buf)
 
 void *jit_text_alloc(size_t len)
 {
-   if (jit_alloc_params.text.start) {
-   unsigned int align = jit_alloc_params.alignment;
-   pgprot_t pgprot = jit_alloc_params.text.pgprot;
-   unsigned long start = jit_alloc_params.text.start;
-   unsigned long end = jit_alloc_params.text.end;
-   unsigned long fal

[PATCH 04/13] mm/jitalloc, arch: convert remaining overrides of module_alloc to jitalloc

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Extend jitalloc parameters to accommodate more complex overrides of
module_alloc() by architectures.

This includes specification of a fallback range required by arm, arm64
and powerpc and support for allocation of KASAN shadow required by
arm64, s390 and x86.

The core implementation of jit_alloc() takes care of suppressing warnings
when the initial allocation fails but there is a fallback range defined.
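
In a simplified form (the actual jit_alloc() also handles the KASAN shadow
and has a slightly different signature), the fallback handling looks like:

static void *jit_alloc(size_t len, unsigned int align, pgprot_t pgprot,
		       unsigned long start, unsigned long end,
		       unsigned long fallback_start, unsigned long fallback_end)
{
	gfp_t gfp_flags = GFP_KERNEL;
	void *p;

	/* suppress the warning when a retry in the fallback range is possible */
	if (fallback_start)
		gfp_flags |= __GFP_NOWARN;

	p = __vmalloc_node_range(len, align, start, end, gfp_flags, pgprot,
				 VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
				 __builtin_return_address(0));

	if (!p && fallback_start)
		p = __vmalloc_node_range(len, align, fallback_start, fallback_end,
					 GFP_KERNEL, pgprot, VM_FLUSH_RESET_PERMS,
					 NUMA_NO_NODE,
					 __builtin_return_address(0));

	return p;
}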

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/arm/kernel/module.c | 32 ++--
 arch/arm64/kernel/module.c   | 57 
 arch/powerpc/kernel/module.c | 46 +
 arch/s390/kernel/module.c| 31 
 arch/x86/kernel/module.c | 29 +++---
 include/linux/jitalloc.h | 14 +
 mm/jitalloc.c| 44 
 7 files changed, 138 insertions(+), 115 deletions(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index d59c36dc0494..83ccbf98164f 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -34,23 +35,22 @@
 #endif
 
 #ifdef CONFIG_MMU
-void *module_alloc(unsigned long size)
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = 1,
+   .text.start = MODULES_VADDR,
+   .text.end   = MODULES_END,
+};
+
+struct jit_alloc_params *jit_alloc_arch_params(void)
 {
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
-
-   p = __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   gfp_mask, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
-   if (!IS_ENABLED(CONFIG_ARM_MODULE_PLTS) || p)
-   return p;
-   return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   jit_alloc_params.text.pgprot = PAGE_KERNEL_EXEC;
+
+   if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) {
+   jit_alloc_params.text.fallback_start = VMALLOC_START;
+   jit_alloc_params.text.fallback_end = VMALLOC_END;
+   }
+
+   return &jit_alloc_params;
 }
 #endif
 
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 5af4975caeb5..ecf1f4030317 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -17,56 +17,49 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = MODULE_ALIGN,
+   .flags  = JIT_ALLOC_KASAN_SHADOW,
+};
+
+struct jit_alloc_params *jit_alloc_arch_params(void)
 {
u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
-   gfp_t gfp_mask = GFP_KERNEL;
-   void *p;
-
-   /* Silence the initial allocation */
-   if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
-   gfp_mask |= __GFP_NOWARN;
 
if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
IS_ENABLED(CONFIG_KASAN_SW_TAGS))
/* don't exceed the static module region - see below */
module_alloc_end = MODULES_END;
 
-   p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
-   module_alloc_end, gfp_mask, PAGE_KERNEL, 
VM_DEFER_KMEMLEAK,
-   NUMA_NO_NODE, __builtin_return_address(0));
+   jit_alloc_params.text.pgprot = PAGE_KERNEL;
+   jit_alloc_params.text.start = module_alloc_base;
+   jit_alloc_params.text.end = module_alloc_end;
 
-   if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
+   /*
+* KASAN without KASAN_VMALLOC can only deal with module
+* allocations being served from the reserved module region,
+* since the remainder of the vmalloc region is already
+* backed by zero shadow pages, and punching holes into it
+* is non-trivial. Since the module region is not randomized
+* when KASAN is enabled without KASAN_VMALLOC, it is even
+* less likely that the module region gets exhausted, so we
+* can simply omit this fallback in that case.
+*/
+   if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
(IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
 (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
- !IS_ENABLED(CONFIG_KASAN_SW_TAGS
-   /*
-* KASAN without KASAN_VMALLOC can only deal with module
-* allocations being served from the reserved module region,
-* since the 

[PATCH 03/13] mm/jitalloc, arch: convert simple overrides of module_alloc to jitalloc

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Several architectures override module_alloc() only to define an address
range for code allocations different from the VMALLOC address space.

Provide a generic implementation in jitalloc that uses the parameters
for address space ranges, required alignment and page protections
provided by architectures.

The architectures must fill the jit_alloc_params structure and implement
jit_alloc_arch_params() that returns a pointer to that structure. This
way the jitalloc initialization won't be called from every architecture,
but rather from a central place, namely initialization of the core
memory management.
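
With that, the central init boils down to something like this (illustrative
only, the details live in mm/jitalloc.c and mm/mm_init.c):

void __init jit_alloc_init(void)
{
	struct jit_alloc_params *p = jit_alloc_arch_params();

	/* nothing to do if the arch does not provide its parameters */
	if (p)
		jit_alloc_params = *p;
}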

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/loongarch/kernel/module.c | 14 --
 arch/mips/kernel/module.c  | 16 ---
 arch/nios2/kernel/module.c | 15 ++
 arch/parisc/kernel/module.c| 18 
 arch/riscv/kernel/module.c | 16 +++
 arch/sparc/kernel/module.c | 39 +++---
 include/linux/jitalloc.h   | 31 +
 mm/jitalloc.c  | 51 ++
 mm/mm_init.c   |  2 ++
 9 files changed, 156 insertions(+), 46 deletions(-)

diff --git a/arch/loongarch/kernel/module.c b/arch/loongarch/kernel/module.c
index b8b86088b2dd..1d5e00874ae7 100644
--- a/arch/loongarch/kernel/module.c
+++ b/arch/loongarch/kernel/module.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -469,10 +470,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
*strtab,
return 0;
 }
 
-void *module_alloc(unsigned long size)
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = 1,
+   .text.pgprot= PAGE_KERNEL,
+};
+
+struct jit_alloc_params *jit_alloc_arch_params(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE, 
__builtin_return_address(0));
+   jit_alloc_params.text.start = MODULES_VADDR;
+   jit_alloc_params.text.end = MODULES_END;
+
+   return &jit_alloc_params;
 }
 
 static void module_init_ftrace_plt(const Elf_Ehdr *hdr,
diff --git a/arch/mips/kernel/module.c b/arch/mips/kernel/module.c
index 0c936cbf20c5..f762c697ab9c 100644
--- a/arch/mips/kernel/module.c
+++ b/arch/mips/kernel/module.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern void jump_label_apply_nops(struct module *mod);
 
@@ -33,11 +34,18 @@ static LIST_HEAD(dbe_list);
 static DEFINE_SPINLOCK(dbe_lock);
 
 #ifdef MODULE_START
-void *module_alloc(unsigned long size)
+
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = 1,
+   .text.start = MODULE_START,
+   .text.end   = MODULE_END,
+};
+
+struct jit_alloc_params *jit_alloc_arch_params(void)
 {
-   return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
-   GFP_KERNEL, PAGE_KERNEL, 0, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   jit_alloc_params.text.pgprot = PAGE_KERNEL;
+
+   return &jit_alloc_params;
 }
 #endif
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 9c97b7513853..b41d52775ec2 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -18,15 +18,20 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-void *module_alloc(unsigned long size)
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = 1,
+   .text.pgprot= PAGE_KERNEL_EXEC,
+   .text.start = MODULES_VADDR,
+   .text.end   = MODULES_END,
+};
+
+struct jit_alloc_params *jit_alloc_arch_params(void)
 {
-   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
-   GFP_KERNEL, PAGE_KERNEL_EXEC,
-   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
-   __builtin_return_address(0));
+   return &jit_alloc_params;
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index f6e38c4d3904..49fdf741fd24 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include <linux/jitalloc.h>
 
 #include 
 #include 
@@ -173,15 +174,20 @@ static inline int reassemble_22(int as22)
((as22 & 0x0003ff) << 3));
 }
 
-void *module_alloc(unsigned long size)
-{
+static struct jit_alloc_params jit_alloc_params = {
+   .alignment  = 1,
/* using RWX means less protection for modules, but it's
 * easier than trying to map the text, data, init_text and
 * init_data correctly */
-   return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL,
-   PAGE_KERNEL_

[PATCH 02/13] mm: introduce jit_text_alloc() and use it instead of module_alloc()

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

module_alloc() is used everywhere as a means to allocate memory for code.

Besides being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF, to modules
and puts the burden of code allocation on the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

Start splitting code allocation from modules by introducing
jit_text_alloc() and jit_free() APIs.

Start with making jit_text_alloc() a wrapper for module_alloc() and
jit_free() a replacement for module_memfree(), to allow updating all call
sites to use the new APIs.

The name jit_text_alloc() emphasizes that the allocated memory is for
executable code; the allocations of the associated data, like the data
sections of a module, will use the jit_data_alloc() interface that will be
added later.
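
At this point the implementation is deliberately trivial; a minimal sketch of
what the new mm/jitalloc.c provides at this stage, assuming it simply wraps
the existing module allocator as described above (the real file may differ in
detail):

#include <linux/moduleloader.h>
#include <linux/jitalloc.h>

/* Stage-one wrappers: delegate to module_alloc()/module_memfree() so that
 * all call sites can first be converted to the new names. */
void *jit_text_alloc(size_t len)
{
	return module_alloc(len);
}

void jit_free(void *buf)
{
	module_memfree(buf);
}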

Signed-off-by: Mike Rapoport (IBM) 
---
 arch/powerpc/kernel/kprobes.c|  4 ++--
 arch/s390/kernel/ftrace.c|  4 ++--
 arch/s390/kernel/kprobes.c   |  4 ++--
 arch/s390/kernel/module.c|  5 +++--
 arch/sparc/net/bpf_jit_comp_32.c |  8 
 arch/x86/kernel/ftrace.c |  6 +++---
 arch/x86/kernel/kprobes/core.c   |  4 ++--
 include/linux/jitalloc.h | 10 ++
 include/linux/moduleloader.h |  3 ---
 kernel/bpf/core.c| 14 +++---
 kernel/kprobes.c |  8 
 kernel/module/Kconfig|  1 +
 kernel/module/main.c | 23 +++
 mm/Kconfig   |  3 +++
 mm/Makefile  |  1 +
 mm/jitalloc.c| 20 
 16 files changed, 71 insertions(+), 47 deletions(-)
 create mode 100644 include/linux/jitalloc.h
 create mode 100644 mm/jitalloc.c

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b20ee72e873a..e5835b148ec4 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,8 +19,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
+#include <linux/jitalloc.h>
 #include 
 #include 
 #include 
@@ -130,7 +130,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = jit_text_alloc(PAGE_SIZE);
if (!page)
return NULL;
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index c46381ea04ec..6e50a88b9b5d 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -7,13 +7,13 @@
  *   Author(s): Martin Schwidefsky 
  */
 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include <linux/jitalloc.h>
 #include 
 #include 
 #include 
@@ -220,7 +220,7 @@ static int __init ftrace_plt_init(void)
 {
const char *start, *end;
 
-   ftrace_plt = module_alloc(PAGE_SIZE);
+   ftrace_plt = jit_text_alloc(PAGE_SIZE);
if (!ftrace_plt)
panic("cannot allocate ftrace plt\n");
 
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index d4b863ed0aa7..3804945f212f 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -9,7 +9,6 @@
 
 #define pr_fmt(fmt) "kprobes: " fmt
 
-#include 
 #include 
 #include 
 #include 
@@ -21,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/jitalloc.h>
 #include 
 #include 
 #include 
@@ -38,7 +38,7 @@ void *alloc_insn_page(void)
 {
void *page;
 
-   page = module_alloc(PAGE_SIZE);
+   page = jit_text_alloc(PAGE_SIZE);
if (!page)
return NULL;
set_memory_rox((unsigned long)page, 1);
diff --git a/arch/s390/kernel/module.c b/arch/s390/kernel/module.c
index f1b35dcdf3eb..d4844cfe3d7e 100644
--- a/arch/s390/kernel/module.c
+++ b/arch/s390/kernel/module.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include <linux/jitalloc.h>
 #include 
 #include 
 #include 
@@ -76,7 +77,7 @@ void *module_alloc(unsigned long size)
 #ifdef CONFIG_FUNCTION_TRACER
 void module_arch_cleanup(struct module *mod)
 {
-   module_memfree(mod->arch.trampolines_start);
+   jit_free(mod->arch.trampolines_start);
 }
 #endif
 
@@ -509,7 +510,7 @@ static int module_alloc_ftrace_hotpatch_trampolines(struct 
module *me,
 
size = FTRACE_HOTPATCH_TRAMPOLINES_SIZE(s->sh_size);
numpages = DIV_ROUND_UP(size, PAGE_SIZE);
-   start = module_alloc(numpages * PAGE_SIZE);
+   start = jit_text_alloc(numpages * PAGE_SIZE);
if (!start)
return -ENOMEM;
set_memory_rox((unsigned long)start, numpages);
diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
index a74e5004c6c8..068be1097d1a 100644
--- a/arch/sparc/net/bpf_jit_comp_32.c
+++ b/arch/sparc/net/bpf_jit_comp_32.c
@@ -1,10 +1,10 @@
 // SPDX-License-Identifier: GPL-2.0
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include <linux/jitalloc.h>
 

[PATCH 01/13] nios2: define virtual address space for modules

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

nios2 uses kmalloc() to implement module_alloc() because CALL26/PCREL26
relocations cannot reach all of the vmalloc address space.

Define module space as 32MiB below the kernel base and switch nios2 to
use vmalloc for module allocations.
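
With the default CONFIG_NIOS2_KERNEL_REGION_BASE of 0xc0000000 (an assumed
value, used here only for illustration), the arithmetic works out as:

	MODULES_VADDR = 0xc0000000 - SZ_32M     = 0xbe000000
	MODULES_END   = 0xc0000000 - 1          = 0xbfffffff
	VMALLOC_END   = 0xc0000000 - SZ_32M - 1 = 0xbdffffff

so modules occupy the 32MiB window directly below the kernel base and the
vmalloc area is shrunk to end just below that window.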

Suggested-by: Thomas Gleixner 
Signed-off-by: Mike Rapoport (IBM) 
---
 arch/nios2/include/asm/pgtable.h |  5 -
 arch/nios2/kernel/module.c   | 19 ---
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 0f5c2564e9f5..0073b289c6a4 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -25,7 +25,10 @@
 #include 
 
 #define VMALLOC_START  CONFIG_NIOS2_KERNEL_MMU_REGION_BASE
-#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
+#define VMALLOC_END(CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M - 1)
+
+#define MODULES_VADDR  (CONFIG_NIOS2_KERNEL_REGION_BASE - SZ_32M)
+#define MODULES_END(CONFIG_NIOS2_KERNEL_REGION_BASE - 1)
 
 struct mm_struct;
 
diff --git a/arch/nios2/kernel/module.c b/arch/nios2/kernel/module.c
index 76e0a42d6e36..9c97b7513853 100644
--- a/arch/nios2/kernel/module.c
+++ b/arch/nios2/kernel/module.c
@@ -21,23 +21,12 @@
 
 #include 
 
-/*
- * Modules should NOT be allocated with kmalloc for (obvious) reasons.
- * But we do it for now to avoid relocation issues. CALL26/PCREL26 cannot reach
- * from 0x80000000 (vmalloc area) to 0xc0000000 (kernel) (kmalloc returns
- * addresses in 0xc0000000)
- */
 void *module_alloc(unsigned long size)
 {
-   if (size == 0)
-   return NULL;
-   return kmalloc(size, GFP_KERNEL);
-}
-
-/* Free memory returned from module_alloc */
-void module_memfree(void *module_region)
-{
-   kfree(module_region);
+   return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
+   GFP_KERNEL, PAGE_KERNEL_EXEC,
+   VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 
 int apply_relocate_add(Elf32_Shdr *sechdrs, const char *strtab,
-- 
2.35.1



[PATCH 00/13] mm: jit/text allocator

2023-06-01 Thread Mike Rapoport
From: "Mike Rapoport (IBM)" 

Hi,

module_alloc() is used everywhere as a means to allocate memory for code.

Besides being semantically wrong, this unnecessarily ties all subsystems
that need to allocate code, such as ftrace, kprobes and BPF, to modules
and puts the burden of code allocation on the modules code.

Several architectures override module_alloc() because of various
constraints where the executable memory can be located and this causes
additional obstacles for improvements of code allocation.

This set splits code allocation from modules by introducing
jit_text_alloc(), jit_data_alloc() and jit_free() APIs, replaces call
sites of module_alloc() and module_memfree() with the new APIs and
implements core text and related allocation in a central place.
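
From a subsystem's point of view the intended usage is a plain allocate/free
pair; for illustration (a sketch only, where emit_code() is a made-up
placeholder for whatever generates the instructions):

#include <linux/jitalloc.h>

/* Sketch of a caller: allocate executable memory, generate code into it,
 * release it again on failure. */
static void *alloc_and_emit(size_t size)
{
	void *text = jit_text_alloc(size);

	if (!text)
		return NULL;

	if (emit_code(text, size)) {	/* hypothetical code generator */
		jit_free(text);
		return NULL;
	}

	return text;
}

Once the ROX patches at the end of the series are applied, such a buffer can
no longer be written directly and updates have to go through text poking
instead.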

Instead of architecture-specific overrides for module_alloc(), the
architectures that require non-default behaviour for text allocation must
fill the jit_alloc_params structure and implement jit_alloc_arch_params(),
which returns a pointer to that structure. If an architecture does not
implement jit_alloc_arch_params(), defaults compatible with the current
modules::module_alloc() are used.

The new jitalloc infrastructure allows decoupling of kprobes and ftrace
from modules, and most importantly it enables ROX allocations for
executable memory.

A centralized infrastructure for code allocation allows future
optimizations of executable memory allocations, such as caching large pages
for better iTLB performance and providing sub-page allocations for users
that only need small jit code snippets.

patches 1-5: split out the code allocation from modules and arch
patch 6: add dedicated API for data allocations with constraints similar to
code allocations
patches 7-9: decouple dynamic ftrace and kprobes from CONFIG_MODULES
patches 10-13: enable ROX allocations for executable memory on x86

Mike Rapoport (IBM) (11):
  nios2: define virtual address space for modules
  mm: introduce jit_text_alloc() and use it instead of module_alloc()
  mm/jitalloc, arch: convert simple overrides of module_alloc to jitalloc
  mm/jitalloc, arch: convert remaining overrides of module_alloc to jitalloc
  module, jitalloc: drop module_alloc
  mm/jitalloc: introduce jit_data_alloc()
  x86/ftrace: enable dynamic ftrace without CONFIG_MODULES
  arch: make jitalloc setup available regardless of CONFIG_MODULES
  kprobes: remove dependency on CONFIG_MODULES
  modules, jitalloc: prepare to allocate executable memory as ROX
  x86/jitalloc: make memory allocated for code ROX

Song Liu (2):
  ftrace: Add swap_func to ftrace_process_locs()
  x86/jitalloc: prepare to allocate executable memory as ROX

 arch/Kconfig |   5 +-
 arch/arm/kernel/module.c |  32 --
 arch/arm/mm/init.c   |  35 ++
 arch/arm64/kernel/module.c   |  47 
 arch/arm64/mm/init.c |  42 +++
 arch/loongarch/kernel/module.c   |   6 -
 arch/loongarch/mm/init.c |  16 +++
 arch/mips/kernel/module.c|   9 --
 arch/mips/mm/init.c  |  19 
 arch/nios2/include/asm/pgtable.h |   5 +-
 arch/nios2/kernel/module.c   |  24 ++--
 arch/parisc/kernel/module.c  |  11 --
 arch/parisc/mm/init.c|  21 +++-
 arch/powerpc/kernel/kprobes.c|   4 +-
 arch/powerpc/kernel/module.c |  37 ---
 arch/powerpc/mm/mem.c|  41 +++
 arch/riscv/kernel/module.c   |  10 --
 arch/riscv/mm/init.c |  18 +++
 arch/s390/kernel/ftrace.c|   4 +-
 arch/s390/kernel/kprobes.c   |   4 +-
 arch/s390/kernel/module.c|  46 +---
 arch/s390/mm/init.c  |  35 ++
 arch/sparc/kernel/module.c   |  34 +-
 arch/sparc/mm/Makefile   |   2 +
 arch/sparc/mm/jitalloc.c |  21 
 arch/sparc/net/bpf_jit_comp_32.c |   8 +-
 arch/x86/Kconfig |   2 +
 arch/x86/kernel/alternative.c|  43 ---
 arch/x86/kernel/ftrace.c |  59 +-
 arch/x86/kernel/kprobes/core.c   |   4 +-
 arch/x86/kernel/module.c |  75 +
 arch/x86/kernel/static_call.c|  10 +-
 arch/x86/kernel/unwind_orc.c |  13 ++-
 arch/x86/mm/init.c   |  52 +
 arch/x86/net/bpf_jit_comp.c  |  22 +++-
 include/linux/ftrace.h   |   2 +
 include/linux/jitalloc.h |  69 
 include/linux/moduleloader.h |  15 ---
 kernel/bpf/core.c|  14 +--
 kernel/kprobes.c |  51 +
 kernel/module/Kconfig|   1 +
 kernel/module/main.c |  56 --
 kernel/trace/ftrace.c|  13 ++-
 kernel/trace/trace_kprobe.c  |  11 ++
 mm/Kconfig   |   3 +
 mm/Makefile  |   1 +
 mm/jitalloc.c| 185 +++
 mm/mm_init.c |   2 +
 48 files changed, 777 insertions(+), 462 deletions(-)
 create mode 100644 arch/sparc/mm/jitalloc.c
 create mode 100644 inc


Re: [PATCH v2 05/34] mm: add utility functions for ptdesc

2023-05-27 Thread Mike Rapoport
On Sat, May 27, 2023 at 04:09:31PM +0100, Matthew Wilcox wrote:
> On Sat, May 27, 2023 at 01:41:44PM +0300, Mike Rapoport wrote:
> > Sorry if I wasn't clear, by "page table page" I meant the page (or memory
> > for that matter) for the actual page table rather than the struct page
> > describing that memory.
> > 
> > So what we allocate here is the actual memory for the page tables and not
> > the memory for the metadata. That's why I think the name ptdesc_alloc is
> > confusing.
> 
> But that's going to be the common pattern in the Glorious Future.
> You allocate a folio and that includes both the folio memory descriptor
> and the 2^n pages of memory described by that folio.  Similarly for all
> the other memory descriptors.

I'm not arguing with that, I'm not happy about the naming. IMO, the name
should reflect that we allocate memory for page tables rather than for the
descriptor of that memory, say pgtable_alloc() or page_table_alloc().
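
For illustration, the rename suggested here keeps the allocation body
unchanged and only changes the helper's name, e.g. (a sketch, not code from
the series):

static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
{
	/* allocate the page table memory itself and return its descriptor */
	struct page *page = alloc_pages(gfp | __GFP_COMP, order);

	return page_ptdesc(page);
}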

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 05/34] mm: add utility functions for ptdesc

2023-05-27 Thread Mike Rapoport
On Thu, May 25, 2023 at 01:53:24PM -0700, Vishal Moola wrote:
> On Thu, May 25, 2023 at 1:26 PM Mike Rapoport  wrote:
> >
> > On Thu, May 25, 2023 at 11:04:28AM -0700, Vishal Moola wrote:
> > > On Thu, May 25, 2023 at 2:10 AM Mike Rapoport  wrote:
> > > > > +
> > > > > +static inline struct ptdesc *ptdesc_alloc(gfp_t gfp, unsigned int 
> > > > > order)
> > > > > +{
> > > > > + struct page *page = alloc_pages(gfp | __GFP_COMP, order);
> > > > > +
> > > > > + return page_ptdesc(page);
> > > > > +}
> > > > > +
> > > > > +static inline void ptdesc_free(struct ptdesc *pt)
> > > > > +{
> > > > > + struct page *page = ptdesc_page(pt);
> > > > > +
> > > > > + __free_pages(page, compound_order(page));
> > > > > +}
> > > >
> > > > The ptdesc_{alloc,free} API does not sound right to me. The name
> > > > ptdesc_alloc() implies the allocation of the ptdesc itself, rather than
> > > > allocation of page table page. The same goes for free.
> > >
> > > I'm not sure I see the difference. Could you elaborate?
> >
> > I read ptdesc_alloc() as "allocate a ptdesc" rather than as "allocate a
> > page for a page table and return a ptdesc pointing to that page". Seems
> > very confusing to me already, and it will be even more confusing when we
> > start allocating actual ptdescs.
> 
> Hmm, I see what you're saying. I'm envisioning this function evolving into
> one that allocates a ptdesc later. I don't see why we would need to have both 
> a
> page table page AND ptdesc at any point, but that may be a lack of knowledge
> on my part.

Sorry if I wasn't clear, by "page table page" I meant the page (or memory
for that matter) for the actual page table rather than the struct page
describing that memory.

So what we allocate here is the actual memory for the page tables and not
the memory for the metadata. That's why I think the name ptdesc_alloc is
confusing.
 
> I was thinking later, if necessary, we could make another function
> (only to be used internally) to allocate page table pages.

-- 
Sincerely yours,
Mike.


Re: [PATCH v2 05/34] mm: add utility functions for ptdesc

2023-05-25 Thread Mike Rapoport
On Thu, May 25, 2023 at 11:04:28AM -0700, Vishal Moola wrote:
> On Thu, May 25, 2023 at 2:10 AM Mike Rapoport  wrote:
> > > +
> > > +static inline struct ptdesc *ptdesc_alloc(gfp_t gfp, unsigned int order)
> > > +{
> > > + struct page *page = alloc_pages(gfp | __GFP_COMP, order);
> > > +
> > > + return page_ptdesc(page);
> > > +}
> > > +
> > > +static inline void ptdesc_free(struct ptdesc *pt)
> > > +{
> > > + struct page *page = ptdesc_page(pt);
> > > +
> > > + __free_pages(page, compound_order(page));
> > > +}
> >
> > The ptdesc_{alloc,free} API does not sound right to me. The name
> > ptdesc_alloc() implies the allocation of the ptdesc itself, rather than
> > allocation of page table page. The same goes for free.
> 
> I'm not sure I see the difference. Could you elaborate?

I read ptdesc_alloc() as "allocate a ptdesc" rather than as "allocate a
page for a page table and return a ptdesc pointing to that page". Seems
very confusing to me already, and it will be even more confusing when we
start allocating actual ptdescs.
 
-- 
Sincerely yours,
Mike.


