Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-31 Thread Alexandru Elisei
Hi,

On Wed, Jan 31, 2024 at 12:23:51PM +0530, Anshuman Khandual wrote:
> 
> 
> On 1/30/24 17:04, Alexandru Elisei wrote:
> > Hi,
> > 
> > On Tue, Jan 30, 2024 at 03:25:20PM +0530, Anshuman Khandual wrote:
> >>
> >> On 1/25/24 22:12, Alexandru Elisei wrote:
> >>> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> >>> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> >>> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> >>> set in vma_alloc_zeroed_movable_folio().
> >>>
> > >>> Expand this to be more generic by adding an arch hook that modifies the gfp
> >>> flags for an allocation when the VMA is known.
> >>>
> > >>> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> >>> is also set; from that point of view, the current behaviour is unchanged,
> > >>> even though the arm64 flag is set in more places. Once arm64 has support
> > >>> for reusing the tag storage for data allocation, the uses of the
> >>> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> >>> to reserve the corresponding tag storage for the pages being allocated.
> > >> Right, but how will pushing the __GFP_ZEROTAGS addition into the gfp_t flags
> > >> further down via a new arch callback, i.e. arch_calc_vma_gfp(), while still
> > >> maintaining the (vma->vm_flags & VM_MTE) conditionality improve the current scenario?
> > I'm afraid I don't follow you.
> 
> I was just asking whether the overall scope of the __GFP_ZEROTAGS flag is being
> increased to cover more core MM paths through this patch. I think you have
> already answered that below.
> 
> > 
> > >> The page allocator could still have analyzed the alloc flags for
> > >> __GFP_ZEROTAGS for any additional stuff.
> >>
> > >> Or does this just add some new core MM paths that get __GFP_ZEROTAGS via
> > >> this callback, which was not the case earlier?
> > Before this patch: vma_alloc_zeroed_movable_folio() sets __GFP_ZEROTAGS.
> > After this patch: vma_alloc_folio() sets __GFP_ZEROTAGS.
> 
> Understood.
> 
> > 
> > This patch is about adding __GFP_ZEROTAGS for more callers.
> 
> Right, I guess that is the real motivation for this patch. But I'm wondering:
> does this cover all possible anon fault paths for converting a VMA's VM_MTE
> flag into the page allocation flag __GFP_ZEROTAGS? Aren't there any other files
> besides mm/shmem.c which need to be changed to include arch_calc_vma_gfp()?

My thoughts exactly. I went through most of the fault handling code, and
from the code I read, all the allocations were done either with
vma_alloc_folio() or by shmem.

That's not to say there's no scope for improvement; there definitely is. But
having __GFP_ZEROTAGS isn't necessary for correctness (although it is very
useful for performance, since it can avoid a page fault and a page
migration), and this series is an RFC, so I settled on changing only the
above. KVM support for dynamic tag storage also benefits from this change.

The series is already very big; I wanted to settle on an approach that is
acceptable for upstreaming before thinking too much about performance.

Thanks,
Alex



Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-30 Thread Anshuman Khandual



On 1/30/24 17:04, Alexandru Elisei wrote:
> Hi,
> 
> On Tue, Jan 30, 2024 at 03:25:20PM +0530, Anshuman Khandual wrote:
>>
>> On 1/25/24 22:12, Alexandru Elisei wrote:
>>> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
>>> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
>>> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
>>> set in vma_alloc_zeroed_movable_folio().
>>>
>>> Expand this to be more generic by adding an arch hook that modifies the gfp
>>> flags for an allocation when the VMA is known.
>>>
>>> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
>>> is also set; from that point of view, the current behaviour is unchanged,
>>> even though the arm64 flag is set in more places. Once arm64 has support
>>> for reusing the tag storage for data allocation, the uses of the
>>> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
>>> to reserve the corresponding tag storage for the pages being allocated.
>> Right, but how will pushing the __GFP_ZEROTAGS addition into the gfp_t flags
>> further down via a new arch callback, i.e. arch_calc_vma_gfp(), while still
>> maintaining the (vma->vm_flags & VM_MTE) conditionality improve the current scenario?
> I'm afraid I don't follow you.

I was just asking whether the overall scope of the __GFP_ZEROTAGS flag is being
increased to cover more core MM paths through this patch. I think you have
already answered that below.

> 
>> The page allocator could still have analyzed the alloc flags for
>> __GFP_ZEROTAGS for any additional stuff.
>>
>> Or does this just add some new core MM paths that get __GFP_ZEROTAGS via
>> this callback, which was not the case earlier?
> Before this patch: vma_alloc_zeroed_movable_folio() sets __GFP_ZEROTAGS.
> After this patch: vma_alloc_folio() sets __GFP_ZEROTAGS.

Understood.

> 
> This patch is about adding __GFP_ZEROTAGS for more callers.

Right, I guess that is the real motivation for this patch. But I'm wondering:
does this cover all possible anon fault paths for converting a VMA's VM_MTE
flag into the page allocation flag __GFP_ZEROTAGS? Aren't there any other files
besides mm/shmem.c which need to be changed to include arch_calc_vma_gfp()?



Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-30 Thread Alexandru Elisei
Hi,

On Tue, Jan 30, 2024 at 03:25:20PM +0530, Anshuman Khandual wrote:
> 
> 
> On 1/25/24 22:12, Alexandru Elisei wrote:
> > arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> > When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> > the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> > set in vma_alloc_zeroed_movable_folio().
> > 
> > Expand this to be more generic by adding an arch hook that modifies the gfp
> > flags for an allocation when the VMA is known.
> > 
> > Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> > is also set; from that point of view, the current behaviour is unchanged,
> > even though the arm64 flag is set in more places. Once arm64 has support
> > for reusing the tag storage for data allocation, the uses of the
> > __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> > to reserve the corresponding tag storage for the pages being allocated.
> 
> Right, but how will pushing the __GFP_ZEROTAGS addition into the gfp_t flags
> further down via a new arch callback, i.e. arch_calc_vma_gfp(), while still
> maintaining the (vma->vm_flags & VM_MTE) conditionality improve the current scenario?

I'm afraid I don't follow you.

> The page allocator could still have analyzed the alloc flags for
> __GFP_ZEROTAGS for any additional stuff.
> 
> Or does this just add some new core MM paths that get __GFP_ZEROTAGS via
> this callback, which was not the case earlier?

Before this patch: vma_alloc_zeroed_movable_folio() sets __GFP_ZEROTAGS.
After this patch: vma_alloc_folio() sets __GFP_ZEROTAGS.

This patch is about adding __GFP_ZEROTAGS for more callers.

Thanks,
Alex

> 
> > 
> > The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> > the caller; this has been done to keep an architecture from modifying the
> > flags already set by the core memory management code; this is similar to
> > how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> > implemented. This can be revisited in the future if there's a need to do
> > so.
> > 
> > Signed-off-by: Alexandru Elisei 
> > ---
> >  arch/arm64/include/asm/page.h    |  5 ++---
> >  arch/arm64/include/asm/pgtable.h |  3 +++
> >  arch/arm64/mm/fault.c            | 19 ++++++-------------
> >  include/linux/pgtable.h          |  7 +++++++
> >  mm/mempolicy.c                   |  1 +
> >  mm/shmem.c                       |  5 ++++-
> >  6 files changed, 23 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > index 2312e6ee595f..88bab032a493 100644
> > --- a/arch/arm64/include/asm/page.h
> > +++ b/arch/arm64/include/asm/page.h
> > @@ -29,9 +29,8 @@ void copy_user_highpage(struct page *to, struct page *from,
> >  void copy_highpage(struct page *to, struct page *from);
> >  #define __HAVE_ARCH_COPY_HIGHPAGE
> >  
> > -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> > -   unsigned long vaddr);
> > -#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
> > +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> > +   vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
> >  
> >  void tag_clear_highpage(struct page *to);
> >  #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 79ce70fbb751..08f0904dbfc2 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -1071,6 +1071,9 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
> >  
> >  #endif /* CONFIG_ARM64_MTE */
> >  
> > +#define __HAVE_ARCH_CALC_VMA_GFP
> > +gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp);
> > +
> >  /*
> >   * On AArch64, the cache coherency is handled via the set_pte_at() function.
> >   */
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 55f6455a8284..4d3f0a870ad8 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -937,22 +937,15 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr,
> >  NOKPROBE_SYMBOL(do_debug_exception);
> >  
> >  /*
> > - * Used during anonymous page fault handling.
> > + * If this is called during anonymous page fault handling, and the page is
> > + * mapped with PROT_MTE, initialise the tags at the point of tag zeroing as this
> > + * is usually faster than separate DC ZVA and STGM.
> >   */
> > -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> > -   unsigned long vaddr)
> > +gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
> >  {
> > -   gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO;
> > -
> > -   /*
> > -    * If the page is mapped with PROT_MTE, initialise the tags at the
> > -    * point of

Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-30 Thread Anshuman Khandual



On 1/25/24 22:12, Alexandru Elisei wrote:
> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> set in vma_alloc_zeroed_movable_folio().
> 
> Expand this to be more generic by adding an arch hook that modifies the gfp
> flags for an allocation when the VMA is known.
> 
> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> is also set; from that point of view, the current behaviour is unchanged,
> even though the arm64 flag is set in more places. Once arm64 has support
> for reusing the tag storage for data allocation, the uses of the
> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> to reserve the corresponding tag storage for the pages being allocated.

Right, but how will pushing the __GFP_ZEROTAGS addition into the gfp_t flags
further down via a new arch callback, i.e. arch_calc_vma_gfp(), while still
maintaining the (vma->vm_flags & VM_MTE) conditionality improve the current
scenario? The page allocator could still have analyzed the alloc flags for
__GFP_ZEROTAGS for any additional stuff.

Or does this just add some new core MM paths that get __GFP_ZEROTAGS via
this callback, which was not the case earlier?

> 
> The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> the caller; this has been done to keep an architecture from modifying the
> flags already set by the core memory management code; this is similar to
> how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> implemented. This can be revisited in the future if there's a need to do
> so.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  arch/arm64/include/asm/page.h    |  5 ++---
>  arch/arm64/include/asm/pgtable.h |  3 +++
>  arch/arm64/mm/fault.c            | 19 ++++++-------------
>  include/linux/pgtable.h          |  7 +++++++
>  mm/mempolicy.c                   |  1 +
>  mm/shmem.c                       |  5 ++++-
>  6 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 2312e6ee595f..88bab032a493 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -29,9 +29,8 @@ void copy_user_highpage(struct page *to, struct page *from,
>  void copy_highpage(struct page *to, struct page *from);
>  #define __HAVE_ARCH_COPY_HIGHPAGE
>  
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> -   unsigned long vaddr);
> -#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> +   vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>  
>  void tag_clear_highpage(struct page *to);
>  #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 79ce70fbb751..08f0904dbfc2 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1071,6 +1071,9 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  
>  #endif /* CONFIG_ARM64_MTE */
>  
> +#define __HAVE_ARCH_CALC_VMA_GFP
> +gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp);
> +
>  /*
>   * On AArch64, the cache coherency is handled via the set_pte_at() function.
>   */
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 55f6455a8284..4d3f0a870ad8 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -937,22 +937,15 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr,
>  NOKPROBE_SYMBOL(do_debug_exception);
>  
>  /*
> - * Used during anonymous page fault handling.
> + * If this is called during anonymous page fault handling, and the page is
> + * mapped with PROT_MTE, initialise the tags at the point of tag zeroing as this
> + * is usually faster than separate DC ZVA and STGM.
>   */
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> -   unsigned long vaddr)
> +gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
>  {
> -   gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO;
> -
> -   /*
> -    * If the page is mapped with PROT_MTE, initialise the tags at the
> -    * point of allocation and page zeroing as this is usually faster than
> -    * separate DC ZVA and STGM.
> -    */
>     if (vma->vm_flags & VM_MTE)
> -       flags |= __GFP_ZEROTAGS;
> -
> -   return vma_alloc_folio(flags, 0, vma, vaddr, false);
> +       return __GFP_ZEROTAGS;
> +   return 0;
>  }
>  
>  void tag_clear_highpage(struct page *page)
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5ddec6b5305..98f81ca08cbe 100644
> --- a/include/linux/pgtable.h
> +++ 

Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-29 Thread Alexandru Elisei
Hi Peter,

On Fri, Jan 26, 2024 at 12:00:36PM -0800, Peter Collingbourne wrote:
> On Thu, Jan 25, 2024 at 8:43 AM Alexandru Elisei wrote:
> >
> > arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> > When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> > the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> > set in vma_alloc_zeroed_movable_folio().
> >
> > Expand this to be more generic by adding an arch hook that modifies the gfp
> > flags for an allocation when the VMA is known.
> >
> > Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> > is also set; from that point of view, the current behaviour is unchanged,
> > even though the arm64 flag is set in more places. Once arm64 has support
> > for reusing the tag storage for data allocation, the uses of the
> > __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> > to reserve the corresponding tag storage for the pages being allocated.
> >
> > The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> > the caller; this has been done to keep an architecture from modifying the
> > flags already set by the core memory management code; this is similar to
> > how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> > implemented. This can be revisited in the future if there's a need to do
> > so.
> >
> > Signed-off-by: Alexandru Elisei 
> 
> This patch also needs to update the non-CONFIG_NUMA definition of
> vma_alloc_folio in include/linux/gfp.h to call arch_calc_vma_gfp. See:
> https://r.android.com/2849146

Of course, you've already reported this to me; for this series I
cherry-picked the version of the patch that doesn't have the fix.

Will fix.

Thanks,
Alex

> 
> Peter



Re: [PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-26 Thread Peter Collingbourne
On Thu, Jan 25, 2024 at 8:43 AM Alexandru Elisei wrote:
>
> arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
> When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
> the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
> set in vma_alloc_zeroed_movable_folio().
>
> Expand this to be more generic by adding an arch hook that modifies the gfp
> flags for an allocation when the VMA is known.
>
> Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
> is also set; from that point of view, the current behaviour is unchanged,
> even though the arm64 flag is set in more places. Once arm64 has support
> for reusing the tag storage for data allocation, the uses of the
> __GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
> to reserve the corresponding tag storage for the pages being allocated.
>
> The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
> the caller; this has been done to keep an architecture from modifying the
> flags already set by the core memory management code; this is similar to
> how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
> implemented. This can be revisited in the future if there's a need to do
> so.
>
> Signed-off-by: Alexandru Elisei 

This patch also needs to update the non-CONFIG_NUMA definition of
vma_alloc_folio in include/linux/gfp.h to call arch_calc_vma_gfp. See:
https://r.android.com/2849146

Peter



[PATCH RFC v3 11/35] mm: Allow an arch to hook into folio allocation when VMA is known

2024-01-25 Thread Alexandru Elisei
arm64 uses VM_HIGH_ARCH_0 and VM_HIGH_ARCH_1 for enabling MTE for a VMA.
When VM_HIGH_ARCH_0, which arm64 renames to VM_MTE, is set for a VMA, and
the gfp flag __GFP_ZERO is present, the __GFP_ZEROTAGS gfp flag also gets
set in vma_alloc_zeroed_movable_folio().

Expand this to be more generic by adding an arch hook that modifies the gfp
flags for an allocation when the VMA is known.

Note that __GFP_ZEROTAGS is ignored by the page allocator unless __GFP_ZERO
is also set; from that point of view, the current behaviour is unchanged,
even though the arm64 flag is set in more places. Once arm64 has support
for reusing the tag storage for data allocation, the uses of the
__GFP_ZEROTAGS flag will be expanded to instruct the page allocator to try
to reserve the corresponding tag storage for the pages being allocated.

The flags returned by arch_calc_vma_gfp() are or'ed with the flags set by
the caller; this has been done to keep an architecture from modifying the
flags already set by the core memory management code; this is similar to
how do_mmap() -> calc_vm_flag_bits() -> arch_calc_vm_flag_bits() has been
implemented. This can be revisited in the future if there's a need to do
so.

Signed-off-by: Alexandru Elisei 
---
 arch/arm64/include/asm/page.h    |  5 ++---
 arch/arm64/include/asm/pgtable.h |  3 +++
 arch/arm64/mm/fault.c            | 19 ++++++-------------
 include/linux/pgtable.h          |  7 +++++++
 mm/mempolicy.c                   |  1 +
 mm/shmem.c                       |  5 ++++-
 6 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 2312e6ee595f..88bab032a493 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -29,9 +29,8 @@ void copy_user_highpage(struct page *to, struct page *from,
 void copy_highpage(struct page *to, struct page *from);
 #define __HAVE_ARCH_COPY_HIGHPAGE
 
-struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
-   unsigned long vaddr);
-#define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
+#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
+   vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
 
 void tag_clear_highpage(struct page *to);
 #define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..08f0904dbfc2 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1071,6 +1071,9 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 
 #endif /* CONFIG_ARM64_MTE */
 
+#define __HAVE_ARCH_CALC_VMA_GFP
+gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp);
+
 /*
  * On AArch64, the cache coherency is handled via the set_pte_at() function.
  */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 55f6455a8284..4d3f0a870ad8 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -937,22 +937,15 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned long esr,
 NOKPROBE_SYMBOL(do_debug_exception);
 
 /*
- * Used during anonymous page fault handling.
+ * If this is called during anonymous page fault handling, and the page is
+ * mapped with PROT_MTE, initialise the tags at the point of tag zeroing as this
+ * is usually faster than separate DC ZVA and STGM.
  */
-struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
-   unsigned long vaddr)
+gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
 {
-   gfp_t flags = GFP_HIGHUSER_MOVABLE | __GFP_ZERO;
-
-   /*
-    * If the page is mapped with PROT_MTE, initialise the tags at the
-    * point of allocation and page zeroing as this is usually faster than
-    * separate DC ZVA and STGM.
-    */
    if (vma->vm_flags & VM_MTE)
-       flags |= __GFP_ZEROTAGS;
-
-   return vma_alloc_folio(flags, 0, vma, vaddr, false);
+       return __GFP_ZEROTAGS;
+   return 0;
 }
 
 void tag_clear_highpage(struct page *page)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c5ddec6b5305..98f81ca08cbe 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -901,6 +901,13 @@ static inline void arch_do_swap_page(struct mm_struct *mm,
 }
 #endif
 
+#ifndef __HAVE_ARCH_CALC_VMA_GFP
+static inline gfp_t arch_calc_vma_gfp(struct vm_area_struct *vma, gfp_t gfp)
+{
+   return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_FREE_PAGES_PREPARE
 static inline void arch_free_pages_prepare(struct page *page, int order) { }
 #endif
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..f7ef52760b32 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2168,6 +2168,7 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
    pgoff_t ilx;
    struct page *page;
 
+   gfp