Re: [PATCH RFC v2 05/27] mm: page_alloc: Add an arch hook to allow prep_new_page() to fail

2023-11-28 Thread Alexandru Elisei
Hi,

On Tue, Nov 28, 2023 at 05:57:31PM +0100, David Hildenbrand wrote:
> On 27.11.23 13:09, Alexandru Elisei wrote:
> > Hi,
> > 
> > Thank you so much for your comments, they are genuinely useful.
> > 
> > On Fri, Nov 24, 2023 at 08:35:47PM +0100, David Hildenbrand wrote:
> > > On 19.11.23 17:56, Alexandru Elisei wrote:
> > > > Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
> > > > storage for an allocated page. Reserving tag storage can fail, for example,
> > > > if the tag storage page has a short pin on it, so allow prep_new_page() ->
> > > > arch_prep_new_page() to similarly fail.
> > > 
> > > But what are the side-effects of this? How does the calling code recover?
> > > 
> > > E.g., what if we need to populate a page into user space, but that
> > > particular page we allocated fails to be prepared? So we inject a signal
> > > into that poor process?
> > 
> > When the page fails to be prepared, it is put back to the tail of the
> > freelist with __free_one_page(.., FPI_TO_TAIL). If all the allocation paths
> > are exhausted and no page has been found for which tag storage has been
> > reserved, then that's treated like an OOM situation.
> > 
> > I have been thinking about this, and I think I can simplify the code by
> > making tag reservation a best effort approach. The page can be allocated
> > even if reserving tag storage fails, but the page is marked as invalid in
> > set_pte_at() (PAGE_NONE + an extra bit to tell arm64 that it needs tag
> > storage) and next time it is accessed, arm64 will reserve tag storage in
> > the fault handling code (the mechanism for that is implemented in patch #19
> > of the series, "mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for
> > mprotect(PROT_MTE)").
> > 
> > With this new approach, prep_new_page() stays the way it is, and no further
> > changes are required for the page allocator, as there are already arch
> > callbacks that can be used for that, for example tag_clear_highpage() and
> > arch_alloc_page(). The downside is extra page faults, which might impact
> > performance.
> > 
> > What do you think?
> 
> That sounds a lot more robust, compared to intermittent failures to allocate
> pages.

Great, thank you for the feedback, I will use this approach for the next
iteration of the series.

Thanks,
Alex

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 



Re: [PATCH RFC v2 05/27] mm: page_alloc: Add an arch hook to allow prep_new_page() to fail

2023-11-27 Thread Alexandru Elisei
Hi,

Thank you so much for your comments, they are genuinely useful.

On Fri, Nov 24, 2023 at 08:35:47PM +0100, David Hildenbrand wrote:
> On 19.11.23 17:56, Alexandru Elisei wrote:
> > Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
> > storage for an allocated page. Reserving tag storage can fail, for example,
> > if the tag storage page has a short pin on it, so allow prep_new_page() ->
> > arch_prep_new_page() to similarly fail.
> 
> But what are the side-effects of this? How does the calling code recover?
> 
> E.g., what if we need to populate a page into user space, but that
> particular page we allocated fails to be prepared? So we inject a signal
> into that poor process?

When the page fails to be prepared, it is put back to the tail of the
freelist with __free_one_page(.., FPI_TO_TAIL). If all the allocation paths
are exhausted and no page has been found for which tag storage has been
reserved, then that's treated like an OOM situation.

I have been thinking about this, and I think I can simplify the code by
making tag reservation a best effort approach. The page can be allocated
even if reserving tag storage fails, but the page is marked as invalid in
set_pte_at() (PAGE_NONE + an extra bit to tell arm64 that it needs tag
storage) and next time it is accessed, arm64 will reserve tag storage in
the fault handling code (the mechanism for that is implemented in patch #19
of the series, "mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for
mprotect(PROT_MTE)").

With this new approach, prep_new_page() stays the way it is, and no further
changes are required for the page allocator, as there are already arch
callbacks that can be used for that, for example tag_clear_highpage() and
arch_alloc_page(). The downside is extra page faults, which might impact
performance.
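
As a rough illustration of why no allocator change is needed: the existing
hook is void and cannot fail, so it only has to record state (here
page_set_tag_storage_pending() is just a made-up placeholder name):

void arch_alloc_page(struct page *page, int order)
{
	/* Best effort: only note that tag storage still has to be reserved. */
	if (system_supports_mte())
		page_set_tag_storage_pending(page);
}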

What do you think?

Thanks,
Alex

> 
> -- 
> Cheers,
> 
> David / dhildenb
> 



Re: [PATCH RFC v2 05/27] mm: page_alloc: Add an arch hook to allow prep_new_page() to fail

2023-11-24 Thread David Hildenbrand

On 19.11.23 17:56, Alexandru Elisei wrote:

Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
storage for an allocated page. Reserving tag storage can fail, for example,
if the tag storage page has a short pin on it, so allow prep_new_page() ->
arch_prep_new_page() to similarly fail.


But what are the side-effects of this? How does the calling code recover?

E.g., what if we need to populate a page into user space, but that 
particular page we allocated fails to be prepared? So we inject a signal 
into that poor process?


--
Cheers,

David / dhildenb




[PATCH RFC v2 05/27] mm: page_alloc: Add an arch hook to allow prep_new_page() to fail

2023-11-19 Thread Alexandru Elisei
Introduce arch_prep_new_page(), which will be used by arm64 to reserve tag
storage for an allocated page. Reserving tag storage can fail, for example,
if the tag storage page has a short pin on it, so allow prep_new_page() ->
arch_prep_new_page() to similarly fail.

arch_alloc_page(), called from post_alloc_hook(), has been considered as an
alternative to adding yet another arch hook, but post_alloc_hook() cannot
fail, as it's also called when free pages are isolated.

Signed-off-by: Alexandru Elisei 
---
 include/linux/pgtable.h |  7 +++++++
 mm/page_alloc.c         | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 66 insertions(+), 16 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index af7639c3b0a3..b31f53e9ab1d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -873,6 +873,13 @@ static inline void arch_do_swap_page(struct mm_struct *mm,
 }
 #endif
 
+#ifndef __HAVE_ARCH_PREP_NEW_PAGE
+static inline int arch_prep_new_page(struct page *page, int order, gfp_t gfp)
+{
+   return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_UNMAP_ONE
 /*
  * Some architectures support metadata associated with a page. When a
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 770e585b77c8..b2782b778e78 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1538,9 +1538,15 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
page_table_check_alloc(page, order);
 }
 
-static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
+static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags,
						unsigned int alloc_flags)
 {
+   int ret;
+
+   ret = arch_prep_new_page(page, order, gfp_flags);
+   if (unlikely(ret))
+   return ret;
+
post_alloc_hook(page, order, gfp_flags);
 
if (order && (gfp_flags & __GFP_COMP))
@@ -1556,6 +1562,8 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
set_page_pfmemalloc(page);
else
clear_page_pfmemalloc(page);
+
+   return 0;
 }
 
 /*
@@ -3163,6 +3171,24 @@ static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
return alloc_flags;
 }
 
+#ifdef HAVE_ARCH_ALLOC_PAGE
+static void return_page_to_buddy(struct page *page, int order)
+{
+   unsigned long pfn = page_to_pfn(page);
+   int migratetype = get_pfnblock_migratetype(page, pfn);
+   struct zone *zone = page_zone(page);
+   unsigned long flags;
+
+   spin_lock_irqsave(&zone->lock, flags);
+   __free_one_page(page, pfn, zone, order, migratetype, FPI_TO_TAIL);
+   spin_unlock_irqrestore(&zone->lock, flags);
+}
+#else
+static void return_page_to_buddy(struct page *page, int order)
+{
+}
+#endif
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -3309,7 +3335,10 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
page = rmqueue(ac->preferred_zoneref->zone, zone, order,
gfp_mask, alloc_flags, ac->migratetype);
if (page) {
-   prep_new_page(page, order, gfp_mask, alloc_flags);
+   if (prep_new_page(page, order, gfp_mask, alloc_flags)) {
+   return_page_to_buddy(page, order);
+   goto no_page;
+   }
 
/*
 * If this is a high-order atomic allocation then check
@@ -3319,20 +3348,20 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
reserve_highatomic_pageblock(page, zone);
 
return page;
-   } else {
-   if (has_unaccepted_memory()) {
-   if (try_to_accept_memory(zone, order))
-   goto try_this_zone;
-   }
+   }
+no_page:
+   if (has_unaccepted_memory()) {
+   if (try_to_accept_memory(zone, order))
+   goto try_this_zone;
+   }
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-   /* Try again if zone has deferred pages */
-   if (deferred_pages_enabled()) {
-   if (_deferred_grow_zone(zone, order))
-   goto try_this_zone;
-   }
-#endif
+   /* Try again if zone has deferred pages */
+   if (deferred_pages_enabled()) {
+   if (_deferred_grow_zone(zone, order))
+   goto try_this_zone;
}
+#endif
}
 
/*
@@ -3538,8 +3567,12 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
count_vm_event(COMPACTSTALL);
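
For illustration, an architecture opts in to the hook above by defining
__HAVE_ARCH_PREP_NEW_PAGE and supplying its own arch_prep_new_page(). A
minimal sketch follows; the real arm64 implementation is added later in the
series, and __GFP_TAGGED and reserve_tag_storage() are used here only as
stand-in names:

#define __HAVE_ARCH_PREP_NEW_PAGE
static inline int arch_prep_new_page(struct page *page, int order, gfp_t gfp)
{
	/* Only pages that will be mapped as tagged need tag storage. */
	if (!(gfp & __GFP_TAGGED))
		return 0;

	/* Non-zero means the matching tag storage could not be reserved. */
	return reserve_tag_storage(page, order, gfp);
}

A non-zero return from this hook is what makes get_page_from_freelist() hand
the page back to the buddy allocator via return_page_to_buddy() in the hunk
above, instead of returning it to the caller.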