On Thu, Jun 11, 2026 at 11:35:36AM +0800, Miaohe Lin wrote: > On 2026/6/11 5:18, Michael S. Tsirkin wrote: > > On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: > >> On 2026/6/10 5:00, Michael S. Tsirkin wrote: > >>> On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: > >>>> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: > >>>> > >>>>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: > >>>>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: > >>>>>> > >>>>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: > >>>>>>> > >>>>>>>> On 6/9/26 20:10, Andrew Morton wrote: > >>>>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" > >>>>>>>>> <[email protected]> wrote: > >>>>>>>>> > >>>>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic > >>>>>>>>>> update to page->flags can race with non-atomic flag operations > >>>>>>>>>> that run under zone->lock in the buddy allocator. > >>>>>>>>>> > >>>>>>>>>> In particular, __free_pages_prepare() does: > >>>>>>>>>> > >>>>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >>>>>>>>>> > >>>>>>>>>> This non-atomic read-modify-write, while correctly excluding > >>>>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent > >>>>>>>>>> TestSetPageHWPoison if the read happens before the poison bit > >>>>>>>>>> is set and the write happens after. Will only get worse if/when > >>>>>>>>>> we add more non-atomic flag operations. > >>>>>>>>>> > >>>>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and > >>>>>>>>>> around ClearPageHWPoison in the retry path. This > >>>>>>>>>> serializes with all buddy flag manipulation. The cost is > >>>>>>>>>> negligible: one lock/unlock in an extremely rare path > >>>>>>>>>> (hardware memory errors). > >>>>>>>>>> > >>>>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >>>>>>>>>> in this file operate on pages already removed from the buddy > >>>>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >>>>>>>>>> need zone->lock protection. > >>>>>>>>> > >>>>>>>>> Sashiko is saying this doesn't do anything "Because > >>>>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? > >>>>>>>>> > >>>>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git....@redhat.com > >>>>>>>> > >>>>>>>> Battle of the bots: it's right. > >>>>>>> > >>>>>>> Yep, __free_pages_prepare() changes the page flag without holding > >>>>>>> zone->lock. > >>>>>> > >>>>>> __free_pages_prepare() works on frozen pages and assumes no one else > >>>>>> touches the input page. To avoid this race, memory_failure() might > >>>>>> want to try_get_page() before TestClearPageHWPoison(), but I am not > >>>>>> sure if that works along with memory failure flow. > >>>>>> > >>>>>> Best Regards, > >>>>>> Yan, Zi > >>>>> > >>>>> > >>>>> > >>>>> Actually memory failure already plays with this down the road no? > >>>>> > >>>>> So maybe it's enough to just SetPageHWPoison afterwards again? > >>>>> > >>>>> > >>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>>> index ee42d4361309..4758fea94a96 100644 > >>>>> --- a/mm/memory-failure.c > >>>>> +++ b/mm/memory-failure.c > >>>>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) > >>>>> if (!res) { > >>>>> if (is_free_buddy_page(p)) { > >>>>> if (take_page_off_buddy(p)) { > >>>>> + SetPageHWPoison(p); > >>>>> page_ref_inc(p); > >>>>> res = MF_RECOVERED; > >>>>> } else { > >>>>> > >>>>> > >>>>> and maybe in a bunch of other places in there? > >>>> > >>>> You mean for fear of losing HWPoison flag in the earlier > >>>> TestSetPageHWPoison(), > >>>> just set it again here? > >>> > >>> Yea. > >>> > >>>> Why not do it after get_hwpoison_page(), since that > >>>> is the expected page flag? > >>> > >>> It's still in the buddy at that point right? I'm worried buddy might > >>> poke at flags. > >> > >> Since __free_pages_prepare() executes entirely locklessly, the only way to > >> ensure > >> HWPoison flag won't be lost might be only set hwpoison flag iff we can > >> make sure > >> pages are not on the way to buddy... > >> > >> Thanks. > >> . > > > > > > To clarify do you not agree repeating SetPageHWPoison is enough for > > this? And if not, do you have suggestions on how to fix this race? > > Do you mean repeating SetPageHWPoison on every branch?
Right. > Is it possible > to make __free_pages_prepare changes page->flags atomically or this race > is specified to memory_failure? > > Thanks. > . Adding an atomic op on every fast path page allocation is, I am guessing, going to slow down Linux measureably. Doing it for the benefit of memory_failure, which is the slowest of slow paths, seems unpalatable, to me. Neither am I sure it's the only racy place - grep for __SetPage and __ClearPage - all these have the same issue, I suspect. At the same time, I'm not an mm maintainer. If you disagree, try to upstream a change converting all non atomics in mm to atomics, and see what others say. -- MST

