On 2026/6/11 13:43, Michael S. Tsirkin wrote: > On Thu, Jun 11, 2026 at 11:35:36AM +0800, Miaohe Lin wrote: >> On 2026/6/11 5:18, Michael S. Tsirkin wrote: >>> On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote: >>>> On 2026/6/10 5:00, Michael S. Tsirkin wrote: >>>>> On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote: >>>>>> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote: >>>>>> >>>>>>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote: >>>>>>>> On 9 Jun 2026, at 14:39, Zi Yan wrote: >>>>>>>> >>>>>>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote: >>>>>>>>> >>>>>>>>>> On 6/9/26 20:10, Andrew Morton wrote: >>>>>>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" >>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic >>>>>>>>>>>> update to page->flags can race with non-atomic flag operations >>>>>>>>>>>> that run under zone->lock in the buddy allocator. >>>>>>>>>>>> >>>>>>>>>>>> In particular, __free_pages_prepare() does: >>>>>>>>>>>> >>>>>>>>>>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; >>>>>>>>>>>> >>>>>>>>>>>> This non-atomic read-modify-write, while correctly excluding >>>>>>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent >>>>>>>>>>>> TestSetPageHWPoison if the read happens before the poison bit >>>>>>>>>>>> is set and the write happens after. Will only get worse if/when >>>>>>>>>>>> we add more non-atomic flag operations. >>>>>>>>>>>> >>>>>>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and >>>>>>>>>>>> around ClearPageHWPoison in the retry path. This >>>>>>>>>>>> serializes with all buddy flag manipulation. The cost is >>>>>>>>>>>> negligible: one lock/unlock in an extremely rare path >>>>>>>>>>>> (hardware memory errors). >>>>>>>>>>>> >>>>>>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere >>>>>>>>>>>> in this file operate on pages already removed from the buddy >>>>>>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not >>>>>>>>>>>> need zone->lock protection. >>>>>>>>>>> >>>>>>>>>>> Sashiko is saying this doesn't do anything "Because >>>>>>>>>>> __free_pages_prepare() executes entirely locklessly". Did it goof? >>>>>>>>>>> >>>>>>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git....@redhat.com >>>>>>>>>> >>>>>>>>>> Battle of the bots: it's right. >>>>>>>>> >>>>>>>>> Yep, __free_pages_prepare() changes the page flag without holding >>>>>>>>> zone->lock. >>>>>>>> >>>>>>>> __free_pages_prepare() works on frozen pages and assumes no one else >>>>>>>> touches the input page. To avoid this race, memory_failure() might >>>>>>>> want to try_get_page() before TestClearPageHWPoison(), but I am not >>>>>>>> sure if that works along with memory failure flow. >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Yan, Zi >>>>>>> >>>>>>> >>>>>>> >>>>>>> Actually memory failure already plays with this down the road no? >>>>>>> >>>>>>> So maybe it's enough to just SetPageHWPoison afterwards again? >>>>>>> >>>>>>> >>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >>>>>>> index ee42d4361309..4758fea94a96 100644 >>>>>>> --- a/mm/memory-failure.c >>>>>>> +++ b/mm/memory-failure.c >>>>>>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags) >>>>>>> if (!res) { >>>>>>> if (is_free_buddy_page(p)) { >>>>>>> if (take_page_off_buddy(p)) { >>>>>>> + SetPageHWPoison(p); >>>>>>> page_ref_inc(p); >>>>>>> res = MF_RECOVERED; >>>>>>> } else { >>>>>>> >>>>>>> >>>>>>> and maybe in a bunch of other places in there? >>>>>> >>>>>> You mean for fear of losing HWPoison flag in the earlier >>>>>> TestSetPageHWPoison(), >>>>>> just set it again here? >>>>> >>>>> Yea. >>>>> >>>>>> Why not do it after get_hwpoison_page(), since that >>>>>> is the expected page flag? >>>>> >>>>> It's still in the buddy at that point right? I'm worried buddy might >>>>> poke at flags. >>>> >>>> Since __free_pages_prepare() executes entirely locklessly, the only way to >>>> ensure >>>> HWPoison flag won't be lost might be only set hwpoison flag iff we can >>>> make sure >>>> pages are not on the way to buddy... >>>> >>>> Thanks. >>>> . >>> >>> >>> To clarify do you not agree repeating SetPageHWPoison is enough for >>> this? And if not, do you have suggestions on how to fix this race? >> >> Do you mean repeating SetPageHWPoison on every branch? > > Right. > >> Is it possible >> to make __free_pages_prepare changes page->flags atomically or this race >> is specified to memory_failure? >> >> Thanks. >> . > > > Adding an atomic op on every fast path page allocation is, I am > guessing, going to slow down Linux measureably. > > Doing it for the benefit of memory_failure, which is the slowest of > slow paths, seems unpalatable, to me.
Agree, it's not worth to do so. > > Neither am I sure it's the only racy place - > grep for __SetPage and __ClearPage - all these have the same issue, I > suspect. > > At the same time, I'm not an mm maintainer. If you disagree, try to > upstream a change converting all non atomics in mm to atomics, and see > what others say. Since memory_failure might be the only place, this change would be unacceptable. We should come up with a better solution. Maybe we can try repeating SetPageHWPoison and ClearPageHWPoison at a first attempt though it looks somewhat weird to me and makes code more complicated. But it's already complicated. :) Thanks. .

