On Tue, Jun 09, 2026 at 08:38:09PM +0200, David Hildenbrand (Arm) wrote: > On 6/9/26 20:10, Andrew Morton wrote: > > On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <[email protected]> > > wrote: > > > >> TestSetPageHWPoison() is called without zone->lock, so its atomic > >> update to page->flags can race with non-atomic flag operations > >> that run under zone->lock in the buddy allocator. > >> > >> In particular, __free_pages_prepare() does: > >> > >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP; > >> > >> This non-atomic read-modify-write, while correctly excluding > >> __PG_HWPOISON from the mask, can still lose a concurrent > >> TestSetPageHWPoison if the read happens before the poison bit > >> is set and the write happens after. Will only get worse if/when > >> we add more non-atomic flag operations. > >> > >> Fix by acquiring zone->lock around TestSetPageHWPoison and > >> around ClearPageHWPoison in the retry path. This > >> serializes with all buddy flag manipulation. The cost is > >> negligible: one lock/unlock in an extremely rare path > >> (hardware memory errors). > >> > >> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere > >> in this file operate on pages already removed from the buddy > >> allocator or on non-buddy pages (DAX, hugetlb), so they do not > >> need zone->lock protection. > > > > Sashiko is saying this doesn't do anything "Because > > __free_pages_prepare() executes entirely locklessly". Did it goof? > > > > https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git....@redhat.com > > Battle of the bots: it's right.
Ugh it's bot against human - I remembered we have zone lock normally in alloc and thought it helps and didn't double check we don't have it here . The bot wins ( > -- > Cheers, > > Davidc

