On 2026/6/11 13:43, Michael S. Tsirkin wrote:
> On Thu, Jun 11, 2026 at 11:35:36AM +0800, Miaohe Lin wrote:
>> On 2026/6/11 5:18, Michael S. Tsirkin wrote:
>>> On Wed, Jun 10, 2026 at 03:24:30PM +0800, Miaohe Lin wrote:
>>>> On 2026/6/10 5:00, Michael S. Tsirkin wrote:
>>>>> On Tue, Jun 09, 2026 at 04:54:01PM -0400, Zi Yan wrote:
>>>>>> On 9 Jun 2026, at 16:34, Michael S. Tsirkin wrote:
>>>>>>
>>>>>>> On Tue, Jun 09, 2026 at 02:52:47PM -0400, Zi Yan wrote:
>>>>>>>> On 9 Jun 2026, at 14:39, Zi Yan wrote:
>>>>>>>>
>>>>>>>>> On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote:
>>>>>>>>>
>>>>>>>>>> On 6/9/26 20:10, Andrew Morton wrote:
>>>>>>>>>>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" 
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> TestSetPageHWPoison() is called without zone->lock, so its atomic
>>>>>>>>>>>> update to page->flags can race with non-atomic flag operations
>>>>>>>>>>>> that run under zone->lock in the buddy allocator.
>>>>>>>>>>>>
>>>>>>>>>>>> In particular, __free_pages_prepare() does:
>>>>>>>>>>>>
>>>>>>>>>>>>     page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>>>>>>>>>>>
>>>>>>>>>>>> This non-atomic read-modify-write, while correctly excluding
>>>>>>>>>>>> __PG_HWPOISON from the mask, can still lose a concurrent
>>>>>>>>>>>> TestSetPageHWPoison if the read happens before the poison bit
>>>>>>>>>>>> is set and the write happens after.  Will only get worse if/when
>>>>>>>>>>>> we add more non-atomic flag operations.
>>>>>>>>>>>>
>>>>>>>>>>>> Fix by acquiring zone->lock around TestSetPageHWPoison and
>>>>>>>>>>>> around ClearPageHWPoison in the retry path.  This
>>>>>>>>>>>> serializes with all buddy flag manipulation.  The cost is
>>>>>>>>>>>> negligible: one lock/unlock in an extremely rare path
>>>>>>>>>>>> (hardware memory errors).
>>>>>>>>>>>>
>>>>>>>>>>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
>>>>>>>>>>>> in this file operate on pages already removed from the buddy
>>>>>>>>>>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not
>>>>>>>>>>>> need zone->lock protection.
>>>>>>>>>>>
>>>>>>>>>>> Sashiko is saying this doesn't do anything "Because
>>>>>>>>>>> __free_pages_prepare() executes entirely locklessly".  Did it goof?
>>>>>>>>>>>
>>>>>>>>>>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git....@redhat.com
>>>>>>>>>>
>>>>>>>>>> Battle of the bots: it's right.
>>>>>>>>>
>>>>>>>>> Yep, __free_pages_prepare() changes the page flag without holding
>>>>>>>>> zone->lock.
>>>>>>>>
>>>>>>>> __free_pages_prepare() works on frozen pages and assumes no one else
>>>>>>>> touches the input page. To avoid this race, memory_failure() might
>>>>>>>> want to try_get_page() before TestClearPageHWPoison(), but I am not
>>>>>>>> sure if that works along with memory failure flow.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Yan, Zi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Actually memory failure already plays with this down the road no?
>>>>>>>
>>>>>>> So maybe it's enough to just SetPageHWPoison afterwards again?
>>>>>>>
>>>>>>>
>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>> index ee42d4361309..4758fea94a96 100644
>>>>>>> --- a/mm/memory-failure.c
>>>>>>> +++ b/mm/memory-failure.c
>>>>>>> @@ -2415,6 +2415,7 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>>         if (!res) {
>>>>>>>                 if (is_free_buddy_page(p)) {
>>>>>>>                         if (take_page_off_buddy(p)) {
>>>>>>> +                               SetPageHWPoison(p);
>>>>>>>                                 page_ref_inc(p);
>>>>>>>                                 res = MF_RECOVERED;
>>>>>>>                         } else {
>>>>>>>
>>>>>>>
>>>>>>> and maybe in a bunch of other places in there?
>>>>>>
>>>>>> You mean for fear of losing HWPoison flag in the earlier 
>>>>>> TestSetPageHWPoison(),
>>>>>> just set it again here?
>>>>>
>>>>> Yea.
>>>>>
>>>>>> Why not do it after get_hwpoison_page(), since that
>>>>>> is the expected page flag?
>>>>>
>>>>> It's still in the buddy at that point right? I'm worried buddy might
>>>>> poke at flags.
>>>>
>>>> Since __free_pages_prepare() executes entirely locklessly, the only way to 
>>>> ensure
>>>> HWPoison flag won't be lost might be only set hwpoison flag iff we can 
>>>> make sure
>>>> pages are not on the way to buddy...
>>>>
>>>> Thanks.
>>>> .
>>>
>>>
>>> To clarify do you not agree repeating SetPageHWPoison is enough for
>>> this? And if not, do you have suggestions on how to fix this race?
>>
>> Do you mean repeating SetPageHWPoison on every branch?
> 
> Right.
> 
>> Is it possible
>> to make __free_pages_prepare changes page->flags atomically or this race
>> is specified to memory_failure?
>>
>> Thanks.
>> .
> 
> 
> Adding an atomic op on every fast path page allocation is, I am
> guessing, going to slow down Linux measureably.
> 
> Doing it for the benefit of memory_failure, which is the slowest of
> slow paths, seems unpalatable, to me.

Agree, it's not worth to do so.

> 
> Neither am I sure it's the only racy place -
> grep for __SetPage and __ClearPage - all these have the same issue, I
> suspect.
> 
> At the same time, I'm not an mm maintainer. If you disagree, try to
> upstream a change converting all non atomics in mm to atomics, and see
> what others say.

Since memory_failure might be the only place, this change would be unacceptable.
We should come up with a better solution. Maybe we can try repeating 
SetPageHWPoison
and ClearPageHWPoison at a first attempt though it looks somewhat weird to me 
and makes
code more complicated. But it's already complicated. :)

Thanks.
.



Reply via email to