On Thu 13-08-20 22:57:32, Charan Teja Kalla wrote:
> Thanks Michal.
> 
> On 8/13/2020 10:00 PM, Michal Hocko wrote:
> > On Thu 13-08-20 21:51:29, Charan Teja Kalla wrote:
> >> Thanks Michal for comments.
> >>
> >> On 8/13/2020 5:11 PM, Michal Hocko wrote:
> >>> On Tue 11-08-20 18:28:23, Charan Teja Reddy wrote:
> >>> [...]
> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>> index e4896e6..839039f 100644
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -1304,6 +1304,11 @@ static void free_pcppages_bulk(struct zone *zone, 
> >>>> int count,
> >>>>          struct page *page, *tmp;
> >>>>          LIST_HEAD(head);
> >>>>  
> >>>> +        /*
> >>>> +         * Ensure proper count is passed which otherwise would stuck in 
> >>>> the
> >>>> +         * below while (list_empty(list)) loop.
> >>>> +         */
> >>>> +        count = min(pcp->count, count);
> >>>>          while (count) {
> >>>>                  struct list_head *list;
> >>>
> >>>
> >>> How does this prevent the race actually?
> >>
> >> This doesn't prevent the race. This only fixes the core hung(as this is
> >> called with spin_lock_irq()) caused by the race condition. This core
> >> hung is because of incorrect count value is passed to the
> >> free_pcppages_bulk() function.
> > 
> > Let me ask differently. What does enforce that the count and lists do
> > not get out of sync in the loop. 
> 
> count value is updated whenever an order-0 page is being added to the
> pcp lists through free_unref_page_commit(), which is being called with
> both interrupts, premption disabled.
> static void free_unref_page_commit(struct page *page, {
>       ....
>       list_add(&page->lru, &pcp->lists[migratetype]);
>       pcp->count++
> }
> 
> As these are pcp lists, they only gets touched by another process when
> this process is context switched, which happens only after enabling
> premption or interrupts. So, as long as process is operating on these
> pcp lists in free_unref_page_commit function, the count and lists are
> always synced.
> 
> However, the problem here is not that the count and lists are being out
> of sync. They do always in sync, as explained above. It is with the
> asking free_pcppages_bulk() to free the pages more than what is present
> in the pcp lists which is ending up in while(list_empty()).

You are right. I managed to confuse myself. The thing is that the batch
count is out of sync.

> > Your changelog says that the fix is to
> > use the proper value without any specifics.
> > 
> Will change this to: Ensure the count value passed is not greater than
> the pcp lists count. Any better you suggest?

Yes, this makes it more clear.

Feel free to add
Acked-by: Michal Hocko <mho...@suse.com>
-- 
Michal Hocko
SUSE Labs

Reply via email to