On Wed, May 18, 2016 at 09:51:58AM +0200, Vlastimil Babka wrote:
> On 05/17/2016 08:41 AM, Naoya Horiguchi wrote:
> >> @@ -2579,20 +2612,22 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
> >>            struct list_head *list;
> >>   
> >>            local_irq_save(flags);
> >> -          pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> -          list = &pcp->lists[migratetype];
> >> -          if (list_empty(list)) {
> >> -                  pcp->count += rmqueue_bulk(zone, 0,
> >> -                                  pcp->batch, list,
> >> -                                  migratetype, cold);
> >> -                  if (unlikely(list_empty(list)))
> >> -                          goto failed;
> >> -          }
> >> +          do {
> >> +                  pcp = &this_cpu_ptr(zone->pageset)->pcp;
> >> +                  list = &pcp->lists[migratetype];
> >> +                  if (list_empty(list)) {
> >> +                          pcp->count += rmqueue_bulk(zone, 0,
> >> +                                          pcp->batch, list,
> >> +                                          migratetype, cold);
> >> +                          if (unlikely(list_empty(list)))
> >> +                                  goto failed;
> >> +                  }
> >>   
> >> -          if (cold)
> >> -                  page = list_last_entry(list, struct page, lru);
> >> -          else
> >> -                  page = list_first_entry(list, struct page, lru);
> >> +                  if (cold)
> >> +                          page = list_last_entry(list, struct page, lru);
> >> +                  else
> >> +                          page = list_first_entry(list, struct page, lru);
> >> +          } while (page && check_new_pcp(page));
> > 
> > This causes an infinite loop when check_new_pcp() returns 1, because the
> > bad page is still on the list (I assume that a bad page never disappears).
> > The original kernel is free from this problem because it retries after
> > list_del(). So would moving the following 3 lines into this do-while block
> > solve the problem?
> > 
> >      __dec_zone_state(zone, NR_ALLOC_BATCH);
> >      list_del(&page->lru);
> >      pcp->count--;
> > 
> > There seems to be no infinite loop issue in the order > 0 block below,
> > because bad pages are deleted from the free list in __rmqueue_smallest().
> 
> Ooops, thanks for catching this, wish it was sooner...
> 

Still not too late fortunately! Thanks Naoya for identifying this and
Vlastimil for fixing it.

> ----8<----
> From f52f5e2a7dd65f2814183d8fd254ace43120b828 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <[email protected]>
> Date: Wed, 18 May 2016 09:41:01 +0200
> Subject: [PATCH] mm, page_alloc: prevent infinite loop in buffered_rmqueue()
> 
> In DEBUG_VM kernel, we can hit infinite loop for order == 0 in
> buffered_rmqueue() when check_new_pcp() returns 1, because the bad page is
> never removed from the pcp list. Fix this by removing the page before
> retrying. Also we don't need to check if page is non-NULL, because we simply
> grab it from the list which was just tested for being non-empty.
> 
> Fixes: http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-defer-debugging-checks-of-freed-pages-until-a-pcp-drain.patch
> Reported-by: Naoya Horiguchi <[email protected]>
> Signed-off-by: Vlastimil Babka <[email protected]>

Reviewed-by: Mel Gorman <[email protected]>
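
For anyone following the thread without the diff at hand, the shape of the
fix can be sketched in plain user-space C. This is only an illustration of
"unlink the page before checking it", not the actual patch: struct fake_page,
struct fake_pcp, fake_check_new_pcp() and take_page() are hypothetical
stand-ins, the list is a simple singly linked one, and the refill through
rmqueue_bulk() is replaced by returning NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct page sitting on a pcp list. */
struct fake_page {
	struct fake_page *next;
	bool bad;	/* what the stand-in check below will report */
	int id;
};

/* Hypothetical stand-in for a per-cpu page list and its counter. */
struct fake_pcp {
	struct fake_page *head;
	int count;
};

/* Stand-in for check_new_pcp(): returns true when the page is bad. */
static bool fake_check_new_pcp(const struct fake_page *page)
{
	return page->bad;
}

/*
 * Take the first good page off the list. Each candidate is unlinked
 * before it is checked, so a bad page cannot be picked again on the
 * next iteration and the loop always makes progress.
 */
static struct fake_page *take_page(struct fake_pcp *pcp)
{
	struct fake_page *page;

	do {
		if (!pcp->head)
			return NULL;	/* assumption: no refill in this sketch */

		page = pcp->head;

		/* Unlink first (the list_del() + pcp->count-- step). */
		pcp->head = page->next;
		page->next = NULL;
		assert(pcp->count > 0);
		pcp->count--;
	} while (fake_check_new_pcp(page));

	return page;
}
```

With the unlink placed after the loop instead, a list whose first entry is
bad would hand back the same entry on every iteration, which is the hang
Naoya described.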

-- 
Mel Gorman
SUSE Labs
