Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
> On Wed, Mar 16, 2005 at 04:04:35AM -0800, Andrew Morton wrote:
> > > +                 if (!reclaim_state->reclaimed_slab &&
> > > +                     zone->pages_scanned >= (zone->nr_active +
> > > +                                             zone->nr_inactive) * 4)
> > >                           zone->all_unreclaimable = 1;
> > 
> > That might not change anything because we clear ->all_unreclaimable in
> > free_page_bulk().  [..]
> 
> Really? free_page_bulk is called inside shrink_slab, and so it's overwritten
> later by all_unreclaimable. Otherwise how could all_unreclaimable be set
> in the first place if a single page freed by shrink_slab would be enough
> to clear it?
> 
>       shrink_slab
>       all_unreclaimable = 0
>       zone->pages_scanned >= (zone->nr_active [..]
>       all_unreclaimable = 1
> 
>                                                       try_to_free_pages
>                                                       all_unreclaimable == 1
>                                                       oom

Spose so.

> I also considered changing shrink_slab to return a progress retval, but
> then I noticed I could get away with a one liner fix ;).
> 
> Your fix is better but it should be mostly equivalent in practice. I
> liked the dontrylock not risking to go oom, the one liner couldn't
> handle that ;).

It has a problem.  If ZONE_DMA is really, really oom, kswapd will sit there
freeing up ZONE_NORMAL slab objects and not setting all_unreclaimable. 
We'll end up using tons of CPU and reclaiming lots of slab in response to a
ZONE_DMA oom.

I'm thinking that the most accurate way of fixing this and also avoiding
the "we're fragmenting slab but not actually freeing pages yet" problem is:

- change task_struct->reclaim_state so that it has an array of booleans
  (one per zone)

- in kmem_cache_free, work out what zone the object corresponds to and
  set the boolean in current->reclaim_state which corresponds to that zone.

- in balance_pgdat(), inspect this zone's boolean to see if we're making
  any forward progress with slab freeing.

Probably we can do the work in kmem_cache_free() at the place where we
spill the slab magazine, to optimise things a bit.  I haven't looked at it.
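
A rough userspace sketch of the per-zone boolean idea, to make the three
steps above concrete. Nothing here is kernel code: struct zone_model,
struct reclaim_state_model, note_slab_free() and update_unreclaimable() are
hypothetical names, and MAX_NR_ZONES is assumed to be 3 purely for
illustration. Only the pages_scanned test is taken from the patch quoted
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NR_ZONES 3	/* assumption: DMA, NORMAL, HIGHMEM */

/* Hypothetical stand-in for task_struct->reclaim_state. */
struct reclaim_state_model {
	bool slab_freed[MAX_NR_ZONES];	/* one flag per zone */
};

/* Hypothetical stand-in for the few struct zone fields used here. */
struct zone_model {
	unsigned long nr_active;
	unsigned long nr_inactive;
	unsigned long pages_scanned;
	int all_unreclaimable;
};

/*
 * What the slab free path would do once it knows which zone the freed
 * object's page belongs to: record progress for that zone only.
 */
static void note_slab_free(struct reclaim_state_model *rs, int zone_idx)
{
	assert(zone_idx >= 0 && zone_idx < MAX_NR_ZONES);
	rs->slab_freed[zone_idx] = true;
}

/*
 * The per-zone check balance_pgdat() would make after scanning: only
 * give up on the zone if slab freeing made no progress in *this* zone.
 */
static void update_unreclaimable(struct zone_model *z,
				 const struct reclaim_state_model *rs,
				 int zone_idx)
{
	assert(zone_idx >= 0 && zone_idx < MAX_NR_ZONES);
	if (!rs->slab_freed[zone_idx] &&
	    z->pages_scanned >= (z->nr_active + z->nr_inactive) * 4)
		z->all_unreclaimable = 1;
}
```

In this model, slab progress in zone 1 (standing in for ZONE_NORMAL) no
longer stops zone 0 (standing in for ZONE_DMA) from being marked
all_unreclaimable, which is the case described above.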

But that has a problem too.  Some other task might be freeing objects into
the relevant zone instead of this one.

So maybe a better approach would be to add a "someone freed something"
counter to the zone structure.  That would be incremented whenever anyone
frees a page for a slab object.  Then in balance_pgdat we take a look at
that before and after performing the LRU and slab scans.  If it
incremented, don't set all_unreclaimable.  And still keep the
free_pages_bulk code there as the code which takes us _out_ of the
all_unreclaimable state.
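
The counter variant could look something like the following userspace
model. Again this is only a sketch under assumptions: the field
slab_pages_freed and the helpers slab_page_freed(), scan_begin(),
scan_end() and page_freed() are made-up names, and page_freed() only
imitates the "clear all_unreclaimable" role attributed to free_pages_bulk
in this thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct zone fields used here. */
struct zone_model {
	unsigned long nr_active;
	unsigned long nr_inactive;
	unsigned long pages_scanned;
	unsigned long slab_pages_freed;	/* "someone freed something" */
	int all_unreclaimable;
};

/* Any task freeing a slab page into this zone bumps the counter. */
static void slab_page_freed(struct zone_model *z)
{
	z->slab_pages_freed++;
}

/* Snapshot taken before the LRU and slab scans. */
static unsigned long scan_begin(const struct zone_model *z)
{
	return z->slab_pages_freed;
}

/*
 * After the scans: if the counter moved, somebody made progress in this
 * zone, so leave all_unreclaimable alone.  Otherwise apply the same
 * pages_scanned test as before.
 */
static void scan_end(struct zone_model *z, unsigned long snapshot)
{
	if (z->slab_pages_freed != snapshot)
		return;
	if (z->pages_scanned >= (z->nr_active + z->nr_inactive) * 4)
		z->all_unreclaimable = 1;
}

/* Model of the page free path taking the zone back out of that state. */
static void page_freed(struct zone_model *z)
{
	z->all_unreclaimable = 0;
	z->pages_scanned = 0;
}
```

Because the counter lives in the zone rather than in the task, a free done
by some other task between scan_begin() and scan_end() is still seen. A
real version would need to decide how the counter is updated safely from
multiple CPUs, which this model ignores.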

It's tricky.

