Badari Pulavarty <[EMAIL PROTECTED]> wrote:
>
> On Mon, 2005-02-14 at 18:57, Andrew Morton wrote:
> > Badari Pulavarty <[EMAIL PROTECTED]> wrote:
> > >
> > >  Most DB2 customers use a filesystem for their database. Under load,
> > >  they complain that all memory in the system is used by filesystem
> > >  pagecache, free memory is very low, and the system starts swapping
> > >  like crazy OR sees lots of memory allocation failures and the OOM
> > >  killer kills db2. slabinfo shows lots of buffer_heads, and the VM
> > >  folks claim that the buffer_heads are holding a ref on the pages, so
> > >  they can't reclaim them. So, I want to find the truth in the story:
> > >  what exactly is happening here, and which one is to blame (VM, FS or
> > >  IO problems)?
> > > 
> > >  BTW, all these on 2.4 kernels and I don't have a reproducible testcase
> > >  :(
> > > 
> > >  Feb 7 05:35:17 nmcopsu41 kernel: ENOMEM in do_get_write_access,
> > >  retrying.
> > 
> > Do these machines have a large amount of highmem?
> > 
> > If so, yes, you can oom because lots of highmem pages have buffer_heads
> > attached and you've run out of lowmem.  The 2.4 VM will go off looking for
> > lowmem pages to reclaim and will ignore the highmem pages because there's
> > no highmem shortage.  Consequently those buffer_heads don't get freed up
> > and we're unable to reclaim any lowmem -> oom.
> > 
> > Andrea did a patch a long time ago (it'll be in suse 2.4 kernels) which,
> > under these circumstances, strips the buffers from those highmem pages when
> > they're encountered on the LRU.  From a quick read it seems that that patch
> > is not in current 2.4 kernels.
> > 
> > It's harder to do that in 2.6 because we have a separate LRU per zone.
> 
> Our DB2 folks *claim* to have seen this problem with both ia32 and AMD64
> customers.  So, I am not sure if it's really only highmem related. The only
> workaround seems to be to configure DB2 not to use more than 1.5GB on an
> 8GB RAM system :(

It shouldn't happen on amd64: there is no highmem there, so buffer_heads
on highmem pages can't exhaust a separate lowmem zone.

> I have nothing much to go on, other than looking at data from a sick
> machine. What should I be looking at, to narrow down the problem
> some more ?

/proc/meminfo and /proc/slabinfo (especially the buffer_head line)
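
A rough way to pull the relevant numbers out of those two files, as a
sketch only. It assumes the "Name: N kB" lines in /proc/meminfo (LowTotal,
LowFree, HighTotal and HighFree only appear on highmem kernels) and that
the buffer_head line in /proc/slabinfo starts with name, active objects,
total objects and object size. The function names are made up for
illustration.

```python
import re

def parse_meminfo(text):
    """Return {field: kB} for lines of the form 'LowFree:   1234 kB'."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s+kB\s*$", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def buffer_head_usage(slabinfo_text):
    """Return (active_objs, total_objs, objsize_bytes) or None if absent.

    Assumption: the first four columns are name, active objects,
    total objects, object size in bytes.
    """
    for line in slabinfo_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "buffer_head":
            active, total, objsize = (int(p) for p in parts[1:4])
            return active, total, objsize
    return None

def summarize(meminfo_text, slabinfo_text):
    """Collect the low/high memory fields and the buffer_head footprint."""
    mem = parse_meminfo(meminfo_text)
    wanted = ("LowTotal", "LowFree", "HighTotal", "HighFree")
    report = {k: mem[k] for k in wanted if k in mem}
    bh = buffer_head_usage(slabinfo_text)
    if bh is not None:
        _active, total, objsize = bh
        # Approximate memory held by buffer_head objects, in kB.
        report["buffer_head_kB"] = total * objsize // 1024
    return report
```

Feed it the contents of the two files captured on the sick machine, e.g.
summarize(open("/proc/meminfo").read(), open("/proc/slabinfo").read()).
Reading /proc/slabinfo may need root. A tiny LowFree next to a large
HighFree and a large buffer_head_kB would point at the lowmem exhaustion
described above.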

> BTW, none of these BIG customers will take a patch to figure out
> what's happening (since it's on their production system) :(
> 

Yup.  What kernel(s) are they running?  I _think_ only suse have fixed that
problem.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
