Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Justin Piszcz


On Thu, 25 Jan 2007, Mark Hahn wrote:

> > Something is seriously wrong with that OOM killer.
> 
> do you know you don't have to operate in OOM-slaughter mode?
> 
> "vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you into a mode where
> the kernel tracks your "committed" memory needs, and will eventually cause
> some allocations to fail.
> this is often much nicer than the default random OOM slaughter.
> (you probably also need to adjust vm.overcommit_ratio with some knowledge of
> your MemTotal and SwapTotal.)
> 
> regards, mark hahn.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

# sysctl -a | grep vm.over
vm.overcommit_ratio = 50
vm.overcommit_memory = 0
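
With the default mode 0 (heuristic overcommit) the ratio is ignored; under 
mode 2 the kernel enforces a commit limit of SwapTotal + MemTotal * 
overcommit_ratio / 100 (see Documentation/vm/overcommit-accounting). A quick 
sketch of the limit those values would give (the awk one-liner is 
illustrative):

$ awk '/^MemTotal/  {mem=$2}
       /^SwapTotal/ {swap=$2}
       END {print "CommitLimit ~= " swap + mem * 50/100 " kB (ratio=50)"}' /proc/meminfo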

I'll have to experiment with these options, thanks for the info!

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Mark Hahn

Something is seriously wrong with that OOM killer.


do you know you don't have to operate in OOM-slaughter mode?

"vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you 
into a mode where the kernel tracks your "committed" memory 
needs, and will eventually cause some allocations to fail.

this is often much nicer than the default random OOM slaughter.
(you probably also need to adjust vm.overcommit_ratio with 
some knowledge of your MemTotal and SwapTotal.)
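
a minimal sketch of that setting (the ratio value here is illustrative and 
should be tuned to your MemTotal/SwapTotal):

# /etc/sysctl.conf -- strict overcommit accounting
vm.overcommit_memory = 2
# percent of RAM counted toward the commit limit (illustrative value)
vm.overcommit_ratio = 100

apply with "sysctl -p", or at runtime with "sysctl -w vm.overcommit_memory=2".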


regards, mark hahn.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Justin Piszcz


On Wed, 24 Jan 2007, Bill Cizek wrote:

> Justin Piszcz wrote:
> > On Mon, 22 Jan 2007, Andrew Morton wrote:
> >   
> > > > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz
> > > > <[EMAIL PROTECTED]> wrote:
> > > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to
> > > > invoke the OOM killer and kill all of my processes?
> > > >   
> > Running with PREEMPT OFF lets me copy the file!!  The machine LAGS
> > occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds of
> > lag, but hey, it does not crash!! I will boot the older kernel with preempt
> > on and see if I can get you that information you requested.
> >   
> Justin,
> 
> According to your kernel_ring_buffer.txt (attached to another email), you are
> using "anticipatory" as your io scheduler:
>   289  Jan 24 18:35:25 p34 kernel: [0.142130] io scheduler noop registered
>   290  Jan 24 18:35:25 p34 kernel: [0.142194] io scheduler anticipatory registered (default)
> 
> I had a problem with this scheduler where my system would occasionally lockup
> during heavy I/O.  Sometimes it would fix itself, sometimes I had to reboot.
> I changed to the "CFQ" io scheduler and my system has worked fine since then.
> 
> CFQ has to be built into the kernel (under Block Layer / IO Schedulers).  It can
> be selected as default or you can set it during runtime:
> 
> echo cfq > /sys/block/<disk>/queue/scheduler
> ...
> 
> Hope this helps,
> Bill
> 
>

I used to run CFQ a while back, but I switched over to AS as it has 
better performance for my workloads. Currently I am running with PREEMPT 
off; if I see any additional issues, I will switch to the CFQ scheduler.

Right now, it's the OOM killer that is going crazy.

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Justin Piszcz


On Thu, 25 Jan 2007, Nick Piggin wrote:

> Justin Piszcz wrote:
> > 
> > On Mon, 22 Jan 2007, Andrew Morton wrote:
> 
> > >After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> > >via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> > >work out what happened to the missing couple-of-hundred MB from
> > >ZONE_NORMAL.
> > >
> > 
> > Running with PREEMPT OFF lets me copy the file!!  The machine LAGS
> > occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds of
> > lag, but hey, it does not crash!! I will boot the older kernel with preempt
> > on and see if I can get you that information you requested.
> 
> It wouldn't be a bad idea to recompile the new kernel with preempt on
> and get the info from there.
> 
> It is usually best to be working with the most recent kernels. We can
> always backport any important fixes if we need to.
> 
> Thanks,
> Nick
> 
> -- 
> SUSE Labs, Novell Inc.
> Send instant messages to your online friends http://au.messenger.yahoo.com -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

In my tests, for the most part, I am using the latest kernels.

Justin.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Justin Piszcz


On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
> 
> > > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> > 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> > use an onboard graphics controller that has 128MB of RAM allocated to it 
> > and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> > causes the machine to lockup before it can even get to the boot/init 
> > processes, the two leds on the keyboard were blinking, caps lock and 
> > scroll lock and I saw no console at all!
> 
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

I forgot to remove the mem=700M with PREEMPT off and this is what I saw 
this morning:

[18005.261875] Killed process 4768 (screen)
[18005.262343] Out of memory: kill process 4793 (screen) score 385 or a child
[18005.262350] Killed process 4793 (screen)
[18005.378536] Out of memory: kill process 4825 (screen) score 385 or a child
[18005.378542] Killed process 4825 (screen)
[18005.378547] Out of memory: kill process 4825 (screen) score 385 or a child
[18005.378553] Killed process 4825 (screen)
[18005.413072] Out of memory: kill process 4875 (screen) score 385 or a child
[18005.413079] Killed process 4875 (screen)
[18005.423735] Out of memory: kill process 4970 (screen) score 385 or a child
[18005.423742] Killed process 4970 (screen)
[18005.431391] Out of memory: kill process 21365 (xfs_fsr) score 286 or a child
[18005.431398] Killed process 21365 (xfs_fsr)
 
$ screen -ls
  There are screens on:
2532.pts-0.p34  (Dead ???)
3776.pts-2.p34  (Dead ???)
4768.pts-7.p34  (Dead ???)
4793.pts-9.p34  (Dead ???)
4825.pts-11.p34 (Dead ???)
4875.pts-13.p34 (Dead ???)
4970.pts-15.p34 (Dead ???)
 

Lovely...

$ uname -r
2.6.20-rc5

Something is seriously wrong with that OOM killer.

Justin.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Bill Cizek

Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
> 
> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> > wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
> of lag, but hey, it does not crash!! I will boot the older kernel with 
> preempt on and see if I can get you that information you requested.

Justin,

According to your kernel_ring_buffer.txt (attached to another email), 
you are using "anticipatory" as your io scheduler:
  289  Jan 24 18:35:25 p34 kernel: [0.142130] io scheduler noop registered
  290  Jan 24 18:35:25 p34 kernel: [0.142194] io scheduler anticipatory registered (default)


I had a problem with this scheduler where my system would occasionally 
lockup during heavy I/O.  Sometimes it would fix itself, sometimes I had 
to reboot.  I changed to the "CFQ" io scheduler and my system has worked 
fine since then.


CFQ has to be built into the kernel (under Block Layer / IO Schedulers).  It 
can be selected as default or you can set it during runtime:


echo cfq > /sys/block/<disk>/queue/scheduler
...
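
For example, once CFQ is compiled in (sda is an assumed device name; the 
bracketed entry is the active scheduler):

$ cat /sys/block/sda/queue/scheduler
noop [anticipatory] cfq
# echo cfq > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
noop anticipatory [cfq]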

Hope this helps,
Bill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
> 
> > > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> > 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> > use an onboard graphics controller that has 128MB of RAM allocated to it 
> > and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> > causes the machine to lockup before it can even get to the boot/init 
> > processes, the two leds on the keyboard were blinking, caps lock and 
> > scroll lock and I saw no console at all!
> 
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Looks like you may be onto something, Pavel: it has not invoked the OOM 
killer yet using mem=700M. I see the swap usage increasing, and the speed 
of the copy is only 14-15MB/s; when mem= is off (giving me all memory 
w/ preempt) I get 45-65MB/s.

With append="mem=700M", seen below:

top - 19:38:46 up 1 min,  3 users,  load average: 1.24, 0.38, 0.13
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.6%us,  4.6%sy,  1.3%ni, 69.6%id, 13.9%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:    705512k total,   699268k used,     6244k free,       12k buffers
Swap:  2200760k total,    18520k used,  2182240k free,    34968k cached


(with mem=700M):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0  40276   7456     12 231840    0    0 13196 14904 1204  4756  1  2 50 47
 0  2  40276   6156     12 242640    0    0 20096 18352 1293  6400  1  4 49 46
 0  2  40276   6264     12 242640    0    0 15504 17836 1219  5202  0  2 50 48
 0  2  40276   6144     12 243760    0    0 14348 14453 1190  4815  0  3 49 48
 0  2  40276   6156     12 245040    0    0 11396 12724 1169  3724  1  2 50 48
 0  1  40276   7532     12 232720    0    0 11412 13121 1183  4017  0  2 50 48
 0  1  40276   7944     12 226440    0    0 19084 19144 1234  6548  0  4 50 46


Almost there, looks like it's going to make it.
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 15399329792 2007-01-24 19:55 18gb.copy

Hrmm, with preempt off it works and with preempt ON and mem=700M it works.
I guess I will run with preempt off as I'd prefer to have the memory 
available-- and tolerate the large lag bursts w/ no preemption.


Yup it worked.

-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb.copy

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Nick Piggin

Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
> 
> > After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> > via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> > work out what happened to the missing couple-of-hundred MB from
> > ZONE_NORMAL.
> 
> Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
> of lag, but hey, it does not crash!! I will boot the older kernel with 
> preempt on and see if I can get you that information you requested.


It wouldn't be a bad idea to recompile the new kernel with preempt on
and get the info from there.

It is usually best to be working with the most recent kernels. We can
always backport any important fixes if we need to.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
> 
> > > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> > 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> > use an onboard graphics controller that has 128MB of RAM allocated to it 
> > and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> > causes the machine to lockup before it can even get to the boot/init 
> > processes, the two leds on the keyboard were blinking, caps lock and 
> > scroll lock and I saw no console at all!
> 
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Ok, this will be my last test for tonight, trying now.

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Pavel Machek
Hi!

> > Is it highmem-related? Can you try it with mem=256M?
> 
> Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> use an onboard graphics controller that has 128MB of RAM allocated to it 
> and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> causes the machine to lockup before it can even get to the boot/init 
> processes, the two leds on the keyboard were blinking, caps lock and 
> scroll lock and I saw no console at all!

Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
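
For reference, a sketch of passing that parameter via lilo.conf (the image, 
label, and root device are assumptions, not from this thread):

image=/boot/vmlinuz-2.6.20-rc5
    label=test-700m
    root=/dev/sda1          # assumed root device
    append="mem=700M"

(With GRUB, add mem=700M to the kernel line instead.)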
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> > wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
> >  0  7    764  50348     12 1269988    0    0 53632   172 1902  4600  1  8 29 62
> >  0  7    764  49420     12 1260004    0    0 53632 34368 1871  6357  2 11 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:   2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> > [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> > [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> > [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> > [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> > [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> > [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> > [ 5947.206698]  === 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
of lag, but hey, it does not crash!! I will boot the older kernel with 
preempt on and see if I can get you that information you requested.

Justin.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz
And FYI, yes, I used mem=256M just as you said, not mem=256.

Justin.

On Wed, 24 Jan 2007, Justin Piszcz wrote:

> > Is it highmem-related? Can you try it with mem=256M?
> 
> Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> use an onboard graphics controller that has 128MB of RAM allocated to it 
> and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> causes the machine to lockup before it can even get to the boot/init 
> processes, the two leds on the keyboard were blinking, caps lock and 
> scroll lock and I saw no console at all!
> 
> Justin.
> 
> On Mon, 22 Jan 2007, Justin Piszcz wrote:
> 
> > 
> > 
> > On Mon, 22 Jan 2007, Pavel Machek wrote:
> > 
> > > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to 
> > > > invoke 
> > > > the OOM killer and kill all of my processes?
> > > > 
> > > > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > > > happens every time!
> > > > 
> > > > Anything to try?  Any other output needed?  Can someone shed some light 
> > > > on 
> > > > this situation?
> > > 
> > > Is it highmem-related? Can you try it with mem=256M?
> > > 
> > >   Pavel
> > > -- 
> > > (english) http://www.livejournal.com/~pavelmachek
> > > (cesky, pictures) 
> > > http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > the body of a message to [EMAIL PROTECTED]
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> > I will give this a try later or tomorrow, I cannot have my machine crash 
> > at the moment.
> > 
> > Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
> > that has anything to do with it because after the system kill -9's all the 
> > processes etc, my terminal looks like garbage.
> > 
> > Justin.
> > 
> > 
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> > wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs ---memory-- ---swap-- -io -system-- 
> > cpu
> >  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id 
> > wa
> >  0  7764  50348 12 126998800 53632   172 1902 4600  1  8 
> > 29 62
> >  0  7764  49420 12 126000400 53632 34368 1871 6357  2 11 
> > 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:   2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> > [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> > [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> > [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> > [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> > [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> > [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> > [ 5947.206698]  === 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

Trying this now.
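
For reference, a sketch of the checks requested above (drop_caches and the 
procfs paths are as given; the awk/sort pipeline summarizing slabinfo is 
illustrative):

# flush dirty data first so it isn't pinned, then drop caches
sync
echo 3 > /proc/sys/vm/drop_caches

# snapshot memory state and the biggest slab consumers (approx. bytes)
cat /proc/meminfo
awk 'NR > 2 { print $1, $3 * $4 }' /proc/slabinfo | sort -k2 -rn | head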
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz
> Is it highmem-related? Can you try it with mem=256M?

Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
use an onboard graphics controller that has 128MB of RAM allocated to it 
and I believe the ICH8 chipset also uses some memory, in any event mem=256 
causes the machine to lockup before it can even get to the boot/init 
processes, the two leds on the keyboard were blinking, caps lock and 
scroll lock and I saw no console at all!

Justin.

On Mon, 22 Jan 2007, Justin Piszcz wrote:

> 
> 
> On Mon, 22 Jan 2007, Pavel Machek wrote:
> 
> > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to 
> > > invoke 
> > > the OOM killer and kill all of my processes?
> > > 
> > > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > > happens every time!
> > > 
> > > Anything to try?  Any other output needed?  Can someone shed some light 
> > > on 
> > > this situation?
> > 
> > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Pavel
> > -- 
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) 
> > http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> I will give this a try later or tomorrow, I cannot have my machine crash 
> at the moment.
> 
> Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
> that has anything to do with it because after the system kill -9's all the 
> processes etc, my terminal looks like garbage.
> 
> Justin.
> 
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz
 Is it highmem-related? Can you try it with mem=256M?

Bad idea, the kernel crashes  burns when I use mem=256, I had to boot 
2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
use an onboard graphics controller that has 128MB of RAM allocated to it 
and I believe the ICH8 chipset also uses some memory, in any event mem=256 
causes the machine to lockup before it can even get to the boot/init 
processes, the two leds on the keyboard were blinking, caps lock and 
scroll lock and I saw no console at all!

Justin.

On Mon, 22 Jan 2007, Justin Piszcz wrote:

 
 
 On Mon, 22 Jan 2007, Pavel Machek wrote:
 
  On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
   Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to 
   invoke 
   the OOM killer and kill all of my processes?
   
   Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
   happens every time!
   
   Anything to try?  Any other output needed?  Can someone shed some light 
   on 
   this situation?
  
  Is it highmem-related? Can you try it with mem=256M?
  
  Pavel
  -- 
  (english) http://www.livejournal.com/~pavelmachek
  (cesky, pictures) 
  http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
  -
  To unsubscribe from this list: send the line unsubscribe linux-raid in
  the body of a message to [EMAIL PROTECTED]
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
 
 I will give this a try later or tomorrow, I cannot have my machine crash 
 at the moment.
 
 Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
 that has anything to do with it because after the system kill -9's all the 
 processes etc, my terminal looks like garbage.
 
 Justin.
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Mon, 22 Jan 2007, Andrew Morton wrote:

  On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz [EMAIL PROTECTED] 
  wrote:
  Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
  the OOM killer and kill all of my processes?
 
 What's that?   Software raid or hardware raid?  If the latter, which driver?
 
  Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
  happens every time!
  
  Anything to try?  Any other output needed?  Can someone shed some light on 
  this situation?
  
  Thanks.
  
  
  The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
  
  procs ---memory-- ---swap-- -io -system-- 
  cpu
   r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id 
  wa
   0  7764  50348 12 126998800 53632   172 1902 4600  1  8 
  29 62
   0  7764  49420 12 126000400 53632 34368 1871 6357  2 11 
  48 40
 
 The wordwrapping is painful :(
 
  
  The last lines of dmesg:
  [ 5947.199985] lowmem_reserve[]: 0 0 0
  [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
  0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
  [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
  1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
  [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
  1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
  [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
  [ 5947.200055] Free swap  = 2197628kB
  [ 5947.200058] Total swap = 2200760kB
  [ 5947.200060] Free swap:   2197628kB
  [ 5947.205664] 517888 pages of RAM 
  [ 5947.205671] 288512 pages of HIGHMEM
  [ 5947.205673] 5666 reserved pages 
  [ 5947.205675] 257163 pages shared
  [ 5947.205678] 600 pages swap cached 
  [ 5947.205680] 88876 pages dirty
  [ 5947.205682] 115111 pages writeback
  [ 5947.205684] 5608 pages mapped
  [ 5947.205686] 49367 pages slab
  [ 5947.205688] 541 pages pagetables
  [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
  child
  [ 5947.205801] Killed process 1853 (named)
  [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
  oomkilladj=0
  [ 5947.206621]  [c013e33b] out_of_memory+0x17b/0x1b0
  [ 5947.206631]  [c013fcac] __alloc_pages+0x29c/0x2f0
  [ 5947.206636]  [c01479ad] __pte_alloc+0x1d/0x90
  [ 5947.206643]  [c0148bf7] copy_page_range+0x357/0x380
  [ 5947.206649]  [c0119d75] copy_process+0x765/0xfc0
  [ 5947.206655]  [c012c3f9] alloc_pid+0x1b9/0x280
  [ 5947.206662]  [c011a839] do_fork+0x79/0x1e0
  [ 5947.206674]  [c015f91f] do_pipe+0x5f/0xc0
  [ 5947.206680]  [c0101176] sys_clone+0x36/0x40
  [ 5947.206686]  [c0103138] syscall_call+0x7/0xb
  [ 5947.206691]  [c0420033] __sched_text_start+0x853/0x950
  [ 5947.206698]  === 
 
 Important information from the oom-killing event is missing.  Please send
 it all.
 
 From your earlier reports we have several hundred MB of ZONE_NORMAL memory
 which has gone awol.
 
 Please include /proc/meminfo from after the oom-killing.
 
 Please work out what is using all that slab memory, via /proc/slabinfo.
 
 After the oom-killing, please see if you can free up the ZONE_NORMAL memory
 via a few `echo 3  /proc/sys/vm/drop_caches' commands.  See if you can
 work out what happened to the missing couple-of-hundred MB from
 ZONE_NORMAL.
 
 

Trying this now.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz
And FYI yes I used mem=256M just as you said, not mem=256.

Justin.

On Wed, 24 Jan 2007, Justin Piszcz wrote:

  Is it highmem-related? Can you try it with mem=256M?
 
 Bad idea, the kernel crashes  burns when I use mem=256, I had to boot 
 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
 use an onboard graphics controller that has 128MB of RAM allocated to it 
 and I believe the ICH8 chipset also uses some memory, in any event mem=256 
 causes the machine to lockup before it can even get to the boot/init 
 processes, the two leds on the keyboard were blinking, caps lock and 
 scroll lock and I saw no console at all!
 
 Justin.
 
 On Mon, 22 Jan 2007, Justin Piszcz wrote:
 
  
  
  On Mon, 22 Jan 2007, Pavel Machek wrote:
  
   On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to 
invoke 
the OOM killer and kill all of my processes?

Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
happens every time!

Anything to try?  Any other output needed?  Can someone shed some light 
on 
this situation?
   
   Is it highmem-related? Can you try it with mem=256M?
   
 Pavel
   -- 
   (english) http://www.livejournal.com/~pavelmachek
   (cesky, pictures) 
   http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
   -
   To unsubscribe from this list: send the line unsubscribe linux-raid in
   the body of a message to [EMAIL PROTECTED]
   More majordomo info at  http://vger.kernel.org/majordomo-info.html
   
  
  I will give this a try later or tomorrow, I cannot have my machine crash 
  at the moment.
  
  Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
  that has anything to do with it because after the system kill -9's all the 
  processes etc, my terminal looks like garbage.
  
  Justin.
  
  
 
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Mon, 22 Jan 2007, Andrew Morton wrote:

  On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz [EMAIL PROTECTED] 
  wrote:
  Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
  the OOM killer and kill all of my processes?
 
 What's that?   Software raid or hardware raid?  If the latter, which driver?
 
  Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
  happens every time!
  
  Anything to try?  Any other output needed?  Can someone shed some light on 
  this situation?
  
  Thanks.
  
  
  The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
  
  procs ---memory-- ---swap-- -io -system-- 
  cpu
   r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id 
  wa
   0  7764  50348 12 126998800 53632   172 1902 4600  1  8 
  29 62
   0  7764  49420 12 126000400 53632 34368 1871 6357  2 11 
  48 40
 
 The wordwrapping is painful :(
 
  
  The last lines of dmesg:
  [ 5947.199985] lowmem_reserve[]: 0 0 0
  [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
  0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
  [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
  1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
  [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
  1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
  [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
  [ 5947.200055] Free swap  = 2197628kB
  [ 5947.200058] Total swap = 2200760kB
  [ 5947.200060] Free swap:   2197628kB
  [ 5947.205664] 517888 pages of RAM 
  [ 5947.205671] 288512 pages of HIGHMEM
  [ 5947.205673] 5666 reserved pages 
  [ 5947.205675] 257163 pages shared
  [ 5947.205678] 600 pages swap cached 
  [ 5947.205680] 88876 pages dirty
  [ 5947.205682] 115111 pages writeback
  [ 5947.205684] 5608 pages mapped
  [ 5947.205686] 49367 pages slab
  [ 5947.205688] 541 pages pagetables
  [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
  child
  [ 5947.205801] Killed process 1853 (named)
  [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
  oomkilladj=0
  [ 5947.206621]  [c013e33b] out_of_memory+0x17b/0x1b0
  [ 5947.206631]  [c013fcac] __alloc_pages+0x29c/0x2f0
  [ 5947.206636]  [c01479ad] __pte_alloc+0x1d/0x90
  [ 5947.206643]  [c0148bf7] copy_page_range+0x357/0x380
  [ 5947.206649]  [c0119d75] copy_process+0x765/0xfc0
  [ 5947.206655]  [c012c3f9] alloc_pid+0x1b9/0x280
  [ 5947.206662]  [c011a839] do_fork+0x79/0x1e0
  [ 5947.206674]  [c015f91f] do_pipe+0x5f/0xc0
  [ 5947.206680]  [c0101176] sys_clone+0x36/0x40
  [ 5947.206686]  [c0103138] syscall_call+0x7/0xb
  [ 5947.206691]  [c0420033] __sched_text_start+0x853/0x950
  [ 5947.206698]  === 
 
 Important information from the oom-killing event is missing.  Please send
 it all.
 
 From your earlier reports we have several hundred MB of ZONE_NORMAL memory
 which has gone awol.
 
 Please include /proc/meminfo from after the oom-killing.
 
 Please work out what is using all that slab memory, via /proc/slabinfo.
 
 After the oom-killing, please see if you can free up the ZONE_NORMAL memory
 via a few `echo 3  /proc/sys/vm/drop_caches' commands.  See if you can
 work out what happened to the missing couple-of-hundred MB from
 ZONE_NORMAL.
 
 

Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
of lag, but hey, it does not crash!! I will boot the older kernel with 
preempt on and see if I can get you that information you requested.

Justin.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Pavel Machek
Hi!

  Is it highmem-related? Can you try it with mem=256M?
 
 Bad idea, the kernel crashes  burns when I use mem=256, I had to boot 
 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
 use an onboard graphics controller that has 128MB of RAM allocated to it 
 and I believe the ICH8 chipset also uses some memory, in any event mem=256 
 causes the machine to lockup before it can even get to the boot/init 
 processes, the two leds on the keyboard were blinking, caps lock and 
 scroll lock and I saw no console at all!

Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Thu, 25 Jan 2007, Pavel Machek wrote:

 Hi!
 
   Is it highmem-related? Can you try it with mem=256M?
  
  Bad idea, the kernel crashes  burns when I use mem=256, I had to boot 
  2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
  use an onboard graphics controller that has 128MB of RAM allocated to it 
  and I believe the ICH8 chipset also uses some memory, in any event mem=256 
  causes the machine to lockup before it can even get to the boot/init 
  processes, the two leds on the keyboard were blinking, caps lock and 
  scroll lock and I saw no console at all!
 
 Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
   Pavel
 -- 
 (english) http://www.livejournal.com/~pavelmachek
 (cesky, pictures) 
 http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

Ok, this will be my last test for tonight, trying now.

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Nick Piggin

Justin Piszcz wrote:


On Mon, 22 Jan 2007, Andrew Morton wrote:



After the oom-killing, please see if you can free up the ZONE_NORMAL memory
via a few `echo 3  /proc/sys/vm/drop_caches' commands.  See if you can
work out what happened to the missing couple-of-hundred MB from
ZONE_NORMAL.



Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
of lag, but hey, it does not crash!! I will boot the older kernel with 
preempt on and see if I can get you that information you requested.


It wouldn't be a bad idea to recompile the new kernel with preempt on
and get the info from there.

It is usually best to be working with the most recent kernels. We can
always backport any important fixes if we need to.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Justin Piszcz


On Thu, 25 Jan 2007, Pavel Machek wrote:

 Hi!
 
   Is it highmem-related? Can you try it with mem=256M?
  
  Bad idea, the kernel crashes  burns when I use mem=256, I had to boot 
  2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
  use an onboard graphics controller that has 128MB of RAM allocated to it 
  and I believe the ICH8 chipset also uses some memory, in any event mem=256 
  causes the machine to lockup before it can even get to the boot/init 
  processes, the two leds on the keyboard were blinking, caps lock and 
  scroll lock and I saw no console at all!
 
 Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
   Pavel
 -- 
 (english) http://www.livejournal.com/~pavelmachek
 (cesky, pictures) 
 http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

Looks like you may be onto something Pavel, has not invoked the OOM killer 
yet using mem=700M, I see the swap increasing and the speed of the copy is 
only 14-15MB/s, when mem= is off (giving me all memory w/preempt) I get 
45-65MB/s.

With append=mem=700M, seen below:

top - 19:38:46 up 1 min,  3 users,  load average: 1.24, 0.38, 0.13
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.6%us,  4.6%sy,  1.3%ni, 69.6%id, 13.9%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:705512k total,   699268k used, 6244k free,   12k buffers
Swap:  2200760k total,18520k used,  2182240k free,34968k cached


(with mem=700M):
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 2  0  40276   7456 12  2318400 13196 14904 1204 4756  1  2 50 47
 0  2  40276   6156 12  2426400 20096 18352 1293 6400  1  4 49 46
 0  2  40276   6264 12  2426400 15504 17836 1219 5202  0  2 50 48
 0  2  40276   6144 12  2437600 14348 14453 1190 4815  0  3 49 48
 0  2  40276   6156 12  2450400 11396 12724 1169 3724  1  2 50 48
 0  1  40276   7532 12  2327200 11412 13121 1183 4017  0  2 50 48
 0  1  40276   7944 12  2264400 19084 19144 1234 6548  0  4 50 46


Almost there, looks like its going to make it.
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 15399329792 2007-01-24 19:55 18gb.copy

Hrmm, with preempt off it works and with preempt ON and mem=700M it works.
I guess I will run with preempt off as I'd prefer to have the memory 
available-- and tolerate the large lag bursts w/ no preemption.


Yup it worked.

-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb.copy

Justin.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-24 Thread Bill Cizek

Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
> 
> > > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> > > wrote:
> > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > > the OOM killer and kill all of my processes?
> 
> Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
> of lag, but hey, it does not crash!! I will boot the older kernel with 
> preempt on and see if I can get you that information you requested.

Justin,

According to your kernel_ring_buffer.txt (attached to another email), 
you are using "anticipatory" as your io scheduler:
  289  Jan 24 18:35:25 p34 kernel: [0.142130] io scheduler noop registered
  290  Jan 24 18:35:25 p34 kernel: [0.142194] io scheduler anticipatory registered (default)


I had a problem with this scheduler where my system would occasionally 
lockup during heavy I/O.  Sometimes it would fix itself, sometimes I had 
to reboot.  I changed to the "CFQ" io scheduler and my system has worked 
fine since then.


CFQ has to be built into the kernel (under BlockLayer/IOSchedulers).  It 
can be selected as default or you can set it during runtime:


echo cfq > /sys/block/disk/queue/scheduler
...
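
As a concrete illustration (sda is only an example device name), the 
scheduler can be checked and switched per-device at runtime, and passing 
elevator=cfq on the kernel command line makes it the boot-time default:

# cat /sys/block/sda/queue/scheduler
noop [anticipatory] cfq
# echo cfq > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
noop anticipatory [cfq]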

Hope this helps,
Bill
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Andrew Morton
On Tue, 23 Jan 2007 11:37:09 +1100
Donald Douwsma <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> >> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> >> wrote:
> >> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> >> the OOM killer and kill all of my processes?
> > 
> > What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> I've hit this using local disk while testing xfs built against 2.6.20-rc4 
> (SMP x86_64)
> 
> dmesg follows, I'm not sure if anything in this is useful after the first 
> event as our automated tests continued on
> after the failure.

This looks different.

> ...
>
> Mem-info:
> Node 0 DMA per-cpu:
> CPU0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> CPU1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> CPU2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> CPU3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
> CPU1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
> CPU2: Hot: hi:  186, btch:  31 usd:  20   Cold: hi:   62, btch:  15 usd:  47
> CPU3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  56
> Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680 slab:9119 mapped:32 pagetables:637

No dirty pages, no pages under writeback.

> Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3296 all_unreclaimable? yes
> lowmem_reserve[]: 0 2003 2003
> Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB inactive:1981624kB present:2052068kB

Inactive list is filled.

> pages_scanned:4343329 all_unreclaimable? yes

We scanned our guts out and decided that nothing was reclaimable.

> No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
> No available memory (MPOL_BIND): kill process 7914 (top) score 0 or a child
> No available memory (MPOL_BIND): kill process 4166 (nscd) score 0 or a child
> No available memory (MPOL_BIND): kill process 17869 (xfs_repair) score 0 or a child

But in all cases a constrained memory policy was in use.
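
MPOL_BIND is the NUMA policy a task gets from, e.g., numactl --membind; a 
task bound that way can be OOM-killed while memory outside its bound nodes 
is still free.  An illustrative invocation (command line assumed, not taken 
from the report):

$ numactl --membind=0 xfs_repair /dev/sdb5    # allocations restricted to node 0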
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Donald Douwsma
Andrew Morton wrote:
>> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
>> wrote:
>> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
>> the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?

I've hit this using local disk while testing xfs built against 2.6.20-rc4 (SMP 
x86_64)

dmesg follows, I'm not sure if anything in this is useful after the first event 
as our automated tests continued on
after the failure.

> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.

Sorry I didn't pick this up either.
I'll try to reproduce this and gather some more detailed info for a single 
event.

Donald


...
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
hald invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [80257367] out_of_memory+0x70/0x25d
 [80258f6b] __alloc_pages+0x22c/0x2b5
 [8026d889] alloc_page_vma+0x71/0x76
 [8026937b] read_swap_cache_async+0x45/0xd8
 [8025f2e0] swapin_readahead+0x60/0xd3
 [80260ece] __handle_mm_fault+0x703/0x9d8
 [80532bf7] do_page_fault+0x42b/0x7b3
 [80278adf] do_readv_writev+0x176/0x18b
 [8052efde] thread_return+0x0/0xed
 [8034d7f5] __const_udelay+0x2c/0x2d
 [803f4e0b] scsi_done+0x0/0x17
 [8053109d] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU2: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU3: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
CPU1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU2: Hot: hi:  186, btch:  31 usd:  20   Cold: hi:   62, btch:  15 usd:  47
CPU3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  56
Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3296 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB inactive:1981624kB present:2052068kB pages_scanned:4343329 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 273*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6684kB
Swap cache: add 741048, delete 244661, find 84826/143198, race 680+239
Free swap  = 1088524kB
Total swap = 3140668kB
Free swap:   1088524kB
524224 pages of RAM
9619 reserved pages
259 pages shared
496388 pages swap cached
No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
Killed process 3626 (hald-addon-acpi)
top invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [80257367] out_of_memory+0x70/0x25d
 [80258f6b] __alloc_pages+0x22c/0x2b5
 [8026e6a3] alloc_pages_current+0x74/0x79
 [802548c8] __page_cache_alloc+0xb/0xe
 [8025a65f] __do_page_cache_readahead+0xa1/0x217
 [8052f776] io_schedule+0x28/0x33
 [8052f9e7] __wait_on_bit_lock+0x5b/0x66
 [802546de] __lock_page+0x72/0x78
 [8025ab22] do_page_cache_readahead+0x4e/0x5a
 [80256714] filemap_nopage+0x140/0x30c
 [802609c6] __handle_mm_fault+0x1fb/0x9d8
 [80532bf7] do_page_fault+0x42b/0x7b3
 [80228273] __wake_up+0x43/0x50
 [80380bd5] tty_ldisc_deref+0x71/0x76
 [8053109d] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU2: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU3: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
CPU1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU2: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  10
CPU3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  26
Active:90 inactive:496233 dirty:0 writeback:0 unstable:0 free:3485 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3328 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:5904kB min:5712kB low:7140kB high:8568kB active:360kB inactive:1983092kB present:2052068kB pages_scanned:4587649 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 78*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 

Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Pavel Machek
On Mon 2007-01-22 13:48:44, Justin Piszcz wrote:
> 
> 
> On Mon, 22 Jan 2007, Pavel Machek wrote:
> 
> > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to 
> > > invoke 
> > > the OOM killer and kill all of my processes?
> > > 
> > > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > > happens every time!
> > > 
> > > Anything to try?  Any other output needed?  Can someone shed some light 
> > > on 
> > > this situation?
> > 
> > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Pavel
> > -- 
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) 
> > http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to [EMAIL PROTECTED]
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> I will give this a try later or tomorrow, I cannot have my machine crash 
> at the moment.
> 
> Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
> that has anything to do with it because after the system kill -9's all the 
> processes etc, my terminal looks like garbage.

That looks like separate problem. Switch to text mode console (vgacon,
not fbcon) for tests.
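
For example, assuming GRUB and a kernel built with vgacon, booting without 
any VESA framebuffer mode is enough to get the plain text console (the 
root= value is only a placeholder):

kernel /boot/vmlinuz-2.6.20-rc5 root=/dev/md2 ro vga=normal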
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Justin Piszcz
> What's that?   Software raid or hardware raid?  If the latter, which driver?

Software RAID (md)

On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> > wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> >  0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 29 62
> >  0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:   2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> > [ 5947.206621]  [c013e33b] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [c013fcac] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [c01479ad] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [c0148bf7] copy_page_range+0x357/0x380
> > [ 5947.206649]  [c0119d75] copy_process+0x765/0xfc0
> > [ 5947.206655]  [c012c3f9] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [c011a839] do_fork+0x79/0x1e0
> > [ 5947.206674]  [c015f91f] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [c0101176] sys_clone+0x36/0x40
> > [ 5947.206686]  [c0103138] syscall_call+0x7/0xb
> > [ 5947.206691]  [c0420033] __sched_text_start+0x853/0x950
> > [ 5947.206698]  === 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

I believe this is the first part of it (hopefully):

2908kB active:86104kB inactive:1061904kB present:1145032kB pages_scanned:0 all_unreclaimable? no
[ 5947.199985] lowmem_reserve[]: 0 0 0
[ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
[ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
[ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
[ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
[ 5947.200055] Free swap  = 2197628kB
[ 5947.200058] Total swap = 2200760kB
[ 5947.200060] Free swap:   2197628kB
[ 5947.205664] 517888 pages of RAM
[ 5947.205671] 288512 pages of HIGHMEM
[ 5947.205673] 5666 reserved pages
[ 5947.205675] 257163 pages shared
[ 5947.205678] 600 pages swap cached
[ 5947.205680] 88876 pages dirty
[ 5947.205682] 115111 pages writeback
[ 5947.205684] 5608 pages mapped
[ 5947.205686] 49367 pages slab
[ 5947.205688] 541 pages pagetables
[ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
[ 5947.205801] Killed process 1853 (named)
[ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
[ 5947.206621]  [c013e33b] out_of_memory+0x17b/0x1b0
[ 5947.206631]  [c013fcac] __alloc_pages+0x29c/0x2f0
[ 5947.206636]  [c01479ad] __pte_alloc+0x1d/0x90
[ 5947.206643]  [c0148bf7] copy_page_range+0x357/0x380
[ 5947.206649]  

Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Andrew Morton
> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
> wrote:
> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> the OOM killer and kill all of my processes?

What's that?   Software raid or hardware raid?  If the latter, which driver?

> Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> happens every time!
> 
> Anything to try?  Any other output needed?  Can someone shed some light on 
> this situation?
> 
> Thanks.
> 
> 
> The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> 
> procs ---memory-- ---swap-- -io -system-- 
> cpu
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id 
> wa
>  0  7764  50348 12 126998800 53632   172 1902 4600  1  8 
> 29 62
>  0  7764  49420 12 126000400 53632 34368 1871 6357  2 11 
> 48 40

The wordwrapping is painful :(

> 
> The last lines of dmesg:
> [ 5947.199985] lowmem_reserve[]: 0 0 0
> [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> [ 5947.200055] Free swap  = 2197628kB
> [ 5947.200058] Total swap = 2200760kB
> [ 5947.200060] Free swap:   2197628kB
> [ 5947.205664] 517888 pages of RAM 
> [ 5947.205671] 288512 pages of HIGHMEM
> [ 5947.205673] 5666 reserved pages 
> [ 5947.205675] 257163 pages shared
> [ 5947.205678] 600 pages swap cached 
> [ 5947.205680] 88876 pages dirty
> [ 5947.205682] 115111 pages writeback
> [ 5947.205684] 5608 pages mapped
> [ 5947.205686] 49367 pages slab
> [ 5947.205688] 541 pages pagetables
> [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> [ 5947.205801] Killed process 1853 (named)
> [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> [ 5947.206621]  [c013e33b] out_of_memory+0x17b/0x1b0
> [ 5947.206631]  [c013fcac] __alloc_pages+0x29c/0x2f0
> [ 5947.206636]  [c01479ad] __pte_alloc+0x1d/0x90
> [ 5947.206643]  [c0148bf7] copy_page_range+0x357/0x380
> [ 5947.206649]  [c0119d75] copy_process+0x765/0xfc0
> [ 5947.206655]  [c012c3f9] alloc_pid+0x1b9/0x280
> [ 5947.206662]  [c011a839] do_fork+0x79/0x1e0
> [ 5947.206674]  [c015f91f] do_pipe+0x5f/0xc0
> [ 5947.206680]  [c0101176] sys_clone+0x36/0x40
> [ 5947.206686]  [c0103138] syscall_call+0x7/0xb
> [ 5947.206691]  [c0420033] __sched_text_start+0x853/0x950
> [ 5947.206698]  === 

Important information from the oom-killing event is missing.  Please send
it all.

From your earlier reports we have several hundred MB of ZONE_NORMAL memory
which has gone awol.

Please include /proc/meminfo from after the oom-killing.

Please work out what is using all that slab memory, via /proc/slabinfo.

After the oom-killing, please see if you can free up the ZONE_NORMAL memory
via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
work out what happened to the missing couple-of-hundred MB from
ZONE_NORMAL.
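
A sketch of that sequence as shell commands (the awk one-liner is only one 
way to total per-cache slab usage from the standard /proc files):

# cat /proc/meminfo
# awk 'NR>2 {kb=$3*$4/1024; if (kb>1024) printf "%-24s %8.0f kB\n", $1, kb}' /proc/slabinfo | sort -k2 -nr | head
# sync
# echo 3 > /proc/sys/vm/drop_caches
# grep -E 'MemFree|Slab' /proc/meminfo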


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Justin Piszcz


On Mon, 22 Jan 2007, Pavel Machek wrote:

> On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> > 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> 
> Is it highmem-related? Can you try it with mem=256M?
> 
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

I will give this a try later or tomorrow, I cannot have my machine crash 
at the moment.

Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
that has anything to do with it because after the system kill -9's all the 
processes etc, my terminal looks like garbage.

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-22 Thread Pavel Machek
On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> the OOM killer and kill all of my processes?
> 
> Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> happens every time!
> 
> Anything to try?  Any other output needed?  Can someone shed some light on 
> this situation?

Is it highmem-related? Can you try it with mem=256M?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-21 Thread Justin Piszcz
Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke 
the OOM killer and kill all of my processes?

Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
happens every time!

Anything to try?  Any other output needed?  Can someone shed some light on 
this situation?

Thanks.


The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 29 62
 0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 48 40
 0  6    764  39608     12 1237420    0    0 52880 94696 1891 7424  1 12 47 39
 0  6    764  44264     12 1226064    0    0 42496 29723 1908 6035  1  9 31 58
 0  6    764  26672     12 1214180    0    0 43520 117472 1944 6189  1 13  0 87
 0  7    764   9132     12 1211732    0    0 22016 80400 1570 3304  1  8  0 92
 1  8    764   9512     12 1200388    0    0 33288 62212 1687 4843  1  9  0 91
 0  5    764  13980     12 1197096    0    0  5012   161 1619 2115  1  4 42 54
 0  5    764  29604     12 1197220    0    0     0   112 1548 1602  0  3 50 48
 0  5    764  49692     12 1197396    0    0     0   152 1484 1438  1  3 50 47
 0  5    764  73128     12 1197644    0    0     0   120 1463 1392  1  3 49 47
 0  4    764  99460     12 1197704    0    0    24   168 1545 1803  1  3 39 57
 0  4    764 100088     12 1219296    0    0 11672 75450 1614 1371  0  5 73 22
 0  6    764  50404     12 1269072    0    0 53632   145 1989 3871  1  9 34 56
 0  6    764  51500     12 1267684    0    0 53632   608 1834 4437  1  8 21 71
 4  5    764  51424     12 1266792    0    0 53504  7584 1847 4393  2  9 48 42
 0  6    764  51456     12 1263736    0    0 53636  9804 1880 4326  1 10  9 81
 0  6    764  50640     12 1263060    0    0 53504  4392 1929 4430  1  8 28 63
 0  6    764  50956     12 1257884    0  240 50724 17214 1858 4755  1 11 35 54
 0  6    764  48360     12 1247692    0    0 50840 48880 1871 6242  1 10  0 89
 0  6    764  40028     12 1225860    0    0 42512 93346 1770 5599  2 11  0 87
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  6    764  20372     12 1211744    0    0 21512 123664 1747 4378  0  9  3 88
 0  6    764  12140     12 1188584    0    0 20224 111244 1628 4244  1  9 15 76
 0  7    764  11280     12 1171144    0    0 22300 80936 1669 5314  1  8 11 80
 0  6    764  12168     12 1162840    0    0 28168 44072 1808 5065  1  8  0 92
 1  7   3132 123740     12 1051848    0 2368  3852 34246 2097 2376  0  5  0 94
 1  6   3132  19996     12 1155664    0    0 51752   290 1999 2136  2  8  0 91


The last lines of iostat (right before it kill -9'd my shell/ssh)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.51    0.00    7.65   91.84    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              87.63  3873.20  254.64   65.98 21905.15 24202.06   287.61   144.37  209.54   3.22 103.30
sdb               0.00  3873.20    1.03  132.99    12.37 60045.36   896.25    41.19  398.53   6.93  92.89
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00    30.93    0.00    9.28     0.00   157.73    34.00     0.01    1.11   1.11   1.03
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi              12.37     0.00    7.22    1.03    78.35     1.03    19.25     0.04    5.38   5.38   4.43
sdj              12.37     0.00    6.19    1.03    74.23     1.03    20.86     0.02    2.43   2.43   1.75
sdk               0.00    30.93    0.00    9.28     0.00   157.73    34.00     0.01    0.56   0.56   0.52
md0               0.00     0.00    0.00  610.31     0.00  2441.24     8.00     0.00    0.00   0.00   0.00
md2               0.00     0.00  347.42  996.91 22181.44  7917.53    44.78     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.03    9.28     4.12   164.95    32.80     0.00    0.00   0.00   0.00
md4               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.02    0.00   12.24   86.73    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz
