Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-26 Thread Josh Goldsmith

> When you untar, which filesystem do you untar to?

I've untarred it to Ext3, Ext2, and Reiser filesystems.  I've been fighting
with this for a while.

I did manage to get it to happen again doing a recursive chmod after
untarring the kernel (I stopped the untar a few times to let the system
catch up).

Interesting output below.

-J

top - 17:58:03 up  3:08,  1 user,  load average: 3.54, 4.09, 4.08
Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.1%us, 11.4%sy,  0.6%ni,  0.0%id, 81.4%wa,  2.7%hi,  1.8%si,  0.0%st
Mem:     30352k total,    28252k used,     2100k free,    19448k buffers
Swap:   465876k total,    15736k used,   450140k free,     1072k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1357 root      30  15  1568  168   88 R  8.1  0.6   0:07.87 chmod
  168 root      10  -5     0    0    0 S  3.1  0.0   6:39.25 usb-storage
 1353 root      15   0  2408  540  400 R  2.2  1.8   0:14.29 top
  989 root      15   0  3600  292  192 S  1.2  1.0   0:37.81 sshd
    2 root      34  19     0    0    0 S  0.6  0.0   2:14.65 ksoftirqd/0
   56 root      15   0     0    0    0 S  0.3  0.0   0:23.85 pdflush
   58 root      10  -5     0    0    0 S  0.3  0.0   0:54.70 kswapd0
  950 root      15   0  3128  108   64 S  0.3  0.4   0:13.88 ntpd
    1 root      16   0  1440    0    0 S  0.0  0.0   0:10.40 init
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   38 root      10  -5     0    0    0 S  0.0  0.0   0:00.04 kblockd/0
   41 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 khubd
   57 root      15   0     0    0    0 D  0.0  0.0   0:20.29 pdflush


And the first of the oom-killer syslog messages:

ntpd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
Mem-info:
DMA per-cpu:
CPU0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
sshd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Active:2816 inactive:2778 dirty:0 writeback:0 unstable:0
free:179 slab:858 mapped:1 pagetables:93 bounce:0 
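
The counters in the Mem-info lines are page counts, not kilobytes. A small Python sketch for converting them, assuming 4 KiB pages (the dmesg output in the original post reports 8192 pages for 32MB of RAM, which is consistent with that). The function names are made up for illustration:

```python
import re

# Assumption: 4 KiB pages, consistent with "totalpages: 8192" on a 32MB box.
PAGE_KB = 4


def parse_page_counts(text):
    """Return {name: pages} for every 'name:number' pair found in text."""
    pairs = re.findall(r"([A-Za-z_]+):\s*(\d+)", text)
    return {name: int(value) for name, value in pairs}


def pages_to_kb(counts, page_kb=PAGE_KB):
    """Convert a {name: pages} mapping to {name: kilobytes}."""
    return {name: pages * page_kb for name, pages in counts.items()}


# The two counter lines from the syslog excerpt above.
SAMPLE = """Active:2816 inactive:2778 dirty:0 writeback:0 unstable:0
free:179 slab:858 mapped:1 pagetables:93 bounce:0"""

if __name__ == "__main__":
    for name, kb in pages_to_kb(parse_page_counts(SAMPLE)).items():
        print(f"{name:>10}: {kb:6d} kB")
```

Under that assumption the 179 free pages come to 716 kB, and active plus inactive pages come to roughly 22 MB.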


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-26 Thread Josh Goldsmith
David:  The exact command this time was "tar jxf linux-2.6.23.tar.bz2", run as 
part of an emerge (Gentoo).  This is GNU tar version 1.18, but it has happened 
with prior versions too.  After my post I replicated it by manually untarring 
the file on the command line, and I can almost always replicate the problem 
with any large (GCC/kernel) tarball.  If I shut down all other processes, the 
untar will run longer, but eventually the oom-killer will be invoked.
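
One way to see what the kernel is doing while the untar runs is to sample /proc/meminfo every few seconds and watch the free, dirty and writeback figures. A minimal Python sketch, assuming the usual "Name:   value kB" layout of /proc/meminfo; the function names and the choice of fields are illustrative, not something from this thread:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


# Assumption: these are the fields worth watching during the untar.
DEFAULT_FIELDS = ("MemFree", "Buffers", "Cached", "Dirty", "Writeback", "SwapFree")


def snapshot(path="/proc/meminfo", fields=DEFAULT_FIELDS):
    """Read the file once and return the selected fields (None if absent)."""
    with open(path) as fh:
        info = parse_meminfo(fh.read())
    return {field: info.get(field) for field in fields}
```

Calling snapshot() in a loop with a few seconds of sleep between calls, in a second shell while tar runs, would give a rough timeline leading up to the OOM kill.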


Pavel:  I'll ping Oliver Neukum about it.

Thanks for the responses!
 -Josh

- Original Message - 
From: "David Newall" <[EMAIL PROTECTED]>

To: "Josh Goldsmith" <[EMAIL PROTECTED]>
Cc: <linux-kernel@vger.kernel.org>
Sent: Monday, November 26, 2007 4:57 AM
Subject: Re: Small System Paging Problem - OOM-killer goes nuts



Josh Goldsmith wrote:
> The problem comes when I try to untar a large file (in this case 
> linux-2.6.23.tar.bz2).  Regardless of whether I kill off every other 
> process, eventually the oom-killer will appear and kill either the tar or 
> the shell.


What's the actual command you are executing?





Re: Small System Paging Problem - OOM-killer goes nuts

2007-11-25 Thread Josh Goldsmith

Thanks for the response, Mikael.

Is your 486 running an IDE disk on a normal interface or via USB?  I wonder 
if the NSLU2 only having I/O via USB might be significant.  Also, this is a 
2.6 kernel and I've seen spurious reports across the internet about similar 
oom-killer problems since about 2.6.7.


Thanks!
  -Josh

- Original Message - 
From: "Mikael Pettersson" <[EMAIL PROTECTED]>

To: <[EMAIL PROTECTED]>; <linux-kernel@vger.kernel.org>
Sent: Sunday, November 25, 2007 3:55 PM
Subject: Re: Small System Paging Problem - OOM-killer goes nuts



I'm no VM tuning expert, but I have run, and still run, heavy compile
jobs on similarly configured machines, with no OOM problems:

I regularly build 2.6 kernels and occasionally also gcc on a
100MHz 486 with 28MB of RAM and perhaps 500MB of swap. It runs
a standard but stripped down Fedora Core 4 user-space, with ext3
file systems and a kernel that doesn't include anything non-essential.
The machine will swap madly, but the OOM killer never triggers.
(All system settings are FC4 defaults. I haven't touched them.)

In the past I did a fair amount of package rebuilds and test suite
runs on an NSLU2 myself, with a 2.4 Linksys/Openslug kernel, ext3,
and a 1GB or perhaps 2GB swap partition on a disk attached via a
USB2-to-PATA enclosure. Even when swapping heavily the OOM killer
wouldn't trigger.





Small System Paging Problem - OOM-killer goes nuts

2007-11-25 Thread Josh Goldsmith

Hi,

 I have a Linksys NSLU2 running 2.6.21 (I can replicate the problem on 
2.6.23, but that kernel isn't fully supported on SlugOS).  It is an armv5teb 
device with 32MB of RAM and 400+ MB of swap on its 160GB USB2 root disk.  The 
machine is used as a fileserver and to build packages for other ARM devices.  
It may be underpowered by today's standards, but it is a whole lot faster 
than my first Linux system (386sx20 with 4MB RAM), and the whole system with 
disk uses <8 watts and is silent.


 The problem comes when I try to untar a large file (in this case 
linux-2.6.23.tar.bz2).  Regardless of whether I kill off every other process, 
eventually the oom-killer will appear and kill either the tar or the shell. 
I've tried every tuning option that I and my buddy Google could find, 
including the /proc/sys/vm/overcommit* settings, with no success.  I'm not 
worried about paging impacting performance.
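
For anyone trying to compare settings, the current values of the relevant tunables can be dumped in one go. A minimal Python sketch: overcommit_memory and overcommit_ratio are the files matched by /proc/sys/vm/overcommit*, while the other names in the list are an assumption about which knobs might be relevant here, not something tested in this report. The function name is made up for illustration:

```python
import os

# overcommit_* are named in the post; the rest are assumed to be of interest.
VM_KNOBS = (
    "overcommit_memory",
    "overcommit_ratio",
    "swappiness",
    "min_free_kbytes",
    "dirty_ratio",
    "dirty_background_ratio",
)


def read_vm_knobs(base="/proc/sys/vm", names=VM_KNOBS):
    """Return {knob: value string}, with None for missing or unreadable files."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as fh:
                values[name] = fh.read().strip()
        except OSError:
            values[name] = None  # not present on this kernel, or no permission
    return values
```

Printing the result of read_vm_knobs() before each test run makes it easier to tell which combination of settings a given OOM report belongs to.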


 I'd appreciate any help, pointers, or gentle taps with the cluebat.

-Josh

Error output to console: http://www.pastebin.ca/797155

config ->  http://www.pastebin.ca/797206

slug2>$ uname -a
Linux slug2 2.6.21 #1 PREEMPT Fri Nov 9 11:54:06 MST 2007 armv5teb unknown

slug2:~$ free
             total       used       free     shared    buffers     cached
Mem:         30352      29124       1228          0      10196       9468
-/+ buffers/cache:       9460      20892
Swap:       465876          0     465876
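
The "-/+ buffers/cache" row is derived from the Mem row: buffers and page cache are subtracted from "used" and added to "free". A short Python check of that arithmetic with the numbers above (the function name is made up for illustration):

```python
def buffers_cache_adjusted(used, free, buffers, cached):
    """Return (used, free) with buffers and page cache counted as free."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


# Values in kB, taken from the Mem: row of the free output.
adj_used, adj_free = buffers_cache_adjusted(
    used=29124, free=1228, buffers=10196, cached=9468
)
print(adj_used, adj_free)  # 9460 20892
```

So only about 9.5 MB of the 30 MB is held by processes and the kernel itself at this point; the rest is buffers and cache.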

slug2:~$ cat /proc/swaps
Filename                        Type            Size    Used    Priority
/dev/sda4                       partition       465876  0       -1

slug2:~$ lsmod
Module                  Size  Used by
nfsd                  186556  8
exportfs                4320  1 nfsd
lockd                  51416  2 nfsd
sunrpc                131952  2 nfsd,lockd
reiserfs              255380  1
ixp4xx_mac             14644  0
ixp4xx_qmgr             5388  5 ixp4xx_mac
mii                     3424  1 ixp4xx_mac
ext3                  110472  2
jbd                    47784  1 ext3
mbcache                 5604  1 ext3
ohci_hcd               16804  0
ehci_hcd               30252  0

slug2>$ dmesg
<5>Linux version 2.6.21 ([EMAIL PROTECTED]) (gcc version 4.1.1) #1 PREEMPT Fri Nov 9 11:54:06 MST 2007

<4>CPU: XScale-IXP42x Family [690541f1] revision 1 (ARMv5TE), cr=39ff
<4>Machine: Linksys NSLU2
<4>Memory policy: ECC disabled, Data cache writeback
<7>On node 0 totalpages: 8192
<7>  DMA zone: 64 pages used for memmap
<7>  DMA zone: 0 pages reserved
<7>  DMA zone: 8128 pages, LIFO batch:0
<7>  Normal zone: 0 pages used for memmap
<4>CPU0: D VIVT undefined 5 cache
<4>CPU0: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
<4>CPU0: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
<4>Built 1 zonelists.  Total pages: 8128
<5>Kernel command line: rtc-x1205.probe=0,0x6f console=ttyS0,115200n8 root=/dev/mtdblock4 rootfstype=jffs2 rw init=/linuxrc noirqdebug

<6>IRQ lockup detection disabled
<4>PID hash table entries: 128 (order: 7, 512 bytes)
<4>Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
<4>Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
<6>Memory: 32MB = 32MB total
<5>Memory: 30268KB available (1940K code, 154K data, 84K init)
<7>Calibrating delay loop... 266.24 BogoMIPS (lpj=1331200)
<4>Mount-cache hash table entries: 512
<6>CPU: Testing write buffer coherency: ok
<6>NET: Registered protocol family 16
<4>IXP4xx: Using 16MiB expansion bus window size
<4>PCI: IXP4xx is host
<4>PCI: IXP4xx Using direct access for memory space
<6>PCI: bus0: Fast back to back transfers disabled
<6>dmabounce: registered device 0000:00:01.0 on pci bus
<6>dmabounce: registered device 0000:00:01.1 on pci bus
<6>dmabounce: registered device 0000:00:01.2 on pci bus
<5>SCSI subsystem initialized
<6>usbcore: registered new interface driver usbfs
<6>usbcore: registered new interface driver hub
<6>usbcore: registered new device driver usb
<6>Time: OSTS clocksource has been installed.
<6>NET: Registered protocol family 2
<4>IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
<4>TCP established hash table entries: 1024 (order: 1, 8192 bytes)
<4>TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
<6>TCP: Hash tables configured (established 1024 bind 1024)
<6>TCP reno registered
<4>NetWinder Floating Point Emulator V0.97 (double precision)
<6>JFFS2 version 2.2. (NAND) (C) 2001-2006 Red Hat, Inc.
<6>io scheduler noop registered
<6>io scheduler deadline registered (default)
<6>Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
<6>serial8250.0: ttyS0 at MMIO 0xc8000000 (irq = 15) is a XScale
<6>serial8250.0: ttyS1 at MMIO 0xc8001000 (irq = 13) is a XScale
<4>RAMDISK driver initialized: 4 RAM disks of 10240K size 1024 blocksize
<6>IXP4XX NPE driver Version 0.3.0 initialized
<6>NFTL driver: nftlcore.c $Revision: 1.98 $, nftlmount.c $Revision: 1.41 $
<6>IXP4XX-Flash.0: Found 1 x16 devices at 0x0 in 16-bit bank
<7>IXP4XX-Flash.0: Found an alias at 0x80 for the chip at 0x0
<4> Intel/Sharp Extended Query Table at 0x0031
<6>Using buffer write method
<5>cfi_cmdset_0001: Erase suspend on write enabled
<7>erase region 
