Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.6.19.2

2007-01-22 Thread Donald Douwsma
Andrew Morton wrote:
>> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> 
>> wrote:
>> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
>> the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?

I've hit this using a local disk while testing XFS built against 2.6.20-rc4
(SMP x86_64).

dmesg follows. I'm not sure whether anything after the first event is useful,
as our automated tests continued on after the failure.

> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.

Sorry, I didn't pick this up either.
I'll try to reproduce this and gather some more detailed info for a single
event.
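
For the slab question, a rough sketch of how the /proc/slabinfo output could be
ranked by memory footprint once it has been captured. It assumes the version
2.1 column layout (name, active_objs, num_objs, objsize, objperslab,
pagesperslab, then the "tunables" and "slabdata" groups) and a 4kB page size.
The function name and the sample figures are made up for illustration and are
not taken from the failing machine.

```python
def rank_slab_usage(slabinfo_text, page_size=4096):
    """Return [(cache_name, kB)] sorted by estimated memory use, largest first.

    Estimate per cache: num_slabs * pagesperslab * page_size.
    Assumes the slabinfo 2.1 layout; other versions may differ.
    """
    usage = []
    for line in slabinfo_text.splitlines():
        # Skip the version banner, the column header and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        parts = line.split(":")
        if len(parts) != 3:
            continue  # not a cache line in the expected layout
        fields = parts[0].split()      # name active_objs num_objs objsize objperslab pagesperslab
        slabdata = parts[2].split()    # "slabdata" active_slabs num_slabs sharedavail
        name = fields[0]
        pages_per_slab = int(fields[5])
        num_slabs = int(slabdata[2])
        usage.append((name, num_slabs * pages_per_slab * page_size // 1024))
    return sorted(usage, key=lambda item: item[1], reverse=True)


# Hypothetical sample input, for illustration only.
SAMPLE = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
xfs_inode 9000 9009 448 9 1 : tunables 54 27 8 : slabdata 1001 1001 0
dentry_cache 4000 4020 208 19 1 : tunables 120 60 8 : slabdata 212 212 0
buffer_head 300 400 104 37 1 : tunables 120 60 8 : slabdata 11 11 0
"""

if __name__ == "__main__":
    for name, kb in rank_slab_usage(SAMPLE):
        print(f"{name:20s} {kb:8d} kB")
```

On a live system the text would come from reading /proc/slabinfo (usually as
root) right after the OOM event, rather than from the sample string.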

Donald


...
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
hald invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [80257367] out_of_memory+0x70/0x25d
 [80258f6b] __alloc_pages+0x22c/0x2b5
 [8026d889] alloc_page_vma+0x71/0x76
 [8026937b] read_swap_cache_async+0x45/0xd8
 [8025f2e0] swapin_readahead+0x60/0xd3
 [80260ece] __handle_mm_fault+0x703/0x9d8
 [80532bf7] do_page_fault+0x42b/0x7b3
 [80278adf] do_readv_writev+0x176/0x18b
 [8052efde] thread_return+0x0/0xed
 [8034d7f5] __const_udelay+0x2c/0x2d
 [803f4e0b] scsi_done+0x0/0x17
 [8053109d] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU2: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU3: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
CPU1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU2: Hot: hi:  186, btch:  31 usd:  20   Cold: hi:   62, btch:  15 usd:  47
CPU3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  56
Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3296 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB inactive:1981624kB present:2052068kB pages_scanned:4343329 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 273*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6684kB
Swap cache: add 741048, delete 244661, find 84826/143198, race 680+239
Free swap  = 1088524kB
Total swap = 3140668kB
Free swap:   1088524kB
524224 pages of RAM
9619 reserved pages
259 pages shared
496388 pages swap cached
No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
Killed process 3626 (hald-addon-acpi)
top invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [80257367] out_of_memory+0x70/0x25d
 [80258f6b] __alloc_pages+0x22c/0x2b5
 [8026e6a3] alloc_pages_current+0x74/0x79
 [802548c8] __page_cache_alloc+0xb/0xe
 [8025a65f] __do_page_cache_readahead+0xa1/0x217
 [8052f776] io_schedule+0x28/0x33
 [8052f9e7] __wait_on_bit_lock+0x5b/0x66
 [802546de] __lock_page+0x72/0x78
 [8025ab22] do_page_cache_readahead+0x4e/0x5a
 [80256714] filemap_nopage+0x140/0x30c
 [802609c6] __handle_mm_fault+0x1fb/0x9d8
 [80532bf7] do_page_fault+0x42b/0x7b3
 [80228273] __wake_up+0x43/0x50
 [80380bd5] tty_ldisc_deref+0x71/0x76
 [8053109d] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU2: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
CPU3: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
CPU1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU2: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  10
CPU3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  26
Active:90 inactive:496233 dirty:0 writeback:0 unstable:0 free:3485 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3328 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:5904kB min:5712kB low:7140kB high:8568kB active:360kB inactive:1983092kB present:2052068kB pages_scanned:4587649 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 78*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 
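
As a sanity check on the dump, the per-zone buddy lists from the first event
can be summed and compared with the reported totals. This is only a sketch: the
helper name is made up, the two strings are the free-list lines copied from the
first event (the printed "= ...kB" totals are left off, since the helper
recomputes them), and the 4kB page size is an assumption for x86_64.

```python
import re


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of a buddy free-list line, in kB."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))


# Free-list lines from the first OOM event in the dmesg output.
DMA = ("Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB "
       "1*512kB 1*1024kB 1*2048kB 1*4096kB")
DMA32 = ("Node 0 DMA32: 273*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB "
         "2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB")

PAGE_KB = 4        # assumed page size on x86_64
FREE_PAGES = 3680  # "free:3680" from the first Mem-info summary

if __name__ == "__main__":
    dma_kb = buddy_total_kb(DMA)
    dma32_kb = buddy_total_kb(DMA32)
    print(dma_kb, dma32_kb, dma_kb + dma32_kb, FREE_PAGES * PAGE_KB)
```

The two zone sums come out at 8036kB and 6684kB, matching the "free:" figures
in the zone lines, and together they equal 3680 pages of 4kB.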
