PROBLEM: kernel memory subsystem incorrectly invokes OOM killer under certain situations

2007-10-14 Thread Chris Drake
Hi linux-kernel,



[1.] One line summary of the problem:

kernel memory subsystem incorrectly invokes OOM killer under certain situations


[2.] Full description of the problem/report:

My guess is that the code which decides to invoke the OOM killer
incorrectly concludes that memory allocated for disk-cache operations
cannot be reclaimed, or that the OOM killer itself incorrectly kills
processes when the memory exhaustion is caused by the disk-cache
subsystem rather than by a runaway process.

Specifically: I have a Red Hat AS4u5 host (2.6.9-55.0.6.ELsmp, 4 GB of
RAM) running VMware 1.0.4 with an AS4 guest that has 3 virtual SCSI
drives.  The following command, run inside the guest, reliably causes
the host's OOM killer to terminate my vmware process:

dd if=/dev/sdb of=/dev/sdc

(to clone the contents of a 16 GB virtual disk).  The host has a
single 2 TB file system.
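
In case it helps with reproducing or diagnosing this, here is a rough
sketch of one way to watch the host's memory state while the guest dd
runs (this assumes a 2.6.9-era /proc/meminfo with LowFree/HighFree
fields; the one-second interval and the field list are arbitrary
choices of mine, not part of the original observation):

#!/bin/sh
# Sample the host's memory state once a second while the guest dd runs.
# LowFree/HighFree/Cached/Dirty/Writeback are all /proc/meminfo fields
# on a 2.6.9 highmem kernel.
while true; do
    date
    grep -E 'LowFree|HighFree|^Cached|Dirty|Writeback' /proc/meminfo
    sleep 1
done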

While it's easiest to use vmware to demonstrate the problem, this does
not appear to be a problem with vmware itself.


[3.] Keywords (i.e., modules, networking, kernel):

/usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9/mm/oom_kill.c

OOM killer


[4.] Kernel version (from /proc/version):

Linux version 2.6.9-55.0.6.ELsmp ([EMAIL PROTECTED]) (gcc version 3.4.6 
20060404 (Red Hat 3.4.6-8)) #1 SMP Thu Aug 23 11:11:20 EDT 2007


[5.] Output of Oops.. message (if applicable) with symbolic information 
 resolved (see Documentation/oops-tracing.txt)

Here is the messages log output showing the offending oom-kill:

Oct 14 21:05:14 dor kernel: oom-killer: gfp_mask=0xd0
Oct 14 21:05:14 dor kernel: Mem-info:
Oct 14 21:05:14 dor kernel: DMA per-cpu:
Oct 14 21:05:14 dor kernel: cpu 0 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 0 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: cpu 1 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 1 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: cpu 2 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 2 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: cpu 3 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 3 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: Normal per-cpu:
Oct 14 21:05:14 dor kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 14 21:05:20 dor kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 14 21:05:20 dor kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: HighMem per-cpu:
Oct 14 21:05:21 dor kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: 
Oct 14 21:05:21 dor kernel: Free pages:   26152kB (3584kB HighMem)
Oct 14 21:05:21 dor kernel: Active:599689 inactive:398895 dirty:429 
writeback:15 unstable:0 free:6538 slab:13298 mapped:369678 pagetables:6087
Oct 14 21:05:21 dor kernel: DMA free:12544kB min:180kB low:360kB high:540kB 
active:0kB inactive:0kB present:16384kB pages_scanned:871 all_unreclaimable? yes
Oct 14 21:05:27 dor kernel: protections[]: 0 0 0
Oct 14 21:05:28 dor kernel: Normal free:10024kB min:10056kB low:20112kB 
high:30168kB active:928kB inactive:775024kB present:901120kB 
pages_scanned:5812455 all_unreclaimable? yes
Oct 14 21:05:28 dor kernel: protections[]: 0 0 0
Oct 14 21:05:28 dor kernel: HighMem free:3584kB min:512kB low:1024kB 
high:1536kB active:2397828kB inactive:820556kB present:3538944kB 
pages_scanned:0 all_unreclaimable? no
Oct 14 21:05:28 dor kernel: protections[]: 0 0 0
Oct 14 21:05:28 dor kernel: DMA: 4*4kB 4*8kB 3*16kB 3*32kB 3*64kB 3*128kB 
2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12544kB
Oct 14 21:05:28 dor kernel: Normal: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 10*128kB 
6*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 10024kB
Oct 14 21:05:28 dor kernel: HighMem: 52*4kB 68*8kB 85*16kB 26*32kB 2*64kB 
0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3584kB
Oct 14 21:05:28 dor kernel: Swap cache: add 678557, delete 673525, find 
277514/347205, race 0+5
Oct 14 21:05:28 dor kernel: 0 bounce buffer pages
Oct 14 21:05:28 dor kernel: Free swap:   20303648kB
Oct 14 21:05:28 dor kernel: 1114112 pages of RAM
Oct 14 21:05:28 dor kernel: 819184 pages of HIGHMEM
Oct 14 21:05:28 dor kernel: 75731 reserved pages
Oct 14 21:05:28 dor kernel: 1013077 pages shared
Oct 14 21:05:28 dor kernel: 5040 pages swap cached
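
For what it's worth, my reading of the trace above (I may be decoding
it wrongly): gfp_mask=0xd0 corresponds to a plain GFP_KERNEL
allocation, which has no __GFP_HIGHMEM and so can only be satisfied
from the DMA and Normal zones, and the Normal zone is below its
minimum watermark (free:10024kB vs min:10056kB, all_unreclaimable)
even though HighMem still has roughly 3 GB sitting on its
active/inactive lists.  Decoding the mask against my copy of the 2.6.9
include/linux/gfp.h values (plain arithmetic, not taken from the log):

# __GFP_WAIT=0x10, __GFP_IO=0x40, __GFP_FS=0x80; GFP_KERNEL is the OR
# of those three flags in 2.6.9.
printf 'GFP_KERNEL = 0x%x\n' $(( 0x10 | 0x40 | 0x80 ))    # prints 0xd0

The protections[]: 0 0 0 lines also suggest that
/proc/sys/vm/lower_zone_protection is still at its default of 0 on
this host, though I may be misreading that.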
