PROBLEM: kernel memory subsystem incorrectly invokes OOM killer under certain situations
Hi linux-kernel,

[1.] One line summary of the problem:

kernel memory subsystem incorrectly invokes OOM killer under certain situations

[2.] Full description of the problem/report:

My guess is that whatever invokes the OOM killer is incorrectly "deciding" that memory allocated for disk cache operations cannot be "reclaimed", or that the OOM killer code itself is incorrectly killing processes when the cause of the memory exhaustion is the disk cache subsystem (and not a runaway process).

Specifically: I have a RedHat AS4u5 2.6.9-55.0.6.ELsmp system with 4 GB of RAM, running vmware 1.0.4 with another AS4 system as the guest, which has 3 virtual SCSI drives. The following guest command reliably causes the host OOM killer to terminate my vmware process:

dd if=/dev/sdb of=/dev/sdc

(to clone the contents of a 16 GB virtual disk). The host has only one file system, 2 TB in size. While it's easiest to use vmware to demonstrate the problem, this does not appear to be a problem with vmware itself.

[3.] Keywords (i.e., modules, networking, kernel):

/usr/src/redhat/BUILD/kernel-2.6.9/linux-2.6.9/mm/oom_kill.c OOM killer

[4.] Kernel version (from /proc/version):

Linux version 2.6.9-55.0.6.ELsmp ([EMAIL PROTECTED]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)) #1 SMP Thu Aug 23 11:11:20 EDT 2007

[5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt)

Here is the messages output showing the offending oom-kill:

Oct 14 21:05:14 dor kernel: oom-killer: gfp_mask=0xd0
Oct 14 21:05:14 dor kernel: Mem-info:
Oct 14 21:05:14 dor kernel: DMA per-cpu:
Oct 14 21:05:14 dor kernel: cpu 0 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 0 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: cpu 1 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 1 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: cpu 2 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 2 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: cpu 3 hot: low 2, high 6, batch 1
Oct 14 21:05:14 dor kernel: cpu 3 cold: low 0, high 2, batch 1
Oct 14 21:05:14 dor kernel: Normal per-cpu:
Oct 14 21:05:14 dor kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 14 21:05:20 dor kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 14 21:05:20 dor kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: HighMem per-cpu:
Oct 14 21:05:21 dor kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 14 21:05:21 dor kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 14 21:05:21 dor kernel:
Oct 14 21:05:21 dor kernel: Free pages: 26152kB (3584kB HighMem)
Oct 14 21:05:21 dor kernel: Active:599689 inactive:398895 dirty:429 writeback:15 unstable:0 free:6538 slab:13298 mapped:369678 pagetables:6087
Oct 14 21:05:21 dor kernel: DMA free:12544kB min:180kB low:360kB high:540kB active:0kB inactive:0kB present:16384kB pages_scanned:871 all_unreclaimable? yes
Oct 14 21:05:27 dor kernel: protections[]: 0 0 0
Oct 14 21:05:28 dor kernel: Normal free:10024kB min:10056kB low:20112kB high:30168kB active:928kB inactive:775024kB present:901120kB pages_scanned:5812455 all_unreclaimable? yes
Oct 14 21:05:28 dor kernel: protections[]: 0 0 0
Oct 14 21:05:28 dor kernel: HighMem free:3584kB min:512kB low:1024kB high:1536kB active:2397828kB inactive:820556kB present:3538944kB pages_scanned:0 all_unreclaimable? no
Oct 14 21:05:28 dor kernel: protections[]: 0 0 0
Oct 14 21:05:28 dor kernel: DMA: 4*4kB 4*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12544kB
Oct 14 21:05:28 dor kernel: Normal: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 10*128kB 6*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 10024kB
Oct 14 21:05:28 dor kernel: HighMem: 52*4kB 68*8kB 85*16kB 26*32kB 2*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3584kB
Oct 14 21:05:28 dor kernel: Swap cache: add 678557, delete 673525, find 277514/347205, race 0+5
Oct 14 21:05:28 dor kernel: 0 bounce buffer pages
Oct 14 21:05:28 dor kernel: Free swap: 20303648kB
Oct 14 21:05:28 dor kernel: 1114112 pages of RAM
Oct 14 21:05:28 dor kernel: 819184 pages of HIGHMEM
Oct 14 21:05:28 dor kernel: 75731 reserved pages
Oct 14 21:05:28 dor kernel: 1013077 pages shared
Oct 14 21:05:28 dor kernel: 5040 pages swap cached
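
The per-zone summary lines in this dump can be compared against their watermarks mechanically. What follows is a minimal Python sketch for doing that, not part of the kernel or of any existing tool: the helper names `parse_zone_line` and `zones_below_min` and the regular expression are illustrative assumptions about this 2.6.9 log format, and the three input strings are copied from the dump with the syslog prefix stripped.

```python
import re

# Zone summary lines copied from the oom-killer dump in this report
# (the "Oct 14 ... dor kernel: " syslog prefix has been removed).
ZONE_LINES = [
    "DMA free:12544kB min:180kB low:360kB high:540kB active:0kB "
    "inactive:0kB present:16384kB pages_scanned:871 all_unreclaimable? yes",
    "Normal free:10024kB min:10056kB low:20112kB high:30168kB active:928kB "
    "inactive:775024kB present:901120kB pages_scanned:5812455 "
    "all_unreclaimable? yes",
    "HighMem free:3584kB min:512kB low:1024kB high:1536kB active:2397828kB "
    "inactive:820556kB present:3538944kB pages_scanned:0 "
    "all_unreclaimable? no",
]

# Matches "name:<digits>kB" pairs; fields without a kB suffix
# (such as pages_scanned) are deliberately ignored.
FIELD_RE = re.compile(r"(\w+):(\d+)kB")


def parse_zone_line(line):
    """Return (zone_name, fields) for one zone summary line (hypothetical helper)."""
    name = line.split()[0]
    fields = {key: int(value) for key, value in FIELD_RE.findall(line)}
    fields["all_unreclaimable"] = line.rstrip().endswith("yes")
    return name, fields


def zones_below_min(lines):
    """List the zones whose free memory is under their min watermark."""
    below = []
    for line in lines:
        name, fields = parse_zone_line(line)
        if fields["free"] < fields["min"]:
            below.append(name)
    return below


if __name__ == "__main__":
    for line in ZONE_LINES:
        name, f = parse_zone_line(line)
        state = "below min" if f["free"] < f["min"] else "above min"
        print(f"{name}: free={f['free']}kB min={f['min']}kB "
              f"inactive={f['inactive']}kB ({state})")
```

With the values from this report, only the Normal zone comes out below its min watermark (10024kB free against 10056kB), even though the same line lists 775024kB as inactive and HighMem lists a further 820556kB as inactive.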