Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
So I rebooted with 4.9rc6 with the patch inspired by the thread "[PATCH]
btrfs: limit the number of asynchronous delalloc pages to reasonable
value", but at 512K pages, i.e.:

diff -u2 fs/btrfs/inode.c ../linux-4.9-rc6/fs/btrfs/
--- fs/btrfs/inode.c    2016-11-13 13:32:32.0 -0500
+++ ../linux-4.9-rc6/fs/btrfs/inode.c    2016-11-23 08:31:02.145669550 -0500
@@ -1159,5 +1159,5 @@
        unsigned long nr_pages;
        u64 cur_end;
-       int limit = 10 * SZ_1M;
+       int limit = SZ_512K;
        clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end, EXTENT_LOCKED,

System still OOM'd after a few hours of rsync copying & deleting files,
but it didn't panic this time, which was nice ;-) I then set:

echo 500 >> /proc/sys/vm/watermark_scale_factor
echo 3 >> /proc/sys/vm/dirty_background_ratio

and the system has been running rsync fine for most of a day. System
memory load is noticeably different in sar -r after changing the VM
params.

rsync during OOM:

12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive   kbinact  kbdirty
03:25:05 AM     158616   32836640     99.52      72376  30853268   2942644     3.62  26048876   4977872  4936488
03:30:36 AM     157700   32837556     99.52      72468  30944468   2940028     3.62  26070084   4957328  4957432
03:35:02 AM    1802144   31193112     94.54      72560  29266432   2944352     3.62  26184324   3182048   187784
03:40:32 AM     157272   32837984     99.52      72648  30934432   3007244     3.70  26102636   4930744  4927832
03:45:05 AM     158288   32836968     99.52      72896  30980504    412108     0.51  26089920   4959668  4977556

running rsync after tuning VM params:

08:35:01 AM    1903352   31091904     94.23     232772  26603624   2680952     3.30  24133864   5019748  1229964
08:40:01 AM    2878552   30116704     91.28     232800  25641520   2697356     3.32  24158248   4039372  2864656
08:45:01 AM    3482616   29512640     89.45     232656  25043068   2696144     3.32  24087376   3526164  1897192
08:50:01 AM    3590672   29404584     89.12     232856  24962856   2704196     3.33  24078188   3451400   666760
08:55:01 AM    2064900   30930356     93.74     234800  26480996   2730384     3.36  24009244   5044012    50028

On Tue, Nov 22, 2016 at 9:48 AM, Vlastimil Babka wrote:
> On 11/22/2016 02:58 PM, E V wrote:
>> System OOM'd several times last night with 4.8.10, I attached the
>> page_owner output from a morning cat ~8 hours after OOM's to the
>> bugzilla case, split and compressed to fit under the 5M attachment
>> limit. Let me know if you need anything else.
>
> Looks like for some reason, the stack saving produces garbage stacks
> that only repeat save_stack_trace and save_stack functions :/
>
> But judging from gfp flags and page flags, most pages seem to be
> allocated with:
>
> mask 0x2400840(GFP_NOFS|__GFP_NOFAIL)
>
> and page flags:
>
> 0x26c(referenced|uptodate|lru|active)
> or
> 0x200016c(referenced|uptodate|lru|active|owner_priv_1)
> or
> 0x200086c(referenced|uptodate|lru|active|private)
>
> While GFP_HIGHUSER_MOVABLE (which I would expect on lru) are less frequent.
>
> Example:
>> grep GFP_NOFS page_owner_after_af | wc -l
> 973596
>> grep GFP_HIGHUSER_MOVABLE page_owner_after_af | wc -l
> 158879
>> grep GFP_NOFAIL page_owner_after_af | wc -l
> 971442
>
> grepping for btrfs shows that at least some stacks for NOFS/NOFAIL pages
> imply it:
> clear_state_bit+0x135/0x1c0 [btrfs]
> or
> add_delayed_tree_ref+0xbf/0x170 [btrfs]
> or
> __btrfs_map_block+0x6a8/0x1200 [btrfs]
> or
> btrfs_buffer_uptodate+0x48/0x70 [btrfs]
> or
> btrfs_set_path_blocking+0x34/0x60 [btrfs]
>
> and some more variants.
>
> So looks like the pages contain btrfs metadata, are on file lru and from
> previous checks of /proc/kpagecount we know that they most likely have
> page_count() == 0 but are not freed. Could btrfs guys provide some
> insight here?
>
>> On Fri, Nov 18, 2016 at 10:02 AM, E V wrote:
>>> Yes, the short window between the stalls and the panic makes it
>>> difficult to manually check much. I could setup a cron every 5 minutes
>>> or so if you want. Also, I see the OOM's in 4.8, but it has yet to
>>> panic on me. Where as 4.9rc has panic'd both times I've booted it, so
>>> depending on what you want to look at it might be easier to
>>> investigate on 4.8. Let me know, I can turn on a couple of the DEBUG
>>> config's and build a new 4.8.8. Never looked into a netconsole or
>>> serial console. I think just getting the system to use a higher res
>>> console would be an improvement, but the OOM's seemed to be the root
>>> cause of the panic so I haven't spent any time looking into that as of
>>> yet,
>>>
>>> Thanks,
>>> -Eli
>>>
>>> On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa wrote:
>>>> On 2016/11/18 6:49, Vlastimil Babka wrote:
>>>>> On 11/16/2016 02:39 PM, E V wrote:
>>>>>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>>>>>> the stack trace, and the 38 call traces in a 2 minute window shortly
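For anyone wanting to keep the two tunables above across reboots, a sysctl drop-in is one hypothetical way to do it (the file name is arbitrary, and the values are just the ones used in this report, not a general recommendation):

cat <<'EOF' > /etc/sysctl.d/90-oom-workaround.conf
# widen the kswapd watermark distance (units of 0.01% of node memory, so 500 = 5%)
vm.watermark_scale_factor = 500
# start background writeback at 3% of dirtyable memory
vm.dirty_background_ratio = 3
EOF
sysctl --system    # re-apply all sysctl configuration files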
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
On 11/22/2016 02:58 PM, E V wrote: > System OOM'd several times last night with 4.8.10, I attached the > page_owner output from a morning cat ~8 hours after OOM's to the > bugzilla case, split and compressed to fit under the 5M attachment > limit. Let me know if you need anything else. Looks like for some reason, the stack saving produces garbage stacks that only repeat save_stack_trace and save_stack functions :/ But judging from gfp flags and page flags, most pages seem to be allocated with: mask 0x2400840(GFP_NOFS|__GFP_NOFAIL) and page flags: 0x26c(referenced|uptodate|lru|active) or 0x200016c(referenced|uptodate|lru|active|owner_priv_1) or 0x200086c(referenced|uptodate|lru|active|private) While GFP_HIGHUSER_MOVABLE (which I would expect on lru) are less frequent. Example: > grep GFP_NOFS page_owner_after_af | wc -l 973596 > grep GFP_HIGHUSER_MOVABLE page_owner_after_af | wc -l 158879 > grep GFP_NOFAIL page_owner_after_af | wc -l 971442 grepping for btrfs shows that at least some stacks for NOFS/NOFAIL pages imply it: clear_state_bit+0x135/0x1c0 [btrfs] or add_delayed_tree_ref+0xbf/0x170 [btrfs] or __btrfs_map_block+0x6a8/0x1200 [btrfs] or btrfs_buffer_uptodate+0x48/0x70 [btrfs] or btrfs_set_path_blocking+0x34/0x60 [btrfs] and some more variants. So looks like the pages contain btrfs metadata, are on file lru and from previous checks of /proc/kpagecount we know that they most likely have page_count() == 0 but are not freed. Could btrfs guys provide some insight here? > On Fri, Nov 18, 2016 at 10:02 AM, E V wrote: >> Yes, the short window between the stalls and the panic makes it >> difficult to manually check much. I could setup a cron every 5 minutes >> or so if you want. Also, I see the OOM's in 4.8, but it has yet to >> panic on me. Where as 4.9rc has panic'd both times I've booted it, so >> depending on what you want to look at it might be easier to >> investigate on 4.8. Let me know, I can turn on a couple of the DEBUG >> config's and build a new 4.8.8. Never looked into a netconsole or >> serial console. I think just getting the system to use a higher res >> console would be an improvement, but the OOM's seemed to be the root >> cause of the panic so I haven't spent any time looking into that as of >> yet, >> >> Thanks, >> -Eli >> >> On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa >> wrote: >>> On 2016/11/18 6:49, Vlastimil Babka wrote: On 11/16/2016 02:39 PM, E V wrote: > System panic'd overnight running 4.9rc5 & rsync. Attached a photo of > the stack trace, and the 38 call traces in a 2 minute window shortly > before, to the bugzilla case for those not on it's e-mail list: > > https://bugzilla.kernel.org/show_bug.cgi?id=186671 The panic screenshot has only the last part, but the end marker says it's OOM with no killable processes. The DEBUG_VM config thus didn't trigger anything, and still there's tons of pagecache, mostly clean, that's not being reclaimed. Could you now try this? - enable CONFIG_PAGE_OWNER - boot with kernel option: page_owner=on - after the first oom, "cat /sys/kernel/debug/page_owner > file" - provide the file (compressed, it will be quite large) >>> >>> Excuse me for a noise, but do we really need to do >>> "cat /sys/kernel/debug/page_owner > file" after the first OOM killer >>> invocation? I worry that it might be too difficult to do. >>> Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file" >>> hourly and compare tendency between the latest one and previous one? 
>>> >>> This system has swap, and /var/log/messages before panic >>> reports that swapin was stalling at memory allocation. >>> >>> >>> [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, >>> order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE) >>> [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW >>> I 4.9.0-rc5 #2 >>> [130346.262662] 8129ba69 8170e4c8 >>> c90003ccb8d8 >>> [130346.262714] 8113449f 024200ca1ca11b40 8170e4c8 >>> c90003ccb880 >>> [130346.262765] 0010 c90003ccb8e8 c90003ccb898 >>> 88041f226e80 >>> [130346.262817] Call Trace: >>> [130346.262843] [] ? dump_stack+0x46/0x5d >>> [130346.262872] [] ? warn_alloc+0x11f/0x140 >>> [130346.262899] [] ? __alloc_pages_slowpath+0x84b/0xa80 >>> [130346.262929] [] ? __alloc_pages_nodemask+0x2b0/0x2f0 >>> [130346.262960] [] ? alloc_pages_vma+0xbe/0x260 >>> [130346.262989] [] ? pagecache_get_page+0x22/0x280 >>> [130346.263019] [] ? __read_swap_cache_async+0x118/0x1a0 >>> [130346.263048] [] ? read_swap_cache_async+0xf/0x30 >>> [130346.263077] [] ? swapin_readahead+0x16e/0x1c0 >>> [130346.263106] [] ? radix_tree_lookup_slot+0xe/0x20 >>> [130346.263135] [] ? find_get_entry+0x14/0x130 >>> [130346.263162] [] ? pagecache_get_page+0x22/0x280 >>> [130346.263193] [] ? do_
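As an aside, a hypothetical way to get the same kind of breakdown for every allocation mask at once, rather than grepping for each flag separately, would be something like the following (it assumes the dump file name used above and that the page_owner records carry the "mask 0x...(...)" annotation as printed by 4.8/4.9 kernels):

grep -o 'mask 0x[0-9a-f]*([^)]*)' page_owner_after_af | sort | uniq -c | sort -rn | head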
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
System OOM'd several times last night with 4.8.10, I attached the page_owner output from a morning cat ~8 hours after OOM's to the bugzilla case, split and compressed to fit under the 5M attachment limit. Let me know if you need anything else. On Fri, Nov 18, 2016 at 10:02 AM, E V wrote: > Yes, the short window between the stalls and the panic makes it > difficult to manually check much. I could setup a cron every 5 minutes > or so if you want. Also, I see the OOM's in 4.8, but it has yet to > panic on me. Where as 4.9rc has panic'd both times I've booted it, so > depending on what you want to look at it might be easier to > investigate on 4.8. Let me know, I can turn on a couple of the DEBUG > config's and build a new 4.8.8. Never looked into a netconsole or > serial console. I think just getting the system to use a higher res > console would be an improvement, but the OOM's seemed to be the root > cause of the panic so I haven't spent any time looking into that as of > yet, > > Thanks, > -Eli > > On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa > wrote: >> On 2016/11/18 6:49, Vlastimil Babka wrote: >>> On 11/16/2016 02:39 PM, E V wrote: System panic'd overnight running 4.9rc5 & rsync. Attached a photo of the stack trace, and the 38 call traces in a 2 minute window shortly before, to the bugzilla case for those not on it's e-mail list: https://bugzilla.kernel.org/show_bug.cgi?id=186671 >>> >>> The panic screenshot has only the last part, but the end marker says >>> it's OOM with no killable processes. The DEBUG_VM config thus didn't >>> trigger anything, and still there's tons of pagecache, mostly clean, >>> that's not being reclaimed. >>> >>> Could you now try this? >>> - enable CONFIG_PAGE_OWNER >>> - boot with kernel option: page_owner=on >>> - after the first oom, "cat /sys/kernel/debug/page_owner > file" >>> - provide the file (compressed, it will be quite large) >> >> Excuse me for a noise, but do we really need to do >> "cat /sys/kernel/debug/page_owner > file" after the first OOM killer >> invocation? I worry that it might be too difficult to do. >> Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file" >> hourly and compare tendency between the latest one and previous one? >> >> This system has swap, and /var/log/messages before panic >> reports that swapin was stalling at memory allocation. >> >> >> [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, >> order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE) >> [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW I >> 4.9.0-rc5 #2 >> [130346.262662] 8129ba69 8170e4c8 >> c90003ccb8d8 >> [130346.262714] 8113449f 024200ca1ca11b40 8170e4c8 >> c90003ccb880 >> [130346.262765] 0010 c90003ccb8e8 c90003ccb898 >> 88041f226e80 >> [130346.262817] Call Trace: >> [130346.262843] [] ? dump_stack+0x46/0x5d >> [130346.262872] [] ? warn_alloc+0x11f/0x140 >> [130346.262899] [] ? __alloc_pages_slowpath+0x84b/0xa80 >> [130346.262929] [] ? __alloc_pages_nodemask+0x2b0/0x2f0 >> [130346.262960] [] ? alloc_pages_vma+0xbe/0x260 >> [130346.262989] [] ? pagecache_get_page+0x22/0x280 >> [130346.263019] [] ? __read_swap_cache_async+0x118/0x1a0 >> [130346.263048] [] ? read_swap_cache_async+0xf/0x30 >> [130346.263077] [] ? swapin_readahead+0x16e/0x1c0 >> [130346.263106] [] ? radix_tree_lookup_slot+0xe/0x20 >> [130346.263135] [] ? find_get_entry+0x14/0x130 >> [130346.263162] [] ? pagecache_get_page+0x22/0x280 >> [130346.263193] [] ? do_swap_page+0x44f/0x5f0 >> [130346.263220] [] ? __radix_tree_lookup+0x62/0xc0 >> [130346.263249] [] ? 
handle_mm_fault+0x66a/0xf00 >> [130346.263277] [] ? find_get_entry+0x14/0x130 >> [130346.263305] [] ? __do_page_fault+0x1c5/0x490 >> [130346.263336] [] ? page_fault+0x22/0x30 >> [130346.263364] [] ? copy_user_generic_string+0x2c/0x40 >> [130346.263395] [] ? set_fd_set+0x1d/0x30 >> [130346.263422] [] ? core_sys_select+0x1a5/0x260 >> [130346.263450] [] ? getname_flags+0x6a/0x1e0 >> [130346.263479] [] ? cp_new_stat+0x115/0x130 >> [130346.263509] [] ? ktime_get_ts64+0x3f/0xf0 >> [130346.263537] [] ? SyS_select+0xa5/0xe0 >> [130346.263564] [] ? entry_SYSCALL_64_fastpath+0x13/0x94 >> >> >> Under such situation, trying to login and execute /bin/cat could take >> minutes. >> Also, writing to btrfs and ext4 seems to be stalling. The btrfs one is a >> situation where WQ_MEM_RECLAIM kernel workqueue is unable to make progress. >> >> >> [130420.008231] kworker/u34:21: page allocation stalls for 35028ms, order:0, >> mode:0x2400840(GFP_NOFS|__GFP_NOFAIL) >> [130420.008287] CPU: 5 PID: 24286 Comm: kworker/u34:21 Tainted: GW I >> 4.9.0-rc5 #2 >> [130420.008401] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] >> [130420.008432] 000
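For reference, capturing, compressing and splitting such a dump could look roughly like this (file names and the 4M chunk size are placeholders; the bugzilla limit mentioned above is about 5M per attachment):

cat /sys/kernel/debug/page_owner > page_owner_after_oom
xz -9 page_owner_after_oom            # produces page_owner_after_oom.xz
split -b 4M -d page_owner_after_oom.xz page_owner_after_oom.xz.
# reassemble later with: cat page_owner_after_oom.xz.* | xz -d > page_owner_after_oom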
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
Yes, the short window between the stalls and the panic makes it difficult to manually check much. I could setup a cron every 5 minutes or so if you want. Also, I see the OOM's in 4.8, but it has yet to panic on me. Where as 4.9rc has panic'd both times I've booted it, so depending on what you want to look at it might be easier to investigate on 4.8. Let me know, I can turn on a couple of the DEBUG config's and build a new 4.8.8. Never looked into a netconsole or serial console. I think just getting the system to use a higher res console would be an improvement, but the OOM's seemed to be the root cause of the panic so I haven't spent any time looking into that as of yet, Thanks, -Eli On Fri, Nov 18, 2016 at 6:54 AM, Tetsuo Handa wrote: > On 2016/11/18 6:49, Vlastimil Babka wrote: >> On 11/16/2016 02:39 PM, E V wrote: >>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of >>> the stack trace, and the 38 call traces in a 2 minute window shortly >>> before, to the bugzilla case for those not on it's e-mail list: >>> >>> https://bugzilla.kernel.org/show_bug.cgi?id=186671 >> >> The panic screenshot has only the last part, but the end marker says >> it's OOM with no killable processes. The DEBUG_VM config thus didn't >> trigger anything, and still there's tons of pagecache, mostly clean, >> that's not being reclaimed. >> >> Could you now try this? >> - enable CONFIG_PAGE_OWNER >> - boot with kernel option: page_owner=on >> - after the first oom, "cat /sys/kernel/debug/page_owner > file" >> - provide the file (compressed, it will be quite large) > > Excuse me for a noise, but do we really need to do > "cat /sys/kernel/debug/page_owner > file" after the first OOM killer > invocation? I worry that it might be too difficult to do. > Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file" > hourly and compare tendency between the latest one and previous one? > > This system has swap, and /var/log/messages before panic > reports that swapin was stalling at memory allocation. > > > [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, order:0, > mode:0x24200ca(GFP_HIGHUSER_MOVABLE) > [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW I >4.9.0-rc5 #2 > [130346.262662] 8129ba69 8170e4c8 > c90003ccb8d8 > [130346.262714] 8113449f 024200ca1ca11b40 8170e4c8 > c90003ccb880 > [130346.262765] 0010 c90003ccb8e8 c90003ccb898 > 88041f226e80 > [130346.262817] Call Trace: > [130346.262843] [] ? dump_stack+0x46/0x5d > [130346.262872] [] ? warn_alloc+0x11f/0x140 > [130346.262899] [] ? __alloc_pages_slowpath+0x84b/0xa80 > [130346.262929] [] ? __alloc_pages_nodemask+0x2b0/0x2f0 > [130346.262960] [] ? alloc_pages_vma+0xbe/0x260 > [130346.262989] [] ? pagecache_get_page+0x22/0x280 > [130346.263019] [] ? __read_swap_cache_async+0x118/0x1a0 > [130346.263048] [] ? read_swap_cache_async+0xf/0x30 > [130346.263077] [] ? swapin_readahead+0x16e/0x1c0 > [130346.263106] [] ? radix_tree_lookup_slot+0xe/0x20 > [130346.263135] [] ? find_get_entry+0x14/0x130 > [130346.263162] [] ? pagecache_get_page+0x22/0x280 > [130346.263193] [] ? do_swap_page+0x44f/0x5f0 > [130346.263220] [] ? __radix_tree_lookup+0x62/0xc0 > [130346.263249] [] ? handle_mm_fault+0x66a/0xf00 > [130346.263277] [] ? find_get_entry+0x14/0x130 > [130346.263305] [] ? __do_page_fault+0x1c5/0x490 > [130346.263336] [] ? page_fault+0x22/0x30 > [130346.263364] [] ? copy_user_generic_string+0x2c/0x40 > [130346.263395] [] ? set_fd_set+0x1d/0x30 > [130346.263422] [] ? core_sys_select+0x1a5/0x260 > [130346.263450] [] ? 
getname_flags+0x6a/0x1e0 > [130346.263479] [] ? cp_new_stat+0x115/0x130 > [130346.263509] [] ? ktime_get_ts64+0x3f/0xf0 > [130346.263537] [] ? SyS_select+0xa5/0xe0 > [130346.263564] [] ? entry_SYSCALL_64_fastpath+0x13/0x94 > > > Under such situation, trying to login and execute /bin/cat could take minutes. > Also, writing to btrfs and ext4 seems to be stalling. The btrfs one is a > situation where WQ_MEM_RECLAIM kernel workqueue is unable to make progress. > > > [130420.008231] kworker/u34:21: page allocation stalls for 35028ms, order:0, > mode:0x2400840(GFP_NOFS|__GFP_NOFAIL) > [130420.008287] CPU: 5 PID: 24286 Comm: kworker/u34:21 Tainted: GW I >4.9.0-rc5 #2 > [130420.008401] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] > [130420.008432] 8129ba69 8170e4c8 > c900087836a0 > [130420.008483] 8113449f 024008401e3f1b40 8170e4c8 > c90008783648 > [130420.008534] 0010 c900087836b0 c90008783660 > 88041ecc4340 > [130420.008586] Call Trace: > [130420.008611] [] ? dump_stack+0x46/0x5d > [130420.008640] [] ? warn_alloc+0x11f/0x140 > [130420.008667] [] ? __alloc_pages_sl
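Since netconsole came up above as an alternative to a serial console, a minimal sketch of such a setup might look like the following (all addresses, ports, interface names and MACs below are placeholders):

# on the crashing machine, either on the kernel command line:
#   netconsole=6665@192.168.1.2/eth0,6666@192.168.1.3/aa:bb:cc:dd:ee:ff
# or at runtime via the module:
modprobe netconsole netconsole=6665@192.168.1.2/eth0,6666@192.168.1.3/aa:bb:cc:dd:ee:ff
# on the receiving machine (traditional netcat syntax assumed):
nc -u -l -p 6666 | tee netconsole.log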
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
It could be totally unrelated, but I have a similar problem: processes get
randomly OOM'd when I am doing anything "sort of heavy" on my Btrfs
filesystems. I did some "evil tuning", so I assumed that must be the
problem (even if the values looked sane for my system). Thus, I kept
cutting back on the manually set values (mostly dirty/background ratio,
io scheduler request queue size and similar tunables), but it seems to be
a dead end. I guess anything I change in order to cut back on the related
memory footprint just makes the OOMs less frequent, but it's only a matter
of time and coincidence (lots of things randomly happen to do some notable
amount of IO) until OOMs happen anyway.

It seems to be enough to start a defrag or balance on more than a single
filesystem (in parallel), and pretty much any notable "useful" user load
will have a high chance of triggering OOMs (and getting killed) sooner or
later. It's just my limited observation, but database-like loads [like
that of bitcoind] (sync writes and/or frequent flushes?) or high-priority
buffered writes (ffmpeg running with higher than default priority and
saving live video streams into files without recoding) seem to have a
higher chance of triggering this than simply reading or writing files
sequentially and asynchronously, either locally or through Samba.

I am on gentoo-sources 4.8.8 right now, but it was there with 4.7.x as
well.

On Thu, Nov 17, 2016 at 10:49 PM, Vlastimil Babka wrote:
> On 11/16/2016 02:39 PM, E V wrote:
>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>> the stack trace, and the 38 call traces in a 2 minute window shortly
>> before, to the bugzilla case for those not on it's e-mail list:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
>
> The panic screenshot has only the last part, but the end marker says
> it's OOM with no killable processes. The DEBUG_VM config thus didn't
> trigger anything, and still there's tons of pagecache, mostly clean,
> that's not being reclaimed.
>
> Could you now try this?
> - enable CONFIG_PAGE_OWNER
> - boot with kernel option: page_owner=on
> - after the first oom, "cat /sys/kernel/debug/page_owner > file"
> - provide the file (compressed, it will be quite large)
>
> Vlastimil
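For comparing the "evil tuning" values before and after cutting back, one hypothetical way to snapshot the tunables mentioned above (sda is a placeholder device name):

sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes
cat /sys/block/sda/queue/nr_requests /sys/block/sda/queue/scheduler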
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
On 2016/11/18 6:49, Vlastimil Babka wrote: > On 11/16/2016 02:39 PM, E V wrote: >> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of >> the stack trace, and the 38 call traces in a 2 minute window shortly >> before, to the bugzilla case for those not on it's e-mail list: >> >> https://bugzilla.kernel.org/show_bug.cgi?id=186671 > > The panic screenshot has only the last part, but the end marker says > it's OOM with no killable processes. The DEBUG_VM config thus didn't > trigger anything, and still there's tons of pagecache, mostly clean, > that's not being reclaimed. > > Could you now try this? > - enable CONFIG_PAGE_OWNER > - boot with kernel option: page_owner=on > - after the first oom, "cat /sys/kernel/debug/page_owner > file" > - provide the file (compressed, it will be quite large) Excuse me for a noise, but do we really need to do "cat /sys/kernel/debug/page_owner > file" after the first OOM killer invocation? I worry that it might be too difficult to do. Shouldn't we rather do "cat /sys/kernel/debug/page_owner > file" hourly and compare tendency between the latest one and previous one? This system has swap, and /var/log/messages before panic reports that swapin was stalling at memory allocation. [130346.262510] dsm_sa_datamgrd: page allocation stalls for 52400ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE) [130346.262572] CPU: 1 PID: 3622 Comm: dsm_sa_datamgrd Tainted: GW I 4.9.0-rc5 #2 [130346.262662] 8129ba69 8170e4c8 c90003ccb8d8 [130346.262714] 8113449f 024200ca1ca11b40 8170e4c8 c90003ccb880 [130346.262765] 0010 c90003ccb8e8 c90003ccb898 88041f226e80 [130346.262817] Call Trace: [130346.262843] [] ? dump_stack+0x46/0x5d [130346.262872] [] ? warn_alloc+0x11f/0x140 [130346.262899] [] ? __alloc_pages_slowpath+0x84b/0xa80 [130346.262929] [] ? __alloc_pages_nodemask+0x2b0/0x2f0 [130346.262960] [] ? alloc_pages_vma+0xbe/0x260 [130346.262989] [] ? pagecache_get_page+0x22/0x280 [130346.263019] [] ? __read_swap_cache_async+0x118/0x1a0 [130346.263048] [] ? read_swap_cache_async+0xf/0x30 [130346.263077] [] ? swapin_readahead+0x16e/0x1c0 [130346.263106] [] ? radix_tree_lookup_slot+0xe/0x20 [130346.263135] [] ? find_get_entry+0x14/0x130 [130346.263162] [] ? pagecache_get_page+0x22/0x280 [130346.263193] [] ? do_swap_page+0x44f/0x5f0 [130346.263220] [] ? __radix_tree_lookup+0x62/0xc0 [130346.263249] [] ? handle_mm_fault+0x66a/0xf00 [130346.263277] [] ? find_get_entry+0x14/0x130 [130346.263305] [] ? __do_page_fault+0x1c5/0x490 [130346.263336] [] ? page_fault+0x22/0x30 [130346.263364] [] ? copy_user_generic_string+0x2c/0x40 [130346.263395] [] ? set_fd_set+0x1d/0x30 [130346.263422] [] ? core_sys_select+0x1a5/0x260 [130346.263450] [] ? getname_flags+0x6a/0x1e0 [130346.263479] [] ? cp_new_stat+0x115/0x130 [130346.263509] [] ? ktime_get_ts64+0x3f/0xf0 [130346.263537] [] ? SyS_select+0xa5/0xe0 [130346.263564] [] ? entry_SYSCALL_64_fastpath+0x13/0x94 Under such situation, trying to login and execute /bin/cat could take minutes. Also, writing to btrfs and ext4 seems to be stalling. The btrfs one is a situation where WQ_MEM_RECLAIM kernel workqueue is unable to make progress. 
[130420.008231] kworker/u34:21: page allocation stalls for 35028ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL) [130420.008287] CPU: 5 PID: 24286 Comm: kworker/u34:21 Tainted: GW I 4.9.0-rc5 #2 [130420.008401] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [130420.008432] 8129ba69 8170e4c8 c900087836a0 [130420.008483] 8113449f 024008401e3f1b40 8170e4c8 c90008783648 [130420.008534] 0010 c900087836b0 c90008783660 88041ecc4340 [130420.008586] Call Trace: [130420.008611] [] ? dump_stack+0x46/0x5d [130420.008640] [] ? warn_alloc+0x11f/0x140 [130420.008667] [] ? __alloc_pages_slowpath+0x84b/0xa80 [130420.008707] [] ? search_bitmap+0xc2/0x140 [btrfs] [130420.008736] [] ? __alloc_pages_nodemask+0x2b0/0x2f0 [130420.008766] [] ? alloc_pages_current+0x8a/0x110 [130420.008796] [] ? pagecache_get_page+0xec/0x280 [130420.008836] [] ? alloc_extent_buffer+0x108/0x430 [btrfs] [130420.008875] [] ? btrfs_alloc_tree_block+0x118/0x4d0 [btrfs] [130420.008927] [] ? __btrfs_cow_block+0x148/0x5d0 [btrfs] [130420.008964] [] ? btrfs_cow_block+0x114/0x1d0 [btrfs] [130420.009001] [] ? btrfs_search_slot+0x206/0xa40 [btrfs] [130420.009039] [] ? lookup_inline_extent_backref+0xd9/0x620 [btrfs] [130420.009095] [] ? set_extent_bit+0x24/0x30 [btrfs] [130420.009124] [] ? kmem_cache_alloc+0x17f/0x1b0 [130420.009161] [] ? __btrfs_free_extent.isra.69+0xef/0xd10 [btrfs] [130420.009215] [] ? btrfs_merge_delayed_refs+0x56/0x6f0 [btrfs] [130420.009269] [] ? __btrfs_run_delayed_ref
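The hourly-snapshot idea above could be implemented with an /etc/crontab entry along these lines (path and naming are placeholders; old dumps would need pruning, and successive dumps can be compared with diff or with per-mask tallies):

0 * * * * root cat /sys/kernel/debug/page_owner | gzip > /var/log/page_owner-$(date +\%Y\%m\%d\%H).gz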
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
On 11/16/2016 02:39 PM, E V wrote: > System panic'd overnight running 4.9rc5 & rsync. Attached a photo of > the stack trace, and the 38 call traces in a 2 minute window shortly > before, to the bugzilla case for those not on it's e-mail list: > > https://bugzilla.kernel.org/show_bug.cgi?id=186671 The panic screenshot has only the last part, but the end marker says it's OOM with no killable processes. The DEBUG_VM config thus didn't trigger anything, and still there's tons of pagecache, mostly clean, that's not being reclaimed. Could you now try this? - enable CONFIG_PAGE_OWNER - boot with kernel option: page_owner=on - after the first oom, "cat /sys/kernel/debug/page_owner > file" - provide the file (compressed, it will be quite large) Vlastimil -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
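Spelled out, the steps above might look like this on a Debian-style system (a sketch only; it assumes GRUB is the bootloader and that debugfs is mounted at /sys/kernel/debug):

grep CONFIG_PAGE_OWNER /boot/config-$(uname -r)   # must be =y, otherwise rebuild the kernel
# add "page_owner=on" to GRUB_CMDLINE_LINUX in /etc/default/grub, then:
update-grub && reboot
# after the first OOM:
cat /sys/kernel/debug/page_owner > /tmp/page_owner_after_oom
xz -9 /tmp/page_owner_after_oom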
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
System panic'd overnight running 4.9rc5 & rsync. Attached a photo of the stack trace, and the 38 call traces in a 2 minute window shortly before, to the bugzilla case for those not on it's e-mail list: https://bugzilla.kernel.org/show_bug.cgi?id=186671 On Mon, Nov 14, 2016 at 3:56 PM, E V wrote: > Pretty sure it was the system after the OOM just did a history search > to check, though it is 3 days afterwards and several OOMs killed > several processes in somewhat rapid succession, I just listed the 1st. > I'll turn on CONFIG_DEBUG_VM and reboot again. > > On Mon, Nov 14, 2016 at 12:04 PM, Vlastimil Babka wrote: >> On 11/14/2016 02:27 PM, E V wrote: >>> System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10 >>> family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll >>> download the current 4.9rc and reboot, but in the mean time here's >>> xxd, vmstat & kern.log output: >>> 8532039 >> >> Hmm this would suggest that the memory is mostly free. But not according >> to vmstat. Is it possible you mistakenly provided the xxd from a fresh >> boot, but vmstat from after the OOM? >> >> But sure, a page_count() of zero is a reason why __isolate_lru_page() >> would fail due to its get_page_unless_zero(). The question is then how >> could it drop to zero without being freed at the same time, as >> put_page() does. >> >> I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to >> delete_from_page_cache(), but that's since 4.8 and you mention problems >> since 4.7. >> >> Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code >> usually has VM_BUG_ONs. >> >> Vlastimil >> >>>9324 0100 >>>2226 0200 >>> 405 0300 >>> 80 0400 >>> 34 0500 >>> 48 0600 >>> 17 0700 >>> 17 0800 >>> 32 0900 >>> 19 0a00 >>> 1 0c00 >>> 1 0d00 >>> 1 0e00 >>> 12 1000 >>> 8 1100 >>> 32 1200 >>> 10 1300 >>> 2 1400 >>> 11 1500 >>> 12 1600 >>> 7 1700 >>> 3 1800 >>> 5 1900 >>> 6 1a00 >>> 11 1b00 >>> 22 1c00 >>> 3 1d00 >>> 19 1e00 >>> 21 1f00 >>> 18 2000 >>> 28 2100 >>> 40 2200 >>> 38 2300 >>> 85 2400 >>> 59 2500 >>> 40520 81ff >>> >>> /proc/vmstat: >>> nr_free_pages 60965 >>> nr_zone_inactive_anon 4646 >>> nr_zone_active_anon 3265 >>> nr_zone_inactive_file 633882 >>> nr_zone_active_file 7017458 >>> nr_zone_unevictable 0 >>> nr_zone_write_pending 0 >>> nr_mlock 0 >>> nr_slab_reclaimable 299205 >>> nr_slab_unreclaimable 195497 >>> nr_page_table_pages 935 >>> nr_kernel_stack 4976 >>> nr_bounce 0 >>> numa_hit 3577063288 >>> numa_miss 541393191 >>> numa_foreign 541393191 >>> numa_interleave 19415 >>> numa_local 3577063288 >>> numa_other 0 >>> nr_free_cma 0 >>> nr_inactive_anon 4646 >>> nr_active_anon 3265 >>> nr_inactive_file 633882 >>> nr_active_file 7017458 >>> nr_unevictable 0 >>> nr_isolated_anon 0 >>> nr_isolated_file 0 >>> nr_pages_scanned 0 >>> workingset_refault 42685891 >>> workingset_activate 15247281 >>> workingset_nodereclaim 26375216 >>> nr_anon_pages 5067 >>> nr_mapped 5630 >>> nr_file_pages 7654746 >>> nr_dirty 0 >>> nr_writeback 0 >>> nr_writeback_temp 0 >>> nr_shmem 2504 >>> nr_shmem_hugepages 0 >>> nr_shmem_pmdmapped 0 >>> nr_anon_transparent_hugepages 0 >>> nr_unstable 0 >>> nr_vmscan_write 5243750485 >>> nr_vmscan_immediate_reclaim 4207633857 >>> nr_dirtied 1839143430 >>> nr_written 1832626107 >>> nr_dirty_threshold 1147728 >>> nr_dirty_background_threshold 151410 >>> pgpgin 166731189 >>> pgpgout 7328142335 >>> pswpin 98608 >>> pswpout 117794 >>> pgalloc_dma 29504 >>> pgalloc_dma32 1006726216 >>> pgalloc_normal 5275218188 >>> pgalloc_movable 0 >>> allocstall_dma 0 >>> 
allocstall_dma32 0 >>> allocstall_normal 36461 >>> allocstall_movable 5867 >>> pgskip_dma 0 >>> pgskip_dma32 0 >>> pgskip_normal 6417890 >>> pgskip_movable 0 >>> pgfree 6309223401 >>> pgactivate 35076483 >>> pgdeactivate 63556974 >>> pgfault 35753842 >>> pgmajfault 69126 >>> pglazyfreed 0 >>> pgrefill 70008598 >>> pgsteal_kswapd 3567289713 >>> pgsteal_direct 5878057 >>> pgscan_kswapd 9059309872 >>> pgscan_direct 4239367903 >>> pgscan_direct_throttle 0 >>> zone_reclaim_failed 0 >>> pginodesteal 102916 >>> slabs_scanned 460790262 >>> kswapd_inodesteal 9130243 >>> kswapd_low_wmark_hit_quickly 10634373 >>> kswapd_high_wmark_hit_quickly 7348173 >>> pageoutrun 18349115 >>> pgrotated 16291322 >>> drop_pagecache 0 >>> drop_slab 0 >>> pgmigrate_success 18912908 >>> pgmigrate_fail 63382146 >>> compact_migra
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
Pretty sure it was the system after the OOM just did a history search to check, though it is 3 days afterwards and several OOMs killed several processes in somewhat rapid succession, I just listed the 1st. I'll turn on CONFIG_DEBUG_VM and reboot again. On Mon, Nov 14, 2016 at 12:04 PM, Vlastimil Babka wrote: > On 11/14/2016 02:27 PM, E V wrote: >> System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10 >> family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll >> download the current 4.9rc and reboot, but in the mean time here's >> xxd, vmstat & kern.log output: >> 8532039 > > Hmm this would suggest that the memory is mostly free. But not according > to vmstat. Is it possible you mistakenly provided the xxd from a fresh > boot, but vmstat from after the OOM? > > But sure, a page_count() of zero is a reason why __isolate_lru_page() > would fail due to its get_page_unless_zero(). The question is then how > could it drop to zero without being freed at the same time, as > put_page() does. > > I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to > delete_from_page_cache(), but that's since 4.8 and you mention problems > since 4.7. > > Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code > usually has VM_BUG_ONs. > > Vlastimil > >>9324 0100 >>2226 0200 >> 405 0300 >> 80 0400 >> 34 0500 >> 48 0600 >> 17 0700 >> 17 0800 >> 32 0900 >> 19 0a00 >> 1 0c00 >> 1 0d00 >> 1 0e00 >> 12 1000 >> 8 1100 >> 32 1200 >> 10 1300 >> 2 1400 >> 11 1500 >> 12 1600 >> 7 1700 >> 3 1800 >> 5 1900 >> 6 1a00 >> 11 1b00 >> 22 1c00 >> 3 1d00 >> 19 1e00 >> 21 1f00 >> 18 2000 >> 28 2100 >> 40 2200 >> 38 2300 >> 85 2400 >> 59 2500 >> 40520 81ff >> >> /proc/vmstat: >> nr_free_pages 60965 >> nr_zone_inactive_anon 4646 >> nr_zone_active_anon 3265 >> nr_zone_inactive_file 633882 >> nr_zone_active_file 7017458 >> nr_zone_unevictable 0 >> nr_zone_write_pending 0 >> nr_mlock 0 >> nr_slab_reclaimable 299205 >> nr_slab_unreclaimable 195497 >> nr_page_table_pages 935 >> nr_kernel_stack 4976 >> nr_bounce 0 >> numa_hit 3577063288 >> numa_miss 541393191 >> numa_foreign 541393191 >> numa_interleave 19415 >> numa_local 3577063288 >> numa_other 0 >> nr_free_cma 0 >> nr_inactive_anon 4646 >> nr_active_anon 3265 >> nr_inactive_file 633882 >> nr_active_file 7017458 >> nr_unevictable 0 >> nr_isolated_anon 0 >> nr_isolated_file 0 >> nr_pages_scanned 0 >> workingset_refault 42685891 >> workingset_activate 15247281 >> workingset_nodereclaim 26375216 >> nr_anon_pages 5067 >> nr_mapped 5630 >> nr_file_pages 7654746 >> nr_dirty 0 >> nr_writeback 0 >> nr_writeback_temp 0 >> nr_shmem 2504 >> nr_shmem_hugepages 0 >> nr_shmem_pmdmapped 0 >> nr_anon_transparent_hugepages 0 >> nr_unstable 0 >> nr_vmscan_write 5243750485 >> nr_vmscan_immediate_reclaim 4207633857 >> nr_dirtied 1839143430 >> nr_written 1832626107 >> nr_dirty_threshold 1147728 >> nr_dirty_background_threshold 151410 >> pgpgin 166731189 >> pgpgout 7328142335 >> pswpin 98608 >> pswpout 117794 >> pgalloc_dma 29504 >> pgalloc_dma32 1006726216 >> pgalloc_normal 5275218188 >> pgalloc_movable 0 >> allocstall_dma 0 >> allocstall_dma32 0 >> allocstall_normal 36461 >> allocstall_movable 5867 >> pgskip_dma 0 >> pgskip_dma32 0 >> pgskip_normal 6417890 >> pgskip_movable 0 >> pgfree 6309223401 >> pgactivate 35076483 >> pgdeactivate 63556974 >> pgfault 35753842 >> pgmajfault 69126 >> pglazyfreed 0 >> pgrefill 70008598 >> pgsteal_kswapd 3567289713 >> pgsteal_direct 5878057 >> pgscan_kswapd 9059309872 >> pgscan_direct 4239367903 >> 
pgscan_direct_throttle 0 >> zone_reclaim_failed 0 >> pginodesteal 102916 >> slabs_scanned 460790262 >> kswapd_inodesteal 9130243 >> kswapd_low_wmark_hit_quickly 10634373 >> kswapd_high_wmark_hit_quickly 7348173 >> pageoutrun 18349115 >> pgrotated 16291322 >> drop_pagecache 0 >> drop_slab 0 >> pgmigrate_success 18912908 >> pgmigrate_fail 63382146 >> compact_migrate_scanned 2986269789 >> compact_free_scanned 190451505123 >> compact_isolated 109549437 >> compact_stall 3544 >> compact_fail 8 >> compact_success 3536 >> compact_daemon_wake 1403515 >> htlb_buddy_alloc_success 0 >> htlb_buddy_alloc_fail 0 >> unevictable_pgs_culled 12473 >> unevictable_pgs_scanned 0 >> unevictable_pgs_rescued 11979 >> unevictable_pgs_mlocked 14556 >> unevictable_pgs_munlocked 14556 >> unevictable_pgs_cleared 0 >> unevictable_pgs_stranded 0 >
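One hypothetical way to flip CONFIG_DEBUG_VM on before that rebuild, run from the 4.8.x kernel source tree (the install targets assume the usual make install flow):

scripts/config --enable DEBUG_VM
make olddefconfig
make -j"$(nproc)" && make modules_install && make install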
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
On 11/14/2016 02:27 PM, E V wrote: > System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10 > family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll > download the current 4.9rc and reboot, but in the mean time here's > xxd, vmstat & kern.log output: > 8532039 Hmm this would suggest that the memory is mostly free. But not according to vmstat. Is it possible you mistakenly provided the xxd from a fresh boot, but vmstat from after the OOM? But sure, a page_count() of zero is a reason why __isolate_lru_page() would fail due to its get_page_unless_zero(). The question is then how could it drop to zero without being freed at the same time, as put_page() does. I was going to suspect commit 83929372f6 and a page_ref_sub() it adds to delete_from_page_cache(), but that's since 4.8 and you mention problems since 4.7. Anyway it might be worth enabling CONFIG_DEBUG_VM as the relevant code usually has VM_BUG_ONs. Vlastimil >9324 0100 >2226 0200 > 405 0300 > 80 0400 > 34 0500 > 48 0600 > 17 0700 > 17 0800 > 32 0900 > 19 0a00 > 1 0c00 > 1 0d00 > 1 0e00 > 12 1000 > 8 1100 > 32 1200 > 10 1300 > 2 1400 > 11 1500 > 12 1600 > 7 1700 > 3 1800 > 5 1900 > 6 1a00 > 11 1b00 > 22 1c00 > 3 1d00 > 19 1e00 > 21 1f00 > 18 2000 > 28 2100 > 40 2200 > 38 2300 > 85 2400 > 59 2500 > 40520 81ff > > /proc/vmstat: > nr_free_pages 60965 > nr_zone_inactive_anon 4646 > nr_zone_active_anon 3265 > nr_zone_inactive_file 633882 > nr_zone_active_file 7017458 > nr_zone_unevictable 0 > nr_zone_write_pending 0 > nr_mlock 0 > nr_slab_reclaimable 299205 > nr_slab_unreclaimable 195497 > nr_page_table_pages 935 > nr_kernel_stack 4976 > nr_bounce 0 > numa_hit 3577063288 > numa_miss 541393191 > numa_foreign 541393191 > numa_interleave 19415 > numa_local 3577063288 > numa_other 0 > nr_free_cma 0 > nr_inactive_anon 4646 > nr_active_anon 3265 > nr_inactive_file 633882 > nr_active_file 7017458 > nr_unevictable 0 > nr_isolated_anon 0 > nr_isolated_file 0 > nr_pages_scanned 0 > workingset_refault 42685891 > workingset_activate 15247281 > workingset_nodereclaim 26375216 > nr_anon_pages 5067 > nr_mapped 5630 > nr_file_pages 7654746 > nr_dirty 0 > nr_writeback 0 > nr_writeback_temp 0 > nr_shmem 2504 > nr_shmem_hugepages 0 > nr_shmem_pmdmapped 0 > nr_anon_transparent_hugepages 0 > nr_unstable 0 > nr_vmscan_write 5243750485 > nr_vmscan_immediate_reclaim 4207633857 > nr_dirtied 1839143430 > nr_written 1832626107 > nr_dirty_threshold 1147728 > nr_dirty_background_threshold 151410 > pgpgin 166731189 > pgpgout 7328142335 > pswpin 98608 > pswpout 117794 > pgalloc_dma 29504 > pgalloc_dma32 1006726216 > pgalloc_normal 5275218188 > pgalloc_movable 0 > allocstall_dma 0 > allocstall_dma32 0 > allocstall_normal 36461 > allocstall_movable 5867 > pgskip_dma 0 > pgskip_dma32 0 > pgskip_normal 6417890 > pgskip_movable 0 > pgfree 6309223401 > pgactivate 35076483 > pgdeactivate 63556974 > pgfault 35753842 > pgmajfault 69126 > pglazyfreed 0 > pgrefill 70008598 > pgsteal_kswapd 3567289713 > pgsteal_direct 5878057 > pgscan_kswapd 9059309872 > pgscan_direct 4239367903 > pgscan_direct_throttle 0 > zone_reclaim_failed 0 > pginodesteal 102916 > slabs_scanned 460790262 > kswapd_inodesteal 9130243 > kswapd_low_wmark_hit_quickly 10634373 > kswapd_high_wmark_hit_quickly 7348173 > pageoutrun 18349115 > pgrotated 16291322 > drop_pagecache 0 > drop_slab 0 > pgmigrate_success 18912908 > pgmigrate_fail 63382146 > compact_migrate_scanned 2986269789 > compact_free_scanned 190451505123 > compact_isolated 109549437 > compact_stall 3544 > compact_fail 8 > 
compact_success 3536 > compact_daemon_wake 1403515 > htlb_buddy_alloc_success 0 > htlb_buddy_alloc_fail 0 > unevictable_pgs_culled 12473 > unevictable_pgs_scanned 0 > unevictable_pgs_rescued 11979 > unevictable_pgs_mlocked 14556 > unevictable_pgs_munlocked 14556 > unevictable_pgs_cleared 0 > unevictable_pgs_stranded 0 > thp_fault_alloc 0 > thp_fault_fallback 0 > thp_collapse_alloc 0 > thp_collapse_alloc_failed 0 > thp_file_alloc 0 > thp_file_mapped 0 > thp_split_page 0 > thp_split_page_failed 0 > thp_deferred_split_page 0 > thp_split_pmd 0 > thp_zero_page_alloc 0 > thp_zero_page_alloc_failed 0 > > kern.log OOM message: > [737778.724194] snmpd invoked oom-killer: > gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 > [737778.724246] snmpd cpuset=/ mems_allowed=0-1 > [737778.724278] CPU:
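As a side note, individual page frames can also be spot-checked directly; a hypothetical look at one PFN's reference count and flags (the PFN value is a placeholder, both files hold one 64-bit value per frame, and root is required):

PFN=123456
dd if=/proc/kpagecount bs=8 skip=$PFN count=1 2>/dev/null | od -An -td8   # reference count
dd if=/proc/kpageflags bs=8 skip=$PFN count=1 2>/dev/null | od -An -tx8   # flags bitmask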
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
System is an intel dual socket Xeon E5620, 7500/5520/5500/X58 ICH10 family according to lspci. Anyways 4.8.4 OOM'd while I was gone. I'll download the current 4.9rc and reboot, but in the mean time here's xxd, vmstat & kern.log output: 8532039 9324 0100 2226 0200 405 0300 80 0400 34 0500 48 0600 17 0700 17 0800 32 0900 19 0a00 1 0c00 1 0d00 1 0e00 12 1000 8 1100 32 1200 10 1300 2 1400 11 1500 12 1600 7 1700 3 1800 5 1900 6 1a00 11 1b00 22 1c00 3 1d00 19 1e00 21 1f00 18 2000 28 2100 40 2200 38 2300 85 2400 59 2500 40520 81ff /proc/vmstat: nr_free_pages 60965 nr_zone_inactive_anon 4646 nr_zone_active_anon 3265 nr_zone_inactive_file 633882 nr_zone_active_file 7017458 nr_zone_unevictable 0 nr_zone_write_pending 0 nr_mlock 0 nr_slab_reclaimable 299205 nr_slab_unreclaimable 195497 nr_page_table_pages 935 nr_kernel_stack 4976 nr_bounce 0 numa_hit 3577063288 numa_miss 541393191 numa_foreign 541393191 numa_interleave 19415 numa_local 3577063288 numa_other 0 nr_free_cma 0 nr_inactive_anon 4646 nr_active_anon 3265 nr_inactive_file 633882 nr_active_file 7017458 nr_unevictable 0 nr_isolated_anon 0 nr_isolated_file 0 nr_pages_scanned 0 workingset_refault 42685891 workingset_activate 15247281 workingset_nodereclaim 26375216 nr_anon_pages 5067 nr_mapped 5630 nr_file_pages 7654746 nr_dirty 0 nr_writeback 0 nr_writeback_temp 0 nr_shmem 2504 nr_shmem_hugepages 0 nr_shmem_pmdmapped 0 nr_anon_transparent_hugepages 0 nr_unstable 0 nr_vmscan_write 5243750485 nr_vmscan_immediate_reclaim 4207633857 nr_dirtied 1839143430 nr_written 1832626107 nr_dirty_threshold 1147728 nr_dirty_background_threshold 151410 pgpgin 166731189 pgpgout 7328142335 pswpin 98608 pswpout 117794 pgalloc_dma 29504 pgalloc_dma32 1006726216 pgalloc_normal 5275218188 pgalloc_movable 0 allocstall_dma 0 allocstall_dma32 0 allocstall_normal 36461 allocstall_movable 5867 pgskip_dma 0 pgskip_dma32 0 pgskip_normal 6417890 pgskip_movable 0 pgfree 6309223401 pgactivate 35076483 pgdeactivate 63556974 pgfault 35753842 pgmajfault 69126 pglazyfreed 0 pgrefill 70008598 pgsteal_kswapd 3567289713 pgsteal_direct 5878057 pgscan_kswapd 9059309872 pgscan_direct 4239367903 pgscan_direct_throttle 0 zone_reclaim_failed 0 pginodesteal 102916 slabs_scanned 460790262 kswapd_inodesteal 9130243 kswapd_low_wmark_hit_quickly 10634373 kswapd_high_wmark_hit_quickly 7348173 pageoutrun 18349115 pgrotated 16291322 drop_pagecache 0 drop_slab 0 pgmigrate_success 18912908 pgmigrate_fail 63382146 compact_migrate_scanned 2986269789 compact_free_scanned 190451505123 compact_isolated 109549437 compact_stall 3544 compact_fail 8 compact_success 3536 compact_daemon_wake 1403515 htlb_buddy_alloc_success 0 htlb_buddy_alloc_fail 0 unevictable_pgs_culled 12473 unevictable_pgs_scanned 0 unevictable_pgs_rescued 11979 unevictable_pgs_mlocked 14556 unevictable_pgs_munlocked 14556 unevictable_pgs_cleared 0 unevictable_pgs_stranded 0 thp_fault_alloc 0 thp_fault_fallback 0 thp_collapse_alloc 0 thp_collapse_alloc_failed 0 thp_file_alloc 0 thp_file_mapped 0 thp_split_page 0 thp_split_page_failed 0 thp_deferred_split_page 0 thp_split_pmd 0 thp_zero_page_alloc 0 thp_zero_page_alloc_failed 0 kern.log OOM message: [737778.724194] snmpd invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [737778.724246] snmpd cpuset=/ mems_allowed=0-1 [737778.724278] CPU: 15 PID: 2976 Comm: snmpd Tainted: GW I 4.8.4 #1 [737778.724352] 81292069 88041e043c48 88041e043c48 [737778.724403] 8118d1f6 88041dd70fc0 88041e043c48 0136236f [737778.724454] 8170e11e 0001 8112a700 030f [737778.724505] 
Call Trace: [737778.724533] [] ? dump_stack+0x46/0x5d [737778.727077] [] ? dump_header.isra.16+0x56/0x185 [737778.727108] [] ? oom_kill_process+0x210/0x3c0 [737778.727136] [] ? out_of_memory+0x34b/0x420 [737778.727165] [] ? __alloc_pages_nodemask+0xd9a/0xde0 [737778.727195] [] ? alloc_pages_vma+0xc1/0x240 [737778.727223] [] ? pagecache_get_page+0x22/0x230 [737778.727253] [] ? __read_swap_cache_async+0x104/0x180 [737778.727282] [] ? read_swap_cache_async+0xf/0x30 [737778.727311] [] ? swapin_readahead+0xec/0x1a0 [737778.727340] [] ? do_swap_page+0x420/0x5c0 [737778.727369] [] ? SYSC_recvfrom+0xa8/0x110 [737778.727397] [] ? handle_mm_fault+0x629/0xe30 [737778.727426] [] ? __do_page_
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
On 11/04/2016 03:13 PM, E V wrote: > After the system panic'd yesterday I booted back into 4.8.4 and > restarted the rsync's. I'm away on vacation next week, so when I get > back I'll get rc4 or rc5 and try again. In the mean time here's data > from the system running 4.8.4 without problems for about a day. I'm > not familiar with xxd and didn't see a -e option, so used -E: > xxd -E -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c > 8258633 > 216440 0100 The lack of -e means it's big endian, which is not a big issue. So here most of memory is free, some pages have just one pin, and only relatively few have more. The vmstats also doesn't show anything bad, so we'll have to wait if something appears within the week, or after you try 4.9 again. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
After the system panic'd yesterday I booted back into 4.8.4 and restarted the rsync's. I'm away on vacation next week, so when I get back I'll get rc4 or rc5 and try again. In the mean time here's data from the system running 4.8.4 without problems for about a day. I'm not familiar with xxd and didn't see a -e option, so used -E: xxd -E -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c 8258633 216440 0100 5576 0200 592 0300 195 0400 184 0500 171 0600 70 0700 3 0800 17 0900 48 0a00 78 0b00 33 0c00 23 0d00 18 0e00 3 0f00 5 1000 2 1100 7 1200 5 1300 2 1400 36 1500 10 1600 6 1700 3 1800 8 1900 4 1a00 7 1b00 4 1c00 5 1d00 3 1e00 18 1f00 9 2000 9 2100 9 2200 19 2300 13 2400 6 2500 13 2600 13 2700 3 2800 16 2900 7 2a00 21 2b00 33 2c00 19 2d00 54 2e00 29 2f00 72 3000 27 3100 102635 81ff cat /proc/vmstat nr_free_pages 106970 nr_zone_inactive_anon 110034 nr_zone_active_anon 108424 nr_zone_inactive_file 350017 nr_zone_active_file 2158161 nr_zone_unevictable 0 nr_zone_write_pending 114 nr_mlock 0 nr_slab_reclaimable 4962990 nr_slab_unreclaimable 415089 nr_page_table_pages 2149 nr_kernel_stack 6176 nr_bounce 0 numa_hit 403780590 numa_miss 176970926 numa_foreign 176970926 numa_interleave 19415 numa_local 403780590 numa_other 0 nr_free_cma 0 nr_inactive_anon 110034 nr_active_anon 108424 nr_inactive_file 350017 nr_active_file 2158161 nr_unevictable 0 nr_isolated_anon 0 nr_isolated_file 0 nr_pages_scanned 0 workingset_refault 1443060 workingset_activate 558143 workingset_nodereclaim 6879280 nr_anon_pages 216243 nr_mapped 6462 nr_file_pages 2510544 nr_dirty 114 nr_writeback 0 nr_writeback_temp 0 nr_shmem 2179 nr_shmem_hugepages 0 nr_shmem_pmdmapped 0 nr_anon_transparent_hugepages 0 nr_unstable 0 nr_vmscan_write 1127 nr_vmscan_immediate_reclaim 19056 nr_dirtied 254716641 nr_written 254532248 nr_dirty_threshold 383652 nr_dirty_background_threshold 50612 pgpgin 21962903 pgpgout 1024651087 pswpin 214 pswpout 1127 pgalloc_dma 0 pgalloc_dma32 87690791 pgalloc_normal 806119097 pgalloc_movable 0 allocstall_dma 0 allocstall_dma32 0 allocstall_normal 210 allocstall_movable 0 pgskip_dma 0 pgskip_dma32 0 pgskip_normal 0 pgskip_movable 0 pgfree 894694404 pgactivate 5513535 pgdeactivate 7989719 pgfault 4748538 pgmajfault 2528 pglazyfreed 0 pgrefill 7999038 pgsteal_kswapd 504125672 pgsteal_direct 36130 pgscan_kswapd 504479233 pgscan_direct 36142 pgscan_direct_throttle 0 zone_reclaim_failed 0 pginodesteal 1074 slabs_scanned 61625344 kswapd_inodesteal 1956613 kswapd_low_wmark_hit_quickly 49386 kswapd_high_wmark_hit_quickly 79880 pageoutrun 211656 pgrotated 203832 drop_pagecache 0 drop_slab 0 pgmigrate_success 684523 pgmigrate_fail 1189249 compact_migrate_scanned 94848219 compact_free_scanned 2329620072 compact_isolated 2648057 compact_stall 38 compact_fail 0 compact_success 38 compact_daemon_wake 9682 htlb_buddy_alloc_success 0 htlb_buddy_alloc_fail 0 unevictable_pgs_culled 12473 unevictable_pgs_scanned 0 unevictable_pgs_rescued 11979 unevictable_pgs_mlocked 14556 unevictable_pgs_munlocked 14556 unevictable_pgs_cleared 0 unevictable_pgs_stranded 0 thp_fault_alloc 0 thp_fault_fallback 0 thp_collapse_alloc 0 thp_collapse_alloc_failed 0 thp_file_alloc 0 thp_file_mapped 0 thp_split_page 0 thp_split_page_failed 0 thp_deferred_split_page 0 thp_split_pmd 0 thp_zero_page_alloc 0 thp_zero_page_alloc_failed 0 On Thu, Nov 3, 2016 at 7:58 PM, Vlastimil Babka wrote: > On 11/03/2016 07:53 PM, Andrew Morton wrote: >> >> (switched to email. Please respond via emailed reply-to-all, not via the >> bugzilla web interface). 
> > +CC also btrfs just in case it's a problem in page reclaim there > >> On Wed, 02 Nov 2016 13:02:39 + bugzilla-dae...@bugzilla.kernel.org wrote: >> >>> https://bugzilla.kernel.org/show_bug.cgi?id=186671 >>> >>> Bug ID: 186671 >>>Summary: OOM on system with just rsync running 32GB of ram 30GB >>> of pagecache >>>Product: Memory Management >>>Version: 2.5 >>> Kernel Version: 4.9-rc3 >>> Hardware: x86-64 >>> OS: Linux >>> Tree: Mainline >>> Statu
Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache
On 11/03/2016 07:53 PM, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). +CC also btrfs just in case it's a problem in page reclaim there > On Wed, 02 Nov 2016 13:02:39 + bugzilla-dae...@bugzilla.kernel.org wrote: > >> https://bugzilla.kernel.org/show_bug.cgi?id=186671 >> >> Bug ID: 186671 >>Summary: OOM on system with just rsync running 32GB of ram 30GB >> of pagecache >>Product: Memory Management >>Version: 2.5 >> Kernel Version: 4.9-rc3 >> Hardware: x86-64 >> OS: Linux >> Tree: Mainline >> Status: NEW >> Severity: high >> Priority: P1 >> Component: Page Allocator >> Assignee: a...@linux-foundation.org >> Reporter: eliven...@gmail.com >> Regression: No >> >> Running rsync on a debian jessie system with 32GB of RAM and a big >> 250TB btrfs filesystem. 30 GB of ram show up as cached, not much else >> running on the system. Lots of page alloction stalls in dmesg before >> hand, and several OOM's after this one as well until it finally killed >> the rsync. So more traces available if desired. Started with the 4.7 >> series kernels, thought it was going to be fixed in 4.9: > > OK, this looks bad. Please let's work it via email so do remember the > reply-to-alls. It's bad but note the "started with 4.7" so it's not a 4.9 regression. Also not a high-order OOM (phew!). >> [93428.029768] irqbalance invoked oom-killer: >> gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0-1, order=0, >> oom_score_adj=0 >> [93428.029824] irqbalance cpuset=/ mems_allowed=0-1 >> [93428.029857] CPU: 11 PID: 2992 Comm: irqbalance Tainted: G W >> 4.9.0-rc3 >> #1 >> [93428.029945] 812946c9 c90003d8bb10 >> c90003d8bb10 >> [93428.029997] 81190dd5 >> 88081db051c0 >> [93428.030049] c90003d8bb10 81711866 0002 >> 0213 >> [93428.030101] Call Trace: >> [93428.030127] [] ? dump_stack+0x46/0x5d >> [93428.030157] [] ? dump_header.isra.20+0x75/0x1a6 >> [93428.030189] [] ? oom_kill_process+0x219/0x3d0 >> [93428.030218] [] ? out_of_memory+0xd9/0x570 >> [93428.030246] [] ? __alloc_pages_slowpath+0xa4b/0xa80 >> [93428.030276] [] ? __alloc_pages_nodemask+0x288/0x2c0 >> [93428.030306] [] ? alloc_pages_vma+0xc1/0x240 >> [93428.030337] [] ? handle_mm_fault+0xccb/0xe60 >> [93428.030367] [] ? __do_page_fault+0x1c5/0x490 >> [93428.030397] [] ? page_fault+0x22/0x30 >> [93428.030425] [] ? copy_user_generic_string+0x2c/0x40 >> [93428.030455] [] ? seq_read+0x305/0x370 >> [93428.030483] [] ? proc_reg_read+0x3e/0x60 >> [93428.030511] [] ? __vfs_read+0x1e/0x110 >> [93428.030538] [] ? vfs_read+0x89/0x130 >> [93428.030564] [] ? SyS_read+0x3d/0x90 >> [93428.030591] [] ? entry_SYSCALL_64_fastpath+0x13/0x94 >> [93428.030620] Mem-Info: >> [93428.030647] active_anon:9283 inactive_anon:9905 isolated_anon:0 >> [93428.030647] active_file:6752598 inactive_file:999166 isolated_file:288 >> [93428.030647] unevictable:0 dirty:997857 writeback:1665 unstable:0 >> [93428.030647] slab_reclaimable:203122 slab_unreclaimable:202102 >> [93428.030647] mapped:7933 shmem:3170 pagetables:1752 bounce:0 >> [93428.030647] free:39250 free_pcp:954 free_cma:0 >> [93428.030800] Node 0 active_anon:24984kB inactive_anon:26704kB >> active_file:14365920kB inactive_file:1341120kB unevictable:0kB >> isolated(anon):0kB isolated(file):0kB mapped:15852kB dirty:1338044kB >> writeback:3072kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB >> anon_thp: 9484kB writeback_tmp:0kB unstable:0kB pages_scanned:23811175 >> all_unreclaimable? 
yes >> [93428.030933] Node 1 active_anon:12148kB inactive_anon:12916kB >> active_file:12644472kB inactive_file:2655544kB unevictable:0kB >> isolated(anon):0kB isolated(file):1152kB mapped:15880kB >> dirty:2653384kB writeback:3588kB shmem:0kB shmem_thp: 0kB >> shmem_pmdmapped: 0kB anon_thp: 3196kB writeback_tmp:0kB unstable:0kB >> pages_scanned:23178917 all_unreclaimable? yes Note the high pages_scanned and all_unreclaimable. I suspect something is pinning the memory. Can you post /proc/vmstat from the system with an uptime after it experiences the OOM? There's /proc/kpagecount file that could confirm that. Could you provide it too? Try running something like this and provide the output please. xxd -e -g8 -c8 /proc/kpagecount | cut -d" " -f2 | sort | uniq -c >> [93428.031059] Node 0 Normal free:44968kB min:45192kB low:61736kB >> high:78280kB active_anon:24984kB inactive_anon:26704kB >> active_file:14365920kB inactive_file:1341120kB unevictable:0kB >> writepending:1341116kB present:16777216kB managed:16546296kB >> mlocked:0kB slab_reclaimable:413824kB slab_unreclaimable:253144kB >> kernel_stack:3496kB pagetables:4104kB bounce:0kB free_pcp:1388kB >> local_pcp:0