On Tue, Mar 26, 2019 at 12:30:07AM -0400, Zygo Blaxell wrote:
> On Mon, Mar 25, 2019 at 10:50:28PM -0400, Zygo Blaxell wrote:
> > Running balance, rsync, and dedupe, I get kernel warnings every few
> > minutes on 5.0.4.  No warnings on 5.0.3 under similar conditions.
> > 
> > Mount options are: flushoncommit,space_cache=v2,compress=zstd.
> > 
> > There are two different stacks on the warnings.  This one comes from
> > btrfs balance:
> [snip]
> 
> Possibly unrelated, but I'm also repeatably getting this in 5.0.4 and
> not 5.0.3, after about 5 hours of uptime.  Different processes, same
> kernel stack:
> 
> [Mon Mar 25 23:35:17 2019] kworker/u8:4: page allocation failure:
> order:0, mode:0x404000(GFP_NOWAIT|__GFP_COMP),
> nodemask=(null),cpuset=/,mems_allowed=0
It turns out there were two bugs overlapping here:

The 5.0.3 -> 5.0.4 update introduced the initial bug I observed in
March.

In my own 5.0.6 builds, I turned on KASAN (and several other memory
debugging kernel config options) to chase an unrelated bug whose
behavior looked like a use-after-free.  This introduced a second page
allocation failure mode that occurred much more rapidly (every few
seconds instead of every few hours).

Somewhere between 5.0.5 and 5.0.10, the first, infrequent bug introduced
in March was fixed.

In my 5.0.10 and later builds, I turned KASAN off again.  This avoids
the second, very frequent bug.

I have no idea what bug was introduced and removed somewhere in
5.0.3..5.0.10, but I can't reproduce it on 5.0.11 and later, and I have
more urgent bugs to squash.  I'll leave what I've learned here in case
someone rediscovers these problems.

I ran a lot of tests turning things on and off, and found that page
allocation failures seem to be triggered in large numbers by a
combination of _all_ of the following:

  1.  btrfs (ext4 doesn't trigger it)

  2.  lvmcache (the same hardware with btrfs on a plain HDD or SSD
      doesn't trigger it; bcache has other problems that make it
      untestable) -- see the setup sketch below

  3.  kernel KASAN (must be enabled to trigger it)

  4.  userspace running rsync (it seems any workload that writes pages
      continuously to a btrfs filesystem will suffice)

  5.  Linux 5.0.6..5.0.10 (I haven't rigorously tested anything else)

  6.  Unknown other factors?  (I only tested the above)

Some other things I tried turning on or off that had no effect:

  A.  dedupe, snapshots, balances, processes reading the btrfs
      filesystem

  B.  using memory cgroups to limit page cache RAM usage of the rsync
      process (see the second sketch below)

  C.  sysctls vm.dirty_ratio (1..50), vm.dirty_background_ratio
      (1..10), and vm.dirty_expire_centisecs (100..30000), to try to
      increase or decrease the number of reclaimable pages and the
      reclaim rate (also in the second sketch below)

  D.  btrfs flushoncommit and compress mount options

  E.  running memory-hog applications to try to trigger page allocation
      failures faster (fork() returns ENOMEM before these even get to
      start)

  F.  changing host memory size (2GB to 8GB in 1GB increments)

The page allocation failures start soon after all the host's RAM is
allocated to page cache.  They lead to a lot of processes getting
ENOMEM on various system calls, especially fork().  This effectively
kills some random process every second or two.  Half the commands run
from a shell immediately fail with ENOMEM, assuming the shell itself
doesn't die.  Booting a desktop or building a kernel is impossible.

I can live without KASAN, so I've just turned it off for my 5.0 kernel
builds.  I've had no problems since I did that near the end of April.
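For anyone trying to reproduce this, the lvmcache layer in item 2 was
set up along these general lines.  This is only a sketch: the device
paths, VG/LV names, and sizes below are made up for illustration, not
my exact configuration:

    # hypothetical devices: /dev/sdb = HDD (origin), /dev/sdc = SSD (cache)
    pvcreate /dev/sdb /dev/sdc
    vgcreate testvg /dev/sdb /dev/sdc

    # origin LV on the HDD, cache pool on the SSD
    lvcreate -n origin -L 100G testvg /dev/sdb
    lvcreate --type cache-pool -n fastpool -L 10G testvg /dev/sdc

    # attach the cache pool to the origin LV (this is the lvmcache layer)
    lvconvert --type cache --cachepool testvg/fastpool testvg/origin

    # btrfs on top, mounted with the options from the original report
    mkfs.btrfs /dev/testvg/origin
    mount -o flushoncommit,space_cache=v2,compress=zstd /dev/testvg/origin /mnt

With that stack in place, item 4 is just a matter of running a large
rsync into the filesystem.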
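The cgroup and sysctl knobs from items B and C were exercised roughly
like this (again only a sketch, assuming cgroup v1 with the memory
controller mounted in the usual place; the cgroup name, limit value,
and sysctl values shown are arbitrary points from the ranges listed
above):

    # item B: cap the rsync workload's memory (page cache + anon) usage
    mkdir /sys/fs/cgroup/memory/rsync-test
    echo 512M > /sys/fs/cgroup/memory/rsync-test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/memory/rsync-test/cgroup.procs
    rsync -avx /source/ /mnt/dest/

    # item C: sweep the writeback sysctls (varied over 1..50, 1..10,
    # and 100..30000 respectively)
    sysctl vm.dirty_ratio=5
    sysctl vm.dirty_background_ratio=2
    sysctl vm.dirty_expire_centisecs=1000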
Here's a stacktrace with KASAN:

[ 5157.469012] kworker/2:0: page allocation failure: order:0, mode:0x400000(GFP_NOWAIT), nodemask=(null),cpuset=/,mems_allowed=0
[ 5157.482299] CPU: 2 PID: 12944 Comm: kworker/2:0 Not tainted 5.0.10-zb64-1bb762d31089+ #1
[ 5157.487810] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 5157.494046] Workqueue: dm-cache check_migrations [dm_cache]
[ 5157.496751] Call Trace:
[ 5157.497564]  dump_stack+0x9c/0xf5
[ 5157.500331]  warn_alloc+0x19a/0x240
[ 5157.501518]  ? zone_watermark_ok_safe+0x140/0x140
[ 5157.502796]  ? __isolate_free_page+0x2b0/0x2b0
[ 5157.503916]  __alloc_pages_nodemask+0x1857/0x1980
[ 5157.506179]  ? debug_show_all_locks+0x210/0x210
[ 5157.508559]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[ 5157.509939]  ? __asan_loadN+0xf/0x20
[ 5157.511398]  ? pvclock_clocksource_read+0xd8/0x180
[ 5157.513057]  ? kvm_sched_clock_read+0x18/0x30
[ 5157.514301]  ? check_chain_key+0x147/0x200
[ 5157.515414]  ? mark_lock+0xad/0x930
[ 5157.516368]  ? mark_lock+0xad/0x930
[ 5157.517333]  ? mark_held_locks+0x92/0xc0
[ 5157.518398]  ? lock_release+0x7e/0x720
[ 5157.519432]  ? mark_held_locks+0x92/0xc0
[ 5157.520627]  ? __asan_loadN+0xf/0x20
[ 5157.521574]  ? pvclock_clocksource_read+0xd8/0x180
[ 5157.523130]  alloc_pages_current+0x75/0x110
[ 5157.524244]  new_slab+0x44a/0x6a0
[ 5157.525244]  ___slab_alloc+0x434/0x5c0
[ 5157.526469]  ? btracker_queue+0xcc/0x2d0 [dm_cache]
[ 5157.527976]  ? debug_show_all_locks+0x210/0x210
[ 5157.529254]  ? __slab_alloc+0x3c/0x90
[ 5157.530412]  ? btracker_queue+0xcc/0x2d0 [dm_cache]
[ 5157.531749]  __slab_alloc+0x51/0x90
[ 5157.532789]  ? __slab_alloc+0x51/0x90
[ 5157.533766]  ? btracker_queue+0xcc/0x2d0 [dm_cache]
[ 5157.535172]  kmem_cache_alloc+0x229/0x290
[ 5157.536310]  btracker_queue+0xcc/0x2d0 [dm_cache]
[ 5157.537692]  queue_writeback+0x157/0x1d0 [dm_cache_smq]
[ 5157.539206]  ? mark_pending.isra.24+0x50/0x50 [dm_cache_smq]
[ 5157.540879]  ? btracker_issue+0x28/0x120 [dm_cache]
[ 5157.542302]  smq_get_background_work+0xa3/0xc0 [dm_cache_smq]
[ 5157.543805]  check_migrations+0x1b8/0x2e0 [dm_cache]
[ 5157.545426]  ? cache_ctr+0x1a70/0x1a70 [dm_cache]
[ 5157.548005]  process_one_work+0x476/0xa00
[ 5157.549104]  ? pwq_dec_nr_in_flight+0x130/0x130
[ 5157.550784]  ? do_raw_spin_lock+0x1d0/0x1d0
[ 5157.551952]  ? move_linked_works+0x113/0x150
[ 5157.553119]  worker_thread+0x76/0x5d0
[ 5157.554163]  kthread+0x1a9/0x200
[ 5157.555028]  ? process_one_work+0xa00/0xa00
[ 5157.556221]  ? kthread_park+0xb0/0xb0
[ 5157.559982]  ret_from_fork+0x3a/0x50
[ 5157.561253] Mem-Info:
[ 5157.561822] active_anon:134320 inactive_anon:4419 isolated_anon:0
[ 5157.561822]  active_file:1107612 inactive_file:249821 isolated_file:115
[ 5157.561822]  unevictable:0 dirty:4092 writeback:25 unstable:0
[ 5157.561822]  slab_reclaimable:95574 slab_unreclaimable:98822
[ 5157.561822]  mapped:8463 shmem:4486 pagetables:2982 bounce:0
[ 5157.561822]  free:25133 free_pcp:1021 free_cma:13
[ 5157.572482] Node 0 active_anon:537280kB inactive_anon:17676kB active_file:4430448kB inactive_file:999284kB unevictable:0kB isolated(anon):0kB isolated(file):460kB mapped:33852kB dirty:16368kB writeback:100kB shmem:17944kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 323584kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 5157.579907] Node 0 DMA free:15812kB min:152kB low:188kB high:224kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 5157.585961] lowmem_reserve[]: 0 2991 6846 6846 6846
[ 5157.587060] Node 0 DMA32 free:44432kB min:29352kB low:36688kB high:44024kB active_anon:14056kB inactive_anon:8kB active_file:2068012kB inactive_file:594748kB unevictable:0kB writepending:8620kB present:3129176kB managed:3063640kB mlocked:0kB kernel_stack:960kB pagetables:556kB bounce:0kB free_pcp:4272kB local_pcp:748kB free_cma:0kB
[ 5157.593250] lowmem_reserve[]: 0 0 3854 3854 3854
[ 5157.594346] Node 0 Normal free:40288kB min:40124kB low:51688kB high:61204kB active_anon:523580kB inactive_anon:17668kB active_file:2361432kB inactive_file:404492kB unevictable:0kB writepending:6060kB present:5242880kB managed:3974560kB mlocked:0kB kernel_stack:15488kB pagetables:11372kB bounce:0kB free_pcp:404kB local_pcp:32kB free_cma:52kB
[ 5157.601624] lowmem_reserve[]: 0 0 0 0 0
[ 5157.602574] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15812kB
[ 5157.605746] Node 0 DMA32: 605*4kB (UME) 438*8kB (UME) 378*16kB (UME) 124*32kB (UME) 162*64kB (UME) 101*128kB (UM) 21*256kB (ME) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 45124kB
[ 5157.609543] Node 0 Normal: 651*4kB (UMEC) 693*8kB (UMEC) 658*16kB (UME) 398*32kB (UMEH) 63*64kB (UME) 26*128kB (UE) 9*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 41076kB
[ 5157.613190] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 5157.615305] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 5157.617353] 1361051 total pagecache pages
[ 5157.618413] 0 pages in swap cache
[ 5157.619220] Swap cache stats: add 0, delete 0, find 0/0
[ 5157.620582] Free swap = 0kB
[ 5157.621380] Total swap = 0kB
[ 5157.622092] 2097012 pages RAM
[ 5157.622815] 0 pages HighMem/MovableOnly
[ 5157.623804] 333485 pages reserved
[ 5157.624606] 4096 pages cma reserved
[ 5157.625493] 0 pages hwpoisoned
[ 5157.626286] SLUB: Unable to allocate memory on node -1, gfp=0x400000(GFP_NOWAIT)
[ 5157.628335]   cache: bt_work, object size: 64, buffer size: 96, default order: 0, min order: 0
[ 5157.630495]   node 0: slabs: 111, objs: 4662, free: 0

This is the original (5.0.4) KASAN-free stacktrace:

> [Mon Mar 25 23:35:17 2019] CPU: 2 PID: 29518 Comm: kworker/u8:4
> Tainted: G        W         5.0.4-zb64-303ce93b05c9+ #1
> [Mon Mar 25 23:35:17 2019] Hardware name: QEMU Standard PC (i440FX +
> PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [Mon Mar 25 23:35:17 2019] Workqueue: btrfs-submit btrfs_submit_helper
> [Mon Mar 25 23:35:17 2019] Call Trace:
> [Mon Mar 25 23:35:17 2019]  dump_stack+0x7d/0xbb
> [Mon Mar 25 23:35:17 2019]  warn_alloc+0x108/0x190
> [Mon Mar 25 23:35:17 2019]  __alloc_pages_nodemask+0x12c4/0x13f0
> [Mon Mar 25 23:35:17 2019]  ? rcu_read_lock_sched_held+0x68/0x70
> [Mon Mar 25 23:35:17 2019]  ? __update_load_avg_se+0x208/0x280
> [Mon Mar 25 23:35:17 2019]  cache_grow_begin+0x79/0x730
> [Mon Mar 25 23:35:17 2019]  ? cache_grow_begin+0x79/0x730
> [Mon Mar 25 23:35:17 2019]  ? ____cache_alloc_node+0x165/0x1e0
> [Mon Mar 25 23:35:17 2019]  fallback_alloc+0x1e4/0x280
> [Mon Mar 25 23:35:17 2019]  kmem_cache_alloc+0x2e9/0x310
> [Mon Mar 25 23:35:17 2019]  btracker_queue+0x47/0x170 [dm_cache]
> [Mon Mar 25 23:35:17 2019]  __lookup+0x51a/0x600 [dm_cache_smq]
> [Mon Mar 25 23:35:17 2019]  ? smq_lookup+0x37/0x7b [dm_cache_smq]
> [Mon Mar 25 23:35:17 2019]  smq_lookup+0x5d/0x7b [dm_cache_smq]
> [Mon Mar 25 23:35:18 2019]  map_bio.part.40+0x14d/0x5d0 [dm_cache]
> [Mon Mar 25 23:35:18 2019]  ? bio_detain_shared+0xb3/0x120 [dm_cache]
> [Mon Mar 25 23:35:18 2019]  cache_map+0x120/0x170 [dm_cache]
> [Mon Mar 25 23:35:18 2019]  __map_bio+0x42/0x1f0 [dm_mod]
> [Mon Mar 25 23:35:18 2019]  __split_and_process_non_flush+0x152/0x1e0
> [dm_mod]
> [Mon Mar 25 23:35:18 2019]  __split_and_process_bio+0xd4/0x400 [dm_mod]
> [Mon Mar 25 23:35:18 2019]  ? lock_acquire+0xbc/0x1c0
> [Mon Mar 25 23:35:18 2019]  ? dm_get_live_table+0x5/0xc0 [dm_mod]
> [Mon Mar 25 23:35:18 2019]  dm_make_request+0x4d/0x100 [dm_mod]
> [Mon Mar 25 23:35:18 2019]  generic_make_request+0x297/0x470
> [Mon Mar 25 23:35:18 2019]  ? kvm_sched_clock_read+0x14/0x30
> [Mon Mar 25 23:35:18 2019]  ? submit_bio+0x6c/0x140
> [Mon Mar 25 23:35:18 2019]  submit_bio+0x6c/0x140
> [Mon Mar 25 23:35:18 2019]  run_scheduled_bios+0x1e6/0x500
> [Mon Mar 25 23:35:18 2019]  ? normal_work_helper+0x95/0x530
> [Mon Mar 25 23:35:18 2019]  normal_work_helper+0x95/0x530
> [Mon Mar 25 23:35:18 2019]  process_one_work+0x223/0x5d0
> [Mon Mar 25 23:35:18 2019]  worker_thread+0x4f/0x3b0
> [Mon Mar 25 23:35:18 2019]  kthread+0x106/0x140
> [Mon Mar 25 23:35:18 2019]  ? process_one_work+0x5d0/0x5d0
> [Mon Mar 25 23:35:18 2019]  ? kthread_park+0x90/0x90
> [Mon Mar 25 23:35:18 2019]  ret_from_fork+0x3a/0x50
> [Mon Mar 25 23:35:18 2019] Mem-Info:
> [Mon Mar 25 23:35:18 2019] active_anon:195872 inactive_anon:15658
> isolated_anon:0
> active_file:629653 inactive_file:308914
> isolated_file:0
> unevictable:65536 dirty:14449
> writeback:27580 unstable:0
> slab_reclaimable:492522
> slab_unreclaimable:94393
> mapped:10915 shmem:18846 pagetables:2178
> bounce:0
> free:66082 free_pcp:1963 free_cma:24
> [Mon Mar 25 23:35:18 2019] Node 0 active_anon:783488kB
> inactive_anon:62632kB active_file:2516656kB inactive_file:1235604kB
> unevictable:262144kB isolated(anon):0kB isolated(file):0kB mapped:43660kB
> dirty:57796kB writeback:110320kB shmem:75384kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 137216kB writeback_tmp:0kB unstable:0kB
> all_unreclaimable? no
> [Mon Mar 25 23:35:18 2019] Node 0 DMA free:15844kB min:132kB low:164kB
> high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB
> managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB
> free_pcp:0kB local_pcp:0kB free_cma:0kB
> [Mon Mar 25 23:35:18 2019] lowmem_reserve[]: 0 2939 7906 7906 7906
> [Mon Mar 25 23:35:18 2019] Node 0 DMA32 free:116632kB min:81336kB
> low:87592kB high:93848kB active_anon:330436kB inactive_anon:31840kB
> active_file:531512kB inactive_file:491372kB unevictable:108544kB
> writepending:121984kB present:3129176kB managed:3019336kB mlocked:108544kB
> kernel_stack:2704kB pagetables:2944kB bounce:0kB free_pcp:3800kB
> local_pcp:256kB free_cma:0kB
> [Mon Mar 25 23:35:18 2019] lowmem_reserve[]: 0 0 4966 4966 4966
> [Mon Mar 25 23:35:18 2019] Node 0 Normal free:141932kB min:42420kB
> low:53024kB high:63628kB active_anon:453224kB inactive_anon:30792kB
> active_file:1984568kB inactive_file:737244kB unevictable:153600kB
> writepending:46544kB present:5242880kB managed:5102032kB mlocked:153600kB
> kernel_stack:7020kB pagetables:5768kB bounce:0kB free_pcp:4188kB
> local_pcp:1700kB free_cma:96kB
> [Mon Mar 25 23:35:18 2019] lowmem_reserve[]: 0 0 0 0 0
> [Mon Mar 25 23:35:18 2019] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB
> (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M)
> 3*4096kB (M) = 15844kB
> [Mon Mar 25 23:35:18 2019] Node 0 DMA32: 12698*4kB (UME) 6110*8kB (UME)
> 1087*16kB (UME) 4*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 117192kB
> [Mon Mar 25 23:35:18 2019] Node 0 Normal: 13507*4kB (UMEHC) 8008*8kB
> (UMEHC) 1374*16kB (UMEH) 3*32kB (H) 4*64kB (H) 4*128kB (H) 2*256kB (H)
> 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 141964kB
> [Mon Mar 25 23:35:18 2019] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=1048576kB
> [Mon Mar 25 23:35:18 2019] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=2048kB
> [Mon Mar 25 23:35:18 2019] 955001 total pagecache pages
> [Mon Mar 25 23:35:18 2019] 0 pages in swap cache
> [Mon Mar 25 23:35:18 2019] Swap cache stats: add 0, delete 0, find 0/0
> [Mon Mar 25 23:35:18 2019] Free swap = 0kB
> [Mon Mar 25 23:35:18 2019] Total swap = 0kB
> [Mon Mar 25 23:35:18 2019] 2097012 pages RAM
> [Mon Mar 25 23:35:18 2019] 0 pages HighMem/MovableOnly
> [Mon Mar 25 23:35:18 2019] 62693 pages reserved
> [Mon Mar 25 23:35:18 2019] 4096 pages cma reserved
> [Mon Mar 25 23:35:18 2019] 0 pages hwpoisoned
> 
> As far as I can tell from logs of memory usage (recording memcg and
> MemAvail stats every second), there are gigabytes of available (clean
> page cache) RAM at all times during these tests.
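For reference, a memory-usage log like the one described in that last
paragraph can be collected with something as simple as this (a sketch;
the memcg path assumes the hypothetical rsync-test cgroup from the
earlier sketch):

    # sample MemAvailable and the workload's memcg usage once per second
    while sleep 1; do
        echo "=== $(date +%s) ==="
        grep MemAvailable: /proc/meminfo
        cat /sys/fs/cgroup/memory/rsync-test/memory.usage_in_bytes
    done >> /var/log/memstats.log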