Re: list corruption in deferred_split_scan()
> On Aug 5, 2019, at 6:15 PM, Yang Shi wrote:
>
>> On 7/25/19 2:46 PM, Yang Shi wrote:
>>
>>> On 7/24/19 2:13 PM, Qian Cai wrote:
>>>> On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
>>>>> Running LTP oom01 test case with swap triggers a crash below.
>>>>> Reverting the series "Make deferred split shrinker memcg aware" [1]
>>>>> seems to fix the issue.
>>>>
>>>> You might want to look harder at this commit, as reverting it alone on
>>>> top of 5.2.0-next-20190711 fixed the issue.
>>>>
>>>> aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]
>>>>
>>>> [1]
>>>> https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@linux.alibaba.com/
>>
>> This is the real meat of the patch series; it is the part that actually
>> converts to the memcg deferred split queue.
>>
>>>> list_del corruption. prev->next should be ea0022b10098, but was
>>
>> I could finally reproduce the list corruption issue on my machine with
>> THP swap (the swap device is a fast device). I should have checked this
>> with you in the first place. The problem can't be reproduced with a
>> rotating swap device, so I suppose you were using THP swap too.
>>
>> Actually, I found two issues with THP swap:
>>
>> 1. free_transhuge_page() is called in the reclaim path instead of
>>    put_page(). mem_cgroup_uncharge() is called before
>>    free_transhuge_page() in the reclaim path, which leaves
>>    page->mem_cgroup NULL, so the wrong deferred_split_queue is used and
>>    the THP is never deleted from the memcg's list. The page might then
>>    be split or reused later, and page->mapping would be overwritten.
>>
>> 2. There is a race condition caused by try_to_unmap() with THP swap.
>>    try_to_unmap() just calls page_remove_rmap(), which adds the THP to
>>    the deferred split queue in the reclaim path. This might cause the
>>    race below, which corrupts the list:
>>
>>        A                              B
>>        deferred_split_scan
>>          list_move
>>                                       try_to_unmap
>>                                         list_add_tail
>>          list_splice  <-- the list might get corrupted here
>>                                       free_transhuge_page
>>                                         list_del  <-- kernel bug triggered
>>
>> I hope the below patch solves your problem (tested locally).
>
> Hi Qian,
>
> Did the below patch solve your problem? I would like to fold the fix into
> the series and then target the 5.4 release.

It is going to take a while before I would be able to access that system
again. Since you can reproduce this and test yourself now, I'd say go ahead
and post the patch.

> Thanks,
> Yang
>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b7f709d..d6612ec 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)
>>
>>  	VM_BUG_ON_PAGE(!PageTransHuge(page), page);
>>
>> +	/*
>> +	 * The try_to_unmap() in the page reclaim path might reach here too;
>> +	 * this may cause a race condition to corrupt the deferred split
>> +	 * queue. And, if page reclaim is already handling the same page,
>> +	 * it is unnecessary to handle it again in the shrinker.
>> +	 *
>> +	 * Check PageSwapCache to determine if the page is being handled by
>> +	 * page reclaim, since THP swap adds the page into the swap cache
>> +	 * before reaching try_to_unmap().
>> +	 */
>> +	if (PageSwapCache(page))
>> +		return;
>> +
>>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>  	if (list_empty(page_deferred_list(page))) {
>>  		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index a0301ed..40c684a 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>  		 * Is there need to periodically free_page_list? It would
>>  		 * appear not as the counts should be low
>>  		 */
>> -		if (unlikely(PageTransHuge(page))) {
>> -			mem_cgroup_uncharge(page);
>> +		if (unlikely(PageTransHuge(page)))
>>  			(*get_compound_page_dtor(page))(page);
>> -		} else
>> +		else
>>  			list_add(&page->lru, &free_pages);
>>  		continue;
>>
>> @@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>>
>>  		if (unlikely(PageCompound(page))) {
>>  			spin_unlock_irq(&pgdat->lru_lock);
>> -			mem_cgroup_uncharge(page);
>>  			(*get_compound_page_dtor(page))(page);
>>  			spin_lock_irq(&pgdat->lru_lock);
>>  		} else

>>>> [  685.284254][ T3456] [ cut here ]
>>>> [  685.289616][ T34
Re: list corruption in deferred_split_scan()
On 7/25/19 2:46 PM, Yang Shi wrote:

On 7/24/19 2:13 PM, Qian Cai wrote:

On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:

Running LTP oom01 test case with swap triggers a crash below. Reverting the
series "Make deferred split shrinker memcg aware" [1] seems to fix the
issue.

You might want to look harder at this commit, as reverting it alone on top
of 5.2.0-next-20190711 fixed the issue.

aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]

[1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@linux.alibaba.com/

This is the real meat of the patch series; it is the part that actually
converts to the memcg deferred split queue.

list_del corruption. prev->next should be ea0022b10098, but was

I could finally reproduce the list corruption issue on my machine with THP
swap (the swap device is a fast device). I should have checked this with
you in the first place. The problem can't be reproduced with a rotating
swap device, so I suppose you were using THP swap too.

Actually, I found two issues with THP swap:

1. free_transhuge_page() is called in the reclaim path instead of
   put_page(). mem_cgroup_uncharge() is called before free_transhuge_page()
   in the reclaim path, which leaves page->mem_cgroup NULL, so the wrong
   deferred_split_queue is used and the THP is never deleted from the
   memcg's list. The page might then be split or reused later, and
   page->mapping would be overwritten.

2. There is a race condition caused by try_to_unmap() with THP swap.
   try_to_unmap() just calls page_remove_rmap(), which adds the THP to the
   deferred split queue in the reclaim path. This might cause the race
   below, which corrupts the list:

       A                              B
       deferred_split_scan
         list_move
                                      try_to_unmap
                                        list_add_tail
         list_splice  <-- the list might get corrupted here
                                      free_transhuge_page
                                        list_del  <-- kernel bug triggered

I hope the below patch solves your problem (tested locally).

Hi Qian,

Did the below patch solve your problem? I would like to fold the fix into
the series and then target the 5.4 release.

Thanks,
Yang

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..d6612ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)

 	VM_BUG_ON_PAGE(!PageTransHuge(page), page);

+	/*
+	 * The try_to_unmap() in the page reclaim path might reach here too;
+	 * this may cause a race condition to corrupt the deferred split
+	 * queue. And, if page reclaim is already handling the same page, it
+	 * is unnecessary to handle it again in the shrinker.
+	 *
+	 * Check PageSwapCache to determine if the page is being handled by
+	 * page reclaim, since THP swap adds the page into the swap cache
+	 * before reaching try_to_unmap().
+	 */
+	if (PageSwapCache(page))
+		return;
+
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..40c684a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * Is there need to periodically free_page_list? It would
 		 * appear not as the counts should be low
 		 */
-		if (unlikely(PageTransHuge(page))) {
-			mem_cgroup_uncharge(page);
+		if (unlikely(PageTransHuge(page)))
 			(*get_compound_page_dtor(page))(page);
-		} else
+		else
 			list_add(&page->lru, &free_pages);
 		continue;

@@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,

 		if (unlikely(PageCompound(page))) {
 			spin_unlock_irq(&pgdat->lru_lock);
-			mem_cgroup_uncharge(page);
 			(*get_compound_page_dtor(page))(page);
 			spin_lock_irq(&pgdat->lru_lock);
 		} else

[  685.284254][ T3456] [ cut here ]
[  685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
[  685.294808][ T3456] invalid opcode: [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[  685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted: G    W 5.2.0-next-20190711+ #3
[  685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019
[  685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
[  685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f> 0b 48 c7 c7 60 a0 e1
Re: list corruption in deferred_split_scan()
On 7/24/19 2:13 PM, Qian Cai wrote:

On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:

Running LTP oom01 test case with swap triggers a crash below. Reverting the
series "Make deferred split shrinker memcg aware" [1] seems to fix the
issue.

You might want to look harder at this commit, as reverting it alone on top
of 5.2.0-next-20190711 fixed the issue.

aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]

[1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@linux.alibaba.com/

This is the real meat of the patch series; it is the part that actually
converts to the memcg deferred split queue.

list_del corruption. prev->next should be ea0022b10098, but was

I could finally reproduce the list corruption issue on my machine with THP
swap (the swap device is a fast device). I should have checked this with
you in the first place. The problem can't be reproduced with a rotating
swap device, so I suppose you were using THP swap too.

Actually, I found two issues with THP swap:

1. free_transhuge_page() is called in the reclaim path instead of
   put_page(). mem_cgroup_uncharge() is called before free_transhuge_page()
   in the reclaim path, which leaves page->mem_cgroup NULL, so the wrong
   deferred_split_queue is used and the THP is never deleted from the
   memcg's list. The page might then be split or reused later, and
   page->mapping would be overwritten.

2. There is a race condition caused by try_to_unmap() with THP swap.
   try_to_unmap() just calls page_remove_rmap(), which adds the THP to the
   deferred split queue in the reclaim path. This might cause the race
   below, which corrupts the list:

       A                              B
       deferred_split_scan
         list_move
                                      try_to_unmap
                                        list_add_tail
         list_splice  <-- the list might get corrupted here
                                      free_transhuge_page
                                        list_del  <-- kernel bug triggered

I hope the below patch solves your problem (tested locally).

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..d6612ec 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2830,6 +2830,19 @@ void deferred_split_huge_page(struct page *page)

 	VM_BUG_ON_PAGE(!PageTransHuge(page), page);

+	/*
+	 * The try_to_unmap() in the page reclaim path might reach here too;
+	 * this may cause a race condition to corrupt the deferred split
+	 * queue. And, if page reclaim is already handling the same page, it
+	 * is unnecessary to handle it again in the shrinker.
+	 *
+	 * Check PageSwapCache to determine if the page is being handled by
+	 * page reclaim, since THP swap adds the page into the swap cache
+	 * before reaching try_to_unmap().
+	 */
+	if (PageSwapCache(page))
+		return;
+
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..40c684a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1485,10 +1485,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * Is there need to periodically free_page_list? It would
 		 * appear not as the counts should be low
 		 */
-		if (unlikely(PageTransHuge(page))) {
-			mem_cgroup_uncharge(page);
+		if (unlikely(PageTransHuge(page)))
 			(*get_compound_page_dtor(page))(page);
-		} else
+		else
 			list_add(&page->lru, &free_pages);
 		continue;

@@ -1909,7 +1908,6 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,

 		if (unlikely(PageCompound(page))) {
 			spin_unlock_irq(&pgdat->lru_lock);
-			mem_cgroup_uncharge(page);
 			(*get_compound_page_dtor(page))(page);
 			spin_lock_irq(&pgdat->lru_lock);
 		} else

[  685.284254][ T3456] [ cut here ]
[  685.289616][ T3456] kernel BUG at lib/list_debug.c:53!
[  685.294808][ T3456] invalid opcode: [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[  685.301998][ T3456] CPU: 5 PID: 3456 Comm: oom01 Tainted: G    W 5.2.0-next-20190711+ #3
[  685.311193][ T3456] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019
[  685.320485][ T3456] RIP: 0010:__list_del_entry_valid+0x8b/0xb6
[  685.326364][ T3456] Code: f1 e0 ff 49 8b 55 08 4c 39 e2 75 2c 5b b8 01 00 00 00 41 5c 41 5d 5d c3 4c 89 e2 48 89 de 48 c7 c7 c0 5a 73 a3 e8 d9 fa bc ff <0f> 0b 48 c7 c7 60 a0 e1 a3 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b
[  685.345956][ T3456] RSP: 0018:888e0c8a73c0 EFLAGS: 00010082
[  685.351920][ T3456] RAX: 0
Re: list corruption in deferred_split_scan()
On Wed, 2019-07-10 at 17:43 -0400, Qian Cai wrote:
> Running LTP oom01 test case with swap triggers a crash below. Reverting
> the series "Make deferred split shrinker memcg aware" [1] seems to fix
> the issue.

You might want to look harder at this commit, as reverting it alone on top
of 5.2.0-next-20190711 fixed the issue.

aefde94195ca mm: thp: make deferred split shrinker memcg aware [1]

[1] https://lore.kernel.org/linux-mm/1561507361-59349-5-git-send-email-yang.shi@linux.alibaba.com/

Here is all the console output while running LTP oom01 before the crash; it
might be useful.

[  656.302886][ T3384] WARNING: CPU: 79 PID: 3384 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1a8a/0x1bc0
[  656.304395][ T3409] kmemleak: Cannot allocate a kmemleak_object structure
[  656.312714][ T3384] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_amd kvm ses enclosure dax_pmem irqbypass dax_pmem_core efivars ip_tables x_tables xfs sd_mod smartpqi scsi_transport_sas mlx5_core tg3 libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs
[  656.320916][ T3409] kmemleak: Kernel memory leak detector disabled
[  656.344509][ T3384] CPU: 79 PID: 3384 Comm: oom01 Not tainted 5.2.0-next-20190711+ #3
[  656.344523][ T3384] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019
[  656.352100][ T829] kmemleak: Automatic memory scanning thread ended
[  656.358648][ T3384] RIP: 0010:__alloc_pages_nodemask+0x1a8a/0x1bc0
[  656.358658][ T3384] Code: 00 85 d2 0f 85 a1 00 00 00 48 c7 c7 e0 29 c3 a3 e8 3b 98 62 00 65 48 8b 1c 25 80 ee 01 00 e9 85 fa ff ff 0f 0b e9 3e fb ff ff <0f> 0b 48 8b b5 00 ff ff ff 8b 8d 84 fe ff ff 48 c7 c2 00 1d 6c a3
[  656.358675][ T3384] RSP: :888efa4a6210 EFLAGS: 00010046
[  656.406140][ T3384] RAX: RBX: RCX: a2b28be2
[  656.414033][ T3384] RDX: RSI: dc00 RDI: a4d15d60
[  656.421926][ T3384] RBP: 888efa4a6420 R08: fbfff49a2bad R09: fbfff49a2bac
[  656.429818][ T3384] R10: fbfff49a2bac R11: 0003 R12: a4d15d60
[  656.437711][ T3384] R13: R14: 0800 R15:
[  656.445605][ T3384] FS: 7ff44adfc700() GS:889032f8() knlGS:
[  656.454459][ T3384] CS: 0010 DS: ES: CR0: 80050033
[  656.460952][ T3384] CR2: 7ff2f05e1000 CR3: 001012e44000 CR4: 001406a0
[  656.468843][ T3384] Call Trace:
[  656.472026][ T3384] ? gfp_pfmemalloc_allowed+0xc0/0xc0
[  656.477303][ T3384] ? stack_depot_save+0x215/0x58b
[  656.482228][ T3384] ? lock_downgrade+0x390/0x390
[  656.486976][ T3384] ? stack_depot_save+0x183/0x58b
[  656.491900][ T3384] ? kasan_check_read+0x11/0x20
[  656.496647][ T3384] ? do_raw_spin_unlock+0xa8/0x140
[  656.501658][ T3384] ? stack_depot_save+0x215/0x58b
[  656.506582][ T3384] alloc_pages_current+0x9c/0x110
[  656.511505][ T3384] allocate_slab+0x351/0x11f0
[  656.516077][ T3384] ? kasan_slab_alloc+0x11/0x20
[  656.520824][ T3384] new_slab+0x46/0x70
[  656.524702][ T3384] ? pageout.isra.4+0x3e5/0xa00
[  656.529449][ T3384] ___slab_alloc+0x5d4/0x9c0
[  656.533933][ T3384] ? try_to_free_pages+0x242/0x4d0
[  656.538941][ T3384] ? __alloc_pages_nodemask+0x9ce/0x1bc0
[  656.544476][ T3384] ? alloc_pages_vma+0x89/0x2c0
[  656.549226][ T3384] ? __do_page_fault+0x25b/0x5d0
[  656.554064][ T3384] ? create_object+0x3a/0x3e0
[  656.558637][ T3384] ? init_object+0x7e/0x90
[  656.562947][ T3384] ? create_object+0x3a/0x3e0
[  656.567520][ T3384] __slab_alloc+0x12/0x20
[  656.571742][ T3384] ? __slab_alloc+0x12/0x20
[  656.576142][ T3384] kmem_cache_alloc+0x32a/0x400
[  656.580890][ T3384] create_object+0x3a/0x3e0
[  656.585291][ T3384] ? stack_depot_save+0x183/0x58b
[  656.590215][ T3384] kmemleak_alloc+0x71/0xa0
[  656.594611][ T3384] kmem_cache_alloc+0x272/0x400
[  656.599361][ T3384] ? ___might_sleep+0xab/0xc0
[  656.603934][ T3384] ? mempool_free+0x170/0x170
[  656.608507][ T3384] mempool_alloc_slab+0x2d/0x40
[  656.613254][ T3384] mempool_alloc+0x10a/0x29e
[  656.617739][ T3384] ? alloc_pages_vma+0x89/0x2c0
[  656.622485][ T3384] ? mempool_resize+0x390/0x390
[  656.627233][ T3384] ? __read_once_size_nocheck.constprop.2+0x10/0x10
[  656.633730][ T3384] bio_alloc_bioset+0x150/0x330
[  656.638477][ T3384] ? bvec_alloc+0x1b0/0x1b0
[  656.642892][ T3384] alloc_io+0x2f/0x230 [dm_mod]
[  656.647654][ T3384] __split_and_process_bio+0x99/0x630 [dm_mod]
[  656.653714][ T3384] ? blk_rq_map_sg+0x9f0/0x9f0
[  656.658388][ T3384] ? __send_empty_flush.constprop.11+0x1f0/0x1f0 [dm_mod]
[  656.665407][ T3384] ? check_chain_key+0x1df/0x2e0
[  656.670244][ T3384] ? kasan_check_read+0x11/0x20
[  656.674992][ T3384] ? blk_queue_split+0x60/0x90
[  656.679654][ T3384] ? __blk_queue_split+0x970/0x970
[  656.684679][ T3384] dm_process_bio+0x33f/0x520 [dm_mod]
[  656.690054][ T3384] ? __process_bio+0x230/0x230 [dm_mod]
[  65
Re: list corruption in deferred_split_scan()
On Thu, 2019-07-18 at 17:59 -0700, Yang Shi wrote:
> On 7/18/19 5:54 PM, Qian Cai wrote:
>>> On Jul 12, 2019, at 3:12 PM, Yang Shi wrote:
>>>
>>> On 7/11/19 2:07 PM, Qian Cai wrote:
>>>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>>>> Hi Qian,
>>>>>
>>>>> Thanks for reporting the issue. But, I can't reproduce it on my
>>>>> machine. Could you please share more details about your test? How
>>>>> often did you run into this problem?
>>>>
>>>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10
>>>> server. Here is some more information.
>>>>
>>>> # cat .config
>>>>
>>>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>>>
>>> I tried your kernel config, but I still can't reproduce it. My compiler
>>> doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my
>>> test, but I don't think this would make any difference for this case.
>>>
>>> According to the bug call trace in the earlier email, it looks like
>>> deferred_split_scan() lost a race with put_compound_page(). The
>>> put_compound_page() would call free_transhuge_page(), which deletes the
>>> page from the deferred split queue, but the page may still appear on
>>> the deferred list for some reason.
>>>
>>> Would you please try the below patch?
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index b7f709d..66bd9db 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>>>  	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>>>  		if (!list_empty(page_deferred_list(head))) {
>>>  			ds_queue->split_queue_len--;
>>> -			list_del(page_deferred_list(head));
>>> +			list_del_init(page_deferred_list(head));
>>>  		}
>>>  		if (mapping)
>>>  			__dec_node_page_state(page, NR_SHMEM_THPS);
>>> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>>>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>  	if (!list_empty(page_deferred_list(page))) {
>>>  		ds_queue->split_queue_len--;
>>> -		list_del(page_deferred_list(page));
>>> +		list_del_init(page_deferred_list(page));
>>>  	}
>>>  	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>>  	free_compound_page(page);
>>
>> Unfortunately, I am no longer able to reproduce the original list
>> corruption with today's linux-next.
>
> That is because the patches have been dropped from the -mm tree by Andrew
> due to this problem, I guess. You have to use next-20190711, or apply the
> patches on today's linux-next.

The patch you have here does not help. I applied only the
free_transhuge_page() part, as you requested.

[  375.006307][ T3580] list_del corruption. next->prev should be ea0030e10098, but was 888ea8d0cdb8
[  375.015928][ T3580] [ cut here ]
[  375.021296][ T3580] kernel BUG at lib/list_debug.c:56!
[  375.026491][ T3580] invalid opcode: [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[  375.033680][ T3580] CPU: 84 PID: 3580 Comm: oom01 Tainted: G    W 5.2.0-next-20190711+ #2
[  375.042964][ T3580] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/24/2019
[  375.052256][ T3580] RIP: 0010:__list_del_entry_valid+0xa8/0xb6
[  375.058135][ T3580] Code: de 48 c7 c7 c0 5a b3 b0 e8 b9 fa bc ff 0f 0b 48 c7 c7 60 a0 21 b1 e8 13 52 01 00 4c 89 e6 48 c7 c7 20 5b b3 b0 e8 9c fa bc ff <0f> 0b 48 c7 c7 20 a0 21 b1 e8 f6 51 01 00 4c 89 ea 48 89 de 48 c7
[  375.077722][ T3580] RSP: 0018:888ebc4b73c0 EFLAGS: 00010082
[  375.083684][ T3580] RAX: 0054 RBX: ea0030e10098 RCX: b015d728
[  375.091566][ T3580] RDX: RSI: 0008 RDI: 88903263d380
[  375.099448][ T3580] RBP: 888ebc4b73d8 R08: ed12064c7a71 R09: ed12064c7a70
[  375.107330][ T3580] R10: ed12064c7a70 R11: 88903263d387 R12: ea0030e10098
[  375.115212][ T3580] R13: ea0031d40098 R14: ea0030e10034 R15: ea0031d40098
[  375.123095][ T3580] FS: 7fc3dc851700() GS:88903260() knlGS:
[  375.131937][ T3580] CS: 0010 DS: ES: CR0: 80050033
[  375.138421][ T3580] CR2: 7fc25fa39000 CR3: 000884762000 CR4: 001406a0
[  375.146301][ T3580] Call Trace:
[  375.149472][ T3580] deferred_split_scan+0x337/0x740
[  375.154475][ T3580] ? split_huge_page_to_list+0xe30/0xe30
[  375.160002][ T3580] ? __sched_text_start+0x8/0x8
[  375.164743][ T3580] ? __radix_tree_lookup+0x12d/0x1e0
[  375.169923][ T3580] do_shrink_slab+0x244/0x5a0
[  375.174490][ T3580] shrink_slab+0x253/0x440
[  375.178794][ T3
Re: list corruption in deferred_split_scan()
On 7/18/19 5:54 PM, Qian Cai wrote:

On Jul 12, 2019, at 3:12 PM, Yang Shi wrote:

On 7/11/19 2:07 PM, Qian Cai wrote:

On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:

Hi Qian,

Thanks for reporting the issue. But, I can't reproduce it on my machine.
Could you please share more details about your test? How often did you run
into this problem?

I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server.
Here is some more information.

# cat .config

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

I tried your kernel config, but I still can't reproduce it. My compiler
doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test,
but I don't think this would make any difference for this case.

According to the bug call trace in the earlier email, it looks like
deferred_split_scan() lost a race with put_compound_page(). The
put_compound_page() would call free_transhuge_page(), which deletes the
page from the deferred split queue, but the page may still appear on the
deferred list for some reason.

Would you please try the below patch?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..66bd9db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));
 		}
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(page_deferred_list(page))) {
 		ds_queue->split_queue_len--;
-		list_del(page_deferred_list(page));
+		list_del_init(page_deferred_list(page));
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);

Unfortunately, I am no longer able to reproduce the original list
corruption with today's linux-next.

That is because the patches have been dropped from the -mm tree by Andrew
due to this problem, I guess. You have to use next-20190711, or apply the
patches on today's linux-next.
Re: list corruption in deferred_split_scan()
> On Jul 12, 2019, at 3:12 PM, Yang Shi wrote:
>
> On 7/11/19 2:07 PM, Qian Cai wrote:
>> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>>> Hi Qian,
>>>
>>> Thanks for reporting the issue. But, I can't reproduce it on my
>>> machine. Could you please share more details about your test? How often
>>> did you run into this problem?
>>
>> I can almost reproduce it every time on a HPE ProLiant DL385 Gen10
>> server. Here is some more information.
>>
>> # cat .config
>>
>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config
>
> I tried your kernel config, but I still can't reproduce it. My compiler
> doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my
> test, but I don't think this would make any difference for this case.
>
> According to the bug call trace in the earlier email, it looks like
> deferred_split_scan() lost a race with put_compound_page(). The
> put_compound_page() would call free_transhuge_page(), which deletes the
> page from the deferred split queue, but the page may still appear on the
> deferred list for some reason.
>
> Would you please try the below patch?
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b7f709d..66bd9db 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
>  		if (!list_empty(page_deferred_list(head))) {
>  			ds_queue->split_queue_len--;
> -			list_del(page_deferred_list(head));
> +			list_del_init(page_deferred_list(head));
>  		}
>  		if (mapping)
>  			__dec_node_page_state(page, NR_SHMEM_THPS);
> @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  	if (!list_empty(page_deferred_list(page))) {
>  		ds_queue->split_queue_len--;
> -		list_del(page_deferred_list(page));
> +		list_del_init(page_deferred_list(page));
>  	}
>  	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);

Unfortunately, I am no longer able to reproduce the original list
corruption with today's linux-next.
Re: list corruption in deferred_split_scan()
On 7/17/19 10:02 AM, Shakeel Butt wrote:

On Tue, Jul 16, 2019 at 5:12 PM Yang Shi wrote:

On 7/16/19 4:36 PM, Shakeel Butt wrote:

Adding related people.

The thread starts at:
http://lkml.kernel.org/r/1562795006.8510.19.ca...@lca.pw

On Mon, Jul 15, 2019 at 8:01 PM Yang Shi wrote:

On 7/15/19 6:36 PM, Qian Cai wrote:

On Jul 15, 2019, at 8:22 PM, Yang Shi wrote:

On 7/15/19 2:23 PM, Qian Cai wrote:

On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:

Another possible lead is that, without reverting those commits below, the
kdump kernel would always also crash in shrink_slab_memcg() at this line:

map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
think of where nodeinfo would be freed while the memcg is still online.
Maybe a check is needed:

Actually, "memcg" is NULL.

That sounds weird. shrink_slab() is called in mem_cgroup_iter(), which does
pin the memcg, so the memcg should not go away.

Well, the commit "mm: shrinker: make shrinker not depend on memcg kmem"
changed this line in shrink_slab_memcg():

-	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
+	if (!mem_cgroup_online(memcg))
 		return 0;

Since the kdump kernel has the parameter "cgroup_disable=memory",
shrink_slab_memcg() will no longer be able to handle a NULL memcg from
mem_cgroup_iter(), as:

	if (mem_cgroup_disabled())
		return NULL;

Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled().
Thanks for figuring this out. I think we need to add a
mem_cgroup_disabled() check before calling shrink_slab_memcg(), as below:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..2f03c61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 	unsigned long ret, freed = 0;
 	struct shrinker *shrinker;

-	if (!mem_cgroup_is_root(memcg))
+	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
 		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);

 	if (!down_read_trylock(&shrinker_rwsem))

We were seeing unneeded oom-kills on kernels with "cgroup_disable=memory",
and Yang's patch series basically exposed the bug as a crash. I think the
commit aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in
shrink_node()") missed the case of "cgroup_disable=memory". However, I am
surprised that root_mem_cgroup is allocated even for
"cgroup_disable=memory"; it seems css_alloc() is called even before
checking whether the corresponding controller is disabled.

I'm surprised too. A quick test with drgn shows the root memcg is
definitely allocated:

>>> prog['root_mem_cgroup']
*(struct mem_cgroup *)0x8902cf058000 = {
[snip]

But isn't this a bug?

It can be treated as a bug, as this is not expected, but we can discuss and
take care of it later. I think we need your patch urgently, as memory
reclaim and /proc/sys/vm/drop_caches are broken for "cgroup_disable=memory"
kernels. So please send your patch asap.

Sure. I'm going to post the patch soon.

thanks,
Shakeel
Re: list corruption in deferred_split_scan()
On Tue, Jul 16, 2019 at 5:12 PM Yang Shi wrote:
>
> On 7/16/19 4:36 PM, Shakeel Butt wrote:
> > Adding related people.
> >
> > The thread starts at:
> > http://lkml.kernel.org/r/1562795006.8510.19.ca...@lca.pw
> >
> > On Mon, Jul 15, 2019 at 8:01 PM Yang Shi wrote:
> >>
> >> On 7/15/19 6:36 PM, Qian Cai wrote:
> >>>> On Jul 15, 2019, at 8:22 PM, Yang Shi wrote:
> >>>>
> >>>> On 7/15/19 2:23 PM, Qian Cai wrote:
> >>>>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
> >>>>>> Another possible lead is that, without reverting those commits
> >>>>>> below, the kdump kernel would always also crash in
> >>>>>> shrink_slab_memcg() at this line:
> >>>>>>
> >>>>>> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
> >>>>>
> >>>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I
> >>>>> can't think of where nodeinfo would be freed while the memcg is
> >>>>> still online. Maybe a check is needed:
> >>>>
> >>>> Actually, "memcg" is NULL.
> >>>
> >>> That sounds weird. shrink_slab() is called in mem_cgroup_iter(),
> >>> which does pin the memcg, so the memcg should not go away.
> >>
> >> Well, the commit "mm: shrinker: make shrinker not depend on memcg
> >> kmem" changed this line in shrink_slab_memcg(),
> >>
> >> -	if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
> >> +	if (!mem_cgroup_online(memcg))
> >>  		return 0;
> >>
> >> Since the kdump kernel has the parameter "cgroup_disable=memory",
> >> shrink_slab_memcg() will no longer be able to handle a NULL memcg
> >> from mem_cgroup_iter(), as:
> >>
> >> 	if (mem_cgroup_disabled())
> >> 		return NULL;
> >>
> >> Aha, yes. memcg_kmem_enabled() implicitly checks
> >> !mem_cgroup_disabled(). Thanks for figuring this out. I think we need
> >> to add a mem_cgroup_disabled() check before calling
> >> shrink_slab_memcg(), as below:
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index a0301ed..2f03c61 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> >>  	unsigned long ret, freed = 0;
> >>  	struct shrinker *shrinker;
> >>
> >> -	if (!mem_cgroup_is_root(memcg))
> >> +	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
> >>  		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
> >>
> >>  	if (!down_read_trylock(&shrinker_rwsem))
> >
> > We were seeing unneeded oom-kills on kernels with
> > "cgroup_disable=memory", and Yang's patch series basically exposed the
> > bug as a crash. I think the commit aeed1d325d42 ("mm/vmscan.c:
> > generalize shrink_slab() calls in shrink_node()") missed the case of
> > "cgroup_disable=memory". However, I am surprised that root_mem_cgroup
> > is allocated even for "cgroup_disable=memory"; it seems css_alloc() is
> > called even before checking whether the corresponding controller is
> > disabled.
>
> I'm surprised too. A quick test with drgn shows the root memcg is
> definitely allocated:
>
> >>> prog['root_mem_cgroup']
> *(struct mem_cgroup *)0x8902cf058000 = {
> [snip]
>
> But, isn't this a bug?

It can be treated as a bug, as this is not expected, but we can discuss and
take care of it later. I think we need your patch urgently, as memory
reclaim and /proc/sys/vm/drop_caches are broken for "cgroup_disable=memory"
kernels. So please send your patch asap.

thanks,
Shakeel
Re: list corruption in deferred_split_scan()
On 7/16/19 4:36 PM, Shakeel Butt wrote: Adding related people. The thread starts at: http://lkml.kernel.org/r/1562795006.8510.19.ca...@lca.pw On Mon, Jul 15, 2019 at 8:01 PM Yang Shi wrote: On 7/15/19 6:36 PM, Qian Cai wrote: On Jul 15, 2019, at 8:22 PM, Yang Shi wrote: On 7/15/19 2:23 PM, Qian Cai wrote: On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: Another possible lead is that without reverting the those commits below, kdump kernel would always also crash in shrink_slab_memcg() at this line, map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't think of where nodeinfo was freed but memcg was still online. Maybe a check is needed: Actually, "memcg" is NULL. It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) + if (!mem_cgroup_online(memcg)) return 0; Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, if (mem_cgroup_disabled()) return NULL; Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). Thanks for figuring this out. 
I think we need to add a mem_cgroup_disabled() check before calling shrink_slab_memcg() as below:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..2f03c61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
 	unsigned long ret, freed = 0;
 	struct shrinker *shrinker;
 
-	if (!mem_cgroup_is_root(memcg))
+	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
 		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
 
 	if (!down_read_trylock(&shrinker_rwsem))

We were seeing unneeded oom-kills on kernels with "cgroup_disabled=memory", and Yang's patch series basically exposed the bug as a crash. I think commit aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()") missed the case for "cgroup_disabled=memory". However, I am surprised that root_mem_cgroup is allocated even for "cgroup_disabled=memory"; it seems css_alloc() is called even before checking whether the corresponding controller is disabled.

I'm surprised too. A quick test with drgn shows the root memcg is definitely allocated:

>>> prog['root_mem_cgroup']
*(struct mem_cgroup *)0x8902cf058000 = {
[snip]

But, isn't this a bug?

Thanks,
Yang

Yang, can you please send the above change with signed-off and CC to stable as well?

thanks,
Shakeel
Re: list corruption in deferred_split_scan()
Adding related people. The thread starts at: http://lkml.kernel.org/r/1562795006.8510.19.ca...@lca.pw On Mon, Jul 15, 2019 at 8:01 PM Yang Shi wrote: > > > > On 7/15/19 6:36 PM, Qian Cai wrote: > > > >> On Jul 15, 2019, at 8:22 PM, Yang Shi wrote: > >> > >> > >> > >> On 7/15/19 2:23 PM, Qian Cai wrote: > >>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: > > Another possible lead is that without reverting the those commits below, > > kdump > > kernel would always also crash in shrink_slab_memcg() at this line, > > > > map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, > > true); > This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't > think of where nodeinfo was freed but memcg was still online. Maybe a > check is needed: > >>> Actually, "memcg" is NULL. > >> It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin > >> the memcg. So, the memcg should not go away. > > Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” > > changed this line in shrink_slab_memcg(), > > > > - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) > > + if (!mem_cgroup_online(memcg)) > > return 0; > > > > Since the kdump kernel has the parameter “cgroup_disable=memory”, > > shrink_slab_memcg() will no longer be able to handle NULL memcg from > > mem_cgroup_iter() as, > > > > if (mem_cgroup_disabled()) > > return NULL; > > Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). > Thanks for figuring this out. 
> I think we need to add a mem_cgroup_disabled() check before calling
> shrink_slab_memcg() as below:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a0301ed..2f03c61 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  	unsigned long ret, freed = 0;
>  	struct shrinker *shrinker;
>
> -	if (!mem_cgroup_is_root(memcg))
> +	if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
>  		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>
>  	if (!down_read_trylock(&shrinker_rwsem))

We were seeing unneeded oom-kills on kernels with "cgroup_disabled=memory", and Yang's patch series basically exposed the bug as a crash. I think commit aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()") missed the case for "cgroup_disabled=memory". However, I am surprised that root_mem_cgroup is allocated even for "cgroup_disabled=memory"; it seems css_alloc() is called even before checking whether the corresponding controller is disabled.

Yang, can you please send the above change with signed-off and CC to stable as well?

thanks,
Shakeel
Re: list corruption in deferred_split_scan()
On 7/15/19 6:36 PM, Qian Cai wrote: On Jul 15, 2019, at 8:22 PM, Yang Shi wrote: On 7/15/19 2:23 PM, Qian Cai wrote: On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: Another possible lead is that without reverting the those commits below, kdump kernel would always also crash in shrink_slab_memcg() at this line, map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't think of where nodeinfo was freed but memcg was still online. Maybe a check is needed: Actually, "memcg" is NULL. It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) + if (!mem_cgroup_online(memcg)) return 0; Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, if (mem_cgroup_disabled()) return NULL; Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). Thanks for figuring this out. 
I think we need add mem_cgroup_dsiabled() check before calling shrink_slab_memcg() as below: diff --git a/mm/vmscan.c b/mm/vmscan.c index a0301ed..2f03c61 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, unsigned long ret, freed = 0; struct shrinker *shrinker; - if (!mem_cgroup_is_root(memcg)) + if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg)) return shrink_slab_memcg(gfp_mask, nid, memcg, priority); if (!down_read_trylock(&shrinker_rwsem)) diff --git a/mm/vmscan.c b/mm/vmscan.c index a0301ed..bacda49 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, if (!mem_cgroup_online(memcg)) return 0; + if (!memcg->nodeinfo[nid]) + return 0; + if (!down_read_trylock(&shrinker_rwsem)) return 0; [9.072036][T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 [9.072036][T1] Read of size 8 at addr 0dc8 by task swapper/0/1 [9.072036][T1] [9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- 20190711+ #10 [9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [9.072036][T1] Call Trace: [9.072036][T1] dump_stack+0x62/0x9a [9.072036][T1] __kasan_report.cold.4+0xb0/0xb4 [9.072036][T1] ? unwind_get_return_address+0x40/0x50 [9.072036][T1] ? shrink_slab+0x111/0x440 [9.072036][T1] kasan_report+0xc/0xe [9.072036][T1] __asan_load8+0x71/0xa0 [9.072036][T1] shrink_slab+0x111/0x440 [9.072036][T1] ? mem_cgroup_iter+0x98/0x840 [9.072036][T1] ? unregister_shrinker+0x110/0x110 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? mem_cgroup_protected+0x39/0x260 [9.072036][T1] shrink_node+0x31e/0xa30 [9.072036][T1] ? shrink_node_memcg+0x1560/0x1560 [9.072036][T1] ? ktime_get+0x93/0x110 [9.072036][T1] do_try_to_free_pages+0x22f/0x820 [9.072036][T1] ? shrink_node+0xa30/0xa30 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? 
check_chain_key+0x1df/0x2e0 [9.072036][T1] try_to_free_pages+0x242/0x4d0 [9.072036][T1] ? do_try_to_free_pages+0x820/0x820 [9.072036][T1] __alloc_pages_nodemask+0x9ce/0x1bc0 [9.072036][T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [9.072036][T1] ? unwind_dump+0x260/0x260 [9.072036][T1] ? kernel_text_address+0x33/0xc0 [9.072036][T1] ? arch_stack_walk+0x8f/0xf0 [9.072036][T1] ? ret_from_fork+0x22/0x40 [9.072036][T1] alloc_page_interleave+0x18/0x130 [9.072036][T1] alloc_pages_current+0xf6/0x110 [9.072036][T1] allocate_slab+0x600/0x11f0 [9.072036][T1] new_slab+0x46/0x70 [9.072036][T1] ___slab_alloc+0x5d4/0x9c0 [9.072036][T1] ? create_object+0x3a/0x3e0 [9.072036][T1] ? fs_reclaim_acquire.part.15+0x5/0x30 [9.072036][T1] ? ___might_sleep+0xab/0xc0 [9.072036][T1] ? create_object+0x3a/0x3e0 [9.072036][T1] __slab_alloc+0x12/0x20 [9.072036][T1] ? __slab_alloc+0x12/0x20 [9.072036][T1] kmem_cache_alloc+0x32a/0x400 [9.072036][T1] create_object+0x3a/0x3e0 [9.072036][T1] kmemleak_alloc+0x71/0xa0 [9.072036][T1] kmem_cache_alloc+0x272/0x400 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? do_raw_spin_unlock+0xa8/0x140 [9.072036][T1] acpi_ps_alloc_op+0
Re: list corruption in deferred_split_scan()
> On Jul 15, 2019, at 8:22 PM, Yang Shi wrote: > > > > On 7/15/19 2:23 PM, Qian Cai wrote: >> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: Another possible lead is that without reverting the those commits below, kdump kernel would always also crash in shrink_slab_memcg() at this line, map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); >>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't >>> think of where nodeinfo was freed but memcg was still online. Maybe a >>> check is needed: >> Actually, "memcg" is NULL. > > It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin > the memcg. So, the memcg should not go away. Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed this line in shrink_slab_memcg(), - if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg)) + if (!mem_cgroup_online(memcg)) return 0; Since the kdump kernel has the parameter “cgroup_disable=memory”, shrink_slab_memcg() will no longer be able to handle NULL memcg from mem_cgroup_iter() as, if (mem_cgroup_disabled()) return NULL; > >> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c >>> index a0301ed..bacda49 100644 >>> --- a/mm/vmscan.c >>> +++ b/mm/vmscan.c >>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t >>> gfp_mask, int nid, >>> if (!mem_cgroup_online(memcg)) >>> return 0; >>> >>> + if (!memcg->nodeinfo[nid]) >>> + return 0; >>> + >>> if (!down_read_trylock(&shrinker_rwsem)) >>> return 0; >>> [9.072036][T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 [9.072036][T1] Read of size 8 at addr 0dc8 by task swapper/0/1 [9.072036][T1] [9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- 20190711+ #10 [9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [9.072036][T1] Call Trace: [9.072036][T1] dump_stack+0x62/0x9a [9.072036][T1] __kasan_report.cold.4+0xb0/0xb4 [9.072036][T1] ? 
unwind_get_return_address+0x40/0x50 [9.072036][T1] ? shrink_slab+0x111/0x440 [9.072036][T1] kasan_report+0xc/0xe [9.072036][T1] __asan_load8+0x71/0xa0 [9.072036][T1] shrink_slab+0x111/0x440 [9.072036][T1] ? mem_cgroup_iter+0x98/0x840 [9.072036][T1] ? unregister_shrinker+0x110/0x110 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? mem_cgroup_protected+0x39/0x260 [9.072036][T1] shrink_node+0x31e/0xa30 [9.072036][T1] ? shrink_node_memcg+0x1560/0x1560 [9.072036][T1] ? ktime_get+0x93/0x110 [9.072036][T1] do_try_to_free_pages+0x22f/0x820 [9.072036][T1] ? shrink_node+0xa30/0xa30 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? check_chain_key+0x1df/0x2e0 [9.072036][T1] try_to_free_pages+0x242/0x4d0 [9.072036][T1] ? do_try_to_free_pages+0x820/0x820 [9.072036][T1] __alloc_pages_nodemask+0x9ce/0x1bc0 [9.072036][T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [9.072036][T1] ? unwind_dump+0x260/0x260 [9.072036][T1] ? kernel_text_address+0x33/0xc0 [9.072036][T1] ? arch_stack_walk+0x8f/0xf0 [9.072036][T1] ? ret_from_fork+0x22/0x40 [9.072036][T1] alloc_page_interleave+0x18/0x130 [9.072036][T1] alloc_pages_current+0xf6/0x110 [9.072036][T1] allocate_slab+0x600/0x11f0 [9.072036][T1] new_slab+0x46/0x70 [9.072036][T1] ___slab_alloc+0x5d4/0x9c0 [9.072036][T1] ? create_object+0x3a/0x3e0 [9.072036][T1] ? fs_reclaim_acquire.part.15+0x5/0x30 [9.072036][T1] ? ___might_sleep+0xab/0xc0 [9.072036][T1] ? create_object+0x3a/0x3e0 [9.072036][T1] __slab_alloc+0x12/0x20 [9.072036][T1] ? __slab_alloc+0x12/0x20 [9.072036][T1] kmem_cache_alloc+0x32a/0x400 [9.072036][T1] create_object+0x3a/0x3e0 [9.072036][T1] kmemleak_alloc+0x71/0xa0 [9.072036][T1] kmem_cache_alloc+0x272/0x400 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? 
do_raw_spin_unlock+0xa8/0x140 [9.072036][T1] acpi_ps_alloc_op+0x76/0x122 [9.072036][T1] acpi_ds_execute_arguments+0x2f/0x18d [9.072036][T1] acpi_ds_get_package_arguments+0x7d/0x84 [9.072036][T1] acpi_ns_init_one_package+0x33/0x61 [9.072036][T1] acpi_ns_init_one_object+0xfc/0x189 [9.072036][T1] acpi_ns_walk_namespace+0x114/0x1f
Re: list corruption in deferred_split_scan()
On 7/15/19 2:23 PM, Qian Cai wrote: On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: Another possible lead is that without reverting the those commits below, kdump kernel would always also crash in shrink_slab_memcg() at this line, map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't think of where nodeinfo was freed but memcg was still online. Maybe a check is needed: Actually, "memcg" is NULL. It sounds weird. shrink_slab() is called in mem_cgroup_iter which does pin the memcg. So, the memcg should not go away. diff --git a/mm/vmscan.c b/mm/vmscan.c index a0301ed..bacda49 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, if (!mem_cgroup_online(memcg)) return 0; + if (!memcg->nodeinfo[nid]) + return 0; + if (!down_read_trylock(&shrinker_rwsem)) return 0; [9.072036][T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 [9.072036][T1] Read of size 8 at addr 0dc8 by task swapper/0/1 [9.072036][T1] [9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- 20190711+ #10 [9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [9.072036][T1] Call Trace: [9.072036][T1] dump_stack+0x62/0x9a [9.072036][T1] __kasan_report.cold.4+0xb0/0xb4 [9.072036][T1] ? unwind_get_return_address+0x40/0x50 [9.072036][T1] ? shrink_slab+0x111/0x440 [9.072036][T1] kasan_report+0xc/0xe [9.072036][T1] __asan_load8+0x71/0xa0 [9.072036][T1] shrink_slab+0x111/0x440 [9.072036][T1] ? mem_cgroup_iter+0x98/0x840 [9.072036][T1] ? unregister_shrinker+0x110/0x110 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? mem_cgroup_protected+0x39/0x260 [9.072036][T1] shrink_node+0x31e/0xa30 [9.072036][T1] ? shrink_node_memcg+0x1560/0x1560 [9.072036][T1] ? ktime_get+0x93/0x110 [9.072036][T1] do_try_to_free_pages+0x22f/0x820 [9.072036][T1] ? 
shrink_node+0xa30/0xa30 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? check_chain_key+0x1df/0x2e0 [9.072036][T1] try_to_free_pages+0x242/0x4d0 [9.072036][T1] ? do_try_to_free_pages+0x820/0x820 [9.072036][T1] __alloc_pages_nodemask+0x9ce/0x1bc0 [9.072036][T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [9.072036][T1] ? unwind_dump+0x260/0x260 [9.072036][T1] ? kernel_text_address+0x33/0xc0 [9.072036][T1] ? arch_stack_walk+0x8f/0xf0 [9.072036][T1] ? ret_from_fork+0x22/0x40 [9.072036][T1] alloc_page_interleave+0x18/0x130 [9.072036][T1] alloc_pages_current+0xf6/0x110 [9.072036][T1] allocate_slab+0x600/0x11f0 [9.072036][T1] new_slab+0x46/0x70 [9.072036][T1] ___slab_alloc+0x5d4/0x9c0 [9.072036][T1] ? create_object+0x3a/0x3e0 [9.072036][T1] ? fs_reclaim_acquire.part.15+0x5/0x30 [9.072036][T1] ? ___might_sleep+0xab/0xc0 [9.072036][T1] ? create_object+0x3a/0x3e0 [9.072036][T1] __slab_alloc+0x12/0x20 [9.072036][T1] ? __slab_alloc+0x12/0x20 [9.072036][T1] kmem_cache_alloc+0x32a/0x400 [9.072036][T1] create_object+0x3a/0x3e0 [9.072036][T1] kmemleak_alloc+0x71/0xa0 [9.072036][T1] kmem_cache_alloc+0x272/0x400 [9.072036][T1] ? kasan_check_read+0x11/0x20 [9.072036][T1] ? do_raw_spin_unlock+0xa8/0x140 [9.072036][T1] acpi_ps_alloc_op+0x76/0x122 [9.072036][T1] acpi_ds_execute_arguments+0x2f/0x18d [9.072036][T1] acpi_ds_get_package_arguments+0x7d/0x84 [9.072036][T1] acpi_ns_init_one_package+0x33/0x61 [9.072036][T1] acpi_ns_init_one_object+0xfc/0x189 [9.072036][T1] acpi_ns_walk_namespace+0x114/0x1f2 [9.072036][T1] ? acpi_ns_init_one_package+0x61/0x61 [9.072036][T1] ? acpi_ns_init_one_package+0x61/0x61 [9.072036][T1] acpi_walk_namespace+0x9e/0xcb [9.072036][T1] ? acpi_sleep_proc_init+0x36/0x36 [9.072036][T1] acpi_ns_initialize_objects+0x99/0xed [9.072036][T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 [9.072036][T1] ? acpi_tb_load_namespace+0x2dc/0x2eb [9.072036][T1] acpi_load_tables+0x61/0x80 [9.072036][T1] acpi_init+0x10d/0x44b [9.072036][T1] ? 
acpi_sleep_proc_init+0x36/0x36 [9.072036][T1] ? bus_uevent_filter+0x16/0x30 [9.072036][T1] ? kobject_uevent_env+0x109/0x980 [9.072036][T1] ? kernfs_get+0x13/0x20 [9.072036][T1] ? kobject_uevent+0xb/0x10 [9.072036][T1] ? kset_register+0x31/0x50 [9.072036][T1] ? kset_create_and_add+0x9f/0xd0 [9.072036][T1] ? acpi_sleep_proc_init+0x3
Re: list corruption in deferred_split_scan()
On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote: > > Another possible lead is that without reverting the those commits below, > > kdump > > kernel would always also crash in shrink_slab_memcg() at this line, > > > > map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); > > This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't > think of where nodeinfo was freed but memcg was still online. Maybe a > check is needed: Actually, "memcg" is NULL. > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index a0301ed..bacda49 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t > gfp_mask, int nid, > if (!mem_cgroup_online(memcg)) > return 0; > > + if (!memcg->nodeinfo[nid]) > + return 0; > + > if (!down_read_trylock(&shrinker_rwsem)) > return 0; > > > > > [9.072036][T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440 > > [9.072036][T1] Read of size 8 at addr 0dc8 by task > > swapper/0/1 > > [9.072036][T1] > > [9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next- > > 20190711+ #10 > > [9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant > > DL385 > > Gen10, BIOS A40 01/25/2019 > > [9.072036][T1] Call Trace: > > [9.072036][T1] dump_stack+0x62/0x9a > > [9.072036][T1] __kasan_report.cold.4+0xb0/0xb4 > > [9.072036][T1] ? unwind_get_return_address+0x40/0x50 > > [9.072036][T1] ? shrink_slab+0x111/0x440 > > [9.072036][T1] kasan_report+0xc/0xe > > [9.072036][T1] __asan_load8+0x71/0xa0 > > [9.072036][T1] shrink_slab+0x111/0x440 > > [9.072036][T1] ? mem_cgroup_iter+0x98/0x840 > > [9.072036][T1] ? unregister_shrinker+0x110/0x110 > > [9.072036][T1] ? kasan_check_read+0x11/0x20 > > [9.072036][T1] ? mem_cgroup_protected+0x39/0x260 > > [9.072036][T1] shrink_node+0x31e/0xa30 > > [9.072036][T1] ? shrink_node_memcg+0x1560/0x1560 > > [9.072036][T1] ? ktime_get+0x93/0x110 > > [9.072036][T1] do_try_to_free_pages+0x22f/0x820 > > [9.072036][T1] ? 
shrink_node+0xa30/0xa30 > > [9.072036][T1] ? kasan_check_read+0x11/0x20 > > [9.072036][T1] ? check_chain_key+0x1df/0x2e0 > > [9.072036][T1] try_to_free_pages+0x242/0x4d0 > > [9.072036][T1] ? do_try_to_free_pages+0x820/0x820 > > [9.072036][T1] __alloc_pages_nodemask+0x9ce/0x1bc0 > > [9.072036][T1] ? gfp_pfmemalloc_allowed+0xc0/0xc0 > > [9.072036][T1] ? unwind_dump+0x260/0x260 > > [9.072036][T1] ? kernel_text_address+0x33/0xc0 > > [9.072036][T1] ? arch_stack_walk+0x8f/0xf0 > > [9.072036][T1] ? ret_from_fork+0x22/0x40 > > [9.072036][T1] alloc_page_interleave+0x18/0x130 > > [9.072036][T1] alloc_pages_current+0xf6/0x110 > > [9.072036][T1] allocate_slab+0x600/0x11f0 > > [9.072036][T1] new_slab+0x46/0x70 > > [9.072036][T1] ___slab_alloc+0x5d4/0x9c0 > > [9.072036][T1] ? create_object+0x3a/0x3e0 > > [9.072036][T1] ? fs_reclaim_acquire.part.15+0x5/0x30 > > [9.072036][T1] ? ___might_sleep+0xab/0xc0 > > [9.072036][T1] ? create_object+0x3a/0x3e0 > > [9.072036][T1] __slab_alloc+0x12/0x20 > > [9.072036][T1] ? __slab_alloc+0x12/0x20 > > [9.072036][T1] kmem_cache_alloc+0x32a/0x400 > > [9.072036][T1] create_object+0x3a/0x3e0 > > [9.072036][T1] kmemleak_alloc+0x71/0xa0 > > [9.072036][T1] kmem_cache_alloc+0x272/0x400 > > [9.072036][T1] ? kasan_check_read+0x11/0x20 > > [9.072036][T1] ? do_raw_spin_unlock+0xa8/0x140 > > [9.072036][T1] acpi_ps_alloc_op+0x76/0x122 > > [9.072036][T1] acpi_ds_execute_arguments+0x2f/0x18d > > [9.072036][T1] acpi_ds_get_package_arguments+0x7d/0x84 > > [9.072036][T1] acpi_ns_init_one_package+0x33/0x61 > > [9.072036][T1] acpi_ns_init_one_object+0xfc/0x189 > > [9.072036][T1] acpi_ns_walk_namespace+0x114/0x1f2 > > [9.072036][T1] ? acpi_ns_init_one_package+0x61/0x61 > > [9.072036][T1] ? acpi_ns_init_one_package+0x61/0x61 > > [9.072036][T1] acpi_walk_namespace+0x9e/0xcb > > [9.072036][T1] ? acpi_sleep_proc_init+0x36/0x36 > > [9.072036][T1] acpi_ns_initialize_objects+0x99/0xed > > [9.072036][T1] ? acpi_ns_find_ini_methods+0xa2/0xa2 > > [9.072036][T1] ? 
acpi_tb_load_namespace+0x2dc/0x2eb > > [9.072036][T1] acpi_load_tables+0x61/0x80 > > [9.072036][T1] acpi_init+0x10d/0x44b > > [9.072036][T1] ? acpi_sleep_proc_init+0x36/0x36 > > [9.072036][T1] ? bus_uevent_filter+0x16/0x30 > > [9.072036][T1] ? kobject_uevent_env+0x109/0x980 > > [9.072036][T1] ? kernfs_get+0x13/0x20 > > [9.072036][
Re: list corruption in deferred_split_scan()
On 7/13/19 8:53 PM, Hillf Danton wrote: On Wed, 10 Jul 2019 14:43:28 -0700 (PDT) Qian Cai wrote: Running LTP oom01 test case with swap triggers a crash below. Revert the series "Make deferred split shrinker memcg aware" [1] seems fix the issue. aefde94195ca mm: thp: make deferred split shrinker memcg aware cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release() 4e050f2df876 mm: thp: extract split_queue_* into a struct [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang@linux.alibaba.com/ [ 1145.730682][ T5764] list_del corruption, ea00251c8098->next is LIST_POISON1 (dead0100) [ 1145.739763][ T5764] [ cut here ] [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47! [ 1145.750320][ T5764] invalid opcode: [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: GW 5.2.0-next-20190710+ #7 [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019 [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7 [ 1145.802078][ T5764] RSP: 0018:888514d773c0 EFLAGS: 00010082 [ 1145.808042][ T5764] RAX: 004e RBX: ea00251c8098 RCX: ae95d318 [ 1145.815923][ T5764] RDX: RSI: 0008 RDI: 440bd380 [ 1145.823806][ T5764] RBP: 888514d773d8 R08: ed1108817a71 R09: ed1108817a70 [ 1145.831689][ T5764] R10: ed1108817a70 R11: 440bd387 R12: dead0122 [ 1145.839571][ T5764] R13: dead0100 R14: ea00251c8034 R15: dead0100 [ 1145.847455][ T5764] FS: 7f765ad4d700() 
GS:4408() knlGS: [ 1145.856299][ T5764] CS: 0010 DS: ES: CR0: 80050033 [ 1145.862784][ T5764] CR2: 7f8cebec7000 CR3: 000459338000 CR4: 001406a0 [ 1145.870664][ T5764] Call Trace: [ 1145.873835][ T5764] deferred_split_scan+0x337/0x740 [ 1145.878835][ T5764] ? split_huge_page_to_list+0xe30/0xe30 [ 1145.884364][ T5764] ? __radix_tree_lookup+0x12d/0x1e0 [ 1145.889539][ T5764] ? node_tag_get.part.0.constprop.6+0x40/0x40 [ 1145.895592][ T5764] do_shrink_slab+0x244/0x5a0 [ 1145.900159][ T5764] shrink_slab+0x253/0x440 [ 1145.904462][ T5764] ? unregister_shrinker+0x110/0x110 [ 1145.909641][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.914383][ T5764] ? mem_cgroup_protected+0x20f/0x260 [ 1145.919645][ T5764] shrink_node+0x31e/0xa30 [ 1145.923949][ T5764] ? shrink_node_memcg+0x1560/0x1560 [ 1145.929126][ T5764] ? ktime_get+0x93/0x110 [ 1145.933340][ T5764] do_try_to_free_pages+0x22f/0x820 [ 1145.938429][ T5764] ? shrink_node+0xa30/0xa30 [ 1145.942906][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.947647][ T5764] ? check_chain_key+0x1df/0x2e0 [ 1145.952474][ T5764] try_to_free_pages+0x242/0x4d0 [ 1145.957299][ T5764] ? do_try_to_free_pages+0x820/0x820 [ 1145.962566][ T5764] __alloc_pages_nodemask+0x9ce/0x1bc0 [ 1145.967917][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.972657][ T5764] ? gfp_pfmemalloc_allowed+0xc0/0xc0 [ 1145.977920][ T5764] ? kasan_check_read+0x11/0x20 [ 1145.982659][ T5764] ? check_chain_key+0x1df/0x2e0 [ 1145.987487][ T5764] ? do_anonymous_page+0x343/0xe30 [ 1145.992489][ T5764] ? lock_downgrade+0x390/0x390 [ 1145.997230][ T5764] ? __count_memcg_events+0x8b/0x1c0 [ 1146.002404][ T5764] ? kasan_check_read+0x11/0x20 [ 1146.007145][ T5764] ? __lru_cache_add+0x122/0x160 [ 1146.011974][ T5764] alloc_pages_vma+0x89/0x2c0 [ 1146.016538][ T5764] do_anonymous_page+0x3e1/0xe30 [ 1146.021367][ T5764] ? __update_load_avg_cfs_rq+0x2c/0x490 [ 1146.026893][ T5764] ? finish_fault+0x120/0x120 [ 1146.031461][ T5764] ? 
call_function_interrupt+0xa/0x20 [ 1146.036724][ T5764] handle_pte_fault+0x457/0x12c0 [ 1146.041552][ T5764] __handle_mm_fault+0x79a/0xa50 [ 1146.046378][ T5764] ? vmf_insert_mixed_mkwrite+0x20/0x20 [ 1146.051817][ T5764] ? kasan_check_read+0x11/0x20 [ 1146.056557][ T5764] ? __count_memcg_events+0x8b/0x1c0 [ 1146.061732][ T5764] handle_mm_fault+0x17f/0x370 [ 1146.066386][ T5764] __do_page_fault+0x25b/0x5d0 [ 1146.071037][ T5764] do_page_fault+0x4c/0x2cf [ 1146.075426][ T5764] ? page_fault+0x5/0x20 [ 1146.079553][ T5764] page_fault+0x1b/0x20 [ 1146.083594][ T5764] RIP: 0033:0x410be0 [ 1146.087373][ T5764] Code: 89 de e8 e3 23 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 95 29 ff ff 31 d2 48 98 90 44 1
Re: list corruption in deferred_split_scan()
On 7/12/19 12:12 PM, Yang Shi wrote:

On 7/11/19 2:07 PM, Qian Cai wrote:

On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
Hi Qian, thanks for reporting the issue. But I can't reproduce it on my machine. Could you please share more details about your test? How often did you run into this problem?

I can almost reproduce it every time on a HPE ProLiant DL385 Gen10 server. Here is some more information.

# cat .config
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

I tried your kernel config, but I still can't reproduce it. My compiler doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test, but I don't think this would make any difference for this case.

According to the bug call trace in the earlier email, it looks like deferred_split_scan() lost a race with put_compound_page(). The put_compound_page() would call free_transhuge_page(), which deletes the page from the deferred split queue, but it may still appear on the deferred list for some reason. Would you please try the below patch?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..66bd9db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));

This line should not be changed. Please just apply the below part.
} if (mapping) __dec_node_page_state(page, NR_SHMEM_THPS); @@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page) spin_lock_irqsave(&ds_queue->split_queue_lock, flags); if (!list_empty(page_deferred_list(page))) { ds_queue->split_queue_len--; - list_del(page_deferred_list(page)); + list_del_init(page_deferred_list(page)); } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); free_compound_page(page); # numactl -H available: 8 nodes (0-7) node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71 node 0 size: 19984 MB node 0 free: 7251 MB node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79 node 1 size: 0 MB node 1 free: 0 MB node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87 node 2 size: 0 MB node 2 free: 0 MB node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95 node 3 size: 0 MB node 3 free: 0 MB node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103 node 4 size: 31524 MB node 4 free: 25165 MB node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111 node 5 size: 0 MB node 5 free: 0 MB node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119 node 6 size: 0 MB node 6 free: 0 MB node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127 node 7 size: 0 MB node 7 free: 0 MB node distances: node 0 1 2 3 4 5 6 7 0: 10 16 16 16 32 32 32 32 1: 16 10 16 16 32 32 32 32 2: 16 16 10 16 32 32 32 32 3: 16 16 16 10 32 32 32 32 4: 32 32 32 32 10 16 16 16 5: 32 32 32 32 16 10 16 16 6: 32 32 32 32 16 16 10 16 7: 32 32 32 32 16 16 16 10 # lscpu Architecture:x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 2 Core(s) per socket: 32 Socket(s): 2 NUMA node(s):8 Vendor ID: AuthenticAMD CPU family: 23 Model: 1 Model name: AMD EPYC 7601 32-Core Processor Stepping:2 CPU MHz: 2713.551 BogoMIPS:4391.39 Virtualization: AMD-V L1d cache: 32K L1i cache: 64K L2 cache:512K L3 cache:8192K NUMA node0 CPU(s): 0-7,64-71 NUMA node1 CPU(s): 8-15,72-79 
NUMA node2 CPU(s): 16-23,80-87 NUMA node3 CPU(s): 24-31,88-95 NUMA node4 CPU(s): 32-39,96-103 NUMA node5 CPU(s): 40-47,104-111 NUMA node6 CPU(s): 48-55,112-119 NUMA node7 CPU(s): 56-63,120-127 Another possible lead is that without reverting the those commits below, kdump kernel would always also crash in shrink_slab_memcg() at this line, map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true); This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't think of where nodeinfo was freed but memcg was still online. Maybe a check is needed: diff --git a/mm/vmscan.c b/mm/vmscan.c index a0301ed..bacda49 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid, if (!mem_cgroup_online(memcg)) return 0; + if (!memcg->nodeinfo[nid]) + return 0; + if
Re: list corruption in deferred_split_scan()
On 7/11/19 2:07 PM, Qian Cai wrote:
> On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
>> Hi Qian,
>>
>> Thanks for reporting the issue. But, I can't reproduce it on my machine.
>> Could you please share more details about your test? How often did you
>> run into this problem?
>
> I can almost reproduce it every time on an HPE ProLiant DL385 Gen10 server.
> Here is some more information.
>
> # cat .config
> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

I tried your kernel config, but I still can't reproduce it. My compiler
doesn't have retpoline support, so CONFIG_RETPOLINE is disabled in my test,
but I don't think this would make any difference for this case.

According to the bug call trace in the earlier email, it looks like
deferred_split_scan() lost a race with put_compound_page(). The
put_compound_page() path calls free_transhuge_page(), which deletes the page
from the deferred split queue, but the page may still appear on the deferred
list for some reason.

Would you please try the below patch?

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b7f709d..66bd9db 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2765,7 +2765,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
 		if (!list_empty(page_deferred_list(head))) {
 			ds_queue->split_queue_len--;
-			list_del(page_deferred_list(head));
+			list_del_init(page_deferred_list(head));
 		}
 		if (mapping)
 			__dec_node_page_state(page, NR_SHMEM_THPS);
@@ -2814,7 +2814,7 @@ void free_transhuge_page(struct page *page)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (!list_empty(page_deferred_list(page))) {
 		ds_queue->split_queue_len--;
-		list_del(page_deferred_list(page));
+		list_del_init(page_deferred_list(page));
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);

> # numactl -H
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
> node 0 size: 19984 MB
> node 0 free: 7251 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
> node 3 size: 0 MB
> node 3 free: 0 MB
> node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
> node 4 size: 31524 MB
> node 4 free: 25165 MB
> node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
> node 5 size: 0 MB
> node 5 free: 0 MB
> node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
> node 6 size: 0 MB
> node 6 free: 0 MB
> node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
> node 7 size: 0 MB
> node 7 free: 0 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>
> # lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  2
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        8
> Vendor ID:           AuthenticAMD
> CPU family:          23
> Model:               1
> Model name:          AMD EPYC 7601 32-Core Processor
> Stepping:            2
> CPU MHz:             2713.551
> BogoMIPS:            4391.39
> Virtualization:      AMD-V
> L1d cache:           32K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            8192K
> NUMA node0 CPU(s):   0-7,64-71
> NUMA node1 CPU(s):   8-15,72-79
> NUMA node2 CPU(s):   16-23,80-87
> NUMA node3 CPU(s):   24-31,88-95
> NUMA node4 CPU(s):   32-39,96-103
> NUMA node5 CPU(s):   40-47,104-111
> NUMA node6 CPU(s):   48-55,112-119
> NUMA node7 CPU(s):   56-63,120-127
>
> Another possible lead is that without reverting those commits below, the
> kdump kernel would always also crash in shrink_slab_memcg() at this line,
>
> map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

This looks a little bit weird. It seems nodeinfo[nid] is NULL? I couldn't
figure out where nodeinfo would be freed while the memcg is still online.
Maybe a check is needed:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..bacda49 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
 	if (!mem_cgroup_online(memcg))
 		return 0;
 
+	if (!memcg->nodeinfo[nid])
+		return 0;
+
 	if (!down_read_trylock(&shrinker_rwsem))
 		return 0;

> [    9.072036][    T1] BUG: KASAN:
Re: list corruption in deferred_split_scan()
On Wed, 2019-07-10 at 17:16 -0700, Yang Shi wrote:
> Hi Qian,
>
> Thanks for reporting the issue. But, I can't reproduce it on my machine.
> Could you please share more details about your test? How often did you
> run into this problem?

I can almost reproduce it every time on an HPE ProLiant DL385 Gen10 server.
Here is some more information.

# cat .config
https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 64 65 66 67 68 69 70 71
node 0 size: 19984 MB
node 0 free: 7251 MB
node 1 cpus: 8 9 10 11 12 13 14 15 72 73 74 75 76 77 78 79
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus: 16 17 18 19 20 21 22 23 80 81 82 83 84 85 86 87
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 24 25 26 27 28 29 30 31 88 89 90 91 92 93 94 95
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 32 33 34 35 36 37 38 39 96 97 98 99 100 101 102 103
node 4 size: 31524 MB
node 4 free: 25165 MB
node 5 cpus: 40 41 42 43 44 45 46 47 104 105 106 107 108 109 110 111
node 5 size: 0 MB
node 5 free: 0 MB
node 6 cpus: 48 49 50 51 52 53 54 55 112 113 114 115 116 117 118 119
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 56 57 58 59 60 61 62 63 120 121 122 123 124 125 126 127
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  16  32  32  32  32
  1:  16  10  16  16  32  32  32  32
  2:  16  16  10  16  32  32  32  32
  3:  16  16  16  10  32  32  32  32
  4:  32  32  32  32  10  16  16  16
  5:  32  32  32  32  16  10  16  16
  6:  32  32  32  32  16  16  10  16
  7:  32  32  32  32  16  16  16  10

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  2
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7601 32-Core Processor
Stepping:            2
CPU MHz:             2713.551
BogoMIPS:            4391.39
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7,64-71
NUMA node1 CPU(s):   8-15,72-79
NUMA node2 CPU(s):   16-23,80-87
NUMA node3 CPU(s):   24-31,88-95
NUMA node4 CPU(s):   32-39,96-103
NUMA node5 CPU(s):   40-47,104-111
NUMA node6 CPU(s):   48-55,112-119
NUMA node7 CPU(s):   56-63,120-127

Another possible lead is that without reverting those commits below, the
kdump kernel would always also crash in shrink_slab_memcg() at this line,

map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

[    9.072036][    T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
[    9.072036][    T1] Read of size 8 at addr 0dc8 by task swapper/0/1
[    9.072036][    T1]
[    9.072036][    T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-20190711+ #10
[    9.072036][    T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
[    9.072036][    T1] Call Trace:
[    9.072036][    T1]  dump_stack+0x62/0x9a
[    9.072036][    T1]  __kasan_report.cold.4+0xb0/0xb4
[    9.072036][    T1]  ? unwind_get_return_address+0x40/0x50
[    9.072036][    T1]  ? shrink_slab+0x111/0x440
[    9.072036][    T1]  kasan_report+0xc/0xe
[    9.072036][    T1]  __asan_load8+0x71/0xa0
[    9.072036][    T1]  shrink_slab+0x111/0x440
[    9.072036][    T1]  ? mem_cgroup_iter+0x98/0x840
[    9.072036][    T1]  ? unregister_shrinker+0x110/0x110
[    9.072036][    T1]  ? kasan_check_read+0x11/0x20
[    9.072036][    T1]  ? mem_cgroup_protected+0x39/0x260
[    9.072036][    T1]  shrink_node+0x31e/0xa30
[    9.072036][    T1]  ? shrink_node_memcg+0x1560/0x1560
[    9.072036][    T1]  ? ktime_get+0x93/0x110
[    9.072036][    T1]  do_try_to_free_pages+0x22f/0x820
[    9.072036][    T1]  ? shrink_node+0xa30/0xa30
[    9.072036][    T1]  ? kasan_check_read+0x11/0x20
[    9.072036][    T1]  ? check_chain_key+0x1df/0x2e0
[    9.072036][    T1]  try_to_free_pages+0x242/0x4d0
[    9.072036][    T1]  ? do_try_to_free_pages+0x820/0x820
[    9.072036][    T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
[    9.072036][    T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[    9.072036][    T1]  ? unwind_dump+0x260/0x260
[    9.072036][    T1]  ? kernel_text_address+0x33/0xc0
[    9.072036][    T1]  ? arch_stack_walk+0x8f/0xf0
[    9.072036][    T1]  ? ret_from_fork+0x22/0x40
[    9.072036][    T1]  alloc_page_interleave+0x18/0x130
[    9.072036][    T1]  alloc_pages_current+0xf6/0x110
[    9.072036][    T1]  allocate_slab+0x600/0x11f0
[    9.072036][    T1]  new_slab+0x46/0x70
[    9.072036][    T1]  ___slab_alloc+0x5d4/0x9c0
[    9.072036][    T1]  ? create_object+0x3a/0x3e0
[    9.072036][    T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
[    9.072036][    T1]  ? ___might_sleep+0xab/0xc0
[    9.072036][    T1]  ? create_object+0x3a/0x3e0
Re: list corruption in deferred_split_scan()
Hi Qian,

Thanks for reporting the issue. But, I can't reproduce it on my machine.
Could you please share more details about your test? How often did you
run into this problem?

Regards,
Yang

On 7/10/19 2:43 PM, Qian Cai wrote:
> Running the LTP oom01 test case with swap triggers the crash below.
> Reverting the series "Make deferred split shrinker memcg aware" [1]
> seems to fix the issue.
>
> aefde94195ca mm: thp: make deferred split shrinker memcg aware
> cf402211cacc mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2-fix
> ca37e9e5f18d mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix-2
> 5f419d89cab4 mm-shrinker-make-shrinker-not-depend-on-memcg-kmem-fix
> c9d49e69e887 mm: shrinker: make shrinker not depend on memcg kmem
> 1c0af4b86bcf mm: move mem_cgroup_uncharge out of __page_cache_release()
> 4e050f2df876 mm: thp: extract split_queue_* into a struct
>
> [1] https://lore.kernel.org/linux-mm/1561507361-59349-1-git-send-email-yang.shi@linux.alibaba.com/
>
> [ 1145.730682][ T5764] list_del corruption, ea00251c8098->next is LIST_POISON1 (dead0100)
> [ 1145.739763][ T5764] ------------[ cut here ]------------
> [ 1145.745126][ T5764] kernel BUG at lib/list_debug.c:47!
> [ 1145.750320][ T5764] invalid opcode: [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [ 1145.757513][ T5764] CPU: 1 PID: 5764 Comm: oom01 Tainted: G        W 5.2.0-next-20190710+ #7
> [ 1145.766709][ T5764] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 01/25/2019
> [ 1145.776000][ T5764] RIP: 0010:__list_del_entry_valid.cold.0+0x12/0x4a
> [ 1145.782491][ T5764] Code: c7 40 5a 33 af e8 ac fe bc ff 0f 0b 48 c7 c7 80 9e a1 af e8 f6 4c 01 00 4c 89 ea 48 89 de 48 c7 c7 20 59 33 af e8 8c fe bc ff <0f> 0b 48 c7 c7 40 9f a1 af e8 d6 4c 01 00 4c 89 e2 48 89 de 48 c7
> [ 1145.802078][ T5764] RSP: 0018:888514d773c0 EFLAGS: 00010082
> [ 1145.808042][ T5764] RAX: 004e RBX: ea00251c8098 RCX: ae95d318
> [ 1145.815923][ T5764] RDX: RSI: 0008 RDI: 440bd380
> [ 1145.823806][ T5764] RBP: 888514d773d8 R08: ed1108817a71 R09: ed1108817a70
> [ 1145.831689][ T5764] R10: ed1108817a70 R11: 440bd387 R12: dead0122
> [ 1145.839571][ T5764] R13: dead0100 R14: ea00251c8034 R15: dead0100
> [ 1145.847455][ T5764] FS: 7f765ad4d700() GS:4408() knlGS:
> [ 1145.856299][ T5764] CS: 0010 DS: ES: CR0: 80050033
> [ 1145.862784][ T5764] CR2: 7f8cebec7000 CR3: 000459338000 CR4: 001406a0
> [ 1145.870664][ T5764] Call Trace:
> [ 1145.873835][ T5764]  deferred_split_scan+0x337/0x740
> [ 1145.878835][ T5764]  ? split_huge_page_to_list+0xe30/0xe30
> [ 1145.884364][ T5764]  ? __radix_tree_lookup+0x12d/0x1e0
> [ 1145.889539][ T5764]  ? node_tag_get.part.0.constprop.6+0x40/0x40
> [ 1145.895592][ T5764]  do_shrink_slab+0x244/0x5a0
> [ 1145.900159][ T5764]  shrink_slab+0x253/0x440
> [ 1145.904462][ T5764]  ? unregister_shrinker+0x110/0x110
> [ 1145.909641][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.914383][ T5764]  ? mem_cgroup_protected+0x20f/0x260
> [ 1145.919645][ T5764]  shrink_node+0x31e/0xa30
> [ 1145.923949][ T5764]  ? shrink_node_memcg+0x1560/0x1560
> [ 1145.929126][ T5764]  ? ktime_get+0x93/0x110
> [ 1145.933340][ T5764]  do_try_to_free_pages+0x22f/0x820
> [ 1145.938429][ T5764]  ? shrink_node+0xa30/0xa30
> [ 1145.942906][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.947647][ T5764]  ? check_chain_key+0x1df/0x2e0
> [ 1145.952474][ T5764]  try_to_free_pages+0x242/0x4d0
> [ 1145.957299][ T5764]  ? do_try_to_free_pages+0x820/0x820
> [ 1145.962566][ T5764]  __alloc_pages_nodemask+0x9ce/0x1bc0
> [ 1145.967917][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.972657][ T5764]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
> [ 1145.977920][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1145.982659][ T5764]  ? check_chain_key+0x1df/0x2e0
> [ 1145.987487][ T5764]  ? do_anonymous_page+0x343/0xe30
> [ 1145.992489][ T5764]  ? lock_downgrade+0x390/0x390
> [ 1145.997230][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> [ 1146.002404][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1146.007145][ T5764]  ? __lru_cache_add+0x122/0x160
> [ 1146.011974][ T5764]  alloc_pages_vma+0x89/0x2c0
> [ 1146.016538][ T5764]  do_anonymous_page+0x3e1/0xe30
> [ 1146.021367][ T5764]  ? __update_load_avg_cfs_rq+0x2c/0x490
> [ 1146.026893][ T5764]  ? finish_fault+0x120/0x120
> [ 1146.031461][ T5764]  ? call_function_interrupt+0xa/0x20
> [ 1146.036724][ T5764]  handle_pte_fault+0x457/0x12c0
> [ 1146.041552][ T5764]  __handle_mm_fault+0x79a/0xa50
> [ 1146.046378][ T5764]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 1146.051817][ T5764]  ? kasan_check_read+0x11/0x20
> [ 1146.056557][ T5764]  ? __count_memcg_events+0x8b/0x1c0
> [ 1146.061732][ T5764]  handle_mm_fault+0x17f/0x370
> [ 1146.066386][ T5764]  __do_page_fault+0x25b/0x5d0
> [ 1146.071037][ T5764]  do_page_fault+0x4c/0x2cf
> [ 1146.075426][ T5764]  ? page_fault+0x5/0x20
> [ 1146.079553][ T5764]  page_fault+0x1b/0x20
> [ 1146.083594][ T5764] RIP: 0033:0x410be0
> [ 1146.087373][ T5764] Code: 89 de