Re: [linux-next] khugepaged inconsistent lock state
On (09/23/15 16:22), Kirill A. Shutemov wrote:
[..]
> khugepaged does swap in during collapse under anon_vma lock. It causes
> complaints from lockdep. The trace below shows the following scenario:
>
> - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
> - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
> - __read_swap_cache_async() tries to allocate the page for swap in;
> - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>   given gfp_mask we could end up in direct reclaim.
> - Lockdep already knows that reclaim sometimes (e.g. in case of
>   split_huge_page()) wants to take anon_vma lock on its own.
>
> Therefore deadlock is possible.
[..]

Gave it some testing on my box. Works fine on my side. I guess you can
add (if needed)

Tested-by: Sergey Senozhatsky

	-ss

> Signed-off-by: Kirill A. Shutemov
> Reported-by: Sergey Senozhatsky
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dd58ecfcafe6..06c8f6d8fee2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2725,10 +2725,10 @@ static void collapse_huge_page(struct mm_struct *mm,
> 		goto out;
> 	}
>
> -	anon_vma_lock_write(vma->anon_vma);
> -
> 	__collapse_huge_page_swapin(mm, vma, address, pmd);
>
> +	anon_vma_lock_write(vma->anon_vma);
> +
> 	pte = pte_offset_map(pmd, address);
> 	pte_ptl = pte_lockptr(mm, pmd);
>
> --
>  Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-next] khugepaged inconsistent lock state
On Mon, 21 Sep 2015, Kirill A. Shutemov wrote:
> On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> > Hi,
> >
> > 4.3.0-rc1-next-20150918
> >
[..]
>
> Hm. If I read this correctly, we see the following scenario:
>
> - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
> - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
> - __read_swap_cache_async() tries to allocate the page for swap in;
> - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>   given gfp_mask we could end up in direct reclaim.
> - Lockdep already knows that reclaim sometimes (e.g. in case of
>   split_huge_page()) wants to take anon_vma lock on its own.
>
> Therefore deadlock is possible.

Oh, thank you for working that out. As usual with a lockdep trace,
I knew it was telling me something important, but in a language I
just couldn't understand without spending much longer to decode it.
Yes, wrong to call do_swap_page() while holding anon_vma lock.

> I see two ways to fix this:
>
> - take anon_vma lock *after* __collapse_huge_page_swapin() in
>   collapse_huge_page(): I don't really see why we need the lock
>   during swapin;

Agreed.

> - respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
Re: [linux-next] khugepaged inconsistent lock state
On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> Hi,
>
> 4.3.0-rc1-next-20150918
>
> [18344.236625] =================================
> [18344.236628] [ INFO: inconsistent lock state ]
> [18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not tainted
> [18344.236636] ---------------------------------
> [18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> [18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [18344.236648] (&anon_vma->rwsem){?.}, at: [] khugepaged+0x8b0/0x1987
> [18344.236662] {IN-RECLAIM_FS-W} state was registered at:
> [18344.23] [] __lock_acquire+0x8e2/0x1183
> [18344.236673] [] lock_acquire+0x10b/0x1a6
> [18344.236678] [] down_write+0x3b/0x6a
> [18344.236686] [] split_huge_page_to_list+0x5b/0x61f
> [18344.236689] [] add_to_swap+0x37/0x78
> [18344.236691] [] shrink_page_list+0x4c2/0xb9a
> [18344.236694] [] shrink_inactive_list+0x371/0x5d9
> [18344.236696] [] shrink_lruvec+0x410/0x5ae
> [18344.236698] [] shrink_zone+0x57/0x140
> [18344.236700] [] kswapd+0x6a5/0x91b
> [18344.236702] [] kthread+0x107/0x10f
> [18344.236706] [] ret_from_fork+0x3f/0x70
> [18344.236708] irq event stamp: 6517947
> [18344.236709] hardirqs last enabled at (6517947): [] get_page_from_freelist+0x362/0x59e
> [18344.236713] hardirqs last disabled at (6517946): [] _raw_spin_lock_irqsave+0x18/0x51
> [18344.236715] softirqs last enabled at (6507072): [] __do_softirq+0x2df/0x3f5
> [18344.236719] softirqs last disabled at (6507055): [] irq_exit+0x40/0x94
> [18344.236722]
>                other info that might help us debug this:
> [18344.236723]  Possible unsafe locking scenario:
>
> [18344.236724]        CPU0
> [18344.236725]
> [18344.236726]   lock(&anon_vma->rwsem);
> [18344.236728]
> [18344.236729]   lock(&anon_vma->rwsem);
> [18344.236731]
>                 *** DEADLOCK ***
>
> [18344.236733] 2 locks held by khugepaged/32:
> [18344.236733] #0: (&mm->mmap_sem){++}, at: [] khugepaged+0x5cf/0x1987
> [18344.236738] #1: (&anon_vma->rwsem){?.}, at: [] khugepaged+0x8b0/0x1987
> [18344.236741]
>                stack backtrace:
> [18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
> [18344.236747]  880132827a00 81230867 8237ba90
> [18344.236750]  880132827a38 810ea9b9 000a 8801333b52e0
> [18344.236753]  8801333b4c00 8107b3ce 000a 880132827a78
> [18344.236755] Call Trace:
> [18344.236758] [] dump_stack+0x4e/0x79
> [18344.236761] [] print_usage_bug.part.24+0x259/0x268
> [18344.236763] [] ? print_shortest_lock_dependencies+0x180/0x180
> [18344.236765] [] mark_lock+0x381/0x567
> [18344.236766] [] mark_held_locks+0x5e/0x74
> [18344.236768] [] lockdep_trace_alloc+0xb0/0xb3
> [18344.236771] [] __alloc_pages_nodemask+0x99/0x856
> [18344.236772] [] ? find_get_entry+0x14b/0x17a
> [18344.236774] [] ? find_get_entry+0x168/0x17a
> [18344.236777] [] __read_swap_cache_async+0x7b/0x1aa
> [18344.236778] [] read_swap_cache_async+0x15/0x2d
> [18344.236780] [] swapin_readahead+0x11a/0x16a
> [18344.236783] [] do_swap_page+0xa7/0x36b
> [18344.236784] [] ? do_swap_page+0xa7/0x36b
> [18344.236787] [] khugepaged+0x8f9/0x1987
> [18344.236790] [] ? wait_woken+0x88/0x88
> [18344.236792] [] ? maybe_pmd_mkwrite+0x1a/0x1a
> [18344.236794] [] kthread+0x107/0x10f
> [18344.236797] [] ? kthread_create_on_node+0x1ea/0x1ea
> [18344.236799] [] ret_from_fork+0x3f/0x70
> [18344.236801] [] ? kthread_create_on_node+0x1ea/0x1ea

Hm. If I read this correctly, we see the following scenario:

- khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
- do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
- __read_swap_cache_async() tries to allocate the page for swap in;
- lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
  given gfp_mask we could end up in direct reclaim.
- Lockdep already knows that reclaim sometimes (e.g. in case of
  split_huge_page()) wants to take anon_vma lock on its own.

Therefore deadlock is possible.

I see two ways to fix this:

- take anon_vma lock *after* __collapse_huge_page_swapin() in
  collapse_huge_page(): I don't really see why we need the lock
  during swapin;

- respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
  gfp_mask for swapin_readahead() in this case.

I guess it could be beneficial to do both.

Any comments?

--
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-next] khugepaged inconsistent lock state
On Mon, 21 Sep 2015, Kirill A. Shutemov wrote:
> On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> > Hi,
> >
> > 4.3.0-rc1-next-20150918
[..]
> Hm. If I read this correctly, we see the following scenario:
>
> - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
> - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
> - __read_swap_cache_async() tries to allocate the page for swap in;
> - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>   the given gfp_mask we could end up in direct reclaim.
> - Lockdep already knows that reclaim sometimes (e.g. in case of
>   split_huge_page()) wants to take anon_vma lock on its own.
>
> Therefore deadlock is possible.

Oh, thank you for working that out.  As usual with a lockdep trace, I
knew it was telling me something important, but in a language I just
couldn't understand without spending much longer to decode it.

Yes, wrong to call do_swap_page() while holding anon_vma lock.

> I see two ways to fix this:
>
> - take anon_vma lock *after* __collapse_huge_page_swapin() in
>   collapse_huge_page(): I don't really see why we need the lock
>   during swapin;

Agreed.

> - respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
>   gfp_mask for swapin_readahead() in this case.
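[Editorial note: for reference, the first option discussed above is the two-line reorder that Kirill posted later in this thread (the mm/huge_memory.c patch upthread), which takes the anon_vma lock only after the swapin step:]

```diff
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2725,10 +2725,10 @@ static void collapse_huge_page(struct mm_struct *mm,
 		goto out;
 	}
 
-	anon_vma_lock_write(vma->anon_vma);
-
 	__collapse_huge_page_swapin(mm, vma, address, pmd);
 
+	anon_vma_lock_write(vma->anon_vma);
+
 	pte = pte_offset_map(pmd, address);
 	pte_ptl = pte_lockptr(mm, pmd);
```

With this ordering, khugepaged no longer allocates (and so can never enter direct reclaim) while holding the anon_vma lock, which removes the inverted lock state lockdep complained about.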
Re: [linux-next] khugepaged inconsistent lock state
On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> Hi,
>
> 4.3.0-rc1-next-20150918
[..]

Hm. If I read this correctly, we see the following scenario:

- khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
- do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
- __read_swap_cache_async() tries to allocate the page for swap in;
- lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
  the given gfp_mask we could end up in direct reclaim.
- Lockdep already knows that reclaim sometimes (e.g. in case of
  split_huge_page()) wants to take anon_vma lock on its own.

Therefore deadlock is possible.

I see two ways to fix this:

- take anon_vma lock *after* __collapse_huge_page_swapin() in
  collapse_huge_page(): I don't really see why we need the lock
  during swapin;

- respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
  gfp_mask for swapin_readahead() in this case.

I guess it could be beneficial to do both.

Any comments?

-- 
 Kirill A. Shutemov
[linux-next] khugepaged inconsistent lock state
Hi,

4.3.0-rc1-next-20150918

[18344.236625] =================================
[18344.236628] [ INFO: inconsistent lock state ]
[18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not tainted
[18344.236636] ---------------------------------
[18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
[18344.236648]  (&anon_vma->rwsem){?.}, at: khugepaged+0x8b0/0x1987
[18344.236662] {IN-RECLAIM_FS-W} state was registered at:
[18344.23]       __lock_acquire+0x8e2/0x1183
[18344.236673]   lock_acquire+0x10b/0x1a6
[18344.236678]   down_write+0x3b/0x6a
[18344.236686]   split_huge_page_to_list+0x5b/0x61f
[18344.236689]   add_to_swap+0x37/0x78
[18344.236691]   shrink_page_list+0x4c2/0xb9a
[18344.236694]   shrink_inactive_list+0x371/0x5d9
[18344.236696]   shrink_lruvec+0x410/0x5ae
[18344.236698]   shrink_zone+0x57/0x140
[18344.236700]   kswapd+0x6a5/0x91b
[18344.236702]   kthread+0x107/0x10f
[18344.236706]   ret_from_fork+0x3f/0x70
[18344.236708] irq event stamp: 6517947
[18344.236709] hardirqs last enabled at (6517947): get_page_from_freelist+0x362/0x59e
[18344.236713] hardirqs last disabled at (6517946): _raw_spin_lock_irqsave+0x18/0x51
[18344.236715] softirqs last enabled at (6507072): __do_softirq+0x2df/0x3f5
[18344.236719] softirqs last disabled at (6507055): irq_exit+0x40/0x94
[18344.236722]
               other info that might help us debug this:
[18344.236723]  Possible unsafe locking scenario:

[18344.236724]        CPU0
[18344.236725]        ----
[18344.236726]   lock(&anon_vma->rwsem);
[18344.236728]   <Interrupt>
[18344.236729]     lock(&anon_vma->rwsem);
[18344.236731]
                *** DEADLOCK ***

[18344.236733] 2 locks held by khugepaged/32:
[18344.236733]  #0:  (&mm->mmap_sem){++}, at: khugepaged+0x5cf/0x1987
[18344.236738]  #1:  (&anon_vma->rwsem){?.}, at: khugepaged+0x8b0/0x1987
[18344.236741]
               stack backtrace:
[18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
[18344.236747]  880132827a00 81230867 8237ba90
[18344.236750]  880132827a38 810ea9b9 000a 8801333b52e0
[18344.236753]  8801333b4c00 8107b3ce 000a 880132827a78
[18344.236755] Call Trace:
[18344.236758]  dump_stack+0x4e/0x79
[18344.236761]  print_usage_bug.part.24+0x259/0x268
[18344.236763]  ? print_shortest_lock_dependencies+0x180/0x180
[18344.236765]  mark_lock+0x381/0x567
[18344.236766]  mark_held_locks+0x5e/0x74
[18344.236768]  lockdep_trace_alloc+0xb0/0xb3
[18344.236771]  __alloc_pages_nodemask+0x99/0x856
[18344.236772]  ? find_get_entry+0x14b/0x17a
[18344.236774]  ? find_get_entry+0x168/0x17a
[18344.236777]  __read_swap_cache_async+0x7b/0x1aa
[18344.236778]  read_swap_cache_async+0x15/0x2d
[18344.236780]  swapin_readahead+0x11a/0x16a
[18344.236783]  do_swap_page+0xa7/0x36b
[18344.236784]  ? do_swap_page+0xa7/0x36b
[18344.236787]  khugepaged+0x8f9/0x1987
[18344.236790]  ? wait_woken+0x88/0x88
[18344.236792]  ? maybe_pmd_mkwrite+0x1a/0x1a
[18344.236794]  kthread+0x107/0x10f
[18344.236797]  ? kthread_create_on_node+0x1ea/0x1ea
[18344.236799]  ret_from_fork+0x3f/0x70
[18344.236801]  ? kthread_create_on_node+0x1ea/0x1ea

-ss