Re: [linux-next] khugepaged inconsistent lock state

2015-09-23 Thread Sergey Senozhatsky
On (09/23/15 16:22), Kirill A. Shutemov wrote:
[..]
> khugepaged does swap in during collapse under anon_vma lock. It causes
> a complaint from lockdep. The trace below shows the following scenario:
> 
>  - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
>  - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
>  - __read_swap_cache_async() tries to allocate the page for swap in;
>  - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>    given gfp_mask we could end up in direct reclaim.
>  - Lockdep already knows that reclaim sometimes (e.g. in case of
>    split_huge_page()) wants to take anon_vma lock on its own.
> 
> Therefore deadlock is possible.
[..]

Gave it some testing on my box. Works fine on my side.

I guess you can add (if needed)
Tested-by: Sergey Senozhatsky 

-ss

> Signed-off-by: Kirill A. Shutemov 
> Reported-by: Sergey Senozhatsky 
> ---
>  mm/huge_memory.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dd58ecfcafe6..06c8f6d8fee2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2725,10 +2725,10 @@ static void collapse_huge_page(struct mm_struct *mm,
>   goto out;
>   }
>  
> - anon_vma_lock_write(vma->anon_vma);
> -
>   __collapse_huge_page_swapin(mm, vma, address, pmd);
>  
> + anon_vma_lock_write(vma->anon_vma);
> +
>   pte = pte_offset_map(pmd, address);
>   pte_ptl = pte_lockptr(mm, pmd);
>  
> -- 
>  Kirill A. Shutemov
> 
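
The reordering in the patch above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct toy_lock and
every toy_* / collapse_* name below are hypothetical stand-ins, not kernel
code, and the "lock" is a plain flag that reports failure at the point where
a real rwsem would block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the anon_vma rwsem: it only records whether it is held. */
struct toy_lock {
	bool held;
};

/* Returns false where real code would block forever (lock already held). */
bool toy_lock_write(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

void toy_unlock_write(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Models a swap-in whose page allocation falls into direct reclaim, where
 * reclaim wants the same lock (as split_huge_page() does in the report).
 */
bool toy_swapin(struct toy_lock *l)
{
	if (!toy_lock_write(l))
		return false;	/* would be a self-deadlock in the kernel */
	toy_unlock_write(l);
	return true;
}

/* Old order: take the lock first, then swap in. */
bool collapse_old_order(struct toy_lock *l)
{
	bool ok;

	if (!toy_lock_write(l))
		return false;
	ok = toy_swapin(l);
	toy_unlock_write(l);
	return ok;
}

/* Patched order: swap in first, then take the lock. */
bool collapse_new_order(struct toy_lock *l)
{
	if (!toy_swapin(l))
		return false;
	if (!toy_lock_write(l))
		return false;
	/* the page-table work would happen here, under the lock */
	toy_unlock_write(l);
	return true;
}
```

With a fresh lock, collapse_old_order() reports failure because the modelled
reclaim path finds the lock already held, while collapse_new_order() succeeds.
The model assumes the worst case, in which every swap-in enters reclaim.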
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-next] khugepaged inconsistent lock state

2015-09-23 Thread Ebru Akagündüz
2015-09-23 15:22 GMT+02:00 Kirill A. Shutemov :
> On Mon, Sep 21, 2015 at 04:57:05PM -0700, Hugh Dickins wrote:
>> On Mon, 21 Sep 2015, Kirill A. Shutemov wrote:
>> > On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
>> > > Hi,
>> > >
>> > > 4.3.0-rc1-next-20150918
>> > >
>> > > [18344.236625] =
>> > > [18344.236628] [ INFO: inconsistent lock state ]
>> > > [18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not 
>> > > tainted
>> > > [18344.236636] -
>> > > [18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
>> > > [18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> > > [18344.236648]  (&anon_vma->rwsem){?.}, at: [] 
>> > > khugepaged+0x8b0/0x1987
>> > > [18344.236662] {IN-RECLAIM_FS-W} state was registered at:
>> > > [18344.23]   [] __lock_acquire+0x8e2/0x1183
>> > > [18344.236673]   [] lock_acquire+0x10b/0x1a6
>> > > [18344.236678]   [] down_write+0x3b/0x6a
>> > > [18344.236686]   [] split_huge_page_to_list+0x5b/0x61f
>> > > [18344.236689]   [] add_to_swap+0x37/0x78
>> > > [18344.236691]   [] shrink_page_list+0x4c2/0xb9a
>> > > [18344.236694]   [] shrink_inactive_list+0x371/0x5d9
>> > > [18344.236696]   [] shrink_lruvec+0x410/0x5ae
>> > > [18344.236698]   [] shrink_zone+0x57/0x140
>> > > [18344.236700]   [] kswapd+0x6a5/0x91b
>> > > [18344.236702]   [] kthread+0x107/0x10f
>> > > [18344.236706]   [] ret_from_fork+0x3f/0x70
>> > > [18344.236708] irq event stamp: 6517947
>> > > [18344.236709] hardirqs last  enabled at (6517947): [] 
>> > > get_page_from_freelist+0x362/0x59e
>> > > [18344.236713] hardirqs last disabled at (6517946): [] 
>> > > _raw_spin_lock_irqsave+0x18/0x51
>> > > [18344.236715] softirqs last  enabled at (6507072): [] 
>> > > __do_softirq+0x2df/0x3f5
>> > > [18344.236719] softirqs last disabled at (6507055): [] 
>> > > irq_exit+0x40/0x94
>> > > [18344.236722]
>> > >other info that might help us debug this:
>> > > [18344.236723]  Possible unsafe locking scenario:
>> > >
>> > > [18344.236724]CPU0
>> > > [18344.236725]
>> > > [18344.236726]   lock(&anon_vma->rwsem);
>> > > [18344.236728]   
>> > > [18344.236729] lock(&anon_vma->rwsem);
>> > > [18344.236731]
>> > > *** DEADLOCK ***
>> > >
>> > > [18344.236733] 2 locks held by khugepaged/32:
>> > > [18344.236733]  #0:  (&mm->mmap_sem){++}, at: [] 
>> > > khugepaged+0x5cf/0x1987
>> > > [18344.236738]  #1:  (&anon_vma->rwsem){?.}, at: 
>> > > [] khugepaged+0x8b0/0x1987
>> > > [18344.236741]
>> > >stack backtrace:
>> > > [18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 
>> > > 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
>> > > [18344.236747]   880132827a00 81230867 
>> > > 8237ba90
>> > > [18344.236750]  880132827a38 810ea9b9 000a 
>> > > 8801333b52e0
>> > > [18344.236753]  8801333b4c00 8107b3ce 000a 
>> > > 880132827a78
>> > > [18344.236755] Call Trace:
>> > > [18344.236758]  [] dump_stack+0x4e/0x79
>> > > [18344.236761]  [] print_usage_bug.part.24+0x259/0x268
>> > > [18344.236763]  [] ? 
>> > > print_shortest_lock_dependencies+0x180/0x180
>> > > [18344.236765]  [] mark_lock+0x381/0x567
>> > > [18344.236766]  [] mark_held_locks+0x5e/0x74
>> > > [18344.236768]  [] lockdep_trace_alloc+0xb0/0xb3
>> > > [18344.236771]  [] __alloc_pages_nodemask+0x99/0x856
>> > > [18344.236772]  [] ? find_get_entry+0x14b/0x17a
>> > > [18344.236774]  [] ? find_get_entry+0x168/0x17a
>> > > [18344.236777]  [] __read_swap_cache_async+0x7b/0x1aa
>> > > [18344.236778]  [] read_swap_cache_async+0x15/0x2d
>> > > [18344.236780]  [] swapin_readahead+0x11a/0x16a
>> > > [18344.236783]  [] do_swap_page+0xa7/0x36b
>> > > [18344.236784]  [] ? do_swap_page+0xa7/0x36b
>> > > [18344.236787]  [] khugepaged+0x8f9/0x1987
>> > > [18344.236790]  [] ? wait_woken+0x88/0x88
>> > > [18344.236792]  [] ? maybe_pmd_mkwrite+0x1a/0x1a
>> > > [18344.236794]  [] kthread+0x107/0x10f
>> > > [18344.236797]  [] ? kthread_create_on_node+0x1ea/0x1ea
>> > > [18344.236799]  [] ret_from_fork+0x3f/0x70
>> > > [18344.236801]  [] ? kthread_create_on_node+0x1ea/0x1ea
>> >
>> > Hm. If I read this correctly, we see the following scenario:
>> >
>> >  - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
>> >  - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
>> >  - __read_swap_cache_async() tries to allocate the page for swap in;
>> >  - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>> >    given gfp_mask we could end up in direct reclaim.
>> >  - Lockdep already knows that reclaim sometimes (e.g. in case of
>> >    split_huge_page()) wants to take anon_vma lock on its own.
>> >
>> > Therefore deadlock is possible.
>>
>> Oh, thank you for working that out.  As usual with a lockdep trace,
>> I knew it was telling me something important, but in a language I
>> 

Re: [linux-next] khugepaged inconsistent lock state

2015-09-23 Thread Kirill A. Shutemov
On Mon, Sep 21, 2015 at 04:57:05PM -0700, Hugh Dickins wrote:
> On Mon, 21 Sep 2015, Kirill A. Shutemov wrote:
> > On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> > > Hi,
> > > 
> > > 4.3.0-rc1-next-20150918
> > > 
> > > [18344.236625] =
> > > [18344.236628] [ INFO: inconsistent lock state ]
> > > [18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not 
> > > tainted
> > > [18344.236636] -
> > > [18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> > > [18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [18344.236648]  (_vma->rwsem){?.}, at: [] 
> > > khugepaged+0x8b0/0x1987
> > > [18344.236662] {IN-RECLAIM_FS-W} state was registered at:
> > > [18344.23]   [] __lock_acquire+0x8e2/0x1183
> > > [18344.236673]   [] lock_acquire+0x10b/0x1a6
> > > [18344.236678]   [] down_write+0x3b/0x6a
> > > [18344.236686]   [] split_huge_page_to_list+0x5b/0x61f
> > > [18344.236689]   [] add_to_swap+0x37/0x78
> > > [18344.236691]   [] shrink_page_list+0x4c2/0xb9a
> > > [18344.236694]   [] shrink_inactive_list+0x371/0x5d9
> > > [18344.236696]   [] shrink_lruvec+0x410/0x5ae
> > > [18344.236698]   [] shrink_zone+0x57/0x140
> > > [18344.236700]   [] kswapd+0x6a5/0x91b
> > > [18344.236702]   [] kthread+0x107/0x10f
> > > [18344.236706]   [] ret_from_fork+0x3f/0x70
> > > [18344.236708] irq event stamp: 6517947
> > > [18344.236709] hardirqs last  enabled at (6517947): [] 
> > > get_page_from_freelist+0x362/0x59e
> > > [18344.236713] hardirqs last disabled at (6517946): [] 
> > > _raw_spin_lock_irqsave+0x18/0x51
> > > [18344.236715] softirqs last  enabled at (6507072): [] 
> > > __do_softirq+0x2df/0x3f5
> > > [18344.236719] softirqs last disabled at (6507055): [] 
> > > irq_exit+0x40/0x94
> > > [18344.236722] 
> > >other info that might help us debug this:
> > > [18344.236723]  Possible unsafe locking scenario:
> > > 
> > > [18344.236724]CPU0
> > > [18344.236725]
> > > [18344.236726]   lock(_vma->rwsem);
> > > [18344.236728]   
> > > [18344.236729] lock(_vma->rwsem);
> > > [18344.236731] 
> > > *** DEADLOCK ***
> > > 
> > > [18344.236733] 2 locks held by khugepaged/32:
> > > [18344.236733]  #0:  (>mmap_sem){++}, at: [] 
> > > khugepaged+0x5cf/0x1987
> > > [18344.236738]  #1:  (_vma->rwsem){?.}, at: [] 
> > > khugepaged+0x8b0/0x1987
> > > [18344.236741] 
> > >stack backtrace:
> > > [18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 
> > > 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
> > > [18344.236747]   880132827a00 81230867 
> > > 8237ba90
> > > [18344.236750]  880132827a38 810ea9b9 000a 
> > > 8801333b52e0
> > > [18344.236753]  8801333b4c00 8107b3ce 000a 
> > > 880132827a78
> > > [18344.236755] Call Trace:
> > > [18344.236758]  [] dump_stack+0x4e/0x79
> > > [18344.236761]  [] print_usage_bug.part.24+0x259/0x268
> > > [18344.236763]  [] ? 
> > > print_shortest_lock_dependencies+0x180/0x180
> > > [18344.236765]  [] mark_lock+0x381/0x567
> > > [18344.236766]  [] mark_held_locks+0x5e/0x74
> > > [18344.236768]  [] lockdep_trace_alloc+0xb0/0xb3
> > > [18344.236771]  [] __alloc_pages_nodemask+0x99/0x856
> > > [18344.236772]  [] ? find_get_entry+0x14b/0x17a
> > > [18344.236774]  [] ? find_get_entry+0x168/0x17a
> > > [18344.236777]  [] __read_swap_cache_async+0x7b/0x1aa
> > > [18344.236778]  [] read_swap_cache_async+0x15/0x2d
> > > [18344.236780]  [] swapin_readahead+0x11a/0x16a
> > > [18344.236783]  [] do_swap_page+0xa7/0x36b
> > > [18344.236784]  [] ? do_swap_page+0xa7/0x36b
> > > [18344.236787]  [] khugepaged+0x8f9/0x1987
> > > [18344.236790]  [] ? wait_woken+0x88/0x88
> > > [18344.236792]  [] ? maybe_pmd_mkwrite+0x1a/0x1a
> > > [18344.236794]  [] kthread+0x107/0x10f
> > > [18344.236797]  [] ? kthread_create_on_node+0x1ea/0x1ea
> > > [18344.236799]  [] ret_from_fork+0x3f/0x70
> > > [18344.236801]  [] ? kthread_create_on_node+0x1ea/0x1ea
> > 
> > > Hm. If I read this correctly, we see the following scenario:
> > > 
> > >  - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
> > >  - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
> > >  - __read_swap_cache_async() tries to allocate the page for swap in;
> > >  - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
> > >    given gfp_mask we could end up in direct reclaim.
> > >  - Lockdep already knows that reclaim sometimes (e.g. in case of
> > >    split_huge_page()) wants to take anon_vma lock on its own.
> > > 
> > > Therefore deadlock is possible.
> 
> Oh, thank you for working that out.  As usual with a lockdep trace,
> I knew it was telling me something important, but in a language I
> just couldn't understand without spending much longer to decode it.
> Yes, wrong to call do_swap_page() while holding anon_vma lock.
> 
> > 
> 

Re: [linux-next] khugepaged inconsistent lock state

2015-09-21 Thread Hugh Dickins
On Mon, 21 Sep 2015, Kirill A. Shutemov wrote:
> On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> > Hi,
> > 
> > 4.3.0-rc1-next-20150918
> > 
> > [18344.236625] =
> > [18344.236628] [ INFO: inconsistent lock state ]
> > [18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not 
> > tainted
> > [18344.236636] -
> > [18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> > [18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > [18344.236648]  (&anon_vma->rwsem){?.}, at: [] 
> > khugepaged+0x8b0/0x1987
> > [18344.236662] {IN-RECLAIM_FS-W} state was registered at:
> > [18344.23]   [] __lock_acquire+0x8e2/0x1183
> > [18344.236673]   [] lock_acquire+0x10b/0x1a6
> > [18344.236678]   [] down_write+0x3b/0x6a
> > [18344.236686]   [] split_huge_page_to_list+0x5b/0x61f
> > [18344.236689]   [] add_to_swap+0x37/0x78
> > [18344.236691]   [] shrink_page_list+0x4c2/0xb9a
> > [18344.236694]   [] shrink_inactive_list+0x371/0x5d9
> > [18344.236696]   [] shrink_lruvec+0x410/0x5ae
> > [18344.236698]   [] shrink_zone+0x57/0x140
> > [18344.236700]   [] kswapd+0x6a5/0x91b
> > [18344.236702]   [] kthread+0x107/0x10f
> > [18344.236706]   [] ret_from_fork+0x3f/0x70
> > [18344.236708] irq event stamp: 6517947
> > [18344.236709] hardirqs last  enabled at (6517947): [] 
> > get_page_from_freelist+0x362/0x59e
> > [18344.236713] hardirqs last disabled at (6517946): [] 
> > _raw_spin_lock_irqsave+0x18/0x51
> > [18344.236715] softirqs last  enabled at (6507072): [] 
> > __do_softirq+0x2df/0x3f5
> > [18344.236719] softirqs last disabled at (6507055): [] 
> > irq_exit+0x40/0x94
> > [18344.236722] 
> >other info that might help us debug this:
> > [18344.236723]  Possible unsafe locking scenario:
> > 
> > [18344.236724]CPU0
> > [18344.236725]
> > [18344.236726]   lock(&anon_vma->rwsem);
> > [18344.236728]   
> > [18344.236729] lock(&anon_vma->rwsem);
> > [18344.236731] 
> > *** DEADLOCK ***
> > 
> > [18344.236733] 2 locks held by khugepaged/32:
> > [18344.236733]  #0:  (&mm->mmap_sem){++}, at: [] 
> > khugepaged+0x5cf/0x1987
> > [18344.236738]  #1:  (&anon_vma->rwsem){?.}, at: [] 
> > khugepaged+0x8b0/0x1987
> > [18344.236741] 
> >stack backtrace:
> > [18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 
> > 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
> > [18344.236747]   880132827a00 81230867 
> > 8237ba90
> > [18344.236750]  880132827a38 810ea9b9 000a 
> > 8801333b52e0
> > [18344.236753]  8801333b4c00 8107b3ce 000a 
> > 880132827a78
> > [18344.236755] Call Trace:
> > [18344.236758]  [] dump_stack+0x4e/0x79
> > [18344.236761]  [] print_usage_bug.part.24+0x259/0x268
> > [18344.236763]  [] ? 
> > print_shortest_lock_dependencies+0x180/0x180
> > [18344.236765]  [] mark_lock+0x381/0x567
> > [18344.236766]  [] mark_held_locks+0x5e/0x74
> > [18344.236768]  [] lockdep_trace_alloc+0xb0/0xb3
> > [18344.236771]  [] __alloc_pages_nodemask+0x99/0x856
> > [18344.236772]  [] ? find_get_entry+0x14b/0x17a
> > [18344.236774]  [] ? find_get_entry+0x168/0x17a
> > [18344.236777]  [] __read_swap_cache_async+0x7b/0x1aa
> > [18344.236778]  [] read_swap_cache_async+0x15/0x2d
> > [18344.236780]  [] swapin_readahead+0x11a/0x16a
> > [18344.236783]  [] do_swap_page+0xa7/0x36b
> > [18344.236784]  [] ? do_swap_page+0xa7/0x36b
> > [18344.236787]  [] khugepaged+0x8f9/0x1987
> > [18344.236790]  [] ? wait_woken+0x88/0x88
> > [18344.236792]  [] ? maybe_pmd_mkwrite+0x1a/0x1a
> > [18344.236794]  [] kthread+0x107/0x10f
> > [18344.236797]  [] ? kthread_create_on_node+0x1ea/0x1ea
> > [18344.236799]  [] ret_from_fork+0x3f/0x70
> > [18344.236801]  [] ? kthread_create_on_node+0x1ea/0x1ea
> 
> Hm. If I read this correctly, we see the following scenario:
> 
>  - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
>  - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
>  - __read_swap_cache_async() tries to allocate the page for swap in;
>  - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>    given gfp_mask we could end up in direct reclaim.
>  - Lockdep already knows that reclaim sometimes (e.g. in case of
>    split_huge_page()) wants to take anon_vma lock on its own.
> 
> Therefore deadlock is possible.

Oh, thank you for working that out.  As usual with a lockdep trace,
I knew it was telling me something important, but in a language I
just couldn't understand without spending much longer to decode it.
Yes, wrong to call do_swap_page() while holding anon_vma lock.

> 
> I see two ways to fix this:
> 
>  - take anon_vma lock *after* __collapse_huge_page_swapin() in
>    collapse_huge_page(): I don't really see why we need the lock
>    during swapin;

Agreed.

>  - respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
>

Re: [linux-next] khugepaged inconsistent lock state

2015-09-21 Thread Kirill A. Shutemov
On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> 4.3.0-rc1-next-20150918
> 
> [18344.236625] =
> [18344.236628] [ INFO: inconsistent lock state ]
> [18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not 
> tainted
> [18344.236636] -
> [18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> [18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [18344.236648]  (&anon_vma->rwsem){?.}, at: [] 
> khugepaged+0x8b0/0x1987
> [18344.236662] {IN-RECLAIM_FS-W} state was registered at:
> [18344.23]   [] __lock_acquire+0x8e2/0x1183
> [18344.236673]   [] lock_acquire+0x10b/0x1a6
> [18344.236678]   [] down_write+0x3b/0x6a
> [18344.236686]   [] split_huge_page_to_list+0x5b/0x61f
> [18344.236689]   [] add_to_swap+0x37/0x78
> [18344.236691]   [] shrink_page_list+0x4c2/0xb9a
> [18344.236694]   [] shrink_inactive_list+0x371/0x5d9
> [18344.236696]   [] shrink_lruvec+0x410/0x5ae
> [18344.236698]   [] shrink_zone+0x57/0x140
> [18344.236700]   [] kswapd+0x6a5/0x91b
> [18344.236702]   [] kthread+0x107/0x10f
> [18344.236706]   [] ret_from_fork+0x3f/0x70
> [18344.236708] irq event stamp: 6517947
> [18344.236709] hardirqs last  enabled at (6517947): [] 
> get_page_from_freelist+0x362/0x59e
> [18344.236713] hardirqs last disabled at (6517946): [] 
> _raw_spin_lock_irqsave+0x18/0x51
> [18344.236715] softirqs last  enabled at (6507072): [] 
> __do_softirq+0x2df/0x3f5
> [18344.236719] softirqs last disabled at (6507055): [] 
> irq_exit+0x40/0x94
> [18344.236722] 
>other info that might help us debug this:
> [18344.236723]  Possible unsafe locking scenario:
> 
> [18344.236724]CPU0
> [18344.236725]
> [18344.236726]   lock(&anon_vma->rwsem);
> [18344.236728]   
> [18344.236729] lock(&anon_vma->rwsem);
> [18344.236731] 
> *** DEADLOCK ***
> 
> [18344.236733] 2 locks held by khugepaged/32:
> [18344.236733]  #0:  (&mm->mmap_sem){++}, at: [] 
> khugepaged+0x5cf/0x1987
> [18344.236738]  #1:  (&anon_vma->rwsem){?.}, at: [] 
> khugepaged+0x8b0/0x1987
> [18344.236741] 
>stack backtrace:
> [18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 
> 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
> [18344.236747]   880132827a00 81230867 
> 8237ba90
> [18344.236750]  880132827a38 810ea9b9 000a 
> 8801333b52e0
> [18344.236753]  8801333b4c00 8107b3ce 000a 
> 880132827a78
> [18344.236755] Call Trace:
> [18344.236758]  [] dump_stack+0x4e/0x79
> [18344.236761]  [] print_usage_bug.part.24+0x259/0x268
> [18344.236763]  [] ? 
> print_shortest_lock_dependencies+0x180/0x180
> [18344.236765]  [] mark_lock+0x381/0x567
> [18344.236766]  [] mark_held_locks+0x5e/0x74
> [18344.236768]  [] lockdep_trace_alloc+0xb0/0xb3
> [18344.236771]  [] __alloc_pages_nodemask+0x99/0x856
> [18344.236772]  [] ? find_get_entry+0x14b/0x17a
> [18344.236774]  [] ? find_get_entry+0x168/0x17a
> [18344.236777]  [] __read_swap_cache_async+0x7b/0x1aa
> [18344.236778]  [] read_swap_cache_async+0x15/0x2d
> [18344.236780]  [] swapin_readahead+0x11a/0x16a
> [18344.236783]  [] do_swap_page+0xa7/0x36b
> [18344.236784]  [] ? do_swap_page+0xa7/0x36b
> [18344.236787]  [] khugepaged+0x8f9/0x1987
> [18344.236790]  [] ? wait_woken+0x88/0x88
> [18344.236792]  [] ? maybe_pmd_mkwrite+0x1a/0x1a
> [18344.236794]  [] kthread+0x107/0x10f
> [18344.236797]  [] ? kthread_create_on_node+0x1ea/0x1ea
> [18344.236799]  [] ret_from_fork+0x3f/0x70
> [18344.236801]  [] ? kthread_create_on_node+0x1ea/0x1ea

Hm. If I read this correctly, we see the following scenario:

 - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
 - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
 - __read_swap_cache_async() tries to allocate the page for swap in;
 - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
   given gfp_mask we could end up in direct reclaim.
 - Lockdep already knows that reclaim sometimes (e.g. in case of
   split_huge_page()) wants to take anon_vma lock on its own.

Therefore deadlock is possible.
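
The check described in the scenario above can be modelled in a few lines.
This is a rough sketch, not lockdep's implementation: the two usage bits are
hypothetical names that only mirror the {IN-RECLAIM_FS-W} and
{RECLAIM_FS-ON-W} states printed in the report, and struct lock_class_model
and mark_usage() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical usage bits for one lock class. The names only echo the
 * states shown in the report; they are not the kernel's definitions.
 */
enum usage_bit {
	USED_IN_RECLAIM   = 1 << 0,	/* lock was taken from inside reclaim */
	HELD_OVER_RECLAIM = 1 << 1,	/* lock was held across an allocation
					 * that is allowed to enter reclaim */
};

struct lock_class_model {
	const char *name;
	unsigned int usage;
};

/*
 * Record a newly observed usage of the lock class. Returns false once both
 * usages have been seen, which is the inconsistent state: reclaim may then
 * try to take a lock that the allocating task already holds.
 */
bool mark_usage(struct lock_class_model *cls, enum usage_bit bit)
{
	const unsigned int both = USED_IN_RECLAIM | HELD_OVER_RECLAIM;

	assert(cls != 0);
	cls->usage |= bit;
	return (cls->usage & both) != both;
}
```

In this model the kswapd path through split_huge_page_to_list() would record
USED_IN_RECLAIM first, and the later khugepaged allocation under the lock
would record HELD_OVER_RECLAIM, at which point mark_usage() returns false.
Neither path has to deadlock for the conflict to be noticed.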

I see two ways to fix this:

 - take anon_vma lock *after* __collapse_huge_page_swapin() in
   collapse_huge_page(): I don't really see why we need the lock
   during swapin;
 - respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
   gfp_mask for swapin_readahead() in this case.

I guess it could be beneficial to do both.
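
The first option can be illustrated with another small user-space
sketch. It assumes only what is said above: swapin may allocate with a
mask that allows direct reclaim, and reclaim may want the anon_vma
lock. The flag variables and the functions model_swapin(),
collapse_lock_first() and collapse_swapin_first() are hypothetical
stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel symbols. */
static bool anon_vma_held;          /* does the caller hold anon_vma lock? */
static bool reclaim_with_lock_held; /* did swapin allocate under it?       */

/* Swapin allocates with a mask that may enter direct reclaim. */
static void model_swapin(void)
{
    if (anon_vma_held)
        reclaim_with_lock_held = true;
}

/* Current order: lock first, then swapin. Returns true if safe. */
static bool collapse_lock_first(void)
{
    assert(!anon_vma_held);
    reclaim_with_lock_held = false;
    anon_vma_held = true;    /* stands for anon_vma_lock_write() */
    model_swapin();          /* stands for __collapse_huge_page_swapin() */
    anon_vma_held = false;
    return !reclaim_with_lock_held;
}

/* Proposed order: swapin first, then lock. Returns true if safe. */
static bool collapse_swapin_first(void)
{
    assert(!anon_vma_held);
    reclaim_with_lock_held = false;
    model_swapin();
    anon_vma_held = true;    /* pte work would happen under the lock */
    anon_vma_held = false;
    return !reclaim_with_lock_held;
}
```

In this model collapse_lock_first() reports the unsafe case and
collapse_swapin_first() does not, because no allocation happens while
the lock is held.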

Any comments?

-- 
 Kirill A. Shutemov


Re: [linux-next] khugepaged inconsistent lock state

2015-09-21 Thread Hugh Dickins
On Mon, 21 Sep 2015, Kirill A. Shutemov wrote:
> On Mon, Sep 21, 2015 at 01:46:00PM +0900, Sergey Senozhatsky wrote:
> > Hi,
> > 
> > 4.3.0-rc1-next-20150918
> > 
> > [18344.236625] =================================
> > [18344.236628] [ INFO: inconsistent lock state ]
> > [18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not tainted
> > [18344.236636] ---------------------------------
> > [18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
> > [18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > [18344.236648]  (&anon_vma->rwsem){++++?.}, at: [] khugepaged+0x8b0/0x1987
> > [18344.236662] {IN-RECLAIM_FS-W} state was registered at:
> > [18344.236666]   [] __lock_acquire+0x8e2/0x1183
> > [18344.236673]   [] lock_acquire+0x10b/0x1a6
> > [18344.236678]   [] down_write+0x3b/0x6a
> > [18344.236686]   [] split_huge_page_to_list+0x5b/0x61f
> > [18344.236689]   [] add_to_swap+0x37/0x78
> > [18344.236691]   [] shrink_page_list+0x4c2/0xb9a
> > [18344.236694]   [] shrink_inactive_list+0x371/0x5d9
> > [18344.236696]   [] shrink_lruvec+0x410/0x5ae
> > [18344.236698]   [] shrink_zone+0x57/0x140
> > [18344.236700]   [] kswapd+0x6a5/0x91b
> > [18344.236702]   [] kthread+0x107/0x10f
> > [18344.236706]   [] ret_from_fork+0x3f/0x70
> > [18344.236708] irq event stamp: 6517947
> > [18344.236709] hardirqs last  enabled at (6517947): [] get_page_from_freelist+0x362/0x59e
> > [18344.236713] hardirqs last disabled at (6517946): [] _raw_spin_lock_irqsave+0x18/0x51
> > [18344.236715] softirqs last  enabled at (6507072): [] __do_softirq+0x2df/0x3f5
> > [18344.236719] softirqs last disabled at (6507055): [] irq_exit+0x40/0x94
> > [18344.236722] 
> > other info that might help us debug this:
> > [18344.236723]  Possible unsafe locking scenario:
> > 
> > [18344.236724]        CPU0
> > [18344.236725]        ----
> > [18344.236726]   lock(&anon_vma->rwsem);
> > [18344.236728]   <Interrupt>
> > [18344.236729]     lock(&anon_vma->rwsem);
> > [18344.236731] 
> > *** DEADLOCK ***
> > 
> > [18344.236733] 2 locks held by khugepaged/32:
> > [18344.236733]  #0:  (&mm->mmap_sem){++++++}, at: [] khugepaged+0x5cf/0x1987
> > [18344.236738]  #1:  (&anon_vma->rwsem){++++?.}, at: [] khugepaged+0x8b0/0x1987
> > [18344.236741] 
> > stack backtrace:
> > [18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
> > [18344.236747]  0000000000000000 ffff880132827a00 ffffffff81230867 ffffffff8237ba90
> > [18344.236750]  ffff880132827a38 ffffffff810ea9b9 000000000000000a ffff8801333b52e0
> > [18344.236753]  ffff8801333b4c00 ffffffff8107b3ce 000000000000000a ffff880132827a78
> > [18344.236755] Call Trace:
> > [18344.236758]  [] dump_stack+0x4e/0x79
> > [18344.236761]  [] print_usage_bug.part.24+0x259/0x268
> > [18344.236763]  [] ? print_shortest_lock_dependencies+0x180/0x180
> > [18344.236765]  [] mark_lock+0x381/0x567
> > [18344.236766]  [] mark_held_locks+0x5e/0x74
> > [18344.236768]  [] lockdep_trace_alloc+0xb0/0xb3
> > [18344.236771]  [] __alloc_pages_nodemask+0x99/0x856
> > [18344.236772]  [] ? find_get_entry+0x14b/0x17a
> > [18344.236774]  [] ? find_get_entry+0x168/0x17a
> > [18344.236777]  [] __read_swap_cache_async+0x7b/0x1aa
> > [18344.236778]  [] read_swap_cache_async+0x15/0x2d
> > [18344.236780]  [] swapin_readahead+0x11a/0x16a
> > [18344.236783]  [] do_swap_page+0xa7/0x36b
> > [18344.236784]  [] ? do_swap_page+0xa7/0x36b
> > [18344.236787]  [] khugepaged+0x8f9/0x1987
> > [18344.236790]  [] ? wait_woken+0x88/0x88
> > [18344.236792]  [] ? maybe_pmd_mkwrite+0x1a/0x1a
> > [18344.236794]  [] kthread+0x107/0x10f
> > [18344.236797]  [] ? kthread_create_on_node+0x1ea/0x1ea
> > [18344.236799]  [] ret_from_fork+0x3f/0x70
> > [18344.236801]  [] ? kthread_create_on_node+0x1ea/0x1ea
> 
> Hm. If I read this correctly, we see the following scenario:
> 
>  - khugepaged tries to swap in a page under mmap_sem and anon_vma lock;
>  - do_swap_page() calls swapin_readahead() with GFP_HIGHUSER_MOVABLE;
>  - __read_swap_cache_async() tries to allocate the page for swap in;
>  - lockdep_trace_alloc() in __alloc_pages_nodemask() notices that with
>    the given gfp_mask we could end up in direct reclaim;
>  - lockdep already knows that reclaim sometimes (e.g. in case of
>    split_huge_page()) wants to take anon_vma lock on its own.
> 
> Therefore a deadlock is possible.

Oh, thank you for working that out.  As usual with a lockdep trace,
I knew it was telling me something important, but in a language I
just couldn't understand without spending much longer to decode it.
Yes, wrong to call do_swap_page() while holding anon_vma lock.

> 
> I see two ways to fix this:
> 
>  - take anon_vma lock *after* __collapse_huge_page_swapin() in
>    collapse_huge_page(): I don't really see why we need the lock
>    during swapin;

Agreed.

>  - respect FAULT_FLAG_RETRY_NOWAIT in do_swap_page(): add GFP_NOWAIT to
>


[linux-next] khugepaged inconsistent lock state

2015-09-20 Thread Sergey Senozhatsky
Hi,

4.3.0-rc1-next-20150918

[18344.236625] =================================
[18344.236628] [ INFO: inconsistent lock state ]
[18344.236633] 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361 Not tainted
[18344.236636] ---------------------------------
[18344.236640] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[18344.236645] khugepaged/32 [HC0[0]:SC0[0]:HE1:SE1] takes:
[18344.236648]  (&anon_vma->rwsem){++++?.}, at: [] khugepaged+0x8b0/0x1987
[18344.236662] {IN-RECLAIM_FS-W} state was registered at:
[18344.236666]   [] __lock_acquire+0x8e2/0x1183
[18344.236673]   [] lock_acquire+0x10b/0x1a6
[18344.236678]   [] down_write+0x3b/0x6a
[18344.236686]   [] split_huge_page_to_list+0x5b/0x61f
[18344.236689]   [] add_to_swap+0x37/0x78
[18344.236691]   [] shrink_page_list+0x4c2/0xb9a
[18344.236694]   [] shrink_inactive_list+0x371/0x5d9
[18344.236696]   [] shrink_lruvec+0x410/0x5ae
[18344.236698]   [] shrink_zone+0x57/0x140
[18344.236700]   [] kswapd+0x6a5/0x91b
[18344.236702]   [] kthread+0x107/0x10f
[18344.236706]   [] ret_from_fork+0x3f/0x70
[18344.236708] irq event stamp: 6517947
[18344.236709] hardirqs last  enabled at (6517947): [] get_page_from_freelist+0x362/0x59e
[18344.236713] hardirqs last disabled at (6517946): [] _raw_spin_lock_irqsave+0x18/0x51
[18344.236715] softirqs last  enabled at (6507072): [] __do_softirq+0x2df/0x3f5
[18344.236719] softirqs last disabled at (6507055): [] irq_exit+0x40/0x94
[18344.236722] 
other info that might help us debug this:
[18344.236723]  Possible unsafe locking scenario:

[18344.236724]        CPU0
[18344.236725]        ----
[18344.236726]   lock(&anon_vma->rwsem);
[18344.236728]   <Interrupt>
[18344.236729]     lock(&anon_vma->rwsem);
[18344.236731] 
*** DEADLOCK ***

[18344.236733] 2 locks held by khugepaged/32:
[18344.236733]  #0:  (&mm->mmap_sem){++++++}, at: [] khugepaged+0x5cf/0x1987
[18344.236738]  #1:  (&anon_vma->rwsem){++++?.}, at: [] khugepaged+0x8b0/0x1987
[18344.236741] 
stack backtrace:
[18344.236744] CPU: 3 PID: 32 Comm: khugepaged Not tainted 4.3.0-rc1-next-20150918-dbg-00014-ge5128d0-dirty #361
[18344.236747]  0000000000000000 ffff880132827a00 ffffffff81230867 ffffffff8237ba90
[18344.236750]  ffff880132827a38 ffffffff810ea9b9 000000000000000a ffff8801333b52e0
[18344.236753]  ffff8801333b4c00 ffffffff8107b3ce 000000000000000a ffff880132827a78
[18344.236755] Call Trace:
[18344.236758]  [] dump_stack+0x4e/0x79
[18344.236761]  [] print_usage_bug.part.24+0x259/0x268
[18344.236763]  [] ? print_shortest_lock_dependencies+0x180/0x180
[18344.236765]  [] mark_lock+0x381/0x567
[18344.236766]  [] mark_held_locks+0x5e/0x74
[18344.236768]  [] lockdep_trace_alloc+0xb0/0xb3
[18344.236771]  [] __alloc_pages_nodemask+0x99/0x856
[18344.236772]  [] ? find_get_entry+0x14b/0x17a
[18344.236774]  [] ? find_get_entry+0x168/0x17a
[18344.236777]  [] __read_swap_cache_async+0x7b/0x1aa
[18344.236778]  [] read_swap_cache_async+0x15/0x2d
[18344.236780]  [] swapin_readahead+0x11a/0x16a
[18344.236783]  [] do_swap_page+0xa7/0x36b
[18344.236784]  [] ? do_swap_page+0xa7/0x36b
[18344.236787]  [] khugepaged+0x8f9/0x1987
[18344.236790]  [] ? wait_woken+0x88/0x88
[18344.236792]  [] ? maybe_pmd_mkwrite+0x1a/0x1a
[18344.236794]  [] kthread+0x107/0x10f
[18344.236797]  [] ? kthread_create_on_node+0x1ea/0x1ea
[18344.236799]  [] ret_from_fork+0x3f/0x70
[18344.236801]  [] ? kthread_create_on_node+0x1ea/0x1ea


-ss

