On 09/10/2014 03:36 PM, Hugh Dickins wrote:
>> migrate: debug patch to try identify race between migration completion and 
>> mprotect
>> > 
>> > A migration entry is marked as write if pte_write was true at the
>> > time the entry was created. The VMA protections are not double checked
>> > when migration entries are being removed but mprotect itself will mark
>> > write-migration-entries as read to avoid problems. It means we potentially
>> > take a spurious fault to mark these ptes write again but otherwise it's
>> > harmless.  Still, one dump indicates that this situation can actually
>> > happen so this debugging patch spits out a warning if the situation occurs
>> > and hopefully the resulting warning will contain a clue as to how exactly
>> > it happens
>> > 
>> > Not-signed-off
>> > ---
>> >  mm/migrate.c | 12 ++++++++++--
>> >  1 file changed, 10 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/mm/migrate.c b/mm/migrate.c
>> > index 09d489c..631725c 100644
>> > --- a/mm/migrate.c
>> > +++ b/mm/migrate.c
>> > @@ -146,8 +146,16 @@ static int remove_migration_pte(struct page *new, 
>> > struct vm_area_struct *vma,
>> >    pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
>> >    if (pte_swp_soft_dirty(*ptep))
>> >            pte = pte_mksoft_dirty(pte);
>> > -  if (is_write_migration_entry(entry))
>> > -          pte = pte_mkwrite(pte);
>> > +  if (is_write_migration_entry(entry)) {
>> > +          /*
>> > +           * This WARN_ON_ONCE is temporary for the purposes of seeing if
>> > +           * it's a case encountered by trinity in Sasha's testing
>> > +           */
>> > +          if (!(vma->vm_flags & (VM_WRITE)))
>> > +                  WARN_ON_ONCE(1);
>> > +          else
>> > +                  pte = pte_mkwrite(pte);
>> > +  }
>> >  #ifdef CONFIG_HUGETLB_PAGE
>> >    if (PageHuge(new)) {
>> >            pte = pte_mkhuge(pte);
>> > 
> Right, and Sasha  reports that that can fire, but he sees the bug
> with this patch in and without that firing.

I've changed that WARN_ON_ONCE() to a VM_BUG_ON_VMA() to get some useful VMA 
information
out, and got the following:

[ 4018.870776] vma ffff8801a0f1e800 start 00007f3fd0ca7000 end 00007f3fd16a7000
[ 4018.870776] next ffff8804e1b89800 prev ffff88008cd9a000 mm ffff88054b17d000
[ 4018.870776] prot 120 anon_vma ffff880bc858a200 vm_ops           (null)
[ 4018.870776] pgoff 41bc8 file           (null) private_data           (null)
[ 4018.879731] flags: 0x8100070(mayread|maywrite|mayexec|account)
[ 4018.881324] ------------[ cut here ]------------
[ 4018.882612] kernel BUG at mm/migrate.c:155!
[ 4018.883649] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 4018.889647] Dumping ftrace buffer:
[ 4018.890323]    (ftrace buffer empty)
[ 4018.890323] Modules linked in:
[ 4018.890323] CPU: 4 PID: 9966 Comm: trinity-main Tainted: G        W      
3.17.0-rc4-next-20140910-sasha-00042-ga4bad9b-dirty #1140
[ 4018.890323] task: ffff880695b83000 ti: ffff880560c44000 task.ti: 
ffff880560c44000
[ 4018.890323] RIP: 0010:[<ffffffff9b2fd4c1>]  [<ffffffff9b2fd4c1>] 
remove_migration_pte+0x3e1/0x3f0
[ 4018.890323] RSP: 0000:ffff880560c477c8  EFLAGS: 00010292
[ 4018.890323] RAX: 0000000000000001 RBX: 00007f3fd129b000 RCX: 0000000000000000
[ 4018.890323] RDX: 0000000000000001 RSI: ffffffff9e4ba395 RDI: 0000000000000001
[ 4018.890323] RBP: ffff880560c47800 R08: 0000000000000001 R09: 0000000000000001
[ 4018.890323] R10: 0000000000045401 R11: 0000000000000001 R12: ffff8801a0f1e800
[ 4018.890323] R13: ffff88054b17d000 R14: ffffea000478eb40 R15: ffff880122bcf070
[ 4018.890323] FS:  00007f3fd55bb700(0000) GS:ffff8803d6a00000(0000) 
knlGS:0000000000000000
[ 4018.890323] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4018.890323] CR2: 0000000000fcbca8 CR3: 0000000561bab000 CR4: 00000000000006a0
[ 4018.890323] DR0: 00000000006f0000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4018.890323] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 4018.890323] Stack:
[ 4018.890323]  ffffea00046ed980 ffff88011079c4d8 ffffea000478eb40 
ffff880560c47858
[ 4018.890323]  ffff88019fde0330 00000000000421bc ffff8801a0f1e800 
ffff880560c47848
[ 4018.890323]  ffffffff9b2d1b0f ffff880bc858a200 ffff880560c47850 
ffffea000478eb40
[ 4018.890323] Call Trace:
[ 4018.890323]  [<ffffffff9b2d1b0f>] rmap_walk+0x22f/0x380
[ 4018.890323]  [<ffffffff9b2fc841>] remove_migration_ptes+0x41/0x50
[ 4018.890323]  [<ffffffff9b2fd0e0>] ? 
__migration_entry_wait.isra.24+0x160/0x160
[ 4018.890323]  [<ffffffff9b2fd4d0>] ? remove_migration_pte+0x3f0/0x3f0
[ 4018.890323]  [<ffffffff9b2fe73b>] move_to_new_page+0x16b/0x230
[ 4018.890323]  [<ffffffff9b2d1e8c>] ? try_to_unmap+0x6c/0xf0
[ 4018.890323]  [<ffffffff9b2d08a0>] ? try_to_unmap_nonlinear+0x5c0/0x5c0
[ 4018.890323]  [<ffffffff9b2cf0a0>] ? invalid_migration_vma+0x30/0x30
[ 4018.890323]  [<ffffffff9b2d02e0>] ? page_remove_rmap+0x320/0x320
[ 4018.890323]  [<ffffffff9b2ff19c>] migrate_pages+0x85c/0x930
[ 4018.890323]  [<ffffffff9b2b8e20>] ? isolate_freepages_block+0x410/0x410
[ 4018.890323]  [<ffffffff9b2b7a60>] ? arch_local_save_flags+0x30/0x30
[ 4018.890323]  [<ffffffff9b2b9803>] compact_zone+0x4d3/0x8a0
[ 4018.890323]  [<ffffffff9b2b9c2f>] compact_zone_order+0x5f/0xa0
[ 4018.890323]  [<ffffffff9b2b9f87>] try_to_compact_pages+0x127/0x2f0
[ 4018.890323]  [<ffffffff9b298c98>] __alloc_pages_direct_compact+0x68/0x200
[ 4018.890323]  [<ffffffff9b2995af>] __alloc_pages_nodemask+0x77f/0xd90
[ 4018.890323]  [<ffffffff9b192fad>] ? sched_clock_local+0x1d/0x90
[ 4018.890323]  [<ffffffff9b2e8a1c>] alloc_pages_vma+0x13c/0x270
[ 4018.890323]  [<ffffffff9b305934>] ? do_huge_pmd_wp_page+0x494/0xc90
[ 4018.890323]  [<ffffffff9b305934>] do_huge_pmd_wp_page+0x494/0xc90
[ 4018.890323]  [<ffffffff9b308d40>] ? __mem_cgroup_count_vm_event+0xd0/0x240
[ 4018.890323]  [<ffffffff9b2c4b7d>] handle_mm_fault+0x8bd/0xc50
[ 4018.890323]  [<ffffffff9b1ba6e6>] ? __lock_is_held+0x56/0x80
[ 4018.890323]  [<ffffffff9b0afbc7>] __do_page_fault+0x1b7/0x660
[ 4018.890323]  [<ffffffff9b1b5c5e>] ? put_lock_stats.isra.13+0xe/0x30
[ 4018.890323]  [<ffffffff9b193f41>] ? vtime_account_user+0x91/0xa0
[ 4018.890323]  [<ffffffff9b28ac35>] ? context_tracking_user_exit+0xb5/0x1b0
[ 4018.890323]  [<ffffffff9bb55d33>] ? __this_cpu_preempt_check+0x13/0x20
[ 4018.890323]  [<ffffffff9b1b62e2>] ? trace_hardirqs_off_caller+0xe2/0x1b0
[ 4018.890323]  [<ffffffff9b0b0141>] trace_do_page_fault+0x51/0x2b0
[ 4018.890323]  [<ffffffff9b0a6e83>] do_async_page_fault+0x63/0xd0
[ 4018.890323]  [<ffffffff9e4bccf8>] async_page_fault+0x28/0x30
[ 4018.890323] Code: 0f 0b 48 c7 c6 b0 f2 71 9f 4c 89 f7 e8 b9 79 f9 ff 0f 0b 
48 83 c9 02 41 f6 44 24 50 02 0f 85 70 fe ff ff 4c 89 e7 e8 af 4a f9 ff <0f> 0b 
0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
[ 4018.890323] RIP  [<ffffffff9b2fd4c1>] remove_migration_pte+0x3e1/0x3f0
[ 4018.890323]  RSP <ffff880560c477c8>

And from a different log:

[ 2035.602565] vma ffff88054b666c00 start 00007f561ffad000 end 00007f56203ad000
[ 2035.602565] next ffff88054b665a00 prev ffff8801f7a31800 mm ffff8804f207a000
[ 2035.602565] prot 120 anon_vma           (null) vm_ops ffffffffb5671e80
[ 2035.602565] pgoff 0 file ffff88054b430a80 private_data           (null)
[ 2035.608469] flags: 0x80000f8(shared|mayread|maywrite|mayexec|mayshare)


And on a maybe related note, I've started seeing the following today. It may
be because we fixed mbind() in trinity but it could also be related to
this issue (free_pgtables() is in the call chain). If you don't think it has
anything to do with it let me know and I'll start a new thread:

[ 1195.996803] BUG: unable to handle kernel NULL pointer dereference at         
  (null)
[ 1196.001744] IP: __rb_erase_color (include/linux/rbtree_augmented.h:107 
lib/rbtree.c:229 lib/rbtree.c:367)
[ 1196.001744] PGD 196787067 PUD 117522067 PMD 0
[ 1196.001744] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 1196.001744] Dumping ftrace buffer:
[ 1196.001744]    (ftrace buffer empty)
[ 1196.001744] Modules linked in:
[ 1196.001744] CPU: 5 PID: 5724 Comm: trinity-c890 Not tainted 
3.17.0-rc4-next-20140910-sasha-00042-ga4bad9b-dirty #1140
[ 1196.001744] task: ffff88024207b000 ti: ffff8808b25e0000 task.ti: 
ffff8808b25e0000
[ 1196.001744] RIP: __rb_erase_color (include/linux/rbtree_augmented.h:107 
lib/rbtree.c:229 lib/rbtree.c:367)
[ 1196.001744] RSP: 0018:ffff8808b25e3d18  EFLAGS: 00010286
[ 1196.001744] RAX: ffff8808890ed059 RBX: ffff88091f75f458 RCX: 0000000000000000
[ 1196.001744] RDX: 0000000000000000 RSI: ffff8800b83396c8 RDI: ffff8808890ed058
[ 1196.001744] RBP: ffff8808b25e3d40 R08: ffff8808890ed058 R09: 0000000000000000
[ 1196.001744] R10: 0000000000000000 R11: ffff88085697d658 R12: ffff8808890ed058
[ 1196.001744] R13: ffffffff912ba700 R14: ffff8800b83396c8 R15: 0000000000000000
[ 1196.001744] FS:  00007f00e4458700(0000) GS:ffff880492c00000(0000) 
knlGS:0000000000000000
[ 1196.001744] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1196.001744] CR2: 0000000000000000 CR3: 0000000196786000 CR4: 00000000000006a0
[ 1196.001744] DR0: 00000000006f0000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1196.001744] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000070602
[ 1196.001744] Stack:
[ 1196.001744]  ffff88085697d600 ffff8800b5d13480 ffff8800b83396e0 
ffff8800b8339660
[ 1196.001744]  ffff88085697d600 ffff8808b25e3d58 ffffffff912ba9e4 
ffff88085697d600
[ 1196.001744]  ffff8808b25e3d78 ffffffff912c8446 ffff88085697d600 
ffff8800b5d13480
[ 1196.001744] Call Trace:
[ 1196.001744] vma_interval_tree_remove (mm/interval_tree.c:24)
[ 1196.001744] __remove_shared_vm_struct (mm/mmap.c:232)
[ 1196.001744] unlink_file_vma (mm/mmap.c:246)
[ 1196.001744] free_pgtables (mm/memory.c:547)
[ 1196.001744] exit_mmap (mm/mmap.c:2826)
[ 1196.001744] mmput (kernel/fork.c:654)
[ 1196.001744] do_exit (./arch/x86/include/asm/thread_info.h:168 
kernel/exit.c:461 kernel/exit.c:746)
[ 1196.001744] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 1196.001744] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
[ 1196.001744] do_group_exit (./arch/x86/include/asm/current.h:14 
kernel/exit.c:874)
[ 1196.001744] SyS_exit_group (kernel/exit.c:900)
[ 1196.001744] tracesys (arch/x86/kernel/entry_64.S:542)
[ 1196.001744] Code: e2 49 89 c4 49 8b 5c 24 08 48 39 d3 0f 84 e2 00 00 00 f6 
03 01 75 ad 4c 8b 7b 10 4c 89 e0 48 83 c8 01 4d 89 7c 24 08 4c 89 63 10 <49> 89 
07 49 8b 04 24 48 89 03 48 83 e0 fc 49 89 1c 24 0f 84 69
All code
========
   0:   e2 49                   loop   0x4b
   2:   89 c4                   mov    %eax,%esp
   4:   49 8b 5c 24 08          mov    0x8(%r12),%rbx
   9:   48 39 d3                cmp    %rdx,%rbx
   c:   0f 84 e2 00 00 00       je     0xf4
  12:   f6 03 01                testb  $0x1,(%rbx)
  15:   75 ad                   jne    0xffffffffffffffc4
  17:   4c 8b 7b 10             mov    0x10(%rbx),%r15
  1b:   4c 89 e0                mov    %r12,%rax
  1e:   48 83 c8 01             or     $0x1,%rax
  22:   4d 89 7c 24 08          mov    %r15,0x8(%r12)
  27:   4c 89 63 10             mov    %r12,0x10(%rbx)
  2b:*  49 89 07                mov    %rax,(%r15)              <-- trapping 
instruction
  2e:   49 8b 04 24             mov    (%r12),%rax
  32:   48 89 03                mov    %rax,(%rbx)
  35:   48 83 e0 fc             and    $0xfffffffffffffffc,%rax
  39:   49 89 1c 24             mov    %rbx,(%r12)
  3d:   0f                      .byte 0xf
  3e:   84 69 00                test   %ch,0x0(%rcx)

Code starting with the faulting instruction
===========================================
   0:   49 89 07                mov    %rax,(%r15)
   3:   49 8b 04 24             mov    (%r12),%rax
   7:   48 89 03                mov    %rax,(%rbx)
   a:   48 83 e0 fc             and    $0xfffffffffffffffc,%rax
   e:   49 89 1c 24             mov    %rbx,(%r12)
  12:   0f                      .byte 0xf
  13:   84 69 00                test   %ch,0x0(%rcx)
[ 1196.001744] RIP __rb_erase_color (include/linux/rbtree_augmented.h:107 
lib/rbtree.c:229 lib/rbtree.c:367)
[ 1196.001744]  RSP <ffff8808b25e3d18>
[ 1196.001744] CR2: 0000000000000000
[ 1196.001744] ---[ end trace 67e0103d243f3c04 ]---
[ 1196.050031] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000008
[ 1196.050031] IP: __rb_insert_augmented (lib/rbtree.c:94 lib/rbtree.c:411)
[ 1196.050031] PGD a3ea09067 PUD a69b38067 PMD 0
[ 1196.050031] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
[ 1196.050031] Dumping ftrace buffer:
[ 1196.050031]    (ftrace buffer empty)
[ 1196.050031] Modules linked in:
[ 1196.050031] CPU: 3 PID: 5688 Comm: trinity-c802 Tainted: G      D        
3.17.0-rc4-next-20140910-sasha-00042-ga4bad9b-dirty #1140
[ 1196.050031] task: ffff880a508f8000 ti: ffff880a6950c000 task.ti: 
ffff880a6950c000
[ 1196.050031] RIP: __rb_insert_augmented (lib/rbtree.c:94 lib/rbtree.c:411)
[ 1196.050031] RSP: 0018:ffff880a6950fd68  EFLAGS: 00010246
[ 1196.050031] RAX: ffff88091f75a058 RBX: 0000000000000000 RCX: 0000000000000000
[ 1196.050031] RDX: ffffffff912ba700 RSI: ffff8800b4cb3718 RDI: ffff8802d786ca58
[ 1196.050031] RBP: ffff880a6950fd90 R08: ffff8802d786ca00 R09: ffff8800b4cb3718
[ 1196.050031] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8801fd067400
[ 1196.050031] R13: ffff8802d786ca00 R14: ffff8800b4cb3718 R15: 00007f00e44589d0
[ 1196.050031] FS:  00007f00e4458700(0000) GS:ffff88031ac00000(0000) 
knlGS:0000000000000000
[ 1196.050031] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1196.050031] CR2: 0000000000000008 CR3: 0000000a62597000 CR4: 00000000000006a0
[ 1196.050031] DR0: 00000000006f0000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1196.050031] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000070602
[ 1196.050031] Stack:
[ 1196.050031]  ffffffff9115499e ffff88028bc7a000 ffff8801fd067400 
ffff8802d786ca00
[ 1196.050031]  ffff8800b4cb3730 ffff880a6950fda0 ffffffff912babfd 
ffff880a6950fe70
[ 1196.050031]  ffffffff91154a77 ffff8800b4cb36b0 000000003ebe3540 
0000000000000000
[ 1196.050031] Call Trace:
[ 1196.050031] ? copy_process (kernel/fork.c:409 kernel/fork.c:859 
kernel/fork.c:913 kernel/fork.c:1381)
[ 1196.050031] vma_interval_tree_insert_after (mm/interval_tree.c:60)
[ 1196.050031] copy_process (kernel/fork.c:442 kernel/fork.c:859 
kernel/fork.c:913 kernel/fork.c:1381)
[ 1196.050031] do_fork (kernel/fork.c:1644)
[ 1196.050031] ? context_tracking_user_exit 
(./arch/x86/include/asm/paravirt.h:809 (discriminator 2) 
kernel/context_tracking.c:184 (discriminator 2))
[ 1196.050031] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 1196.050031] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2559 
kernel/locking/lockdep.c:2601)
[ 1196.050031] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
[ 1196.050031] SyS_clone (kernel/fork.c:1733)
[ 1196.050031] stub_clone (arch/x86/kernel/entry_64.S:637)
[ 1196.050031] ? tracesys (arch/x86/kernel/entry_64.S:542)
[ 1196.050031] Code: ff ff 0f 1f 00 48 8b 07 48 85 c0 0f 84 a4 01 00 00 55 48 
89 e5 41 56 49 89 f6 41 55 41 54 53 48 83 ec 08 48 8b 18 f6 c3 01 75 6b <48> 8b 
4b 08 49 89 d8 48 39 c8 0f 84 a5 00 00 00 48 85 c9 74 05
All code
========
   0:   ff                      (bad)
   1:   ff 0f                   decl   (%rdi)
   3:   1f                      (bad)
   4:   00 48 8b                add    %cl,-0x75(%rax)
   7:   07                      (bad)
   8:   48 85 c0                test   %rax,%rax
   b:   0f 84 a4 01 00 00       je     0x1b5
  11:   55                      push   %rbp
  12:   48 89 e5                mov    %rsp,%rbp
  15:   41 56                   push   %r14
  17:   49 89 f6                mov    %rsi,%r14
  1a:   41 55                   push   %r13
  1c:   41 54                   push   %r12
  1e:   53                      push   %rbx
  1f:   48 83 ec 08             sub    $0x8,%rsp
  23:   48 8b 18                mov    (%rax),%rbx
  26:   f6 c3 01                test   $0x1,%bl
  29:   75 6b                   jne    0x96
  2b:*  48 8b 4b 08             mov    0x8(%rbx),%rcx           <-- trapping 
instruction
  2f:   49 89 d8                mov    %rbx,%r8
  32:   48 39 c8                cmp    %rcx,%rax
  35:   0f 84 a5 00 00 00       je     0xe0
  3b:   48 85 c9                test   %rcx,%rcx
  3e:   74 05                   je     0x45
        ...

Code starting with the faulting instruction
===========================================
   0:   48 8b 4b 08             mov    0x8(%rbx),%rcx
   4:   49 89 d8                mov    %rbx,%r8
   7:   48 39 c8                cmp    %rcx,%rax
   a:   0f 84 a5 00 00 00       je     0xb5
  10:   48 85 c9                test   %rcx,%rcx
  13:   74 05                   je     0x1a
        ...
[ 1196.050031] RIP __rb_insert_augmented (lib/rbtree.c:94 lib/rbtree.c:411)
[ 1196.050031]  RSP <ffff880a6950fd68>
[ 1196.050031] CR2: 0000000000000008


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to