On Thu, Sep 05, 2013 at 05:27:46PM -0400, Naoya Horiguchi wrote:
> Thp related code also uses per process mm->page_table_lock now.
> So making it fine-grained can provide better performance.
>
> This patch makes thp support split page table lock by using page->ptl
> of the pages storing "pmd_trans_huge" pmds.
>
> Some functions like pmd_trans_huge_lock() and page_check_address_pmd()
> are expected by their caller to pass back the pointer of ptl, so this
> patch adds to those functions new arguments for that. Rather than that,
> this patch gives only straightforward replacement.
>
> ChangeLog v3:
> - fixed argument of huge_pmd_lockptr() in copy_huge_pmd()
> - added missing declaration of ptl in do_huge_pmd_anonymous_page()

I've applied these and tested them using the same test program that I
used when I was working on the same issue, and I'm running into some
bugs. Here's a stack trace:

general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 268 PID: 32381 Comm: memscale Not tainted 3.11.0-medusa-03121-g757f8ca #184
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffff880fbdd82180 ti: ffff880fc0c5a000 task.ti: ffff880fc0c5a000
RIP: 0010:[<ffffffff810e3eef>] [<ffffffff810e3eef>] pgtable_trans_huge_withdraw+0x38/0x60
RSP: 0018:ffff880fc0c5bc88 EFLAGS: 00010297
RAX: ffffea17cebe8838 RBX: 00000015309bd000 RCX: ffffea01f623b028
RDX: dead000000100100 RSI: ffff8dcf77d84c30 RDI: ffff880fbda67580
RBP: ffff880fc0c5bc88 R08: 0000000000000013 R09: 0000000000014da0
R10: ffff880fc0c5bc88 R11: ffff888f7efda000 R12: ffff8dcf77d84c30
R13: ffff880fc0c5bdf8 R14: 800005cf401ff067 R15: ffff8b4de5fabff8
FS: 0000000000000000(0000) GS:ffff880fffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff768b0b8 CR3: 0000000001a0b000 CR4: 00000000000407e0
Stack:
 ffff880fc0c5bcc8 ffffffff810f7643 ffff880fc0c5bcc8 ffffffff810d8297
 ffffea1456237510 00007fc7b0e00000 0000000000000000 00007fc7b0c00000
 ffff880fc0c5bda8 ffffffff810d85ba ffff880fc0c5bd48 ffff880fc0c5bd68
Call Trace:
 [<ffffffff810f7643>] zap_huge_pmd+0x4c/0x101
 [<ffffffff810d8297>] ? tlb_flush_mmu+0x58/0x75
 [<ffffffff810d85ba>] unmap_single_vma+0x306/0x7d6
 [<ffffffff810d8ad9>] unmap_vmas+0x4f/0x82
 [<ffffffff810dab5e>] exit_mmap+0x8b/0x113
 [<ffffffff810a9743>] ? __delayacct_add_tsk+0x170/0x182
 [<ffffffff8103c609>] mmput+0x3e/0xc4
 [<ffffffff8104088c>] do_exit+0x380/0x907
 [<ffffffff810fb89c>] ? vfs_write+0x149/0x1a3
 [<ffffffff81040e85>] do_group_exit+0x72/0x9b
 [<ffffffff81040ec0>] SyS_exit_group+0x12/0x16
 [<ffffffff814f52d2>] system_call_fastpath+0x16/0x1b
Code: 51 20 48 8d 41 20 48 39 c2 75 0d 48 c7 87 28 03 00 00 00 00 00 00 eb 36 48 8d 42 e0 48 89 87 28 03 00 00 48 8b 51 20 48 8b 41 28 <48> 89 42 08 48 89 10 48 ba 00 01 10 00 00 00 ad de 48 b8 00 02
RIP [<ffffffff810e3eef>] pgtable_trans_huge_withdraw+0x38/0x60
 RSP <ffff880fc0c5bc88>
---[ end trace e5413b388b6ea448 ]---
Fixing recursive fault but reboot is needed!
general protection fault: 0000 [#2] SMP
Modules linked in:
CPU: 268 PID: 1722 Comm: kworker/268:1 Tainted: G D 3.11.0-medusa-03121-g757f8ca #184
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
Workqueue: events vmstat_update
task: ffff880fc1a74280 ti: ffff880fc1a76000 task.ti: ffff880fc1a76000
RIP: 0010:[<ffffffff810bcdcb>] [<ffffffff810bcdcb>] free_pcppages_bulk+0x97/0x329
RSP: 0018:ffff880fc1a77c98 EFLAGS: 00010082
RAX: ffff880fffd94d68 RBX: dead0000002001e0 RCX: ffff880fffd94d50
RDX: ffff880fffd94d68 RSI: 000000000000001f RDI: ffff888f7efdac68
RBP: ffff880fc1a77cf8 R08: 0000000000000400 R09: ffffffff81a8bf00
R10: ffff884f7efdac00 R11: ffffffff81009bae R12: dead000000200200
R13: ffff888f7efdac00 R14: 000000000000001f R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff880fffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff768b0b8 CR3: 0000000001a0b000 CR4: 00000000000407e0
Stack:
 ffff880fc1a77ce8 ffff880fffd94d68 0000000000000010 ffff880fffd94d50
 0000001ff9276a68 ffff880fffd94d60 0000000000000000 000000000000001f
 ffff880fffd94d50 0000000000000292 ffff880fc1a77d38 ffff880fffd95d05
Call Trace:
 [<ffffffff810bd149>] drain_zone_pages+0x33/0x42
 [<ffffffff810cd5a6>] refresh_cpu_vm_stats+0xcc/0x11e
 [<ffffffff810cd609>] vmstat_update+0x11/0x43
 [<ffffffff8105350f>] process_one_work+0x260/0x389
 [<ffffffff8105381a>] worker_thread+0x1e2/0x332
 [<ffffffff81053638>] ? process_one_work+0x389/0x389
 [<ffffffff810579df>] kthread+0xb3/0xbd
 [<ffffffff81053638>] ? process_one_work+0x389/0x389
 [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
 [<ffffffff814f522c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 48 89 55 c8 48 39 14 08 74 ce 41 83 fe 03 44 0f 44 75 c4 48 83 c2 08 48 89 45 b0 48 89 55 a8 48 8b 45 a8 4c 8b 20 49 8d 5c 24 e0 <48> 8b 53 20 48 8b 43 28 48 89 42 08 48 89 10 48 ba 00 01 10 00
RIP [<ffffffff810bcdcb>] free_pcppages_bulk+0x97/0x329
 RSP <ffff880fc1a77c98>
---[ end trace e5413b388b6ea449 ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff8105742c>] kthread_data+0xb/0x11
PGD 1a0c067 PUD 1a0e067 PMD 0
Oops: 0000 [#3] SMP
Modules linked in:
CPU: 268 PID: 1722 Comm: kworker/268:1 Tainted: G D 3.11.0-medusa-03121-g757f8ca #184
Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
task: ffff880fc1a74280 ti: ffff880fc1a76000 task.ti: ffff880fc1a76000
RIP: 0010:[<ffffffff8105742c>] [<ffffffff8105742c>] kthread_data+0xb/0x11
RSP: 0018:ffff880fc1a77948 EFLAGS: 00010092
RAX: 0000000000000000 RBX: 000000000000010c RCX: 0000000000000000
RDX: 000000000000000f RSI: 000000000000010c RDI: ffff880fc1a74280
RBP: ffff880fc1a77948 R08: 00000000000442c8 R09: 0000000000000000
R10: dead000000200200 R11: ffff880fc1a742e8 R12: ffff880fc1a74868
R13: ffff880fffd91cc0 R14: ffff880ff9b7a040 R15: 000000000000010c
FS: 0000000000000000(0000) GS:ffff880fffd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000001a0b000 CR4: 00000000000407e0
Stack:
 ffff880fc1a77968 ffffffff8105151f ffff880fc1a77968 ffff880fc1a74280
 ffff880fc1a77ab8 ffffffff814f2e98 ffff880fc1a76010 0000000000004000
 ffff880fc1a74280 0000000000011cc0 ffff880fc1a77fd8 ffff880fc1a77fd8
Call Trace:
 [<ffffffff8105151f>] wq_worker_sleeping+0x10/0x82
 [<ffffffff814f2e98>] __schedule+0x1b7/0x8f7
 [<ffffffff8135d4bd>] ? mix_pool_bytes+0x4a/0x56
 [<ffffffff810a5d05>] ? call_rcu_sched+0x16/0x18
 [<ffffffff8103f708>] ? release_task+0x3a7/0x3bf
 [<ffffffff814f36b5>] schedule+0x61/0x63
 [<ffffffff81040e0f>] do_exit+0x903/0x907
 [<ffffffff8100529a>] oops_end+0xb9/0xc1
 [<ffffffff81005393>] die+0x55/0x5e
 [<ffffffff8100341a>] do_general_protection+0x93/0x139
 [<ffffffff814f4d82>] general_protection+0x22/0x30
 [<ffffffff81009bae>] ? default_idle+0x6/0x8
 [<ffffffff810bcdcb>] ? free_pcppages_bulk+0x97/0x329
 [<ffffffff810bcd5d>] ? free_pcppages_bulk+0x29/0x329
 [<ffffffff810bd149>] drain_zone_pages+0x33/0x42
 [<ffffffff810cd5a6>] refresh_cpu_vm_stats+0xcc/0x11e
 [<ffffffff810cd609>] vmstat_update+0x11/0x43
 [<ffffffff8105350f>] process_one_work+0x260/0x389
 [<ffffffff8105381a>] worker_thread+0x1e2/0x332
 [<ffffffff81053638>] ? process_one_work+0x389/0x389
 [<ffffffff810579df>] kthread+0xb3/0xbd
 [<ffffffff81053638>] ? process_one_work+0x389/0x389
 [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
 [<ffffffff814f522c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8105792c>] ? kthread_freezable_should_stop+0x5b/0x5b
Code: 65 48 8b 04 25 40 b7 00 00 48 8b 80 90 05 00 00 48 89 e5 48 8b 40 c8 c9 48 c1 e8 02 83 e0 01 c3 48 8b 87 90 05 00 00 55 48 89 e5 <48> 8b 40 d8 c9 c3 48 3b 3d 67 ca c2 00 55 48 89 e5 75 09 0f bf
RIP [<ffffffff8105742c>] kthread_data+0xb/0x11
 RSP <ffff880fc1a77948>
CR2: ffffffffffffffd8
---[ end trace e5413b388b6ea44a ]---
Fixing recursive fault but reboot is needed!

I'm testing on a 528-core machine with ~2TB of memory, THP on. The test
case works like this:

- Spawn 512 threads using pthread_create, pin each thread to a separate
  cpu
- Each thread allocates 512MB, local to its cpu
- Threads are sent a "go" signal, all threads begin touching the first
  byte of each 4k chunk of their 512MB simultaneously

I'm working on debugging the issue now, but I thought I'd get this out
to everyone in case they might have some input.
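
The test case described above can be approximated with a sketch along
the following lines. This is not the actual memscale program: the names
run_touch_test() and struct worker are hypothetical, and relying on
first-touch placement after pinning (rather than an explicit NUMA
allocation) to get memory local to each cpu is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK 4096UL	/* touch one byte per 4k chunk */

/* Hypothetical per-thread state, for illustration only. */
struct worker {
	pthread_t tid;
	int cpu;
	size_t bytes;
	size_t touched;
};

static pthread_mutex_t go_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t go_cond = PTHREAD_COND_INITIALIZER;
static int go;

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	cpu_set_t set;
	char *buf;

	/* Pin before allocating; failure is ignored (best effort). */
	CPU_ZERO(&set);
	CPU_SET(w->cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	buf = mmap(NULL, w->bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* Wait for the "go" signal so that all threads fault together. */
	pthread_mutex_lock(&go_lock);
	while (!go)
		pthread_cond_wait(&go_cond, &go_lock);
	pthread_mutex_unlock(&go_lock);

	if (buf == MAP_FAILED)
		return NULL;

	for (size_t off = 0; off < w->bytes; off += CHUNK) {
		buf[off] = 1;
		w->touched++;
	}
	munmap(buf, w->bytes);
	return NULL;
}

/* Returns the total number of chunks touched, or -1 on setup failure. */
long run_touch_test(int nthreads, size_t bytes_per_thread)
{
	struct worker *w;
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	long total = 0;
	int created = 0;

	assert(nthreads > 0);
	w = calloc((size_t)nthreads, sizeof(*w));
	if (!w)
		return -1;
	if (ncpus < 1)
		ncpus = 1;

	pthread_mutex_lock(&go_lock);
	go = 0;
	pthread_mutex_unlock(&go_lock);

	for (int i = 0; i < nthreads; i++) {
		w[i].cpu = (int)(i % ncpus);
		w[i].bytes = bytes_per_thread;
		if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]))
			break;
		created++;
	}

	/* Release every thread that was successfully created. */
	pthread_mutex_lock(&go_lock);
	go = 1;
	pthread_cond_broadcast(&go_cond);
	pthread_mutex_unlock(&go_lock);

	for (int i = 0; i < created; i++) {
		pthread_join(w[i].tid, NULL);
		total += (long)w[i].touched;
	}
	free(w);
	return total;
}
```

Calling run_touch_test(512, 512UL << 20) would correspond to the sizes
described above; smaller values are more practical on an ordinary
machine. Whether the faults are actually served by huge pages depends
on the system's THP settings, which this sketch does not check.
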
I'll try and get my test program cleaned up and posted somewhere today
so that others can try it out as well.

- Alex