[Kernel-packages] [Bug 1665113] Comment bridged from LTC Bugzilla
--- Comment From laurent.duf...@fr.ibm.com 2017-03-02 05:37 EDT---
Based on the discussion on the mm mailing list, only the first patch is going to be accepted upstream:
https://patchwork.kernel.org/patch/9588337/

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1665113

Title:
[Ubuntu 17.04] Kernel panics when large number of hugepages is passed as an boot argument to kernel.

Status in linux package in Ubuntu:
Triaged

Bug description:

Issue:
---
The kernel is unable to handle a paging request and panics when a large number of hugepages is passed as a boot argument.

Environment:
--
PowerNV: Habanero, bare metal
OS: Ubuntu 17.04
Kernel version: 4.9.0-11-generic

Steps to reproduce:
---
1 - Add the boot argument 'hugepages=1200' when booting the Ubuntu kernel. The kernel panics and displays a call trace like the one below.

[5.030274] Unable to handle kernel paging request for data at address 0x
[5.030323] Faulting instruction address: 0xc0302848
[5.030366] Oops: Kernel access of bad area, sig: 11 [#1]
[5.030399] SMP NR_CPUS=2048
[5.030416] NUMA
[5.039443] PowerNV
[5.039461] Modules linked in:
[5.050091] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.9.0-11-generic #12-Ubuntu
[5.053266] Workqueue: events pcpu_balance_workfn
[5.080647] task: c03c8fe9b800 task.stack: c03ffb118000
[5.090876] NIP: c0302848 LR: c02709d4 CTR: c016cef0
[5.094175] REGS: c03ffb11b410 TRAP: 0300 Not tainted (4.9.0-11-generic)
[5.103040] MSR: 92009033
[5.114466] CR: 22424222 XER:
[5.124932] CFAR: c0008a60 DAR: DSISR: 4000 SOFTE: 1
GPR00: c02709d4 c03ffb11b690 c141a400 c03fff50e300
GPR04: 024001c2 c03ffb11b780 00219df5
GPR08: 003ffb09 c1454fd8
GPR12: 4400 c7b6 024001c2 024001c2
GPR16: 024001c2 0002
GPR20: 000c 024200c0
GPR24: c16eef48 c03fff50fd00 024001c2
GPR28: c03fff50fd00 c03fff50e300 c03ffb11b820
NIP [c0302848] mem_cgroup_soft_limit_reclaim+0xf8/0x4f0
[5.213613] LR [c02709d4] do_try_to_free_pages+0x1b4/0x450
[5.230521] Call Trace:
[5.230643] [c03ffb11b760] [c02709d4] do_try_to_free_pages+0x1b4/0x450
[5.254184] [c03ffb11b800] [c0270d68] try_to_free_pages+0xf8/0x270
[5.281896] [c03ffb11b890] [c0259b88] __alloc_pages_nodemask+0x7a8/0xff0
[5.321407] [c03ffb11bab0] [c0282cd0] pcpu_populate_chunk+0x110/0x520
[5.336262] [c03ffb11bb50] [c02841b8] pcpu_balance_workfn+0x758/0x960
[5.351526] [c03ffb11bc50] [c00ecdd0] process_one_work+0x2b0/0x5a0
[5.362561] [c03ffb11bce0] [c00ed168] worker_thread+0xa8/0x660
[5.374007] [c03ffb11bd80] [c00f5320] kthread+0x110/0x130
[5.385160] [c03ffb11be30] [c000c0e8] ret_from_kernel_thread+0x5c/0x74
[5.389456] Instruction dump:
[5.410036] eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 3d220004
[5.423598] 3929abd8 794a1f24 7d295214 eac90100 2fa9 419eff74 3b20
[5.436503] ---[ end trace 23b650e96be5c549 ]---
[5.439700]

This is purely a negative scenario: the system does not have enough memory because the hugepages parameter is given a very large value.

Output of free on the system:

free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        2.1G        248G        5.2M        502M        248G
Swap:          2.0G        159M        1.8G

The same scenario, tried after Linux is up:

echo 1200 > /proc/sys/vm/nr_hugepages

HugePages_Total:   15069
HugePages_Free:    15069
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

root@ltc-haba2:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        237G         13G        5.6M        311M         13G
Swap:          2.0G        159M        1.8G

In this case the kernel is able to allocate around 237 GB for hugetlb. But when the same request is made while the system is booting, it panics. Please let us know whether this scenario is expected to be handled.

I identified the root cause of the panic. When the system is running with low memory during mem
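The runtime figures in the bug description can be cross-checked: the memory pinned by persistent huge pages is HugePages_Total × Hugepagesize. The following is a minimal C sketch, not part of the original report, that computes this from text in the /proc/meminfo format. The function name hugetlb_total_kb is hypothetical, and the sketch assumes the field names shown in the meminfo output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Memory held by persistent huge pages, in kB, computed as
 * HugePages_Total * Hugepagesize from text in the /proc/meminfo format.
 * Returns -1 if either field is missing.
 * (hugetlb_total_kb is a hypothetical helper for illustration.)
 */
static long long hugetlb_total_kb(const char *meminfo)
{
    long long total = -1, size_kb = -1;
    const char *line = meminfo;

    assert(meminfo != NULL);
    while (*line) {
        long long value;

        /* sscanf() returns 1 only when the literal prefix matches and a
         * number follows it, so all other meminfo fields are ignored. */
        if (sscanf(line, "HugePages_Total: %lld", &value) == 1)
            total = value;
        else if (sscanf(line, "Hugepagesize: %lld", &value) == 1)
            size_kb = value;

        /* Advance to the start of the next line, or stop at the end. */
        line = strchr(line, '\n');
        if (!line)
            break;
        line++;
    }
    if (total < 0 || size_kb < 0)
        return -1;
    return total * size_kb;
}
```

On a live system the input would be the contents of /proc/meminfo. With the runtime figures above it gives 15069 × 16384 kB = 246,890,496 kB, roughly 235 GiB, which is in line with the 237G "used" value reported by free -h once the ~2.1G already in use before the change is added.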
[Kernel-packages] [Bug 1665113] Comment bridged from LTC Bugzilla
--- Comment From laurent.duf...@fr.ibm.com 2017-02-23 08:51 EDT---
A new set of 2 patches have been sent to the community:
https://patchwork.kernel.org/patch/9588337/
https://patchwork.kernel.org/patch/9588335/
I'm waiting for these 2 patches to be accepted upstream.
[Kernel-packages] [Bug 1665113] Comment bridged from LTC Bugzilla
--- Comment From laurent.duf...@fr.ibm.com 2017-02-21 02:51 EDT---
As discussed with Michal Hocko, I will write a larger patch that delays allocation of the soft limit's data, since it is rarely used. This will extend the patch I already sent.
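The 2017-02-21 comment proposes delaying the soft limit's data because it is rarely used. The following is a minimal user-space C sketch of that general pattern, under stated assumptions: per-node data is allocated only on first use, and the best-effort reclaim path returns early when that data does not exist yet, instead of touching memory that was never allocated (the kind of access the oops in mem_cgroup_soft_limit_reclaim appears to show). All names here (NR_NODES, soft_limit_per_node, soft_limit_tree_add, soft_limit_reclaim) are hypothetical. This is not the patch posted upstream, and it does not model the kernel's locking or rb-tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_NODES 4 /* hypothetical number of NUMA nodes for the sketch */

/* Hypothetical stand-in for the per-node soft limit data. */
struct soft_limit_node {
    unsigned long nr_queued; /* groups queued for soft limit reclaim */
};

/* One pointer per node; each stays NULL until the node's data is needed. */
static struct soft_limit_node *soft_limit_per_node[NR_NODES];

/* Return the node's data, or NULL if it has not been allocated yet. */
static struct soft_limit_node *soft_limit_tree_node(int nid)
{
    assert(nid >= 0 && nid < NR_NODES);
    return soft_limit_per_node[nid];
}

/* Queue a group on a node, allocating the node's data on first use. */
static bool soft_limit_tree_add(int nid)
{
    struct soft_limit_node *tree = soft_limit_tree_node(nid);

    if (!tree) {
        tree = calloc(1, sizeof(*tree));
        if (!tree)
            return false; /* allocation failed; caller must cope */
        soft_limit_per_node[nid] = tree;
    }
    tree->nr_queued++;
    return true;
}

/*
 * Best-effort reclaim: if the node's data was never allocated (nothing has
 * used soft limits yet), there is nothing to reclaim, so return 0 instead
 * of dereferencing a NULL pointer.
 */
static unsigned long soft_limit_reclaim(int nid)
{
    struct soft_limit_node *tree = soft_limit_tree_node(nid);

    if (!tree || tree->nr_queued == 0)
        return 0;
    /* A real reclaimer would walk and shrink the queued groups here; the
     * sketch only reports how many groups it would visit. */
    return tree->nr_queued;
}
```

The design point the sketch illustrates: once allocation is deferred, every reader must tolerate the data being absent, and the NULL check in soft_limit_reclaim is what makes the deferral safe when reclaim runs before anything has used soft limits.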