[Kernel-packages] [Bug 1323165] Re: [HP ProLiant DL380p Gen8] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
** Tags removed: verification-needed-trusty
** Tags added: verification-done-trusty

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1323165

Title:
  [HP ProLiant DL380p Gen8] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!

Status in “linux” package in Ubuntu:
  Fix Released
Status in “linux” source package in Trusty:
  Fix Released
Status in “linux” source package in Utopic:
  Fix Released

Bug description:
  The machine becomes non-responsive, unable to ssh, high load average, trying to access the running java process does not work as per syslog:

  May 26 06:19:38 server06 kernel: [75831.929529] [ cut here ]
  May 26 06:19:38 server06 kernel: [75831.930191] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
  May 26 06:19:38 server06 kernel: [75831.931129] invalid opcode: [#1] SMP
  May 26 06:19:38 server06 kernel: [75831.931729] Modules linked in: xt_multiport ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables gpio_ich nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core lpc_ich hpwdt hpilo ioatdma lp dca ipmi_si parport acpi_power_meter mac_hid tg3 ptp psmouse hpsa pps_core
  May 26 06:19:38 server06 kernel: [75831.941585] CPU: 4 PID: 2930 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
  May 26 06:19:38 server06 kernel: [75831.942633] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
  May 26 06:19:38 server06 kernel: [75831.943583] task: 881fe8372fe0 ti: 881fe632a000 task.ti: 881fe632a000
  May 26 06:19:38 server06 kernel: [75831.944654] RIP: 0010:[] [] handle_mm_fault+0xe61/0xf10
  May 26 06:19:38 server06 kernel: [75831.946137] RSP: :881fe632bd98 EFLAGS: 00010246
  May 26 06:19:38 server06 kernel: [75831.946885] RAX: 0100 RBX: 7fc37320a370 RCX: 881fe632bb18
  May 26 06:19:38 server06 kernel: [75831.947902] RDX: 881fe8372fe0 RSI: RDI: 800100c009e6
  May 26 06:19:38 server06 kernel: [75831.948932] RBP: 881fe632be20 R08: R09: 00a9
  May 26 06:19:38 server06 kernel: [75831.949952] R10: 0001 R11: R12: 881fd83a7cc8
  May 26 06:19:38 server06 kernel: [75831.950961] R13: 880fe6787d40 R14: 880fe5d95780 R15: 0080
  May 26 06:19:38 server06 kernel: [75831.951985] FS: 7fc938145700() GS:880fffa8() knlGS:
  May 26 06:19:38 server06 kernel: [75831.976736] CS: 0010 DS: ES: CR0: 80050033
  May 26 06:19:38 server06 kernel: [75832.005183] CR2: 7fc373620930 CR3: 000fe63fe000 CR4: 000407e0
  May 26 06:19:38 server06 kernel: [75832.033473] Stack:
  May 26 06:19:38 server06 kernel: [75832.060551] 0001 881fe632bdb0 8109a780 881fe632bdd0
  May 26 06:19:38 server06 kernel: [75832.117385] 810d7ad6 0001 81f1ea20 881fe632be78
  May 26 06:19:38 server06 kernel: [75832.173599] 810d983d 881fe632be48 88a9 0001
  May 26 06:19:38 server06 kernel: [75832.231813] Call Trace:
  May 26 06:19:38 server06 kernel: [75832.258781] [] ? wake_up_state+0x10/0x20
  May 26 06:19:38 server06 kernel: [75832.286702] [] ? wake_futex+0x66/0x90
  May 26 06:19:38 server06 kernel: [75832.311849] [] ? futex_wake_op+0x4ed/0x620
  May 26 06:19:38 server06 kernel: [75832.337329] [] __do_page_fault+0x184/0x560
  May 26 06:19:38 server06 kernel: [75832.363061] [] ? acct_account_cputime+0x1c/0x20
  May 26 06:19:38 server06 kernel: [75832.387739] [] ? account_user_time+0x8b/0xa0
  May 26 06:19:38 server06 kernel: [75832.411608] [] ? vtime_account_user+0x54/0x60
  May 26 06:19:38 server06 kernel: [75832.436126] [] do_page_fault+0x1a/0x70
  May 26 06:19:38 server06 kernel: [75832.458239] [] page_fault+0x28/0x30
  May 26 06:19:38 server06 kernel: [75832.481780] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
  May 26 06:19:38 server06 kernel: [75832.551672] RIP [] handle_mm_fault+0xe61/0xf10
  May 26 06:19:38 server06 kernel: [75832.574254] RSP
  May 26 06:19:38 server06 kernel: [75832.630392] ---[ end trace e41b58adf8e0d72b ]---

  With Precise, it runs with 16 threads with no problem so it would appear a regression.

  WORKAROUND: Run the task with 8 threads instead of 16
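
When triaging reports like this one, the two most useful facts in the trace are the source location named in the `kernel BUG at` line and the function in the `RIP` line. The following is a minimal Python sketch for pulling them out of syslog lines, assuming the line shapes seen in this report; the helper name `summarize_oops` and the regular expressions are illustrative, not part of any Ubuntu or kernel tooling.

```python
import re

# Patterns modelled on the syslog lines quoted in this bug report (assumption:
# other kernels may format these lines slightly differently).
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(
    r"RIP\b.*?(?P<func>[A-Za-z_]\w*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def summarize_oops(lines):
    """Return the BUG source location and faulting function, if present."""
    summary = {}
    for line in lines:
        m = BUG_RE.search(line)
        if m and "source" not in summary:
            summary["source"] = (m["path"], int(m["line"]))
        m = RIP_RE.search(line)
        if m and "function" not in summary:
            summary["function"] = m["func"]
            summary["offset"] = int(m["off"], 16)  # offset into the function
            summary["size"] = int(m["size"], 16)   # total function size
    return summary


if __name__ == "__main__":
    sample = [
        "May 26 06:19:38 server06 kernel: [75831.930191] kernel BUG at "
        "/build/buildd/linux-3.13.0/mm/memory.c:3756!",
        "May 26 06:19:38 server06 kernel: [75831.944654] RIP: 0010:[] [] "
        "handle_mm_fault+0xe61/0xf10",
    ]
    print(summarize_oops(sample))
```

Running it over the lines above reports `mm/memory.c` line 3756 and `handle_mm_fault`, which is enough to compare this crash with traces attached to other bugs.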
[Kernel-packages] [Bug 1323165] Re: [HP ProLiant DL380p Gen8] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
Not sure if this information is still relevant, but the server has been stable for three days now with the workaround "echo never > /sys/kernel/mm/transparent_hugepage/enabled"

Status in “linux” package in Ubuntu:
  Fix Released
Status in “linux” source package in Trusty:
  In Progress
Status in “linux” source package in Utopic:
  Fix Released
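
The workaround quoted in this message writes `never` to the transparent hugepage sysfs file; a value written this way does not survive a reboot, so it is worth checking which mode is actually active. Here is a minimal Python sketch of that check, assuming the usual sysfs format in which the kernel lists the available modes and puts the active one in square brackets (for example `always madvise [never]`). The helper name `active_thp_mode` is hypothetical.

```python
import re
from pathlib import Path

# Path taken from the workaround quoted in this message.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the sysfs file contents."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active mode marked in {text!r}")
    return match.group(1)


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    else:
        # Kernels built without transparent hugepage support lack this file.
        print("THP sysfs file not present on this system")
```

On a server where the workaround has been applied, the script should report `never`; any other value means the setting was lost or never took effect.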
[Kernel-packages] [Bug 1323165] Re: [HP ProLiant DL380p Gen8] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
We have not yet been able to try the workaround since we decided to downgrade the servers to 12.04.0 which is taking all our time due to the number of servers involved. I will try the workaround on a test server with the same config and report back by the end of this week.

Status in “linux” package in Ubuntu:
  Confirmed
[Kernel-packages] [Bug 1323165] Re: [HP ProLiant DL380p Gen8] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
Hello Philipp, any known workarounds for this apart from reducing the load on the server? I have not yet tried the latest mainline kernel, but from your comment it sounds like it will not help.

Status in “linux” package in Ubuntu:
  Confirmed
[Kernel-packages] [Bug 1323165] Re: [HP ProLiant DL380p Gen8] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
We have some more information which may help. Once this problem happens, then the system goes into some kind of unstable state. Existing console or ssh sessions continue working as long as we don't try to access the java process. However, new SSH sessions don't start. Logging in from the console leads to the server information getting displayed but the command prompt does not show up after that.

The task that we execute on the server has a setting for number of threads to use and it is set to 16 by default which consistently leads to this bug after 24-48 hours of processing. We tried to run the same task with 8 threads and it has been running without any problem for days.

We have multiple servers with the exact same hardware and software where we are seeing this bug. We have downgraded one of them to Ubuntu 12.04.0 and that server has been working fine even with 16 threads.

We have now upgraded the other servers to the new kernel released yesterday (3.13.0-27) and will report back if the issue is fixed there. If not, then we will try the latest mainline kernel.

Status in “linux” package in Ubuntu:
  Incomplete
[Kernel-packages] [Bug 1323165] [NEW] kernel bug which seems to affect java processes
Public bug reported:

More information for Bug #1315736

Reproducing Comment #36 from there:

We are also seeing this bug. The machine becomes non-responsive, unable to ssh, high load average, trying to access the running java process does not work. I will file a bug as described in Comment #30

Our hardware is HP Proliant DL380p and we see the following in the syslog

May 26 06:19:38 server06 kernel: [75831.929529] [ cut here ]
May 26 06:19:38 server06 kernel: [75831.930191] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
May 26 06:19:38 server06 kernel: [75831.931129] invalid opcode: [#1] SMP
May 26 06:19:38 server06 kernel: [75831.931729] Modules linked in: xt_multiport ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables gpio_ich nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core lpc_ich hpwdt hpilo ioatdma lp dca ipmi_si parport acpi_power_meter mac_hid tg3 ptp psmouse hpsa pps_core
May 26 06:19:38 server06 kernel: [75831.941585] CPU: 4 PID: 2930 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
May 26 06:19:38 server06 kernel: [75831.942633] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
May 26 06:19:38 server06 kernel: [75831.943583] task: 881fe8372fe0 ti: 881fe632a000 task.ti: 881fe632a000
May 26 06:19:38 server06 kernel: [75831.944654] RIP: 0010:[] [] handle_mm_fault+0xe61/0xf10
May 26 06:19:38 server06 kernel: [75831.946137] RSP: :881fe632bd98 EFLAGS: 00010246
May 26 06:19:38 server06 kernel: [75831.946885] RAX: 0100 RBX: 7fc37320a370 RCX: 881fe632bb18
May 26 06:19:38 server06 kernel: [75831.947902] RDX: 881fe8372fe0 RSI: RDI: 800100c009e6
May 26 06:19:38 server06 kernel: [75831.948932] RBP: 881fe632be20 R08: R09: 00a9
May 26 06:19:38 server06 kernel: [75831.949952] R10: 0001 R11: R12: 881fd83a7cc8
May 26 06:19:38 server06 kernel: [75831.950961] R13: 880fe6787d40 R14: 880fe5d95780 R15: 0080
May 26 06:19:38 server06 kernel: [75831.951985] FS: 7fc938145700() GS:880fffa8() knlGS:
May 26 06:19:38 server06 kernel: [75831.976736] CS: 0010 DS: ES: CR0: 80050033
May 26 06:19:38 server06 kernel: [75832.005183] CR2: 7fc373620930 CR3: 000fe63fe000 CR4: 000407e0
May 26 06:19:38 server06 kernel: [75832.033473] Stack:
May 26 06:19:38 server06 kernel: [75832.060551] 0001 881fe632bdb0 8109a780 881fe632bdd0
May 26 06:19:38 server06 kernel: [75832.117385] 810d7ad6 0001 81f1ea20 881fe632be78
May 26 06:19:38 server06 kernel: [75832.173599] 810d983d 881fe632be48 88a9 0001
May 26 06:19:38 server06 kernel: [75832.231813] Call Trace:
May 26 06:19:38 server06 kernel: [75832.258781] [] ? wake_up_state+0x10/0x20
May 26 06:19:38 server06 kernel: [75832.286702] [] ? wake_futex+0x66/0x90
May 26 06:19:38 server06 kernel: [75832.311849] [] ? futex_wake_op+0x4ed/0x620
May 26 06:19:38 server06 kernel: [75832.337329] [] __do_page_fault+0x184/0x560
May 26 06:19:38 server06 kernel: [75832.363061] [] ? acct_account_cputime+0x1c/0x20
May 26 06:19:38 server06 kernel: [75832.387739] [] ? account_user_time+0x8b/0xa0
May 26 06:19:38 server06 kernel: [75832.411608] [] ? vtime_account_user+0x54/0x60
May 26 06:19:38 server06 kernel: [75832.436126] [] do_page_fault+0x1a/0x70
May 26 06:19:38 server06 kernel: [75832.458239] [] page_fault+0x28/0x30
May 26 06:19:38 server06 kernel: [75832.481780] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
May 26 06:19:38 server06 kernel: [75832.551672] RIP [] handle_mm_fault+0xe61/0xf10
May 26 06:19:38 server06 kernel: [75832.574254] RSP
May 26 06:19:38 server06 kernel: [75832.630392] ---[ end trace e41b58adf8e0d72b ]---

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-24-generic 3.13.0-24.47
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
AlsaDevices:
 total 0
 crw-rw 1 root audio 116, 1 May 26 09:30 seq
 crw-rw 1 root audio 116, 33 May 26 09:30 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse
[Kernel-packages] [Bug 1315736] Re: [Dell PowerEdge R720] Machine Check Exception
We are also seeing this bug. The machine becomes non-responsive, unable to ssh, high load average, trying to access the running java process does not work. I will file a bug as described in Comment #30

Our hardware is HP Proliant DL380p and we see the following in the syslog

May 26 06:19:38 server06 kernel: [75831.929529] [ cut here ]
May 26 06:19:38 server06 kernel: [75831.930191] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
May 26 06:19:38 server06 kernel: [75831.931129] invalid opcode: [#1] SMP
May 26 06:19:38 server06 kernel: [75831.931729] Modules linked in: xt_multiport ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables gpio_ich nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core lpc_ich hpwdt hpilo ioatdma lp dca ipmi_si parport acpi_power_meter mac_hid tg3 ptp psmouse hpsa pps_core
May 26 06:19:38 server06 kernel: [75831.941585] CPU: 4 PID: 2930 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
May 26 06:19:38 server06 kernel: [75831.942633] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
May 26 06:19:38 server06 kernel: [75831.943583] task: 881fe8372fe0 ti: 881fe632a000 task.ti: 881fe632a000
May 26 06:19:38 server06 kernel: [75831.944654] RIP: 0010:[] [] handle_mm_fault+0xe61/0xf10
May 26 06:19:38 server06 kernel: [75831.946137] RSP: :881fe632bd98 EFLAGS: 00010246
May 26 06:19:38 server06 kernel: [75831.946885] RAX: 0100 RBX: 7fc37320a370 RCX: 881fe632bb18
May 26 06:19:38 server06 kernel: [75831.947902] RDX: 881fe8372fe0 RSI: RDI: 800100c009e6
May 26 06:19:38 server06 kernel: [75831.948932] RBP: 881fe632be20 R08: R09: 00a9
May 26 06:19:38 server06 kernel: [75831.949952] R10: 0001 R11: R12: 881fd83a7cc8
May 26 06:19:38 server06 kernel: [75831.950961] R13: 880fe6787d40 R14: 880fe5d95780 R15: 0080
May 26 06:19:38 server06 kernel: [75831.951985] FS: 7fc938145700() GS:880fffa8() knlGS:
May 26 06:19:38 server06 kernel: [75831.976736] CS: 0010 DS: ES: CR0: 80050033
May 26 06:19:38 server06 kernel: [75832.005183] CR2: 7fc373620930 CR3: 000fe63fe000 CR4: 000407e0
May 26 06:19:38 server06 kernel: [75832.033473] Stack:
May 26 06:19:38 server06 kernel: [75832.060551] 0001 881fe632bdb0 8109a780 881fe632bdd0
May 26 06:19:38 server06 kernel: [75832.117385] 810d7ad6 0001 81f1ea20 881fe632be78
May 26 06:19:38 server06 kernel: [75832.173599] 810d983d 881fe632be48 88a9 0001
May 26 06:19:38 server06 kernel: [75832.231813] Call Trace:
May 26 06:19:38 server06 kernel: [75832.258781] [] ? wake_up_state+0x10/0x20
May 26 06:19:38 server06 kernel: [75832.286702] [] ? wake_futex+0x66/0x90
May 26 06:19:38 server06 kernel: [75832.311849] [] ? futex_wake_op+0x4ed/0x620
May 26 06:19:38 server06 kernel: [75832.337329] [] __do_page_fault+0x184/0x560
May 26 06:19:38 server06 kernel: [75832.363061] [] ? acct_account_cputime+0x1c/0x20
May 26 06:19:38 server06 kernel: [75832.387739] [] ? account_user_time+0x8b/0xa0
May 26 06:19:38 server06 kernel: [75832.411608] [] ? vtime_account_user+0x54/0x60
May 26 06:19:38 server06 kernel: [75832.436126] [] do_page_fault+0x1a/0x70
May 26 06:19:38 server06 kernel: [75832.458239] [] page_fault+0x28/0x30
May 26 06:19:38 server06 kernel: [75832.481780] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 18 25 a6 81 44 89 4d c8 e8 18 e7
May 26 06:19:38 server06 kernel: [75832.551672] RIP [] handle_mm_fault+0xe61/0xf10
May 26 06:19:38 server06 kernel: [75832.574254] RSP
May 26 06:19:38 server06 kernel: [75832.630392] ---[ end trace e41b58adf8e0d72b ]---

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1315736

Title:
  [Dell PowerEdge R720] Machine Check Exception

Status in “linux” package in Ubuntu:
  Incomplete

Bug description:
  Dell PowerEdge 720 on ubuntu 14.04 shows MCE errors on dmesg. Dell support instructed to run DSET and BIOS hardware diagnostics. Neither of the tools showed any errors. Dell support said that if there was a hardware error it would have been shown on Dell logs and the probable reason for the dmesg log i