** Description changed:

SRU Justification:

[Impact]
Users of ppc64el hardware need the ability to use crashdumps to do kernel debugging.

[Fix]
Commit upstream and already in utopic:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867

[Test Case]
Taken from:
https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
https://help.ubuntu.com/14.04/serverguide/kernel-crash-dump.html

1) apt-get install linux-crashdump
- 2) reboot the machine
- 3) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
- 4) kdump-config show # should return no errors
- 5) echo 'c' | sudo tee /proc/sysrq-trigger
- 6) This should crash the machine and we should kexec into another kernel to dump the core, then on the next reboot we should see a crash in /var/crash/*
+ 2) increase crashdump size:
+      sudo vim /etc/default/grub.d/kexec-tools.cfg
+      set crashkernel=1024M
+      sudo update-grub
+ 3) reboot the machine
+ 4) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
+ 5) kdump-config show # should return no errors
+ 6) echo 'c' | sudo tee /proc/sysrq-trigger
+ 7) This should crash the machine and we should kexec into another kernel to dump the core, then on the next reboot we should see a crash in /var/crash/*

--
-
---Problem Description---
kdump is not producing a dump on powerKVM LE P8 Ubuntu 14.04

---uname output---
3.13.0-30-generic

---Additional Hardware Info---
Power8 LE configuration.

---Patches Installed---
1324544 - kdump-config load fails with vmlinux kernel (vs. vmlinuz)

Machine Type = 8247-22L

---Steps to Reproduce---
Installed kdump-tools 1.5.5-2ubuntu1 and crash 7.0.3-3ubuntu3.
Updated /etc/default/kdump-tools, first I updated just USE_KDUMP=1.
Rebooted the node and see:
root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash crashkernel=384M-:128M

root@c656f2n02:~# cat /proc/sys/kernel/sysrq
1
root@c656f2n02:~# cat /proc/sys/kernel/sysrq
1
root@c656f2n02:~# ^Cnd /proc | grep sysrq
root@c656f2n02:~# kdump-config status
current state : ready to kdump
root@c656f2n02:~# kdump-config show
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
current state: ready to kdump
kexec command:
  /sbin/kexec -p --args-linux --command-line="root=UUID=87986483-5fec-4b4d-b22e-bf2a72096df8 ro quiet splash irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.13.0-30-generic /boot/vmlinux-3.13.0-30-generic

root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_crash_loaded
1
root@c656f2n02:/boot/grub# cat /sys/kernel/kexec_loaded
0

echo c > /proc/sysrq-trigger

root@c656f2n02:/var/log# echo c > /proc/sysrq-trigger
[ 1956.014243] SysRq : Trigger a crash
[ 1956.014328] Unable to handle kernel paging request for data at address 0x00000000
[ 1956.014404] Faulting instruction address: 0xc000000000586c2c
[ 1956.014468] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1956.014518] SMP NR_CPUS=2048 NUMA PowerNV
[ 1956.014570] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables autofs4 rdma_ucm(OF) ib_ucm(OF) rdma_cm(OF) iw_cm(OF) ib_ipoib(OF) ib_cm(OF) ib_uverbs(OF) ib_umad(OF) mlx5_ib(OF) mlx5_core(OF) mlx4_ib(OF) ib_sa(OF) ib_mad(OF) ib_core(OF) ib_addr(OF) mlx4_en(OF) mlx4_core(OF) compat(OF) nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache rtc_generic powernv_rng ses enclosure ipr
[ 1956.015306] CPU: 146 PID: 2522 Comm: bash Tainted: GF O 3.13.0-30-generic #54-Ubuntu
[ 1956.015394] task: c000003fcabda120 ti: c000003fcac58000 task.ti: c000003fcac58000
[ 1956.015469] NIP: c000000000586c2c LR: c000000000587b8c CTR: c000000000586c00
[ 1956.015543] REGS: c000003fcac5b820 TRAP: 0300 Tainted: GF O (3.13.0-30-generic)
[ 1956.015617] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42422822 XER: 20000000
[ 1956.015804] CFAR: c000000000009318 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0
GPR00: c000000000587b8c c000003fcac5baa0 c00000000162e840 0000000000000063
GPR04: c000000002f45bd0 c000000002f564c8 0000000000015ad0 c000000001827480
GPR08: c000000000dfe840 0000000000000000 0000000000000001 0000000000015ad0
GPR12: 0000000042422822 c000000007e5ff00 000001002fe90648 000000001016e008
GPR16: 000000001013ad70 000001002fe94648 000000001016fed0 000000001016e008
GPR20: 00000000100c31e0 0000000000000000 0000000010171fc8 000000001016f840
GPR24: 0000000000000004 0000000000000000 0000000000000001 c0000000014b7dc8
GPR28: c000000001974c90 0000000000000063 c00000000148d9c0 c0000000014b8188
[ 1956.016794] NIP [c000000000586c2c] .sysrq_handle_crash+0x2c/0x40
[ 1956.016858] LR [c000000000587b8c] .__handle_sysrq+0xfc/0x260
[ 1956.016920] Call Trace:
[ 1956.016948] [c000003fcac5baa0] [0000000010172a34] 0x10172a34 (unreliable)
[ 1956.017025] [c000003fcac5bb10] [c000000000587b8c] .__handle_sysrq+0xfc/0x260
[ 1956.017101] [c000003fcac5bbd0] [c000000000588324] .write_sysrq_trigger+0x74/0x90
[ 1956.017190] [c000003fcac5bc50] [c0000000002dff1c] .proc_reg_write+0xac/0x110
[ 1956.017266] [c000003fcac5bcf0] [c000000000254c00] .vfs_write+0xe0/0x260
[ 1956.017342] [c000003fcac5bd90] [c0000000002558f4] .SyS_write+0x64/0xe0
[ 1956.017418] [c000003fcac5be30] [c00000000000a158] syscall_exit+0x0/0x98
[ 1956.017492] Instruction dump:
[ 1956.017530] 4bffffac 7c0802a6 f8010010 f821ff91 60000000 60000000 3d42001f 392a8ca8
[ 1956.017658] 39400001 91490000 7c0004ac 39200000 <99490000> 38210070 e8010010 7c0803a6
[ 1956.017894] ---[ end trace d163ff42366bde72 ]---
[ 1956.017986]
[ 1956.018042] Sending IPI to other CPUs
[ 1956.019188] IPI complete
-> smp_release_cpus()
spinning_secondaries = 159
<- smp_release_cpus()
<- setup_system()

The console remains at this message until I power cycle the CEC. There is no /proc/vmcore on reboot.

I recreated the hang on my victim node. Some CPUs are hitting the 0x4400 interrupt vector. I think this is due to the missing commit 429d2e834295 "powerpc: Fix kdump hang issue on p8 with relocation on exception enabled." from Mahesh, but I need to double-check that, since it may not be the only patch missing.

I can now confirm that the patch I mentioned fixes the hang. Here are the commit details:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=429d2e8342954d337abe370d957e78291032d867

powerpc: Fix kdump hang issue on p8 with relocation on exception enabled.

On p8 systems, with relocation on exception feature enabled we are seeing kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this feature enabled, exception are raised with MMU (IR=DR=1) ON with the default offset of 0xc*4000. Since exception is raised in virtual mode it requires the vector region to be executable without which it fails to fetch and execute instruction at 0xc*4xxx. For default kernel since kernel is loaded at real 0, the htab mappings sets the entire kernel text region executable. But for relocatable kernel (e.g. kdump case) we only copy interrupt vectors down to real 0 and never marked that region as executable because in p7 and below we always get exception in real mode.

This patch fixes this issue by marking htab mapping range as executable that overlaps with the interrupt vector region for relocatable kernel.

Thanks to Ben who helped me to debug this issue and find the root cause.

Signed-off-by: Mahesh Salgaonkar <mah...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <b...@kernel.crashing.org>

I think this bug should be mirrored to Ubuntu so they can include this patch in the 14.04 kernel, and maybe also in the 14.10 kernel.
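The overlap rule described in the commit message for 429d2e834295 can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the kernel's hash-table (htab) mapping code. The page size, region sizes and load address are made-up numbers, and `overlaps`, `executable_pages` and `is_executable` are hypothetical names.

The model treats every page of the linear mapping as no-execute unless it overlaps kernel text. With the fix enabled, a page is also made executable if it overlaps the interrupt-vector copy at real 0 while the kernel runs away from real 0, as a kdump kernel does. Relocation-on exceptions are taken with IR=DR=1, so the page holding the 0xc*4400 vector must be executable for the handler to be fetched.

```python
# Toy model of the executable-mapping decision described in commit 429d2e834295.
# Hypothetical throughout: sizes, addresses and function names are illustrative only.

PAGE_OFFSET = 0xC000_0000_0000_0000  # virtual address of real address 0 in the linear mapping
PAGE_SIZE = 16 << 20                 # assumed 16 MiB mapping granularity


def overlaps(start, end, region_start, region_end):
    """True if [start, end) intersects [region_start, region_end)."""
    return start < region_end and end > region_start


def executable_pages(mem_size, load_addr, text_size, vectors_size, with_fix):
    """Return the virtual start addresses of pages left executable.

    load_addr is the real address the running kernel was loaded at. A non-zero
    value stands in for the relocatable (kdump) case, where only the interrupt
    vectors (vectors_size bytes) are copied down to real 0.
    """
    text_start = PAGE_OFFSET + load_addr
    text_end = text_start + text_size
    vec_start, vec_end = PAGE_OFFSET, PAGE_OFFSET + vectors_size
    relocated = load_addr > 0

    pages = []
    for vaddr in range(PAGE_OFFSET, PAGE_OFFSET + mem_size, PAGE_SIZE):
        end = vaddr + PAGE_SIZE
        # In this model, pages overlapping kernel text are always executable.
        executable = overlaps(vaddr, end, text_start, text_end)
        # The fix: when relocated, also make the vector copy at real 0 executable.
        if with_fix and relocated and overlaps(vaddr, end, vec_start, vec_end):
            executable = True
        if executable:
            pages.append(vaddr)
    return pages


def is_executable(addr, pages):
    """True if addr falls inside one of the executable pages."""
    return any(p <= addr < p + PAGE_SIZE for p in pages)


if __name__ == "__main__":
    vector = PAGE_OFFSET + 0x4400                   # the 0xc*4400 vector from the report
    mem, text, vecs = 256 << 20, 16 << 20, 0x8000   # made-up sizes
    print("default kernel at real 0:",
          is_executable(vector, executable_pages(mem, 0, text, vecs, with_fix=False)))
    print("kdump kernel, no fix:",
          is_executable(vector, executable_pages(mem, 32 << 20, text, vecs, with_fix=False)))
    print("kdump kernel, with fix:",
          is_executable(vector, executable_pages(mem, 32 << 20, text, vecs, with_fix=True)))
```

Running it prints True, False and True, in that order. A kernel at real 0 gets the vector page executable "for free" because it overlaps kernel text. A relocated kernel only gets it with the extra overlap check.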
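In the session, the boot command line carries `crashkernel=384M-:128M`, while the updated test case sets a flat `crashkernel=1024M`. The kernel's kdump documentation describes the range form as `crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]`. Each range is written `start-[end]`, where `start` is inclusive, `end` is exclusive and an omitted end means no upper limit. The size that applies is chosen by total system RAM.

A minimal Python sketch of that selection follows. `parse_size` and `crashkernel_reservation` are hypothetical helpers. Only the K/M/G suffixes are handled, and the input is not validated.

```python
# Hypothetical helper: how much memory a crashkernel= value reserves.
# Models crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
# and the plain form crashkernel=<size>[@offset].

_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Convert a size such as '384M', '1G' or '4096' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def crashkernel_reservation(value: str, system_ram: int) -> int:
    """Return the bytes reserved for the crash kernel, or 0 if no range matches."""
    value = value.split("@", 1)[0]      # drop any @offset suffix
    if ":" not in value:                # plain form, e.g. "1024M"
        return parse_size(value)
    for entry in value.split(","):
        rng, size = entry.split(":", 1)
        start_s, end_s = rng.split("-", 1)
        start = parse_size(start_s)                 # inclusive lower bound
        end = parse_size(end_s) if end_s else None  # exclusive; empty = open-ended
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size)
    return 0


if __name__ == "__main__":
    ram = 256 << 30  # hypothetical machine with 256 GiB of RAM
    for value in ("384M-:128M", "1024M"):
        print(value, "->", crashkernel_reservation(value, ram) >> 20, "MiB")
```

Under this reading, `384M-:128M` reserves 128M on any machine with at least 384M of RAM, and `1024M` reserves 1024M regardless of RAM size.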
-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1352056

Title:
  linux: kdump on Ubuntu 14.04 is not generating a dump.

Status in “linux” package in Ubuntu:
  Fix Released
Status in “linux” source package in Trusty:
  In Progress
Status in “linux” source package in Utopic:
  Fix Released
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1352056/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp