Can you try 16.04? Thanks. Cascardo.
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1681909

Title:
  Ubuntu 17.04: dump is not captured in remote host when kdump over ssh
  is configured on firestone.

Status in The Ubuntu-power-systems project:
  Incomplete
Status in makedumpfile package in Ubuntu:
  New

Bug description:
  == Comment: #0 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-03-07 05:00:29 ==
  ---Problem Description---

  Ubuntu 17.04: dump is not captured in remote host when kdump over ssh
  is configured on firestone.

  ---Steps to Reproduce---

  1. Configure kdump.
  2. Check whether kdump is operational using "# kdump-config show".
  3. Install "kernel-debuginfo" and "kernel-debuginfo-common" rpms.
  4. Setup password less ssh connection, generate rsa key.
     # ssh-keygen -t rsa
  5. verify id_rsa and id_rsa.pub are created under /root/.ssh/
  6. Edit /etc/default/kdump-tools and add below entries.
     SSH="ubuntu@9.114.15.239"
     SSH_KEY=/root/.ssh/id_rsa
  7. Propagate RSA key.
     # kdump-config propagate
  8. Restart kdump service.
     # kdump-config load
  9. Trigger Crash using below commands.
     # echo "1" > /proc/sys/kernel/sysrq
     # echo "c" > /proc/sysrq-trigger
  10. Verify dump is available in remote server in configured path.

  Machine details
  ===========
  $ ipmitool -I lanplus -H 9.47.70.3 -U ADMIN -P admin sol activate
  $ ssh ubuntu@9.47.70.29
  PW: shriya101

  Attaching logs

  == Comment: #1 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-03-07 05:01:42 ==

  == Comment: #5 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-03-07 23:19:46 ==
  Hi,

  Attaching the logs.
  Network info:

  root@ltc-firep3:~# hwinfo --network
  36: None 00.0: 10700 Loopback
    [Created at net.126]
    Unique ID: ZsBS.GQNx7L4uPNA
    SysFS ID: /class/net/lo
    Hardware Class: network interface
    Model: "Loopback network interface"
    Device File: lo
    Link detected: yes
    Config Status: cfg=new, avail=yes, need=no, active=unknown

  37: None 00.0: 10701 Ethernet
    [Created at net.126]
    Unique ID: 2lHw.ndpeucax6V1
    Parent ID: mIXc.aXC4wIvegH8
    SysFS ID: /class/net/enP33p3s0f2
    SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.2
    Hardware Class: network interface
    Model: "Ethernet network interface"
    Driver: "tg3"
    Driver Modules: "tg3"
    Device File: enP33p3s0f2
    HW Address: 98:be:94:03:18:4a
    Permanent HW Address: 98:be:94:03:18:4a
    Link detected: no
    Config Status: cfg=new, avail=yes, need=no, active=unknown
    Attached to: #15 (Ethernet controller)

  38: None 00.0: 10701 Ethernet
    [Created at net.126]
    Unique ID: 7Onn.ndpeucax6V1
    Parent ID: sx0U.aXC4wIvegH8
    SysFS ID: /class/net/enP33p3s0f0
    SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.0
    Hardware Class: network interface
    Model: "Ethernet network interface"
    Driver: "tg3"
    Driver Modules: "tg3"
    Device File: enP33p3s0f0
    HW Address: 98:be:94:03:18:48
    Permanent HW Address: 98:be:94:03:18:48
    Link detected: yes
    Config Status: cfg=new, avail=yes, need=no, active=unknown
    Attached to: #16 (Ethernet controller)

  39: None 00.0: 10701 Ethernet
    [Created at net.126]
    Unique ID: VwX_.ndpeucax6V1
    Parent ID: DUng.aXC4wIvegH8
    SysFS ID: /class/net/enP33p3s0f3
    SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.3
    Hardware Class: network interface
    Model: "Ethernet network interface"
    Driver: "tg3"
    Driver Modules: "tg3"
    Device File: enP33p3s0f3
    HW Address: 98:be:94:03:18:4b
    Permanent HW Address: 98:be:94:03:18:4b
    Link detected: no
    Config Status: cfg=new, avail=yes, need=no, active=unknown
    Attached to: #25 (Ethernet controller)

  40: None 00.0: 10701 Ethernet
    [Created at net.126]
    Unique ID: bZ1s.ndpeucax6V1
    Parent ID: J7HY.aXC4wIvegH8
    SysFS ID: /class/net/enP33p3s0f1
    SysFS Device Link: /devices/pci0021:00/0021:00:00.0/0021:01:00.0/0021:02:01.0/0021:03:00.1
    Hardware Class: network interface
    Model: "Ethernet network interface"
    Driver: "tg3"
    Driver Modules: "tg3"
    Device File: enP33p3s0f1
    HW Address: 98:be:94:03:18:49
    Permanent HW Address: 98:be:94:03:18:49
    Link detected: no
    Config Status: cfg=new, avail=yes, need=no, active=unknown
    Attached to: #4 (Ethernet controller)
  root@ltc-firep3:~#

  Thanks,
  Pavithra

  == Comment: #6 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-03-07 23:20:47 ==

  == Comment: #7 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-03-07 23:21:27 ==

  == Comment: #8 - Urvashi Jawere <urjaw...@in.ibm.com> - 2017-03-08 02:48:15 ==
  I am able to see some errors in syslog:

  auxiliary
  Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 114.15.239:/home/ubuntu/test IN SOA: failed-auxiliary
  Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 9.114.15.239:/home/ubuntu/test IN DS: failed-auxiliary
  Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 9.114.15.239:/home/ubuntu/test IN SOA: failed-auxiliary
  Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: DNSSEC validation failed for question 9.114.15.239:/home/ubuntu/test IN A: failed-auxiliary
  Mar 7 04:57:44 ltc-firep3 systemd-resolved[3486]: Server 9.12.16.2 does not support DNSSEC, downgrading to non-DNSSEC mode.
  Mar 7 04:57:44 ltc-firep3 kdump-config: /root/.ssh/id_rsa failed to be sent to ubuntu@9.114.15.239:/home/ubuntu/test
  Mar 7 04:58:04 ltc-firep3 systemd[1]: Reloading.
  Mar 7 04:59:15 ltc-firep3 systemd[1]: Reloading.
  Mar 7 04:59:16 ltc-firep3 kdump-config: propagated ssh key /root/.ssh/id_rsa to server ubuntu@9.114.15.239
  .
  .
  .
  Mar 7 05:06:55 ltc-firep3 systemd[1]: Started Accounts Service.
  Mar 7 05:06:56 ltc-firep3 kdump-tools[3498]: Starting kdump-tools: Modified cmdline:root=UUID=1e76cfd5-988c-46f4-bdc4-39fe1ed01152 ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0 elfcorehdr=155136K
  Mar 7 05:06:57 ltc-firep3 kdump-tools[3498]: * loaded kdump kernel
  Mar 7 05:06:57 ltc-firep3 kdump-tools: /sbin/kexec -p --command-line="root=UUID=1e76cfd5-988c-46f4-bdc4-39fe1ed01152 ro quiet splash irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
  Mar 7 05:06:57 ltc-firep3 kdump-tools: loaded kdump kernel
  Mar 7 05:06:57 ltc-firep3 systemd[1]: Started Kernel crash dump capture service.
  Mar 7 05:06:57 ltc-firep3 apport[3584]: ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/linux-image-4.10.0-9-generic-201703060521.crash'
  Mar 7 05:06:57 ltc-firep3 apport[3584]: ...done.

  == Comment: #18 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2017-03-28 06:55:20 ==
  Looks like tg3 module was not needed after all. Interesting thing though is
  even after enP34p1s0f0 is up (ifup) and network-online.target is reached,
  network was not really active. It took about 30 seconds, after reaching
  network-online.target, for the network to be active, even on a normal boot.
  Adding this wait time in kdump script, before saving dump, ensured that
  vmcore is captured successfully. Attaching the log for the same.

  Not sure why enP34p1s0f0 is taking that long to configure/initialize.
  Even so, this delay should be part of ifup/network-online.target if it is
  inevitable, so that network is pingable after network-online.target

  Thanks
  Hari

  == Comment: #19 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2017-03-28 07:01:52 ==
  The workaround snippet adding delay in kdump script:

  --- kdump-config.orig   2017-03-28 03:35:17.753542107 -0500
  +++ kdump-config        2017-03-28 06:59:22.887576623 -0500
  @@ -761,6 +761,7 @@
        KDUMP_DMESGFILE="$KDUMP_STAMPDIR/dmesg.$KDUMP_STAMP"
        ERROR=0

  +     sleep 30
        ssh -i $KDUMP_SSH_KEY $KDUMP_REMOTE_HOST mkdir -p $KDUMP_STAMPDIR
        ERROR=$?
        # If remote connections fails, no need to continue

  ---

  Thanks
  Hari

  == Comment: #20 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-03-30 01:33:56 ==
  (In reply to comment #19)
  > The workaround snippet adding delay in kdump script:
  >
  >
  > --- kdump-config.orig   2017-03-28 03:35:17.753542107 -0500
  > +++ kdump-config        2017-03-28 06:59:22.887576623 -0500
  > @@ -761,6 +761,7 @@
  >       KDUMP_DMESGFILE="$KDUMP_STAMPDIR/dmesg.$KDUMP_STAMP"
  >       ERROR=0
  >
  > +     sleep 30
  >       ssh -i $KDUMP_SSH_KEY $KDUMP_REMOTE_HOST mkdir -p $KDUMP_STAMPDIR
  >       ERROR=$?
  >       # If remote connections fails, no need to continue
  >
  > ---
  >
  > Thanks
  > Hari

  With above workaround dump captured successfully in remote host.

  Thanks,
  Pavithra

  == Comment: #22 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2017-04-10 22:14:27 ==
  (In reply to comment #18)
  > Created attachment 117088 [details]
  > Console log of successful dump capture after adding a time delay of
  > 'sleep 30'
  >
  > Looks like tg3 module was not needed after all. Interesting thing though is
  > even after enP34p1s0f0 is up (ifup) and network-online.target is reached,
  > network was not really active. It took about 30 seconds, after reaching
  > network-online.target, for the network to be active, even on a normal boot.
  > Adding this wait time in kdump script, before saving dump, ensured that
  > vmcore is captured successfully. Attaching the log for the same.
  >
  > Not sure why enP34p1s0f0 is taking that long to configure/initialize. Even
  > so, this delay should be part of ifup/network-online.target if it is
  > inevitable, so that network is pingable after network-online.target

  Hi Canonical,

  Since this falls outside the realm of kdump, should we add a NET_WAIT_TIME
  field in /etc/default/kdump-tools file that defaults to 0 but can be changed
  when the user sees timing troubles?

  Thanks
  Hari
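
  For discussion, here is a minimal sketch of how the NET_WAIT_TIME idea
  from comment #22 might look inside a shell script such as kdump-config.
  It is only an illustration under stated assumptions: NET_WAIT_TIME is
  the proposed setting, not an existing kdump-tools option, and
  wait_for_network is a hypothetical helper name. Instead of an
  unconditional 'sleep 30', it retries a probe command once per second
  and gives up after NET_WAIT_TIME retries, so the default of 0 adds no
  delay.

```shell
#!/bin/sh
# Sketch only. NET_WAIT_TIME is the setting proposed in this thread and is
# NOT an existing kdump-tools option; wait_for_network is a hypothetical
# helper, not part of kdump-config.

# Default to 0 so behaviour is unchanged unless the admin opts in.
NET_WAIT_TIME="${NET_WAIT_TIME:-0}"

# wait_for_network PROBE [ARGS...]
# Runs the probe command until it succeeds. Between failed attempts it
# sleeps one second, and it gives up once NET_WAIT_TIME one-second sleeps
# have been spent. Returns 0 if the probe succeeded, 1 otherwise.
wait_for_network() {
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$NET_WAIT_TIME" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible call site, placed where the workaround patch adds 'sleep 30'
# (left commented out so this sketch has no side effects):
#
#   wait_for_network ssh -i "$KDUMP_SSH_KEY" -o BatchMode=yes \
#       -o ConnectTimeout=5 "$KDUMP_REMOTE_HOST" true
```

  With NET_WAIT_TIME=30 this would cover the roughly 30 second delay
  reported in comment #18, while returning as soon as the remote host
  answers. Note that the time spent inside each probe (for example the
  ssh ConnectTimeout) is not counted against NET_WAIT_TIME in this
  sketch.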