------- Comment From bren...@br.ibm.com 2016-10-05 10:23 EDT-------
Hello Hari,

It seems that there is another issue now, correct? Should we open a new
defect for it, or just track it here? The action expected from Canonical
is not clear at this moment.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1627036

Title:
  In Ubuntu16.10:Fadump fails as Kernel panic reported while
  dumping-,console got hung on 32TB Brazos System (kdump)

Status in linux package in Ubuntu:
  Triaged

Bug description:
  == Comment: #0 - Praveen K. Pandey <praveen.pan...@in.ibm.com> - 2016-07-17 02:37:31 ==
  Hi,

  On Ubuntu 16.10 I tried fadump on a Brazos system (32TB memory and
  192 cores). When a panic is triggered, the kernel panics and the
  console hangs.

  Steps to reproduce:

  1- Install Ubuntu 16.10
  2- Boot the system with 31TB and 192 cores
  3- Configure fadump on the system
  4- Verify that fadump is running on the system
  5- Trigger a panic on the system
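
Note: the check in step 4 could be scripted roughly as below. This is only a sketch, and it assumes the powerpc kernel exposes the fadump state through the sysfs files /sys/kernel/fadump_enabled and /sys/kernel/fadump_registered; the helper name fadump_status and its sysfs_root parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def fadump_status(sysfs_root="/sys/kernel"):
    """Return (enabled, registered) as booleans.

    Assumption: the kernel writes "1" into fadump_enabled and
    fadump_registered under sysfs_root when fadump is active.
    A missing or unreadable file is reported as False.
    """
    def read_flag(name):
        try:
            return (Path(sysfs_root) / name).read_text().strip() == "1"
        except OSError:
            return False

    return read_flag("fadump_enabled"), read_flag("fadump_registered")


if __name__ == "__main__":
    enabled, registered = fadump_status()
    print(f"fadump enabled: {enabled}, registered: {registered}")
```

Both values should be true before the panic in step 5 is triggered; "kdump-config show" reporting "ready to fadump" is the check actually used in this report.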

  Actual Result

  Unable to capture the fadump; the kernel panics and the console hangs

  Expected Result

  The fadump is captured

  Log:

  root@ltc-brazos1:~# kdump-config show
  DUMP_MODE:        fadump
  USE_KDUMP:        1
  KDUMP_SYSCTL:     kernel.panic_on_oops=1
  KDUMP_COREDIR:    /var/crash
     /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-30-generic
  kdump initrd: 
     /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-30-generic
  current state:    ready to fadump
  root@ltc-brazos1:~# 

  root@ltc-brazos1:~# cat /proc/cmdline 
  BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash fadump=on fadump_reserve_mem=4096M crashkernel=4096M
  root@ltc-brazos1:~# 
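
Note: for comparing the dump-related boot parameters between runs, the command line above can be picked apart with a small script. This is a sketch only; dump_params and parse_size are hypothetical helpers, and parse_size handles plain sizes such as 4096M, not the range syntax (for example 384M-:128M) that crashkernel also accepts.

```python
import re

# Binary multipliers for the size suffixes handled by this sketch.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a plain size such as '4096M' into bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"not a plain size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def dump_params(cmdline):
    """Return the fadump/crashkernel settings found on a kernel command line."""
    wanted = ("fadump", "fadump_reserve_mem", "crashkernel")
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            params[key] = value  # a repeated key keeps its last value
    return params


cmdline = (
    "BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic "
    "root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash "
    "fadump=on fadump_reserve_mem=4096M crashkernel=4096M"
)
params = dump_params(cmdline)
reserve = parse_size(params["fadump_reserve_mem"])
print(params)
print(reserve // (1 << 30), "GiB reserved for fadump")
```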

  ltc-brazos1 login: [  442.749993] sysrq: SysRq : Trigger a crash
  [  442.750031] Unable to handle kernel paging request for data at address 0x00000000
  [  442.750037] Faulting instruction address: 0xc000000000670014
  [  442.750043] Oops: Kernel access of bad area, sig: 11 [#1]
  [  442.750047] SMP NR_CPUS=2048 NUMA pSeries
  [  442.750053] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
  [  442.750068] CPU: 157 PID: 403890 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
  [  442.750074] task: c00003f97b0af640 ti: c00003f97b104000 task.ti: c00003f97b104000
  [  442.750079] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
  [  442.750083] REGS: c00003f97b107990 TRAP: 0300   Not tainted  (4.4.0-30-generic)
  [  442.750088] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28242222  XER: 00000001
  [  442.750100] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
  GPR00: c0000000006710c8 c00003f97b107c10 c0000000015b5d00 0000000000000063
  GPR04: c00001faba749c50 c00001faba75b4e0 c0001f3efe7c0000 0000000000000313 
  GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3efe7cecb8 
  GPR12: c00000000066ffe0 c00000000bc9d380 ffffffffffffffff 0000000022000000 
  GPR16: 0000000010170dc8 000001001ef401d8 0000000010140f58 00000000100c7570 
  GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608 
  GPR24: 00003ffff7c9e7b4 0000000000000001 c0000000014f8e58 0000000000000004 
  GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000 
  [  442.750165] NIP [c000000000670014] sysrq_handle_crash+0x34/0x50
  [  442.750170] LR [c0000000006710c8] __handle_sysrq+0xe8/0x270
  [  442.750174] Call Trace:
  [  442.750179] [c00003f97b107c10] [c000000000e08f28] _fw_tigon_tg3_bin_name+0x2ce58/0x342b0 (unreliable)
  [  442.750186] [c00003f97b107c30] [c0000000006710c8] __handle_sysrq+0xe8/0x270
  [  442.750192] [c00003f97b107cd0] [c000000000671868] write_sysrq_trigger+0x78/0xa0
  [  442.750199] [c00003f97b107d00] [c00000000037ae30] proc_reg_write+0xb0/0x110
  [  442.750205] [c00003f97b107d50] [c0000000002e186c] __vfs_write+0x6c/0xe0
  [  442.750210] [c00003f97b107d90] [c0000000002e25a0] vfs_write+0xc0/0x230
  [  442.750216] [c00003f97b107de0] [c0000000002e35dc] SyS_write+0x6c/0x110
  [  442.750222] [c00003f97b107e30] [c000000000009204] system_call+0x38/0xb4
  [  442.750226] Instruction dump:
  [  442.750229] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 394931e4
  [  442.750238] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
  [  442.750248] ---[ end trace ff61e1bc4dd59a42 ]---
  [  442.752585] 

  
  Loading Linux 4.4.0-30-generic ...
  Loading initial ramdisk ...
  OF stdout device is: /vdevice/vty@30000000
  Preparing to boot Linux version 4.4.0-30-generic (buildd@bos01-ppc64el-023) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #49-Ubuntu SMP Fri Jul 1 10:00:36 UTC 2016 (Ubuntu 4.4.0-30.49-generic 4.4.13)
  Detected machine type: 0000000000000101
  Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
  Calling ibm,client-architecture-support... done
  command line: BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash fadump=on fadump_reserve_mem=4096M crashkernel=4096M
  Ignoring mem=0000000100000000 >= ram_top.
  memory layout at init:
    memory_limit : 0000000000000000 (16 MB aligned)
    alloc_bottom : 000000000e020000
    alloc_top    : 0000000010000000
    alloc_top_hi : 0000000010000000
    rmo_top      : 0000000010000000
    ram_top      : 0000000010000000
  instantiating rtas at 0x000000000e9e0000... done
  prom_hold_cpus: skipped
  copying OF device tree...
  Building dt strings...
  Building dt structure...
  Device tree strings 0x000000000e030000 -> 0x000000000e0319a4
  Device tree struct  0x000000000e040000 -> 0x000000000e640000
  Quiescing Open Firmware ...
  Booting Linux via __start() ...
   -> smp_release_cpus()
  spinning_secondaries = 1535
   <- smp_release_cpus()
   <- setup_system()
  [    0.000000] Kernel panic - not syncing: memblock_virt_alloc_try_nid: Failed to allocate 16777216 bytes align=0x1000000 nid=1 from=0xfffffffffffffff max_addr=0x0
  [    0.000000] 
  [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.0-30-generic #49-Ubuntu
  [    0.000000] Call Trace:
  [    0.000000] [c0000000015b39d0] [c000000000af955c] dump_stack+0xb0/0xf0 (unreliable)
  [    0.000000] [c0000000015b3a10] [c000000000af5790] panic+0x100/0x2c0
  [    0.000000] [c0000000015b3aa0] [c000000000ed238c] memblock_virt_alloc_try_nid+0xc0/0xe8
  [    0.000000] [c0000000015b3b30] [c0000000002db69c] __earlyonly_bootmem_alloc.constprop.2+0x50/0x74
  [    0.000000] [c0000000015b3b70] [c000000000afc5fc] vmemmap_populate+0xf8/0x250
  [    0.000000] [c0000000015b3c40] [c000000000afdfa8] sparse_mem_map_populate+0x38/0x64
  [    0.000000] [c0000000015b3c70] [c000000000ed4234] sparse_init+0x1d4/0x298
  [    0.000000] [c0000000015b3d30] [c000000000eb3604] initmem_init+0xabc/0xd68
  [    0.000000] [c0000000015b3e50] [c000000000eab418] setup_arch+0x270/0x300
  [    0.000000] [c0000000015b3f00] [c000000000ea3ae4] start_kernel+0xc4/0x558
  [    0.000000] [c0000000015b3f90] [c000000000008c6c] start_here_common+0x20/0xa8
  [    0.000000] ---[ end Kernel panic - not syncing: memblock_virt_alloc_try_nid: Failed to allocate 16777216 bytes align=0x1000000 nid=1 from=0xfffffffffffffff max_addr=0x0
  [    0.000000] 
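
Note: to put the numbers in the second-kernel panic side by side with the "memory layout at init" values, the panic line can be decoded as sketched below. The regular expression and the helper parse_alloc_failure are hypothetical and match only the message format seen in this log; ram_top is copied from the boot output above. The script only restates the figures and does not establish the root cause.

```python
import re

# Matches the memblock failure message as it appears in this log.
PANIC_RE = re.compile(
    r"memblock_virt_alloc_try_nid: Failed to allocate (?P<size>\d+) bytes "
    r"align=(?P<align>0x[0-9a-fA-F]+) nid=(?P<nid>-?\d+)"
)


def parse_alloc_failure(line):
    """Return size, alignment and NUMA node from the panic line, or None."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    return {
        "size": int(match.group("size")),
        "align": int(match.group("align"), 16),
        "nid": int(match.group("nid")),
    }


line = (
    "Kernel panic - not syncing: memblock_virt_alloc_try_nid: "
    "Failed to allocate 16777216 bytes align=0x1000000 nid=1 "
    "from=0xfffffffffffffff max_addr=0x0"
)
info = parse_alloc_failure(line)
ram_top = 0x10000000  # "ram_top" from the memory layout printed at init

print(info["size"] // (1 << 20), "MiB requested on node", info["nid"])
print(info["align"] // (1 << 20), "MiB alignment")
print(ram_top // (1 << 20), "MiB below ram_top at init")
```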

  Regards
  Praveen

  == Comment: #1 - Praveen K. Pandey <praveen.pan...@in.ibm.com> -
  2016-07-17 02:40:23 ==

  
  == Comment: #14 - SRIKAR DRONAMRAJU <srikar.dronamr...@in.ibm.com> - 2016-08-31 11:02:28 ==
  V3 was posted upstream at
  http://lkml.kernel.org/r/1472476010-4709-1-git-send-email-sri...@linux.vnet.ibm.com.

  That should at least solve the problem (at least it wouldn't panic/hang
  on triggering fadump).

  The patches posted were on top of 4.8-rc3 and apply cleanly on v4.4.
  I am not sure which kernel is targeted for 16.10. I hear it's going to be
  based on v4.8.
  Once we know which kernel version Ubuntu is targeting, we can backport the
  patchset accordingly.

  == Comment: #18 - Gary M. Gaydos <gmgay...@us.ibm.com> - 2016-09-14 16:56:11 ==
  Hi Canonical: Per the comment below with the patch set link, this bug appears
  to be fixed using the 4.4.0-34 kernel. Of course, the 16.10 release will use a
  newer kernel.

  V3 was posted upstream at
  http://lkml.kernel.org/r/1472476010-4709-1-git-send-email-sri...@linux.vnet.ibm.com.

  That should at least solve the problem (at least it wouldn't panic/hang
  on triggering fadump).

  The patches posted were on top of 4.8-rc3 and apply cleanly on v4.4.
  I am not sure which kernel is targeted for 16.10. I hear it's going to be
  based on v4.8.
  Once we know which kernel version Ubuntu is targeting, we can backport the
  patchset accordingly.

  
  Exposing a comment from test that was previously private:
  (In reply to comment #16)
  > Hi Praveen, 
  > 
  > I have applied the patches to the Yakkety kernel source and built the *.deb
  > files. I have kept them on powerdev.in.ibm.com. Have sent you the access
  > details over email

  Hi Latha,

  Thanks. I tried the patched kernel and the issue seems to be fixed;
  I am able to capture the fadump.

  Log:

  root@ltc-brazos1:~# cat /proc/cmdline 
  BOOT_IMAGE=/boot/vmlinux-4.4.0-34-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro fadump=on quiet splash fadump=on crashkernel=384M-:128M
  root@ltc-brazos1:~# 

   root@ltc-brazos1:/var/crash# ls
  201609140950  kexec_cmd  linux-image-4.4.0-34-generic-201609140950.crash
  root@ltc-brazos1:/var/crash# cd 201609140950
  root@ltc-brazos1:/var/crash/201609140950# ls
  dmesg.201609140950  dump.201609140950
  root@ltc-brazos1:/var/crash/201609140950# 
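
Note: the manual check above (a timestamped directory under /var/crash holding a dump.* file) could be automated along these lines. This is a sketch that assumes the directory layout shown in this listing; find_dumps is a hypothetical helper, not part of kdump-tools.

```python
from pathlib import Path


def find_dumps(crash_dir="/var/crash"):
    """List (timestamp, file names) for each captured dump.

    Assumption: every capture lives in an all-digit subdirectory of
    crash_dir and contains at least one file named dump.<timestamp>.
    """
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    results = []
    for sub in sorted(root.iterdir()):
        if sub.is_dir() and sub.name.isdigit():
            files = sorted(entry.name for entry in sub.iterdir())
            if any(name.startswith("dump.") for name in files):
                results.append((sub.name, files))
    return results
```

For the listing above, find_dumps() would be expected to report the 201609140950 directory with its dmesg and dump files; an empty list means no dump was captured.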

  Regards
  Praveen

  == Comment: #20 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2016-09-23 03:49:36 ==
  Mirroring the bug so Canonical can pick up the fix patches.
  Srikar, can you please provide the upstream commit IDs of the fix patches?

  Thanks
  Hari

  == Comment: #21 - Hari Krishna Bathini <hbath...@in.ibm.com> - 2016-09-23 03:59:17 ==
  (In reply to comment #14)
  > V3 was posted upstream at
  > http://lkml.kernel.org/r/1472476010-4709-1-git-send-email-sri...@linux.vnet.ibm.com.
  > 
  > That should at least solve the problem (at least it wouldn't panic/hang on
  > triggering fadump)
  > 
  > The patches posted were on top of 4.8-rc3 and apply cleanly on v4.4
  > I am not sure which kernel is targeted for 16.10. I hear it's going to be
  > based on v4.8

  Yeah. 16.10 -proposed now has a v4.8-based kernel.

  Thanks
  Hari

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627036/+subscriptions
