As this is "for our awareness", marking as incomplete.

** Changed in: linux (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1658968

Title:
  ubuntu 16.04.2: crashed at deactivate_slab+0x18c/0x640 when testing
  dlpar

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Problem Description
  ===============================
  When testing CPU, memory, and slot DLPAR on roselp4, the system crashed.
    
  ---uname output---
  Linux roselp4 4.8.0-34-generic #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = lpar 
   
  Stack trace output:
  [ 3289.065350] Unable to handle kernel paging request for data at address 0xc0000404565d6a00
  [ 3289.065375] Faulting instruction address: 0xc0000000002e6eec
  [ 3289.065379] Oops: Kernel access of bad area, sig: 11 [#1]
  [ 3289.065382] SMP NR_CPUS=2048 NUMA pSeries
  [ 3289.065386] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) mlx4_en(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) mlx4_core(OE) mlx_compat(OE) binfmt_misc pseries_rng vmx_crypto sunrpc knem(OE) autofs4 dm_round_robin btrfs xor raid6_pq lpfc crc32c_vpmsum ipr scsi_transport_fc devlink be2net scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath [last unloaded: mlx4_core]
  [ 3289.065424] CPU: 82 PID: 40197 Comm: drmgr Tainted: G           OE   4.8.0-34-generic #36~16.04.1-Ubuntu
  [ 3289.065427] task: c00000045081ce00 task.stack: c00000044d414000
  [ 3289.065430] NIP: c0000000002e6eec LR: c0000000002e7718 CTR: c0000000002e7630
  [ 3289.065433] REGS: c00000044d417470 TRAP: 0300   Tainted: G           OE    (4.8.0-34-generic)
  [ 3289.065435] MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24082822  XER: 20000000
  [ 3289.065446] CFAR: c000000000008750 DAR: c0000404565d6a00 DSISR: 40000000 SOFTE: 0
                 GPR00: c0000000002e7718 c00000044d4176f0 c0000000014a6600 c00000047e01f480
                 GPR04: 0000000000000010 0000000082000075 0000000000000075 0000000000000001
                 GPR08: 0000000002000000 0000000000000000 0000000082000075 0000000000000009
                 GPR12: 0000000084002828 c000000007b4e200 0000000000000000 0000000000000000
                 GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000000d7a800
                 GPR20: 0000000010000050 c000000000fd4e6c c0000003e7933840 c0000000014daae0
                 GPR24: c00000000138dc48 0000000000000000 0000000000000001 c00000047e00fe80
                 GPR28: c0000404565d6a00 c00000047e01f480 c0000004565de700 f000000001159740
  [ 3289.065486] NIP [c0000000002e6eec] deactivate_slab+0x18c/0x640
  [ 3289.065489] LR [c0000000002e7718] slab_cpuup_callback+0xe8/0x170
  [ 3289.065491] Call Trace:
  [ 3289.065493] [c00000044d4176f0] [c0000000002e715c] deactivate_slab+0x3fc/0x640 (unreliable)
  [ 3289.065498] [c00000044d417810] [c0000000002e7718] slab_cpuup_callback+0xe8/0x170
  [ 3289.065502] [c00000044d417880] [c0000000000f98c8] notifier_call_chain+0x98/0x110
  [ 3289.065506] [c00000044d4178d0] [c0000000000ca564] __cpu_notify+0x54/0xa0
  [ 3289.065509] [c00000044d4178f0] [c0000000000ca77c] cpu_notify_nofail+0x2c/0x40
  [ 3289.065512] [c00000044d417910] [c0000000000ca7e4] notify_dead+0x54/0x170
  [ 3289.065515] [c00000044d4179b0] [c0000000000c98c4] cpuhp_invoke_callback+0x84/0x250
  [ 3289.065519] [c00000044d417a10] [c0000000000c9bfc] cpuhp_down_callbacks+0x8c/0x110
  [ 3289.065523] [c00000044d417a60] [c00000000024e328] _cpu_down+0x168/0x2b0
  [ 3289.065526] [c00000044d417ac0] [c0000000000cc068] do_cpu_down+0x68/0xb0
  [ 3289.065530] [c00000044d417b00] [c000000000738448] cpu_subsys_offline+0x28/0x40
  [ 3289.065534] [c00000044d417b20] [c00000000072f9e4] device_offline+0x104/0x140
  [ 3289.065538] [c00000044d417b60] [c00000000009a7bc] dlpar_cpu_remove+0x24c/0x350
  [ 3289.065542] [c00000044d417c40] [c00000000009aa50] dlpar_cpu_release+0x70/0xe0
  [ 3289.065545] [c00000044d417c90] [c000000000021a04] arch_cpu_release+0x44/0x80
  [ 3289.065548] [c00000044d417cb0] [c000000000738c8c] cpu_release_store+0x4c/0x80
  [ 3289.065552] [c00000044d417ce0] [c00000000072b7b0] dev_attr_store+0x40/0x70
  [ 3289.065555] [c00000044d417d00] [c0000000003e1e1c] sysfs_kf_write+0x6c/0xa0
  [ 3289.065559] [c00000044d417d20] [c0000000003e0cdc] kernfs_fop_write+0x17c/0x250
  [ 3289.065563] [c00000044d417d70] [c000000000322b20] __vfs_write+0x40/0x80
  [ 3289.065566] [c00000044d417d90] [c000000000323ec4] vfs_write+0xd4/0x270
  [ 3289.065571] [c00000044d417de0] [c000000000325acc] SyS_write+0x6c/0x110
  [ 3289.065575] [c00000044d417e30] [c000000000009584] system_call+0x38/0xec
  [ 3289.065577] Instruction dump:
  [ 3289.065579] b0df0018 60420000 815f0018 55490bfe 5529f83e 7d294378 913f0018 7c2004ac
  [ 3289.065585] e93f0000 792907a4 f93f0000 e93d0022 <7d5c482a> 2faa0000 419e0064 7f86e378
  [ 3289.065596] ---[ end trace 7f6da25673d4d05e ]---
   
  Oops output:
   Oops: Kernel access of bad area, sig: 11 [#1]

  == Comment: #12 - Ping Tian Han <pt...@cn.ibm.com> - 2017-01-17 20:57:37 ==
  It looks like this bug can be reproduced without the CadetE card. I think the problem occurs on the BabyBlueTip card:

  0292:60:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
          Subsystem: IBM MT27520 Family [ConnectX-3 Pro]
          Kernel driver in use: mlx4_core
          Kernel modules: mlx4_core

  == Comment: #14 - Carol L. Soto <cls...@us.ibm.com> - 2017-01-19 10:39:38 ==
  I cannot see in the report the stack trace this bugzilla is complaining about.
  But in the report I saw the known issue where, when you did this DLPAR with
  memory, CPU, and Mellanox cards, the card hits EEH. I think that was a kernel
  issue.

  You can try to recreate what this bugzilla complains about with DLPAR of
  the I/O card, but the test that you are running will hit the known issue
  I explained.

  == Comment: #15 - Ping Tian Han <pt...@cn.ibm.com> - 2017-01-19 19:12:47 ==
  (In reply to comment #14)
  > I cannot see in the report the stack trace this bugzilla is complaining about.
  > But in the report I saw the known issue where, when you did this DLPAR with
  > memory, CPU, and Mellanox cards, the card hits EEH. I think that was a kernel
  > issue.
  > 
  > You can try to recreate what this bugzilla complains about with DLPAR of the
  > I/O card, but the test that you are running will hit the known issue I explained.

  Thanks. It looks like this is a Mellanox card issue.

  Mirroring the bug to Canonical for their awareness.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1658968/+subscriptions
