I built a Zesty test kernel with a cherry-pick of commit 3d3efb68c19e539f.
The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1693566/

Can you see if this kernel resolves this bug?
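
One way to check, as a rough sketch: boot the test kernel, rerun the
"stress-ng -a 0" reproducer from the description, and count the soft
lockup reports in the kernel log. The helper name count_soft_lockups is
made up for illustration, and the match string is taken from the log
excerpts in the bug description.

```shell
# Hypothetical helper: count "soft lockup" reports in kernel log text read from stdin.
count_soft_lockups() {
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" keeps a zero count from looking like a failure under "set -e".
    grep -c 'BUG: soft lockup' || true
}

# Example usage on the running system (reading the kernel log may require root):
#   uname -r                      # confirm the test kernel is the one booted
#   dmesg | count_soft_lockups    # compare against a run on the stock kernel
```

A count of 0 after a run at least as long as the one that failed before
would suggest the fix helps, though a single clean run is not proof.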

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1693566

Title:
  Ubuntu 16.04.03: "NMI watchdog: BUG: soft lockup" occurs while running
  stress-ng on PowerNV machine.

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Zesty:
  In Progress

Bug description:
  == Comment: #0 - PAVITHRA R. PRAKASH <pavra...@in.ibm.com> - 2017-05-17 05:55:38 ==
  --- Problem description ----

  Ubuntu 16.04.03: "NMI watchdog: BUG: soft lockup" occurs while running
  stress-ng on NV machine.

  --- Steps to recreate------

  1. Install Ubuntu 16.04.03.
  2. Run "stress-ng -a 0".

  Logs:
  ====

  [ 2660.437087] INFO: rcu_sched self-detected stall on CPU
  [ 2660.437111]        22-...: (5247 ticks this GP) idle=e19/140000000000001/0 softirq=905/905 fqs=2380
  [ 2660.437114]         (t=5251 jiffies g=95606 c=95605 q=2545946)
  [ 2660.437750]        24-...: (5250 ticks this GP) idle=0b7/140000000000001/0 softirq=5805/5805 fqs=2380
  [ 2660.437859]
  [ 2664.172796] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 23s! [stress-ng-mmap:3509]
  [ 2664.172808] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [stress-ng-mrema:3536]
  [ 2674.848037] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 33s! [stress-ng-fork:3381]
  [ 2676.172894] NMI watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [kswapd0:992]
  [ 2680.336844] NMI watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [stress-ng-clock:5099]
  [ 2686.140931] NMI watchdog: BUG: soft lockup - CPU#16 stuck for 39s! [stress-ng-clone:3366]
  [ 2686.987192] xhci_hcd 0003:09:00.0: HC died; cleaning up
  [ 2686.987212] usb 1-3-port3: cannot reset (err = -108)

  After a few hours, the machine becomes completely unresponsive:

  [pavithra@localhost ~]$ ping 9.47.69.255
  PING 9.47.69.255 (9.47.69.255) 56(84) bytes of data.
  ^C
  --- 9.47.69.255 ping statistics ---
  12 packets transmitted, 0 received, 100% packet loss, time 11000ms

  
  Thanks,
  Pavithra

  == Comment: #6 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-05-25 03:32:26 ==
  ubuntu@ltc-firep2:~$ hostname -i
  9.47.69.255
  ubuntu@ltc-firep2:~$ uname -a
  Linux ltc-firep2 4.10.0-21-generic #23~16.04.1-Ubuntu SMP Tue May 2 12:54:57 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
  ubuntu@ltc-firep2:~$ cat /etc/os-release
  NAME="Ubuntu"
  VERSION="16.04.2 LTS (Xenial Xerus)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 16.04.2 LTS"
  VERSION_ID="16.04"
  HOME_URL="http://www.ubuntu.com/"
  SUPPORT_URL="http://help.ubuntu.com/"
  BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
  VERSION_CODENAME=xenial
  UBUNTU_CODENAME=xenial
  ubuntu@ltc-firep2:~$ tail /proc/cpuinfo
  processor     : 159
  cpu           : POWER8 (raw), altivec supported
  clock         : 2061.000000MHz
  revision      : 2.0 (pvr 004d 0200)

  timebase      : 512000000
  platform      : PowerNV
  model         : 8335-GTA        
  machine               : PowerNV 8335-GTA        
  firmware      : OPAL
  ubuntu@ltc-firep2:~$

  
  == Comment: #11 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-05-25 06:27:12 ==
  System Memory stats
  ==============
  ubuntu@ltc-firep2:~$ numactl -H
  available: 2 nodes (0,8)
  node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
  node 0 size: 61321 MB
  node 0 free: 60297 MB
  node 8 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
  node 8 size: 65303 MB
  node 8 free: 64923 MB
  node distances:
  node   0   8 
    0:  10  40 
    8:  40  10 
  ubuntu@ltc-firep2:~$ free -h 
                total        used        free      shared  buff/cache   available
  Mem:           123G        534M        122G         20M        868M        121G
  Swap:           37G          0B         37G
  ubuntu@ltc-firep2:~$ sudo sysctl vm | grep free
  vm.min_free_kbytes = 360448
  ubuntu@ltc-firep2:~$ 

  The host has 123 GB of memory spread across two NUMA nodes.
  Swap is configured at 37 GB, and vm.min_free_kbytes is set to 360448 (about 360 MB).

  == Comment: #13 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2017-05-25 07:25:35 ==

  [  280.494345] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [stress-ng-mmap:4172]
  [  280.495250] CPU: 5 PID: 4172 Comm: stress-ng-mmap Not tainted 4.10.0-21-generic #23~16.04.1-Ubuntu
  [  280.495262] task: c000000fe318c600 task.stack: c000000fc0d7c000
  [  280.495271] NIP: c0000000001a3248 LR: c0000000001a3204 CTR: c0000000000871f0
  [  280.495285] REGS: c000000fc0d7f7d0 TRAP: 0901   Not tainted  (4.10.0-21-generic)
  [  280.495299] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  [  280.495408]   CR: 44424444  XER: 20000000
  [  280.495416] CFAR: c0000000001a3250 SOFTE: 1
  [  280.495624] NIP [c0000000001a3248] smp_call_function_many+0x358/0x3f0
  [  280.495636] LR [c0000000001a3204] smp_call_function_many+0x314/0x3f0
  [  280.495645] Call Trace:
  [  280.495660] [c000000fc0d7fa50] [c0000000001a31e4] smp_call_function_many+0x2f4/0x3f0 (unreliable)
  [  280.495697] [c000000fc0d7fac0] [c0000000001a3430] kick_all_cpus_sync+0x40/0x50
  [  280.495726] [c000000fc0d7fae0] [c000000000069728] hash__pmdp_huge_get_and_clear+0xa8/0xf0
  [  280.495742] [c000000fc0d7fb10] [c00000000032b600] change_huge_pmd+0x210/0x2d0
  [  280.495762] [c000000fc0d7fb80] [c0000000002df638] change_protection_range+0xb38/0xe60
  [  280.495789] [c000000fc0d7fcc0] [c00000000030994c] change_prot_numa+0x3c/0xc0
  [  280.495815] [c000000fc0d7fcf0] [c00000000012e854] task_numa_work+0x2d4/0x3f0
  [  280.495844] [c000000fc0d7fdb0] [c00000000010f330] task_work_run+0x140/0x1a0
  [  280.495868] [c000000fc0d7fe00] [c00000000001db04] do_notify_resume+0xe4/0xf0
  [  280.495885] [c000000fc0d7fe30] [c00000000000b744] ret_from_except_lite+0x70/0x74
  [  280.495909] Instruction dump:
  [  280.495925] 3d020003 78691f24 39480fe0 7d2a482a e95d0000 7d4a4a14 812a0018 792707e1
  [  280.496022] 4182001c 60420000 7c210b78 7c421378 <812a0018> 792807e1 4082fff0 7c2004ac

  
  [  636.509312] NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [stress-ng-mrema:4205]
  [  636.510076] CPU: 29 PID: 4205 Comm: stress-ng-mrema Tainted: G             L  4.10.0-21-generic #23~16.04.1-Ubuntu
  [  636.510090] task: c000000fdef86e00 task.stack: c000000fdd074000
  [  636.510104] NIP: c0000000001a3244 LR: c0000000001a3204 CTR: c0000000000871f0
  [  636.510136] REGS: c000000fdd077760 TRAP: 0901   Tainted: G             L   (4.10.0-21-generic)
  [  636.510146] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
  [  636.510302]   CR: 44484824  XER: 20000000
  [  636.510319] CFAR: c0000000001a3250 SOFTE: 1
  [  636.510620] NIP [c0000000001a3244] smp_call_function_many+0x354/0x3f0
  [  636.510647] LR [c0000000001a3204] smp_call_function_many+0x314/0x3f0
  [  636.510658] Call Trace:
  [  636.510676] [c000000fdd0779e0] [c0000000001a31e4] smp_call_function_many+0x2f4/0x3f0 (unreliable)
  [  636.510759] [c000000fdd077a50] [c0000000001a3430] kick_all_cpus_sync+0x40/0x50
  [  636.510791] [c000000fdd077a70] [c00000000006f350] pmdp_invalidate+0x80/0xc0
  [  636.510820] [c000000fdd077aa0] [c000000000327d7c] __split_huge_pmd_locked+0x5bc/0xaa0
  [  636.510842] [c000000fdd077b60] [c00000000032b834] __split_huge_pmd+0x174/0x280
  [  636.510876] [c000000fdd077bc0] [c00000000032bc04] vma_adjust_trans_huge+0x134/0x1a0
  [  636.510909] [c000000fdd077c10] [c0000000002da1e4] __vma_adjust+0x114/0x8e0
  [  636.510932] [c000000fdd077cf0] [c0000000002dac2c] __split_vma.isra.5+0x27c/0x2a0
  [  636.510969] [c000000fdd077d40] [c0000000002dbb34] do_munmap+0x134/0x480
  [  636.510991] [c000000fdd077db0] [c0000000002e1550] SyS_mremap+0x1f0/0x550
  [  636.511029] [c000000fdd077e30] [c00000000000b184] system_call+0x38/0xe0
  [  636.511048] Instruction dump:
  [  636.511065] 409dfda4 3d020003 78691f24 39480fe0 7d2a482a e95d0000 7d4a4a14 812a0018
  [  636.511184] 792707e1 4182001c 60420000 7c210b78 <7c421378> 812a0018 792807e1 4082fff0

  Even after increasing vm.min_free_kbytes to 2 GB, soft lockups and hangs
  are still seen when running the stress-ng tool. This appears to be a
  kernel issue.
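
For anyone repeating that experiment, here is a minimal sketch of how the
larger reserve could be set. vm.min_free_kbytes takes a value in
kilobytes; the sketch assumes "2 GB" means 2 GiB (2097152 kB), which the
report does not state exactly. The helper name gib_to_kb is illustrative
only.

```shell
# Hypothetical helper: convert a size in GiB to the kB unit used by vm.min_free_kbytes.
gib_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# Assumption: "2 GB" in the report means 2 GiB.
target_kb=$(gib_to_kb 2)
echo "vm.min_free_kbytes target: ${target_kb}"

# Apply at runtime (requires root; the setting does not persist across reboots):
#   sudo sysctl -w vm.min_free_kbytes="${target_kb}"
# Confirm the new value:
#   sysctl vm.min_free_kbytes
```

The sysctl calls are left commented out so the snippet can be run safely
on a machine where changing the reserve is not wanted.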

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1693566/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
