[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-21 Alexandru Avadanii
Hi, Dann,
First of all, I think the bug title is misleading, as this issue happens on all
kernels we tested (4.4.0-45..66, 4.8.0-x, 4.10.0-x, etc.).

To be fair, we haven't hit this exact bug in practice (or at least I don't think
we did): without running stress-ng, 4.4.0-x never ever crashed.

The VM use case turned out to be a different bug [1], triggered 100% of the time
by AAVMF + vhost.

Let me know if I can provide anything else.
Compared to AAVMF + vhost [1], I consider this particular bug minor (if we
don't poke it with stress-ng, everything works well).

Thanks,
Alex

[1] https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1673564

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Yakkety:
  Triaged
Status in linux source package in Zesty:
  Triaged

Bug description:
  I have been trying for days to find an easy way to reproduce this.
  We initially observed it in OPNFV Armband, when we tried to upgrade the kernel
  of our Ubuntu Xenial installation to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes when launching
  multiple VMs (we suspected OVS, QEMU, etc.).
  However, to rule out anything specific to our setup, we looked for a simple way
  to reproduce it on all the ThunderX nodes we have access to, and we finally
  found one:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024
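The reproducer above sets no explicit time limit. A minimal sketch of a time-bounded wrapper follows, assuming stress-ng's `--timeout` option is acceptable for this test; `run_repro`, `DRY_RUN` and the 600s default are hypothetical choices for illustration and are not part of the original report. The 1024 worker count is the value used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproducer from this report:
#   stress-ng --hdd 1024
# with a time bound added via stress-ng's --timeout option.
# Set DRY_RUN=1 to print the command instead of running it.

run_repro() {
    workers=${1:-1024}    # number of --hdd workers (1024 in the report)
    duration=${2:-600s}   # assumed bound; not taken from the report
    # Rebuild the positional parameters so each argument stays quoted.
    set -- stress-ng --hdd "$workers" --timeout "$duration"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
        return 0
    fi
    "$@"
}
```

For example, run `run_repro 1024 600s` on the node while `dmesg -w` follows the kernel log in a second terminal.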

  We tested different firmware versions, provided by both the chip and the board
  manufacturers, and with all of them the result is 100% reproducible, leading to
  a kernel Oops [1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40
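Comparing this call trace with traces from other runs or kernels is easier once timestamps and offsets are stripped. A minimal sketch, assuming the `[timestamp] [<address>] function+0xoffset/0xsize` frame layout seen above (in this archive the address brackets are already empty); `trace_symbols` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reduce call-trace frames read on stdin to bare
# function names, one per line, dropping timestamps, addresses and offsets.
# Lines that are not frames (e.g. "Call trace:") produce no output.

trace_symbols() {
    sed -n 's/^.*\] *\[[^]]*\] *\([A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*.*$/\1/p'
}
```

Usage, with hypothetical file names: `trace_symbols < run1.log > run1.sym`, `trace_symbols < run2.log > run2.sym`, then `diff run1.sym run2.sym`.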

  
  Over the last few days, I tested all 4.8-* kernels and 4.10 (Zesty backport);
  the soft lockup happens with each and every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine under normal
  conditions (probably newer 4.4.0-* kernels too, but due to a regression in the
  Ethernet drivers after 4.4.0-45, we can't easily test those), yet running
  stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware 1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
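  dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
  dmi.product.name: R120-T30
  dmi.product.version: 0100
  dmi.sys.vendor: GIGABYTE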

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-21 dann frazier
@Alexandru: do you have a console log of a system hitting the issue with
the VM use case? Soft lockups are a fairly generic failure mode, and it
would not surprise me if stress-ng were triggering a different issue than
the VM case, with both emitting soft lockups.
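One way to check whether two reports describe the same failure is to classify their kernel messages first. Note that the trace quoted in the bug description (`INFO: task ... blocked for more than 120 seconds`) comes from the kernel's hung task detector, while soft lockups are reported by the lockup watchdog with a `BUG: soft lockup - CPU#N stuck for ...` message. The sketch below assumes those two message formats; `classify_kernel_line` and `summarize_log` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers: label kernel log lines by failure type, assuming the
# message formats of the hung task detector and the soft lockup watchdog.

classify_kernel_line() {
    case $1 in
        *"BUG: soft lockup"*) echo soft_lockup ;;
        *"INFO: task "*" blocked for more than "*) echo hung_task ;;
        *) echo other ;;
    esac
}

# Read a log on stdin and print "<count> <type>" for each failure type seen.
summarize_log() {
    while IFS= read -r line; do
        classify_kernel_line "$line"
    done | grep -v '^other$' | sort | uniq -c
}
```

Running `summarize_log < console.log` over both the stress-ng log and the VM log would show whether they contain the same type of report; matching types alone still would not prove a common cause.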

-- 
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : 

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Alexandru Avadanii
4.11-rc1 console log attached.
Board firmware is the latest available on Gigabyte's site (T31).

1. Install 4.11-rc1 (`make modules_install install`) and reboot.
2. Observe networking driver issues in the boot log.
   Dmesg: 4.11-rc1_dmesg_on_clean_boot.log [3]
3. Try `ping google.com`; it obviously does not work.
4. `modprobe -r nicpf` (leads to multiple oopses in dmesg)
   Console log: 4.11-rc1_modprobe_r_nicpf_output.log [1]
   Dmesg: 4.11-rc1_dmesg_after_modprobe_r_nicpf.log [2]
5. `modprobe nicpf` (this usually works, and afterwards the network is up and
   running; I am not sure whether ALL interfaces are OK, as not all of them are
   connected). However, this time it led to a soft lockup (see the full logs
   attached here).
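Steps 4 and 5 can be scripted so that the kernel log is saved after each module operation. A minimal sketch, assuming root access and a serial console (unloading `nicpf` takes the ThunderX interfaces down, so an SSH session running over them would drop); `reload_nicpf`, `save_dmesg`, `run`, `DRY_RUN` and the output file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical script for steps 4-5: unload and reload nicpf, saving the
# kernel log around each step. Set DRY_RUN=1 to print the commands only.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

save_dmesg() {
    # $1: file that receives the current kernel log
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "dmesg > $1"
    else
        dmesg > "$1"
    fi
}

reload_nicpf() {
    outdir=${1:-.}
    save_dmesg "$outdir/dmesg_on_clean_boot.log"
    run modprobe -r nicpf     # step 4: reported to cause multiple oopses
    save_dmesg "$outdir/dmesg_after_modprobe_r_nicpf.log"
    run modprobe nicpf        # step 5: reported once to lead to a soft lockup
    save_dmesg "$outdir/dmesg_after_modprobe_nicpf.log"
}
```

If step 5 locks the system up, the last `dmesg` capture may never run, which is why the serial console log remains the primary record.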

[1] http://paste.ubuntu.com/24178311/
[2] http://paste.ubuntu.com/24178312/
[3] http://paste.ubuntu.com/24178313/

** Attachment added: "ThunderX 4.11-rc1 console log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+attachment/4837770/+files/thunderx_4.11_rc1_console_log.txt

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Ciprian Barbu
Just one addition: the previous log contains dmesg output too. The task
that hung was systemd; it might be related to some VMs from the previous
boot being restarted automatically, but that still doesn't explain the
crash.

Rebooting the node again with 4.4 did not result in a kernel crash.

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Ciprian Barbu
Hi,

The same bug happened again on a similar board with T27 firmware, but
this time running kernel 4.4.0-45-generic. I'm attaching the serial console
log (with debug info from the firmware). I can't attach more because the
kernel hung.

So far 4.4.0-45-generic had been stable in our lab; this happened for no
obvious reason.

/ciprian

** Attachment added: "dmesg.log"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+attachment/4837761/+files/dmesg.log

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Alexandru Avadanii
Hi,
I tried out 4.11-rc1 a few days ago. Unfortunately, I could not get the board to
boot properly from the start, since the ThunderX networking drivers failed to
allocate MSI-X/MSI interrupts, and polling on some registers also failed.

So, with 4.11-rc1, at least one networking interface never came
online due to unmapped interrupts/failed polling, but unloading `nicpf`
and reloading it seemed to work (networking worked afterwards). After
that, the soft lockup happened, but I can't be sure I did not mess up
something else.

Let me try this again and get back to you with some proper logs, but off
the top of my head, things got worse with 4.11-rc1.

Thanks,
Alex

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Joseph Salisbury
Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds. Please test the latest
v4.11 kernel [0].

If this bug is fixed in the mainline kernel, please add the following
tag: 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.
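The tagging rule above is a two-way choice. A trivial sketch that maps a test outcome to the requested tag; the `fixed`/`exists` outcome words and the `upstream_tag` function are hypothetical, while the two tag strings are the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: map the outcome of a mainline-kernel test to the
# Launchpad tag requested in this thread.

upstream_tag() {
    case $1 in
        fixed)  echo kernel-fixed-upstream ;;
        exists) echo kernel-bug-exists-upstream ;;
        *)
            echo "usage: upstream_tag fixed|exists" >&2
            return 1
            ;;
    esac
}
```

After booting the mainline build, `uname -r` confirms which kernel is actually under test before a tag is chosen.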

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc2

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Yakkety:
  Triaged
Status in linux source package in Zesty:
  Triaged

Bug description:
  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our 
Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes, when 
launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both the chip and the board manufacturers, and with all of them the result is 100% reproducible, leading to a kernel Oops [1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40
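The first line of the trace above comes from the kernel's hung-task detector, which reports a task that has stayed in uninterruptible ("D") sleep for longer than hung_task_timeout_secs. When the stress-ng reproducer has to be repeated many times (for example on every kernel built during a bisect), it can help to flag that line automatically instead of watching the console. The following is a minimal sketch, assuming the kernel log has been captured as text (e.g. with `dmesg > dmesg.txt`); `HungTask` and `find_hung_tasks` are hypothetical helper names, and the regex only targets the `INFO: task <comm>:<pid> blocked for more than <N> seconds` format shown here.

```python
import re
from typing import List, NamedTuple

# Matches hung-task detector lines such as:
#   "[  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds."
# The task name can itself contain ':' (kworker/0:1), so the greedy <comm>
# group leaves only the last ':'-separated number before " blocked" as the PID.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    timestamp: float   # seconds since boot, as printed by the kernel
    comm: str          # task name, e.g. "kworker/0:1"
    pid: int
    timeout_secs: int  # hung_task_timeout_secs value reported in the message


def find_hung_tasks(log_text: str) -> List[HungTask]:
    """Return one HungTask per hung-task report found in a kernel log dump."""
    reports = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            reports.append(
                HungTask(float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
            )
    return reports
```

For example, `find_hung_tasks(open("dmesg.txt").read())` would return one `HungTask` per report; an empty list means no hung-task message was logged during the run.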

  
  Over the last few days, I tested all 4.8-* kernels and 4.10 (zesty backport); the soft lockup happens with every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine under normal conditions (newer 4.4.0-* kernels probably do too, but due to a regression in the ethernet drivers after 4.4.0-45, we can't easily test those), yet running stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware 1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
  

[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Thread Joseph Salisbury
If the mainline kernel still exhibits the bug, we can perform a kernel
bisect to identify which commit introduced the regression.

If the mainline kernel fixes the bug, we can perform a "reverse" bisect
to identify the fix.
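Both kinds of bisect are a binary search over the ordered commits between two kernels that behave differently; `git bisect` automates the bookkeeping (recent git versions also accept `--term-old`/`--term-new` to rename "good"/"bad", which suits a reverse bisect). The sketch below only models that search in Python: `first_transition` is a hypothetical helper, and the `has_property` callback stands in for "build, boot and run the stress-ng reproducer". For the regression bisect the property is "the lockup reproduces", with the list running from a good kernel to a bad one; for the reverse bisect the list runs from a bad kernel to a fixed one and the property is "the lockup no longer reproduces".

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_transition(commits: Sequence[T], has_property: Callable[[T], bool]) -> T:
    """Binary-search an ordered commit list for the first commit with the property.

    Preconditions (the same ones git bisect relies on): commits[0] lacks the
    property, commits[-1] has it, and the property never flips back in between.
    """
    lo, hi = 0, len(commits) - 1
    if has_property(commits[lo]) or not has_property(commits[hi]):
        raise ValueError("endpoints must straddle the change")
    # Invariant: commits[lo] lacks the property, commits[hi] has it.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_property(commits[mid]):
            hi = mid  # change happened at or before mid
        else:
            lo = mid  # change happened after mid
    return commits[hi]
```

Each loop iteration halves the remaining range, so the number of kernels that must be built and tested grows only logarithmically with the number of commits between the two endpoints.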

** Also affects: linux (Ubuntu Zesty)
   Importance: High
   Status: Confirmed

** Also affects: linux (Ubuntu Yakkety)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Yakkety)
   Status: New => Triaged

** Changed in: linux (Ubuntu Zesty)
   Status: Confirmed => Triaged

** Changed in: linux (Ubuntu Yakkety)
   Importance: Undecided => High

** Tags added: kernel-da-key needs-bisect yakkety zesty


[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-13 Thread Alexandru Avadanii
apport information

** Tags added: apport-collected xenial

** Description changed:

  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).
  
  In our environment, this was easily triggered on compute nodes, when launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to reproduce it on all ThunderX nodes we have access to, and we finally found it:
  
  $ apt-get install stress-ng
  $ stress-ng --hdd 1024
  
  We tested different FW versions, provided by both chip/board manufacturers, and with all of them the result is 100% reproducible, leading to a kernel Oops [1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40
  
  
  Over the last few days, I tested all 4.8-* and 4.10 (zesty backport), the soft lockup happens with each and every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine (probably newer 4.4.0-* too, but due to a regression in the ethernet drivers after 4.4.0-45, we can't test those with ease) under normal conditions, yet running stress-ng leads to the same oops.
  
  [1] http://paste.ubuntu.com/24172516/
+ --- 
+ AlsaDevices:
+  total 0
+  crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
+  crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
+ AplayDevices: Error: [Errno 2] No such file or directory
+ ApportVersion: 2.20.1-0ubuntu2.5
+ Architecture: arm64
+ ArecordDevices: Error: [Errno 2] No such file or directory
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ DistroRelease: Ubuntu 16.04
+ IwConfig: Error: [Errno 2] No such file or directory
+ MachineType: GIGABYTE R120-T30
+ Package: linux (not installed)
+ PciMultimedia:
+  
+ ProcEnviron:
+  TERM=vt220
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
+ ProcFB: 0 astdrmfb
+ ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
+ ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
+ RelatedPackageVersions:
+  linux-restricted-modules-4.8.0-41-generic N/A
+  linux-backports-modules-4.8.0-41-generic  N/A
+  linux-firmware 1.157.8
+ RfKill: Error: [Errno 2] No such file or directory
+ Tags:  xenial
+ Uname: Linux 4.8.0-41-generic aarch64
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups:
+  
+ _MarkForUpload: True
+ dmi.bios.date: 11/22/2016
+ dmi.bios.vendor: GIGABYTE
+ dmi.bios.version: T22
+ dmi.board.asset.tag: 01234567890123456789AB
+ dmi.board.name: MT30-GS0
+ dmi.board.vendor: GIGABYTE
+ dmi.board.version: 01234567
+ dmi.chassis.asset.tag: 01234567890123456789AB
+ dmi.chassis.type: 17
+ dmi.chassis.vendor: GIGABYTE
+ dmi.chassis.version: 01234567
+ dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
+ dmi.product.name: R120-T30
+ dmi.product.version: 0100
+ dmi.sys.vendor: GIGABYTE

** Attachment added: "CRDA.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837212/+files/CRDA.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes, when launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple