[Kernel-packages] [Bug 1857074] Re: Cavium ThunderX CN88XX Panic : Unknown reason

2020-01-15 Thread Alexandru Avadanii
Hi,
Not sure this is useful (since it might be obvious), but adding `nopti` to 
kernel parameters works around the issue, indicating this is indeed related to 
kpti.
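
For anyone wanting to confirm the workaround on their own node, a minimal shell sketch follows. It only checks whether a parameter is present on the kernel command line; the helper name `has_param` is made up for illustration. On Ubuntu the parameter is normally made persistent by adding it to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub and running `update-grub`, though the boot setup on a given ThunderX board may differ.

```shell
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in CMDLINE.
# (Illustrative helper, not part of any existing tool.)
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel (only meaningful on a Linux host).
if [ -r /proc/cmdline ] && has_param "$(cat /proc/cmdline)" nopti; then
    echo "nopti is on the kernel command line"
else
    echo "nopti is not on the kernel command line"
fi
```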

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1857074

Title:
  Cavium ThunderX CN88XX Panic : Unknown reason

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

Bug description:
  Series: Bionic
  Kernel: 4.15.0-74.84 linux-generic
  Steps to reproduce:  Install 4.15.0-74.84 Kernel and boot the system.

  The following crash was observed while testing the proposed kernel for the 
2019.12.02 SRU Cycle.
  This kernel was built to include fixes for the following bugs:

    * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX
  (LP: #1853326)
  - Revert "arm64: Use firmware to detect CPUs that are not affected by
    Spectre-v2"
  - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*"

    * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and
  Kunpeng920 (LP: #1852723)
  - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to
    correct place

  The following crash appears to be a NEW bug, not related to the prior bugs 
listed above.
  This bug DOES NOT APPEAR to be related to LP#1857073.

  This is another NEW BUG.

  Hostname: Starmie

  Probable Cause is unknown at this point and still under investigation.

  [  OK  ] Found device WDC_WD5003ABYZ-011FA0 efi.
   Mounting /boot/efi...
  [  OK  ] Mounted /boot/efi.
  [  OK  ] Reached target Local File Systems.
   Starting AppArmor initialization...
   Starting Tell Plymouth To Write Out Runtime Data...
   Starting ebtables ruleset management...
  [   20.942427] kernel BUG at 
/build/linux-pWET3k/linux-4.15.0/fs/buffer.c:1240!
  [   20.951416] Internal error: Oops - BUG: 0 [#1] SMP
  [   20.958153] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip 
cavium_rng_vf shpchp cavium_rng gpio_keys uio_pdrv_genirq ipmi_ssif uio 
ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core 
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf 
nicpf ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect aes_ce_blk 
sysimgblt fb_sys_fops aes_ce_cipher crc32_ce drm crct10dif_ce ghash_ce sha2_ce 
sha256_arm64 sha1_ce ahci thunder_bgx libahci thunder_xcv i2c_thunderx 
mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd 
cryptd aes_arm64
  [   21.044326] Process systemd (pid: 1, stack limit = 0x5af6f18b)
  [   21.053858] CPU: 1 PID: 1 Comm: systemd Not tainted 4.15.0-74-generic 
#84-Ubuntu
  [   21.063931] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., 
BIOS 5.11 12/12/2012
  [   21.074790] pstate: 20400085 (nzCv daIf +PAN -UAO)
  [   21.082096] pc : __find_get_block+0x2e8/0x398
  [   21.088917] lr : __getblk_gfp+0x3c/0x2a8
  [   21.095379] sp : 099ab7e0
  [   21.101062] x29: 099ab7e0 x28: 
  [   21.108699] x27:  x26: 
  [   21.116265] x25: 0001 x24: 
  [   21.123788] x23: 0008 x22: 801f26116c80
  [   21.131302] x21: 801f26116c80 x20: 245c
  [   21.138808] x19: 1000 x18: a59c3a70
  [   21.146300] x17:  x16: 
  [   21.153730] x15: 0020 x14: 0012
  [   21.161083] x13: 2f7374696e752f64 x12: 0101010101010101
  [   21.168397] x11: 7f7f7f7f7f7f7f7f x10: 0972d000
  [   21.175689] x9 :  x8 : 801f7ba7e3c0
  [   21.183042] x7 : 801f7ba7e3e0 x6 : 
  [   21.190667] x5 : 0004 x4 : 0020
  [   21.197955] x3 : 0008 x2 : 1000
  [   21.205680] x1 : 245c x0 : 0080
  [   21.212918] Call trace:
  [   21.217257]  __find_get_block+0x2e8/0x398
  [   21.223160]  __getblk_gfp+0x3c/0x2a8
  [   21.228644]  ext4_getblk+0xcc/0x1b0
  [   21.233991]  ext4_bread_batch+0x78/0x1c8
  [   21.239726]  ext4_find_entry+0x2d4/0x598
  [   21.245416]  ext4_lookup+0xac/0x278
  [   21.250612]  lookup_slow+0xac/0x190
  [   21.255736]  walk_component+0x228/0x340
  [   21.261151]  link_path_walk+0x2f4/0x568
  [   21.266499]  path_parentat+0x44/0x88
  [   21.271521]  filename_parentat+0xa0/0x170
  [   21.276924]  filename_create+0x60/0x168
  [   21.282082]  SyS_symlinkat+0x80/0x128
  [   21.287013]  el0_svc_naked+0x30/0x34
  [   21.291835] Code: 17e7 a90363b7 a9046bb9 f9002bbb (d421)
  [   21.299191] ---[ end trace b07cecc329f07f48 ]---
  [   21.347488] systemd: 35 output lines suppressed due to ratelimiting
  [   

[Kernel-packages] [Bug 1673564] Re: ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on

2017-03-27 Thread Alexandru Avadanii
Hi, Dann,
Thanks for looking into this!
One more thing: we blacklisted the module "vhost_net", and that bypasses the 
issue.
I know it's not the right direction for finding a fix, but maybe it helps with 
the debug.
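
A minimal sketch of one common way to keep a module from being auto-loaded, assuming the standard modprobe.d mechanism. The exact method used for the bypass is not spelled out in this comment, and the function and file names below are illustrative only. Note that a plain `blacklist` line only suppresses alias-based autoloading; an `install vhost_net /bin/false` line is the usual way to also block explicit `modprobe` requests.

```shell
# write_blacklist DIR MODULE: create DIR/blacklist-MODULE.conf (hypothetical name).
# On a real host DIR would be /etc/modprobe.d and this needs root.
write_blacklist() {
    dir=$1 mod=$2
    mkdir -p "$dir" &&
    printf 'blacklist %s\n' "$mod" > "$dir/blacklist-$mod.conf"
}

# Demo against a scratch directory so nothing on the system is touched.
scratch=$(mktemp -d)
write_blacklist "$scratch" vhost_net
cat "$scratch/blacklist-vhost_net.conf"
```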

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1673564

Title:
  ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with
  vhost=on

Status in edk2 package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  This is a followup of an earlier thread/bug that we have narrowed down
  to an incompatibility/issue with vhost support in qemu-efi. Without
  vhost=on qemu seems to be working fine.

  I have tested several edk2 firmwares:
  - xenial
  - zesty
  - Fedora: 
ftp://195.220.108.108/linux/fedora-secondary/development/rawhide/Everything/aarch64/os/Packages/e/edk2-aarch64-20170209git296153c5-2.fc26.noarch.rpm

  I have also tested with different guests:
  - cirros: 
https://download.cirros-cloud.net/daily/20161201/cirros-d161201-aarch64-disk.img
  - ubuntu xenial: 
https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-arm64-uefi1.img

  The test steps are simple enough. A tap device is needed, and qemu-kvm and
  qemu-efi need to be installed. The UEFI image is run as shown in the
  launch.sh script, with the tap device used in vhost=on mode.

  Also note that the QEMU_EFI.fd binary needs to be padded up to 64M:
  dd if=/dev/zero of=AAVMF_CODE.fd bs=1M count=64
  dd if=QEMU_EFI.fd of=AAVMF_CODE.fd conv=notrunc
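
The two dd commands above first create a 64 MiB file of zeros and then copy the firmware over its start without truncating it. A minimal sketch that wraps the same two steps in a function and prints the resulting size; `pad_image` and its arguments are illustrative names, not part of the original report, and the demo uses a tiny stand-in file padded to 2 MiB rather than a real firmware image:

```shell
# pad_image SRC DST SIZE_MIB: zero-fill DST to SIZE_MIB MiB, then overlay SRC at offset 0.
pad_image() {
    src=$1 dst=$2 size_mib=$3
    dd if=/dev/zero of="$dst" bs=1048576 count="$size_mib" 2>/dev/null &&
    dd if="$src" of="$dst" conv=notrunc 2>/dev/null
}

# Demo with a stand-in file instead of a real QEMU_EFI.fd.
work=$(mktemp -d)
printf 'EFI' > "$work/QEMU_EFI.fd"
pad_image "$work/QEMU_EFI.fd" "$work/AAVMF_CODE.fd" 2
wc -c < "$work/AAVMF_CODE.fd"   # size in bytes of the padded image
```

For the real firmware the third argument would be 64, matching the commands in the report.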

  
  The result was always the same: the node crashed with soft lockups when qemu 
was attempting to boot the kernel.

  I will attach all the relevant information shortly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1673564/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1674837] Re: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch

2017-03-21 Thread Alexandru Avadanii
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1674837

Title:
  thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Upstream backport [3] introduced a regression with ThunderX nodes (CRB-1S, 
CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3).
  We have opened a downstream bug report [1], where we temporarily bypassed 
this by pinning the kernel to 4.4.0-45.
  I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are 
still affected by link training issues with our switch, with 4.11-rc1 not 
working at all and reporting more issues (logs attached in a different LP 
comment [2]).
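
The report does not say how the kernel was pinned. As one hedged illustration, on a stock Ubuntu system a kernel can be held at a given ABI with `apt-mark hold`; the helper below only builds the package names (assuming the usual `linux-image-<abi>-generic` naming) and prints the command rather than running it. `kernel_pkgs` is a made-up name.

```shell
# kernel_pkgs ABI: print the image and headers package names for one kernel ABI.
# Naming follows the usual Ubuntu pattern; adjust for other flavours.
kernel_pkgs() {
    printf 'linux-image-%s-generic linux-headers-%s-generic\n' "$1" "$1"
}

# Print (not run) the hold command for the kernel mentioned in the report.
echo "apt-mark hold $(kernel_pkgs 4.4.0-45)"
```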

  I also confirmed that reverting the commit in question fixes the
  issues in our setup (tested on top of the 4.10.0-13
  linux-image-generic-hwe-edge package from Xenial).

  BR,
  Alex

  [1] https://jira.opnfv.org/browse/ARMBAND-168
  [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17
  [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837/+subscriptions



[Kernel-packages] [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS

2017-03-21 Thread Alexandru Avadanii
Hi, Dann,
I created a new bug and pasted the same info as above at [1].
AFAICT, there is no useful information in the logs when link training fails.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1630038

Title:
  thunder nic: avoid link delays due to RX_PACKET_DIS

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  [Impact]
  Link establishment is delayed during initialization, possibly resulting in 
remote fault conditions that may cause the interface to fail to come up.

  [Test Case]
  Put the system in a reboot loop and watch for a remote fault condition, or a 
failure to bring up the link that can only be resolved by reloading the module.
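
A minimal sketch of the kind of check a reboot-loop test could run after each boot, assuming the interface's link state is read from the standard sysfs `carrier` file. The function name and the `SYSFS_ROOT` override are illustrative; this cannot detect a remote fault by itself, only that the link did not come up.

```shell
# link_is_up IFACE: succeed if the kernel reports carrier on IFACE.
# SYSFS_ROOT is a hypothetical override so the check can be tried on a fake tree.
link_is_up() {
    carrier="${SYSFS_ROOT:-/sys}/class/net/$1/carrier"
    [ -r "$carrier" ] && [ "$(cat "$carrier" 2>/dev/null)" = "1" ]
}

# Demo on a fake sysfs tree; the interface name is made up.
SYSFS_ROOT=$(mktemp -d)
mkdir -p "$SYSFS_ROOT/class/net/eth0"
echo 1 > "$SYSFS_ROOT/class/net/eth0/carrier"
if link_is_up eth0; then echo "eth0: link up"; else echo "eth0: link down"; fi
```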

  [Regression Risk]
  Patch is to a specific driver that is only used on Cavium ThunderX systems. 
The patch is upstream, so will have upstream support for regressions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038/+subscriptions



[Kernel-packages] [Bug 1674837] Re: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch

2017-03-21 Thread Alexandru Avadanii
Let me know if I should attach any logs, although there are *no* traces
anywhere, at least with default log levels (without recompiling).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1674837

Title:
  thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch

Status in linux package in Ubuntu:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837/+subscriptions



[Kernel-packages] [Bug 1674837] [NEW] thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch

2017-03-21 Thread Alexandru Avadanii
Public bug reported:

Upstream backport [3] introduced a regression with ThunderX nodes (CRB-1S, 
CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3).
We have opened a downstream bug report [1], where we temporarily bypassed this 
by pinning the kernel to 4.4.0-45.
I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still 
affected by link training issues with our switch, with 4.11-rc1 not working at 
all and reporting more issues (logs attached in a different LP comment [2]).

I also confirmed that reverting the commit in question fixes the issues
in our setup (tested on top of the 4.10.0-13 linux-image-generic-hwe-edge
package from Xenial).

BR,
Alex

[1] https://jira.opnfv.org/browse/ARMBAND-168
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17
[3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1674837

Title:
  thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch

Status in linux package in Ubuntu:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837/+subscriptions



[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-21 Thread Alexandru Avadanii
Hi, Dann,
First of all, I think the bug title is misleading, as this issue happens on all 
kernels we tested (4.4.0-45..66, 4.8.0-x, 4.10.0-x etc).

To be fair, we haven't seen this exact bug (or at least I don't think we did)
in practice, i.e. without running stress-ng, 4.4.0-x never ever crashed.

The VM use case turned out to be a different bug [1], triggered 100% by
AAVMF + vhost.

Let me know if I can provide anything else.
I consider this particular bug minor (if we don't poke it with stress-ng, 
everything works well), compared to AAVMF + vhost [1].

Thanks,
Alex

[1] https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1673564

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Yakkety:
  Triaged
Status in linux source package in Zesty:
  Triaged

Bug description:
  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our 
Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes, when 
launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024
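
To make the reproduction easier to script, here is a minimal sketch that counts hung-task warnings in a kernel log read from stdin. It assumes the "blocked for more than N seconds" message format seen in this report; `count_hung_tasks` is an illustrative name. stress-ng's `--timeout` option can be used to bound the run.

```shell
# count_hung_tasks: print how many hung-task warnings appear on stdin.
count_hung_tasks() {
    grep -c 'blocked for more than [0-9]* seconds' || true
}

# Typical use on an affected node (not run here):
#   stress-ng --hdd 1024 --timeout 10m
#   dmesg | count_hung_tasks

# Demo on a line copied from the report.
printf '%s\n' '[  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.' |
    count_hung_tasks
```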

  We tested different FW versions, provided by both chip/board manufacturers, 
and with all of them the result is 100% reproducible, leading to a kernel Oops 
[1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic 
#44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 
0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40

  
  Over the last few days, I tested all 4.8-* and 4.10 (zesty backport) kernels; 
the soft lockup happens with each and every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine (probably 
newer 4.4.0-* too, but due to a regression in the ethernet drivers after 
4.4.0-45, we can't test those with ease) under normal conditions, yet running 
stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic 
root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 
console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet 
splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: 

[Kernel-packages] [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS

2017-03-21 Thread Alexandru Avadanii
Hi,

1) We tested different models (CRB-1S, CRB-2S) - all behave the same.
2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I 
don't think firmware version makes a difference for this issue (we saw the same 
bug with firmwares: T22, T27, T31).

All in all, this issue seems pretty tied to the switch we use, and all
firmware/board model combinations behaved the same ...

Thanks,
Alex

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1630038

Title:
  thunder nic: avoid link delays due to RX_PACKET_DIS

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038/+subscriptions



[Kernel-packages] [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS

2017-03-19 Thread Alexandru Avadanii
Hi,
This fix introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 
10G switch (Extreme Networks x670 10GE L3).
We have opened a downstream bug report [1], where we temporarily bypassed this 
by pinning the kernel to 4.4.0-45.
I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still 
affected by link training issues with our switch, with 4.11-rc1 not working at 
all and reporting more issues (logs attached in a different LP comment [2]).

BR,
Alex

[1] https://jira.opnfv.org/browse/ARMBAND-168
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1630038

Title:
  thunder nic: avoid link delays due to RX_PACKET_DIS

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released


To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038/+subscriptions



[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Thread Alexandru Avadanii
4.11-rc1 console log attached.
Board firmware is latest available on Gigabyte's site (T31).

1. Install 4.11-rc1 (`make modules_install install`) and reboot
2. Observe networking driver issues in boot log
   Dmesg: 4.11-rc1_dmesg_on_clean_boot.log [3]
3. Try `ping google.com`, obviously not working
4. `modprobe -r nicpf` (leads to multiple oopses in dmesg)
   Console log: 4.11-rc1_modprobe_r_nicpf_output.log [1]
   Dmesg: 4.11-rc1_dmesg_after_modprobe_r_nicpf.log [2]
5. `modprobe nicpf` (this usually works, and afterwards network is up and 
running - not sure whether ALL interfaces are ok, as not all of them are 
connected) - however this time it led to a soft lockup (see full logs attached 
here);
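
Steps 4 and 5 can be wrapped in a small helper. This is only a sketch: it prints the commands by default, and the `RUN` variable used to switch to real execution is an illustrative convention, not something from the original test.

```shell
# reload_module MODULE: unload and reload a kernel module.
# By default the commands are only printed; set RUN=sudo (or RUN=env as root)
# to actually execute them.
reload_module() {
    run=${RUN:-echo}
    $run modprobe -r "$1" && $run modprobe "$1"
}

reload_module nicpf   # prints the two modprobe commands
```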

[1] http://paste.ubuntu.com/24178311/
[2] http://paste.ubuntu.com/24178312/
[3] http://paste.ubuntu.com/24178313/

** Attachment added: "ThunderX 4.11-rc1 console log"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+attachment/4837770/+files/thunderx_4.11_rc1_console_log.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Yakkety:
  Triaged
Status in linux source package in Zesty:
  Triaged


[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-14 Thread Alexandru Avadanii
Hi,
I tried out 4.11-rc1 a few days ago. Unfortunately, I did not get the board to 
boot properly from the start, since ThunderX networking drivers failed to 
allocate MSI-X/MSI interrupts, and polling on some registers also failed ...

So, with 4.11-rc1, at least one networking interface never came
online due to unmapped interrupts/failed polling, but unloading `nicpf`
and reloading it seemed to work (networking worked after this). After
this, the soft lockup happened, but I can't be sure I did not mess
something else up.

Let me try this again and get back to you with some proper logs, but off
the top of my head, things got worse with 4.11-rc1 ...

Thanks,
Alex

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Yakkety:
  Triaged
Status in linux source package in Zesty:
  Triaged

Bug description:
  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our 
Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes, when 
launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both chip/board manufacturers, 
and with all of them the result is 100% reproductible, leading to a kernel Oops 
[1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic 
#44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 
0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40

  
  Over the last few days, I tested all 4.8-* and 4.10 (zesty backport), the 
soft lockup happens with each and every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine (probably 
newer 4.4.0-* too, but due to a regression in the ethernet drivers after 
4.4.0-45, we can't test those with ease) under normal conditions, yet running 
stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic 
root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 
console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet 
splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware                            1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: 

[Kernel-packages] [Bug 1672521] Lspci.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "Lspci.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837215/+files/Lspci.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our 
Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes, when 
launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both chip/board manufacturers,
and with all of them the result is 100% reproducible, leading to a kernel Oops
[1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40

  
  Over the last few days, I tested all 4.8-* kernels and 4.10 (zesty backport);
the soft lockup happens with each and every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine (probably 
newer 4.4.0-* too, but due to a regression in the ethernet drivers after 
4.4.0-45, we can't test those with ease) under normal conditions, yet running 
stress-ng leads to the same oops.
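
  A minimal sketch (not part of the original report) of how the hung-task
lines quoted above could be picked out of a saved kernel log, for example the
output of dmesg captured while stress-ng is running. The names HungTask and
find_hung_tasks are hypothetical, chosen only for illustration, and the
regular expression assumes nothing beyond the message format shown in the
trace above.

```python
import re
from typing import List, NamedTuple


class HungTask(NamedTuple):
    """One 'blocked for more than N seconds' report (hypothetical record)."""
    timestamp: float  # seconds since boot, from the [ ... ] prefix
    task: str         # task name, e.g. "kworker/0:1"
    pid: int          # PID printed after the last ':' of the task field
    seconds: int      # hung-task threshold that was exceeded


# Assumption: lines look like the one in the report, i.e.
# "[  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text: str) -> List[HungTask]:
    """Return every hung-task report found in the given log text."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            hits.append(
                HungTask(
                    timestamp=float(match.group("ts")),
                    task=match.group("task"),
                    pid=int(match.group("pid")),
                    seconds=int(match.group("secs")),
                )
            )
    return hits


if __name__ == "__main__":
    # Two lines copied from the trace in this report.
    sample = (
        "[  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.\n"
        "[  726.094401] Workqueue: events vmstat_shepherd\n"
    )
    for report in find_hung_tasks(sample):
        print(report)
```

  Matching on the task name greedily and taking the digits after the last
colon as the PID is a deliberate choice here, since kworker names themselves
contain colons.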

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', 
'/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic 
root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 
console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet 
splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware                            1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: 
dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
  dmi.product.name: R120-T30
  dmi.product.version: 0100
  dmi.sys.vendor: GIGABYTE

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1672521] ProcCpuinfo.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "ProcCpuinfo.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837217/+files/ProcCpuinfo.txt


[Kernel-packages] [Bug 1672521] WifiSyslog.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "WifiSyslog.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837221/+files/WifiSyslog.txt


[Kernel-packages] [Bug 1672521] Lsusb.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "Lsusb.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837216/+files/Lsusb.txt


[Kernel-packages] [Bug 1672521] UdevDb.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "UdevDb.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837220/+files/UdevDb.txt


[Kernel-packages] [Bug 1672521] CurrentDmesg.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "CurrentDmesg.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837213/+files/CurrentDmesg.txt


[Kernel-packages] [Bug 1672521] JournalErrors.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "JournalErrors.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837214/+files/JournalErrors.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have been trying to easily reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our 
Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes, when 
launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both chip/board manufacturers,
and with all of them the result is 100% reproducible, leading to a kernel Oops
[1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40

  
  Over the last few days, I tested all 4.8-* kernels and 4.10 (zesty backport);
the soft lockup happens with every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine under normal
conditions (newer 4.4.0-* kernels probably do too, but a regression in the
ethernet drivers after 4.4.0-45 makes them hard to test), yet running
stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware                            1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
  dmi.product.name: R120-T30
  dmi.product.version: 0100
  dmi.sys.vendor: GIGABYTE



[Kernel-packages] [Bug 1672521] ProcModules.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "ProcModules.txt"
   
https://bugs.launchpad.net/bugs/1672521/+attachment/4837219/+files/ProcModules.txt



[Kernel-packages] [Bug 1672521] ProcInterrupts.txt

2017-03-13 Thread Alexandru Avadanii
apport information

** Attachment added: "ProcInterrupts.txt"
   
https://bugs.launchpad.net/bugs/1672521/+attachment/4837218/+files/ProcInterrupts.txt



[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels

2017-03-13 Thread Alexandru Avadanii
apport information

** Tags added: apport-collected xenial

** Description changed:

  I have been trying to find an easy way to reproduce this for days.
  We initially observed it in OPNFV Armband, when we tried to upgrade our 
Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).
  
  In our environment, this was easily triggered on compute nodes, when 
launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:
  
  $ apt-get install stress-ng
  $ stress-ng --hdd 1024
  
  We tested different FW versions, provided by both chip/board manufacturers,
and with all of them the result is 100% reproducible, leading to a kernel Oops
[1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [] __switch_to+0x94/0xa8
  [  726.094418] [] __schedule+0x224/0x718
  [  726.094421] [] schedule+0x38/0x98
  [  726.094425] [] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [] mutex_lock+0x58/0x70
  [  726.094437] [] get_online_cpus+0x44/0x70
  [  726.094440] [] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [] process_one_work+0x150/0x478
  [  726.094449] [] worker_thread+0x50/0x4b8
  [  726.094453] [] kthread+0xec/0x100
  [  726.094456] [] ret_from_fork+0x10/0x40
  
  
  Over the last few days, I tested all 4.8-* kernels and 4.10 (zesty backport);
the soft lockup happens with every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine under normal
conditions (newer 4.4.0-* kernels probably do too, but a regression in the
ethernet drivers after 4.4.0-45 makes them hard to test), yet running
stress-ng leads to the same oops.
  
  [1] http://paste.ubuntu.com/24172516/
+ --- 
+ AlsaDevices:
+  total 0
+  crw-rw 1 root audio 116,  1 Mar 13 19:27 seq
+  crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
+ AplayDevices: Error: [Errno 2] No such file or directory
+ ApportVersion: 2.20.1-0ubuntu2.5
+ Architecture: arm64
+ ArecordDevices: Error: [Errno 2] No such file or directory
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ DistroRelease: Ubuntu 16.04
+ IwConfig: Error: [Errno 2] No such file or directory
+ MachineType: GIGABYTE R120-T30
+ Package: linux (not installed)
+ PciMultimedia:
+  
+ ProcEnviron:
+  TERM=vt220
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
+ ProcFB: 0 astdrmfb
+ ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
+ ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
+ RelatedPackageVersions:
+  linux-restricted-modules-4.8.0-41-generic N/A
+  linux-backports-modules-4.8.0-41-generic  N/A
+  linux-firmware                            1.157.8
+ RfKill: Error: [Errno 2] No such file or directory
+ Tags:  xenial
+ Uname: Linux 4.8.0-41-generic aarch64
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups:
+  
+ _MarkForUpload: True
+ dmi.bios.date: 11/22/2016
+ dmi.bios.vendor: GIGABYTE
+ dmi.bios.version: T22
+ dmi.board.asset.tag: 01234567890123456789AB
+ dmi.board.name: MT30-GS0
+ dmi.board.vendor: GIGABYTE
+ dmi.board.version: 01234567
+ dmi.chassis.asset.tag: 01234567890123456789AB
+ dmi.chassis.type: 17
+ dmi.chassis.vendor: GIGABYTE
+ dmi.chassis.version: 01234567
+ dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
+ dmi.product.name: R120-T30
+ dmi.product.version: 0100
+ dmi.sys.vendor: GIGABYTE

** Attachment added: "CRDA.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837212/+files/CRDA.txt


[Kernel-packages] [Bug 1672521] [NEW] ThunderX: soft lockup on 4.8+ kernels

2017-03-13 Thread Alexandru Avadanii
Public bug reported:

I have been trying to find an easy way to reproduce this for days.
We initially observed it in OPNFV Armband, when we tried to upgrade our Ubuntu 
Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8).

In our environment, this was easily triggered on compute nodes, when launching 
multiple VMs (we suspected OVS, QEMU etc.).
However, in order to rule out our specifics, we looked for a simple way to 
reproduce it on all ThunderX nodes we have access to, and we finally found it:

$ apt-get install stress-ng
$ stress-ng --hdd 1024
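
For anyone who wants the run to end on its own and then check the kernel log, here is a rough sketch of the reproducer above. The function names and the 10 minute default are my own assumptions, not something we used in the original runs; `--timeout` and `--metrics-brief` are standard stress-ng options.

```shell
# Sketch only: function names and the 10m default are assumptions.

# Run the reproducer from this report, but stop after a time limit.
run_repro() {
    workers="${1:-1024}"   # number of hdd workers, 1024 as in the report
    limit="${2:-10m}"      # hypothetical limit so the run terminates
    stress-ng --hdd "$workers" --timeout "$limit" --metrics-brief
}

# Count hung-task warnings in kernel log text read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_hung_tasks() {
    grep -c 'blocked for more than [0-9][0-9]* seconds' || true
}
```

Typical use would be `run_repro 1024 10m` followed by `dmesg | count_hung_tasks`; a non-zero count means the hung-task detector fired at least once.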

We tested different FW versions, provided by both chip/board manufacturers, and
with all of them the result is 100% reproducible, leading to a kernel Oops [1]:
[  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
[  726.077908]   Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
[  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  726.094383] kworker/0:1 D 080861bc 0   312  2 0x
[  726.094401] Workqueue: events vmstat_shepherd
[  726.094404] Call trace:
[  726.094411] [] __switch_to+0x94/0xa8
[  726.094418] [] __schedule+0x224/0x718
[  726.094421] [] schedule+0x38/0x98
[  726.094425] [] schedule_preempt_disabled+0x14/0x20
[  726.094428] [] __mutex_lock_slowpath+0xd4/0x168
[  726.094431] [] mutex_lock+0x58/0x70
[  726.094437] [] get_online_cpus+0x44/0x70
[  726.094440] [] vmstat_shepherd+0x3c/0xe8
[  726.094446] [] process_one_work+0x150/0x478
[  726.094449] [] worker_thread+0x50/0x4b8
[  726.094453] [] kthread+0xec/0x100
[  726.094456] [] ret_from_fork+0x10/0x40
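
To compare traces between kernels it can help to reduce a log like the one above to just the symbol names. A minimal sketch, assuming each frame ends in `symbol+0xoff/0xsize` as it does here (`trace_symbols` is a made-up name):

```shell
# Sketch only: print one symbol per call-trace frame found on stdin.
# A frame is any line whose last field looks like symbol+0xoff/0xsize.
trace_symbols() {
    awk '$NF ~ /\+0x[0-9a-f]+\/0x[0-9a-f]+$/ {
        sym = $NF
        sub(/\+0x.*$/, "", sym)   # drop the +0xoff/0xsize suffix
        print sym
    }'
}
```

For example, `dmesg | trace_symbols` lists the frames in order; for the trace in this report the list starts with `__switch_to` and ends with `ret_from_fork`.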


Over the last few days, I tested all 4.8-* kernels and 4.10 (zesty backport);
the soft lockup happens with every one of them.
On the other hand, 4.4.0-45-generic seems to work perfectly fine under normal
conditions (newer 4.4.0-* kernels probably do too, but a regression in the
ethernet drivers after 4.4.0-45 makes them hard to test), yet running
stress-ng leads to the same oops.

[1] http://paste.ubuntu.com/24172516/

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  New
