[Kernel-packages] [Bug 1857074] Re: Cavium ThunderX CN88XX Panic : Unknown reason
Hi, Not sure this is useful (since it might be obvious), but adding `nopti` to kernel parameters works around the issue, indicating this is indeed related to kpti. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1857074 Title: Cavium ThunderX CN88XX Panic : Unknown reason Status in linux package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Bug description: Series: Bionic Kernel: 4.15.0-74.84 linux-generic Steps to reproduce: Install 4.15.0-74.84 Kernel and boot the system. The following crash was observed while testing the proposed kernel for the 2019.12.02 SRU Cycle. This kernel was built to include fixes for the following bugs: * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX (LP: #1853326) - Revert "arm64: Use firmware to detect CPUs that are not affected by Spectre-v2" - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*" * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and Kunpeng920 (LP: #1852723) - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to correct place The following crash appears to be a NEW bug. not related to the prior bugs listed above. This bug DOES NOT APPEAR to be related to LP#1857073. This is another NEW BUG. Hostname: Starmie Probable Cause is unknown at this point and still under investigation. [ OK ] Found device WDC_WD5003ABYZ-011FA0 efi. Mounting /boot/efi... [ OK ] Mounted /boot/efi. [ OK ] Reached target Local File Systems. Starting AppArmor initialization... Starting Tell Plymouth To Write Out Runtime Data... Starting ebtables ruleset management... [ 20.942427] kernel BUG at /build/linux-pWET3k/linux-4.15.0/fs/buffer.c:1240! 
[ 20.951416] Internal error: Oops - BUG: 0 [#1] SMP [ 20.958153] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip cavium_rng_vf shpchp cavium_rng gpio_keys uio_pdrv_genirq ipmi_ssif uio ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nicvf nicpf ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect aes_ce_blk sysimgblt fb_sys_fops aes_ce_cipher crc32_ce drm crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ahci thunder_bgx libahci thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64 [ 21.044326] Process systemd (pid: 1, stack limit = 0x5af6f18b) [ 21.053858] CPU: 1 PID: 1 Comm: systemd Not tainted 4.15.0-74-generic #84-Ubuntu [ 21.063931] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012 [ 21.074790] pstate: 20400085 (nzCv daIf +PAN -UAO) [ 21.082096] pc : __find_get_block+0x2e8/0x398 [ 21.088917] lr : __getblk_gfp+0x3c/0x2a8 [ 21.095379] sp : 099ab7e0 [ 21.101062] x29: 099ab7e0 x28: [ 21.108699] x27: x26: [ 21.116265] x25: 0001 x24: [ 21.123788] x23: 0008 x22: 801f26116c80 [ 21.131302] x21: 801f26116c80 x20: 245c [ 21.138808] x19: 1000 x18: a59c3a70 [ 21.146300] x17: x16: [ 21.153730] x15: 0020 x14: 0012 [ 21.161083] x13: 2f7374696e752f64 x12: 0101010101010101 [ 21.168397] x11: 7f7f7f7f7f7f7f7f x10: 0972d000 [ 21.175689] x9 : x8 : 801f7ba7e3c0 [ 21.183042] x7 : 801f7ba7e3e0 x6 : [ 21.190667] x5 : 0004 x4 : 0020 [ 21.197955] x3 : 0008 x2 : 1000 [ 21.205680] x1 : 245c x0 : 0080 [ 21.212918] Call trace: [ 21.217257] __find_get_block+0x2e8/0x398 [ 21.223160] __getblk_gfp+0x3c/0x2a8 [ 21.228644] ext4_getblk+0xcc/0x1b0 [ 21.233991] ext4_bread_batch+0x78/0x1c8 [ 21.239726] ext4_find_entry+0x2d4/0x598 [ 
21.245416] ext4_lookup+0xac/0x278 [ 21.250612] lookup_slow+0xac/0x190 [ 21.255736] walk_component+0x228/0x340 [ 21.261151] link_path_walk+0x2f4/0x568 [ 21.266499] path_parentat+0x44/0x88 [ 21.271521] filename_parentat+0xa0/0x170 [ 21.276924] filename_create+0x60/0x168 [ 21.282082] SyS_symlinkat+0x80/0x128 [ 21.287013] el0_svc_naked+0x30/0x34 [ 21.291835] Code: 17e7 a90363b7 a9046bb9 f9002bbb (d421) [ 21.299191] ---[ end trace b07cecc329f07f48 ]--- [ 21.347488] systemd: 35 output lines suppressed due to ratelimiting [
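To confirm that a test boot really carried the `nopti` workaround mentioned in the comment above, the kernel command line can be inspected. The following is only a minimal sketch: the helper names are made up for illustration, the sample command line is hypothetical, and it assumes the running kernel exposes its parameters at /proc/cmdline.

```python
def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        # "console=ttyS0,115200" -> "console"; "nopti" -> "nopti"
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return handle.read().strip()


# Hypothetical command line, for illustration only (not taken from the bug).
example = "BOOT_IMAGE=/vmlinuz-4.15.0-74-generic root=/dev/sda2 ro nopti"
print(has_kernel_param(example, "nopti"))
```

On the affected machine, `has_kernel_param(read_cmdline(), "nopti")` would tell whether the workaround was active for that boot.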
[Kernel-packages] [Bug 1673564] Re: ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
Hi, Dann,

Thanks for looking into this! One more thing: we blacklisted the module "vhost_net", and that bypasses the issue. I know it's not the right direction for finding a fix, but maybe it helps with debugging.

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1673564

Title: ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on

Status in edk2 package in Ubuntu: Confirmed
Status in linux package in Ubuntu: Confirmed

Bug description:
This is a followup of an earlier thread/bug that we have narrowed down to an incompatibility/issue with vhost support in qemu-efi. Without vhost=on qemu seems to be working fine.

I have tested several edk2 firmwares:
- xenial
- zesty
- Fedora: ftp://195.220.108.108/linux/fedora-secondary/development/rawhide/Everything/aarch64/os/Packages/e/edk2-aarch64-20170209git296153c5-2.fc26.noarch.rpm

I have also tested with different guests:
- cirros: https://download.cirros-cloud.net/daily/20161201/cirros-d161201-aarch64-disk.img
- ubuntu xenial: https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-arm64-uefi1.img

The test steps are simple enough. A tap device is needed, and qemu-kvm and qemu-efi need to be installed. The UEFI image is run as shown in the launch.sh script; the tap device is used in vhost=on mode. Also note that the QEMU_EFI.fd binary needs to be padded up to 64M:

  dd if=/dev/zero of=AAVMF_CODE.fd bs=1M count=64
  dd if=QEMU_EFI.fd of=AAVMF_CODE.fd conv=notrunc

The result was always the same: the node crashed with soft lockups when qemu was attempting to boot the kernel. I will attach all the relevant information shortly.
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1673564/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
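The padding step from the bug description above (the two dd invocations) could be expressed roughly as follows. This is a sketch, not the reporter's launch.sh: the function name is hypothetical, and it assumes the firmware image is small enough to read into memory in one go.

```python
PAD_SIZE = 64 * 1024 * 1024  # 64 MiB, matching "bs=1M count=64"


def pad_firmware(src_path: str, dst_path: str, size: int = PAD_SIZE) -> None:
    """Write src to dst and zero-fill the remainder up to `size` bytes.

    The result matches creating a zero-filled file of `size` bytes and
    then overwriting its beginning with the firmware (dd conv=notrunc).
    """
    with open(src_path, "rb") as src:
        data = src.read()
    if len(data) > size:
        raise ValueError("firmware image is larger than the padded size")
    with open(dst_path, "wb") as dst:
        dst.write(data)
        dst.write(b"\0" * (size - len(data)))
```

With the file names from the report, the call would be `pad_firmware("QEMU_EFI.fd", "AAVMF_CODE.fd")`.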
[Kernel-packages] [Bug 1674837] Re: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch
** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1674837 Title: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch Status in linux package in Ubuntu: Confirmed Bug description: Upstream backport [3] introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3). We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45. I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]). I also confirmed that reverting the commit in question fixes the issues in our setup (tested on top of the 4.10.0-13 linux-image-generic-hwe-edge package from Xenial). BR, Alex [1] https://jira.opnfv.org/browse/ARMBAND-168 [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17 [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038
[Kernel-packages] [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS
Hi, Dann, I created a new bug and pasted the same info as above at [1]. Afaict, there is no useful information in the logs when link training fails. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674837 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1630038 Title: thunder nic: avoid link delays due to RX_PACKET_DIS Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: Fix Released Status in linux source package in Yakkety: Fix Released Bug description: [Impact] Link establishment is delayed during initialization, possibly resulting in remote fault conditions that may cause the interface to fail to come up. [Test Case] Put the system in a reboot loop and watch for a remote fault condition, or a failure to bring up the link that can only be resolved by reloading the module. [Regression Risk] Patch is to a specific driver that is only used on Cavium ThunderX systems. The patch is upstream, so will have upstream support for regressions.
[Kernel-packages] [Bug 1674837] Re: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch
Let me know if I should attach any logs, although there are *no* traces anywhere, at least with default log levels (without recompiling). -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1674837 Title: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1674837] [NEW] thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch
Public bug reported: Upstream backport [3] introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3). We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45. I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]). I also confirmed that reverting the commit in question fixes the issues in our setup (tested on top of the 4.10.0-13 linux-image-generic-hwe-edge package from Xenial). BR, Alex [1] https://jira.opnfv.org/browse/ARMBAND-168 [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17 [3] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1630038 ** Affects: linux (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1674837 Title: thunder nic: RX_PACKET_DIS fix regression with Extreme Networks switch Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
Hi, Dann, First of all, I think the bug title is misleading, as this issue happens on all kernels we tested (4.4.0-45..66, 4.8.0-x, 4.10.0-x etc). To be fair, we haven't seen this exact bug (or at least I don't think we did) in practice, i.e. without running stress-ng, 4.4.0-x never crashed. The VM use case turned out to be a different bug [1], triggered 100% by AAVMF + vhost. Let me know if I can provide anything else. I consider this particular bug minor (if we don't poke it with stress-ng, everything works well), compared to AAVMF + vhost [1]. Thanks, Alex [1] https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1673564 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1672521 Title: ThunderX: soft lockup on 4.8+ kernels Status in linux package in Ubuntu: Triaged Status in linux source package in Yakkety: Triaged Status in linux source package in Zesty: Triaged Bug description: I have been trying to easily reproduce this for days. We initially observed it in OPNFV Armband, when we tried to upgrade our Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8). In our environment, this was easily triggered on compute nodes, when launching multiple VMs (we suspected OVS, QEMU etc.). However, in order to rule out our specifics, we looked for a simple way to reproduce it on all ThunderX nodes we have access to, and we finally found it: $ apt-get install stress-ng $ stress-ng --hdd 1024 We tested different FW versions, provided by both chip/board manufacturers, and with all of them the result is 100% reproducible, leading to a kernel Oops [1]: [ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds. [ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu [ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x [ 726.094401] Workqueue: events vmstat_shepherd [ 726.094404] Call trace: [ 726.094411] [] __switch_to+0x94/0xa8 [ 726.094418] [] __schedule+0x224/0x718 [ 726.094421] [] schedule+0x38/0x98 [ 726.094425] [] schedule_preempt_disabled+0x14/0x20 [ 726.094428] [] __mutex_lock_slowpath+0xd4/0x168 [ 726.094431] [] mutex_lock+0x58/0x70 [ 726.094437] [] get_online_cpus+0x44/0x70 [ 726.094440] [] vmstat_shepherd+0x3c/0xe8 [ 726.094446] [] process_one_work+0x150/0x478 [ 726.094449] [] worker_thread+0x50/0x4b8 [ 726.094453] [] kthread+0xec/0x100 [ 726.094456] [] ret_from_fork+0x10/0x40 Over the last few days, I tested all 4.8-* and 4.10 (zesty backport), the soft lockup happens with each and every one of them. On the other hand, 4.4.0-45-generic seems to work perfectly fine (probably newer 4.4.0-* too, but due to a regression in the ethernet drivers after 4.4.0-45, we can't test those with ease) under normal conditions, yet running stress-ng leads to the same oops. 
[1] http://paste.ubuntu.com/24172516/ --- AlsaDevices: total 0 crw-rw 1 root audio 116, 1 Mar 13 19:27 seq crw-rw 1 root audio 116, 33 Mar 13 19:27 timer AplayDevices: Error: [Errno 2] No such file or directory ApportVersion: 2.20.1-0ubuntu2.5 Architecture: arm64 ArecordDevices: Error: [Errno 2] No such file or directory AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: DistroRelease: Ubuntu 16.04 IwConfig: Error: [Errno 2] No such file or directory MachineType: GIGABYTE R120-T30 Package: linux (not installed) PciMultimedia: ProcEnviron: TERM=vt220 PATH=(custom, no user) XDG_RUNTIME_DIR= LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: 0 astdrmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7 ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17 RelatedPackageVersions: linux-restricted-modules-4.8.0-41-generic N/A linux-backports-modules-4.8.0-41-generic N/A linux-firmware1.157.8 RfKill: Error: [Errno 2] No such file or directory Tags: xenial Uname: Linux 4.8.0-41-generic aarch64 UpgradeStatus: No upgrade log present (probably fresh install) UserGroups: _MarkForUpload: True dmi.bios.date: 11/22/2016 dmi.bios.vendor: GIGABYTE dmi.bios.version: T22 dmi.board.asset.tag: 01234567890123456789AB dmi.board.name: MT30-GS0 dmi.board.vendor: GIGABYTE dmi.board.version: 01234567 dmi.chassis.asset.tag: 01234567890123456789AB dmi.chassis.type: 17 dmi.chassis.vendor: GIGABYTE dmi.chassis.version: 01234567 dmi.modalias:
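When repeating the stress-ng run on several nodes, it can help to pull the hung-task reports out of a saved dmesg log automatically. The sketch below assumes the message wording shown in the trace quoted in the bug description ("INFO: task ... blocked for more than N seconds"); the function name is hypothetical and other kernel versions may word the message differently.

```python
import re

# Matches e.g. "INFO: task kworker/0:1:312 blocked for more than 120 seconds."
# The task name itself may contain ':' so the pid is taken from the last
# ":<digits>" group before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str):
    """Return a list of (task name, pid, seconds) tuples found in log_text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# Line copied from the report above.
sample = (
    "[ 726.070531] INFO: task kworker/0:1:312 "
    "blocked for more than 120 seconds."
)
print(find_hung_tasks(sample))  # [('kworker/0:1', 312, 120)]
```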
[Kernel-packages] [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS
Hi, 1) We tested different models (CRB-1S, CRB-2S) - all behave the same. 2) Please check the logs "ThunderX 4.11-rc1 console log" in [2] linked above. I don't think firmware version makes a difference for this issue (we saw the same bug with firmwares: T22, T27, T31). All in all, this issue seems pretty tied to the switch we use, and all firmware/board model combinations behaved the same ... Thanks, Alex -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1630038 Title: thunder nic: avoid link delays due to RX_PACKET_DIS Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: Fix Released Status in linux source package in Yakkety: Fix Released
[Kernel-packages] [Bug 1630038] Re: thunder nic: avoid link delays due to RX_PACKET_DIS
Hi, This fix introduced a regression with ThunderX nodes (CRB-1S, CRB-2S) and our 10G switch (Extreme Networks x670 10GE L3). We have opened a downstream bug report [1], where we temporarily bypassed this by pinning the kernel to 4.4.0-45. I also tested 4.8 (multiple builds), 4.10 and 4.11-rc1 (vanilla); all are still affected by link training issues with our switch, with 4.11-rc1 not working at all and reporting more issues (logs attached in a different LP comment [2]). BR, Alex [1] https://jira.opnfv.org/browse/ARMBAND-168 [2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/comments/17 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1630038 Title: thunder nic: avoid link delays due to RX_PACKET_DIS Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: Fix Released Status in linux source package in Yakkety: Fix Released
[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
4.11-rc1 console log attached. Board firmware is the latest available on Gigabyte's site (T31).

1. Install 4.11-rc1 (`make modules_install install`) and reboot
2. Observe networking driver issues in boot log
   Dmesg: 4.11-rc1_dmesg_on_clean_boot.log [3]
3. Try `ping google.com`, obviously not working
4. `modprobe -r nicpf` (leads to multiple oopses in dmesg)
   Console log: 4.11-rc1_modprobe_r_nicpf_output.log [1]
   Dmesg: 4.11-rc1_dmesg_after_modprobe_r_nicpf.log [2]
5. `modprobe nicpf` (this usually works, and afterwards the network is up and running; I am not sure whether ALL interfaces are OK, as not all of them are connected). However, this time it led to a soft lockup (see full logs attached here).

[1] http://paste.ubuntu.com/24178311/
[2] http://paste.ubuntu.com/24178312/
[3] http://paste.ubuntu.com/24178313/

** Attachment added: "ThunderX 4.11-rc1 console log" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+attachment/4837770/+files/thunderx_4.11_rc1_console_log.txt

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1672521 Title: ThunderX: soft lockup on 4.8+ kernels Status in linux package in Ubuntu: Triaged Status in linux source package in Yakkety: Triaged Status in linux source package in Zesty: Triaged
[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
Hi, I tried out 4.11-rc1 a few days ago. Unfortunately, I did not get the board to boot properly from the start, since the ThunderX networking drivers failed to allocate MSI-X/MSI interrupts, and polling on some registers also failed ... So, with 4.11-rc1, at least one networking interface never came online due to unmapped interrupts/failed polling, but unloading `nicpf` and reloading it seemed to work (networking worked after this). After this, the soft lockup happened, but I can't be sure I did not mess something else up. Let me try this again and get back to you with some proper logs, but off the top of my head, things got worse with 4.11-rc1 ... Thanks, Alex -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1672521 Title: ThunderX: soft lockup on 4.8+ kernels Status in linux package in Ubuntu: Triaged Status in linux source package in Yakkety: Triaged Status in linux source package in Zesty: Triaged
[Kernel-packages] [Bug 1672521] Lspci.txt
apport information ** Attachment added: "Lspci.txt" https://bugs.launchpad.net/bugs/1672521/+attachment/4837215/+files/Lspci.txt -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1672521 Title: ThunderX: soft lockup on 4.8+ kernels Status in linux package in Ubuntu: Confirmed Bug description: I have been trying to easily reproduce this for days. We initially observed it in OPNFV Armband, when we tried to upgrade our Ubuntu Xenial installation kernel to linux-image-generic-hwe-16.04 (4.8). In our environment, this was easily triggered on compute nodes, when launching multiple VMs (we suspected OVS, QEMU etc.). However, in order to rule out our specifics, we looked for a simple way to reproduce it on all ThunderX nodes we have access to, and we finally found it: $ apt-get install stress-ng $ stress-ng --hdd 1024 We tested different FW versions, provided by both chip/board manufacturers, and with all of them the result is 100% reproducible, leading to a kernel Oops [1]: [ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds. [ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu [ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x [ 726.094401] Workqueue: events vmstat_shepherd [ 726.094404] Call trace: [ 726.094411] [] __switch_to+0x94/0xa8 [ 726.094418] [] __schedule+0x224/0x718 [ 726.094421] [] schedule+0x38/0x98 [ 726.094425] [] schedule_preempt_disabled+0x14/0x20 [ 726.094428] [] __mutex_lock_slowpath+0xd4/0x168 [ 726.094431] [] mutex_lock+0x58/0x70 [ 726.094437] [] get_online_cpus+0x44/0x70 [ 726.094440] [] vmstat_shepherd+0x3c/0xe8 [ 726.094446] [] process_one_work+0x150/0x478 [ 726.094449] [] worker_thread+0x50/0x4b8 [ 726.094453] [] kthread+0xec/0x100 [ 726.094456] [] ret_from_fork+0x10/0x40 Over the last few days, I tested all 4.8-* and 4.10 (zesty backport), the soft lockup happens with each and every one of them. On the other hand, 4.4.0-45-generic seems to work perfectly fine (probably newer 4.4.0-* too, but due to a regression in the ethernet drivers after 4.4.0-45, we can't test those with ease) under normal conditions, yet running stress-ng leads to the same oops. 
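  When going through longer console captures, the hung-task reports can
  be pulled out mechanically. The following is a minimal Python sketch
  and not part of the original report: the function name
  parse_hung_tasks and both regular expressions are illustrative
  assumptions based only on the message format shown in the log above,
  and every call-trace frame seen after a hung-task header is attributed
  to that task, which is a simplification.

```python
import re

# Header printed when a task has been blocked for too long, e.g.
# "INFO: task kworker/0:1:312 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# One call-trace frame, e.g. "__schedule+0x224/0x718".
FRAME_RE = re.compile(r"(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Excerpt of the log quoted in this report (trace shortened).
SAMPLE_LOG = """\
[ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
[ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
[ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x
[ 726.094401] Workqueue: events vmstat_shepherd
[ 726.094404] Call trace:
[ 726.094411] [] __switch_to+0x94/0xa8
[ 726.094418] [] __schedule+0x224/0x718
[ 726.094421] [] schedule+0x38/0x98
[ 726.094431] [] mutex_lock+0x58/0x70
"""


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report: comm, pid, secs, frames."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            # Start a new report; later frames are attached to it.
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group("func"))
    return reports


if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE_LOG):
        print(report["comm"], report["pid"], " <- ".join(report["frames"]))
```

  Feeding it a full capture (for example the paste referenced as [1])
  should give one entry per blocked task, which makes it easier to see
  whether the tasks are all waiting on the same lock.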
  [1] http://paste.ubuntu.com/24172516/
  ---
  AlsaDevices:
   total 0
   crw-rw 1 root audio 116, 1 Mar 13 19:27 seq
   crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic N/A
   linux-firmware 1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags: xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
  dmi.product.name: R120-T30
  dmi.product.version: 0100
  dmi.sys.vendor: GIGABYTE

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1672521] ProcCpuinfo.txt
apport information

** Attachment added: "ProcCpuinfo.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837217/+files/ProcCpuinfo.txt
[Kernel-packages] [Bug 1672521] WifiSyslog.txt
apport information

** Attachment added: "WifiSyslog.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837221/+files/WifiSyslog.txt
[Kernel-packages] [Bug 1672521] Lsusb.txt
apport information

** Attachment added: "Lsusb.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837216/+files/Lsusb.txt
[Kernel-packages] [Bug 1672521] UdevDb.txt
apport information

** Attachment added: "UdevDb.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837220/+files/UdevDb.txt
[Kernel-packages] [Bug 1672521] CurrentDmesg.txt
apport information

** Attachment added: "CurrentDmesg.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837213/+files/CurrentDmesg.txt
[Kernel-packages] [Bug 1672521] JournalErrors.txt
apport information

** Attachment added: "JournalErrors.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837214/+files/JournalErrors.txt
[Kernel-packages] [Bug 1672521] ProcModules.txt
apport information

** Attachment added: "ProcModules.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837219/+files/ProcModules.txt

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I have been trying to find an easy way to reproduce this for days.

  We initially observed it in OPNFV Armband, when we tried to upgrade
  the kernel of our Ubuntu Xenial installation to
  linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes when
  launching multiple VMs (we suspected OVS, QEMU etc.).

  However, in order to rule out our specifics, we looked for a simple
  way to reproduce it on all ThunderX nodes we have access to, and we
  finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both chip/board
  manufacturers, and with all of them the result is reproducible 100%
  of the time, leading to a kernel Oops [1]:

  [ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x
  [ 726.094401] Workqueue: events vmstat_shepherd
  [ 726.094404] Call trace:
  [ 726.094411] [] __switch_to+0x94/0xa8
  [ 726.094418] [] __schedule+0x224/0x718
  [ 726.094421] [] schedule+0x38/0x98
  [ 726.094425] [] schedule_preempt_disabled+0x14/0x20
  [ 726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [ 726.094431] [] mutex_lock+0x58/0x70
  [ 726.094437] [] get_online_cpus+0x44/0x70
  [ 726.094440] [] vmstat_shepherd+0x3c/0xe8
  [ 726.094446] [] process_one_work+0x150/0x478
  [ 726.094449] [] worker_thread+0x50/0x4b8
  [ 726.094453] [] kthread+0xec/0x100
  [ 726.094456] [] ret_from_fork+0x10/0x40

  Over the last few days, I tested all 4.8-* and 4.10 (zesty backport)
  kernels; the soft lockup happens with each and every one of them.

  On the other hand, 4.4.0-45-generic seems to work perfectly fine under
  normal conditions (newer 4.4.0-* kernels probably do too, but a
  regression in the Ethernet drivers after 4.4.0-45 keeps us from
  testing those easily), yet running stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
[Kernel-packages] [Bug 1672521] ProcInterrupts.txt
apport information

** Attachment added: "ProcInterrupts.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837218/+files/ProcInterrupts.txt
[Kernel-packages] [Bug 1672521] Re: ThunderX: soft lockup on 4.8+ kernels
apport information

** Tags added: apport-collected xenial

** Description changed:

  I have been trying to find an easy way to reproduce this for days.

  We initially observed it in OPNFV Armband, when we tried to upgrade
  the kernel of our Ubuntu Xenial installation to
  linux-image-generic-hwe-16.04 (4.8).

  In our environment, this was easily triggered on compute nodes when
  launching multiple VMs (we suspected OVS, QEMU etc.).

  However, in order to rule out our specifics, we looked for a simple
  way to reproduce it on all ThunderX nodes we have access to, and we
  finally found it:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both chip/board
  manufacturers, and with all of them the result is reproducible 100%
  of the time, leading to a kernel Oops [1]:

  [ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
  [ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x
  [ 726.094401] Workqueue: events vmstat_shepherd
  [ 726.094404] Call trace:
  [ 726.094411] [] __switch_to+0x94/0xa8
  [ 726.094418] [] __schedule+0x224/0x718
  [ 726.094421] [] schedule+0x38/0x98
  [ 726.094425] [] schedule_preempt_disabled+0x14/0x20
  [ 726.094428] [] __mutex_lock_slowpath+0xd4/0x168
  [ 726.094431] [] mutex_lock+0x58/0x70
  [ 726.094437] [] get_online_cpus+0x44/0x70
  [ 726.094440] [] vmstat_shepherd+0x3c/0xe8
  [ 726.094446] [] process_one_work+0x150/0x478
  [ 726.094449] [] worker_thread+0x50/0x4b8
  [ 726.094453] [] kthread+0xec/0x100
  [ 726.094456] [] ret_from_fork+0x10/0x40

  Over the last few days, I tested all 4.8-* and 4.10 (zesty backport)
  kernels; the soft lockup happens with each and every one of them.

  On the other hand, 4.4.0-45-generic seems to work perfectly fine under
  normal conditions (newer 4.4.0-* kernels probably do too, but a
  regression in the Ethernet drivers after 4.4.0-45 keeps us from
  testing those easily), yet running stress-ng leads to the same oops.

  [1] http://paste.ubuntu.com/24172516/
+ --- 
+ AlsaDevices:
+  total 0
+  crw-rw 1 root audio 116, 1 Mar 13 19:27 seq
+  crw-rw 1 root audio 116, 33 Mar 13 19:27 timer
+ AplayDevices: Error: [Errno 2] No such file or directory
+ ApportVersion: 2.20.1-0ubuntu2.5
+ Architecture: arm64
+ ArecordDevices: Error: [Errno 2] No such file or directory
+ AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
+ DistroRelease: Ubuntu 16.04
+ IwConfig: Error: [Errno 2] No such file or directory
+ MachineType: GIGABYTE R120-T30
+ Package: linux (not installed)
+ PciMultimedia:
+  
+ ProcEnviron:
+  TERM=vt220
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
+ ProcFB: 0 astdrmfb
+ ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
+ ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
+ RelatedPackageVersions:
+  linux-restricted-modules-4.8.0-41-generic N/A
+  linux-backports-modules-4.8.0-41-generic N/A
+  linux-firmware 1.157.8
+ RfKill: Error: [Errno 2] No such file or directory
+ Tags: xenial
+ Uname: Linux 4.8.0-41-generic aarch64
+ UpgradeStatus: No upgrade log present (probably fresh install)
+ UserGroups:
+  
+ _MarkForUpload: True
+ dmi.bios.date: 11/22/2016
+ dmi.bios.vendor: GIGABYTE
+ dmi.bios.version: T22
+ dmi.board.asset.tag: 01234567890123456789AB
+ dmi.board.name: MT30-GS0
+ dmi.board.vendor: GIGABYTE
+ dmi.board.version: 01234567
+ dmi.chassis.asset.tag: 01234567890123456789AB
+ dmi.chassis.type: 17
+ dmi.chassis.vendor: GIGABYTE
+ dmi.chassis.version: 01234567
+ dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
+ dmi.product.name: R120-T30
+ dmi.product.version: 0100
+ dmi.sys.vendor: GIGABYTE

** Attachment added: "CRDA.txt"
   https://bugs.launchpad.net/bugs/1672521/+attachment/4837212/+files/CRDA.txt
[Kernel-packages] [Bug 1672521] [NEW] ThunderX: soft lockup on 4.8+ kernels
Public bug reported:

I have been trying to find an easy way to reproduce this for days.

We initially observed it in OPNFV Armband, when we tried to upgrade
the kernel of our Ubuntu Xenial installation to
linux-image-generic-hwe-16.04 (4.8).

In our environment, this was easily triggered on compute nodes when
launching multiple VMs (we suspected OVS, QEMU etc.).

However, in order to rule out our specifics, we looked for a simple
way to reproduce it on all ThunderX nodes we have access to, and we
finally found it:

$ apt-get install stress-ng
$ stress-ng --hdd 1024

We tested different FW versions, provided by both chip/board
manufacturers, and with all of them the result is reproducible 100%
of the time, leading to a kernel Oops [1]:

[ 726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
[ 726.077908] Tainted: GW I 4.8.0-41-generic #44~16.04.1-Ubuntu
[ 726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.094383] kworker/0:1 D 080861bc 0 312 2 0x
[ 726.094401] Workqueue: events vmstat_shepherd
[ 726.094404] Call trace:
[ 726.094411] [] __switch_to+0x94/0xa8
[ 726.094418] [] __schedule+0x224/0x718
[ 726.094421] [] schedule+0x38/0x98
[ 726.094425] [] schedule_preempt_disabled+0x14/0x20
[ 726.094428] [] __mutex_lock_slowpath+0xd4/0x168
[ 726.094431] [] mutex_lock+0x58/0x70
[ 726.094437] [] get_online_cpus+0x44/0x70
[ 726.094440] [] vmstat_shepherd+0x3c/0xe8
[ 726.094446] [] process_one_work+0x150/0x478
[ 726.094449] [] worker_thread+0x50/0x4b8
[ 726.094453] [] kthread+0xec/0x100
[ 726.094456] [] ret_from_fork+0x10/0x40

Over the last few days, I tested all 4.8-* and 4.10 (zesty backport)
kernels; the soft lockup happens with each and every one of them.

On the other hand, 4.4.0-45-generic seems to work perfectly fine under
normal conditions (newer 4.4.0-* kernels probably do too, but a
regression in the Ethernet drivers after 4.4.0-45 keeps us from
testing those easily), yet running stress-ng leads to the same oops.

[1] http://paste.ubuntu.com/24172516/

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  New
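
When checking a captured console log from a stress-ng run for the hung-task report quoted in this bug, a small filter may help. The following is only a sketch, not part of the original report: the names HungTask and find_hung_tasks are made up for illustration, and the pattern assumes the "INFO: task <comm>:<pid> blocked for more than <N> seconds." wording shown in the log above.

```python
import re
from typing import List, NamedTuple


class HungTask(NamedTuple):
    """One hung-task report (hypothetical helper type, not from the bug)."""
    timestamp: float  # seconds since boot, from the "[ 726.070531]" prefix
    comm: str         # task name, e.g. "kworker/0:1"
    pid: int
    seconds: int      # the "blocked for more than N seconds" threshold


# Assumes the message wording seen in the log quoted in this report.
_HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str) -> List[HungTask]:
    """Return every hung-task report found in the given kernel log text."""
    return [
        HungTask(float(m["ts"]), m["comm"], int(m["pid"]), int(m["secs"]))
        for m in _HUNG_RE.finditer(log_text)
    ]
```

The task name is matched up to the last colon before the PID, so a name that itself contains a colon, such as kworker/0:1, is kept whole. The log text could come from a saved serial console capture or from the output of dmesg.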