I can see this issue with 5.4.0-124-generic #140~18.04.1-Ubuntu on node
appleton-kernel as well.
After these messages, CPU 16 hits a soft lockup:
[ 19.296854] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296855] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296858] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.296860] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 0): Completion event for bogus CQ 0x5a5aa9
[ 19.347370] mlx5_core 0005:01:00.0 enP5p1s0f0: Link down
[ 19.634790] ixgbe 000a:11:00.0: registered PHC device on enP10p17s0f0
[ 21.492952] hns-nic HISI00C2:00 enahisic2i0: link up
[ 21.492971] IPv6: ADDRCONF(NETDEV_CHANGE): enahisic2i0: link becomes ready
[ 25.794327] EXT4-fs (nvme0n1p2): resizing filesystem from 390571008 to 390572113 blocks
[ 25.794567] EXT4-fs (nvme0n1p2): resized filesystem to 390572113
[ 27.550919] new mount options do not match the existing superblock, will be ignored
[ 32.692121] fbcon: Taking over console
[ 32.698403] Console: switching to colour frame buffer device 100x37
[ 64.276773] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]
[ 64.283899] Modules linked in: nls_iso8859_1 ipmi_ssif input_leds joydev
ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4
btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear
mlx5_ib hibmc_drm drm_vram_helper ses enclosure ttm hid_generic usbhid
ib_uverbs hid ib_core marvell drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops crct10dif_ce mlx5_core hisi_sas_v2_hw ghash_ce sha2_ce sha256_arm64
ixgbe sha1_ce tls hisi_sas_main nvme xfrm_algo drm megaraid_sas nvme_core mdio
mlxfw libsas ehci_platform scsi_transport_sas hns_dsaf hns_enet_drv hns_mdio
hnae aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 64.283952] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 64.283954] Hardware name: Hisilicon D05/BC11SPCD, BIOS 1.50 06/01/2018
[ 64.283956] pstate: 40400005 (nZcv daif +PAN -UAO)
[ 64.283962] pc : __do_softirq+0x98/0x350
[ 64.283966] lr : irq_exit+0xc0/0xc8
[ 64.283967] sp : ffff8000123b3ef0
[ 64.283969] x29: ffff8000123b3ef0 x28: ffff002fb7193d00
[ 64.283971] x27: 0000000000000000 x26: ffff8000123b4000
[ 64.283972] x25: ffff8000123b0000 x24: ffff001fba073600
[ 64.283974] x23: ffff8000127cbdb0 x22: 0000000000000000
[ 64.283976] x21: 0000000000000282 x20: 0000000000000002
[ 64.283977] x19: ffff800011b84000 x18: ffff800011268830
[ 64.283979] x17: 0000000000000000 x16: 0000000000000000
[ 64.283980] x15: 0000000000000001 x14: ffff002fbb9f21c8
[ 64.283982] x13: 0000000000000004 x12: 0000000000000003
[ 64.283984] x11: 0000000000000000 x10: 0000000000000040
[ 64.283985] x9 : ffff80001208f358 x8 : ffff80001208f350
[ 64.283987] x7 : ffff001fb9002270 x6 : 00000002a698ef5f
[ 64.283989] x5 : 00000000ffff0031 x4 : ffff802fa9e81000
[ 64.283991] x3 : ffff800011b84780 x2 : ffff802fa9e81000
[ 64.283993] x1 : 00000000000000e0 x0 : ffff800011b84780
[ 64.283995] Call trace:
[ 64.283998] __do_softirq+0x98/0x350
[ 64.284000] irq_exit+0xc0/0xc8
[ 64.284003] __handle_domain_irq+0x6c/0xc0
[ 64.284005] gic_handle_irq+0x84/0x2c0
[ 64.284007] el1_irq+0x104/0x1c0
[ 64.284010] arch_cpu_idle+0x34/0x1c0
[ 64.284014] default_idle_call+0x24/0x60
[ 64.284016] do_idle+0x1d8/0x2b8
[ 64.284017] cpu_startup_entry+0x2c/0xb0
[ 64.284020] secondary_start_kernel+0x198/0x288
[ 98.196663] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 98.202575] rcu: 16-....: (3 GPs behind) idle=8fa/0/0x3 softirq=983/983 fqs=7488
[ 98.210133] (detected by 5, t=15002 jiffies, g=4709, q=3243)
[ 98.210134] Task dump for CPU 16:
[ 98.210137] swapper/16 R running task 0 0 1 0x0000002a
[ 98.210140] Call trace:
[ 98.210146] __switch_to+0xcc/0x210
[ 98.210149] 0x0
[ 119.928660] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 16-... } 15393 jiffies s: 229 root: 0x2/.
[ 119.939266] rcu: blocking rcu_node structures: l=1:16-31:0x1/.
[ 119.945099] Task dump for CPU 16:
[ 119.945102] swapper/16 R running task 0 0 1 0x0000002a
[ 119.945108] Call trace:
[ 119.945120] __switch_to+0xcc/0x210
[ 119.945127] 0x0
[ 242.808432] INFO: task ureadahead:1097 blocked for more than 120 seconds.
[ 242.815214] Tainted: G L 5.4.0-124-generic #140~18.04.1-Ubuntu
[ 242.822868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.830691] ureadahead D 0 1097 1 0x00000000
[ 242.830695] Call trace:
[ 242.830703] __switch_to+0xcc/0x210
[ 242.830710] __schedule+0x310/0x7a8
[ 242.830712] schedule+0x38/0xa8
[ 242.830714] schedule_timeout+0x228/0x388
[ 242.830716] wait_for_completion+0xf4/0x4b8
[ 242.830719] __wait_rcu_gp+0x170/0x1a8
[ 242.830722] synchronize_rcu+0x68/0x98
[ 242.830725] ring_buffer_read_prepare_sync+0xc/0x18
[ 242.830727] __tracing_open+0x200/0x368
[ 242.830729] tracing_open+0xa4/0xf0
[ 242.830733] do_dentry_open+0x1cc/0x3e0
[ 242.830735] vfs_open+0x38/0x48
[ 242.830738] path_openat+0x2ac/0x1368
[ 242.830740] do_filp_open+0x88/0x108
[ 242.830742] do_sys_open+0x1b4/0x2e8
[ 242.830743] __arm64_sys_openat+0x2c/0x38
[ 242.830746] el0_svc_common.constprop.3+0x80/0x1f8
[ 242.830748] el0_svc_handler+0x34/0xa0
[ 242.830750] el0_svc+0x10/0x180
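The log shows a chain of events. CPU 16 soft-locks inside __do_softirq at about 64 s. RCU stall reports then point at CPU 16. Finally ureadahead is reported as hung, waiting in synchronize_rcu() via tracing_open(). That is consistent with an RCU grace period that cannot finish while CPU 16 is stuck. The sketch below is a small, hypothetical helper (scan() and Event are illustrative names, not an existing tool) for pulling those three markers out of a saved dmesg. The regexes only cover the message text seen in this report. The jiffies-to-seconds conversion assumes CONFIG_HZ=250; that is an assumption and should be checked against the kernel config. Under that assumption, t=15002 jiffies comes out to roughly 60 s.

```python
import re
from dataclasses import dataclass

# Patterns written against the message text seen in this report only.
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")
RCU_STALL = re.compile(r"\(detected by (\d+), t=(\d+) jiffies")
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")

# ASSUMPTION: CONFIG_HZ=250. Verify in /boot/config-$(uname -r) before trusting
# the seconds value derived from the jiffies count.
ASSUMED_HZ = 250


@dataclass
class Event:
    time: float  # seconds since boot, taken from the dmesg timestamp
    kind: str
    detail: str


def scan(lines, hz=ASSUMED_HZ):
    """Return soft-lockup, RCU-stall and hung-task events in log order."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        t = float(ts.group(1)) if ts else float("nan")
        if m := SOFT_LOCKUP.search(line):
            events.append(Event(t, "soft_lockup", f"CPU{m.group(1)} stuck {m.group(2)}s"))
        elif m := RCU_STALL.search(line):
            # Convert the stall duration from jiffies to seconds under the HZ assumption.
            secs = int(m.group(2)) / hz
            events.append(Event(t, "rcu_stall", f"detected by CPU{m.group(1)}, ~{secs:.1f}s"))
        elif m := HUNG_TASK.search(line):
            events.append(Event(t, "hung_task", f"{m.group(1)} (pid {m.group(2)}) > {m.group(3)}s"))
    return events


# Sample lines copied from the log above.
log = [
    "[ 64.276773] watchdog: BUG: soft lockup - CPU#16 stuck for 22s! [swapper/16:0]",
    "[ 98.210133] (detected by 5, t=15002 jiffies, g=4709, q=3243)",
    "[ 242.808432] INFO: task ureadahead:1097 blocked for more than 120 seconds.",
]
for e in scan(log):
    print(f"{e.time:10.3f}  {e.kind:12s} {e.detail}")
```

The helper matches one message type per line and keeps log order. That makes the soft lockup, then the RCU stall, then the hung task sequence easy to see when comparing several boots.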
** Tags added: sru-20220808
--
https://bugs.launchpad.net/bugs/1958952
Title:
ARM64 node dmesg spammed with "mlx5_core 0005:01:00.0:
mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ
0x5a5aa9"
Status in ubuntu-kernel-tests:
New
Status in linux package in Ubuntu:
Confirmed
Bug description:
While investigating the SRU deployment failure, I noticed that dmesg gets
spammed with:
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885627] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885628] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1218): Completion event for bogus CQ 0x5a5aa9
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885629] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
Jan 25 07:48:36 appleton-kernel kernel: [ 22.885631] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9
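To quantify the spam in a saved syslog, a minimal sketch like the one below could be used. It groups the warnings by PCI device and CQ number and measures the time span they cover. The function name summarize() and the regex are illustrative assumptions based only on the message format shown in this report, not on the mlx5 driver source. The "(pid N)" field is captured but deliberately not used for grouping. It varies between otherwise identical messages (1180, 1218, and pid 0 in the later log), so it plausibly reflects whichever task the interrupt landed on, not anything about the CQ.

```python
import re
from collections import Counter

# Pattern based on the message format shown in this report.
BOGUS_CQ = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+mlx5_core\s+(?P<pci>\S+):\s+"
    r"mlx5_eq_comp_int:\d+:\(pid (?P<pid>\d+)\): "
    r"Completion event for bogus CQ (?P<cqn>0x[0-9a-f]+)"
)


def summarize(lines):
    """Count bogus-CQ warnings per (PCI device, CQN) and the time span they cover."""
    counts = Counter()
    stamps = []
    for line in lines:
        m = BOGUS_CQ.search(line)
        if m is None:
            continue
        counts[(m["pci"], m["cqn"])] += 1
        stamps.append(float(m["ts"]))
    span = max(stamps) - min(stamps) if stamps else 0.0
    return counts, span


# Sample lines copied from the syslog excerpt above.
syslog = [
    "Jan 25 07:48:36 appleton-kernel kernel: [ 22.885627] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9",
    "Jan 25 07:48:36 appleton-kernel kernel: [ 22.885628] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1218): Completion event for bogus CQ 0x5a5aa9",
    "Jan 25 07:48:36 appleton-kernel kernel: [ 22.885629] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9",
    "Jan 25 07:48:36 appleton-kernel kernel: [ 22.885631] mlx5_core 0005:01:00.0: mlx5_eq_comp_int:159:(pid 1180): Completion event for bogus CQ 0x5a5aa9",
]
counts, span = summarize(syslog)
for (pci, cqn), n in counts.items():
    print(f"{pci} CQ {cqn}: {n} messages in {span * 1e6:.0f} us")
```

In this excerpt, all four warnings name the same CQ (0x5a5aa9) on the same device, within about 4 microseconds.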
Issue found with Focal 5.4.0-96-generic.
Please find the syslog attached.
I'm not sure whether this is the cause of our deployment issue, but it seems odd to me.
And here is our deployment issue:
1. The system is successfully deployed with Focal
2. The deployment process hangs at the "Enabling PPA" stage
3. I cannot connect to this system manually; ssh hangs (soft lockup, maybe?) after:
Warning: Permanently added '10.229.50.13' (ECDSA) to the list of known hosts.
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-96-generic 5.4.0-96.109
ProcVersionSignature: Ubuntu 5.4.0-96.109-generic 5.4.157
Uname: Linux 5.4.0-96-generic aarch64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Jan 25 07:48 seq
crw-rw---- 1 root audio 116, 33 Jan 25 07:48 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: arm64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
Date: Tue Jan 25 07:53:33 2022
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 001 Device 004: ID 12d1:0003 Huawei Technologies Co., Ltd.
Bus 001 Device 003: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 002: ID 0424:2514 Microchip Technology, Inc. (formerly SMSC) USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-platform/2p, 480M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 2: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
|__ Port 1: Dev 4, If 1, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 1: Dev 4, If 0, Class=Human Interface Device, Driver=usbhid, 12M
MachineType: Hisilicon D05
PciMultimedia:
ProcFB: 0 hibmcdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-96-generic root=UUID=3abb8e5a-2f46-4221-b664-cb02a273a249 ro sysrq_always_enabled
RelatedPackageVersions:
linux-restricted-modules-5.4.0-96-generic N/A
linux-backports-modules-5.4.0-96-generic N/A
linux-firmware 1.187.25
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/01/2018
dmi.bios.vendor: Huawei
dmi.bios.version: 1.50
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: BC11SPCD
dmi.board.vendor: Huawei
dmi.board.version: VER.A
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Hisilicon
dmi.chassis.version: To be filled by O.E.M.
dmi.modalias:
dmi:bvnHuawei:bvr1.50:bd06/01/2018:svnHisilicon:pnD05:pvrV100R001C00:rvnHuawei:rnBC11SPCD:rvrVER.A:cvnHisilicon:ct17:cvrTobefilledbyO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: D05
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: V100R001C00
dmi.sys.vendor: Hisilicon