You have been subscribed to a public bug:

---Problem Description---
During PCIe recovery on the Mensa card, the second port fails to recover. The first port recovers; the second requires the user to recover the physical function (PF) manually.

---Additional Hardware Info---
No additional setup; only PFs are defined before the injection. No traffic was running.

---uname output---
Linux t249sb1 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:30:43 UTC 2023 s390x s390x s390x GNU/Linux

Machine Type = 3932-AGZ

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Inject an error on the PBU attached to the Mensa card:

miep pbu 1800 etu_txe_eir store 00200000_00000000

This causes an AIB Data Bus UE ECC error. Checking the Linux distro afterwards shows that port 0 recovers, but port 1 is left in an unusable state.

Stack trace output:
no

Oops output:
no

System Dump Info:
The system is not configured to capture a system dump.

*Additional Instructions for Schayne Bellrose/schayne.bellro...@ibm.com:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.
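The kernel warning that follows (`refcount_t: underflow; use-after-free.`, raised from `cmd_ent_put` in `mlx5_core`) means a reference count was decremented past zero. As a minimal sketch of that mechanism only, the userspace C model below mimics the decrement check in the kernel's lib/refcount.c, simplified to a plain `int` with no atomics or memory ordering. All names (`refcount_model`, `cmd_ent_model`, `cmd_ent_put_model`, `replay_unbalanced_put`) are hypothetical; the model does not reproduce the real mlx5 command-completion path and makes no claim about the root cause of this bug.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Value the kernel pins a refcount_t to after detecting misuse. */
#define REFCOUNT_SATURATED (INT_MIN / 2)

/* Hypothetical, non-atomic stand-in for refcount_t. */
typedef struct { int refs; } refcount_model;

static int underflow_warnings = 0;

static void warn_underflow(refcount_model *r)
{
    /* Like the kernel: saturate the counter and emit a warning. */
    r->refs = REFCOUNT_SATURATED;
    underflow_warnings++;
    fprintf(stderr, "refcount_t: underflow; use-after-free.\n");
}

/* Returns true only when the last reference is dropped (1 -> 0). */
static bool refcount_dec_and_test_model(refcount_model *r)
{
    int old = r->refs;
    r->refs = old - 1;
    if (old == 1)
        return true;
    if (old <= 0)           /* kernel check: old < 0 || old - 1 < 0 */
        warn_underflow(r);
    return false;
}

/* Hypothetical command entry, loosely named after mlx5's cmd_ent_put(). */
struct cmd_ent_model {
    refcount_model refcnt;
    bool freed;
};

static void cmd_ent_put_model(struct cmd_ent_model *ent)
{
    if (refcount_dec_and_test_model(&ent->refcnt))
        ent->freed = true;  /* a real driver would free the entry here */
}

/* Two balanced puts free the entry; a third, unbalanced put underflows.
 * Returns the number of underflow warnings, or -1 if the entry was not
 * freed after the balanced puts. */
static int replay_unbalanced_put(void)
{
    struct cmd_ent_model ent = { .refcnt = { .refs = 2 }, .freed = false };
    underflow_warnings = 0;
    cmd_ent_put_model(&ent);            /* 2 -> 1 */
    cmd_ent_put_model(&ent);            /* 1 -> 0, entry "freed" */
    bool freed_after_balanced = ent.freed;
    cmd_ent_put_model(&ent);            /* 0 -> underflow warning */
    return freed_after_balanced ? underflow_warnings : -1;
}
```

Calling `replay_unbalanced_put()` returns 1: the two balanced puts free the modeled entry, and only the extra put trips the underflow check, after which the counter stays at the saturation value, as the kernel does, rather than wrapping.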
[ 3135.702386] ------------[ cut here ]------------
[ 3135.702387] refcount_t: underflow; use-after-free.
[ 3135.702409] WARNING: CPU: 10 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x138/0x210
[ 3135.702416] Modules linked in: binfmt_misc s390_trng chsc_sch vfio_ccw eadm_sch zcrypt_cex4 mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core drm_panel_orientation_quirks ip_tables x_tables btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear mlx5_ib ib_uverbs ib_core dm_service_time qeth_l2 bridge stp llc mlx5_core pkey zcrypt crc32_vx_s390 ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 tls sha1_s390 sha_common mlxfw psample zfcp ptp qeth pps_core qdio scsi_transport_fc ccwgroup scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 3135.702459] CPU: 10 PID: 0 Comm: swapper/10 Not tainted 5.15.0-60-generic #66-Ubuntu
[ 3135.702461] Hardware name: IBM 3932 AGZ Z06 (LPAR)
[ 3135.702462] Krnl PSW : 0404c00180000000 0000000269ab050c (refcount_warn_saturate+0x13c/0x210)
[ 3135.702465]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 3135.702468] Krnl GPRS: 0000000080010000 0000000700000027 0000000000000026 000000026a835560
[ 3135.702469]            00000380006fb710 00000380006fb708 0000000000000000 0000000096b4c180
[ 3135.702470]            0000000000000001 0400000000000001 0000000096b4c200 000000026a7c5578
[ 3135.702472]            00000000802f6c00 0000000000000001 0000000269ab0508 00000380006fb930
[ 3135.702477] Krnl Code: 0000000269ab04fc: c020003b89f6  larl   %r2,000000026a2218e8
                          0000000269ab0502: c0e50021f267  brasl  %r14,0000000269eee9d0
                         #0000000269ab0508: af000000      mc     0,0
                         >0000000269ab050c: a7f4ffab      brc    15,0000000269ab0462
                          0000000269ab0510: c0b00068a834  larl   %r11,000000026a7c5578
                          0000000269ab0516: 43a0b00a      ic     %r10,10(%r11)
                          0000000269ab051a: 42a0f0a7      stc    %r10,167(%r15)
                          0000000269ab051e: 9501f0a7      cli    167(%r15),1
[ 3135.702488] Call Trace:
[ 3135.702489]  [<0000000269ab050c>] refcount_warn_saturate+0x13c/0x210
[ 3135.702492]  ([<0000000269ab0508>] refcount_warn_saturate+0x138/0x210)
[ 3135.702494]  [<000003ff80dd8930>] cmd_ent_put+0xf0/0x100 [mlx5_core]
[ 3135.702598]  [<000003ff80dd9e52>] mlx5_cmd_comp_handler+0x382/0x590 [mlx5_core]
[ 3135.702661]  [<000003ff80dda092>] cmd_comp_notifier+0x32/0x50 [mlx5_core]
[ 3135.702723]  [<00000002694ea04e>] atomic_notifier_call_chain+0x4e/0x90
[ 3135.702727]  [<000003ff80de029c>] mlx5_eq_async_int+0x12c/0x320 [mlx5_core]
[ 3135.702789]  [<00000002694ea04e>] atomic_notifier_call_chain+0x4e/0x90
[ 3135.702790]  [<000003ff80df3c4e>] irq_int_handler+0x2e/0x40 [mlx5_core]
[ 3135.702850]  [<000000026953fb1c>] __handle_irq_event_percpu+0x6c/0x210
[ 3135.702853]  [<000000026953fcf0>] handle_irq_event_percpu+0x30/0x80
[ 3135.702854]  [<0000000269545d28>] handle_percpu_irq+0x68/0x90
[ 3135.702857]  [<000000026953e79e>] generic_handle_irq+0x3e/0x60
[ 3135.702858]  [<00000002694aba9c>] zpci_floating_irq_handler+0xdc/0x170
[ 3135.702862]  [<0000000269eb4de2>] do_airq_interrupt+0x92/0x100
[ 3135.702864]  [<000000026953fb1c>] __handle_irq_event_percpu+0x6c/0x210
[ 3135.702866]  [<000000026953fcf0>] handle_irq_event_percpu+0x30/0x80
[ 3135.702867]  [<0000000269545d28>] handle_percpu_irq+0x68/0x90
[ 3135.702869]  [<000000026953e79e>] generic_handle_irq+0x3e/0x60
[ 3135.702870]  [<0000000269435e26>] do_irq_async+0x56/0xb0
[ 3135.702872]  [<0000000269ef836a>] do_io_irq+0xba/0x150
[ 3135.702875]  [<0000000269f0592c>] io_int_handler+0xdc/0x110
[ 3135.702879]  [<0000000269f059a6>] psw_idle_exit+0x0/0xa
[ 3135.702881]  ([<000000026942c3f0>] arch_cpu_idle+0x40/0xd0)
[ 3135.702883]  [<0000000269f05302>] default_idle_call+0x42/0x110
[ 3135.702884]  [<0000000269504542>] do_idle+0xd2/0x160
[ 3135.702887]  [<0000000269504796>] cpu_startup_entry+0x36/0x40
[ 3135.702889]  [<0000000269f05c8e>] restart_int_handler+0x6e/0x90
[ 3135.702891] Last Breaking-Event-Address:
[ 3135.702891]  [<00000380006fb7e0>] 0x380006fb7e0
[ 3135.702893] ---[ end trace 2c5fbdbb204a97e3 ]---

** Affects: linux (Ubuntu)
     Importance: Undecided
       Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New

** Tags: architecture-s39064 bugnameltc-201679 severity-medium targetmilestone-inin---

--
[UBUNTU 22.04] PCIe Mensa card: Automatic recovery on second port fails - operator intervention required
https://bugs.launchpad.net/bugs/2007967