I am seeing this error on a Lenovo ThinkSystem SR665 V3 with a BCM57416
NetXtreme-E Dual-Media 10G RDMA Ethernet Controller.

It appears that the driver is unable to communicate with the network
card's firmware, causing it to fail. This occurs on kernel
6.8.0-79-generic #79-Ubuntu with the bnxt_en driver, and appears to be
triggered when the bnxt_re (RoCE) module is loaded. It's this specific
interaction that seems to trigger the firmware stall.

uname -a
Linux romano 6.8.0-79-generic #79-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 12 14:42:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

As a workaround I blacklisted the bnxt_re driver; the system then booted
and the BCM57416 NIC is working.
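
For reference, a minimal sketch of one common way to blacklist the module
through modprobe.d. This is an assumption about the general approach, not
necessarily the exact steps used on this machine; the function name and
the file name blacklist-bnxt_re.conf are illustrative only.

```shell
# Sketch only: blacklist_bnxt_re and the file name are hypothetical choices.
# modprobe reads any *.conf file under /etc/modprobe.d.
blacklist_bnxt_re() {
    # Target directory; defaults to the system modprobe.d directory.
    conf_dir="${1:-/etc/modprobe.d}"
    # "blacklist" stops the module from being autoloaded via its aliases.
    printf 'blacklist bnxt_re\n' > "${conf_dir}/blacklist-bnxt_re.conf"
}

# Typical use, as root:
#   blacklist_bnxt_re
#   update-initramfs -u    # so the setting also applies in the initramfs
#   reboot
```

After the reboot, "lsmod | grep bnxt_re" printing nothing would suggest the
module was not loaded.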

** Attachment added: "bcm57416_boot_log.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2099708/+attachment/5905327/+files/bcm57416_boot_log.txt

** Changed in: linux (Ubuntu)
       Status: Invalid => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2099708

Title:
  Broadcom RDMA over Converged Ethernet driver bnxt_re stalling on 24.04

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  We have a set of servers, equipped with Broadcom BCM57414 NetXtreme-E
  10Gb/25Gb RDMA Ethernet Controllers.

  Booting Ubuntu 24.04 (on 6.8 kernel) on these machines leads to the
  bnxt_re driver stalling during boot, and outputting the following
  kernel log:

      bnxt_en 0000:ab:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xf]=0x3 waited (102721 > 100000) msec active 1
      bnxt_en 0000:ab:00.0 bnxt_re0: Failed to modify HW QP
      infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
      infiniband bnxt_re0: Couldn't start port
      bnxt_en 0000:ab:00.0 bnxt_re0: Failed to destroy HW QP

  This causes systemd-udev-settle.service to fail:

      udevadm[1212]: Timed out for waiting the udev queue being empty.

  After this point, if the machine is PXE booting and/or being provisioned
  via MaaS (which is the case here), the provisioning fails.

  The current workaround is to disable RDMA in the BIOS, which I believe
  avoids loading bnxt_re.
  This behavior doesn't seem to affect Ubuntu 22.04 with the 5.15 kernel.

  The following blog post seems to explain this issue in great detail:
  https://utcc.utoronto.ca/~cks/space/blog/linux/BroadcomNetworkDriverAndRDMA

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2099708/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
