Update:
I contacted the e-mail addresses from Kai-Heng's post.
Michael Chan (from Broadcom) replied that they had seen similar issues on other 
AMD systems and were working with AMD to resolve them.
The plan was to put me in touch with AMD; unfortunately, this never happened. 
My attempt to contact AMD through the official channel (tech support) failed 
because I could not answer AMD's questions without feedback from Broadcom, who 
then stopped replying as well.

Workaround:
Luckily, the information from the conversation with Broadcom gave me at least 
a rough idea of where to look, so I was able to troubleshoot a bit myself.
Setting Advanced -> NB Configuration -> IOMMU to "disabled" (default is "Auto") 
in the Supermicro BIOS appears to stop the problem from occurring.
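
To double-check from the running system that the BIOS change took effect, 
something like the following sketch might help. It assumes that the kernel 
lists active IOMMU groups as directories under /sys/kernel/iommu_groups and 
that an empty or missing directory means no IOMMU is in use; the function name 
count_iommu_groups is made up for this example.

```python
from pathlib import Path


def count_iommu_groups(sysfs_root="/sys/kernel/iommu_groups"):
    """Return the number of IOMMU groups found under sysfs_root.

    Assumption: zero groups (or a missing directory) suggests that no
    IOMMU is active, e.g. because it was disabled in the BIOS.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return 0
    # Each group is expected to show up as one numbered subdirectory.
    return sum(1 for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    groups = count_iommu_groups()
    state = "appears active" if groups else "appears disabled"
    print(f"IOMMU {state} ({groups} groups)")
```

A count of 0 after the reboot would be consistent with the IOMMU being off; it 
says nothing about the root cause.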

Since Broadcom stopped replying, the whole topic has been stuck.

This is only a workaround and not really a fix, but at least the servers
are running stably for me now. Since I don't know where the actual problem
lies (AMD hardware, BIOS, kernel, or somewhere else), I can't say
whether this bug report can be marked as closed.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1931106

Title:
  bnxt_en NIC driver crashes IO_PAGE_FAULT

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  Hi all,

  We received a bunch of new servers with a Supermicro H12SSL-NT
  mainboard that has an embedded Broadcom BCM57416 NIC.

  On all those servers we observe crashes of the NIC driver (bnxt_en)
  from time to time. We are not able to reproduce the issue manually; it
  just occurs at some point. Our monitoring also does not show any
  irregularities (such as high traffic flow).

  Syslog: https://pastebin.com/yDAyjHvF
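
Since the crashes cannot be triggered manually, counting the fault lines in a 
saved log may be a practical way to compare servers or time ranges. The sketch 
below is only an illustration: it assumes that each relevant log line contains 
both the driver name "bnxt_en" and the string "IO_PAGE_FAULT", and the function 
name count_page_faults is hypothetical.

```python
def count_page_faults(lines, driver="bnxt_en", marker="IO_PAGE_FAULT"):
    """Count log lines that mention both the driver name and the fault marker.

    Assumption: driver name and marker appear on the same line; lines that
    carry only one of the two strings are ignored.
    """
    return sum(1 for line in lines if driver in line and marker in line)
```

It accepts any iterable of strings, so an open log file can be passed in 
directly, for example `count_page_faults(open("/var/log/syslog", errors="replace"))`.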

  All servers are running with up-to-date packages:
  $ lsb_release -rd
  Description: Ubuntu 20.04.2 LTS
  Release: 20.04
  $ uname -r
  5.4.0-73-generic 

  It also happens on older kernel versions (tested 5.4.0-66) as well as
  the HWE kernel (tested 5.8.0-55).

  
  Thanks in advance.
  ~ Roman

