OK, I tried it. It still fails in the InfiniBand infrastructure, but
in a new way, and from an ib_core function rather than ib_mthca. There
were still a couple of shift-out-of-bounds UBSAN warnings, followed by
an attempt to execute code in a non-executable page, as if following a
trashed function pointer.

If there is other debugging information I should gather, please let me
know. The kern.log is attached.
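
A minimal sketch, assuming the log is a plain-text file, of how the
relevant lines could be pulled out of a kern.log like this one. The
function names are hypothetical, and the match strings are taken only
from the symptoms quoted in the bug description; they would likely need
extending to catch other kernel messages.

```python
import re

# Substrings taken from the symptoms described in this bug report.
# This list is an assumption about what is worth looking at, not a
# complete catalogue of kernel error messages.
PATTERN = re.compile(
    r"UBSAN|shift exponent|BUG: kernel NULL pointer dereference"
)


def interesting_lines(lines):
    """Return (line_number, text) pairs for lines matching PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_file(path):
    """Scan a log file on disk; undecodable bytes are replaced."""
    with open(path, errors="replace") as log:
        return interesting_lines(log)
```

Calling scan_file("tate-6.2.0-kern.log") on a local copy of the
attachment would return the matching lines with their line numbers,
which makes it easier to see how closely the warnings and the crash
follow each other.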

On 2/22/23 18:45, Kai-Heng Feng wrote:
> Please test latest mainline kernel:
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.2/amd64/
>
> Headers are not needed.
>
> ** Changed in: linux (Ubuntu)
>         Status: Confirmed => Incomplete
>


** Attachment added: "tate-6.2.0-kern.log"
   https://bugs.launchpad.net/bugs/2007038/+attachment/5649830/+files/tate-6.2.0-kern.log

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2007038

Title:
  22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  I run some x86_64 machines with Infiniband interfaces (Mellanox
  MT25204, ib_mthca driver + ib_ipoib for IP-over-IB).

  This had worked fine for years under Ubuntu 20.04.1 LTS and under
  RHEL6 before it.

  But as soon as I updated to 22.04.1 LTS, the IB interface stopped
  working, both with its default 5.15.0-60-generic kernel and with
  6.1.0-1006-oem (the latest packaged one I could find).

  dmesg shows some UBSAN shift-out-of-bounds warnings in mthca modules,
  e.g. "shift exponent -25557 is negative". That's a bizarre number -
  maybe a hint of something uninitialized?

  The crippling symptom shows up within a second after that: a NULL
  dereference within the ib_mthca driver -- the "BUG: kernel NULL
  pointer dereference", in mthca_poll_one.  The interface never sets its
  RUNNING flag (as shown by ifconfig).

  The rest of the system remains usable after the "BUG" message -- the
  ethernet, disk, etc. drivers and other functions work as expected.

  Attempting to unload the ib_mthca driver causes a kernel panic.

  Is there anything I should try? Should I build a kernel from source
  with debugging? I could try installing the 5.4.0 kernel from 20.04,
  but would rather use something that will continue to get security
  patches.

  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.15.0-60-generic 5.15.0-60.66
  ProcVersionSignature: Ubuntu 5.15.0-60.66-generic 5.15.78
  Uname: Linux 5.15.0-60-generic x86_64
  AlsaDevices:
   total 0
   crw-rw----+ 1 root audio 116,  1 Feb 12 14:12 seq
   crw-rw----+ 1 root audio 116, 33 Feb 12 14:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: pass
  Date: Sun Feb 12 14:17:28 2023
  InstallationDate: Installed on 2020-11-22 (812 days ago)
  InstallationMedia: Ubuntu-Server 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  MachineType: Supermicro X7DBR-8
  PciMultimedia:
   
  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-60-generic root=UUID=8624cf02-e743-4da6-9209-14ef2c2abd10 ro
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-60-generic N/A
   linux-backports-modules-5.15.0-60-generic  N/A
   linux-firmware                             20220329.git681281e4-0ubuntu3.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to jammy on 2023-02-10 (2 days ago)
  dmi.bios.date: 12/03/2007
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: X7DBR-8
  dmi.board.vendor: Supermicro
  dmi.board.version: PCB Version
  dmi.chassis.type: 1
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd12/03/2007:svnSupermicro:pnX7DBR-8:pvr0123456789:rvnSupermicro:rnX7DBR-8:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:sku:
  dmi.product.name: X7DBR-8
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2007038/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
