Glad it is not just me. I've acquired some other IB cards (Mellanox MHJH29-XTC X5 and an Oracle 7046442) and hope to try them against the later kernels' IB drivers too, but haven't had the time to take down the server yet.
On 7/6/23 09:53, Shurak wrote:
> Hello!
>
>
> Same problem here: Ubuntu 22.04 (kernel 5.15.0-76-generic) with mother
> S3200SHC (with latest fw) and pci-e card (with latest fw 1.2.0)
>
> 01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
> HCA] (rev 20)
>
> I can submit any report needed (just tell me the link to the procedure
> or the console commands)
>
> Thank you very much
> Best Regards
>

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2007038

Title:
  22.04 ib_mthca BUG: kernel NULL pointer, but had worked in 20.04

Status in linux package in Ubuntu:
  Expired

Bug description:
  I run some x86_64 machines with Infiniband interfaces (Mellanox
  MT25204, ib_mthca driver + ib_ipoib for IP-over-IB). This had worked
  fine for years under Ubuntu 20.04.1 LTS and under RHEL6 before it.

  But as soon as I updated to 22.04.1 LTS -- with both its default
  5.15.0-60-generic kernel and also 6.1.0-1006-oem (the latest packaged
  one I could find), the IB interface doesn't work.

  dmesg shows some UBSAN shift-out-of-bounds warnings in mthca modules,
  e.g. "shift exponent -25557 is negative". That's a bizarre number -
  maybe a hint of something uninitialized?

  The crippling symptom shows up within a second after that: a NULL
  dereference within the ib_mthca driver -- the "BUG: kernel NULL
  pointer dereference", in mthca_poll_one.

  The interface never sets its RUNNING flag (as shown by ifconfig). The
  rest of the system remains usable after the "BUG" message -- the
  ethernet, disk, etc. drivers and other functions work as expected.

  Attempting to unload the ib_mthca driver causes a kernel panic.

  Is there anything I should try? Should I build a kernel from source
  with debugging? I could try installing the 5.4.0 kernel from 20.04,
  but would rather use something that will continue to get security
  patches.
  ProblemType: Bug
  DistroRelease: Ubuntu 22.04
  Package: linux-image-5.15.0-60-generic 5.15.0-60.66
  ProcVersionSignature: Ubuntu 5.15.0-60.66-generic 5.15.78
  Uname: Linux 5.15.0-60-generic x86_64
  AlsaDevices:
   total 0
   crw-rw----+ 1 root audio 116,  1 Feb 12 14:12 seq
   crw-rw----+ 1 root audio 116, 33 Feb 12 14:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu82.3
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: pass
  Date: Sun Feb 12 14:17:28 2023
  InstallationDate: Installed on 2020-11-22 (812 days ago)
  InstallationMedia: Ubuntu-Server 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  MachineType: Supermicro X7DBR-8
  PciMultimedia:

  ProcEnviron:
   TERM=linux
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 VESA VGA
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-60-generic root=UUID=8624cf02-e743-4da6-9209-14ef2c2abd10 ro
  RelatedPackageVersions:
   linux-restricted-modules-5.15.0-60-generic N/A
   linux-backports-modules-5.15.0-60-generic  N/A
   linux-firmware                             20220329.git681281e4-0ubuntu3.9
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to jammy on 2023-02-10 (2 days ago)
  dmi.bios.date: 12/03/2007
  dmi.bios.vendor: Phoenix Technologies LTD
  dmi.bios.version: 6.00
  dmi.board.name: X7DBR-8
  dmi.board.vendor: Supermicro
  dmi.board.version: PCB Version
  dmi.chassis.type: 1
  dmi.chassis.vendor: Supermicro
  dmi.chassis.version: 0123456789
  dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd12/03/2007:svnSupermicro:pnX7DBR-8:pvr0123456789:rvnSupermicro:rnX7DBR-8:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:sku:
  dmi.product.name: X7DBR-8
  dmi.product.version: 0123456789
  dmi.sys.vendor: Supermicro

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2007038/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp