To add a bit more detail (maybe unrelated, but with so little evidence everything helps): when those lockups happen, is the server at least still pingable? Another idea, provided the servers are accessible enough, would be to check whether sysrq combinations are still handled. I fear, though, that at least Stéphane's server is hosted somewhere else with probably only ssh (maybe IPMI) access. But if that were possible and working, one could prepare kdump and enable the sysrq crash combo.
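
On the sysrq point, a rough sketch of how one might check up front whether the keyboard crash combo would be honoured at all. This is only an illustration under assumptions: the helper names are made up, and it relies on the bitmask semantics from the kernel's sysrq documentation (0 = disabled, 1 = everything enabled, otherwise a bitmask in which, as far as I know, the "debugging dumps" bit 8 also gates the crash key).

```python
from pathlib import Path

# Bit of the kernel.sysrq bitmask that enables "debugging dumps".
# Assumption: the crash key ('c') is gated by this bit.
SYSRQ_ENABLE_DUMP = 0x0008


def crash_key_allowed(sysrq_value: int) -> bool:
    """Return True if the keyboard sysrq crash combo should be accepted."""
    # 1 is a special value meaning "all sysrq functions enabled".
    if sysrq_value == 1:
        return True
    # 0 (disabled) and any mask without the dump bit yield False.
    return bool(sysrq_value & SYSRQ_ENABLE_DUMP)


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current kernel.sysrq setting from procfs."""
    return int(Path(path).read_text().strip())
```

Calling crash_key_allowed(read_sysrq()) on the affected machine would show whether the setting needs changing before the next lockup. A value of 176, for example, has no dump bit set, so the crash combo would be ignored until kernel.sysrq is raised.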
Otherwise, and that again is probably only possible for Luis if his devel servers do not need zfs, it would help to see how various mainline kernels between 4.4 and 4.15 are doing, and in parallel to have some "canary" running the latest update. IIRC the one just released had a large portion of upstream stable pulled in.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799497

Title:
  4.15 kernel hard lockup about once a week

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Bionic:
  Incomplete

Bug description:
  My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel.

  When this happens, nothing is printed to the console; it's effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting, so the kernel isn't even processing this as a kernel panic.

  As this felt like a potential hardware issue, I had my hosting provider give me a completely different system: different motherboard, different CPU, different RAM and different storage. I installed that system on 18.04 and moved my data over. A week later, I hit the issue again.

  We've since also had a LXD user reporting similar symptoms here, also on varying hardware:
    https://github.com/lxc/lxd/issues/5197

  My system doesn't have a lot of memory pressure with about 50% of free memory:

  root@vorash:~# free -m
                total        used        free      shared  buff/cache   available
  Mem:          31819       17574         402         513       13842       13292
  Swap:         15909        2687       13222

  I will now try to increase console logging as much as possible on the system in the hopes that next time it hangs we can get a better idea of what happened, but I'm not too hopeful given the complete silence on the console when this occurs.

  System is currently on:
    Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  But I've seen this since the GA kernel on 4.15 so it's not a recent regression.
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Oct 23 16:12 seq
   crw-rw---- 1 root audio 116, 33 Oct 23 16:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
  ApportVersion: 2.20.9-0ubuntu7.4
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
  AudioDevicesInUse:
   Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
   Cannot stat file /proc/22831/fd/10: Permission denied
  DistroRelease: Ubuntu 18.04
  HibernationDevice:
   RESUME=none
   CRYPTSETUP=n
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Intel Corporation S1200SP
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
  Package: linux (not installed)
  PciMultimedia:
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 mgadrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-38-generic root=UUID=575c878a-0be6-4806-9c83-28f67aedea65 ro biosdevname=0 net.ifnames=0 panic=1 verbose console=tty0 console=ttyS0,115200n8
  ProcVersionSignature: Ubuntu 4.15.0-38.41-generic 4.15.18
  RelatedPackageVersions:
   linux-restricted-modules-4.15.0-38-generic N/A
   linux-backports-modules-4.15.0-38-generic  N/A
   linux-firmware                             1.173.1
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
  Tags: bionic
  Uname: Linux 4.15.0-38-generic x86_64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
  _MarkForUpload: False
  dmi.bios.date: 01/25/2018
  dmi.bios.vendor: Intel Corporation
  dmi.bios.version: S1200SP.86B.03.01.1029.012520180838
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: S1200SP
  dmi.board.vendor: Intel Corporation
  dmi.board.version: H57532-271
  dmi.chassis.asset.tag: ....................
  dmi.chassis.type: 23
  dmi.chassis.vendor: ...............................
  dmi.chassis.version: ..................
  dmi.modalias: dmi:bvnIntelCorporation:bvrS1200SP.86B.03.01.1029.012520180838:bd01/25/2018:svnIntelCorporation:pnS1200SP:pvr....................:rvnIntelCorporation:rnS1200SP:rvrH57532-271:cvn...............................:ct23:cvr..................:
  dmi.product.family: Family
  dmi.product.name: S1200SP
  dmi.product.version: ....................
  dmi.sys.vendor: Intel Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1799497/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp