[Bug 1921355] Re: cgroups related kernel panics
Greetings!

No luck with 5.4.0-80.90: we are still getting the same bug as before, even on kernel version 5.4.0-86. We still have no clue how to reproduce it; hypervisor nodes just crash at random. I have attached the dmesg of the most recent crash, but it looks identical to the previous ones.

Here is a fresh crash dump: https://drive.google.com/file/d/1skA238DVtxpY8t8ANdzX1gBC8muChxto/view?usp=sharing

** Attachment added: "crash-260122.log"
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1921355/+attachment/5557608/+files/crash-260122.log

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1921355

Title: cgroups related kernel panics

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1921355/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1921355] Re: cgroups related kernel panics
Can you please give 5.4.0-80.90 a try?

** Changed in: linux (Ubuntu) Status: Confirmed => Incomplete
[Bug 1921355] Re: cgroups related kernel panics
Hello!

We actually saw some surprising behavior. Shortly after the discussion in this thread, the bug simply disappeared for nearly two months. We still had no luck reproducing it. We used this opportunity to migrate and reboot some of our servers to activate kdump on them, and decided to wait. A couple of days ago one of our hypervisors hung, and we got our crash kernel dump :) The kernel version was 5.4.0-73-generic this time.

Now that we have it, could somebody please have a look at it? The file is quite large, ~2.5 GB (3.2 GB unpacked): https://drive.google.com/file/d/1JVMWJpXNeou06UxqJwl5wjbLKzcb2rOq/view?usp=sharing

** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
[Bug 1921355] Re: cgroups related kernel panics
Thank you all for your ideas!

Sure, we do have some modules that are not from the kernel source tree. These are Mellanox (our NICs) and Open vSwitch; we use them because we had some problems that were fixed in the newer driver versions.

We don't have apport enabled, and the hypervisor nodes don't even have direct access to the internet (only some VMs on them do). I checked on a test VM what kind of information apport collects, and it seems to be the architecture, the kernel version, and the stack trace. I have attached that information manually; it was collected by netconsole, which we have enabled.

When the issue started, it was reproducible even on the then-latest kernel (5.4.0-66), so I'm not sure that simply upgrading will help.

Currently I'm working on integrating kdump into our infrastructure and trying to reproduce the issue again. I'll also try to schedule a migration and upgrade for our hypervisor node (that's not fast, though).
[Bug 1921355] Re: cgroups related kernel panics
Can you collect and upload logs per the previous comment? I've googled around a bit, but nothing jumped out. This will be difficult without a reproducer. Have you tried the latest HWE kernel, 5.4.0-71.79~18.04.1? Is there any chance you can enable kdump?
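For readers who want to check whether kdump is actually armed on a node, here is a minimal sketch, not something taken from this thread. It assumes the standard sysfs flag `/sys/kernel/kexec_crash_loaded` and a `crashkernel=` argument on the kernel command line; the helper names `crash_kernel_loaded` and `crashkernel_reserved` are hypothetical.

```python
from pathlib import Path


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """Return True/False from the sysfs flag, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # Flag file missing or unreadable (e.g. not Linux, or kexec disabled).
        return None


def crashkernel_reserved(cmdline):
    """Return True if the kernel command line carries a crashkernel= argument."""
    return any(arg.startswith("crashkernel=") for arg in cmdline.split())
```

On a live node one might call `crash_kernel_loaded()` and `crashkernel_reserved(Path("/proc/cmdline").read_text())`; both should be true before a panic can be expected to produce a dump.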
[Bug 1921355] Re: cgroups related kernel panics
** Also affects: linux (Ubuntu) Importance: Undecided Status: New
[Bug 1921355] Re: cgroups related kernel panics
CPU: 0 PID: 1 Comm: systemd Tainted: G OE 5.4.0-66-generic #74~18.04.2-Ubuntu

The stand-out information in the log fragments is that the kernel is tainted by GPL-licensed (G), unsigned (E), out-of-tree (O) modules:

openvswitch(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) mlx_compat(OE)
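As an illustration of how the taint letters in such a line can be read, here is a minimal Python sketch. The function name `decode_taint` and the regular expression are assumptions made for this example, and the table covers only a handful of the kernel's taint letters.

```python
import re

# Subset of kernel taint letters (see the kernel's "Tainted kernels" documentation).
TAINT_FLAGS = {
    "G": "all loaded modules have a GPL-compatible license",
    "P": "a proprietary module was loaded",
    "O": "an out-of-tree module was loaded",
    "E": "an unsigned module was loaded",
    "W": "a kernel warning was issued earlier",
    "D": "the kernel oopsed or hit a BUG earlier",
}


def decode_taint(line):
    """Return (letter, meaning) pairs for the 'Tainted:' field of an oops header."""
    match = re.search(r"Tainted:\s*([A-Z ]+?)\s+\d", line)
    if match is None:
        return []  # e.g. "Not tainted", or no taint field at all
    letters = [c for c in match.group(1) if c != " "]
    return [(c, TAINT_FLAGS.get(c, "not covered by this sketch")) for c in letters]


header = ("CPU: 0 PID: 1 Comm: systemd Tainted: G OE "
          "5.4.0-66-generic #74~18.04.2-Ubuntu")
for letter, meaning in decode_taint(header):
    print(letter, "-", meaning)
```

For the header above this reports G, O and E, matching the (OE) markers on the openvswitch and mlx modules.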
[Bug 1921355] Re: cgroups related kernel panics
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux-hwe-5.4 (Ubuntu) Status: New => Confirmed
[Bug 1921355] Re: cgroups related kernel panics
** Attachment added: "crash-160321.log" https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1921355/+attachment/5480851/+files/crash-160321.log
[Bug 1921355] Re: cgroups related kernel panics
** Attachment added: "crash-080321.log" https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1921355/+attachment/5480849/+files/crash-080321.log
[Bug 1921355] Re: cgroups related kernel panics
** Attachment added: "crash-110321.log" https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1921355/+attachment/5480850/+files/crash-110321.log