Bug#1033398: linux-image-amd64: reproducible kernel freeze on 5.19+
Hi all,

the fix was merged upstream with
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/maccess.c?id=d319f344561de23e810515d109c7278919bff7b0

- florian

On 3/25/23 16:58, Diederik de Haas wrote:
> Control: found -1 5.19~rc4-1~exp1
> Control: forwarded -1 https://lore.kernel.org/bpf/20230118051443.78988-1-alexei.starovoi...@gmail.com/
>
> On Saturday, 25 March 2023 16:00:47 CET Florian Lehner wrote:
>>> Via https://snapshot.debian.org/binary/linux-image-amd64/ you can
>>> easily test various kernel versions.
>>> Could you try whether 5.19~rc4-1~exp1 indeed produces the problem?
>>
>> Yes - I can reproduce the total system freeze with 5.19~rc4-1~exp1
>
> Thanks. Then the most likely case is that it was introduced in the 5.19
> merge window and is thus also present in 5.19-rc1, but there isn't a
> prebuilt kernel to verify that.
>
>>>> Since the running program is rather complex, it is not easily
>>>> possible to carve out a small reproducer. We can provide gdb
>>>> backtraces from freezes inside qemu.
>>>
>>> Someone else would have to chime in for the backtraces; that's beyond
>>> my skill set.
>>
>> I just learned about
>> https://lore.kernel.org/bpf/20230118051443.78988-1-alexei.starovoitov@gmail.com/.
>> With the provided patch applied I no longer manage to freeze the system.
>
> I see you already responded to that thread, excellent :-)
> Hopefully they'll read this whole bug report, but mentioning that your
> actual problem was NOT triggered up to 5.18, but did trigger from
> 5.19-rc4 onwards, could be useful.
> I may not fully understand what upstream talked about, but I only saw a
> reference to a 6.0.0 kernel.
>
> Thanks for testing and reporting back :-)
Bug#1033398: linux-image-amd64: reproducible kernel freeze on 5.19+
On Fri, 24 Mar 2023 13:50:15 +0100 Diederik de Haas wrote:
> On Friday, 24 March 2023 12:44:33 CET Tim Rühsen wrote:
>> Package: linux-image-amd64
>> Version: 6.1.20-1
>>
>> We run a privileged eBPF based tool with a communication between
>> kernel and user space. It runs without issues on kernels 4.15 to 5.18.
>> On kernels 5.19+, the whole system freezes after a few minutes.
>
> Via https://snapshot.debian.org/binary/linux-image-amd64/ you can easily
> test various kernel versions.
> Could you try whether 5.19~rc4-1~exp1 indeed produces the problem?

Yes - I can reproduce the total system freeze with 5.19~rc4-1~exp1
(2022-07-01) from
https://snapshot.debian.org/package/linux-signed-amd64/5.19~rc4%2B1~exp1/.

>> Since the running program is rather complex, it is not easily possible
>> to carve out a small reproducer. We can provide gdb backtraces from
>> freezes inside qemu.
>
> Someone else would have to chime in for the backtraces; that's beyond my
> skill set.

I just learned about
https://lore.kernel.org/bpf/20230118051443.78988-1-alexei.starovoi...@gmail.com/.
With the provided patch applied I no longer manage to freeze the system.

- florian
Bug#1033398: linux-image-amd64: reproducible kernel freeze on 5.19+
Hi,

maybe some additional information. The eBPF program is of type
BPF_PROG_TYPE_PERF_EVENT and is attached to all CPUs via the perf
subsystem, using PERF_COUNT_SW_CPU_CLOCK. It is executed at a constant
sampling frequency (usually 20 Hz).

We also have QEMU guest memory dumps available if these would help to
investigate the issue.

- florian

On Fri, 24 Mar 2023 12:44:33 +0100 Tim Rühsen wrote:
> Package: linux-image-amd64
> Version: 6.1.20-1
> Severity: important
> X-Debbugs-Cc: tim.rueh...@gmx.de
>
> Dear Maintainer,
>
>    * What led up to the situation?
>
> We run a privileged eBPF based tool with a communication between kernel
> and user space. It runs without issues on kernels 4.15 to 5.18.
> On kernels 5.19+, the whole system freezes after a few minutes. It seems
> that with more system activity (load, forks) the freeze happens earlier.
>
> The underlying hardware seems to play no role; we could reproduce this
> on different bare metal systems as well as within a qemu based VM.
>
> Since the running program is rather complex, it is not easily possible
> to carve out a small reproducer. We can provide gdb backtraces from
> freezes inside qemu.
>
> -- System Information:
> Debian Release: 12.0
>   APT prefers testing-security
>   APT policy: (500, 'testing-security'), (500, 'testing-debug'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 6.1.0-7-amd64 (SMP w/20 CPU threads; PREEMPT)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=locale: Cannot set LC_ALL to default locale: No such file or directory UTF-8), LANGUAGE=en_US:en
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
>
> Versions of packages linux-image-amd64 depends on:
> ii  linux-image-6.1.0-7-amd64  6.1.20-1
>
> linux-image-amd64 recommends no packages.
>
> linux-image-amd64 suggests no packages.
>
> -- debconf information:
> perl: warning: Setting locale failed.
> perl: warning: Please check that your locale settings:
>     LANGUAGE = "en_US:en",
>     LC_ALL = (unset),
>     LC_TIME = "en_DE.UTF-8",
>     LC_MONETARY = "en_DE.UTF-8",
>     LC_COLLATE = "en_DE.UTF-8",
>     LANG = "en_US.UTF-8"
>     are supported and installed on your system.
> perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
> locale: Cannot set LC_ALL to default locale: No such file or directory
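
For readers unfamiliar with the setup described in this message (a BPF_PROG_TYPE_PERF_EVENT program driven by PERF_COUNT_SW_CPU_CLOCK at about 20 Hz on every CPU), the following is a minimal sketch of how such an attachment is commonly done with perf_event_open(2). It is not the reporters' code. The function names init_cpu_clock_attr, open_cpu_clock_event and attach_bpf_prog are hypothetical, prog_fd is assumed to be an already loaded BPF program, and loading that program is out of scope here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: describe a software CPU-clock event that is
 * sampled at a fixed frequency (freq_hz) rather than a fixed period. */
void init_cpu_clock_attr(struct perf_event_attr *attr,
                         unsigned long long freq_hz)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->freq = 1;               /* treat sample_freq as Hz */
    attr->sample_freq = freq_hz;  /* e.g. 20, as mentioned in the report */
}

/* Hypothetical helper: open the event for all tasks (pid = -1) on one
 * CPU. Needs sufficient privileges. Returns an fd, or -1 on error. */
int open_cpu_clock_event(int cpu, unsigned long long freq_hz)
{
    struct perf_event_attr attr;

    init_cpu_clock_attr(&attr, freq_hz);
    return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0UL);
}

/* Hypothetical helper: attach an already loaded BPF program (prog_fd,
 * an assumption of this sketch) to the event and enable it. */
int attach_bpf_prog(int event_fd, int prog_fd)
{
    if (ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0)
        return -1;
    return ioctl(event_fd, PERF_EVENT_IOC_ENABLE, 0);
}
```

To cover all CPUs, a caller would typically invoke open_cpu_clock_event() once per online CPU and attach the same prog_fd to each returned descriptor. Whether the tool in this report does exactly that is not stated in the thread.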