[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-05-08 Mitchell Augustin
** Tags removed: verification-needed-noble-linux ** Tags added: verification-done-noble-linux -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/2058557 Title: Kernel panic during checkbox

[Bug 2052663] Re: fabric-manager-535 setup fails during install on Grace/Hopper arm64 system running noble

2024-04-24 Mitchell Augustin
This bug no longer appears to be reproducible on noble with the 6.8 generic kernels, so I have marked it as resolved.

[Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

2024-04-24 Mitchell Augustin
Compiling the Nvidia drivers with -ffixed-x18 on affected versions is also sufficient to prevent this hang/panic: https://github.com/NVIDIA/open-gpu-kernel-modules
diff --git a/src/nvidia-modeset/Makefile b/src/nvidia-modeset/Makefile
index 66edbf4e..d49a3bfb 100644
---

[Bug 2052663] Re: fabric-manager-535 setup fails during install on Grace/Hopper arm64 system running noble

2024-04-24 Mitchell Augustin
** Changed in: fabric-manager-535 (Ubuntu)
Assignee: (unassigned) => Mitchell Augustin (mitchellaugustin)
** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Mitchell Augustin (mitchellaugustin)
** Changed in: nvidia-graphics-drivers-535-server (Ubuntu)
Assignee: (unas

[Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

2024-04-24 Mitchell Augustin
To determine whether core count had any effect on this bug, I set maxcpus to 4 and tried loading the driver on the kernel with the shadow call stack enabled (i.e., the standard -generic config). It looks like the same root issue occurred, but this time I got a panic with a trace that corroborates

[Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

2024-04-24 Mitchell Augustin
It looks like this is the relevant option, present in the upstream stable 6.8.1 defconfig but not in the 6.8.0-31-generic config, that enables the defconfig kernel to load the Nvidia driver: CONFIG_SHADOW_CALL_STACK=n. I suspect that the kernel team is not going to want to disable kernel support
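
The config comparison described above can be reproduced with a short script. This is a minimal sketch, assuming both files use the standard Kconfig `.config` format, where a disabled option is written as `# CONFIG_FOO is not set` (the on-disk form of `CONFIG_SHADOW_CALL_STACK=n`); the function names are hypothetical. On Ubuntu the running kernel's config is normally available as /boot/config-<version>, and an open file object can be passed straight to `parse_kconfig`.

```python
import re
from typing import Dict, Iterable, Tuple

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
_SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(lines: Iterable[str]) -> Dict[str, str]:
    """Map each option name to its value; explicitly unset options map to "n"."""
    options: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        m = _SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_kconfig(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return options whose values differ; "(absent)" marks an option one side lacks."""
    result: Dict[str, Tuple[str, str]] = {}
    for name in sorted(set(a) | set(b)):
        va = a.get(name, "(absent)")
        vb = b.get(name, "(absent)")
        if va != vb:
            result[name] = (va, vb)
    return result
```

For example, with two illustrative one-line configs, `diff_kconfig(parse_kconfig(["# CONFIG_SHADOW_CALL_STACK is not set"]), parse_kconfig(["CONFIG_SHADOW_CALL_STACK=y"]))` returns `{'CONFIG_SHADOW_CALL_STACK': ('n', 'y')}`.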

[Bug 2062380] Re: Using a 6.8 kernel 'modprobe nvidia' hangs on Quanta Grace Hopper

2024-04-24 Mitchell Augustin
** Changed in: nvidia-graphics-drivers-535-server (Ubuntu)
Assignee: (unassigned) => Mitchell Augustin (mitchellaugustin)
** Changed in: nvidia-graphics-drivers-550-server (Ubuntu)
Assignee: (unassigned) => Mitchell Augustin (mitchellaugustin)

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-04-09 Mitchell Augustin
Fix has landed upstream: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?h=v6.9-rc3=caeb4b0a11b3393e43f7fa8e0a5a18462acc66bd

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-04-01 Mitchell Augustin
A fix has been applied to the vfs.fixes branch upstream and should land soon. I have tested this patch and verified that the panic no longer occurs.
** Changed in: linux (Ubuntu)
Status: New => Fix Committed

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-28 Mitchell Augustin
This issue is still present upstream, so I reported it to the original committer of the patch.

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-28 Mitchell Augustin
I have isolated the cause of this bug to this commit:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/?h=Ubuntu-6.8.0-20.20=71eb6b6b0ba93b1467bccff57b5de746b09113d2
All versions that I tested before this commit during my bisect passed the aiol test at least 15
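
Because the aiol panic does not occur on every run of an affected kernel, a bisect step can only call a commit good after several clean runs, while a single panic is enough to call it bad. A minimal sketch of that asymmetric rule, assuming a linear list of commits ordered oldest to newest whose first entry is known good and last entry known bad; `run_trial` is a hypothetical callback that would build and boot one commit and report whether a single aiol run passed, and `runs_required` is a tunable, not a value taken from this report.

```python
from typing import Callable, Sequence


def is_good(commit: str, run_trial: Callable[[str], bool], runs_required: int) -> bool:
    """A commit counts as good only if all `runs_required` trials pass."""
    for _ in range(runs_required):
        if not run_trial(commit):
            return False  # one failure is enough to mark the commit bad
    return True


def bisect_flaky(
    commits: Sequence[str],
    run_trial: Callable[[str], bool],
    runs_required: int = 15,
) -> str:
    """Return the first bad commit in `commits`.

    Assumes commits[0] is good and commits[-1] is bad (hypothetical setup).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid], run_trial, runs_required):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

With `git bisect run`, the same rule would live in the script it invokes: exit 0 only after every repeated run passes, and exit 1 on the first failure (git treats exit codes 1 to 127, other than 125, as bad).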

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-26 Mitchell Augustin
It turns out that this issue does not appear with *every* run of the aiol test on affected kernels, so multiple runs of that test may be necessary for the panic to occur.
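
To put a number on "multiple runs": if a single aiol run on an affected kernel panics with probability p, the chance that n independent runs all pass anyway is (1 - p)^n, so reaching a detection confidence c needs n >= ln(1 - c) / ln(1 - p). A minimal sketch, assuming runs are independent; p is not measured anywhere in this report, so any value plugged in is purely illustrative.

```python
import math


def detection_probability(p_fail: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs panics."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of runs whose detection probability reaches `confidence`."""
    if not (0.0 < p_fail < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_fail and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))
```

For a hypothetical p = 0.2, `runs_needed(0.2, 0.95)` returns 14: fourteen consecutive clean runs before a kernel could be called unaffected at 95% confidence.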

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-25 Mitchell Augustin
I did some more version testing, and I have not been able to reproduce this bug with the "aiol" stressor on either upstream 6.5 or Ubuntu 6.5.0-26-generic-64k, so it was evidently introduced after 6.5.

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-22 Mitchell Augustin
Earlier, I said that the device mapper observation did not seem to be a hard rule; however, further testing now indicates that the panics I observed when stressing nvme0n1 were due to an unrelated bug that is present in the latest 6.5 mainline tree, but *not* the latest 6.5

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-22 Mitchell Augustin
I did not observe this issue with any other stress_ng disk tests on linux-image-6.8.0-11-generic-64k after 1 full run of the suite with the "aiol" test disabled. (When running the "aiol" test alone, it panicked reliably each time.)

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-21 Mitchell Augustin
Upon further investigation, the device mapper observation does not seem to be a hard rule, as I was able to observe panics when stressing both dm-0 and nvme0n1 under different circumstances. At the moment, it also seems that the specific part of stress_ng_test responsible is the

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-21 Mitchell Augustin
I have observed that this panic does not seem to happen when stressing non-device-mapper devices (e.g., it panics when running /usr/lib/checkbox-provider-base/bin/stress_ng_test.py disk --device dm-0 --base-time 240, but completes successfully when running /usr/lib/checkbox-provider-
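
Since a single run may not trigger the panic, it can help to repeat the checkbox invocation in a loop. A minimal sketch: the script path and the `disk --device ... --base-time 240` arguments come from the command quoted above, while the helper names, the repeat count, and treating a non-zero exit status as a failed run are assumptions.

```python
import subprocess
from typing import List

# Script path taken from the checkbox command quoted above.
STRESS_NG_TEST = "/usr/lib/checkbox-provider-base/bin/stress_ng_test.py"


def build_disk_cmd(device: str, base_time: int = 240) -> List[str]:
    """Build the stress_ng_test.py disk invocation for one block device."""
    return [STRESS_NG_TEST, "disk", "--device", device, "--base-time", str(base_time)]


def repeat_disk_test(device: str, repeats: int, base_time: int = 240) -> int:
    """Run the disk test `repeats` times and return how many runs exited non-zero.

    Hypothetical helper: a kernel panic normally halts the machine mid-loop,
    so the panic itself is seen on the console, not in this count.
    """
    failures = 0
    for _ in range(repeats):
        result = subprocess.run(build_disk_cmd(device, base_time), check=False)
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `repeat_disk_test("dm-0", repeats=5)`, run as root on the system under test, repeats the dm-0 invocation that triggered the panic five times.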

[Bug 2058557] Re: Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-20 Mitchell Augustin
This is also reproducible on the latest mainline version (https://kernel.ubuntu.com/mainline/v6.8/arm64/, retrieved 20 Mar 2024 @ 5 PM):
20 Mar 22:54: Running stress-ng aiol stressor for 240 seconds...
[ 354.451450] Unable to handle kernel paging request at virtual address 17be9b4aa3e187be
[

[Bug 2058557] [NEW] Kernel panic during checkbox stress_ng_test on Grace running noble 6.8 (arm64+largemem) kernel

2024-03-20 Mitchell Augustin
Public bug reported:
A kernel oops and panic occurred during 22.04 SoC certification on Gunyolk (Grace/Grace) with the 6.8 kernel, arm64+largemem variant.
Steps to reproduce: Run (as root) the following commands:
add-apt-repository -y ppa:checkbox-dev/stable
apt-add-repository -y