(Sorry for the messy quoting, I'm not actually on the list so I didn't see this reply until I thought to check the ML archives)
> If the stack is corrupted the backtrace may or may not be affected.

Sure, but it happening every time is pretty surprising to me.

> Why not bisect the kernel to find the actual bug?

A) I'm going to try booting variously old versions of the kernel, but...
B) I don't actually know that there was a version where the problem I'm
encountering didn't exist, so it's a relatively open search, and
C) Actually compiling kernels on this hardware will take an age each
time, so I was hoping to get better insight into the bug through a
stacktrace.

- Rich

On Wed, May 12, 2021 at 10:49 PM Rich <rincebr...@gmail.com> wrote:
>
> Hi all,
> So, I got my earlier system running sparc64 using a terrible method
> (from inside the existing sparc install, mount -o remount,ro /; nc -l
> | dd of=/dev/sda [...] an image generated in a VM, reboot and pray),
> but now I'm doing the thing I actually wanted a sparc64 system for
> (testing a kernel module on sparc64), and encountering a problem.
>
> While running through its test suite, when it runs through a certain
> suite of tests, every time (so far) it dies in the same annoying
> fashion:
> [ 1435.191913] Kernel panic - not syncing: corrupted stack end
> detected inside scheduler
> [ 1435.294939] CPU: 0 PID: 722 Comm: spl_system_task Tainted: P
> OE 5.10.0-6-sparc64 #1 Debian 5.10.28-1
> [ 1435.431126] Call Trace:
> [ 1435.463267] Press Stop-A (L1-A) from sun keyboard or send break
> [ 1435.463267] twice on console to return to the boot prom
> [ 1435.609777] ---[ end Kernel panic - not syncing: corrupted stack
> end detected inside scheduler ]---
>
> RED State Exception
>
> TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
> TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
> TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
> TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
> TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
> TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
> TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606
>
>
> Watchdog Reset
> Externally Initiated Reset
> ok
>
> (Sometimes, it winds up so disgruntled, the watchdog reset never
> triggers, break twice on the console doesn't work, you need to
> physically power cycle it.)
>
> I'm mostly curious about whether anyone knows why the Call Trace might
> be empty - I see the message about corrupted stack end above it, but
> from what I can see online, plenty of people get that message and a
> call trace printout below it (...on other architectures, at least).
> https://lists.debian.org/debian-sparc/2016/09/msg00002.html is even an
> example of someone on this very list.
>
> Does anyone have any insights? Or am I going to have to resort to
> printks in random parts of the thread the panic notes and hope I find
> the problem?
>
> Thanks!
> - Rich
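
Aside, since the Call Trace is empty and the RED State dump quoted above is the only per-trap-level information available: a minimal Python sketch that turns the TL/TT/TPC records into numbers, so the TPC values can be compared against System.map by hand. The names parse_red_state, RECORD, TT_NAMES and SAMPLE are made up for illustration, and the two trap-type names are my reading of the SPARC V9 / UltraSPARC manuals, so please verify them against the manual for the CPU in question.

```python
import re

# One dotted-hex field as printed by the firmware, e.g. 0000.0000.0042.4200
FIELD = r"([0-9A-Fa-f.]+)"

# One trap-level record: TL, TT, TPC, TnPC, TSTATE, separated by whitespace.
RECORD = re.compile(
    r"TL=" + FIELD + r"\s+TT=" + FIELD + r"\s+TPC=" + FIELD
    + r"\s+TnPC=" + FIELD + r"\s+TSTATE=" + FIELD
)

NAMES = ("tl", "tt", "tpc", "tnpc", "tstate")

# Assumption: trap-type names as I read them in the SPARC V9 / UltraSPARC
# manuals. Check the manual for your CPU before relying on them.
TT_NAMES = {
    0x010: "illegal_instruction",
    0x068: "fast_data_access_MMU_miss",
}


def parse_red_state(text):
    """Return one dict per trap-level record found in a RED State dump."""
    # Mail quoting puts '>' between wrapped lines; treat it as whitespace.
    cleaned = text.replace(">", " ")
    records = []
    for match in RECORD.finditer(cleaned):
        values = [int(group.replace(".", ""), 16) for group in match.groups()]
        records.append(dict(zip(NAMES, values)))
    return records


# First and last records from the dump in this thread, quote markers included.
SAMPLE = """\
> TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
> TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
> TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
> TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606
"""

if __name__ == "__main__":
    for rec in parse_red_state(SAMPLE):
        name = TT_NAMES.get(rec["tt"], "unknown")
        print(f"TL={rec['tl']} TT={rec['tt']:#05x} ({name}) TPC={rec['tpc']:#x}")
```

Feeding it the whole dump and looking up each TPC in the System.map for the running kernel would at least say which code was executing at each trap level, though it says nothing about what corrupted the stack in the first place.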