On Wed, Jan 27, 2021 at 04:16:50PM +0000, Stuart Henderson wrote:
> On 2021/01/27 09:03, Bryan Steele wrote:
> > On Wed, Jan 27, 2021 at 07:11:49AM +0100, alf wrote:
> > > Hello,
> > >
> > > while trying to upgrade one of our machines to 6.8 we experienced a
> > > repeatable crash while booting (bsd.rd + install went fine).
> > >
> > > The machine in question is a:
> > > ...
> > > hw.vendor=HP
> > > hw.product=ProLiant DL360 G7
> > > hw.serialno=CZ3451KJW6
> > > hw.uuid=36333337-3738-435a-3334-35314b4a5736
> > > hw.physmem=8562860032
> > > hw.usermem=8562847744
> > > hw.ncpufound=12
> > > hw.allowpowerdown=1
> > > hw.perfpolicy=manual
> > > hw.smt=0
> > > hw.ncpuonline=6
> > > ...
> > >
> > > Since this is a production machine we downgraded to 6.7 (upgrade from
> > > 6.6 which it was running before went flawlessly).
> > >
> > > Find below the dmesg of the 6.8 kernel, 6.8-current and finally the
> > > 6.7 kernel. For the 6.8* I also provided 'trace' and 'show registers'
> > > output.
> > >
> > > I hope this is enough info to get an idea of what was going on.
> > > I'll happily will provide additional info if needed.
> > >
> > > Alf
> > >
> > > [SNIP]
> > > ...
> > > radeondrm0: RV100
> > > NMI ... going to debugger
> >
> > The machine did not panic, instead it received an NMI (non-maskable
> > interrupt), this could be a sign of hardware failure.
>
> radeondrm was updated in the 6.7 -> 6.8 window too. I wonder if it still
> occurs with radeondrm disabled. alf, if you have time to test, you can
> at least see if a kernel boots without updating the rest of the OS; fetch
> a 6.8 bsd.mp on the existing system (e.g. as /bsd.mp.68), reboot, at
> the boot loader prompt "boot -c bsd.mp.68", "disable radeondrm", "quit".

Since this is a production machine, I'll need to find a time window for
this test. Actually, I thought about disabling radeondrm; however, since
I hadn't used config(8) for years, I misremembered it and typed -s, which
of course didn't help :)

Alf

>
> > -Bryan.
> >
> > > Stopped at      tsc_delay+0x66: rdtsc
> > > ddb{0}> trace
> > > tsc_delay(1) at tsc_delay+0x66
> > > r100_ring_test(ffff8000001a5000,ffff8000001a6938) at r100_ring_test+0x228
> > > r100_cp_init(ffff8000001a5000,100000) at r100_cp_init+0x499
> > > r100_startup(ffff8000001a5000) at r100_startup+0x457
> > > r100_init(ffff8000001a5000) at r100_init+0x3f8
> > > radeon_device_init(ffff8000001a5000,ffff800000198800,ffff800000198850,840001) at radeon_device_init+0x963
> > > radeondrm_attachhook(ffff8000001a5000) at radeondrm_attachhook+0x36
> > > config_process_deferred_mountroot() at config_process_deferred_mountroot+0x6b
> > > main(0) at main+0x733
> > > end trace frame: 0x0, count: -9
> > > ddb{0}> show registers
> > > rdi                0x1
> > > rsi                0x45d5418924
> > > rbp                0xffffffff82520cd0    end+0x120cd0
> > > rbx                0xc8000400
> > > rdx                0x4500000000
> > > rcx                0xa6a
> > > rax                0x1fe
> > > r8                 0x5
> > > r9                 0x7f7fffffc000
> > > r10                0xda85623203c11f01
> > > r11                0xd6f674e06cf62a5b
> > > r12                0xcafedead
> > > r13                0xffff8000001a54c0
> > > r14                0xffff8000001a5000
> > > r15                0x1
> > > rip                0xffffffff81131ec6    tsc_delay+0x66
> > > cs                 0x8
> > > rflags             0x283
> > > rsp                0xffffffff82520cc0    end+0x120cc0
> > > ss                 0x10
> > > tsc_delay+0x66: rdtsc
> > > ddb{0}> re boot rebvoo oot
> > > rebooting...
> > >
>