On 2021/01/28 12:21, Jonathan Gray wrote:
> > NMI ... going to debugger
> > Stopped at tsc_delay+0x63: lfence
> > ddb{0}> trace
> > tsc_delay(1) at tsc_delay+0x63
> > r100_ring_test(ffff8000001a4000,ffff8000001a5858) at r100_ring_test+0x277
> > r100_cp_init(ffff8000001a4000,100000) at r100_cp_init+0x5a1
> > r100_startup(ffff8000001a4000) at r100_startup+0x535
> > r100_init(ffff8000001a4000) at r100_init+0x4ac
> > radeon_device_init(ffff8000001a4000,ffff800000196800,ffff800000196840,840001) at radeon_device_init+0x944
> > radeondrm_attachhook(ffff8000001a4000) at radeondrm_attachhook+0x36
> > config_process_deferred_mountroot() at config_process_deferred_mountroot+0x6b
> > main(0) at main+0x723
> > end trace frame: 0x0, count: -9
>
> I don't understand why an lfence would cause an nmi.

I was thinking that it might not be the lfence triggering it but
something that happened just before connected with the video init,
and it's just that the tsc_delay/lfence is what's running when it
hit ..

> Does it still occur with the below diff to change lfence;rdtsc to rdtscp?
> This requires RDTSCP which your machine has but bluhm's machine does not.
>
> Perhaps it is related to some kind of watchdog timer? Can you check if
> the ilo event log has any relevant information?
>
> Index: sys/arch/amd64/include/cpufunc.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/amd64/include/cpufunc.h,v
> retrieving revision 1.36
> diff -u -p -r1.36 cpufunc.h
> --- sys/arch/amd64/include/cpufunc.h	13 Sep 2020 11:53:16 -0000	1.36
> +++ sys/arch/amd64/include/cpufunc.h	28 Jan 2021 00:47:16 -0000
> @@ -307,7 +307,8 @@ rdtsc_lfence(void)
>  {
>  	uint32_t hi, lo;
>  
> -	__asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
> +//	__asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
> +	__asm volatile("rdtscp" : "=d" (hi), "=a" (lo) :: "ecx");
>  	return (((uint64_t)hi << 32) | (uint64_t) lo);
>  }
>  
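
For anyone who wants to compare the two TSC reads outside the kernel, here is a minimal userland sketch of both variants from the quoted diff. It assumes an x86 CPU with RDTSCP and a compiler that accepts GCC-style inline asm; the name rdtscp_read is made up for illustration and is not in the tree. It only shows the two instruction sequences side by side and says nothing about what raises the NMI.

```c
#include <stdint.h>

/*
 * Same sequence as the existing rdtsc_lfence() in cpufunc.h: the lfence
 * keeps rdtsc from executing until earlier instructions have completed.
 */
static inline uint64_t
rdtsc_lfence(void)
{
	uint32_t hi, lo;

	__asm volatile("lfence; rdtsc" : "=d" (hi), "=a" (lo));
	return (((uint64_t)hi << 32) | (uint64_t)lo);
}

/*
 * Hypothetical helper (name is an assumption): the rdtscp variant from the
 * diff. rdtscp also loads IA32_TSC_AUX into %ecx, hence the clobber.
 */
static inline uint64_t
rdtscp_read(void)
{
	uint32_t hi, lo;

	__asm volatile("rdtscp" : "=d" (hi), "=a" (lo) : : "ecx");
	return (((uint64_t)hi << 32) | (uint64_t)lo);
}
```

Both helpers return the full 64-bit counter, so they can be swapped for one another in a delay loop when testing whether the instruction choice makes any difference.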