re: panic in evo_wait
> > > [184218.xxx] fatal page fault in supervisor mode
> > > [184218.xxx] trap type 6 code 0x2 ...
> >
> > this line's contents would have included the fault address,
> > which is kinda useful for next time :-)
>
> I've got the rip -- it's 0x8095e177.

oh - i was after the "cr2" value -- the actual fault address,
not the code address that triggered it.

your patch looks good.

.mrg.
Re: panic in evo_wait
Hi Matt!

On Mon, Jul 18, 2022 at 01:53:49PM +1000, Matthew Green wrote:
> > [184218.xxx] warning:
> > /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
> > 1
>
> can you patch this code to print the value of "data" here?
> it's probably a bad request for userland, but the BUG_ON()
> here does not give you any indication on _what_.

Ok, I'll use the attached diff for my next kernel.

> > [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> > [184218.xxx] fatal page fault in supervisor mode
> > [184218.xxx] trap type 6 code 0x2 ...
>
> this line's contents would have included the fault address,
> which is kinda useful for next time :-)

I've got the rip -- it's 0x8095e177.

> > [184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack
> > 0xb589296452c0
> > kernel: page fault trap, code=0
> > Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
> > evo_wait() at netbsd:evo_wait+0x7b
> > base507c_ntfy_set()
> > nv50_wndw_flush_set()
> > nv50_disp_atomic_commit_tail()
> > nv50_disp_atomic_commit()
> > drm_atomic_helper_set_config()
> > drm_mode_setcrtc()
> > drm_ioctl()
>
> can you find out where evo_wait+0x7b is?  in my kernel it's
> at line 243, and the disasm seems to match your "movl" above.
>
> 235 evo_wait(struct nv50_dmac *evoc, int nr)
> 236 {
> 237     struct nv50_dmac *dmac = evoc;
> 238     struct nvif_device *device = dmac->base.device;
> 239     u32 put = nvif_rd32(>base.user, 0x) / 4;
> 240
> 241     spin_lock(>lock);
> 242     if (put + nr >= (PAGE_SIZE / 4) - 8) {
> 243         dmac->ptr[put] = 0x2000;
> 244         evo_flush(dmac);
>
> Dump of assembler code for function evo_wait:
>    0x8084dfe1 <+0>: push %rbp
> [...]
>    0x8084e05c <+123>: movl $0x2000,(%rdx,%rax,1)
>
> (0x7b = 123)

exactly:

(gdb)
241     spin_lock(>lock);
242     if (put + nr >= (PAGE_SIZE / 4) - 8) {
243         dmac->ptr[put] = 0x2000;
244         evo_flush(dmac);
245
246         nvif_wr32(>base.user, 0x, 0x);
247         if (nvif_msec(device, 2000,
248             if (!nvif_rd32(>base.user, 0x0004))
249                 break;
250         ) < 0) {
(gdb) info line *(evo_wait+0x7b)
Line 243 of "/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c"
   starts at address 0x8095e170 and ends at 0x8095e17e.

which also matches the rip:

(gdb) info line *(0x8095e177)
Line 243 of "/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c"
   starts at address 0x8095e170 and ends at 0x8095e17e.

> probably "dmac->ptr" is invalid here.  a quick guess at the
> code indicates it's only set once in nv50_dmac_create(),
> the source from the caller(s).  at least, i can't see it
> set anywhere else right now.

Thomas

Index: nouveau_nvkm_engine_disp_headgf119.c
===
RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c,v
retrieving revision 1.2
diff -u -r1.2 nouveau_nvkm_engine_disp_headgf119.c
--- nouveau_nvkm_engine_disp_headgf119.c 18 Dec 2021 23:45:35 - 1.2
+++ nouveau_nvkm_engine_disp_headgf119.c 18 Jul 2022 18:36:47 -
@@ -80,7 +80,7 @@
 		case 0: state->or.depth = 18; break; /*XXX: "default" */
 		default:
 			state->or.depth = 18;
-			WARN_ON(1);
+			WARN_ON(data);
 			break;
 		}
 	}
re: panic in evo_wait
> [184218.xxx] warning:
> /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83:
> 1

can you patch this code to print the value of "data" here?
it's probably a bad request for userland, but the BUG_ON()
here does not give you any indication on _what_.

> [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> [184218.xxx] fatal page fault in supervisor mode
> [184218.xxx] trap type 6 code 0x2 ...

this line's contents would have included the fault address,
which is kinda useful for next time :-)

> [184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack
> 0xb589296452c0
> kernel: page fault trap, code=0
> Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
> evo_wait() at netbsd:evo_wait+0x7b
> base507c_ntfy_set()
> nv50_wndw_flush_set()
> nv50_disp_atomic_commit_tail()
> nv50_disp_atomic_commit()
> drm_atomic_helper_set_config()
> drm_mode_setcrtc()
> drm_ioctl()

can you find out where evo_wait+0x7b is?  in my kernel it's
at line 243, and the disasm seems to match your "movl" above.

235 evo_wait(struct nv50_dmac *evoc, int nr)
236 {
237     struct nv50_dmac *dmac = evoc;
238     struct nvif_device *device = dmac->base.device;
239     u32 put = nvif_rd32(>base.user, 0x) / 4;
240
241     spin_lock(>lock);
242     if (put + nr >= (PAGE_SIZE / 4) - 8) {
243         dmac->ptr[put] = 0x2000;
244         evo_flush(dmac);

Dump of assembler code for function evo_wait:
   0x8084dfe1 <+0>: push %rbp
[...]
   0x8084e05c <+123>: movl $0x2000,(%rdx,%rax,1)

(0x7b = 123)

probably "dmac->ptr" is invalid here.  a quick guess at the
code indicates it's only set once in nv50_dmac_create(),
the source from the caller(s).  at least, i can't see it
set anywhere else right now.

.mrg.
panic in evo_wait
Hi!

Yesterday I had a panic on 9.99.98/amd64 from June 22 while playing a
couple of videos using mpv. Hand-transcribed from the console:

[184197.xxx] nouveau0: error: bus: MMIO read of FAULT at 409800 [TIMEOUT ]
[184199.xxx] nouveau0: warn: timeout
[184199.xxx] nouveau0: error: gr: init failed, -16
[184201.xxx] nouveau0: warn: timeout
[184203.xxx] nouveau0: warn: timeout
[184205.xxx] nouveau0: warn: timeout
[184207.xxx] nouveau0: warn: timeout
[184209.xxx] nouveau0: warn: timeout
[184211.xxx] nouveau0: warn: timeout
[184213.xxx] nouveau0: warn: timeout
[184215.xxx] nouveau0: warn: timeout
[184218.xxx] nouveau0: warn: timeout
[184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
[184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
[184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
[184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
[184218.xxx] fatal page fault in supervisor mode
[184218.xxx] trap type 6 code 0x2 ...
[184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0
kernel: page fault trap, code=0
Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
evo_wait() at netbsd:evo_wait+0x7b
base507c_ntfy_set()
nv50_wndw_flush_set()
nv50_disp_atomic_commit_tail()
nv50_disp_atomic_commit()
drm_atomic_helper_set_config()
drm_mode_setcrtc()
drm_ioctl()
drm_ioctl_shim()
sys_ioctl()
syscall()
--- syscall (number 54) ---

Does this ring a bell with anyone? Should I file a PR?

Thomas