re: panic in evo_wait

2022-07-18 Thread matthew green
> > > [184218.xxx] fatal page fault in supervisor mode
> > > [184218.xxx] trap type 6 code 0x2 ...
> > 
> > this line's contents would have included the fault address,
> > which is kinda useful for next time :-)
>
> I've got the rip -- it's 0x8095e177.

oh - i was after the "cr2" value -- the actual fault address,
not the code address that triggered it.
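
For reference, the two values answer different questions: rip names the instruction that faulted, cr2 names the address it tried to touch. For the store at line 243 that address would be dmac->ptr plus put times 4, assuming ptr points to 32-bit words (the movl suggests so), so cr2 would hint at whether the base pointer or the index was off. A rough userland sketch of that arithmetic; the helper names and any sample values are made up, not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: models the address the faulting store would touch.
 * "ptr" and "put" mirror dmac->ptr and put from evo_wait(); the helper
 * names are hypothetical.
 */
static uint64_t
store_address(uint64_t ptr, uint32_t put)
{
	/* dmac->ptr[put] with 32-bit elements: base + put * 4 */
	return ptr + (uint64_t)put * sizeof(uint32_t);
}

/* Nonzero if the store stays inside the buffer [ptr, ptr + page_size). */
static int
store_in_page(uint64_t ptr, uint32_t put, uint64_t page_size)
{
	uint64_t addr = store_address(ptr, put);

	assert(page_size >= sizeof(uint32_t));
	return addr >= ptr && addr + sizeof(uint32_t) <= ptr + page_size;
}
```

With a 4096-byte page (the amd64 page size), index 1023 is the last word inside the buffer and index 1024 is the first one past it.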

your patch looks good.


.mrg.


Re: panic in evo_wait

2022-07-18 Thread Thomas Klausner
Hi Matt!

On Mon, Jul 18, 2022 at 01:53:49PM +1000, Matthew Green wrote:
> > [184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
> 
> can you patch this code to print the value of "data" here?
> it's probably a bad request from userland, but the WARN_ON()
> here does not give you any indication of _what_.

Ok, I'll use the attached diff for my next kernel.

> > [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> > [184218.xxx] fatal page fault in supervisor mode
> > [184218.xxx] trap type 6 code 0x2 ...
> 
> this line's contents would have included the fault address,
> which is kinda useful for next time :-)

I've got the rip -- it's 0x8095e177.

> > [184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0
> > kernel: page fault trap, code=0
> > Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
> > evo_wait() at netbsd:evo_wait+0x7b
> > base507c_ntfy_set()
> > nv50_wndw_flush_set()
> > nv50_disp_atomic_commit_tail()
> > nv50_disp_atomic_commit()
> > drm_atomic_helper_set_config()
> > drm_mode_setcrtc()
> > drm_ioctl()
> 
> can you find out where evo_wait+0x7b is?  in my kernel it's
> at line 243, and the disasm seems to match your "movl" above.
>
> 235 evo_wait(struct nv50_dmac *evoc, int nr)
> 236 {
> 237     struct nv50_dmac *dmac = evoc;
> 238     struct nvif_device *device = dmac->base.device;
> 239     u32 put = nvif_rd32(&dmac->base.user, 0x) / 4;
> 240
> 241     spin_lock(&dmac->lock);
> 242     if (put + nr >= (PAGE_SIZE / 4) - 8) {
> 243         dmac->ptr[put] = 0x2000;
> 244         evo_flush(dmac);
> 
> Dump of assembler code for function evo_wait:
>    0x8084dfe1 <+0>:   push   %rbp
> [...]
>    0x8084e05c <+123>: movl   $0x2000,(%rdx,%rax,1)
> 
> (0x7b = 123)

exactly:

(gdb) 
241     spin_lock(&dmac->lock);
242     if (put + nr >= (PAGE_SIZE / 4) - 8) {
243         dmac->ptr[put] = 0x2000;
244         evo_flush(dmac);
245
246         nvif_wr32(&dmac->base.user, 0x, 0x);
247         if (nvif_msec(device, 2000,
248             if (!nvif_rd32(&dmac->base.user, 0x0004))
249                 break;
250         ) < 0) {
(gdb) info line *(evo_wait+0x7b)
Line 243 of "/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c" starts at address 0x8095e170 and ends at 0x8095e17e.

which also matches the rip:

(gdb) info line *(0x8095e177)
Line 243 of "/disk/6/archive/foreign/src/sys/external/bsd/drm2/dist/drm/nouveau/dispnv50/nouveau_dispnv50_disp.c" starts at address 0x8095e170 and ends at 0x8095e17e.
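
Both lookups amount to a range check on the address. A minimal sketch, assuming gdb's "ends at" address is exclusive (the first byte of the next line's code); the function name is made up and the addresses are used exactly as they appear in this thread:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nonzero if addr falls inside the half-open range [start, end) that
 * gdb reports for one source line. Hypothetical helper for illustration.
 */
static int
addr_in_line_range(uint64_t start, uint64_t end, uint64_t addr)
{
	assert(start < end);
	return addr >= start && addr < end;
}
```

Called with the range for line 243 (0x8095e170 to 0x8095e17e), it accepts the rip 0x8095e177 and rejects 0x8095e17e itself.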

> probably "dmac->ptr" is invalid here.  a quick look at the
> code indicates it's only set once, in nv50_dmac_create(),
> which is where the caller(s) get it from.  at least, i can't
> see it set anywhere else right now.

 Thomas
Index: nouveau_nvkm_engine_disp_headgf119.c
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c,v
retrieving revision 1.2
diff -u -r1.2 nouveau_nvkm_engine_disp_headgf119.c
--- nouveau_nvkm_engine_disp_headgf119.c	18 Dec 2021 23:45:35 -	1.2
+++ nouveau_nvkm_engine_disp_headgf119.c	18 Jul 2022 18:36:47 -
@@ -80,7 +80,7 @@
 	case 0: state->or.depth = 18; break; /*XXX: "default" */
 	default:
 		state->or.depth = 18;
-		WARN_ON(1);
+		WARN_ON(data);
 		break;
 	}
 }
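
The hunk above changes the condition passed to WARN_ON(). A WARN_ON()-style check generally reports that a condition fired rather than the value behind it, so if the aim is to see the value of "data" itself in the log, one option is to format it explicitly. A minimal userland sketch of that idea: only "case 0" and the fallback depth of 18 come from the hunk; the function name, the "msg"/"msglen" parameters and the message text are hypothetical, and in the kernel a printf-style call would take the place of snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userland stand-in, not driver code: maps a raw field value to a depth
 * and, for unexpected values, formats a message that includes the value.
 */
static int
depth_from_field(unsigned int field, char *msg, size_t msglen)
{
	assert(msg != NULL && msglen > 0);
	msg[0] = '\0';

	switch (field) {
	case 0:
		return 18;	/* the 'XXX: "default"' case in the hunk */
	default:
		/* Record what was actually seen, not just that it was odd. */
		snprintf(msg, msglen, "unexpected depth field 0x%x", field);
		return 18;
	}
}
```

The caller can then log "msg" whenever it is non-empty, which shows the offending value next to the warning.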


re: panic in evo_wait

2022-07-17 Thread matthew green
> [184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1

can you patch this code to print the value of "data" here?
it's probably a bad request from userland, but the WARN_ON()
here does not give you any indication of _what_.

> [184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
> [184218.xxx] fatal page fault in supervisor mode
> [184218.xxx] trap type 6 code 0x2 ...

this line's contents would have included the fault address,
which is kinda useful for next time :-)

> [184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0
> kernel: page fault trap, code=0
> Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
> evo_wait() at netbsd:evo_wait+0x7b
> base507c_ntfy_set()
> nv50_wndw_flush_set()
> nv50_disp_atomic_commit_tail()
> nv50_disp_atomic_commit()
> drm_atomic_helper_set_config()
> drm_mode_setcrtc()
> drm_ioctl()

can you find out where evo_wait+0x7b is?  in my kernel it's
at line 243, and the disasm seems to match your "movl" above.

235 evo_wait(struct nv50_dmac *evoc, int nr)
236 {
237     struct nv50_dmac *dmac = evoc;
238     struct nvif_device *device = dmac->base.device;
239     u32 put = nvif_rd32(&dmac->base.user, 0x) / 4;
240
241     spin_lock(&dmac->lock);
242     if (put + nr >= (PAGE_SIZE / 4) - 8) {
243         dmac->ptr[put] = 0x2000;
244         evo_flush(dmac);

Dump of assembler code for function evo_wait:
   0x8084dfe1 <+0>:   push   %rbp
[...]
   0x8084e05c <+123>: movl   $0x2000,(%rdx,%rax,1)

(0x7b = 123)
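
The same arithmetic as a small C check, using the addresses exactly as they appear in the disassembly above (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Address of "symbol+offset", e.g. evo_wait+0x7b. */
static uint64_t
symbol_plus_offset(uint64_t symbol_start, uint64_t offset)
{
	return symbol_start + offset;
}

/* Offset of an address within a function, the inverse of the above. */
static uint64_t
offset_in_symbol(uint64_t addr, uint64_t symbol_start)
{
	assert(addr >= symbol_start);
	return addr - symbol_start;
}
```

Starting from the function entry at 0x8084dfe1, adding 0x7b lands on 0x8084e05c, the movl shown at <+123>.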

probably "dmac->ptr" is invalid here.  a quick look at the
code indicates it's only set once, in nv50_dmac_create(),
which is where the caller(s) get it from.  at least, i can't
see it set anywhere else right now.


.mrg.


panic in evo_wait

2022-07-17 Thread Thomas Klausner
Hi!

Yesterday I had a panic on 9.99.98/amd64 from June 22 while playing a
couple of videos using mpv. Hand-transcribed from the console:

[184197.xxx] nouveau0: error: bus: MMIO read of  FAULT at 409800 [TIMEOUT ]
[184199.xxx] nouveau0: warn: timeout
[184199.xxx] nouveau0: error: gr: init failed, -16
[184201.xxx] nouveau0: warn: timeout
[184203.xxx] nouveau0: warn: timeout
[184205.xxx] nouveau0: warn: timeout
[184207.xxx] nouveau0: warn: timeout
[184209.xxx] nouveau0: warn: timeout
[184211.xxx] nouveau0: warn: timeout
[184213.xxx] nouveau0: warn: timeout
[184215.xxx] nouveau0: warn: timeout
[184218.xxx] nouveau0: warn: timeout
[184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
[184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
[184218.xxx] warning: /usr/src/sys/external/bsd/drm2/dist/drm/nouveau/nvkm/engine/disp/nouveau_nvkm_engine_disp_headgf119.c:83: 1
[184218.xxx] uvm_fault(0x8191ba80, 0xb649e46a3000, 2) -> e
[184218.xxx] fatal page fault in supervisor mode
[184218.xxx] trap type 6 code 0x2 ...
[184218.xxx] curlwp 0xa8d4e6f36500 pid 27414.3207 lowest kstack 0xb589296452c0
kernel: page fault trap, code=0
Stopped in pid 27414.3207 (mpv) at netbsd:evo_wait+0x7b: movl $0x2000,0(%rdx,%rax,1)
evo_wait() at netbsd:evo_wait+0x7b
base507c_ntfy_set()
nv50_wndw_flush_set()
nv50_disp_atomic_commit_tail()
nv50_disp_atomic_commit()
drm_atomic_helper_set_config()
drm_mode_setcrtc()
drm_ioctl()
drm_ioctl_shim()
sys_ioctl()
syscall()
--- syscall (number 54) ---


Does this ring a bell with anyone?

Should I file a PR?
 Thomas