[Nouveau] NVAC - BUG: unable to handle kernel NULL pointer dereference

2017-03-25 Thread poma

With a lightweight desktop,
atomic modesetting seems far from robust.

BUG: unable to handle kernel NULL pointer dereference at 0021
IP: dma_fence_wait_timeout+0x36/0xf0
...
Oops:  [#1] SMP
Modules linked in: ... nouveau ...
CPU: 0 PID: 6895 Comm: Xorg Not tainted 4.10.5-1001.fc24.x86_64 #1
...
Call Trace:
 drm_atomic_helper_wait_for_fences+0x48/0x120 [drm_kms_helper]
 nv50_disp_atomic_commit+0x19c/0x2a0 [nouveau]
 drm_atomic_commit+0x4b/0x50 [drm]
 drm_atomic_helper_update_plane+0xec/0x150 [drm_kms_helper]
 __setplane_internal+0x1b4/0x280 [drm]
 drm_mode_cursor_universal+0x126/0x210 [drm]
 drm_mode_cursor_common+0x86/0x180 [drm]
 drm_mode_cursor_ioctl+0x50/0x70 [drm]
 drm_ioctl+0x21b/0x4c0 [drm]
 ? drm_mode_setplane+0x1a0/0x1a0 [drm]
 nouveau_drm_ioctl+0x74/0xc0 [nouveau]
 do_vfs_ioctl+0xa3/0x5f0
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x1a/0xa9
...
RIP: dma_fence_wait_timeout+0x36/0xf0 RSP: c1f700723a38
...
---[ end trace a6bef2d32ed5fbbc ]---


BUG: unable to handle kernel NULL pointer dereference at 0021
IP: dma_fence_wait_timeout+0x36/0xf0
...
Oops:  [#1] SMP
Modules linked in: ... nouveau ...
CPU: 3 PID: 30654 Comm: Xorg Tainted: GE 4.11.0-0.rc3.git0.1.fc26.x86_64 #1
...
Call Trace:
 drm_atomic_helper_wait_for_fences+0x73/0x110 [drm_kms_helper]
 nv50_disp_atomic_commit+0x28a/0x2c0 [nouveau]
 ? refcount_dec_and_test+0x11/0x20
 drm_atomic_commit+0x4b/0x50 [drm]
 drm_atomic_helper_update_plane+0xf1/0x150 [drm_kms_helper]
 __setplane_internal+0x1fa/0x260 [drm]
 drm_mode_cursor_universal+0x12a/0x220 [drm]
 drm_mode_cursor_common+0x88/0x180 [drm]
 drm_mode_cursor_ioctl+0x4a/0x60 [drm]
 drm_ioctl+0x203/0x4d0 [drm]
 ? drm_mode_setplane+0x1a0/0x1a0 [drm]
 nouveau_drm_ioctl+0x72/0xc0 [nouveau]
 do_vfs_ioctl+0xa5/0x600
 ? security_inode_getsecid+0x1b/0x40
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x1a/0xa9
...
RIP: dma_fence_wait_timeout+0x36/0xf0 RSP: bda700723a40
...
---[ end trace 95b0fca6a8295839 ]---


Afterwards, a hardware reset is needed.

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] NVAC - BUG: unable to handle kernel NULL pointer dereference

2017-03-25 Thread Ard Biesheuvel


> On 25 Mar 2017, at 10:47, poma wrote:
> 
> 
> With a lightweight desktop,
> atomic modesetting seems far from robust.
> 
> BUG: unable to handle kernel NULL pointer dereference at 0021
> IP: dma_fence_wait_timeout+0x36/0xf0
> ...

I am seeing similar issues with v4.10 on arm64 using a GT218.

KASAN tells me it is a use-after-free of a dma_fence. The full report was
sent to the mailing list.



[Nouveau] [Bug 99400] [nouveau] garbled rendering with glamor on G71

2017-03-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99400

--- Comment #28 from Ilia Mirkin ---
OK, this weston stuff is a no-go. There is no way to run it against a specific
DRI card without intensive udev work (which I have no interest in learning),
and apparently it won't even start without some env vars (XDG_RUNTIME_DIR,
perhaps others) after I tried starting it under X.

So... I can't repro with the given instructions (because I can't follow the
instructions successfully). Perhaps there are other instructions that reproduce
the issue? Does Xephyr -glamor reproduce the issue? Something else that I can
run?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Nouveau] [Bug 99584] XVMC on nv43 class card broken with recent mesa + kernel.

2017-03-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99584

--- Comment #5 from Ilia Mirkin ---
OK, so the NV4A actually runs into a different issue - the MPEG class can only
take linear memory, but the PCI GART is paged. So it can't deal with the
CMD/DATA bo's. When moving those to VRAM (in mesa), the NV4A is fine. I plugged
an NV42 in with much worse results. On boot I get:

https://hastebin.com/datuzivebu.sql

[   10.110345] nouveau :04:00.0: mpeg: ch -1 [unknown] 03100023 0001
[   10.191580] nouveau :04:00.0: mpeg: MSRCH 0x

And just all kinds of fail. This is with some local patches to print the extra
stuff out. Feels like it's not getting powered on (all those 0x's) in
MC.



[Nouveau] [Bug 99584] XVMC on nv43 class card broken with recent mesa + kernel.

2017-03-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99584

--- Comment #6 from Ilia Mirkin ---
Er, scratch that. I guess the board doesn't have enough power when there's a
second GPU in another PCIe slot. It comes up fine now, and I get the same
issue.

Looks like this bit of nvkm_ioctl_new is somehow failing with -ENODEV. My
latest theory is:

nvkm_fifo_chan_child_new calls engine_ctor (nv40_fifo_dma_engine_ctor), which
in turn calls nvkm_object_bind() on something it's not supposed to (like the
engine object, I think), which in turn returns -ENODEV as there's no bind
pointer. I suspect the solution here is to add a dummy .bind to nv31_mpeg_chan,
since the binding effectively happens at chan_new time. Or we could move the
mpeg->chan check to the bind action.
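
To make the theory above concrete, here is a rough userspace model of the failure mode. It is only a sketch: `struct object`, `struct object_func`, `object_bind` and `dummy_bind` are made-up stand-ins, not the real nvkm types or functions, and the only behaviour taken from the comment is "no bind pointer means -ENODEV".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the real nvkm structures. */
struct object;

struct object_func {
	/* Optional callback that binds the object into a channel. */
	int (*bind)(struct object *obj);
};

struct object {
	const struct object_func *func;
};

/* Models the behaviour described above: no bind pointer yields -ENODEV. */
static int object_bind(struct object *obj)
{
	assert(obj != NULL && obj->func != NULL);
	if (!obj->func->bind)
		return -ENODEV;
	return obj->func->bind(obj);
}

/*
 * Dummy bind: does nothing and reports success, on the assumption that
 * the real binding already happened when the channel object was created.
 */
static int dummy_bind(struct object *obj)
{
	(void)obj;
	return 0;
}

/* An object type with no bind callback, and one with the dummy callback. */
static const struct object_func func_without_bind = { .bind = NULL };
static const struct object_func func_with_dummy_bind = { .bind = dummy_bind };
```

In this model, binding an object that uses `func_without_bind` fails with -ENODEV, while one that uses `func_with_dummy_bind` succeeds. Whether a dummy callback is acceptable in the real driver, as opposed to moving the mpeg->chan check, is exactly the open question.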



[Nouveau] [Bug 99584] XVMC on nv43 class card broken with recent mesa + kernel.

2017-03-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99584

--- Comment #7 from Ilia Mirkin ---
OK, looks like this isn't so trivial to solve. The code really likes having
something in chan->engn[] so that it can get the address. The old code just
stuck a "4" in if it was a non-GR class. I wonder if the current code should
check for ->engn[] != NULL, and if it's NULL, use a 4 there for the inst
address.
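
A rough sketch of that NULL check, as a userspace model rather than driver code. Everything here is assumed for illustration: `struct chan`, `struct engine_ctx`, `NUM_ENGINES` and `chan_engine_inst` are hypothetical names, and only the fallback value 4 comes from the description of the old code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ENGINES   4   /* hypothetical array size, for illustration only */
#define FALLBACK_INST 4u  /* the "4" the old code is said to have used */

/* Hypothetical per-engine context holding an instance address. */
struct engine_ctx {
	uint32_t inst;
};

/* Hypothetical channel with one optional context slot per engine. */
struct chan {
	struct engine_ctx *engn[NUM_ENGINES];
};

/*
 * Return the instance address for an engine slot, falling back to the
 * fixed value when the slot was never populated (engn[] is NULL).
 */
static uint32_t chan_engine_inst(const struct chan *chan, size_t engine)
{
	assert(chan != NULL && engine < NUM_ENGINES);
	if (chan->engn[engine] != NULL)
		return chan->engn[engine]->inst;
	return FALLBACK_INST;
}
```

This only shows the shape of the check. Whether 4 is still a meaningful inst address in the current code is the part that needs Ben's input.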

This will need consultation with Ben, as this is well outside my knowledge
area.
