[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

Michel Dänzer  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Michel Dänzer  ---
(In reply to dallase from comment #14)
> My Radeon Pro Duo (polaris) is experiencing ring sdma0 timeouts when trying
> to move to newer kernels.

Please file your own report. Per comment 12, the issue this report is about is
fixed.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

dallase  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|CLOSED  |REOPENED

--- Comment #14 from dallase  ---
My Radeon Pro Duo (Polaris) experiences ring sdma0 timeouts when I try to move
to newer kernels. I'm currently running a custom build of
4.17.0-rc2-180424-fkxamd (from the ROCm kernel tree,
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/fkxamd/drm-next-wip)
without issues.

When I build either of the kernels below, the card gets ring timeouts on boot.
I tried both amdgpu-pro 18.20 and 18.30 for userland; it didn't matter.


amd-staging-drm-next (built Oct 7 2018)

[   61.701281] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=888, emitted seq=890
[   61.701285] [drm] GPU recovery disabled.
[   61.701397] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=902, emitted seq=904
[   61.701399] [drm] GPU recovery disabled.

drm-next-4.20-wip (built Oct 8 2018)

[   60.840847] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=914, emitted seq=916
[   60.840851] [drm] GPU recovery disabled.
[   60.840962] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=907, emitted seq=909
[   60.840964] [drm] GPU recovery disabled.



Both of these kernels work fine on my Vega 56 and Vega 64 cards; only the Pro
Duo has the ring timeouts.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2018-08-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #13 from Peter Klotz  ---
Sorry to post into this already closed bug.

Should this issue be fixed in 4.17.12?

I am asking because I see sporadic system hangs that start with these messages:

Aug 09 08:20:18 thinkpad kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring sdma0 timeout, last signaled seq=2260291, last emitted seq=2260293
Aug 09 08:20:18 thinkpad kernel: [drm] No hardware hang detected. Did some
blocks stall?
Aug 09 08:20:35 thinkpad kernel: watchdog: BUG: soft lockup - CPU#4 stuck for
22s! [kwin_x11:915]


Sounds similar to this bug.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2018-01-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #12 from Vedran Miletić  ---
(In reply to Michel Dänzer from comment #7)
> (In reply to Vedran Miletić from comment #5)
> > I'm sorry, but I will not be able to bisect this. Checkouts of relevant
> > commits don't boot and simple reverts do apply cleanly, but don't compile.
> 
> FWIW, you may still be able to at least narrow things down with git bisect.
> If you can't test a selected commit, run "git bisect skip". That will select
> another commit to test. You can also manually check out another commit to
> test. In the worst case, the bisection process will end with identifying the
> minimal set of candidates instead of a single commit.

Thanks for the suggestion. Tried that and didn't get anywhere (all the relevant
commits were broken in one way or another).

(In reply to Christian König from comment #11)
> Code fix is now in amd-staging-drm-next

Verified as fixed. (Would have checked earlier, but was away from the computer
with Vega.)



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2018-01-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

Christian König  changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2018-01-03 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

Christian König  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Christian König  ---
Code fix is now in amd-staging-drm-next



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-31 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #10 from Tom Englund  ---
I could reliably reproduce this by starting Fallout 4 in Wine, getting the same
or similar crashes in dmesg.

However, with the last attachment Christian König posted, it now runs:
https://bugs.freedesktop.org/attachment.cgi?id=136343

dmesg: 

dec 31 15:01:22 tom-pc kernel: WARNING: CPU: 6 PID: 25993 at
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1641
amdgpu_vm_bo_update_mapping+0x3dd/0x3f0 [amdgpu]
dec 31 15:01:22 tom-pc kernel: Modules linked in: fuse mousedev msr
nls_iso8859_1 nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal
intel_powerclamp coretemp 
dec 31 15:01:22 tom-pc kernel:  gpu_sched drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm agpgart
dec 31 15:01:22 tom-pc kernel: CPU: 6 PID: 25993 Comm: amdgpu_cs:0 Tainted: G        W        4.15.0-rc2-mainline #1
dec 31 15:01:22 tom-pc kernel: Hardware name: Gigabyte Technology Co., Ltd.
Z170-HD3P/Z170-HD3P-CF, BIOS F20 11/04/2016
dec 31 15:01:22 tom-pc kernel: task: 569a51e8 task.stack:
bc284a6f
dec 31 15:01:22 tom-pc kernel: RIP:
0010:amdgpu_vm_bo_update_mapping+0x3dd/0x3f0 [amdgpu]
dec 31 15:01:22 tom-pc kernel: RSP: 0018:ace501b7b9e0 EFLAGS: 00010216
dec 31 15:01:22 tom-pc kernel: RAX: 92a0f7ac6e58 RBX: 92a0c072d800 RCX:
92a1682b6550
dec 31 15:01:22 tom-pc kernel: RDX: ace50336c700 RSI: 92a0f7ac6e58 RDI:
92a1682b6560
dec 31 15:01:22 tom-pc kernel: RBP: 92a1682b R08: 0002 R09:

dec 31 15:01:22 tom-pc kernel: R10: 07fb R11: 07f9 R12:
078e
dec 31 15:01:22 tom-pc kernel: R13: 92a1682b6560 R14: 00109200 R15:

dec 31 15:01:22 tom-pc kernel: FS:  7fc349c21700()
GS:92a17ed8() knlGS:7fea8000
dec 31 15:01:22 tom-pc kernel: CS:  0010 DS:  ES:  CR0:
80050033
dec 31 15:01:22 tom-pc kernel: CR2: 7fc296881fa8 CR3: 0003e8fbd003 CR4:
003606e0
dec 31 15:01:22 tom-pc kernel: DR0:  DR1:  DR2:

dec 31 15:01:22 tom-pc kernel: DR3:  DR6: fffe0ff0 DR7:
0400
dec 31 15:01:22 tom-pc kernel: Call Trace:
dec 31 15:01:22 tom-pc kernel:  ? amdgpu_vm_free_mapping.isra.24+0x20/0x20
[amdgpu]
dec 31 15:01:22 tom-pc kernel:  amdgpu_vm_bo_update+0x327/0x5e0 [amdgpu]
dec 31 15:01:22 tom-pc kernel:  amdgpu_vm_handle_moved+0x73/0xa0 [amdgpu]
dec 31 15:01:22 tom-pc kernel:  amdgpu_cs_ioctl+0x1a4a/0x1ae0 [amdgpu]
dec 31 15:01:22 tom-pc kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
dec 31 15:01:22 tom-pc kernel:  drm_ioctl_kernel+0x59/0xb0 [drm]
dec 31 15:01:22 tom-pc kernel:  drm_ioctl+0x2d5/0x370 [drm]
dec 31 15:01:22 tom-pc kernel:  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
dec 31 15:01:22 tom-pc kernel:  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
dec 31 15:01:22 tom-pc kernel:  do_vfs_ioctl+0xa1/0x610
dec 31 15:01:22 tom-pc kernel:  ? SyS_futex+0x12d/0x180
dec 31 15:01:22 tom-pc kernel:  SyS_ioctl+0x74/0x80
dec 31 15:01:22 tom-pc kernel:  entry_SYSCALL_64_fastpath+0x1a/0x7d
dec 31 15:01:22 tom-pc kernel: RIP: 0033:0x7fc41e3b1a07
dec 31 15:01:22 tom-pc kernel: RSP: 002b:7fc349c20c78 EFLAGS: 0246
ORIG_RAX: 0010
dec 31 15:01:22 tom-pc kernel: RAX: ffda RBX: 0008 RCX:
7fc41e3b1a07
dec 31 15:01:22 tom-pc kernel: RDX: 7fc349c20ce0 RSI: c0186444 RDI:
001e
dec 31 15:01:22 tom-pc kernel: RBP: 7fc349c20e00 R08: 7fc349c20d80 R09:
7fc349c20cc0
dec 31 15:01:22 tom-pc kernel: R10: 0001 R11: 0246 R12:
7cdf0a98
dec 31 15:01:22 tom-pc kernel: R13: 0001 R14: 7fc349c20cf0 R15:

dec 31 15:01:22 tom-pc kernel: Code: ff 74 16 f0 ff 0f 0f 88 3c d4 12 00 75 0b
89 04 24 e8 c8 44 0a e3 8b 04 24 48 8b 54 24 38 48 8b 5c 24 08 48 89 13 e9 0b
fd
dec 31 15:01:22 tom-pc kernel: ---[ end trace 425bb209c57fc66b ]---
dec 31 15:01:32 tom-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, last signaled seq=53896, last emitted seq=53898
dec 31 15:01:32 tom-pc kernel: [drm] No hardware hang detected. Did some blocks
stall?
dec 31 15:01:35 tom-pc systemd-logind[561]: Power key pressed.
dec 31 15:01:35 tom-pc systemd-logind[561]: Powering Off...
dec 31 15:01:35 tom-pc systemd-logind[561]: System is powering down.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

Christian König  changed:

   What|Removed |Added

 Attachment #136340|0   |1
is obsolete||

--- Comment #9 from Christian König  ---
Created attachment 136343
  --> https://bugs.freedesktop.org/attachment.cgi?id=136343&action=edit
Possible fix v2

Please try that one instead.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #8 from Christian König  ---
I think I've figured out what is going on here. Give me a moment to provide a
new patch.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #7 from Michel Dänzer  ---
(In reply to Vedran Miletić from comment #5)
> I'm sorry, but I will not be able to bisect this. Checkouts of relevant
> commits don't boot and simple reverts do apply cleanly, but don't compile.

FWIW, you may still be able to at least narrow things down with git bisect. If
you can't test a selected commit, run "git bisect skip". That will select
another commit to test. You can also manually check out another commit to test.
In the worst case, the bisection process will end with identifying the minimal
set of candidates instead of a single commit.
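
As a rough illustration of the skip workflow described above, the following sketch builds a throwaway repository and lets `git bisect run` drive the search. The repository, the `state` file, and the commit layout are all made up for this example; on a real kernel tree the test step would be a build and boot, which cannot be scripted this simply. The sketch relies on the documented convention that a test command exiting with status 125 tells git to skip the current commit.

```shell
#!/bin/sh
# Sketch only: bisect a throwaway repository in which one commit cannot be
# tested. The "state" file and the commit layout are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"
git config commit.gpgsign false

# Commits 1-3 are good, commit 4 cannot be tested, commits 5-8 are bad.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -le 3 ]; then state=good
    elif [ "$i" -eq 4 ]; then state=untestable
    else state=bad
    fi
    echo "$i $state" > state
    git add state
    git commit -q -m "commit $i"
done

# Mark the tip as bad and the root commit as good.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null 2>&1

# Exit status 0 marks the commit good, 125 skips it, 1 marks it bad.
result=$(git bisect run sh -c '
    case "$(cut -d" " -f2 state)" in
        good)       exit 0 ;;
        untestable) exit 125 ;;
        *)          exit 1 ;;
    esac' 2>&1) || true

git bisect reset >/dev/null 2>&1
printf '%s\n' "$result"
```

Because the untestable commit sits directly before the first bad one in this layout, the run should end with git listing both as candidates rather than naming a single commit, which is the worst case mentioned above. When bisecting by hand, running `git bisect skip` at the untestable commit plays the same role as exit status 125 here.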



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #6 from Christian König  ---
Created attachment 136340
  --> https://bugs.freedesktop.org/attachment.cgi?id=136340&action=edit
Possible fix

This is a complete shot in the dark, but while double-checking the code I found
that at least this calculation isn't correct.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #5 from Vedran Miletić  ---
(In reply to Christian König from comment #4)
> You can restrict that to changes to drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c.
> 
> The problem is that we use more dw than expected for clearing the page
> tables. No idea what exactly goes wrong, but bisecting the commit which
> introduced it would certainly help.

I'm sorry, but I will not be able to bisect this. Checkouts of relevant commits
don't boot and simple reverts do apply cleanly, but don't compile.



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #4 from Christian König  ---
You can restrict that to changes to drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c.

The problem is that we use more dw (dwords) than expected for clearing the page tables.
No idea what exactly goes wrong, but bisecting the commit which introduced it
would certainly help.
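
One way to apply the path restriction suggested here is to pass the file as a pathspec to `git bisect start`, so that only commits touching it are considered. The sketch below demonstrates this on a throwaway repository; the commit layout, `other.c`, and the grep-based test are invented for illustration and say nothing about where the actual regression was introduced.

```shell
#!/bin/sh
# Sketch only: path-limited bisection in a throwaway repository.
# Only the path name is taken from the thread; everything else is hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"
git config commit.gpgsign false

vm=drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
mkdir -p "$(dirname "$vm")"

# Commits 1, 3 and 6 touch the file of interest; commit 6 adds the "bug".
for i in 1 2 3 4 5 6 7 8; do
    case "$i" in
        1|3) echo "ok, revision $i" > "$vm" ;;
        6)   echo "bug, revision $i" > "$vm" ;;
        *)   echo "unrelated change $i" > other.c ;;
    esac
    git add -A
    git commit -q -m "commit $i"
done

# Compare the size of the search space with and without the pathspec.
first=$(git rev-list --max-parents=0 HEAD)
all=$(git rev-list --count "$first"..HEAD)
touching=$(git rev-list --count "$first"..HEAD -- "$vm")
echo "commits in range: $all, touching $vm: $touching"

# Everything after "--" limits the bisection to commits changing that path.
git bisect start HEAD "$first" -- "$vm" >/dev/null 2>&1
result=$(git bisect run sh -c "! grep -q bug $vm" 2>&1) || true
git bisect reset >/dev/null 2>&1
printf '%s\n' "$result"
```

In this toy history the pathspec cuts the candidates from seven commits to two, so far fewer builds are needed. On a real tree the saving depends on how often the file changed between the good and bad revisions.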



[Bug 104289] [regression][vega10] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout on exiting certain Steam games

2017-12-18 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104289

--- Comment #2 from Michel Dänzer  ---
Can you bisect?
