Hi Christian,
I have filed a bug report on bugs.freedesktop.org. Could you please
help look into it? What could cause this bug?
------------------ Original Message ------------------
From: "Koenig, Christian"<christian.koe...@amd.com>;
Sent: Tuesday, September 3, 2019, 9:19 PM
To: ""<78666...@qq.com>;"amd-gfx"<amd-gfx@lists.freedesktop.org>;
Cc: "Deucher, Alexander"<alexander.deuc...@amd.com>;
Subject: Re: Re: Re: Bug: amdgpu drm driver cause process into Disk sleep state
This is just a GPU lockup. Please open a bug report on freedesktop.org and
attach the full dmesg output and the version of Mesa you are using.
Regards,
Christian.
On 03.09.19 at 15:16, 78666679 wrote:
Yes, with dmesg | grep drm I get the following:
[348571.880718] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=24423862, emitted seq=24423865
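As a rough illustration of how to read that timeout line, the sketch below pulls out the ring name and the two sequence numbers and computes their difference, which is presumably the number of fences emitted on the ring but not yet signaled. The regular expression and the helper name parse_ring_timeout are assumptions based on this single log line, not on the driver's full set of message formats.

```python
import re

# Pattern derived from the one message quoted in this thread (an assumption,
# other kernel versions may format the message differently).
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # pending = fences emitted on the ring that have not been signaled yet
    return m.group("ring"), signaled, emitted, emitted - signaled

line = ("[348571.880718] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
        "ring sdma1 timeout, signaled seq=24423862, emitted seq=24423865")
print(parse_ring_timeout(line))  # ('sdma1', 24423862, 24423865, 3)
```

For the line above the difference is 3, which would be consistent with work still outstanding on sdma1 when the job timed out.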
------------------ Original Message ------------------
From: "Koenig, Christian"<christian.koe...@amd.com>;
Sent: Tuesday, September 3, 2019, 9:07 PM
To: ""<78666...@qq.com>;"amd-gfx"<amd-gfx@lists.freedesktop.org>;
Cc: "Deucher, Alexander"<alexander.deuc...@amd.com>;
Subject: Re: Re: Bug: amdgpu drm driver cause process into Disk sleep state
Well, that looks like the hardware got stuck.
Do you get something in the logs about a timeout on the SDMA ring?
Regards,
Christian.
On 03.09.19 at 14:50, 78666679 wrote:
Hi Christian,
Sometimes the thread blocks in disk sleep in a call to
amdgpu_sa_bo_new. The stack trace follows. It seems the SA BOs are used up,
so the caller blocks waiting for someone to free SA resources.
D 206833 227656 [surfaceflinger] <defunct> Binder:45_5
cat /proc/206833/task/227656/stack
[<0>] __switch_to+0x94/0xe8
[<0>] dma_fence_wait_any_timeout+0x234/0x2d0
[<0>] amdgpu_sa_bo_new+0x468/0x540 [amdgpu]
[<0>] amdgpu_ib_get+0x60/0xc8 [amdgpu]
[<0>] amdgpu_job_alloc_with_ib+0x70/0xb0 [amdgpu]
[<0>] amdgpu_vm_bo_update_mapping+0x2e0/0x3d8 [amdgpu]
[<0>] amdgpu_vm_bo_update+0x2a0/0x710 [amdgpu]
[<0>] amdgpu_gem_va_ioctl+0x46c/0x4c8 [amdgpu]
[<0>] drm_ioctl_kernel+0x94/0x118 [drm]
[<0>] drm_ioctl+0x1f0/0x438 [drm]
[<0>] amdgpu_drm_ioctl+0x58/0x90 [amdgpu]
[<0>] do_vfs_ioctl+0xc4/0x8c0
[<0>] ksys_ioctl+0x8c/0xa0
[<0>] __arm64_sys_ioctl+0x28/0x38
[<0>] el0_svc_common+0xa0/0x180
[<0>] el0_svc_handler+0x38/0x78
[<0>] el0_svc+0x8/0xc
[<0>] 0xffffffffffffffff
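The manual steps above (find a task in state D, then cat its stack file) can be sketched as a small script. This is only an illustration: the function names task_state and blocked_tasks and the proc_root parameter are hypothetical, and reading the stack files normally requires root and a kernel built with stack trace support.

```python
import os

def task_state(stat_text):
    """Extract (comm, state) from the contents of a /proc/.../stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return comm, state

def blocked_tasks(proc_root="/proc"):
    """Yield (pid, tid, comm, stack) for every task in state D (disk sleep)."""
    for pid in sorted(filter(str.isdigit, os.listdir(proc_root)), key=int):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in sorted(filter(str.isdigit, tids), key=int):
            base = os.path.join(task_dir, tid)
            try:
                with open(os.path.join(base, "stat")) as f:
                    comm, state = task_state(f.read())
            except OSError:
                continue  # thread exited while we were scanning
            if state != "D":
                continue
            try:
                with open(os.path.join(base, "stack")) as f:
                    stack = f.read()
            except OSError:
                stack = "(stack unavailable, reading it usually requires root)"
            yield int(pid), int(tid), comm, stack
```

Iterating over blocked_tasks() and printing each tuple should give roughly the same information as the ps and cat commands in this message, for all blocked threads at once.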
--------------------
YanHua
------------------ Original Message ------------------
From: "Koenig, Christian"<christian.koe...@amd.com>;
Sent: Tuesday, September 3, 2019, 4:21 PM
To: ""<78666...@qq.com>;"amd-gfx"<amd-gfx@lists.freedesktop.org>;
Cc: "Deucher, Alexander"<alexander.deuc...@amd.com>;
Subject: Re: Bug: amdgpu drm driver cause process into Disk sleep state
Hi Yanhua,
please update your kernel first, because that looks like a known issue
which was recently fixed by patch "drm/scheduler: use job count instead
of peek".
Probably best to try the latest bleeding edge kernel and if that doesn't
help please open up a bug report on https://bugs.freedesktop.org/.
Regards,
Christian.
On 03.09.19 at 09:35, 78666679 wrote:
> Hi Sirs,
> I have a WX5100 amdgpu card that randomly fails.
> Sometimes it causes processes to enter the uninterruptible wait state.
>
>
> cps-new-ondemand-0587:~ # ps aux|grep -w D
> root 11268 0.0 0.0 260628 3516 ? Ssl Aug26 0:00 /usr/sbin/gssproxy -D
> root 136482 0.0 0.0 212500 572 pts/0 S+ 15:25 0:00 grep --color=auto -w D
> root 370684 0.0 0.0 17972 7428 ? Ss Sep02 0:04 /usr/sbin/sshd -D
> 10066 432951 0.0 0.0 0 0 ? D Sep02 0:00 [FakeFinalizerDa]
> root 496774 0.0 0.0 0 0 ? D Sep02 0:17 [kworker/8:1+eve]
> cps-new-ondemand-0587:~ # cat /proc/496774/stack
> [<0>] __switch_to+0x94/0xe8
> [<0>] drm_sched_entity_flush+0xf8/0x248 [gpu_sched]
> [<0>] amdgpu_ctx_mgr_entity_flush+0xac/0x148 [amdgpu]
> [<0>] amdgpu_flush+0x2c/0x50 [amdgpu]
> [<0>] filp_close+0x40/0xa0
> [<0>] put_files_struct+0x118/0x120
> [<0>] put_files_struct+0x30/0x68 [binder_linux]
> [<0>] binder_deferred_func+0x4d4/0x658 [binder_linux]
> [<0>] process_one_work+0x1b4/0x3f8
> [<0>] worker_thread+0x54/0x470
> [<0>] kthread+0x134/0x138
> [<0>] ret_from_fork+0x10/0x18
> [<0>] 0xffffffffffffffff
>
>
>
> This issue has troubled me for a long time. I look forward to your help!
>
>
> -----
> Yanhua
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx