scheduler: Don't call wait_event_killable for signaled process.

Christian König Mon, 30 Apr 2018 08:26:59 -0700

Am 30.04.2018 um 16:32 schrieb Andrey Grodzovsky:

On 04/30/2018 08:08 AM, Christian König wrote:
Hi Eric,

sorry for the late response, was on vacation last week.

Am 26.04.2018 um 02:01 schrieb Eric W. Biederman:
Andrey Grodzovsky <andrey.grodzov...@amd.com> writes:
On 04/25/2018 01:17 PM, Oleg Nesterov wrote:
On 04/25, Andrey Grodzovsky wrote:
here (drm_sched_entity_fini) is also a bad idea, but we stillwant to be
able to exit immediately
and not wait for GPU jobs completion when the reason for reachingthis code
is because of KILL
signal to the user process who opened the device file.
Can you hook f_op->flush method?
THANKS! That sounds like a really good idea to me and we haven'tinvestigated into that direction yet.
But this one is called for each task releasing a reference to thethe file, so
not sure I see how this solves the problem.
The big question is why do you need to wait during the final closing a
file?
As always it's because of historical reasons. Initially user spacepushed commands directly to a hardware queue and when a processesfinished we didn't need to wait for anything.
Then the GPU scheduler was introduced which delayed pushing the jobsto the hardware queue to a later point in time.
This wait was then added to maintain backward compability and notbreak userspace (but see below).
The wait can be terminated so the wait does not appear to be simply a
matter of correctness.
Well when the process is killed we don't care about correctness anymore, we just want to get rid of it as quickly as possible (OOMsituation etc...).
But it is perfectly possible that a process submits some rendercommands and then calls exit() or terminates because of a SIGTERM,SIGINT etc.. In this case we need to wait here to make sure that allrendering is pushed to the hardware because the scheduler might needresources/settings from the file descriptor.
For example if you just remove that wait you could close firefox andget garbage on the screen for a millisecond because the remainingrendering commands where not executed.
So what we essentially need is to distinct between a SIGKILL (whichmeans stop processing as soon as possible) and any other reasonbecause then we don't want to annoy the user with garbage on thescreen (even if it's just for a few milliseconds).
Constructive ideas how to handle this would be very welcome, cause Icompletely agree that what we have at the moment by checkingPF_SIGNAL is just a very very hacky workaround.
What about changing PF_SIGNALED to PF_EXITING indrm_sched_entity_do_release
- if ((current->flags & PF_SIGNALED) && current->exit_code ==SIGKILL)+ if ((current->flags & PF_EXITING) && current->exit_code ==SIGKILL)
From looking into do_exit and it's callers , current->exit_code willget assign the signal which was delivered to the task. If SIGINT wassent then it's SIGINT, if SIGKILL then SIGKILL.


That's at least a band aid to stop us from abusing PF_SIGNALED.

But additional to that change, can you investigate when f_ops->flush()is called when the process exists normally, because of SIGKILL orbecause of some other signal?


Could be that this is more closely to what we are searching for,
Christian.


Andrey


Thanks,
Christian.


Eric
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

Reply via email to