On 3/26/21 9:08 AM, Stefan Metzmacher wrote:
> Am 26.03.21 um 15:55 schrieb Jens Axboe:
>> On 3/26/21 8:53 AM, Jens Axboe wrote:
>>> On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
>>>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher:
>>>>> Am 26.03.21 um 15:38 schrieb Jens Axboe:
>>>>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>>>>> The KILL after STOP deadlock still exists.
>>>>>>>>
>>>>>>>> In which tree? Sounds like you're still on the old one with that
>>>>>>>> incremental you sent, which wasn't complete.
>>>>>>>>
>>>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL?
>>>>>>>>
>>>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just
>>>>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>>>>> pushed), fwiw.
>>>>>>>
>>>>>>> I can reproduce this one! I'll take a closer look.
>>>>>>
>>>>>> OK, that one is actually pretty straight forward - we rely on cleaning
>>>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>>>>> and never return. So we might need a special case in there to deal with
>>>>>> that, or some other way of ensuring that fatal signal gets processed
>>>>>> correctly for IO threads.
>>>>>
>>>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from 
>>>>> being called?
>>>>
>>>> Ah, we're still in the first get_signal() from SIGSTOP, correct?
>>>
>>> Yes exactly, we're waiting in there being stopped. So we either need to
>>> check to something ala:
>>>
>>> relock:
>>> +   if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
>>> +           return false;
>>>
>>> to catch it upfront and from the relock case, or add:
>>>
>>>     fatal:
>>> +           if (current->flags & PF_IO_WORKER)
>>> +                   return false;
>>>
>>> to catch it in the fatal section.
>>
>> Can you try this? Not crazy about adding a special case, but I don't
>> think there's any way around this one. And should be pretty cheap, as
>> we're already pulling in ->flags right above anyway.
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 5ad8566534e7..5b75fbe3d2d6 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -2752,6 +2752,15 @@ bool get_signal(struct ksignal *ksig)
>>               */
>>              current->flags |= PF_SIGNALED;
>>  
>> +            /*
>> +             * PF_IO_WORKER threads will catch and exit on fatal signals
>> +             * themselves. They have cleanup that must be performed, so
>> +             * we cannot call do_exit() on their behalf. coredumps also
>> +             * do not apply to them.
>> +             */
>> +            if (current->flags & PF_IO_WORKER)
>> +                    return false;
>> +
>>              if (sig_kernel_coredump(signr)) {
>>                      if (print_fatal_signals)
>>                              print_fatal_signal(ksig->info.si_signo);
>>
> 
> I guess not before next week, but if it resolves the problem for you,
> I guess it would be good to get this into rc5.

It does, I pushed out a new branch. I'll send out a v2 series in a bit.

-- 
Jens Axboe

Reply via email to