Hello,
fullstatus says:
Slot PID   Stopping Connections     Threads    Async connections
                    total accepting busy idle  writing keep-alive closing
0    25042 no       0     no        2    198   0       0          4294966698
1    4347  no       0     no        0    200   0       0          4294966700
2    26273 no       0     no        1    199   0       0          4294966698
3    4348  no       0     no        0    200   0       0          4294966699
4    10224 no       0     no        0    200   0       0          4294966697
5    12157 no       0     no        0    200   0       0          4294966700
6    23027 no       0     no        0    200   0       0          4294966698
7    28597 no       0     no        0    200   0       0          4294966698
8    7519  no       0     no        0    200   0       0          4294966697
9    18609 no       0     no        2    198   0       0          4294966698
10   3183  no       0     no        0    200   0       0          4294966698
11   14704 no       0     no        0    200   0       0          4294966698
12   26237 no       0     no        0    200   0       0          4294966700
13   32070 no       0     no        0    200   0       0          4294966697
14   12070 no       1     no        0    200   0       0          4294966699
15   16627 no       0     no        0    200   0       0          4294966698
16   29413 no       0     no        0    200   0       0          4294966699
17   435   no       0     no        0    200   0       0          4294966699
18   24808 no       0     no        0    200   0       0          4294966700
19   1822  no       0     no        0    200   0       0          4294966699
20   1721  no       0     no        0    200   0       0          4294966698
21   2875  no       0     no        0    200   0       0          4294966698
22   25879 no       0     no        0    200   0       0          4294966698
23   28091 no       0     no        0    200   0       0          4294966696
24   31452 no       0     no        0    200   0       0          4294966698
25   32706 no       0     no        0    200   0       0          4294966698
26   8858  no       14    yes       3    197   0       6          4294966783
27   10203 no       5     yes       2    198   0       2          4294966949
Sum  28    0        20              10   5590  0       8          -16400
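
Note the per-slot "closing" values: they look like small negative
counters printed as unsigned 32-bit integers (e.g. 4294966698 is
2^32 - 598), while the Sum row prints the same counter as signed,
hence -16400. A minimal C illustration of that interpretation:

#include <stdio.h>

int main(void)
{
    int closing = -598; /* hypothetical negative "closing" counter */

    /* %u shows the two's-complement wraparound seen in the per-slot
     * rows, %d the signed view seen in the Sum row. */
    printf("%u\n", (unsigned int)closing); /* prints 4294966698 */
    printf("%d\n", closing);               /* prints -598 */
    return 0;
}
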
Greets,
Stefan
On 19.07.2017 at 17:05, Stefan Priebe - Profihost AG wrote:
>
> On 19.07.2017 at 16:59, Stefan Priebe - Profihost AG wrote:
>> Hello Yann,
>>
>> I'm observing some deadlocks again.
>>
>> I'm using:
>> httpd 2.4.27
>> + mod_h2
>> + httpd-2.4.x-mpm_event-wakeup-v7.1.patch
>> + your ssl linger fix patch from this thread
>>
>> What kind of information do you need? If you need a full stack
>> backtrace: from which PID? Or from all httpd PIDs?
>
> Something I forgot to mention:
>
> It seems httpd is running at its maximum number of threads:
> awk '{print $10 $11}' lsof.txt | sort | uniq -c | grep LISTEN
> 25050 *:http(LISTEN)
> 25050 *:https(LISTEN)
>
> Stefan
>
>>
>> Thanks!
>>
>> Greets,
>> Stefan
>>
>> On 14.07.2017 at 21:52, Yann Ylavic wrote:
>>> On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <[email protected]> wrote:
>>>> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <[email protected]> wrote:
>>>>>
>>>>> On 06/30/2017 12:18 PM, Yann Ylavic wrote:
>>>>>>
>>>>>> IMHO mod_ssl shouldn't (BIO_)flush unconditionally in
>>>>>> modssl_smart_shutdown(), only in the "abortive" mode of
>>>>>> ssl_filter_io_shutdown().
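>>>>>>
>>>>>> Something along these lines, as a hedged sketch only (the 'abortive'
>>>>>> flag and the helper's shape are assumptions, not the actual mod_ssl
>>>>>> code):
>>>>>>
>>>>>> #include <openssl/ssl.h>
>>>>>> #include <openssl/bio.h>
>>>>>>
>>>>>> /* Sketch: only flush the close-notify to the wire in abortive mode,
>>>>>>  * instead of unconditionally on every shutdown. */
>>>>>> static int smart_shutdown_sketch(SSL *ssl, int abortive)
>>>>>> {
>>>>>>     int i, rc = 0;
>>>>>>
>>>>>>     /* SSL_shutdown() may need to be called again to complete the
>>>>>>      * bidirectional close-notify exchange. */
>>>>>>     for (i = 0; i < 4 && rc == 0; i++) {
>>>>>>         rc = SSL_shutdown(ssl);
>>>>>>     }
>>>>>>     if (abortive) {
>>>>>>         /* Abortive close: push pending output out immediately. */
>>>>>>         (void)BIO_flush(SSL_get_wbio(ssl));
>>>>>>     }
>>>>>>     /* Otherwise leave flushing to the normal output path, so the
>>>>>>      * listener thread cannot block here. */
>>>>>>     return rc;
>>>>>> }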
>>>>>
>>>>> I think the issue starts before that.
>>>>> ap_prep_lingering_close calls the pre_close_connection hook, and
>>>>> modules registered to this hook can perform all sorts of long-lasting
>>>>> blocking operations there.
>>>>> While it can be argued that this would be a bug in the module, I think
>>>>> the only safe way is to have the whole start_lingering_close_nonblocking
>>>>> be executed by a worker thread.
>>>>
>>>> Correct, that'd be much simpler/safer indeed.
>>>> We need a new SHUTDOWN state then, right?
>>>
>>> Actually it was less simple than expected, and it has some caveats
>>> obviously...
>>>
>>> The attached patch does not introduce a new state but reuses the
>>> existing CONN_STATE_LINGER since it was not really considered by the
>>> listener thread (which uses CONN_STATE_LINGER_NORMAL and
>>> CONN_STATE_LINGER_SHORT instead), but that's a detail.
>>>
>>> Mainly, start_lingering_close_nonblocking() now simply schedules a
>>> shutdown (i.e. pre_close_connection() followed by an immediate close)
>>> that will be run by a worker thread.
>>> A new shutdown_linger_q is created/handled (with the same timeout as
>>> the short_linger_q, namely 2 seconds) to hold the connections to be
>>> shut down.
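>>>
>>> To make the control flow concrete, a rough sketch (apart from the names
>>> mentioned above, the helpers here are made up for illustration; this is
>>> not the patch itself):
>>>
>>> /* Listener thread: connection timed out in write_completion/keepalive.
>>>  * No blocking work here, just queue the shutdown for a worker. */
>>> static void start_lingering_close_nonblocking_sketch(event_conn_state_t *cs)
>>> {
>>>     cs->pub.state = CONN_STATE_LINGER;       /* reuse the existing state */
>>>     TO_QUEUE_APPEND(shutdown_linger_q, cs);  /* 2s, like short_linger_q */
>>>     /* ... then wake a worker to run the shutdown below */
>>> }
>>>
>>> /* Worker thread: the part that may block (hooks, SSL close-notify). */
>>> static void process_shutdown_sketch(event_conn_state_t *cs)
>>> {
>>>     ap_prep_lingering_close(cs->c); /* runs the pre_close_connection hooks */
>>>     close_connection_sketch(cs);    /* immediate close, no lingering read */
>>> }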
>>>
>>> So now, when a connection times out in the write_completion or
>>> keepalive queues, it needs (i.e. the listener may wait for) an
>>> available worker to process its shutdown/close.
>>> This means we can *not* close kept-alive connections immediately, like
>>> before, when becoming short of workers, which will favor active KA
>>> connections over new ones in this case (I don't think it's that
>>> serious, but the previous code was taking care of that; for me it's up
>>> to the admin to size the workers appropriately...).
>>>
>>> Likewise, when a connection in the shutdown_linger_q itself times out,
>>> the patch requires a worker immediately to do the job (see the
>>> shutdown_lingering_close() callback).
>>>
>>> So overall, this patch may introduce the need for more workers than
>>> before; what was (wrongly) done by the listener thread has to be done
>>> somewhere anyway...
>>>
>>> Finally, if there is no objection to the approach so far, I think there
>>> is room for improvement, like batching shutdowns in the same worker.
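>>>
>>> (Purely as a sketch of that batching idea, with made-up helpers: a
>>> worker could drain a few queued shutdowns per wakeup instead of one.)
>>>
>>> /* Sketch only: batch up to 'budget' pending shutdowns in one worker. */
>>> static void process_shutdown_batch_sketch(int budget)
>>> {
>>>     event_conn_state_t *cs;
>>>
>>>     while (budget-- > 0 && (cs = shutdown_q_pop_sketch()) != NULL) {
>>>         ap_prep_lingering_close(cs->c);
>>>         close_connection_sketch(cs);
>>>     }
>>> }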
>>>
>>> WDYT?
>>>
>>> Regards,
>>> Yann.
>>>