2017-07-17 9:33 GMT+02:00 Stefan Eissing <[email protected]>:
> > > Am 14.07.2017 um 21:52 schrieb Yann Ylavic <[email protected]>: > > > > On Fri, Jun 30, 2017 at 1:33 PM, Yann Ylavic <[email protected]> > wrote: > >> On Fri, Jun 30, 2017 at 1:20 PM, Ruediger Pluem <[email protected]> > wrote: > >>> > >>> On 06/30/2017 12:18 PM, Yann Ylavic wrote: > >>>> > >>>> IMHO mod_ssl shoudn't (BIO_)flush unconditionally in > >>>> modssl_smart_shutdown(), only in the "abortive" mode of > >>>> ssl_filter_io_shutdown(). > >>> > >>> I think the issue starts before that. > >>> ap_prep_lingering_close calls the pre_close_connection hook and > modules that are registered > >>> to this hook can perform all sort of long lasting blocking operations > there. > >>> While it can be argued that this would be a bug in the module I think > the only safe way > >>> is to have the whole start_lingering_close_nonblocking being executed > by a worker thread. > >> > >> Correct, that'd be much simpler/safer indeed. > >> We need a new SHUTDOWN state then, right? > > > > Actually it was less simple than expected, and it has some caveats > obviously... > > > > The attached patch does not introduce a new state but reuses the > > existing CONN_STATE_LINGER since it was not really considered by the > > listener thread (which uses CONN_STATE_LINGER_NORMAL and > > CONN_STATE_LINGER_SHORT instead), but that's a detail. > > > > Mainly, start_lingering_close_nonblocking() now simply schedules a > > shutdown (i.e. pre_close_connection() followed by immediate close) > > that will we be run by a worker thread. > > A new shutdown_linger_q is created/handled (with the same timeout as > > the short_linger_q, namely 2 seconds) to hold connections to be > > shutdown. > > > > So now when a connection times out in the write_completion or > > keepalive queues, it needs (i.e. the listener may wait for) an > > available worker to process its shutdown/close. > > This means we can *not* close kept alive connections immediatly like > > before when becoming short of workers, which will favor active KA > > connections over new ones in this case (I don't think it's that > > serious but the previous was taking care of that. For me it's up to > > the admin to size the workers appropriately...). > > > > Same when a connection in the shutdown_linger_q itself times out, the > > patch will require a worker immediatly to do the job (see > > shutdown_lingering_close() callback). > > > > So overall, this patch may introduce the need for more workers than > > before, what was (wrongly) done by the listener thread has to be done > > somewhere anyway... > > > > Finally, I think there is room for improvements like batching > > shutdowns in the same worker if there is no objection on the approach > > so far. > > > > WDYT? > > I will test the patch, most likely today. I lot of +1s for the initiative! > > Thanks a lot for this work Yann, really nice! I will try to test it as well during the next days, I was not convinced in the beginning that triggering a (potential) increase in workers usage was super great for our users but it is definitely the most correct and safe thing to do. Even if we find a way to fix the ssl-lingering-close-block issue we might encounter a similar one in the future, so it is better imho to fix it at the source. Luca
