Hello!

On Mon, Jan 08, 2024 at 01:31:11PM +0000, J Carter wrote:

> On Mon, 8 Jan 2024 11:25:55 +0000
> J Carter <jordanc.car...@outlook.com> wrote:
> 
> > Hello,
> > 
> > On Mon, 27 Nov 2023 05:50:27 +0300
> > Maxim Dounin <mdou...@mdounin.ru> wrote:
> > 
> > > # HG changeset patch
> > > # User Maxim Dounin <mdou...@mdounin.ru>
> > > # Date 1701050170 -10800
> > > #      Mon Nov 27 04:56:10 2023 +0300
> > > # Node ID 00c3e7333145ddb5ea0eeaaa66b3d9c26973c9c2
> > > # Parent  61d08e4cf97cc073200ec32fc6ada9a2d48ffe51
> > > AIO operations now add timers (ticket #2162).
> > > 
> > > Each AIO (thread IO) operation being run is now accompanied by a
> > > 1-minute timer.  This timer prevents unexpected shutdown of the
> > > worker process while an AIO operation is running, and logs an
> > > alert if the operation runs for too long.
> > 
> > Shouldn't this timer's duration be set to match
> > worker_shutdown_timeout's duration rather than being hardcoded
> > to 60s?
> 
> Ah, never mind, I understand.
> 
> These timers will either expire after the set 60s duration, or
> will expire when worker_shutdown_timeout itself expires, kills
> the connection, and times out the associated timers (including
> the AIO timers).
> 
> Setting it to worker_shutdown_timeout's duration would be pointless 
> (an 'infinite' timer would give the same result).
> 
> So the only situation in which a different value for these AIO
> timers would make sense is if the AIO operations are expected to
> take longer than 60s, but less than worker_shutdown_timeout (in
> cases where it has been increased from its own default of 60s).
> 
> In that case the AIO operation's timeout would have to be set by
> one (or more) of its own directives, with a value less than
> worker_shutdown_timeout.

Not really.

When worker_shutdown_timeout expires, it tries to terminate the 
request, but it can't as long as an AIO operation is running.  
When the AIO operation completes, the request will actually be 
terminated and the worker process will be allowed to exit.  So 
far so good.

But if the AIO operation never completes, the timer will expire 
after 1 minute, will log an alert, and the worker process will 
be allowed to exit anyway (with the request still present).  
This might not actually be possible, though: for example, 
ngx_thread_pool_exit_worker() will simply block waiting for all 
pending operations to complete.
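
For illustration, here is a minimal sketch of that guard-timer 
pattern on top of nginx's event and timer primitives; the 
ngx_my_aio_* names and the dedicated guard event are hypothetical 
and only approximate what the patch does:

    #include <ngx_config.h>
    #include <ngx_core.h>
    #include <ngx_event.h>

    #define NGX_MY_AIO_TIMEOUT  60000   /* hardcoded 1 minute, in ms */

    static void
    ngx_my_aio_guard_handler(ngx_event_t *guard)
    {
        /* Fires only if the operation is still pending after 1
         * minute: complain loudly; once the timer is gone, it no
         * longer keeps the worker process alive. */
        ngx_log_error(NGX_LOG_ALERT, guard->log, 0,
                      "aio operation is running for too long");
    }

    static void
    ngx_my_aio_start(ngx_event_t *guard, ngx_log_t *log)
    {
        guard->handler = ngx_my_aio_guard_handler;
        guard->log = log;

        /* A pending timer delays graceful shutdown of the worker
         * process until it is deleted or expires. */
        ngx_add_timer(guard, NGX_MY_AIO_TIMEOUT);
    }

    static void
    ngx_my_aio_done(ngx_event_t *guard)
    {
        /* Normal completion: disarm the guard timer. */
        if (guard->timer_set) {
            ngx_del_timer(guard);
        }
    }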

In theory, the situation when an AIO operation never completes 
should never happen, and just a counter of pending AIO 
operations could be used instead to delay shutdown (which is 
essentially equivalent to an infinite timer).
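
A minimal sketch of that alternative, with a hypothetical 
ngx_my_aio_pending counter (nothing like this is in the actual 
patch):

    static ngx_uint_t  ngx_my_aio_pending;  /* in-flight operations */

    static void
    ngx_my_aio_post(void)
    {
        ngx_my_aio_pending++;
    }

    static void
    ngx_my_aio_complete(void)
    {
        ngx_my_aio_pending--;
    }

    /* In the worker's graceful-shutdown check: */
    static ngx_int_t
    ngx_my_aio_shutdown_delayed(void)
    {
        /* Unlike a guard timer, this never expires: a stuck
         * operation delays shutdown forever, and nothing is ever
         * logged. */
        return ngx_my_aio_pending > 0;
    }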

In practice, though, using a large enough guard timer might be 
better: it provides an additional level of protection against 
bugs or system misbehaviour, and at least logs an alert if 
something really weird happens.  It also looks more universal 
and in line with the current approach of using existing timers 
as an indicator that something is going on and that shutdown 
should be delayed.

The timer is expected to be "large enough": since we cannot do 
anything meaningful with an AIO operation which never completes, 
we can only complain loudly, so the timer should never expire 
unless something is really wrong.  This is not the case with 
worker_shutdown_timeout: it can be set to an arbitrarily low 
value, and its expiry is not expected to mean that something is 
really wrong, but rather that nginx shouldn't try to process the 
remaining requests and should instead close them as soon as it 
can.  That is, worker_shutdown_timeout does not fit 
semantically.  Further, worker_shutdown_timeout is not set by 
default, so it simply cannot be used.
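
For illustration, enabling it looks like this in the main 
context of nginx.conf (the 10s value is just an arbitrary 
example, not a recommendation):

    worker_shutdown_timeout 10s;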

The 1 minute was chosen as it matches the default send_timeout, 
which typically accompanies AIO operations when sending 
responses (and also delays shutdown, so no "open socket left" 
alerts are normally seen).  Still, send_timeout is also quite 
different semantically, and therefore a hardcoded value is used 
instead.

I don't think there are valid cases where AIO operations can 
take longer than 1 minute, as these are considered to be small 
and fast operations, which are normally done synchronously 
within the nginx event loop when not using AIO; an operation 
which takes 1 minute would mean nginx is completely 
unresponsive.  Still, if we find out that 1 minute is not large 
enough in some cases, we can just bump it to a larger value.

[...]

-- 
Maxim Dounin
http://mdounin.ru/