Hello!

On Fri, Jan 26, 2024 at 04:02:30PM +0400, Roman Arutyunyan wrote:
> On Mon, Nov 27, 2023 at 05:50:24AM +0300, Maxim Dounin wrote:
> > # HG changeset patch
> > # User Maxim Dounin <mdou...@mdounin.ru>
> > # Date 1701049682 -10800
> > #      Mon Nov 27 04:48:02 2023 +0300
> > # Node ID a5e39e9d1f4c84dcbe6a2f9e079372a3d63aef0b
> > # Parent  f366007dd23a6ce8e8427c1b3042781b618a2ade
> > Fixed request termination with AIO and subrequests (ticket #2555).
> > 
> > When a request was terminated due to an error via
> > ngx_http_terminate_request() while an AIO operation was running in a
> > subrequest, various issues were observed.  This happened because
> > ngx_http_request_finalizer() was only set in the subrequest where
> > ngx_http_terminate_request() was called, but not in the subrequest
> > where the AIO operation was running.  After completion of the AIO
> > operation resumed normal processing of the subrequest, leading to
> > issues.
> 
> Something is wrong with the last sentence.

Thanks, rewritten as:

... After completion of the AIO operation normal processing of the
subrequest was resumed, leading to issues.

> > In particular, in case of the upstream module, termination of the
> > request called upstream cleanup, which closed the upstream connection.
> > Attempts to further work with the upstream connection after AIO
> > operation completion resulted in segfaults in ngx_ssl_recv(),
> > "readv() failed (9: Bad file descriptor) while reading upstream"
> > errors, or socket leaks.
> 
> Can you elaborate on socket leaks?

For example, consider a request which is waiting for additional data
from the upstream, so the only timer is the read timer for the upstream
connection, and which is terminated because the client closed the
connection.  Request termination will remove the only timer.  Still,
the client connection is not yet closed by nginx.  So as long as the
request is not actually freed following completion of the AIO
operation, this is a socket leak: we have no timers left, and no
further events expected.
And this can easily happen if neither segfault nor readv() error was
triggered (for example, if p->upstream->read->ready was not set during
AIO operation completion).

> > In ticket #2555, issues were observed with the following configuration
> > with cache background update (with thread writing instrumented to
> > introduce a delay, when a client closes the connection during an
> > update):
> > 
> >     location = /background-and-aio-write {
> >         proxy_pass ...
> >         proxy_cache one;
> >         proxy_cache_valid 200 1s;
> >         proxy_cache_background_update on;
> >         proxy_cache_use_stale updating;
> >         aio threads;
> >         aio_write on;
> >         limit_rate 1000;
> >     }
> > 
> > Similarly, the same issue can be seen with SSI, and can be caused by
> > errors in subrequests, such as in the following configuration
> > (were "/proxy" uses AIO, and "/sleep" returns 444 after some delay,
> 
> s/were/where/ ?

Fixed, thanks.

> > causing request termination):
> > 
> >     location = /ssi-active-boom {
> >         ssi on;
> >         ssi_types *;
> >         return 200 '
> >             <!--#include virtual="/proxy" -->
> >             <!--#include virtual="/sleep" -->
> >         ';
> >         limit_rate 1000;
> >     }
> > 
> > Or the same with both AIO operation and the error in non-active
> > subrequests (which needs slightly different handling, see below):
> > 
> >     location = /ssi-non-active-boom {
> >         ssi on;
> >         ssi_types *;
> >         return 200 '
> >             <!--#include virtual="/static" -->
> >             <!--#include virtual="/proxy" -->
> >             <!--#include virtual="/sleep" -->
> >         ';
> >         limit_rate 1000;
> >     }
> > 
> > Similarly, issues can be observed with just static files.  However,
> > with static files potential impact is limited due to timeout
> > safeguards in ngx_http_writer(), and the fact that c->error is set
> > during request termination.
> > In a simple configuration with an AIO operation in the active
> > subrequest, such as in the following configuration, the connection is
> > closed right after completion of the AIO operation anyway, since
> > ngx_http_writer() tries to write to the connection and fails due to
> > c->error set:
> > 
> >     location = /ssi-active-static-boom {
> >         ssi on;
> >         ssi_types *;
> >         return 200 '
> >             <!--#include virtual="/static-aio" -->
> >             <!--#include virtual="/sleep" -->
> >         ';
> >         limit_rate 1000;
> >     }
> > 
> > In the following configuration, with an AIO operation in a non-active
> > subrequest, the connection is closed only after send_timeout expires:
> > 
> >     location = /ssi-non-active-static-boom {
> >         ssi on;
> >         ssi_types *;
> >         return 200 '
> >             <!--#include virtual="/static" -->
> >             <!--#include virtual="/static-aio" -->
> >             <!--#include virtual="/sleep" -->
> >         ';
> >         limit_rate 1000;
> >     }
> > 
> > Fix is to introduce r->main->terminated flag, which is to be checked
> > by AIO event handlers when the r->main->blocked counter is
> > decremented.  When the flag is set, handlers are expected to wake up
> > the connection instead of the subrequest (which might be already
> > cleaned up).
> > 
> > Additionally, now ngx_http_request_finalizer() is always set in the
> > active subrequest, so waking up the connection properly finalizes the
> > request even if termination happened in a non-active subrequest.
> 
> The issue does not seem to be significant for static files.  In fact,
> the biggest problem is trying to use a resource after it was freed by
> an ngx_http_cleanup_add()-registered handler, as opposed to
> ngx_pool_cleanup_add() handlers, which are safer but serve a slightly
> different purpose.
> 
> As for non-ngx_http_cleanup_add() related code (like static files),
> the effect of the issue is just a possible delay of the connection
> closure until output is produced, in which case typically
> ngx_http_write_filter() triggers the closure.
> So the patch basically fixes a time delay (usually limited by a
> timeout).  IMO there's no need to go into so much detail about that.

The issue with static files demonstrates that the issue goes beyond use
of cleanup handlers.  As such, just skipping cleanup handlers with
mr->blocked is wrong (see below).

Also, it shows that the safeguard timeouts we use in ngx_http_writer()
are actually useful, and save us from bigger problems here.  In the
past, there were attempts to remove these timers as long as an AIO
operation is running.

And the details provide ways to reproduce different aspects of the
issue (and were actually used during testing and development of the
patch).

> > diff --git a/src/http/ngx_http_copy_filter_module.c b/src/http/ngx_http_copy_filter_module.c
> > --- a/src/http/ngx_http_copy_filter_module.c
> > +++ b/src/http/ngx_http_copy_filter_module.c
> > @@ -195,9 +195,18 @@ ngx_http_copy_aio_event_handler(ngx_even
> >      r->main->blocked--;
> >      r->aio = 0;
> >  
> > -    r->write_event_handler(r);
> > +    if (r->main->terminated) {
> > +        /*
> > +         * trigger connection event handler if the request was
> > +         * terminated
> > +         */
> >  
> > -    ngx_http_run_posted_requests(c);
> > +        c->write->handler(c->write);
> > +
> > +    } else {
> > +        r->write_event_handler(r);
> > +        ngx_http_run_posted_requests(c);
> > +    }
> >  }
> >  
> >  #endif
> > @@ -305,11 +314,11 @@ ngx_http_copy_thread_event_handler(ngx_e
> >  
> >  #endif
> >  
> > -    if (r->done) {
> > +    if (r->done || r->main->terminated) {
> >          /*
> >           * trigger connection event handler if the subrequest was
> > -         * already finalized; this can happen if the handler is used
> > -         * for sendfile() in threads
> > +         * already finalized (this can happen if the handler is used
> > +         * for sendfile() in threads), or if the request was terminated
> >           */
> >  
> >          c->write->handler(c->write);
> > diff --git a/src/http/ngx_http_file_cache.c b/src/http/ngx_http_file_cache.c
> > --- a/src/http/ngx_http_file_cache.c
> > +++ b/src/http/ngx_http_file_cache.c
> > @@ -14,7 +14,7 @@
> >  static ngx_int_t ngx_http_file_cache_lock(ngx_http_request_t *r,
> >      ngx_http_cache_t *c);
> >  static void ngx_http_file_cache_lock_wait_handler(ngx_event_t *ev);
> > -static void ngx_http_file_cache_lock_wait(ngx_http_request_t *r,
> > +static ngx_int_t ngx_http_file_cache_lock_wait(ngx_http_request_t *r,
> >      ngx_http_cache_t *c);
> >  static ngx_int_t ngx_http_file_cache_read(ngx_http_request_t *r,
> >      ngx_http_cache_t *c);
> > @@ -463,6 +463,7 @@ ngx_http_file_cache_lock(ngx_http_reques
> >  static void
> >  ngx_http_file_cache_lock_wait_handler(ngx_event_t *ev)
> >  {
> > +    ngx_int_t           rc;
> >      ngx_connection_t   *c;
> >      ngx_http_request_t *r;
> >  
> > @@ -474,13 +475,31 @@ ngx_http_file_cache_lock_wait_handler(ng
> >      ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0,
> >                     "http file cache wait: \"%V?%V\"", &r->uri, &r->args);
> >  
> > -    ngx_http_file_cache_lock_wait(r, r->cache);
> > -
> > -    ngx_http_run_posted_requests(c);
> > +    rc = ngx_http_file_cache_lock_wait(r, r->cache);
> > +
> > +    if (rc == NGX_AGAIN) {
> > +        return;
> > +    }
> > +
> > +    r->cache->waiting = 0;
> > +    r->main->blocked--;
> > +
> > +    if (r->main->terminated) {
> > +        /*
> > +         * trigger connection event handler if the request was
> > +         * terminated
> > +         */
> > +
> > +        c->write->handler(c->write);
> > +
> > +    } else {
> > +        r->write_event_handler(r);
> > +        ngx_http_run_posted_requests(c);
> > +    }
> >  }
> 
> BTW, cache lock is not a real aio.  It's just a regular event timer.
> And it's deleted in ngx_http_file_cache_free() which is called from
> ngx_http_upstream_finalize_request().  So it looks like the "terminated"
> flag will never be 1 here.

The upstream cleanup handler is installed after checking the cache (and
waiting for the cache lock, if needed).  And that's, basically, why the
code uses r->blocked in the first place.
While the code can be rewritten to not depend on the r->blocked flag,
so that checking r->main->terminated won't be needed here either, this
is not currently the case.

> > -static void
> > +static ngx_int_t
> >  ngx_http_file_cache_lock_wait(ngx_http_request_t *r, ngx_http_cache_t *c)
> >  {
> >      ngx_uint_t      wait;
> > @@ -495,7 +514,7 @@ ngx_http_file_cache_lock_wait(ngx_http_r
> >          ngx_log_error(NGX_LOG_INFO, r->connection->log, 0,
> >                        "cache lock timeout");
> >          c->lock_timeout = 0;
> > -        goto wakeup;
> > +        return NGX_OK;
> >      }
> >  
> >      cache = c->file_cache;
> > @@ -513,14 +532,10 @@ ngx_http_file_cache_lock_wait(ngx_http_r
> >  
> >      if (wait) {
> >          ngx_add_timer(&c->wait_event, (timer > 500) ? 500 : timer);
> > -        return;
> > +        return NGX_AGAIN;
> >      }
> >  
> > -wakeup:
> > -
> > -    c->waiting = 0;
> > -    r->main->blocked--;
> > -    r->write_event_handler(r);
> > +    return NGX_OK;
> >  }
> >  
> >  
> > @@ -740,9 +755,18 @@ ngx_http_cache_aio_event_handler(ngx_eve
> >      r->main->blocked--;
> >      r->aio = 0;
> >  
> > -    r->write_event_handler(r);
> > -
> > -    ngx_http_run_posted_requests(c);
> > +    if (r->main->terminated) {
> > +        /*
> > +         * trigger connection event handler if the request was
> > +         * terminated
> > +         */
> > +
> > +        c->write->handler(c->write);
> > +
> > +    } else {
> > +        r->write_event_handler(r);
> > +        ngx_http_run_posted_requests(c);
> > +    }
> >  }
> >  
> >  #endif
> > @@ -810,9 +834,18 @@ ngx_http_cache_thread_event_handler(ngx_
> >      r->main->blocked--;
> >      r->aio = 0;
> >  
> > -    r->write_event_handler(r);
> > -
> > -    ngx_http_run_posted_requests(c);
> > +    if (r->main->terminated) {
> > +        /*
> > +         * trigger connection event handler if the request was
> > +         * terminated
> > +         */
> > +
> > +        c->write->handler(c->write);
> > +
> > +    } else {
> > +        r->write_event_handler(r);
> > +        ngx_http_run_posted_requests(c);
> > +    }
> >  }
> >  
> >  #endif
> > diff --git a/src/http/ngx_http_request.c b/src/http/ngx_http_request.c
> > --- a/src/http/ngx_http_request.c
> > +++ b/src/http/ngx_http_request.c
> > @@ -2681,6 +2681,8 @@ ngx_http_terminate_request(ngx_http_requ
> >      ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
> >                     "http terminate request count:%d", mr->count);
> >  
> > +    mr->terminated = 1;
> 
> Another solution could be skipping the cleanup handlers below if
> mr->blocked is set.  This would fix the crash, but would not fix the
> delay though.

Exactly.

While skipping cleanup handlers will fix a particular crash in the
upstream module, the same issue observed with static files clearly
demonstrates that this is a wrong approach.

Further, skipping cleanup handlers means that we will continue doing
unneeded (and potentially dangerous, given that request termination can
be due to a fatal error, such as a memory allocation error) work we can
otherwise cancel early by calling the cleanup handler.

> > +
> >      if (rc > 0 && (mr->headers_out.status == 0 || mr->connection->sent == 0)) {
> >          mr->headers_out.status = rc;
> >      }
> > @@ -2703,8 +2705,13 @@ ngx_http_terminate_request(ngx_http_requ
> >      if (mr->write_event_handler) {
> >  
> >          if (mr->blocked) {
> > +            if (r != r->connection->data) {
> > +                r = r->connection->data;
> > +            }
> 
> Why not simply r = r->connection->data.  Or maybe a new variable
> ar (active request) similar to mr (main request) would make sense.

When writing this, I've decided that using conditional assignment
better explains the idea that we need the active subrequest, and
therefore switch to it if "r" is not active.  Otherwise, just
assignment is equivalent.

Changed to just "r = r->connection->data;", as I have no strong
preference here (and we already do "r = r->main;" in several places).
The new variable looks like an overkill though.
> > +
> >              r->connection->error = 1;
> >              r->write_event_handler = ngx_http_request_finalizer;
> > +
> >              return;
> >          }
> > 
> > diff --git a/src/http/ngx_http_request.h b/src/http/ngx_http_request.h
> > --- a/src/http/ngx_http_request.h
> > +++ b/src/http/ngx_http_request.h
> > @@ -550,6 +550,7 @@ struct ngx_http_request_s {
> >      unsigned                          root_tested:1;
> >      unsigned                          done:1;
> >      unsigned                          logged:1;
> > +    unsigned                          terminated:1;
> >  
> >      unsigned                          buffered:4;
> >  
> > diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c
> > --- a/src/http/ngx_http_upstream.c
> > +++ b/src/http/ngx_http_upstream.c
> > @@ -3984,11 +3984,11 @@ ngx_http_upstream_thread_event_handler(n
> >  
> >  #endif
> >  
> > -    if (r->done) {
> > +    if (r->done || r->main->terminated) {
> >          /*
> >           * trigger connection event handler if the subrequest was
> > -         * already finalized; this can happen if the handler is used
> > -         * for sendfile() in threads
> > +         * already finalized (this can happen if the handler is used
> > +         * for sendfile() in threads), or if the request was terminated
> >           */
> >  
> >          c->write->handler(c->write);
> 
> The patch is generally ok.

Just in case, here is the updated patch:

# HG changeset patch
# User Maxim Dounin <mdou...@mdounin.ru>
# Date 1706513520 -10800
#      Mon Jan 29 10:32:00 2024 +0300
# Node ID 35bfb011f69bb97cf853b379fbdcfd5052d0e3ed
# Parent  e88cdaa0f1ffc9af3144770c72ee5baf07b2562e
Fixed request termination with AIO and subrequests (ticket #2555).

When a request was terminated due to an error via
ngx_http_terminate_request() while an AIO operation was running in a
subrequest, various issues were observed.  This happened because
ngx_http_request_finalizer() was only set in the subrequest where
ngx_http_terminate_request() was called, but not in the subrequest
where the AIO operation was running.  After completion of the AIO
operation normal processing of the subrequest was resumed, leading to
issues.
In particular, in case of the upstream module, termination of the
request called upstream cleanup, which closed the upstream connection.
Attempts to further work with the upstream connection after AIO
operation completion resulted in segfaults in ngx_ssl_recv(), "readv()
failed (9: Bad file descriptor) while reading upstream" errors, or
socket leaks.

In ticket #2555, issues were observed with the following configuration
with cache background update (with thread writing instrumented to
introduce a delay, when a client closes the connection during an
update):

    location = /background-and-aio-write {
        proxy_pass ...
        proxy_cache one;
        proxy_cache_valid 200 1s;
        proxy_cache_background_update on;
        proxy_cache_use_stale updating;
        aio threads;
        aio_write on;
        limit_rate 1000;
    }

Similarly, the same issue can be seen with SSI, and can be caused by
errors in subrequests, such as in the following configuration (where
"/proxy" uses AIO, and "/sleep" returns 444 after some delay, causing
request termination):

    location = /ssi-active-boom {
        ssi on;
        ssi_types *;
        return 200 '
            <!--#include virtual="/proxy" -->
            <!--#include virtual="/sleep" -->
        ';
        limit_rate 1000;
    }

Or the same with both AIO operation and the error in non-active
subrequests (which needs slightly different handling, see below):

    location = /ssi-non-active-boom {
        ssi on;
        ssi_types *;
        return 200 '
            <!--#include virtual="/static" -->
            <!--#include virtual="/proxy" -->
            <!--#include virtual="/sleep" -->
        ';
        limit_rate 1000;
    }

Similarly, issues can be observed with just static files.  However,
with static files potential impact is limited due to timeout safeguards
in ngx_http_writer(), and the fact that c->error is set during request
termination.
In a simple configuration with an AIO operation in the active
subrequest, such as in the following configuration, the connection is
closed right after completion of the AIO operation anyway, since
ngx_http_writer() tries to write to the connection and fails due to
c->error set:

    location = /ssi-active-static-boom {
        ssi on;
        ssi_types *;
        return 200 '
            <!--#include virtual="/static-aio" -->
            <!--#include virtual="/sleep" -->
        ';
        limit_rate 1000;
    }

In the following configuration, with an AIO operation in a non-active
subrequest, the connection is closed only after send_timeout expires:

    location = /ssi-non-active-static-boom {
        ssi on;
        ssi_types *;
        return 200 '
            <!--#include virtual="/static" -->
            <!--#include virtual="/static-aio" -->
            <!--#include virtual="/sleep" -->
        ';
        limit_rate 1000;
    }

Fix is to introduce r->main->terminated flag, which is to be checked by
AIO event handlers when the r->main->blocked counter is decremented.
When the flag is set, handlers are expected to wake up the connection
instead of the subrequest (which might be already cleaned up).

Additionally, now ngx_http_request_finalizer() is always set in the
active subrequest, so waking up the connection properly finalizes the
request even if termination happened in a non-active subrequest.
diff --git a/src/http/ngx_http_copy_filter_module.c b/src/http/ngx_http_copy_filter_module.c
--- a/src/http/ngx_http_copy_filter_module.c
+++ b/src/http/ngx_http_copy_filter_module.c
@@ -208,9 +208,18 @@ ngx_http_copy_aio_event_handler(ngx_even
     r->main->blocked--;
     r->aio = 0;
 
-    r->write_event_handler(r);
+    if (r->main->terminated) {
+        /*
+         * trigger connection event handler if the request was
+         * terminated
+         */
 
-    ngx_http_run_posted_requests(c);
+        c->write->handler(c->write);
+
+    } else {
+        r->write_event_handler(r);
+        ngx_http_run_posted_requests(c);
+    }
 }
 
 #endif
@@ -331,11 +340,11 @@ ngx_http_copy_thread_event_handler(ngx_e
 
 #endif
 
-    if (r->done) {
+    if (r->done || r->main->terminated) {
         /*
          * trigger connection event handler if the subrequest was
-         * already finalized; this can happen if the handler is used
-         * for sendfile() in threads
+         * already finalized (this can happen if the handler is used
+         * for sendfile() in threads), or if the request was terminated
          */
 
         c->write->handler(c->write);
diff --git a/src/http/ngx_http_file_cache.c b/src/http/ngx_http_file_cache.c
--- a/src/http/ngx_http_file_cache.c
+++ b/src/http/ngx_http_file_cache.c
@@ -14,7 +14,7 @@
 static ngx_int_t ngx_http_file_cache_lock(ngx_http_request_t *r,
     ngx_http_cache_t *c);
 static void ngx_http_file_cache_lock_wait_handler(ngx_event_t *ev);
-static void ngx_http_file_cache_lock_wait(ngx_http_request_t *r,
+static ngx_int_t ngx_http_file_cache_lock_wait(ngx_http_request_t *r,
     ngx_http_cache_t *c);
 static ngx_int_t ngx_http_file_cache_read(ngx_http_request_t *r,
     ngx_http_cache_t *c);
@@ -463,6 +463,7 @@ ngx_http_file_cache_lock(ngx_http_reques
 static void
 ngx_http_file_cache_lock_wait_handler(ngx_event_t *ev)
 {
+    ngx_int_t           rc;
     ngx_connection_t   *c;
     ngx_http_request_t *r;
 
@@ -474,13 +475,31 @@ ngx_http_file_cache_lock_wait_handler(ng
     ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0,
                    "http file cache wait: \"%V?%V\"", &r->uri, &r->args);
 
-    ngx_http_file_cache_lock_wait(r, r->cache);
-
-    ngx_http_run_posted_requests(c);
+    rc = ngx_http_file_cache_lock_wait(r, r->cache);
+
+    if (rc == NGX_AGAIN) {
+        return;
+    }
+
+    r->cache->waiting = 0;
+    r->main->blocked--;
+
+    if (r->main->terminated) {
+        /*
+         * trigger connection event handler if the request was
+         * terminated
+         */
+
+        c->write->handler(c->write);
+
+    } else {
+        r->write_event_handler(r);
+        ngx_http_run_posted_requests(c);
+    }
 }
 
 
-static void
+static ngx_int_t
 ngx_http_file_cache_lock_wait(ngx_http_request_t *r, ngx_http_cache_t *c)
 {
     ngx_uint_t      wait;
@@ -495,7 +514,7 @@ ngx_http_file_cache_lock_wait(ngx_http_r
         ngx_log_error(NGX_LOG_INFO, r->connection->log, 0,
                       "cache lock timeout");
         c->lock_timeout = 0;
-        goto wakeup;
+        return NGX_OK;
     }
 
     cache = c->file_cache;
@@ -513,14 +532,10 @@ ngx_http_file_cache_lock_wait(ngx_http_r
 
     if (wait) {
         ngx_add_timer(&c->wait_event, (timer > 500) ? 500 : timer);
-        return;
+        return NGX_AGAIN;
     }
 
-wakeup:
-
-    c->waiting = 0;
-    r->main->blocked--;
-    r->write_event_handler(r);
+    return NGX_OK;
 }
 
 
@@ -753,9 +768,18 @@ ngx_http_cache_aio_event_handler(ngx_eve
     r->main->blocked--;
     r->aio = 0;
 
-    r->write_event_handler(r);
-
-    ngx_http_run_posted_requests(c);
+    if (r->main->terminated) {
+        /*
+         * trigger connection event handler if the request was
+         * terminated
+         */
+
+        c->write->handler(c->write);
+
+    } else {
+        r->write_event_handler(r);
+        ngx_http_run_posted_requests(c);
+    }
 }
 
 #endif
 
@@ -836,9 +860,18 @@ ngx_http_cache_thread_event_handler(ngx_
     r->main->blocked--;
     r->aio = 0;
 
-    r->write_event_handler(r);
-
-    ngx_http_run_posted_requests(c);
+    if (r->main->terminated) {
+        /*
+         * trigger connection event handler if the request was
+         * terminated
+         */
+
+        c->write->handler(c->write);
+
+    } else {
+        r->write_event_handler(r);
+        ngx_http_run_posted_requests(c);
+    }
 }
 
 #endif
 
diff --git a/src/http/ngx_http_request.c b/src/http/ngx_http_request.c
--- a/src/http/ngx_http_request.c
+++ b/src/http/ngx_http_request.c
@@ -2694,6 +2694,8 @@ ngx_http_terminate_request(ngx_http_requ
     ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                    "http terminate request count:%d", mr->count);
 
+    mr->terminated = 1;
+
     if (rc > 0 && (mr->headers_out.status == 0 || mr->connection->sent == 0)) {
         mr->headers_out.status = rc;
     }
@@ -2716,8 +2718,11 @@ ngx_http_terminate_request(ngx_http_requ
     if (mr->write_event_handler) {
 
         if (mr->blocked) {
+            r = r->connection->data;
+
             r->connection->error = 1;
             r->write_event_handler = ngx_http_request_finalizer;
+
             return;
         }
 
diff --git a/src/http/ngx_http_request.h b/src/http/ngx_http_request.h
--- a/src/http/ngx_http_request.h
+++ b/src/http/ngx_http_request.h
@@ -550,6 +550,7 @@ struct ngx_http_request_s {
     unsigned                          root_tested:1;
     unsigned                          done:1;
     unsigned                          logged:1;
+    unsigned                          terminated:1;
 
     unsigned                          buffered:4;
 
diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c
--- a/src/http/ngx_http_upstream.c
+++ b/src/http/ngx_http_upstream.c
@@ -3997,11 +3997,11 @@ ngx_http_upstream_thread_event_handler(n
 
 #endif
 
-    if (r->done) {
+    if (r->done || r->main->terminated) {
         /*
          * trigger connection event handler if the subrequest was
-         * already finalized; this can happen if the handler is used
-         * for sendfile() in threads
+         * already finalized (this can happen if the handler is used
+         * for sendfile() in threads), or if the request was terminated
          */
 
         c->write->handler(c->write);

-- 
Maxim Dounin
http://mdounin.ru/

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
https://mailman.nginx.org/mailman/listinfo/nginx-devel