[PATCH 0 of 4] aio fixes
Hello! Here is a patch series which fixes several issues with AIO (threaded IO) and some related issues, initially inspired by ticket #2555. Most notably, it fixes various issues with incorrect request termination with subrequests when an AIO operation is running in another subrequest, as observed in ticket #2555 (first patch). Additionally, it fixes an issue with request termination due to filter finalization with ngx_event_pipe() on stack (second patch), fixes incorrect alerts about open sockets left when SIGQUIT is followed by SIGTERM (third patch), and introduces guard timers for AIO operations to prevent premature shutdown of worker processes (forth patch, ticket #2162). In tests this fixes two alerts as observed with aio_write, in image_filter_finalize.t and in proxy_cache_use_stale.t: image_filter_finalize.t (Wstat: 0 Tests: 5 Failed: 0) TODO passed: 4 proxy_cache_use_stale.t (Wstat: 0 Tests: 36 Failed: 0) TODO passed: 35 Review and testing appreciated. -- Maxim Dounin http://mdounin.ru/ ___ nginx-devel mailing list nginx-devel@nginx.org https://mailman.nginx.org/mailman/listinfo/nginx-devel
[PATCH 3 of 4] Silenced complaints about socket leaks on forced termination
# HG changeset patch # User Maxim Dounin # Date 1701049787 -10800 # Mon Nov 27 04:49:47 2023 +0300 # Node ID 61d08e4cf97cc073200ec32fc6ada9a2d48ffe51 # Parent faf0b9defc76b8683af466f8a950c2c241382970 Silenced complaints about socket leaks on forced termination. When graceful shutdown was requested, and then nginx was forced to do fast shutdown, it used to (incorrectly) complain about open sockets left in connections which weren't yet closed when fast shutdown was requested. Fix is to avoid complaining about open sockets when fast shutdown was requested after graceful one. Abnormal termination, if requested with the WINCH signal, can still happen though. diff --git a/src/os/unix/ngx_process_cycle.c b/src/os/unix/ngx_process_cycle.c --- a/src/os/unix/ngx_process_cycle.c +++ b/src/os/unix/ngx_process_cycle.c @@ -948,7 +948,7 @@ ngx_worker_process_exit(ngx_cycle_t *cyc } } -if (ngx_exiting) { +if (ngx_exiting && !ngx_terminate) { c = cycle->connections; for (i = 0; i < cycle->connection_n; i++) { if (c[i].fd != -1 @@ -963,11 +963,11 @@ ngx_worker_process_exit(ngx_cycle_t *cyc ngx_debug_quit = 1; } } +} -if (ngx_debug_quit) { -ngx_log_error(NGX_LOG_ALERT, cycle->log, 0, "aborting"); -ngx_debug_point(); -} +if (ngx_debug_quit) { +ngx_log_error(NGX_LOG_ALERT, cycle->log, 0, "aborting"); +ngx_debug_point(); } /* diff --git a/src/os/win32/ngx_process_cycle.c b/src/os/win32/ngx_process_cycle.c --- a/src/os/win32/ngx_process_cycle.c +++ b/src/os/win32/ngx_process_cycle.c @@ -834,7 +834,7 @@ ngx_worker_process_exit(ngx_cycle_t *cyc } } -if (ngx_exiting) { +if (ngx_exiting && !ngx_terminate) { c = cycle->connections; for (i = 0; i < cycle->connection_n; i++) { if (c[i].fd != (ngx_socket_t) -1 ___ nginx-devel mailing list nginx-devel@nginx.org https://mailman.nginx.org/mailman/listinfo/nginx-devel
[PATCH 2 of 4] Upstream: fixed usage of closed sockets with filter finalization
# HG changeset patch # User Maxim Dounin # Date 1701049758 -10800 # Mon Nov 27 04:49:18 2023 +0300 # Node ID faf0b9defc76b8683af466f8a950c2c241382970 # Parent a5e39e9d1f4c84dcbe6a2f9e079372a3d63aef0b Upstream: fixed usage of closed sockets with filter finalization. When filter finalization is triggered when working with an upstream server, and error_page redirects request processing to some simple handler, ngx_http_request_finalize() triggers request termination when the response is sent. In particular, via the upstream cleanup handler, nginx will close the upstream connection and the corresponding socket. Still, this can happen to be with ngx_event_pipe() on stack. While the code will set p->downstream_error due to NGX_ERROR returned from the output filter chain by filter finalization, otherwise the error will be ignored till control returns to ngx_http_upstream_process_request(). And event pipe might try reading from the (already closed) socket, resulting in "readv() failed (9: Bad file descriptor) while reading upstream" errors (or even segfaults with SSL). Such errors were seen with the following configuration: location /t2 { proxy_pass http://127.0.0.1:8080/big; image_filter_buffer 10m; image_filter resize 150 100; error_page 415 = /empty; } location /empty { return 204; } location /big { # big enough static file } Fix is to set p->upstream_error in ngx_http_upstream_finalize_request(), so the existing checks in ngx_event_pipe_read_upstream() will prevent further reading from the closed upstream connection. Similarly, p->upstream_error is now checked when handling events at ngx_event_pipe() exit, as checking p->upstream->fd is not enough if keepalive upstream connections are being used and the connection was saved to cache during request termination. diff --git a/src/event/ngx_event_pipe.c b/src/event/ngx_event_pipe.c --- a/src/event/ngx_event_pipe.c +++ b/src/event/ngx_event_pipe.c @@ -57,7 +57,9 @@ ngx_event_pipe(ngx_event_pipe_t *p, ngx_ do_write = 1; } -if (p->upstream->fd != (ngx_socket_t) -1) { +if (p->upstream->fd != (ngx_socket_t) -1 +&& !p->upstream_error) +{ rev = p->upstream->read; flags = (rev->eof || rev->error) ? NGX_CLOSE_EVENT : 0; diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c --- a/src/http/ngx_http_upstream.c +++ b/src/http/ngx_http_upstream.c @@ -4561,6 +4561,10 @@ ngx_http_upstream_finalize_request(ngx_h u->peer.connection = NULL; +if (u->pipe) { +u->pipe->upstream_error = 1; +} + if (u->pipe && u->pipe->temp_file) { ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0, "http upstream temp fd: %d", ___ nginx-devel mailing list nginx-devel@nginx.org https://mailman.nginx.org/mailman/listinfo/nginx-devel
[PATCH 4 of 4] AIO operations now add timers (ticket #2162)
# HG changeset patch # User Maxim Dounin # Date 1701050170 -10800 # Mon Nov 27 04:56:10 2023 +0300 # Node ID 00c3e7333145ddb5ea0eeaaa66b3d9c26973c9c2 # Parent 61d08e4cf97cc073200ec32fc6ada9a2d48ffe51 AIO operations now add timers (ticket #2162). Each AIO (thread IO) operation being run is now accompanied with 1-minute timer. This timer prevents unexpected shutdown of the worker process while an AIO operation is running, and logs an alert if the operation is running for too long. This fixes "open socket left" alerts during worker processes shutdown due to pending AIO (or thread IO) operations while corresponding requests have no timers. In particular, such errors were observed while reading cache headers (ticket #2162), and with worker_shutdown_timeout. diff --git a/src/http/ngx_http_copy_filter_module.c b/src/http/ngx_http_copy_filter_module.c --- a/src/http/ngx_http_copy_filter_module.c +++ b/src/http/ngx_http_copy_filter_module.c @@ -170,6 +170,8 @@ ngx_http_copy_aio_handler(ngx_output_cha file->aio->data = r; file->aio->handler = ngx_http_copy_aio_event_handler; +ngx_add_timer(&file->aio->event, 6); + r->main->blocked++; r->aio = 1; ctx->aio = 1; @@ -192,6 +194,17 @@ ngx_http_copy_aio_event_handler(ngx_even ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0, "http aio: \"%V?%V\"", &r->uri, &r->args); +if (ev->timedout) { +ngx_log_error(NGX_LOG_ALERT, c->log, 0, + "aio operation took too long"); +ev->timedout = 0; +return; +} + +if (ev->timer_set) { +ngx_del_timer(ev); +} + r->main->blocked--; r->aio = 0; @@ -273,6 +286,8 @@ ngx_http_copy_thread_handler(ngx_thread_ return NGX_ERROR; } +ngx_add_timer(&task->event, 6); + r->main->blocked++; r->aio = 1; @@ -297,6 +312,17 @@ ngx_http_copy_thread_event_handler(ngx_e ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0, "http thread: \"%V?%V\"", &r->uri, &r->args); +if (ev->timedout) { +ngx_log_error(NGX_LOG_ALERT, c->log, 0, + "thread operation took too long"); +ev->timedout = 0; +return; +} + +if (ev->timer_set) { +ngx_del_timer(ev); +} + r->main->blocked--; r->aio = 0; diff --git a/src/http/ngx_http_file_cache.c b/src/http/ngx_http_file_cache.c --- a/src/http/ngx_http_file_cache.c +++ b/src/http/ngx_http_file_cache.c @@ -705,6 +705,8 @@ ngx_http_file_cache_aio_read(ngx_http_re c->file.aio->data = r; c->file.aio->handler = ngx_http_cache_aio_event_handler; +ngx_add_timer(&c->file.aio->event, 6); + r->main->blocked++; r->aio = 1; @@ -752,6 +754,17 @@ ngx_http_cache_aio_event_handler(ngx_eve ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0, "http file cache aio: \"%V?%V\"", &r->uri, &r->args); +if (ev->timedout) { +ngx_log_error(NGX_LOG_ALERT, c->log, 0, + "aio operation took too long"); +ev->timedout = 0; +return; +} + +if (ev->timer_set) { +ngx_del_timer(ev); +} + r->main->blocked--; r->aio = 0; @@ -810,6 +823,8 @@ ngx_http_cache_thread_handler(ngx_thread return NGX_ERROR; } +ngx_add_timer(&task->event, 6); + r->main->blocked++; r->aio = 1; @@ -831,6 +846,17 @@ ngx_http_cache_thread_event_handler(ngx_ ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0, "http file cache thread: \"%V?%V\"", &r->uri, &r->args); +if (ev->timedout) { +ngx_log_error(NGX_LOG_ALERT, c->log, 0, + "thread operation took too long"); +ev->timedout = 0; +return; +} + +if (ev->timer_set) { +ngx_del_timer(ev); +} + r->main->blocked--; r->aio = 0; diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c --- a/src/http/ngx_http_upstream.c +++ b/src/http/ngx_http_upstream.c @@ -3949,6 +3949,8 @@ ngx_http_upstream_thread_handler(ngx_thr r->aio = 1; p->aio = 1; +ngx_add_timer(&task->event, 6); + return NGX_OK; } @@ -3967,6 +3969,17 @@ ngx_http_upstream_thread_event_handler(n ngx_log_debug2(NGX_LOG_DEBUG_HTTP, c->log, 0, "http upstream thread: \"%V?%V\"", &r->uri, &r->args); +if (ev->timedout) { +ngx_log_error(NGX_LOG_ALERT, c->log, 0, + "thread operation took too long"); +ev->timedout = 0; +return; +} + +if (ev->timer_set) { +ngx_del_timer(ev); +} + r->main->blocked--; r->aio = 0; diff --git a/src/os/unix/ngx_files.c b/src/os/unix/ngx_files.c --- a/src/os/unix/ngx_files.c +++ b/src/os/unix/ngx_files.c @@ -110,6 +110,8 @@ ngx_thread_read(ngx_file_t *file, u_char return NGX_ERROR; } +task->event.log = file->log; + file->thread_task = task
[PATCH 1 of 4] Fixed request termination with AIO and subrequests (ticket #2555)
# HG changeset patch # User Maxim Dounin # Date 1701049682 -10800 # Mon Nov 27 04:48:02 2023 +0300 # Node ID a5e39e9d1f4c84dcbe6a2f9e079372a3d63aef0b # Parent f366007dd23a6ce8e8427c1b3042781b618a2ade Fixed request termination with AIO and subrequests (ticket #2555). When a request was terminated due to an error via ngx_http_terminate_request() while an AIO operation was running in a subrequest, various issues were observed. This happened because ngx_http_request_finalizer() was only set in the subrequest where ngx_http_terminate_request() was called, but not in the subrequest where the AIO operation was running. After completion of the AIO operation resumed normal processing of the subrequest, leading to issues. In particular, in case of the upstream module, termination of the request called upstream cleanup, which closed the upstream connection. Attempts to further work with the upstream connection after AIO operation completion resulted in segfaults in ngx_ssl_recv(), "readv() failed (9: Bad file descriptor) while reading upstream" errors, or socket leaks. In ticket #2555, issues were observed with the following configuration with cache background update (with thread writing instrumented to introduce a delay, when a client closes the connection during an update): location = /background-and-aio-write { proxy_pass ... proxy_cache one; proxy_cache_valid 200 1s; proxy_cache_background_update on; proxy_cache_use_stale updating; aio threads; aio_write on; limit_rate 1000; } Similarly, the same issue can be seen with SSI, and can be caused by errors in subrequests, such as in the following configuration (were "/proxy" uses AIO, and "/sleep" returns 444 after some delay, causing request termination): location = /ssi-active-boom { ssi on; ssi_types *; return 200 ' '; limit_rate 1000; } Or the same with both AIO operation and the error in non-active subrequests (which needs slightly different handling, see below): location = /ssi-non-active-boom { ssi on; ssi_types *; return 200 ' '; limit_rate 1000; } Similarly, issues can be observed with just static files. However, with static files potential impact is limited due to timeout safeguards in ngx_http_writer(), and the fact that c->error is set during request termination. In a simple configuration with an AIO operation in the active subrequest, such as in the following configuration, the connection is closed right after completion of the AIO operation anyway, since ngx_http_writer() tries to write to the connection and fails due to c->error set: location = /ssi-active-static-boom { ssi on; ssi_types *; return 200 ' '; limit_rate 1000; } In the following configuration, with an AIO operation in a non-active subrequest, the connection is closed only after send_timeout expires: location = /ssi-non-active-static-boom { ssi on; ssi_types *; return 200 ' '; limit_rate 1000; } Fix is to introduce r->main->terminated flag, which is to be checked by AIO event handlers when the r->main->blocked counter is decremented. When the flag is set, handlers are expected to wake up the connection instead of the subrequest (which might be already cleaned up). Additionally, now ngx_http_request_finalizer() is always set in the active subrequest, so waking up the connection properly finalizes the request even if termination happened in a non-active subrequest. diff --git a/src/http/ngx_http_copy_filter_module.c b/src/http/ngx_http_copy_filter_module.c --- a/src/http/ngx_http_copy_filter_module.c +++ b/src/http/ngx_http_copy_filter_module.c @@ -195,9 +195,18 @@ ngx_http_copy_aio_event_handler(ngx_even r->main->blocked--; r->aio = 0; -r->write_event_handler(r); +if (r->main->terminated) { +/* + * trigger connection event handler if the request was + * terminated + */ -ngx_http_run_posted_requests(c); +c->write->handler(c->write); + +} else { +r->write_event_handler(r); +ngx_http_run_posted_requests(c); +} } #endif @@ -305,11 +314,11 @@ ngx_http_copy_thread_event_handler(ngx_e #endif -if (r->done) { +if (r->done || r->main->terminated) { /* * trigger connection event handler if the subrequest was - * already finalized; this can happen if the handler is used - * for sendfile() in threads + * already finalized (this can happen if the handler is used + * for sendfile() in threads), or if the request was terminated