Re: [users@httpd] graceful-stop closes established connections without response
On 1/31/24 3:11 PM, Sherrard Burton wrote: two is about par for the course _when there are resets_. but it doesn't necessarily happen on every test run. for example, after initially applying the v1 patch (and having fully composed a response to say that the patch seemed to have worked :-)), i discovered that i had forgotten to switch back from prefork to event. ie, i ran the test a few times in a row with no resets, even though the patch was not in play. i have not previously, but i will try to gather some statistics about number of resets per failed run and failed vs successful runs, since the combination of the two is needed to make any quantitative assessment. correction: two is par for the course when the client connection is remote. in order to have an iterable test setup, i created a bash loop that started the tcpdump, strace and curl instances all on the server while simultaneously gracefully stopping apache. with client connection coming from the localhost, we end up with many more reset connections per run, and failures present in each test. but ultimately, the results are such that i don't think any changes are warranted. stock debian apache 2.4.57-2, 100 iterations: average number of established connections/submitted requests: 67284 / 100 = 672.84 average number of responses received: 65162 / 100 = 651.62 average number of reset connections: 2122 / 100 = 21.22 patched with v3, 100 iterations: average number of established connections/submitted requests: 68319 / 100 = 683.19 average number of responses received: 66082 / 100 = 660.82 average number of reset connections: 2237 / 100 = 22.37 not sure if the slight increase in connections established and handled under the patched version indicates that there might be some benefit to the patch, or if the difference is small enough that it can be chalked up slight differences in resource demands in the VM host at the time that the tests were running. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On 1/31/24 06:24 AM, Yann Ylavic wrote: On Tue, Jan 30, 2024 at 8:24 PM Sherrard Burton wrote: i have confirmed that the patch has been applied, and the behavior still persists, as confirmed by comparing the counts of [SYN,ACK] and accept() ~$ tcpdump -n -r /tmp/tcpdump.pcap | grep -Fc '[S.]'; grep -Fh 'accept4' /tmp/strace-apache2.out.* | grep -Fc .240.209 reading from file /tmp/tcpdump.pcap, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 Warning: interface names might be incorrect 3485 3483 This means those two connections came in (or were made available by the system) after the last accept() call, which is the race condition that httpd can do nothing about unfortunately. How much does it improve compared to non-patched httpd, how many reset connections without the patch? If not significant I don't think it's worth attempting to do something about it.. two is about par for the course _when there are resets_. but it doesn't necessarily happen on every test run. for example, after initially applying the v1 patch (and having fully composed a response to say that the patch seemed to have worked :-)), i discovered that i had forgotten to switch back from prefork to event. ie, i ran the test a few times in a row with no resets, even though the patch was not in play. i have not previously, but i will try to gather some statistics about number of resets per failed run and failed vs successful runs, since the combination of the two is needed to make any quantitative assessment. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Tue, Jan 30, 2024 at 8:24 PM Sherrard Burton wrote: > > i have confirmed that the patch has been applied, and the behavior still > persists, as confirmed by comparing the counts of [SYN,ACK] and accept() > > ~$ tcpdump -n -r /tmp/tcpdump.pcap | grep -Fc '[S.]'; grep -Fh 'accept4' > /tmp/strace-apache2.out.* | grep -Fc .240.209 > reading from file /tmp/tcpdump.pcap, link-type LINUX_SLL2 (Linux cooked > v2), snapshot length 262144 > Warning: interface names might be incorrect > 3485 > 3483 This means those two connections came in (or were made available by the system) after the last accept() call, which is the race condition that httpd can do nothing about unfortunately. How much does it improve compared to non-patched httpd, how many reset connections without the patch? If not significant I don't think it's worth attempting to do something about it.. Regards; Yann. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
i have confirmed that the patch has been applied, and the behavior still persists, as confirmed by comparing the counts of [SYN,ACK] and accept() ~$ tcpdump -n -r /tmp/tcpdump.pcap | grep -Fc '[S.]'; grep -Fh 'accept4' /tmp/strace-apache2.out.* | grep -Fc .240.209 reading from file /tmp/tcpdump.pcap, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 Warning: interface names might be incorrect 3485 3483 as well as ~$ for i in '> GET /index.html' '< HTTP/1.1 200 OK'; do echo -e "'$i':\t$( cat /tmp/curl-[0-9]*.out | grep -Fc "$i" )"; done '> GET /index.html': 3485 '< HTTP/1.1 200 OK': 3483 ~$ grep -F reset /tmp/curl-[0-9]*.out /tmp/curl-13.out:* Recv failure: Connection reset by peer /tmp/curl-13.out:curl: (56) Recv failure: Connection reset by peer /tmp/curl-2.out:* Recv failure: Connection reset by peer /tmp/curl-2.out:curl: (56) Recv failure: Connection reset by peer which all seem to agree that there were two failed connections and, in case this shines some light on things: ~$ grep -F '19:10:50' /var/log/apache2/error.log | grep -F XXX [Tue Jan 30 19:10:50.357074 2024] [mpm_event:notice] [pid 151381:tid 140267709454016] XXX: may exit (0, 1) [Tue Jan 30 19:10:50.357498 2024] [mpm_event:notice] [pid 151382:tid 140267709454016] XXX: may exit (0, 2) [Tue Jan 30 19:10:50.357610 2024] [mpm_event:notice] [pid 151381:tid 140267709454016] XXX: draining [Tue Jan 30 19:10:50.358011 2024] [mpm_event:notice] [pid 151381:tid 140267709454016] XXX: exiting [Tue Jan 30 19:10:50.358091 2024] [mpm_event:notice] [pid 151382:tid 140267709454016] XXX: draining [Tue Jan 30 19:10:50.358201 2024] [mpm_event:notice] [pid 151381:tid 140267709454016] XXX: closing [Tue Jan 30 19:10:50.358492 2024] [mpm_event:notice] [pid 151382:tid 140267709454016] XXX: exiting [Tue Jan 30 19:10:50.358906 2024] [mpm_event:notice] [pid 151382:tid 140267709454016] XXX: closing [Tue Jan 30 19:10:50.366948 2024] [mpm_event:notice] [pid 151381:tid 140267709454016] XXX: exited [Tue Jan 30 19:10:50.367124 2024] [mpm_event:notice] [pid 151382:tid 140267709454016] XXX: exited On 1/30/24 06:09 AM, Yann Ylavic wrote: On Tue, Jan 30, 2024 at 11:54 AM Yann Ylavic wrote: On Tue, Jan 30, 2024 at 4:37 AM Sherrard Burton wrote: i was going to add some debugging lines, but when i took a quick look at the patch, i wasn't clear on which sections of the code i should be guaranteed to hit. can you be so kind as to send an updated patch with some gratuitous logging in the appropriate sections so that there will be positive affirmation that the patch has (or hasn't) been applied and is falling into the expected sections? Sure, here is a v2 (which also includes a fix w.r.t. v1). Argh, please use this v3 instead, I missed that EINTR could interfere and should be ignored while draining. Regards; Yann. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Tue, Jan 30, 2024 at 11:54 AM Yann Ylavic wrote: > > On Tue, Jan 30, 2024 at 4:37 AM Sherrard Burton wrote: > > > > i was going to add some debugging lines, but when i took a quick look at > > the patch, i wasn't clear on which sections of the code i should be > > guaranteed to hit. can you be so kind as to send an updated patch with > > some gratuitous logging in the appropriate sections so that there will > > be positive affirmation that the patch has (or hasn't) been applied and > > is falling into the expected sections? > > Sure, here is a v2 (which also includes a fix w.r.t. v1). Argh, please use this v3 instead, I missed that EINTR could interfere and should be ignored while draining. > > Regards; > Yann. Index: server/mpm/event/event.c === --- server/mpm/event/event.c (revision 1915442) +++ server/mpm/event/event.c (working copy) @@ -174,7 +174,7 @@ static int had_healthy_child = 0; static volatile int dying = 0; static volatile int workers_may_exit = 0; static volatile int start_thread_may_exit = 0; -static volatile int listener_may_exit = 0; +static volatile apr_uint32_t listener_may_exit = 0; static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */ static int num_listensocks = 0; static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access @@ -481,8 +481,7 @@ static void disable_listensocks(void) static void enable_listensocks(void) { int i; -if (listener_may_exit -|| apr_atomic_cas32(&listensocks_disabled, 0, 1) != 1) { +if (dying || apr_atomic_cas32(&listensocks_disabled, 0, 1) != 1) { return; } ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, APLOGNO(00457) @@ -575,8 +574,7 @@ static void wakeup_listener(void) ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, "wake up listener%s", listener_may_exit ? " again" : ""); -listener_may_exit = 1; -disable_listensocks(); +apr_atomic_cas32(&listener_may_exit, 1, 0); /* Unblock the listener if it's poll()ing */ if (event_pollset && listener_is_wakeable) { @@ -1184,12 +1182,9 @@ read_request: cs->pub.state = CONN_STATE_READ_REQUEST_LINE; goto read_request; } -else if (!listener_may_exit) { +else { cs->pub.state = CONN_STATE_CHECK_REQUEST_LINE_READABLE; } -else { -cs->pub.state = CONN_STATE_LINGER; -} } if (cs->pub.state == CONN_STATE_CHECK_REQUEST_LINE_READABLE) { @@ -1256,18 +1251,21 @@ static void check_infinite_requests(void) } } -static int close_listeners(int *closed) +static int close_listeners(void) { ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf, "clos%s listeners (connection_count=%u)", - *closed ? "ed" : "ing", apr_atomic_read32(&connection_count)); -if (!*closed) { + dying ? "ed" : "ing", apr_atomic_read32(&connection_count)); +if (!dying) { int i; +dying = 1; /* once */ + +ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf, + "XXX: closing"); + ap_close_listeners_ex(my_bucket->listeners); -*closed = 1; /* once */ -dying = 1; ap_scoreboard_image->parent[ap_child_slot].quiescing = 1; for (i = 0; i < threads_per_child; ++i) { ap_update_child_status_from_indexes(ap_child_slot, i, @@ -1654,8 +1652,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ proc_info *ti = dummy; int process_slot = ti->pslot; struct process_score *ps = ap_get_scoreboard_process(process_slot); -int closed = 0; -int have_idle_worker = 0; +int have_idle_worker = 0, exiting = 0; apr_time_t last_log; last_log = apr_time_now(); @@ -1678,8 +1675,8 @@ static void * APR_THREAD_FUNC listener_thread(apr_ if (conns_this_child <= 0) check_infinite_requests(); -if (listener_may_exit) { -int first_close = close_listeners(&closed); +if (exiting) { +int first_close = close_listeners(); if (terminate_mode == ST_UNGRACEFUL || apr_atomic_read32(&connection_count) == 0) @@ -1710,7 +1707,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ apr_atomic_read32(keepalive_q->total), apr_atomic_read32(&lingering_count), apr_atomic_read32(&suspended_count)); -if (dying) { +if (exiting) { ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf, "%u/%u workers shutdown", apr_atomic_read32(&threads_shutdown), @@ -1792,6 +1789,13 @@ static void * APR_THREAD_FUNC listener_thread(apr_ } num = 0;
Re: [users@httpd] graceful-stop closes established connections without response
On Tue, Jan 30, 2024 at 4:37 AM Sherrard Burton wrote: > > i was going to add some debugging lines, but when i took a quick look at > the patch, i wasn't clear on which sections of the code i should be > guaranteed to hit. can you be so kind as to send an updated patch with > some gratuitous logging in the appropriate sections so that there will > be positive affirmation that the patch has (or hasn't) been applied and > is falling into the expected sections? Sure, here is a v2 (which also includes a fix w.r.t. v1). Regards; Yann. Index: server/mpm/event/event.c === --- server/mpm/event/event.c (revision 1915442) +++ server/mpm/event/event.c (working copy) @@ -174,7 +174,7 @@ static int had_healthy_child = 0; static volatile int dying = 0; static volatile int workers_may_exit = 0; static volatile int start_thread_may_exit = 0; -static volatile int listener_may_exit = 0; +static volatile apr_uint32_t listener_may_exit = 0; static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */ static int num_listensocks = 0; static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access @@ -481,8 +481,7 @@ static void disable_listensocks(void) static void enable_listensocks(void) { int i; -if (listener_may_exit -|| apr_atomic_cas32(&listensocks_disabled, 0, 1) != 1) { +if (dying || apr_atomic_cas32(&listensocks_disabled, 0, 1) != 1) { return; } ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, APLOGNO(00457) @@ -575,8 +574,7 @@ static void wakeup_listener(void) ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, "wake up listener%s", listener_may_exit ? " again" : ""); -listener_may_exit = 1; -disable_listensocks(); +apr_atomic_cas32(&listener_may_exit, 1, 0); /* Unblock the listener if it's poll()ing */ if (event_pollset && listener_is_wakeable) { @@ -1184,12 +1182,9 @@ read_request: cs->pub.state = CONN_STATE_READ_REQUEST_LINE; goto read_request; } -else if (!listener_may_exit) { +else { cs->pub.state = CONN_STATE_CHECK_REQUEST_LINE_READABLE; } -else { -cs->pub.state = CONN_STATE_LINGER; -} } if (cs->pub.state == CONN_STATE_CHECK_REQUEST_LINE_READABLE) { @@ -1256,18 +1251,21 @@ static void check_infinite_requests(void) } } -static int close_listeners(int *closed) +static int close_listeners(void) { ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf, "clos%s listeners (connection_count=%u)", - *closed ? "ed" : "ing", apr_atomic_read32(&connection_count)); -if (!*closed) { + dying ? "ed" : "ing", apr_atomic_read32(&connection_count)); +if (!dying) { int i; +dying = 1; /* once */ + +ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf, + "XXX: closing"); + ap_close_listeners_ex(my_bucket->listeners); -*closed = 1; /* once */ -dying = 1; ap_scoreboard_image->parent[ap_child_slot].quiescing = 1; for (i = 0; i < threads_per_child; ++i) { ap_update_child_status_from_indexes(ap_child_slot, i, @@ -1654,8 +1652,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ proc_info *ti = dummy; int process_slot = ti->pslot; struct process_score *ps = ap_get_scoreboard_process(process_slot); -int closed = 0; -int have_idle_worker = 0; +int have_idle_worker = 0, exiting = 0; apr_time_t last_log; last_log = apr_time_now(); @@ -1678,8 +1675,8 @@ static void * APR_THREAD_FUNC listener_thread(apr_ if (conns_this_child <= 0) check_infinite_requests(); -if (listener_may_exit) { -int first_close = close_listeners(&closed); +if (exiting) { +int first_close = close_listeners(); if (terminate_mode == ST_UNGRACEFUL || apr_atomic_read32(&connection_count) == 0) @@ -1710,7 +1707,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ apr_atomic_read32(keepalive_q->total), apr_atomic_read32(&lingering_count), apr_atomic_read32(&suspended_count)); -if (dying) { +if (exiting) { ap_log_error(APLOG_MARK, APLOG_TRACE6, 0, ap_server_conf, "%u/%u workers shutdown", apr_atomic_read32(&threads_shutdown), @@ -1792,6 +1789,10 @@ static void * APR_THREAD_FUNC listener_thread(apr_ } num = 0; } +if (!exiting && apr_atomic_read32(&listener_may_exit)) { +ap_log_error(APLOG_MARK, APLOG_NOTICE, 0, ap_server_conf, + "XXX: may exit (%
Re: [users@httpd] graceful-stop closes established connections without response
On 1/29/24 12:25 PM, Yann Ylavic wrote: That's where we are, I think, if this first/light patch eventually helps significantly with the "local" graceful-stop which you care about still, it's possibly worth it since it requires no opt-in (but needs testing..), but going further looks overkill/risky for httpd. i (theoretically) applied your patch to the debian source and rebuilt and installed the packages, but the behavior still persists. it has been a long time since i have had to patch debian source (it was apache back then as well :-) thundering herd patch IIRC) so i'm not 100% sure that i've gotten it right. mod_mpm_event.so does indeed have an updated modtime -rw-r--r-- 1 root root 75984 Jan 30 00:13 /usr/lib/apache2/modules/mod_mpm_event.so vs -rw-r--r-- 1 root root 75984 Apr 13 2023 /usr/lib/apache2/modules/mod_mpm_event.so but the identical file size makes me wonder whether i was successful in applying the patch. i was going to add some debugging lines, but when i took a quick look at the patch, i wasn't clear on which sections of the code i should be guaranteed to hit. can you be so kind as to send an updated patch with some gratuitous logging in the appropriate sections so that there will be positive affirmation that the patch has (or hasn't) been applied and is falling into the expected sections? - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Mon, Jan 29, 2024 at 4:59 PM Sherrard Burton wrote: > > On 1/29/24 10:17 AM, Yann Ylavic wrote: > > On Mon, Jan 29, 2024 at 3:06 PM Eric Covener wrote: > > > > The patch helps in this case because we no longer close the listening > > sockets unconditionally, I mean without first checking if there are > > new connections in the backlog. So I thought the option was needed > > because if nothing stops new connections from arriving it could > > prevent the child from stopping indefinitely? How could we know if a > > LB/VIP is in place? > > it sounds like this issue is all but resolved, but i would like to > understand whether the above (preventing the child from stopping > indefinitely) is an actual possibility. > > my (naive) expectation is that if a given child has been signaled while > handling an existing request then it "knows" not to accept() a new > request after completing the existing request. so it seems that the > child is not under any danger of continuing indefinitely, regardless of > the contents of the backlog. Yes, a stopping child won't accept any new connection currently in httpd, but this is what I proposed to change: each child continues to accept new connections after the graceful signal, until there is nothing to accept anymore. Though this needs an opt-in obviously. Sorry for the confusion because this is not what the patch I initially proposed is doing either, the patch simply allows for one more try at emptying the backlog after the signal was received, so it won't by itself prevent the child from stopping, but it might (likely) not be enough if resets don't happen mainly because of some bad timing in the listener thread (which this patch addresses, only). So before we go to the opt-in, as Eric said, we might as well consider that since it's not fully addressable in httpd anyway (without races), we'd rather let this be handled outside httpd (better/fully). That's where we are, I think, if this first/light patch eventually helps significantly with the "local" graceful-stop which you care about still, it's possibly worth it since it requires no opt-in (but needs testing..), but going further looks overkill/risky for httpd. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On 1/29/24 10:17 AM, Yann Ylavic wrote: On Mon, Jan 29, 2024 at 3:06 PM Eric Covener wrote: The patch helps in this case because we no longer close the listening sockets unconditionally, I mean without first checking if there are new connections in the backlog. So I thought the option was needed because if nothing stops new connections from arriving it could prevent the child from stopping indefinitely? How could we know if a LB/VIP is in place? it sounds like this issue is all but resolved, but i would like to understand whether the above (preventing the child from stopping indefinitely) is an actual possibility. my (naive) expectation is that if a given child has been signaled while handling an existing request then it "knows" not to accept() a new request after completing the existing request. so it seems that the child is not under any danger of continuing indefinitely, regardless of the contents of the backlog. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On 1/29/24 9:05 AM, Eric Covener wrote: Maybe I wasn't clear enough but this patch makes sense only if there is something in place that prevents new connections from arriving at the stopping httpd children processes (like a frontend/load-balancer or a tcp/bpf filter), otherwise they may never really stop which does not help for a graceful stop/restart obviously. So this change (if useful) should be guarded by a GracefulDrain on/off or something config option to not hurt the other use cases. Thanks Yann! It seems to me If there is no such LB/VIP that stops new connections from landing on this server, the new option should be avoided. But if there is such a LB/VIP, the option is not really needed. Is it fair? indeed. i tried to convey this in my most-recent response, but i think you put it much more clearly. we have essentially starting removing servers from the backend pool before issuing the graceful-stop. _when_ this is feasible, it does obviate the need for any changes at the httpd level. but it still leaves us with the possibility of resetting established connections during unplanned shutdown, such as an unexpected host reboot. but, as indicated previously, i think i am satisfied with this approach after gaining understanding the underlying mechanisms. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On 1/29/24 8:59 AM, Yann Ylavic wrote: Maybe I wasn't clear enough but this patch makes sense only if there is something in place that prevents new connections from arriving at the stopping httpd children processes (like a frontend/load-balancer or a tcp/bpf filter), otherwise they may never really stop which does not help for a graceful stop/restart obviously. So this change (if useful) should be guarded by a GracefulDrain on/off or something config option to not hurt the other use cases. Yann, thank you for the feedback and the thorough explanation (i hadn't thought about the OS-level backlog). after discovery of this problem i began removing the target server from the backend pool in the load-balancer before issuing the graceful-stop. this seems to work as expected for planned maintenance, but the problem can still occur if the backend server ends up needing to be stopped outside of my direct command. we know that we cannot guard against all situations, but the systemd unit files on debian seem to trigger graceful-stop for its 'stop' action ExecStop=/usr/sbin/apachectl graceful-stop so if indeed this were addressable within httpd (which it sounds like it is not) then we would theoretically be protected from all but unexpected hard crashes in httpd or at the OS level. my takeaway at this point, not having yet finished reading Eric's other responses on this thread, is that this mostly "is what it is", and i think that i am (provisionally) satisfied with knowing that the behavior that i am observing is not a bug, per-se. ps--is there any possibility of "shrinking" this window by tuning either ListenBacklog or any analogous settings at the OS level? or do i risk creating a different problem? i ask because theoretically, the backlog/queuing (from the client perspective) should mostly be handled by the load-balancer, which is currently L4, not L7. so if reducing/eliminating the backlog on the server might positively impact this situation then i may spend some time on it. thanks again - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Mon, Jan 29, 2024 at 4:21 PM Eric Covener wrote: > > > > It seems to me If there is no such LB/VIP that stops new connections > > > from landing on this server, the new option should be avoided. > > > > Correct. > > > > > But if there is such a LB/VIP, the option is not really needed. Is it > > > fair? > > > > The patch helps in this case because we no longer close the listening > > sockets unconditionally, I mean without first checking if there are > > new connections in the backlog. So I thought the option was needed > > because if nothing stops new connections from arriving it could > > prevent the child from stopping indefinitely? How could we know if a > > LB/VIP is in place? > > I mean the initial patch vs. the status quo, not just the opt-in part. Even if there is a LB that stops routing new connections to the stopping httpd we might kill the ones that are in the backlog already. But yes I suppose that the switch on the LB could precede the graceful-stop by a few seconds to let httpd drain the backlog normally, in any case the race is hardly addressable fully in httpd so we might consider doing nothing to minimize it too, that's fair enough :) - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
> > It seems to me If there is no such LB/VIP that stops new connections > > from landing on this server, the new option should be avoided. > > Correct. > > > But if there is such a LB/VIP, the option is not really needed. Is it fair? > > The patch helps in this case because we no longer close the listening > sockets unconditionally, I mean without first checking if there are > new connections in the backlog. So I thought the option was needed > because if nothing stops new connections from arriving it could > prevent the child from stopping indefinitely? How could we know if a > LB/VIP is in place? I mean the initial patch vs. the status quo, not just the opt-in part. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Mon, Jan 29, 2024 at 3:06 PM Eric Covener wrote: > > > Maybe I wasn't clear enough but this patch makes sense only if there > > is something in place that prevents new connections from arriving at > > the stopping httpd children processes (like a frontend/load-balancer > > or a tcp/bpf filter), otherwise they may never really stop which does > > not help for a graceful stop/restart obviously. So this change (if > > useful) should be guarded by a GracefulDrain on/off or something > > config option to not hurt the other use cases. > > Thanks Yann! > > It seems to me If there is no such LB/VIP that stops new connections > from landing on this server, the new option should be avoided. Correct. > But if there is such a LB/VIP, the option is not really needed. Is it fair? The patch helps in this case because we no longer close the listening sockets unconditionally, I mean without first checking if there are new connections in the backlog. So I thought the option was needed because if nothing stops new connections from arriving it could prevent the child from stopping indefinitely? How could we know if a LB/VIP is in place? - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
> Maybe I wasn't clear enough but this patch makes sense only if there > is something in place that prevents new connections from arriving at > the stopping httpd children processes (like a frontend/load-balancer > or a tcp/bpf filter), otherwise they may never really stop which does > not help for a graceful stop/restart obviously. So this change (if > useful) should be guarded by a GracefulDrain on/off or something > config option to not hurt the other use cases. Thanks Yann! It seems to me If there is no such LB/VIP that stops new connections from landing on this server, the new option should be avoided. But if there is such a LB/VIP, the option is not really needed. Is it fair? - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Mon, Jan 29, 2024 at 2:23 PM Yann Ylavic wrote: > > On Sun, Jan 28, 2024 at 5:26 AM Sherrard Burton wrote: > > > > On 1/27/24 09:46 PM, Eric Covener wrote: > > > > > > Both worker and event MPMs have a dedicated listener thread per child > > > process, so it will close those copies of the listening sockets much > > > more quickly. > > > > so that i am clear, are you saying that this behavior is still possible, > > although less likely under the worker and event MPMs? > > I think it's possible regardless of the MPM, and there is quite little > a server can do about it without the help of the system or some > tcp/bpf filter (something that prepares the graceful shutdown at the > system level to prevent the 3-way handshake from completing). > This is because when the connections are ready to be accept()ed (i.e. > in the listening socket's backlog), they are already fully established > and likely contain the request data (at least partly), the system has > done this underneath httpd already. > So if/when httpd closes its listening socket(s) all the connections in > the backlog(s) are lost/reset anyway, and there is always going to be > a race condition with the draining of the backlog if nothing stops new > connections from being established at the system level. > > To minimize the race condition maybe httpd can do better at trying to > drain the backlog before closing the listeners. Does the attached > patch help for instance (it's against mpm_event 2.4.x)? > But I don't think it can be fully solved at httpd level anyway, with > this change the effective stop could be longer (so long as there are > incoming/pending connections routed to each child by the system), it > could even last forever theoretically if connections keep coming > indefinitely.. Maybe I wasn't clear enough but this patch makes sense only if there is something in place that prevents new connections from arriving at the stopping httpd children processes (like a frontend/load-balancer or a tcp/bpf filter), otherwise they may never really stop which does not help for a graceful stop/restart obviously. So this change (if useful) should be guarded by a GracefulDrain on/off or something config option to not hurt the other use cases. > > Regards; > Yann. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
On Sun, Jan 28, 2024 at 5:26 AM Sherrard Burton wrote: > > On 1/27/24 09:46 PM, Eric Covener wrote: > > > > Both worker and event MPMs have a dedicated listener thread per child > > process, so it will close those copies of the listening sockets much > > more quickly. > > so that i am clear, are you saying that this behavior is still possible, > although less likely under the worker and event MPMs? I think it's possible regardless of the MPM, and there is quite little a server can do about it without the help of the system or some tcp/bpf filter (something that prepares the graceful shutdown at the system level to prevent the 3-way handshake from completing). This is because when the connections are ready to be accept()ed (i.e. in the listening socket's backlog), they are already fully established and likely contain the request data (at least partly), the system has done this underneath httpd already. So if/when httpd closes its listening socket(s) all the connections in the backlog(s) are lost/reset anyway, and there is always going to be a race condition with the draining of the backlog if nothing stops new connections from being established at the system level. To minimize the race condition maybe httpd can do better at trying to drain the backlog before closing the listeners. Does the attached patch help for instance (it's against mpm_event 2.4.x)? But I don't think it can be fully solved at httpd level anyway, with this change the effective stop could be longer (so long as there are incoming/pending connections routed to each child by the system), it could even last forever theoretically if connections keep coming indefinitely.. Regards; Yann. Index: server/mpm/event/event.c === --- server/mpm/event/event.c (revision 1915442) +++ server/mpm/event/event.c (working copy) @@ -174,7 +174,7 @@ static int had_healthy_child = 0; static volatile int dying = 0; static volatile int workers_may_exit = 0; static volatile int start_thread_may_exit = 0; -static volatile int listener_may_exit = 0; +static volatile apr_uint32_t listener_may_exit = 0; static int listener_is_wakeable = 0;/* Pollset supports APR_POLLSET_WAKEABLE */ static int num_listensocks = 0; static apr_int32_t conns_this_child;/* MaxConnectionsPerChild, only access @@ -481,8 +481,7 @@ static void disable_listensocks(void) static void enable_listensocks(void) { int i; -if (listener_may_exit -|| apr_atomic_cas32(&listensocks_disabled, 0, 1) != 1) { +if (dying || apr_atomic_cas32(&listensocks_disabled, 0, 1) != 1) { return; } ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, APLOGNO(00457) @@ -575,8 +574,7 @@ static void wakeup_listener(void) ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, ap_server_conf, "wake up listener%s", listener_may_exit ? " again" : ""); -listener_may_exit = 1; -disable_listensocks(); +apr_atomic_cas32(&listener_may_exit, 1, 0); /* Unblock the listener if it's poll()ing */ if (event_pollset && listener_is_wakeable) { @@ -1184,12 +1182,9 @@ read_request: cs->pub.state = CONN_STATE_READ_REQUEST_LINE; goto read_request; } -else if (!listener_may_exit) { +else { cs->pub.state = CONN_STATE_CHECK_REQUEST_LINE_READABLE; } -else { -cs->pub.state = CONN_STATE_LINGER; -} } if (cs->pub.state == CONN_STATE_CHECK_REQUEST_LINE_READABLE) { @@ -1654,7 +1649,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ proc_info *ti = dummy; int process_slot = ti->pslot; struct process_score *ps = ap_get_scoreboard_process(process_slot); -int closed = 0; +int may_exit = 0, closed = 0; int have_idle_worker = 0; apr_time_t last_log; @@ -1678,7 +1673,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ if (conns_this_child <= 0) check_infinite_requests(); -if (listener_may_exit) { +if (may_exit) { int first_close = close_listeners(&closed); if (terminate_mode == ST_UNGRACEFUL @@ -1899,7 +1894,7 @@ static void * APR_THREAD_FUNC listener_thread(apr_ "Idle workers: %u", ap_queue_info_num_idlers(worker_queue_info)); } -else if (!listener_may_exit) { +else { void *csd = NULL; ap_listen_rec *lr = (ap_listen_rec *) pt->baton; apr_pool_t *ptrans; /* Pool for per-transaction stuff */ @@ -1960,6 +1955,14 @@ static void * APR_THREAD_FUNC listener_thread(apr_ } /* if:else on pt->type */ } /* for processing poll */ +/* On graceful shutdown/stop we can close the listening sockets + * since the backlog should be drained now. +
Re: [users@httpd] graceful-stop closes established connections without response
Eric, thanks for the quick reply. follow-up inline below: On 1/27/24 09:46 PM, Eric Covener wrote: apache2: 2.4.56-1~deb11u2, prefork MPM, mod_perl I think it's a large window on prefork where this can happen. If any process is busy processing a request, it cannot close its copy of the listening socket. The OS will continue to complete TCP connections and acknowledge (some) data with nobody calling accept(). When the last of the listening sockets is finally closed, the TCP connections that arrived in this timeframe will be abruptly closed. is this window essentially the time between when the parent process calls close() on its copy of the listening socket and when the last of the child processes is handling its request and can call close() on the last remaining copy of the listening socket? if so, then it sounds like the window becomes wider and this situation becomes more likely if the children are handling longer-running requests. Both worker and event MPMs have a dedicated listener thread per child process, so it will close those copies of the listening sockets much more quickly. so that i am clear, are you saying that this behavior is still possible, although less likely under the worker and event MPMs? NVM, actually, it appears that i may have already answered this question by not paying enough attention to detail when setting up my test environment. i unintentionally replicated the behavior under the event MPM when my intention was to use prefork for the sake of consistency. but i would still like to better understand "how much" the window is reduced under worker and event. for example, would they be similarly-effected by a child that is still handling a long-running request, or would the listener thread be able to close() its copy of the socket while the handling thread is still serving content to the client? thanks again - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
Re: [users@httpd] graceful-stop closes established connections without response
> apache2: 2.4.56-1~deb11u2, prefork MPM, mod_perl I think it's a large window on prefork where this can happen. If any process is busy processing a request, it cannot close its copy of the listening socket. The OS will continue to complete TCP connections and acknowledge (some) data with nobody calling accept(). When the last of the listening sockets is finally closed, the TCP connections that arrived in this timeframe will be abruptly closed. Both worker and event MPMs have a dedicated listener thread per child process, so it will close those copies of the listening sockets much more quickly. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org
[users@httpd] graceful-stop closes established connections without response
good day all, i have an issue where for a small subset of requests issued during a graceful-stop the client is not receiving connection refused (immediate [RST,ACK] after [SYN]), nor receiving a full response, but the client is rather completing the handshake, receiving an [ACK] after being allowed to send the full request, but then receiving a [RST,ACK] immediately after that [ACK]. as a result, a client with direct access reports 'connection reset by peer', a client with access through a L4 load balancer reports 'empty reply from server' and a client accessing via a reverse proxy receives a 502. all of these seem to be consistent with the observed behavior of abruptly closing the connection. i have the tcpdumps and the strace output to accompany the above behavior, but before deluging the list with attachments, i wanted to first see a) if anyone has experienced similar behavior during graceful-stop b) if anyone can point me to an explanation of this behavior (and let me know where my search skills are lacking :-)) c) if it is my understanding of the expected behavior of graceful-stop is at fault here the particulars of my setup: Debian GNU/Linux 11 (bullseye) apache2: 2.4.56-1~deb11u2, prefork MPM, mod_perl my testing involves starting a series of curl loops in parallel, then calling graceful-stop on the running server. client: ~$ for i in {1..6}; do for j in {1..1000}; do curl ...; done; done server: ~$ apache2ctl graceful-stop && sleep 5 && systemctl start apache2 and the above behavior can usually be observed for a handful of requests during each test run. and from comparing the strace and tcpdump output, in _appears_ that the connection is being dropped between the time that it is received by the parent apache process, but before one of the child processes can accept() it. ie, for the ephemeral port used for the errant connections, there is no matching accept() in any of the strace output. i hope that i have provided enough initial information and a sufficient description. thanks in advance for any help, pointers or correction. - To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org For additional commands, e-mail: users-h...@httpd.apache.org