Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing



> Am 07.07.2021 um 18:34 schrieb Yann Ylavic :
> 
> Does this attached patch help? This is to verify an hypothesis where
> killed keepalive connections are still in the defer_linger_chain
> without a worker to handle them.
> 
> I don't see "interrupted, but continuing" triggering in your log but
> possibly it happened earlier..


[Wed Jul 07 16:48:04.237865 2021] [mpm_event:debug] [pid 55134:tid 4643098112] 
event.c(599): AH: wakeup_listener: start
[Wed Jul 07 16:48:04.237895 2021] [mpm_event:debug] [pid 55134:tid 4643098112] 
event.c(603): AH: wakeup_listener: apr_pollset_wakeup
[Wed Jul 07 16:48:04.238042 2021] [mpm_event:debug] [pid 55134:tid 4643098112] 
event.c(610): AH: wakeup_listener: ap_queue_info_term
[Wed Jul 07 16:48:04.238224 2021] [mpm_event:debug] [pid 55134:tid 
123145362530304] event.c(1971): (4)Interrupted system call: AH: pollset 
returned listener_may_exit=1 connection_count=1
[Wed Jul 07 16:48:04.238381 2021] [mpm_event:trace1] [pid 55134:tid 
123145362530304] event.c(1809): All workers are busy or dying, will close 1 
keep-alive connections
[Wed Jul 07 16:48:04.238196 2021] [mpm_event:debug] [pid 55134:tid 4643098112] 
event.c(629): AH: wakeup_listener: pthread_kill
[Wed Jul 07 16:48:04.238513 2021] [mpm_event:debug] [pid 55134:tid 
123145362530304] event.c(1971): (4)Interrupted system call: AH: pollset 
returned listener_may_exit=1 connection_count=1
[Wed Jul 07 16:48:04.238506 2021] [mpm_event:debug] [pid 55134:tid 4643098112] 
event.c(635): AH: wakeup_listener: end
[Wed Jul 07 16:48:04.238555 2021] [mpm_event:trace1] [pid 55134:tid 4643098112] 
event.c(2928): graceful termination received, join workers
[Wed Jul 07 16:48:08.443130 2021] [mpm_event:debug] [pid 55134:tid 
123145362530304] event.c(1971): (70007)The timeout specified has expired: AH: 
pollset returned listener_may_exit=1 connection_count=1
[Wed Jul 07 16:48:08.443149 2021] [mpm_event:trace1] [pid 55134:tid 
123145362530304] event.c(1809): All workers are busy or dying, will close 0 
keep-alive connections

> 
> On Wed, Jul 7, 2021 at 5:51 PM Stefan Eissing
>  wrote:
>> 
>> The things I do for you...
>> 
>> [Wed Jul 07 15:48:52.591983 2021] [mpm_event:debug] [pid 48747:tid 
>> 4385197568] event.c(599): AH: wakeup_listener: start
>> [Wed Jul 07 15:48:52.592000 2021] [mpm_event:debug] [pid 48747:tid 
>> 4385197568] event.c(603): AH: wakeup_listener: apr_pollset_wakeup
>> [Wed Jul 07 15:48:52.592046 2021] [mpm_event:debug] [pid 48747:tid 
>> 4385197568] event.c(610): AH: wakeup_listener: ap_queue_info_term
>> [Wed Jul 07 15:48:52.592188 2021] [mpm_event:debug] [pid 48747:tid 
>> 4385197568] event.c(629): AH: wakeup_listener: pthread_kill
>> [Wed Jul 07 15:48:52.592166 2021] [mpm_event:debug] [pid 48747:tid 
>> 123145502474240] event.c(1985): (4)Interrupted system call: AH: pollset 
>> returned listener_may_exit=1 connection_count=2
>> [Wed Jul 07 15:48:52.592523 2021] [mpm_event:debug] [pid 48747:tid 
>> 4385197568] event.c(635): AH: wakeup_listener: end
>> [Wed Jul 07 15:48:52.592727 2021] [mpm_event:trace1] [pid 48747:tid 
>> 4385197568] event.c(2942): graceful termination received, join workers
>> [Wed Jul 07 15:48:52.592515 2021] [mpm_event:trace1] [pid 48747:tid 
>> 123145502474240] event.c(1809): All workers are busy or dying, will close 2 
>> keep-alive connections
>> [Wed Jul 07 15:48:52.593381 2021] [mpm_event:debug] [pid 48747:tid 
>> 123145502474240] event.c(1985): AH: pollset returned listener_may_exit=1 
>> connection_count=2
>> [Wed Jul 07 15:48:52.593478 2021] [mpm_event:debug] [pid 48747:tid 
>> 123145502474240] event.c(1985): (4)Interrupted system call: AH: pollset 
>> returned listener_may_exit=1 connection_count=1
>> [Wed Jul 07 15:48:57.552623 2021] [mpm_event:debug] [pid 48747:tid 
>> 123145502474240] event.c(1985): (70007)The timeout specified has expired: 
>> AH: pollset returned listener_may_exit=1 connection_count=1
>> [Wed Jul 07 15:48:57.552658 2021] [mpm_event:trace1] [pid 48747:tid 
>> 123145502474240] event.c(1809): All workers are busy or dying, will close 0 
>> keep-alive connections
>> 
>> 
>> 
>> Btw. I opened a bottle of nice wine - this is entertaining. ;-)
>> 
>> - Stefan
>> 
>>> Am 07.07.2021 um 17:48 schrieb Yann Ylavic :
>>> 
>>> On Wed, Jul 7, 2021 at 5:45 PM Stefan Eissing
>>>  wrote:
 
> Am 07.07.2021 um 17:39 schrieb Ruediger Pluem :
> 
> Hm. The ap_log_error statically writes listener_may_exit=0. Can you put 
> the actual value of listener_may_exit in the log message?
 
 I put it statically since that was the value in the "if ()" 6 lines above. 
 it should have been visible then and there to work.
>>> 
>>> The above if () does not always break.
>>> 
>>> 
>>> Cheers;
>>> Yann.
>> 
> 



Re: child exit on self-proxy

2021-07-07 Thread Yann Ylavic
Does this attached patch help? This is to verify an hypothesis where
killed keepalive connections are still in the defer_linger_chain
without a worker to handle them.

I don't see "interrupted, but continuing" triggering in your log but
possibly it happened earlier..

On Wed, Jul 7, 2021 at 5:51 PM Stefan Eissing
 wrote:
>
> The things I do for you...
>
> [Wed Jul 07 15:48:52.591983 2021] [mpm_event:debug] [pid 48747:tid 
> 4385197568] event.c(599): AH: wakeup_listener: start
> [Wed Jul 07 15:48:52.592000 2021] [mpm_event:debug] [pid 48747:tid 
> 4385197568] event.c(603): AH: wakeup_listener: apr_pollset_wakeup
> [Wed Jul 07 15:48:52.592046 2021] [mpm_event:debug] [pid 48747:tid 
> 4385197568] event.c(610): AH: wakeup_listener: ap_queue_info_term
> [Wed Jul 07 15:48:52.592188 2021] [mpm_event:debug] [pid 48747:tid 
> 4385197568] event.c(629): AH: wakeup_listener: pthread_kill
> [Wed Jul 07 15:48:52.592166 2021] [mpm_event:debug] [pid 48747:tid 
> 123145502474240] event.c(1985): (4)Interrupted system call: AH: pollset 
> returned listener_may_exit=1 connection_count=2
> [Wed Jul 07 15:48:52.592523 2021] [mpm_event:debug] [pid 48747:tid 
> 4385197568] event.c(635): AH: wakeup_listener: end
> [Wed Jul 07 15:48:52.592727 2021] [mpm_event:trace1] [pid 48747:tid 
> 4385197568] event.c(2942): graceful termination received, join workers
> [Wed Jul 07 15:48:52.592515 2021] [mpm_event:trace1] [pid 48747:tid 
> 123145502474240] event.c(1809): All workers are busy or dying, will close 2 
> keep-alive connections
> [Wed Jul 07 15:48:52.593381 2021] [mpm_event:debug] [pid 48747:tid 
> 123145502474240] event.c(1985): AH: pollset returned listener_may_exit=1 
> connection_count=2
> [Wed Jul 07 15:48:52.593478 2021] [mpm_event:debug] [pid 48747:tid 
> 123145502474240] event.c(1985): (4)Interrupted system call: AH: pollset 
> returned listener_may_exit=1 connection_count=1
> [Wed Jul 07 15:48:57.552623 2021] [mpm_event:debug] [pid 48747:tid 
> 123145502474240] event.c(1985): (70007)The timeout specified has expired: AH: 
> pollset returned listener_may_exit=1 connection_count=1
> [Wed Jul 07 15:48:57.552658 2021] [mpm_event:trace1] [pid 48747:tid 
> 123145502474240] event.c(1809): All workers are busy or dying, will close 0 
> keep-alive connections
>
>
>
> Btw. I opened a bottle of nice wine - this is entertaining. ;-)
>
> - Stefan
>
> > Am 07.07.2021 um 17:48 schrieb Yann Ylavic :
> >
> > On Wed, Jul 7, 2021 at 5:45 PM Stefan Eissing
> >  wrote:
> >>
> >>> Am 07.07.2021 um 17:39 schrieb Ruediger Pluem :
> >>>
> >>> Hm. The ap_log_error statically writes listener_may_exit=0. Can you put 
> >>> the actual value of listener_may_exit in the log message?
> >>
> >> I put it statically since that was the value in the "if ()" 6 lines above. 
> >> it should have been visible then and there to work.
> >
> > The above if () does not always break.
> >
> >
> > Cheers;
> > Yann.
>
Index: server/mpm/event/event.c
===
--- server/mpm/event/event.c	(revision 1891217)
+++ server/mpm/event/event.c	(working copy)
@@ -1940,15 +1940,6 @@ static void * APR_THREAD_FUNC listener_thread(apr_
 rc = apr_pollset_poll(event_pollset, timeout_interval, &num, &out_pfd);
 if (rc != APR_SUCCESS) {
 if (APR_STATUS_IS_EINTR(rc)) {
-/* Woken up, if we are exiting or listeners are disabled we
- * must fall through to kill kept-alive connections or test
- * whether listeners should be re-enabled. Otherwise we only
- * need to update timeouts (logic is above, so simply restart
- * the loop).
- */
-if (!listener_may_exit && !listeners_disabled()) {
-continue;
-}
 timeout_time = 0;
 }
 else if (!APR_STATUS_IS_TIMEUP(rc)) {


Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing
The things I do for you...

[Wed Jul 07 15:48:52.591983 2021] [mpm_event:debug] [pid 48747:tid 4385197568] 
event.c(599): AH: wakeup_listener: start
[Wed Jul 07 15:48:52.592000 2021] [mpm_event:debug] [pid 48747:tid 4385197568] 
event.c(603): AH: wakeup_listener: apr_pollset_wakeup
[Wed Jul 07 15:48:52.592046 2021] [mpm_event:debug] [pid 48747:tid 4385197568] 
event.c(610): AH: wakeup_listener: ap_queue_info_term
[Wed Jul 07 15:48:52.592188 2021] [mpm_event:debug] [pid 48747:tid 4385197568] 
event.c(629): AH: wakeup_listener: pthread_kill
[Wed Jul 07 15:48:52.592166 2021] [mpm_event:debug] [pid 48747:tid 
123145502474240] event.c(1985): (4)Interrupted system call: AH: pollset 
returned listener_may_exit=1 connection_count=2
[Wed Jul 07 15:48:52.592523 2021] [mpm_event:debug] [pid 48747:tid 4385197568] 
event.c(635): AH: wakeup_listener: end
[Wed Jul 07 15:48:52.592727 2021] [mpm_event:trace1] [pid 48747:tid 4385197568] 
event.c(2942): graceful termination received, join workers
[Wed Jul 07 15:48:52.592515 2021] [mpm_event:trace1] [pid 48747:tid 
123145502474240] event.c(1809): All workers are busy or dying, will close 2 
keep-alive connections
[Wed Jul 07 15:48:52.593381 2021] [mpm_event:debug] [pid 48747:tid 
123145502474240] event.c(1985): AH: pollset returned listener_may_exit=1 
connection_count=2
[Wed Jul 07 15:48:52.593478 2021] [mpm_event:debug] [pid 48747:tid 
123145502474240] event.c(1985): (4)Interrupted system call: AH: pollset 
returned listener_may_exit=1 connection_count=1
[Wed Jul 07 15:48:57.552623 2021] [mpm_event:debug] [pid 48747:tid 
123145502474240] event.c(1985): (70007)The timeout specified has expired: AH: 
pollset returned listener_may_exit=1 connection_count=1
[Wed Jul 07 15:48:57.552658 2021] [mpm_event:trace1] [pid 48747:tid 
123145502474240] event.c(1809): All workers are busy or dying, will close 0 
keep-alive connections



event_debugv2.diff
Description: Binary data


Btw. I opened a bottle of nice wine - this is entertaining. ;-)

- Stefan

> Am 07.07.2021 um 17:48 schrieb Yann Ylavic :
> 
> On Wed, Jul 7, 2021 at 5:45 PM Stefan Eissing
>  wrote:
>> 
>>> Am 07.07.2021 um 17:39 schrieb Ruediger Pluem :
>>> 
>>> Hm. The ap_log_error statically writes listener_may_exit=0. Can you put the 
>>> actual value of listener_may_exit in the log message?
>> 
>> I put it statically since that was the value in the "if ()" 6 lines above. 
>> it should have been visible then and there to work.
> 
> The above if () does not always break.
> 
> 
> Cheers;
> Yann.



Re: child exit on self-proxy

2021-07-07 Thread Yann Ylavic
On Wed, Jul 7, 2021 at 5:45 PM Stefan Eissing
 wrote:
>
> > Am 07.07.2021 um 17:39 schrieb Ruediger Pluem :
> >
> > Hm. The ap_log_error statically writes listener_may_exit=0. Can you put the 
> > actual value of listener_may_exit in the log message?
>
> I put it statically since that was the value in the "if ()" 6 lines above. it 
> should have been visible then and there to work.

The above if () does not always break.


Cheers;
Yann.


Re: child exit on self-proxy

2021-07-07 Thread Yann Ylavic
On Wed, Jul 7, 2021 at 5:39 PM Ruediger Pluem  wrote:
>
>
>
> On 7/7/21 5:25 PM, Stefan Eissing wrote:
> > In order to reproduce the logs I see on this weird behaviour, I'll attach 
> > the patch I made:
> >
> >
> >
> >
> > With this, I see in a hanging process:
> >
> > [Wed Jul 07 15:20:23.014980 2021] [core:trace1] [pid 42471:tid 
> > 123145448968192] core_filters.c(429): (32)Broken pipe: [client 
> > 127.0.0.1:49824] core_output_filter: writing data to the network
> > [Wed Jul 07 15:20:23.044541 2021] [mpm_event:debug] [pid 42471:tid 
> > 4525448704] event.c(599): AH: wakeup_listener: start
> > [Wed Jul 07 15:20:23.044601 2021] [mpm_event:debug] [pid 42471:tid 
> > 4525448704] event.c(603): AH: wakeup_listener: apr_pollset_wakeup
> > [Wed Jul 07 15:20:23.044715 2021] [mpm_event:debug] [pid 42471:tid 
> > 4525448704] event.c(610): AH: wakeup_listener: ap_queue_info_term
> > [Wed Jul 07 15:20:23.044861 2021] [mpm_event:debug] [pid 42471:tid 
> > 123145461309440] event.c(1985): (4)Interrupted system call: AH: pollset 
> > returned listener_may_exit=0
>
> Hm. The ap_log_error statically writes listener_may_exit=0. Can you put the 
> actual value of listener_may_exit in the log message?

It would be interesting to log apr_atomic_read32(&connection_count) too.


Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing



> Am 07.07.2021 um 17:39 schrieb Ruediger Pluem :
> 
> 
> 
> On 7/7/21 5:25 PM, Stefan Eissing wrote:
>> In order to reproduce the logs I see on this weird behaviour, I'll attach 
>> the patch I made:
>> 
>> 
>> 
>> 
>> With this, I see in a hanging process:
>> 
>> [Wed Jul 07 15:20:23.014980 2021] [core:trace1] [pid 42471:tid 
>> 123145448968192] core_filters.c(429): (32)Broken pipe: [client 
>> 127.0.0.1:49824] core_output_filter: writing data to the network
>> [Wed Jul 07 15:20:23.044541 2021] [mpm_event:debug] [pid 42471:tid 
>> 4525448704] event.c(599): AH: wakeup_listener: start
>> [Wed Jul 07 15:20:23.044601 2021] [mpm_event:debug] [pid 42471:tid 
>> 4525448704] event.c(603): AH: wakeup_listener: apr_pollset_wakeup
>> [Wed Jul 07 15:20:23.044715 2021] [mpm_event:debug] [pid 42471:tid 
>> 4525448704] event.c(610): AH: wakeup_listener: ap_queue_info_term
>> [Wed Jul 07 15:20:23.044861 2021] [mpm_event:debug] [pid 42471:tid 
>> 123145461309440] event.c(1985): (4)Interrupted system call: AH: pollset 
>> returned listener_may_exit=0
> 
> Hm. The ap_log_error statically writes listener_may_exit=0. Can you put the 
> actual value of listener_may_exit in the log message?

I put it statically since that was the value in the "if ()" 6 lines above. it 
should have been visible then and there to work.

> 
>> [Wed Jul 07 15:20:23.044817 2021] [mpm_event:debug] [pid 42471:tid 
>> 4525448704] event.c(629): AH: wakeup_listener: pthread_kill
>> [Wed Jul 07 15:20:23.044954 2021] [mpm_event:trace1] [pid 42471:tid 
>> 123145461309440] event.c(1809): All workers are busy or dying, will close 1 
>> keep-alive connections
>> [Wed Jul 07 15:20:23.044987 2021] [mpm_event:debug] [pid 42471:tid 
>> 4525448704] event.c(635): AH: wakeup_listener: end
>> [Wed Jul 07 15:20:23.045154 2021] [mpm_event:trace1] [pid 42471:tid 
>> 4525448704] event.c(2940): graceful termination received, join workers
>> [Wed Jul 07 15:20:25.927437 2021] [mpm_event:debug] [pid 42471:tid 
>> 123145461309440] event.c(1985): (70007)The timeout specified has expired: 
>> AH: pollset returned listener_may_exit=0
> 
> Looks like we get back in the poll and have to wait for the timeout.
> 
>> [Wed Jul 07 15:20:25.927468 2021] [mpm_event:trace1] [pid 42471:tid 
>> 123145461309440] event.c(1809): All workers are busy or dying, will close 0 
>> keep-alive connections
>> 
>> Which says (to my eyes) that the pollset_wakeup happened, but the listener 
>> saw "listener_may_exit == 0". Is the volatile maybe not enough?
> 
> Regards
> 
> Rüdiger



Re: child exit on self-proxy

2021-07-07 Thread Ruediger Pluem



On 7/7/21 5:25 PM, Stefan Eissing wrote:
> In order to reproduce the logs I see on this weird behaviour, I'll attach the 
> patch I made:
> 
> 
> 
> 
> With this, I see in a hanging process:
> 
> [Wed Jul 07 15:20:23.014980 2021] [core:trace1] [pid 42471:tid 
> 123145448968192] core_filters.c(429): (32)Broken pipe: [client 
> 127.0.0.1:49824] core_output_filter: writing data to the network
> [Wed Jul 07 15:20:23.044541 2021] [mpm_event:debug] [pid 42471:tid 
> 4525448704] event.c(599): AH: wakeup_listener: start
> [Wed Jul 07 15:20:23.044601 2021] [mpm_event:debug] [pid 42471:tid 
> 4525448704] event.c(603): AH: wakeup_listener: apr_pollset_wakeup
> [Wed Jul 07 15:20:23.044715 2021] [mpm_event:debug] [pid 42471:tid 
> 4525448704] event.c(610): AH: wakeup_listener: ap_queue_info_term
> [Wed Jul 07 15:20:23.044861 2021] [mpm_event:debug] [pid 42471:tid 
> 123145461309440] event.c(1985): (4)Interrupted system call: AH: pollset 
> returned listener_may_exit=0

Hm. The ap_log_error statically writes listener_may_exit=0. Can you put the 
actual value of listener_may_exit in the log message?

> [Wed Jul 07 15:20:23.044817 2021] [mpm_event:debug] [pid 42471:tid 
> 4525448704] event.c(629): AH: wakeup_listener: pthread_kill
> [Wed Jul 07 15:20:23.044954 2021] [mpm_event:trace1] [pid 42471:tid 
> 123145461309440] event.c(1809): All workers are busy or dying, will close 1 
> keep-alive connections
> [Wed Jul 07 15:20:23.044987 2021] [mpm_event:debug] [pid 42471:tid 
> 4525448704] event.c(635): AH: wakeup_listener: end
> [Wed Jul 07 15:20:23.045154 2021] [mpm_event:trace1] [pid 42471:tid 
> 4525448704] event.c(2940): graceful termination received, join workers
> [Wed Jul 07 15:20:25.927437 2021] [mpm_event:debug] [pid 42471:tid 
> 123145461309440] event.c(1985): (70007)The timeout specified has expired: AH: 
> pollset returned listener_may_exit=0

Looks like we get back in the poll and have to wait for the timeout.

> [Wed Jul 07 15:20:25.927468 2021] [mpm_event:trace1] [pid 42471:tid 
> 123145461309440] event.c(1809): All workers are busy or dying, will close 0 
> keep-alive connections
> 
> Which says (to my eyes) that the pollset_wakeup happened, but the listener 
> saw "listener_may_exit == 0". Is the volatile maybe not enough?

Regards

Rüdiger



Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing
In order to reproduce the logs I see on this weird behaviour, I'll attach the 
patch I made:



event_debug.diff
Description: Binary data


With this, I see in a hanging process:

[Wed Jul 07 15:20:23.014980 2021] [core:trace1] [pid 42471:tid 123145448968192] 
core_filters.c(429): (32)Broken pipe: [client 127.0.0.1:49824] 
core_output_filter: writing data to the network
[Wed Jul 07 15:20:23.044541 2021] [mpm_event:debug] [pid 42471:tid 4525448704] 
event.c(599): AH: wakeup_listener: start
[Wed Jul 07 15:20:23.044601 2021] [mpm_event:debug] [pid 42471:tid 4525448704] 
event.c(603): AH: wakeup_listener: apr_pollset_wakeup
[Wed Jul 07 15:20:23.044715 2021] [mpm_event:debug] [pid 42471:tid 4525448704] 
event.c(610): AH: wakeup_listener: ap_queue_info_term
[Wed Jul 07 15:20:23.044861 2021] [mpm_event:debug] [pid 42471:tid 
123145461309440] event.c(1985): (4)Interrupted system call: AH: pollset 
returned listener_may_exit=0
[Wed Jul 07 15:20:23.044817 2021] [mpm_event:debug] [pid 42471:tid 4525448704] 
event.c(629): AH: wakeup_listener: pthread_kill
[Wed Jul 07 15:20:23.044954 2021] [mpm_event:trace1] [pid 42471:tid 
123145461309440] event.c(1809): All workers are busy or dying, will close 1 
keep-alive connections
[Wed Jul 07 15:20:23.044987 2021] [mpm_event:debug] [pid 42471:tid 4525448704] 
event.c(635): AH: wakeup_listener: end
[Wed Jul 07 15:20:23.045154 2021] [mpm_event:trace1] [pid 42471:tid 4525448704] 
event.c(2940): graceful termination received, join workers
[Wed Jul 07 15:20:25.927437 2021] [mpm_event:debug] [pid 42471:tid 
123145461309440] event.c(1985): (70007)The timeout specified has expired: AH: 
pollset returned listener_may_exit=0
[Wed Jul 07 15:20:25.927468 2021] [mpm_event:trace1] [pid 42471:tid 
123145461309440] event.c(1809): All workers are busy or dying, will close 0 
keep-alive connections

Which says (to my eyes) that the pollset_wakeup happened, but the listener saw 
"listener_may_exit == 0". Is the volatile maybe not enough?


> Am 07.07.2021 um 16:56 schrieb Stefan Eissing :
> 
> 
> 
>> Am 07.07.2021 um 13:51 schrieb Yann Ylavic :
>> 
>> On Wed, Jul 7, 2021 at 12:16 PM Stefan Eissing
>>  wrote:
>>> 
>>> I added a TRACE1 log in event.c before/after join_workers (line 2921) and 
>>> see:
>>> 
>>> [Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 
>>> 4493635072] event.c(2921): graceful termination received, join workers
>>> [Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid 
>>> 123145435672576] event.c(1799): All workers are busy or dying, will close 1 
>>> keep-alive connections
>>> [Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid 
>>> 123145435672576] event.c(1799): All workers are busy or dying, will close 0 
>>> keep-alive connections
>>> [Wed Jul 07 10:06:10.78 2021] [core:warn] [pid 72813:tid 4493635072] 
>>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>>> [Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072] 
>>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>>> [Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072] 
>>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>>> [Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072] 
>>> AH00046: child process 72886 still did not exit, sending a SIGKILL
>>> 
>>> So, I assume the keep-alive connection is the mod_proxy_http connection to 
>>> the sever itself. Since the join_workers() never returns, there seems to be 
>>> a thread not finishing.
>> 
>> No, MPM event doesn't handle mod_proxy kept alive connections, those
>> are client connections.
>> 
>>> 
>>> On another run, I got a stacktrace of the child:
>>> 
>>> Call graph:
>>>   264 Thread_36766542   DispatchQueue_1: com.apple.main-thread  (serial)
>>>   + 264 start  (in libdyld.dylib) + 1  [0x7fff2032cf5d]
>>>   +   264 main  (in httpd) + 2278  [0x104729b86]  main.c:862
>>>   + 264 ap_run_mpm  (in httpd) + 75  [0x10473917b]  mpm_common.c:100
>>>   +   264 event_run  (in mod_mpm_event.so) + 2994  [0x10502ae62]  
>>> event.c:3398
>>>   + 264 make_child  (in mod_mpm_event.so) + 436  [0x10502bbc4]  
>>> event.c:2997
>>>   +   264 child_main  (in mod_mpm_event.so) + 1734  [0x10502c2e6]  
>>> event.c:2924
>>>   + 264 join_workers  (in mod_mpm_event.so) + 386  
>>> [0x10502cc72]  event.c:2717
>>>   +   264 apr_thread_join  (in libapr-2.0.dylib) + 44  
>>> [0x1048b347c]  thread.c:256
>>>   + 264 _pthread_join  (in libsystem_pthread.dylib) + 362  
>>> [0x7fff20312f60]
>>>   +   264 __ulock_wait  (in libsystem_kernel.dylib) + 10  
>>> [0x7fff202dd9ee]
>> 
>> OK, that's where the child waits for worker threads.
>> 
>>>   264 Thread_36766548
>>>   + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>>   +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>>> [0x7fff203118fc]
>>>   + 264 dummy_worker  (in l

Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing



> Am 07.07.2021 um 13:51 schrieb Yann Ylavic :
> 
> On Wed, Jul 7, 2021 at 12:16 PM Stefan Eissing
>  wrote:
>> 
>> I added a TRACE1 log in event.c before/after join_workers (line 2921) and 
>> see:
>> 
>> [Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 
>> 4493635072] event.c(2921): graceful termination received, join workers
>> [Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid 
>> 123145435672576] event.c(1799): All workers are busy or dying, will close 1 
>> keep-alive connections
>> [Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid 
>> 123145435672576] event.c(1799): All workers are busy or dying, will close 0 
>> keep-alive connections
>> [Wed Jul 07 10:06:10.78 2021] [core:warn] [pid 72813:tid 4493635072] 
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072] 
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072] 
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072] 
>> AH00046: child process 72886 still did not exit, sending a SIGKILL
>> 
>> So, I assume the keep-alive connection is the mod_proxy_http connection to 
>> the sever itself. Since the join_workers() never returns, there seems to be 
>> a thread not finishing.
> 
> No, MPM event doesn't handle mod_proxy kept alive connections, those
> are client connections.
> 
>> 
>> On another run, I got a stacktrace of the child:
>> 
>> Call graph:
>>264 Thread_36766542   DispatchQueue_1: com.apple.main-thread  (serial)
>>+ 264 start  (in libdyld.dylib) + 1  [0x7fff2032cf5d]
>>+   264 main  (in httpd) + 2278  [0x104729b86]  main.c:862
>>+ 264 ap_run_mpm  (in httpd) + 75  [0x10473917b]  mpm_common.c:100
>>+   264 event_run  (in mod_mpm_event.so) + 2994  [0x10502ae62]  
>> event.c:3398
>>+ 264 make_child  (in mod_mpm_event.so) + 436  [0x10502bbc4]  
>> event.c:2997
>>+   264 child_main  (in mod_mpm_event.so) + 1734  [0x10502c2e6]  
>> event.c:2924
>>+ 264 join_workers  (in mod_mpm_event.so) + 386  
>> [0x10502cc72]  event.c:2717
>>+   264 apr_thread_join  (in libapr-2.0.dylib) + 44  
>> [0x1048b347c]  thread.c:256
>>+ 264 _pthread_join  (in libsystem_pthread.dylib) + 362  
>> [0x7fff20312f60]
>>+   264 __ulock_wait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202dd9ee]
> 
> OK, that's where the child waits for worker threads.
> 
>>264 Thread_36766548
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
>>264 Thread_36766549
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
>>264 Thread_36766550
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
>>264 Thread_36766551
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
> 
> The above mod_http2 threads are not accounted for by the MPM

Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing
Was busy on other things. Will try to get answers...

> Am 07.07.2021 um 13:51 schrieb Yann Ylavic :
> 
> On Wed, Jul 7, 2021 at 12:16 PM Stefan Eissing
>  wrote:
>> 
>> I added a TRACE1 log in event.c before/after join_workers (line 2921) and 
>> see:
>> 
>> [Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 
>> 4493635072] event.c(2921): graceful termination received, join workers
>> [Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid 
>> 123145435672576] event.c(1799): All workers are busy or dying, will close 1 
>> keep-alive connections
>> [Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid 
>> 123145435672576] event.c(1799): All workers are busy or dying, will close 0 
>> keep-alive connections
>> [Wed Jul 07 10:06:10.78 2021] [core:warn] [pid 72813:tid 4493635072] 
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072] 
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072] 
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072] 
>> AH00046: child process 72886 still did not exit, sending a SIGKILL
>> 
>> So, I assume the keep-alive connection is the mod_proxy_http connection to 
>> the sever itself. Since the join_workers() never returns, there seems to be 
>> a thread not finishing.
> 
> No, MPM event doesn't handle mod_proxy kept alive connections, those
> are client connections.
> 
>> 
>> On another run, I got a stacktrace of the child:
>> 
>> Call graph:
>>264 Thread_36766542   DispatchQueue_1: com.apple.main-thread  (serial)
>>+ 264 start  (in libdyld.dylib) + 1  [0x7fff2032cf5d]
>>+   264 main  (in httpd) + 2278  [0x104729b86]  main.c:862
>>+ 264 ap_run_mpm  (in httpd) + 75  [0x10473917b]  mpm_common.c:100
>>+   264 event_run  (in mod_mpm_event.so) + 2994  [0x10502ae62]  
>> event.c:3398
>>+ 264 make_child  (in mod_mpm_event.so) + 436  [0x10502bbc4]  
>> event.c:2997
>>+   264 child_main  (in mod_mpm_event.so) + 1734  [0x10502c2e6]  
>> event.c:2924
>>+ 264 join_workers  (in mod_mpm_event.so) + 386  
>> [0x10502cc72]  event.c:2717
>>+   264 apr_thread_join  (in libapr-2.0.dylib) + 44  
>> [0x1048b347c]  thread.c:256
>>+ 264 _pthread_join  (in libsystem_pthread.dylib) + 362  
>> [0x7fff20312f60]
>>+   264 __ulock_wait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202dd9ee]
> 
> OK, that's where the child waits for worker threads.
> 
>>264 Thread_36766548
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
>>264 Thread_36766549
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
>>264 Thread_36766550
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
>>264 Thread_36766551
>>+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
>>+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
>> [0x7fff203118fc]
>>+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
>> thread.c:148
>>+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
>> h2_workers.c:260
>>+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
>> [0x7fff20311e49]
>>+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
>> [0x7fff202decde]
> 
> The above

Re: migration of the HTTPD project website

2021-07-07 Thread Daniel Gruno

On 07/07/2021 14.14, Eric Covener wrote:

https://bz.apache.org/bugzilla/show_bug.cgi?id=65439

whoops, looks like the manual links from for the following are not working:

https://httpd.apache.org/mod_ftp/
https://httpd.apache.org/mod_cgid/

Daniel, do you recall how this worked before? Maybe we lost some of
this extra content when you moved where httpd.apache.org points?


I know where it originally pointed too, and I'll try to weave that into 
the new site checkout. Should apply within 30 minutes.





On Mon, Jul 5, 2021 at 4:15 AM Yann Ylavic  wrote:


On Mon, Jul 5, 2021 at 9:56 AM Ruediger Pluem  wrote:


I don't think that we want to release them. I am fine with omitting trunk from 
the path. Anyone objected to

svn mv https://svn.apache.org/repos/asf/httpd/site/trunk/tools 
https://svn.apache.org/repos/asf/httpd/dev-tools/

?


+1

Regards;
Yann.




--
Eric Covener
cove...@gmail.com





Re: migration of the HTTPD project website

2021-07-07 Thread Eric Covener
https://bz.apache.org/bugzilla/show_bug.cgi?id=65439

whoops, looks like the manual links from for the following are not working:

https://httpd.apache.org/mod_ftp/
https://httpd.apache.org/mod_cgid/

Daniel, do you recall how this worked before? Maybe we lost some of
this extra content when you moved where httpd.apache.org points?


On Mon, Jul 5, 2021 at 4:15 AM Yann Ylavic  wrote:
>
> On Mon, Jul 5, 2021 at 9:56 AM Ruediger Pluem  wrote:
> >
> > I don't think that we want to release them. I am fine with omitting trunk 
> > from the path. Anyone objected to
> >
> > svn mv https://svn.apache.org/repos/asf/httpd/site/trunk/tools 
> > https://svn.apache.org/repos/asf/httpd/dev-tools/
> >
> > ?
>
> +1
>
> Regards;
> Yann.



--
Eric Covener
cove...@gmail.com


Re: child exit on self-proxy

2021-07-07 Thread Yann Ylavic
On Wed, Jul 7, 2021 at 12:16 PM Stefan Eissing
 wrote:
>
> I added a TRACE1 log in event.c before/after join_workers (line 2921) and see:
>
> [Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 
> 4493635072] event.c(2921): graceful termination received, join workers
> [Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid 
> 123145435672576] event.c(1799): All workers are busy or dying, will close 1 
> keep-alive connections
> [Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid 
> 123145435672576] event.c(1799): All workers are busy or dying, will close 0 
> keep-alive connections
> [Wed Jul 07 10:06:10.78 2021] [core:warn] [pid 72813:tid 4493635072] 
> AH00045: child process 72886 still did not exit, sending a SIGTERM
> [Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072] 
> AH00045: child process 72886 still did not exit, sending a SIGTERM
> [Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072] 
> AH00045: child process 72886 still did not exit, sending a SIGTERM
> [Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072] 
> AH00046: child process 72886 still did not exit, sending a SIGKILL
>
> So, I assume the keep-alive connection is the mod_proxy_http connection to 
> the sever itself. Since the join_workers() never returns, there seems to be a 
> thread not finishing.

No, MPM event doesn't handle mod_proxy kept alive connections, those
are client connections.

>
> On another run, I got a stacktrace of the child:
>
> Call graph:
> 264 Thread_36766542   DispatchQueue_1: com.apple.main-thread  (serial)
> + 264 start  (in libdyld.dylib) + 1  [0x7fff2032cf5d]
> +   264 main  (in httpd) + 2278  [0x104729b86]  main.c:862
> + 264 ap_run_mpm  (in httpd) + 75  [0x10473917b]  mpm_common.c:100
> +   264 event_run  (in mod_mpm_event.so) + 2994  [0x10502ae62]  
> event.c:3398
> + 264 make_child  (in mod_mpm_event.so) + 436  [0x10502bbc4]  
> event.c:2997
> +   264 child_main  (in mod_mpm_event.so) + 1734  [0x10502c2e6]  
> event.c:2924
> + 264 join_workers  (in mod_mpm_event.so) + 386  
> [0x10502cc72]  event.c:2717
> +   264 apr_thread_join  (in libapr-2.0.dylib) + 44  
> [0x1048b347c]  thread.c:256
> + 264 _pthread_join  (in libsystem_pthread.dylib) + 362  
> [0x7fff20312f60]
> +   264 __ulock_wait  (in libsystem_kernel.dylib) + 10  
> [0x7fff202dd9ee]

OK, that's where the child waits for worker threads.

> 264 Thread_36766548
> + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
> +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
> [0x7fff203118fc]
> + 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
> thread.c:148
> +   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
> h2_workers.c:260
> + 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
> [0x7fff20311e49]
> +   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
> [0x7fff202decde]
> 264 Thread_36766549
> + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
> +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
> [0x7fff203118fc]
> + 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
> thread.c:148
> +   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
> h2_workers.c:260
> + 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
> [0x7fff20311e49]
> +   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
> [0x7fff202decde]
> 264 Thread_36766550
> + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
> +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
> [0x7fff203118fc]
> + 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
> thread.c:148
> +   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
> h2_workers.c:260
> + 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
> [0x7fff20311e49]
> +   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
> [0x7fff202decde]
> 264 Thread_36766551
> + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
> +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  
> [0x7fff203118fc]
> + 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
> thread.c:148
> +   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
> h2_workers.c:260
> + 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
> [0x7fff20311e49]
> +   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
> [0x7fff202decde]

The above mod_http2 threads are not accounted for by the MPM I suppose
(they should be stopped by the future child_exit hook, but still the
MPM will only care about its own worker threads, i.e. thos

Re: child exit on self-proxy

2021-07-07 Thread Stefan Eissing



> Am 07.07.2021 um 11:49 schrieb Ruediger Pluem :
> 
> 
> 
> On 7/7/21 11:45 AM, Stefan Eissing wrote:
>> In my h2 test suite, I do a setup where I use proxy configs against the 
>> server itself. We seem to have a problem performing a clean child exit with 
>> that. Test in 2.4.48 and trunk:
>> - run tests with several graceful restarts
>> - no proxied request, clean exit
>> - with proxied requests
>>  AH00045: child process 53682 still did not exit, sending a SIGTERM
>>  AH00045: child process 53682 still did not exit, sending a SIGTERM
>>  [often stops here, sometimes]
>>  ...
>>  AH00046: child process 53682 still did not exit, sending a SIGKILL
>> 
>> Question:
>> - is such a test setup doomed to fail in general?
>> - are we sure that we cannot encounter such states in "normal" setups?
>> 
>> If someone wants at log at whatever LogLevels, let me know. It's seems 
>> highly reproducible.
> 
> Do you have stack traces where these processes are hanging and a simple 
> config that causes this?

I added a TRACE1 log in event.c before/after join_workers (line 2921) and see:

[Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 4493635072] 
event.c(2921): graceful termination received, join workers
[Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid 
123145435672576] event.c(1799): All workers are busy or dying, will close 1 
keep-alive connections
[Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid 
123145435672576] event.c(1799): All workers are busy or dying, will close 0 
keep-alive connections
[Wed Jul 07 10:06:10.78 2021] [core:warn] [pid 72813:tid 4493635072] 
AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072] 
AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072] 
AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072] 
AH00046: child process 72886 still did not exit, sending a SIGKILL

So, I assume the keep-alive connection is the mod_proxy_http connection to the 
sever itself. Since the join_workers() never returns, there seems to be a 
thread not finishing.

On another run, I got a stacktrace of the child:

Call graph:
264 Thread_36766542   DispatchQueue_1: com.apple.main-thread  (serial)
+ 264 start  (in libdyld.dylib) + 1  [0x7fff2032cf5d]
+   264 main  (in httpd) + 2278  [0x104729b86]  main.c:862
+ 264 ap_run_mpm  (in httpd) + 75  [0x10473917b]  mpm_common.c:100
+   264 event_run  (in mod_mpm_event.so) + 2994  [0x10502ae62]  
event.c:3398
+ 264 make_child  (in mod_mpm_event.so) + 436  [0x10502bbc4]  
event.c:2997
+   264 child_main  (in mod_mpm_event.so) + 1734  [0x10502c2e6]  
event.c:2924
+ 264 join_workers  (in mod_mpm_event.so) + 386  [0x10502cc72]  
event.c:2717
+   264 apr_thread_join  (in libapr-2.0.dylib) + 44  
[0x1048b347c]  thread.c:256
+ 264 _pthread_join  (in libsystem_pthread.dylib) + 362  
[0x7fff20312f60]
+   264 __ulock_wait  (in libsystem_kernel.dylib) + 10  
[0x7fff202dd9ee]
264 Thread_36766548
+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
thread.c:148
+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
h2_workers.c:260
+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
[0x7fff20311e49]
+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
[0x7fff202decde]
264 Thread_36766549
+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
thread.c:148
+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
h2_workers.c:260
+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
[0x7fff20311e49]
+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
[0x7fff202decde]
264 Thread_36766550
+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
+   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
+ 264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  
thread.c:148
+   264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  
h2_workers.c:260
+ 264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  
[0x7fff20311e49]
+   264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  
[0x7fff202decde]
264 Thread_36766551
+ 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
+   

Re: child exit on self-proxy

2021-07-07 Thread Ruediger Pluem



On 7/7/21 11:45 AM, Stefan Eissing wrote:
> In my h2 test suite, I do a setup where I use proxy configs against the 
> server itself. We seem to have a problem performing a clean child exit with 
> that. Test in 2.4.48 and trunk:
> - run tests with several graceful restarts
> - no proxied request, clean exit
> - with proxied requests
>   AH00045: child process 53682 still did not exit, sending a SIGTERM
>   AH00045: child process 53682 still did not exit, sending a SIGTERM
>   [often stops here, sometimes]
>   ...
>   AH00046: child process 53682 still did not exit, sending a SIGKILL
> 
> Question:
> - is such a test setup doomed to fail in general?
> - are we sure that we cannot encounter such states in "normal" setups?
> 
> If someone wants at log at whatever LogLevels, let me know. It's seems highly 
> reproducible.

Do you have stack traces where these processes are hanging and a simple config 
that causes this?

Regards

Rüdiger



child exit on self-proxy

2021-07-07 Thread Stefan Eissing
In my h2 test suite, I do a setup where I use proxy configs against the server 
itself. We seem to have a problem performing a clean child exit with that. Test 
in 2.4.48 and trunk:
- run tests with several graceful restarts
- no proxied request, clean exit
- with proxied requests
  AH00045: child process 53682 still did not exit, sending a SIGTERM
  AH00045: child process 53682 still did not exit, sending a SIGTERM
  [often stops here, sometimes]
  ...
  AH00046: child process 53682 still did not exit, sending a SIGKILL

Question:
- is such a test setup doomed to fail in general?
- are we sure that we cannot encounter such states in "normal" setups?

If someone wants at log at whatever LogLevels, let me know. It's seems highly 
reproducible.

- Stefan

Re: svn commit: r1891321 - in /httpd/test/framework/trunk/t: conf/proxy.conf.in modules/proxy_balancer.t

2021-07-07 Thread jean-frederic clere

On 07/07/2021 09:00, Ruediger Pluem wrote:



On 7/6/21 6:56 PM, jfcl...@apache.org wrote:

Author: jfclere
Date: Tue Jul  6 16:56:47 2021
New Revision: 1891321

URL: http://svn.apache.org/viewvc?rev=1891321&view=rev
Log:
Add tests for dynamic part of mod_proxy_balancer.

Modified:
 httpd/test/framework/trunk/t/conf/proxy.conf.in
 httpd/test/framework/trunk/t/modules/proxy_balancer.t




Modified: httpd/test/framework/trunk/t/modules/proxy_balancer.t
URL: 
http://svn.apache.org/viewvc/httpd/test/framework/trunk/t/modules/proxy_balancer.t?rev=1891321&r1=1891320&r2=1891321&view=diff
==
--- httpd/test/framework/trunk/t/modules/proxy_balancer.t (original)
+++ httpd/test/framework/trunk/t/modules/proxy_balancer.t Tue Jul  6 16:56:47 
2021



@@ -54,4 +87,39 @@ foreach my $t (@echos) {
  skip $skipbodyfailover, t_cmp($r->content, $t, "response body echoed");
  }
  
+# test dynamic part

+$r = GET("/balancer-manager");
+ok t_cmp($r->code, 200, "Can't find balancer-manager");
+
+# get the nonce and add a worker
+my $result = GetNonce("/balancer-manager", "dynproxy");
+
+my $query = 
"b_lbm=byrequests&b_tmo=0&b_max=0&b_sforce=0&b_ss=&b_nwrkr=ajp%3A%2F%2F%5B0%3A0%3A0%3A0%3A0%3A0%3A0%3A1%5D%3A8080&b_wyes=1&b=dynproxy&nonce="
 . $result;
+my @proxy_balancer_headers;
+my $vars   = Apache::Test::vars();
+push @proxy_balancer_headers, "Referer" => "http://"; . $vars->{servername} . ":" . 
$vars->{port} . "/balancer-manager";
+
+# First try with the referer it should fail.


s/with/without/ ?


Oops fixed, thanks.




+if (have_min_apache_version("2.4.41")) {
+  $r = POST("/balancer-manager", content => $query);
+  ok t_cmp($r->code, 200, "request failed");
+  ok !t_cmp($r->content, qr/ajp/, "AJP worker created");
+}
  


Regards

Rüdiger




--
Cheers

Jean-Frederic



Re: svn commit: r1891321 - in /httpd/test/framework/trunk/t: conf/proxy.conf.in modules/proxy_balancer.t

2021-07-07 Thread Ruediger Pluem



On 7/6/21 6:56 PM, jfcl...@apache.org wrote:
> Author: jfclere
> Date: Tue Jul  6 16:56:47 2021
> New Revision: 1891321
> 
> URL: http://svn.apache.org/viewvc?rev=1891321&view=rev
> Log:
> Add tests for dynamic part of mod_proxy_balancer.
> 
> Modified:
> httpd/test/framework/trunk/t/conf/proxy.conf.in
> httpd/test/framework/trunk/t/modules/proxy_balancer.t
> 

> Modified: httpd/test/framework/trunk/t/modules/proxy_balancer.t
> URL: 
> http://svn.apache.org/viewvc/httpd/test/framework/trunk/t/modules/proxy_balancer.t?rev=1891321&r1=1891320&r2=1891321&view=diff
> ==
> --- httpd/test/framework/trunk/t/modules/proxy_balancer.t (original)
> +++ httpd/test/framework/trunk/t/modules/proxy_balancer.t Tue Jul  6 16:56:47 
> 2021

> @@ -54,4 +87,39 @@ foreach my $t (@echos) {
>  skip $skipbodyfailover, t_cmp($r->content, $t, "response body echoed");
>  }
>  
> +# test dynamic part
> +$r = GET("/balancer-manager");
> +ok t_cmp($r->code, 200, "Can't find balancer-manager");
> +
> +# get the nonce and add a worker
> +my $result = GetNonce("/balancer-manager", "dynproxy");
> +
> +my $query = 
> "b_lbm=byrequests&b_tmo=0&b_max=0&b_sforce=0&b_ss=&b_nwrkr=ajp%3A%2F%2F%5B0%3A0%3A0%3A0%3A0%3A0%3A0%3A1%5D%3A8080&b_wyes=1&b=dynproxy&nonce="
>  . $result;
> +my @proxy_balancer_headers;
> +my $vars   = Apache::Test::vars();
> +push @proxy_balancer_headers, "Referer" => "http://"; . $vars->{servername} . 
> ":" . $vars->{port} . "/balancer-manager";
> +
> +# First try with the referer it should fail.

s/with/without/ ?

> +if (have_min_apache_version("2.4.41")) {
> +  $r = POST("/balancer-manager", content => $query);
> +  ok t_cmp($r->code, 200, "request failed");
> +  ok !t_cmp($r->content, qr/ajp/, "AJP worker created");
> +}
>  

Regards

Rüdiger