https://bz.apache.org/bugzilla/show_bug.cgi?id=66615
Bug ID: 66615
Summary: httpd kills keepalive connections when idle workers
available
Product: Apache httpd-2
Version: 2.4.37
Hardware: PC
OS: Linux
Status: NEW
Severity: regression
Priority: P2
Component: mpm_event
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
I have two identical VMs - 16GB RAM, 16 vCPUs. One is a fresh CentOS 7 install,
the other a fresh Rocky 8 install. I installed httpd (on CentOS 7 it's version
2.4.6 and on Rocky 8 it's 2.4.37), configured both to serve the same static
default html file, and enabled mpm_event on CentOS (mpm_event is the default on
Rocky). Then I added the following options to the default config on both servers:
```
<IfModule mpm_event_module>
ThreadsPerChild 25
StartServers 3
ServerLimit 120
MinSpareThreads 75
MaxSpareThreads 3000
MaxRequestWorkers 3000
MaxConnectionsPerChild 0
</IfModule>
```
After this was done I ran ab tests with keepalive from a different CentOS 7 VM
on the same local network. On CentOS I am able to complete 1 million requests
at 1000 concurrent connections with little to no errors; with version 2.4.37
on Rocky, however, I get a lot of failed requests due to length errors and
exceptions. The served content is static, so I assume this is because keepalive
connections are being closed by the server.
The problem occurs only when using keepalive. There are no errors when running
ab without the -k option, although throughput is lower. I can reproduce this
issue on the newest httpd built from source (2.4.57).
Here are example logs with trace1 enabled for mpm_event on Apache 2.4.37 during
the test:
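For reference, the ab invocations were essentially the following (the host
placeholder is mine; the flags correspond to the 1 million requests / 1000
concurrent / keepalive runs described above):
```
# keepalive run: 1,000,000 requests, 1000 concurrent, -k enables HTTP keep-alive
ab -k -n 1000000 -c 1000 http://<server-ip>/

# same run without keepalive: no errors, only lower throughput
ab -n 1000000 -c 1000 http://<server-ip>/
```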
```
[Tue May 23 08:21:24.206092 2023] [mpm_event:trace1] [pid 2123:tid 140575713961728] event.c(1583): Idle workers: 22
[Tue May 23 08:21:24.206300 2023] [mpm_event:debug] [pid 2291:tid 140575713961728] event.c(1580): Too many open connections (72), not accepting new conns in this process
[Tue May 23 08:21:24.206303 2023] [mpm_event:trace1] [pid 2291:tid 140575713961728] event.c(1583): Idle workers: 23
[Tue May 23 08:21:24.214594 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 71 active conns (1 lingering/0 clogged/0 suspended), 24 idle workers
[Tue May 23 08:21:24.214651 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(1580): Too many open connections (71), not accepting new conns in this process
[Tue May 23 08:21:24.214657 2023] [mpm_event:trace1] [pid 2402:tid 140575713961728] event.c(1583): Idle workers: 17
[Tue May 23 08:21:24.224628 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 71 active conns (1 lingering/0 clogged/0 suspended), 24 idle workers
[Tue May 23 08:21:24.224677 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(1580): Too many open connections (71), not accepting new conns in this process
[Tue May 23 08:21:24.224681 2023] [mpm_event:trace1] [pid 2402:tid 140575713961728] event.c(1583): Idle workers: 18
[Tue May 23 08:21:24.224986 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 70 active conns (2 lingering/0 clogged/0 suspended), 23 idle workers
[Tue May 23 08:21:24.225018 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(1580): Too many open connections (70), not accepting new conns in this process
[Tue May 23 08:21:24.225024 2023] [mpm_event:trace1] [pid 2402:tid 140575713961728] event.c(1583): Idle workers: 19
[Tue May 23 08:21:24.227927 2023] [mpm_event:debug] [pid 2121:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 73 active conns (4 lingering/0 clogged/0 suspended), 25 idle workers
[Tue May 23 08:21:24.227978 2023] [mpm_event:debug] [pid 2121:tid 140575713961728] event.c(1580): Too many open connections (73), not accepting new conns in this process
[Tue May 23 08:21:24.227982 2023] [mpm_event:trace1] [pid 2121:tid 140575713961728] event.c(1583): Idle workers: 21
[Tue May 23 08:21:24.233929 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 70 active conns (2 lingering/0 clogged/0 suspended), 24 idle workers
[Tue May 23 08:21:24.233981 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(1580): Too many open connections (70), not accepting new conns in this process
[Tue May 23 08:21:24.233987 2023] [mpm_event:trace1] [pid 2402:tid 140575713961728] event.c(1583): Idle workers: 21
[Tue May 23 08:21:24.234230 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 72 active conns (2 lingering/0 clogged/0 suspended), 24 idle workers
[Tue May 23 08:21:24.234247 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(1580): Too many open connections (72), not accepting new conns in this process
[Tue May 23 08:21:24.234250 2023] [mpm_event:trace1] [pid 2402:tid 140575713961728] event.c(1583): Idle workers: 22
[Tue May 23 08:21:24.234601 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(386): AH00457: Accepting new connections again: 70 active conns (0 lingering/0 clogged/0 suspended), 24 idle workers
[Tue May 23 08:21:24.234618 2023] [mpm_event:debug] [pid 2402:tid 140575713961728] event.c(1580): Too many open connections (70), not accepting new conns in this process
```
I can see two problems during my tests.

First, httpd does not add enough servers while the test is running. It kills
keepalive connections and logs "all workers busy or dying", but adds only up to
25 workers with the config mentioned above.

Second, httpd does not seem to register that it has free workers, even when I
set StartServers to 120. There are thousands of idle threads, yet it still logs
"all workers busy or dying" and kills keepalive connections. This can be worked
around by setting ThreadsPerChild and ThreadLimit much higher and lowering
StartServers/ServerLimit accordingly. For example, with the following settings
I can easily handle over 1500 concurrent connections without errors or
keepalive kills:
```
<IfModule mpm_event_module>
ThreadsPerChild 200
ThreadLimit 200
StartServers 10
ServerLimit 15
MinSpareThreads 75
MaxSpareThreads 3000
MaxRequestWorkers 3000
MaxConnectionsPerChild 0
</IfModule>
```
If I understand this correctly, it works like this: each server process
(StartServers/ServerLimit) has its own workers (ThreadsPerChild) and receives
connections from the listener. When ThreadsPerChild is low and the connection
rate is high, the listener frequently overfills a process's workers, which in
turn causes that process to kill its keepalive connections to free some of
them. With a higher ThreadsPerChild we can reach a much higher concurrent
connection count before hitting this problem. Please correct me if I am wrong.
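For what it's worth, my reading of the mpm_event documentation is that each
child process stops accepting once its connection count reaches roughly
ThreadsPerChild * (AsyncRequestWorkerFactor + 1), with AsyncRequestWorkerFactor
defaulting to 2. A small sketch of that arithmetic (the per-child formula is my
assumption, scaled down from the documented server-wide max_connections
formula, not taken from the code):
```python
# Rough per-process connection ceiling in mpm_event. ASSUMPTION: the documented
# server-wide formula max_connections = (AsyncRequestWorkerFactor + 1) *
# MaxRequestWorkers applies per child as (factor + 1) * ThreadsPerChild.

def per_process_conn_limit(threads_per_child, async_factor=2):
    # AsyncRequestWorkerFactor defaults to 2 in mpm_event
    return (async_factor + 1) * threads_per_child

# First config: ThreadsPerChild 25 -> ceiling of 75 connections per child,
# which would line up with the "Too many open connections (70..73)" traces.
print(per_process_conn_limit(25))

# Workaround config: ThreadsPerChild 200 -> ceiling of 600 per child, so
# 1000+ keepalive connections spread over 10-15 children fit comfortably.
print(per_process_conn_limit(200))
```
If that reading is right, it would explain why raising ThreadsPerChild helps
even though the total MaxRequestWorkers (3000) is identical in both configs.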