Dear All,
Our analysis of Apache httpd 2.4.7 with the prefork MPM, on 32- and 64-thread
Intel Xeon 2600 series systems, using an open source three-tier social
networking web server workload, revealed performance scaling issues. In the
current software, a single listen statement (listen 80) provides better
scalability because accept is not serialized. However, when the system is under
very high load, this can leave a large number of child processes stuck in D
state. On the other hand, the serialized accept approach (used with multiple
listen statements) cannot scale under high load either. In our analysis, a
32-thread system with 2 listen statements specified could scale to just 70%
utilization, and a 64-thread system with a single listen statement specified
(listen 80, 4 network interfaces) could scale to only 60% utilization.
Based on those findings, we created a prototype patch for the prefork MPM that
improves performance and thread utilization. Linux kernel 3.9 and newer
supports SO_REUSEPORT. This feature allows multiple sockets to listen on the
same IP:port, and the kernel automatically distributes incoming connections
among them. We use this feature to create multiple duplicates of the original
listener record and partition the child processes into buckets. Each bucket
listens on 1 IP:port. On older kernels that do not have SO_REUSEPORT, we
modified the "multiple listen statement" case by creating 1 listener record for
each listen statement and partitioning the child processes into different
buckets, so that each bucket again listens on 1 IP:port.
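For readers who want to see the idea outside of httpd, here is a minimal
sketch in Python (not the patch itself, which is C code against the prefork
MPM). It opens several listening sockets on the same IP:port with
SO_REUSEPORT and maps child indexes to buckets. The function names
open_reuseport_listeners and assign_buckets are hypothetical, and the modulo
assignment is only an assumption about how children might be spread across
buckets; the patch may partition them differently.

```python
import socket


def open_reuseport_listeners(host, port, count, backlog=128):
    """Open `count` listening sockets bound to the same host:port.

    Requires a platform that exposes SO_REUSEPORT (Linux 3.9 or newer).
    If `port` is 0, the first socket picks an ephemeral port and the
    remaining sockets reuse that same port.
    """
    listeners = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Must be set on every socket before bind() for the sharing to work.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(backlog)
        listeners.append(s)
        if port == 0:
            port = s.getsockname()[1]
    return listeners


def assign_buckets(num_children, num_buckets):
    """Return a list mapping each child index to a bucket index.

    Hypothetical policy: child i goes to bucket i % num_buckets, so the
    children are spread evenly and each child accepts on one listener only.
    """
    return [child % num_buckets for child in range(num_children)]
```

With 4 listeners and 64 children, assign_buckets(64, 4) puts 16 children in
each bucket, and each child would call accept() only on its own bucket's
socket. On a kernel without SO_REUSEPORT the same bucket mapping could be
applied with one listener per listen statement instead of duplicated
listeners.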
Quick tests of the patch, running the same workload, showed a 22% throughput
increase on the 32-thread system with 2 listen statements (Linux kernel
3.10.4). With the older kernel (Linux kernel 3.8.8, without SO_REUSEPORT), we
measured a 10% performance gain. With a single listen statement (listen 80),
we observed over 2X performance improvement on modern dual-socket Intel
platforms (Linux kernel 3.10.4). In addition to the throughput improvement, we
also observed a large reduction in response time in our tests [1].
Following the feedback on the bugzilla website where we originally submitted
the patch, we removed the dependency on the APR change to simplify the patch
testing process. Thanks to Jeff Trawick for his good suggestion! As a next
step, we are also actively working on extending the patch to the worker and
event MPMs. Meanwhile, we would like to gather comments from all of you on the
current prefork patch. Please take some time to test it and let us know how it
works in your environment.
This is our first patch to the Apache community. Please help us review it and
let us know if there is anything we might revise to improve it. Your feedback
is very much appreciated.
Configuration:
<IfModule prefork.c>
ListenBacklog 105384
ServerLimit 105000
MaxClients 1024
MaxRequestsPerChild 0
StartServers 64
MinSpareServers 8
MaxSpareServers 16
</IfModule>
1. Software and workloads used in performance tests may have been optimized for
performance only on Intel microprocessors. Performance tests, such as SYSmark
and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance
tests to assist you in fully evaluating your contemplated purchases, including
the performance of that product when combined with other products.
Thanks,
Yingqi
unified.diff.httpd-2.4.7.patch
