On Mon, Feb 24, 2014 at 1:20 PM, Lu, Yingqi <yingqi...@intel.com> wrote:
> Hi All, > > > > I just want to ping again on this patch to gather your feedback and > comments. Please refer to the messages below for patch details. > > > > Thanks, > > Yingqi > Hi Yinqqi, I'm sorry that nobody has responded yet. I'll try to do so very soon. > > > *From:* Lu, Yingqi > *Sent:* Monday, February 10, 2014 11:13 AM > > *To:* dev@httpd.apache.org > *Subject:* RE: [PATCH ASF bugzilla# 55897]prefork_mpm patch with > SO_REUSEPORT support > > > > I am reattaching the patch in case you missed the original email. > > > > Thanks, > > Yingqi > > > > *From:* Lu, Yingqi > *Sent:* Monday, February 10, 2014 11:09 AM > *To:* dev@httpd.apache.org > *Subject:* RE: [PATCH ASF bugzilla# 55897]prefork_mpm patch with > SO_REUSEPORT support > > > > Hi All, > > > > I just want to ping again on this patch to see if there are any feedback > and comments. This is our first patch to the Apache community. Please let > us know if there is anything we can do to help you test and comment the > patch. > > > > Thanks, > > Yingqi > > > > *From:* Lu, Yingqi > *Sent:* Friday, January 24, 2014 3:26 PM > *To:* dev@httpd.apache.org > *Subject:* [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT > support > > > > Dear All, > > > > Our analysis of Apache httpd 2.4.7 prefork mpm, on 32 and 64 thread Intel > Xeon 2600 series systems, using an open source three tier social networking > web server workload, revealed performance scaling issues. In current > software single listen statement (listen 80) provides better scalability > due to un-serialized accept. However, when system is under very high load, > this can lead to big number of child processes stuck in D state. On the > other hand, the serialized accept approach cannot scale with the high load > either. In our analysis, a 32-thread system, with 2 listen statements > specified, could scale to just 70% utilization, and a 64-thread system, > with signal listen statement specified (listen 80, 4 network interfaces), > could scale to only 60% utilization. > > > > Based on those findings, we created a prototype patch for prefork mpm > which extends performance and thread utilization. In Linux kernel newer > than 3.9, SO_REUSEPORT is enabled. This feature allows multiple sockets > listen to the same IP:port and automatically round robins connections. We > use this feature to create multiple duplicated listener records of the > original one and partition the child processes into buckets. Each bucket > listens to 1 IP:port. In case of old kernel which does not have the > SO_REUSEPORT enabled, we modified the "multiple listen statement case" by > creating 1 listen record for each listen statement and partitioning the > child processes into different buckets. Each bucket listens to 1 IP:port. > > > > Quick tests of the patch, running the same workload, demonstrated a 22% > throughput increase with 32-threads system and 2 listen statements (Linux > kernel 3.10.4). With the older kernel (Linux Kernel 3.8.8, without > SO_REUSEPORT), 10% performance gain was measured. With single listen > statement (listen 80) configuration, we observed over 2X performance > improvements on modern dual socket Intel platforms (Linux Kernel 3.10.4). > We also observed big reduction in response time, in addition to the > throughput improvement gained in our tests 1. > > > > Following the feedback from the bugzilla website where we originally > submitted the patch, we removed the dependency of APR change to simplify > the patch testing process. Thanks Jeff Trawick for his good suggestion! We > are also actively working on extending the patch to worker and event MPMs, > as a next step. Meanwhile, we would like to gather comments from all of you > on the current prefork patch. Please take some time test it and let us know > how it works in your environment. > > > > This is our first patch to the Apache community. Please help us review it > and let us know if there is anything we might revise to improve it. Your > feedback is very much appreciated. > > > > *Configuration:* > > <IfModule prefork.c> > > ListenBacklog 105384 > > ServerLimit 105000 > > MaxClients 1024 > > MaxRequestsPerChild 0 > > StartServers 64 > > MinSpareServers 8 > > MaxSpareServers 16 > > </IfModule> > > > > 1. Software and workloads used in performance tests may have been > optimized for performance only on Intel microprocessors. Performance tests, > such as SYSmark and MobileMark, are measured using specific computer > systems, components, software, operations and functions. Any change to any > of those factors may cause the results to vary. You should consult other > information and performance tests to assist you in fully evaluating your > contemplated purchases, including the performance of that product when > combined with other products. > > > > Thanks, > > Yingqi > -- Born in Roswell... married an alien... http://emptyhammock.com/ http://edjective.org/