FTR: I saw no reason to try to handle both patches... I used the so_reuseport patch as the primary patch to focus on.
I have some minor changes coming up to follow-up post the initial commit On Jun 3, 2014, at 8:51 AM, Jim Jagielski <j...@jagunet.com> wrote: > I have folded this into trunk and am currently fixing some > compile warnings and errors... > > On Jun 2, 2014, at 4:22 AM, Lu, Yingqi <yingqi...@intel.com> wrote: > >> Hi Jim, >> >> Personally, I think the second approach is better, it keeps >> ap_mpm_pod_signal () and ap_mpm_pod_killpg () exactly as the original ones, >> only modifies dummy_connection (). Please let me know if you have different >> opinions. >> >> Attached is the latest version of the two patches. They were both generated >> against trunk rev. 1598561. Please review them and let me know if there is >> anything missing. >> >> I already updated the Bugzilla database for the item 55897 and item 56279. >> >> Thanks, >> Yingqi >> >> -----Original Message----- >> From: Lu, Yingqi [mailto:yingqi...@intel.com] >> Sent: Saturday, May 31, 2014 11:48 PM >> To: dev@httpd.apache.org >> Subject: RE: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT >> support >> >> Hi Jim, >> >> Regarding to your comment #2, yes, you are right, it should be >> ap_mpm_pod_killpg(pod, retained->max_daemons_limit, i). Thanks very much for >> catching this. >> >> Regarding to your comment #1, the patch modifies the >> dummy_connection(ap_pod_t *pod) to be dummy_connection(ap_pod_t *pod, int >> child_bucket). Inside the function, the reference listen statement is >> mpm_listen[child_bucket]. And, ap_mpm_pod_signal() calls dummy_connection(). >> >> Can we just modify the return of ap_mpm_pod_signal() from >> dummy_connection(pod) to dummy_connection(pod, 0) and add >> ap_mpm_pod_signal_ex()? >> >> Or, if we need to keep ap_mpm_pod_signal() exactly as the original, I can >> modify dummy_connection() to send dummy data via all the duplicated listen >> statements. Then, we do not need child_bucket as the input parameter for >> dummy_connection(). In this case, we do not need adding >> ap_mpm_pod_signal_ex() too. >> >> I already tested the code for the above approaches and they both work. >> >> Please let me know which way you think is better. I can quickly send you an >> update for review. >> >> Thanks, >> Yingqi >> >> -----Original Message----- >> From: Lu, Yingqi >> Sent: Saturday, May 31, 2014 3:28 PM >> To: <dev@httpd.apache.org> >> Subject: Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT >> support >> >> Hi Jim, >> >> Thanks very much for your email! I will look into both of them and send an >> update tonight! >> >> Thanks, >> Yingqi >> >>> On May 31, 2014, at 9:43 AM, "Jim Jagielski" <j...@jagunet.com> wrote: >>> >>> I also see: >>> >>> /* kill off the idle ones */ >>> - ap_mpm_pod_killpg(pod, retained->max_daemons_limit); >>> + for (i = 0; i < num_buckets; i++) { >>> + ap_mpm_pod_killpg(pod[i], i, retained->max_daemons_limit); >>> + } >>> >>> >>> Is that right? Why isn't it: ap_mpm_pod_killpg(pod, >>> retained->max_daemons_limit, i); ?? >>> >>> /** >>> * Write data to the pipe-of-death, signalling that all child process >>> * should die. >>> * @param pod The pipe-of-death to write to. >>> * @param num The number of child processes to kill >>> + * @param my_bucket the bucket that holds the dying child process. >>> */ >>> -AP_DECLARE(void) ap_mpm_pod_killpg(ap_pod_t *pod, int num); >>> +AP_DECLARE(void) ap_mpm_pod_killpg(ap_pod_t *pod, int num, int >>> +child_bucket); >>> >>> Isn't 'num' the same in both implementation?? >>> >>>> On May 31, 2014, at 12:03 PM, Jim Jagielski <j...@jagunet.com> wrote: >>>> >>>> Sorry I didn't catch this earlier: >>>> >>>> I see >>>> >>>> +++ httpd-trunk.new/include/mpm_common.h 2014-05-16 13:07:03.892987491 >>>> -0400 >>>> @@ -267,16 +267,18 @@ >>>> * Write data to the pipe-of-death, signalling that one child process >>>> * should die. >>>> * @param pod the pipe-of-death to write to. >>>> + * @param my_bucket the bucket that holds the dying child process. >>>> */ >>>> -AP_DECLARE(apr_status_t) ap_mpm_pod_signal(ap_pod_t *pod); >>>> +AP_DECLARE(apr_status_t) ap_mpm_pod_signal(ap_pod_t *pod, int >>>> +child_bucket); >>>> >>>> We can change the API at this point. We could add another function, >>>> eg ap_mpm_pod_signal_ex() which takes the int param, but we can't >>>> modify ap_mpm_pod_signal() itself. >>>> >>>>> On May 30, 2014, at 11:15 AM, Lu, Yingqi <yingqi...@intel.com> wrote: >>>>> >>>>> Thank you very much! >>>>> >>>>> Thanks, >>>>> Yingqi >>>>> >>>>> -----Original Message----- >>>>> From: Jim Jagielski [mailto:j...@jagunet.com] >>>>> Sent: Friday, May 30, 2014 7:07 AM >>>>> To: dev@httpd.apache.org >>>>> Subject: Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with >>>>> SO_REUSEPORT support >>>>> >>>>> Thx! Let me review. My plan is to fold into trunk this weekend. >>>>> >>>>>> On May 16, 2014, at 2:53 PM, Lu, Yingqi <yingqi...@intel.com> wrote: >>>>>> >>>>>> Hi Jim, >>>>>> >>>>>> Thanks very much for clarifying this with me. I added #ifdef in the code >>>>>> to check _SC_NPROCESSORS_ONLN in the so_reuseport patch. Bucket patch >>>>>> does not use this parameter so that it remains the same. >>>>>> >>>>>> Attached are the two most recent patches. I already updated the bugzilla >>>>>> #55897 as well. >>>>>> >>>>>> Thanks, >>>>>> Yingqi >>>>>> >>>>>> -----Original Message----- >>>>>> From: Jim Jagielski [mailto:j...@jagunet.com] >>>>>> Sent: Thursday, May 15, 2014 7:53 AM >>>>>> To: dev@httpd.apache.org >>>>>> Subject: Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with >>>>>> SO_REUSEPORT support >>>>>> >>>>>> I was thinking more about the sysconf(_SC_NPROCESSORS_ONLN) stuff... >>>>>> We could either check for that during config/build or protect it with a >>>>>> #ifdef in the code (and create some logging so the admin nows if it was >>>>>> found or not). >>>>>> >>>>>>> On May 14, 2014, at 11:59 AM, Lu, Yingqi <yingqi...@intel.com> wrote: >>>>>>> >>>>>>> Hi Jim, >>>>>>> >>>>>>> Thanks very much for your email. >>>>>>> >>>>>>> In the SO_REUSEPORT patch, SO_REUSEPORT support is checked inside >>>>>>> listen.c file. If the feature is not supported on the OS (for example, >>>>>>> Linux kernel < 3.9), it will fall back to the original behavior. >>>>>>> >>>>>>> In the bucket patch, there is no need to check the params. With single >>>>>>> listen statement, it is just the default behavior. >>>>>>> >>>>>>> Please let me know if this answers your question. >>>>>>> >>>>>>> Thanks, >>>>>>> Yingqi >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Jim Jagielski [mailto:j...@jagunet.com] >>>>>>> Sent: Wednesday, May 14, 2014 6:57 AM >>>>>>> To: dev@httpd.apache.org >>>>>>> Subject: Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with >>>>>>> SO_REUSEPORT support >>>>>>> >>>>>>> This is very cool! >>>>>>> >>>>>>> mod_status assumes that sysconf() exists, but do we need to do a config >>>>>>> check on the params we use in these patches? >>>>>>> We look OK on Linux, FreeBSD and OSX... >>>>>>> >>>>>>> I'm +1 on folding into trunk. >>>>>>> >>>>>>>> On May 13, 2014, at 7:55 PM, Lu, Yingqi <yingqi...@intel.com> wrote: >>>>>>>> >>>>>>>> Dear All, >>>>>>>> >>>>>>>> During the last couple weeks, I spent some time extending the original >>>>>>>> two patches from prefork MPM only to all three Linux MPMs (prefork, >>>>>>>> worker and event). Attached is the latest version of the two patches. >>>>>>>> Bugzilla database has also been updated already. The ID for the two >>>>>>>> patches are #55897 and #56279. Please refer to messages below for >>>>>>>> details on both of the patches. >>>>>>>> >>>>>>>> Quick test result on modern dual socket Intel platform (Linux >>>>>>>> Kernel >>>>>>>> 3.13.9) SO_REUSEPORT patch (bugzilla #55897) >>>>>>>> 1. Prefork MPM: 1 listen statement: 2.16X throughput >>>>>>>> improvement; 2 listen statements: 2.33X throughput improvement >>>>>>>> 2. Worker MPM: 1 listen statement: 10% throughput improvement; 2 >>>>>>>> listen statements: 35% throughput improvement >>>>>>>> 3. Event MPM: 1 listen statement: 13% throughput improvement; 2 >>>>>>>> listen statements: throughput parity, but 62% response time reduction >>>>>>>> (with patch, 38% response time as original SW) >>>>>>>> >>>>>>>> Bucket patch (bugzilla #56279, only impact multiple listen statement >>>>>>>> case) >>>>>>>> 1. Prefork MPM: 2 listen statements: 42% throughput improvement >>>>>>>> 2. Worker MPM: 2 listen statements: 7% throughput improvement >>>>>>>> >>>>>>>> In all the above testing cases, significant response time reductions >>>>>>>> are observed, even with throughput improvements. >>>>>>>> >>>>>>>> Please let me know your feedback and comments. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Yingqi >>>>>>>> Software and workloads used in performance tests may have been >>>>>>>> optimized for performance only on Intel microprocessors. Performance >>>>>>>> tests, such as SYSmark and MobileMark, are measured using specific >>>>>>>> computer systems, components, software, operations and functions. Any >>>>>>>> change to any of those factors may cause the results to vary. You >>>>>>>> should consult other information and performance tests to assist you >>>>>>>> in fully evaluating your contemplated purchases, including the >>>>>>>> performance of that product when combined with other products. >>>>>>>> >>>>>>>> From: Lu, Yingqi [mailto:yingqi...@intel.com] >>>>>>>> Sent: Monday, March 17, 2014 1:41 PM >>>>>>>> To: dev@httpd.apache.org >>>>>>>> Subject: RE: FW: [PATCH ASF bugzilla# 55897]prefork_mpm patch >>>>>>>> with SO_REUSEPORT support >>>>>>>> >>>>>>>> Dear all, >>>>>>>> >>>>>>>> Based on the feedback we received, we modified this patch. Here is the >>>>>>>> most recent version. We also modified the Bugzilla database(Bugzilla# >>>>>>>> 55897 for SO_REUSEPORT patch; Bugzilla# 56279 for bucket patch). >>>>>>>> >>>>>>>> Below are the changes we made into this new version: >>>>>>>> >>>>>>>> According to Yann Ylavic and other people's comments, we separate the >>>>>>>> original patch between with and without SO_REUSEPORT into two >>>>>>>> separated patches. The SO_REUSEPORT patch does not change the original >>>>>>>> listen sockets, it just duplicate the original one into multiple ones. >>>>>>>> Since the listen sockets are identical, there is no need to change the >>>>>>>> idle_server_maintenance function. The bucket patch (without >>>>>>>> SO_REUSEPORT), on the other hand, it breaks down the original listen >>>>>>>> record (if there are multiple listen socks) to multiple listen record >>>>>>>> linked lists. In this case, idle_server_maintenance is implemented at >>>>>>>> bucket level to address the situation that imbalanced traffic occurs >>>>>>>> among different listen sockets/children buckets. In the bucket patch, >>>>>>>> the polling in the child process is removed since each child only >>>>>>>> listens to 1 sock. >>>>>>>> >>>>>>>> According to Arkadiusz Miskiewicz's comment, we make the "detection of >>>>>>>> SO_REUSEPORT" at run time. >>>>>>>> >>>>>>>> According to Jeff Trawick's comments, 1. We generate the patches >>>>>>>> against the httpd trunk. >>>>>>>> 2. We tested the current patches and they do not impact event and >>>>>>>> worker mpms. If current patches can be accepted, we would be happy to >>>>>>>> extend them to other Linux based mpms. There are not much code >>>>>>>> changes, but require some time to setup the workload to test. >>>>>>>> 3. We removed unnecessary comments and changed APLOGNO(). We also >>>>>>>> changed some of the parameter/variable/function names to better >>>>>>>> represent their meanings. >>>>>>>> 4. There should be no build-in limitations for SO_REUSEPORT patch. For >>>>>>>> bucket patch, the only thing is the number of children bucket only >>>>>>>> scales to MAX_SPAWN_RATE. If there are more than 32 (current default >>>>>>>> MAX_SPQN_RATE) listen statements specified in the httpd.conf, the >>>>>>>> number of buckets will be fixed to 32. The reason for this is because >>>>>>>> that we implement the idle_server_maintenance at bucket level, each >>>>>>>> bucket's own max_spawn_rate is set to MAX_SPAWN_RATE/num_buckets. >>>>>>>> >>>>>>>> Again, thanks very much for all the comments and feedback. Please let >>>>>>>> us know if there are more changes we need to complete to make them >>>>>>>> accepted. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Yingqi Lu >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> From: Lu, Yingqi >>>>>>>> Sent: Tuesday, March 04, 2014 10:43 AM >>>>>>>> To: dev@httpd.apache.org >>>>>>>> Subject: RE: FW: [PATCH ASF bugzilla# 55897]prefork_mpm patch >>>>>>>> with SO_REUSEPORT support >>>>>>>> >>>>>>>> Hi Jeff, >>>>>>>> >>>>>>>> Thanks very much for your time reviewing the patch! We will modify the >>>>>>>> patch according to your comments and repost it here. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Yingqi >>>>>>>> >>>>>>>> From: Jeff Trawick [mailto:traw...@gmail.com] >>>>>>>> Sent: Tuesday, March 04, 2014 10:08 AM >>>>>>>> To: Apache HTTP Server Development List >>>>>>>> Subject: Re: FW: [PATCH ASF bugzilla# 55897]prefork_mpm patch >>>>>>>> with SO_REUSEPORT support >>>>>>>> >>>>>>>> On Tue, Mar 4, 2014 at 10:35 AM, Lu, Yingqi <yingqi...@intel.com> >>>>>>>> wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> I just want to ping again on this patch to gather your feedback and >>>>>>>> comments. Please refer to the messages below for patch details. >>>>>>>> >>>>>>>> If you need any additional information/supporting data, please let us >>>>>>>> know as well. >>>>>>>> >>>>>>>> Yeah, it has been on my todo list, but I don't have time to give >>>>>>>> an in depth review at the moment. Here are a few questions/comments. >>>>>>>> (And you'll have to deal with the fact that it is unnecessarily >>>>>>>> tedious for me to evaluate higher-level considerations if there >>>>>>>> are a lot of distractions, such as the code comments below ;) But >>>>>>>> others are of course free to chime in.) >>>>>>>> >>>>>>>> The patch should be against httpd trunk. It probably won't take much >>>>>>>> time for you to create that patch and confirm basic operation. >>>>>>>> >>>>>>>> What is the impact to other MPMs, even if they shouldn't use or don't >>>>>>>> have the necessary code to use SO_REUSEPORT at this time? >>>>>>>> >>>>>>>> Have you tried the event MPM? >>>>>>>> >>>>>>>> Is there a way for the admin to choose this behavior? Most won't >>>>>>>> care, but everyone's behavior is changed AFAICT. >>>>>>>> >>>>>>>> Are there built-in limitations in this patch that we should be aware >>>>>>>> of? E.g., the free slot/spawn rate changes suggest to me that there >>>>>>>> can't be more than 1025 children??? >>>>>>>> >>>>>>>> We should assume for now that there's no reason this couldn't be >>>>>>>> committed to trunk after review/rework, so make sure it is as close as >>>>>>>> you can get it to what you think is the final form. >>>>>>>> >>>>>>>> For the configure-time check for 3.9 kernel: I think we'd also >>>>>>>> use AC_TRY_COMPILE at configure time to confirm that the >>>>>>>> SO_REUSEPORT definition is available, and not enable it if the >>>>>>>> system includes doesn't define it. (Does that cause a problem >>>>>>>> for any significant number of people?) >>>>>>>> >>>>>>>> Don't mention the patch in the patch ;) (e.g., "This function is >>>>>>>> added for the patch.") >>>>>>>> >>>>>>>> Incomplete comments on style/syntax issues: >>>>>>>> >>>>>>>> * mixing declarations and statements (e.g., "duplr->next = 0; >>>>>>>> apr_socket_t *temps;") isn't supported by all compilers and is >>>>>>>> distracting when reviewing >>>>>>>> * suitable identifier names (e.g., fix global variable "flag" and >>>>>>>> whatever else isn't appropriate; "ap_post_config_listeners" >>>>>>>> should be renamed to indicate what it does >>>>>>>> * APLOGNO(99999) and comments about fixing it: Instead put "APLOGNO()" >>>>>>>> and don't add reminders in comments >>>>>>>> * this doesn't seem portable: "int >>>>>>>> free_slots[MAX_SPAWN_RATE/num_buckets];" >>>>>>>> and so on >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Yingqi >>>>>>>> >>>>>>>> >>>>>>>> From: Lu, Yingqi >>>>>>>> Sent: Friday, January 24, 2014 3:26 PM >>>>>>>> To: dev@httpd.apache.org >>>>>>>> Subject: [PATCH ASF bugzilla# 55897]prefork_mpm patch with >>>>>>>> SO_REUSEPORT support >>>>>>>> >>>>>>>> Dear All, >>>>>>>> >>>>>>>> Our analysis of Apache httpd 2.4.7 prefork mpm, on 32 and 64 thread >>>>>>>> Intel Xeon 2600 series systems, using an open source three tier social >>>>>>>> networking web server workload, revealed performance scaling issues. >>>>>>>> In current software single listen statement (listen 80) provides >>>>>>>> better scalability due to un-serialized accept. However, when system >>>>>>>> is under very high load, this can lead to big number of child >>>>>>>> processes stuck in D state. On the other hand, the serialized accept >>>>>>>> approach cannot scale with the high load either. In our analysis, a >>>>>>>> 32-thread system, with 2 listen statements specified, could scale to >>>>>>>> just 70% utilization, and a 64-thread system, with signal listen >>>>>>>> statement specified (listen 80, 4 network interfaces), could scale to >>>>>>>> only 60% utilization. >>>>>>>> >>>>>>>> Based on those findings, we created a prototype patch for prefork mpm >>>>>>>> which extends performance and thread utilization. In Linux kernel >>>>>>>> newer than 3.9, SO_REUSEPORT is enabled. This feature allows multiple >>>>>>>> sockets listen to the same IP:port and automatically round robins >>>>>>>> connections. We use this feature to create multiple duplicated >>>>>>>> listener records of the original one and partition the child processes >>>>>>>> into buckets. Each bucket listens to 1 IP:port. In case of old kernel >>>>>>>> which does not have the SO_REUSEPORT enabled, we modified the >>>>>>>> "multiple listen statement case" by creating 1 listen record for each >>>>>>>> listen statement and partitioning the child processes into different >>>>>>>> buckets. Each bucket listens to 1 IP:port. >>>>>>>> >>>>>>>> Quick tests of the patch, running the same workload, demonstrated a >>>>>>>> 22% throughput increase with 32-threads system and 2 listen statements >>>>>>>> (Linux kernel 3.10.4). With the older kernel (Linux Kernel 3.8.8, >>>>>>>> without SO_REUSEPORT), 10% performance gain was measured. With single >>>>>>>> listen statement (listen 80) configuration, we observed over 2X >>>>>>>> performance improvements on modern dual socket Intel platforms (Linux >>>>>>>> Kernel 3.10.4). We also observed big reduction in response time, in >>>>>>>> addition to the throughput improvement gained in our tests 1. >>>>>>>> >>>>>>>> Following the feedback from the bugzilla website where we originally >>>>>>>> submitted the patch, we removed the dependency of APR change to >>>>>>>> simplify the patch testing process. Thanks Jeff Trawick for his good >>>>>>>> suggestion! We are also actively working on extending the patch to >>>>>>>> worker and event MPMs, as a next step. Meanwhile, we would like to >>>>>>>> gather comments from all of you on the current prefork patch. Please >>>>>>>> take some time test it and let us know how it works in your >>>>>>>> environment. >>>>>>>> >>>>>>>> This is our first patch to the Apache community. Please help us review >>>>>>>> it and let us know if there is anything we might revise to improve it. >>>>>>>> Your feedback is very much appreciated. >>>>>>>> >>>>>>>> Configuration: >>>>>>>> <IfModule prefork.c> >>>>>>>> ListenBacklog 105384 >>>>>>>> ServerLimit 105000 >>>>>>>> MaxClients 1024 >>>>>>>> MaxRequestsPerChild 0 >>>>>>>> StartServers 64 >>>>>>>> MinSpareServers 8 >>>>>>>> MaxSpareServers 16 >>>>>>>> </IfModule> >>>>>>>> >>>>>>>> 1. Software and workloads used in performance tests may have been >>>>>>>> optimized for performance only on Intel microprocessors. Performance >>>>>>>> tests, such as SYSmark and MobileMark, are measured using specific >>>>>>>> computer systems, components, software, operations and functions. Any >>>>>>>> change to any of those factors may cause the results to vary. You >>>>>>>> should consult other information and performance tests to assist you >>>>>>>> in fully evaluating your contemplated purchases, including the >>>>>>>> performance of that product when combined with other products. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Yingqi >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Born in Roswell... married an alien... >>>>>>>> http://emptyhammock.com/ >>>>>>>> http://edjective.org/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Born in Roswell... married an alien... >>>>>>>> http://emptyhammock.com/ >>>>>>>> http://edjective.org/ >>>>>>>> >>>>>>>> <httpd_trunk_so_reuseport.patch><httpd_trunk_bucket.patch> >>>>>> >>>>>> <httpd_trunk_so_reuseport.patch><httpd_trunk_bucket.patch> >>> >> <httpd_trunk_so_reuseport.patch><httpd_trunk_bucket.patch> >