Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT support

2014-03-05 Thread Yann Ylavic
Hi Yingqi,

I'm a bit confused about the patch, mainly because it seems to handle both
cases (with and without SO_REUSEPORT available) the same way, while
SO_REUSEPORT could (IMHO) be handled in the children only (a less intrusive way).

With SO_REUSEPORT, I would have expected the accept mutex to be useless
since, if I understand the option correctly, multiple processes/threads can
accept() simultaneously provided they use their own socket (each one
bound/listening on the same addr:port).
Couldn't each child then duplicate the listeners (i.e. new
socket+setsockopt(SO_REUSEPORT)+bind+listen) before switching UIDs, then
poll() all of them without synchronisation (accept() is probably not an
option for timeout reasons), and get fair scheduling from the OS (for all
the listeners)?
Is the lock still needed because the duplicated listeners are inherited
from the parent process?

Without SO_REUSEPORT, still if I understand correctly, each child will
poll() a single listener to avoid the serialized accept.
On the other hand, since each child is dedicated to one listener, won't one
have to multiply the configured ServerLimit by the number of Listen
statements to achieve the same (maximum theoretical) scalability with
regard to all the listeners?
I'm not claiming it is a good or bad thing, just trying to figure out what
could then be a rule to size the configuration (e.g.
MaxClients/ServerLimit/#cores/#Listen).

It seems to me that the patches with and without SO_REUSEPORT should be
separate ones, but I may be missing something.

Also (this is not related to this patch in particular; addressed to whoever
knows), it's unclear to me why an accept mutex is needed at all.
Multiple processes poll()ing the same inherited socket is safe, but not
multiple sockets? Is that an OS issue? Process-wide only? Still (in)valid in
the latest OSes?

Thanks for the patch anyway, it looks promising.

Regards,
Yann.

On Sat, Jan 25, 2014 at 12:25 AM, Lu, Yingqi yingqi...@intel.com wrote:

  Dear All,
 
 Our analysis of Apache httpd 2.4.7 prefork MPM on 32- and 64-thread Intel
 Xeon 2600 series systems, using an open source three-tier social networking
 web server workload, revealed performance scaling issues. In the current
 software, a single Listen statement (Listen 80) provides better scalability
 due to unserialized accept. However, when the system is under very high
 load, this can lead to a large number of child processes stuck in D state.

 On the other hand, the serialized accept approach cannot scale under high
 load either. In our analysis, a 32-thread system with 2 Listen statements
 could scale to just 70% utilization, and a 64-thread system with a single
 Listen statement (Listen 80, 4 network interfaces) could scale to only 60%
 utilization.

 Based on those findings, we created a prototype patch for the prefork MPM
 which extends performance and thread utilization. Linux kernels 3.9 and
 newer support SO_REUSEPORT. This feature allows multiple sockets to listen
 on the same IP:port, and the kernel automatically distributes incoming
 connections among them. We use this feature to create multiple duplicated
 listener records of the original one and partition the child processes
 into buckets. Each bucket listens on 1 IP:port. For older kernels that do
 not support SO_REUSEPORT, we modified the multiple Listen statement case by
 creating 1 listen record for each Listen statement and partitioning the
 child processes into different buckets. Each bucket listens on 1 IP:port.
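
The kernel-dependent choice described above could be made at runtime roughly as follows. This sketch assumes that a kernel without SO_REUSEPORT support rejects the option with ENOPROTOOPT even when the build headers define it; reuseport_supported and choose_num_buckets are hypothetical names, and the bucket rule is only an illustration of the description, not the patch's actual logic.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical runtime probe: returns 1 if the running kernel accepts
 * SO_REUSEPORT on a TCP socket, 0 otherwise. Kernels older than 3.9
 * reject the unknown option (ENOPROTOOPT), even when the headers used
 * at build time define SO_REUSEPORT.
 */
static int reuseport_supported(void)
{
#ifdef SO_REUSEPORT
    int one = 1;
    int ok;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return 0;
    ok = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) == 0;
    close(fd);
    return ok;
#else
    return 0;   /* build headers do not even know the option */
#endif
}

/*
 * Hypothetical bucket count following the description above:
 * with SO_REUSEPORT, duplicate the listener into `wanted` buckets;
 * without it, fall back to one bucket per Listen statement.
 */
static int choose_num_buckets(int have_reuseport, int num_listen, int wanted)
{
    assert(num_listen > 0);
    if (have_reuseport)
        return wanted > 0 ? wanted : 1;   /* duplicated listeners */
    return num_listen;                    /* one bucket per Listen */
}
```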

 Quick tests of the patch, running the same workload, demonstrated a 22%
 throughput increase on the 32-thread system with 2 Listen statements (Linux
 kernel 3.10.4). With the older kernel (Linux kernel 3.8.8, without
 SO_REUSEPORT), a 10% performance gain was measured. With the single Listen
 statement (Listen 80) configuration, we observed over 2X performance
 improvements on modern dual-socket Intel platforms (Linux kernel 3.10.4).
 We also observed a big reduction in response time, in addition to the
 throughput improvement, in our tests [1].

 Following the feedback on Bugzilla, where we originally submitted the
 patch, we removed the dependency on an APR change to simplify testing the
 patch. Thanks to Jeff Trawick for his good suggestion! As a next step, we
 are also actively working on extending the patch to the worker and event
 MPMs. Meanwhile, we would like to gather comments from all of you on the
 current prefork patch. Please take some time to test it and let us know
 how it works in your environment.

 This is our first patch to the Apache community. Please help us review it
 and let us know if there is anything we might revise to improve it. Your
 feedback is very much appreciated.

 *Configuration:*

 <IfModule prefork.c>
     ListenBacklog 105384
     ServerLimit 105000
     MaxClients 1024
     MaxRequestsPerChild 0
     StartServers 64
     MinSpareServers 8
     MaxSpareServers 16
 </IfModule>

 1. Software and workloads used in performance tests may have been
 optimized for 

Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT support

2014-03-05 Thread Yann Ylavic
On Wed, Mar 5, 2014 at 2:04 PM, Yann Ylavic ylavic@gmail.com wrote:

 Also, but this is not related to this patch particularly (addressed to
 who knows), it's unclear to me why an accept mutex is needed at all.
 Multiple processes poll()ing the same inherited socket is safe but not
 multiple ones? Is that an OS issue? Process wide only? Still (in)valid in
 latest OSes?


I mean: when SINGLE_LISTEN_UNSERIALIZED_ACCEPT is set, the OS is considered
able to do an unserialized accept on one socket, but not on more than one?
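
As an illustration of the condition being asked about, the decision can be sketched as below. needs_accept_mutex is a hypothetical name, and its boolean parameter stands in for the SINGLE_LISTEN_UNSERIALIZED_ACCEPT build-time define. The rationale commonly given for httpd's accept serialization is that with several listeners a child blocks in poll() rather than accept(); all children wake when one socket becomes readable, and a child that loses the race could then block in accept() on that socket while connections wait on the other listeners.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch (not httpd's code): decide whether a child must
 * hold the accept mutex around poll()+accept().
 * `single_listen_unserialized` stands in for the
 * SINGLE_LISTEN_UNSERIALIZED_ACCEPT build-time define.
 */
static bool needs_accept_mutex(int num_listeners, bool single_listen_unserialized)
{
    assert(num_listeners >= 1);
    if (single_listen_unserialized && num_listeners == 1)
        return false;  /* children block directly in accept() on one socket */
    return true;       /* several listeners: serialize poll()+accept() */
}
```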


Re: svn commit: r1574518 - /httpd/httpd/trunk/modules/loggers/mod_log_config.c

2014-03-05 Thread Mike Rumph

Hello Jim,

I see a style difference in the change below compared to the lines just
above it.

It concerns how a value is tested after an assignment.
In the while statement the value is tested implicitly.
In the if statement the value is explicitly compared against NULL.
The second form is usually chosen so the code does not look like an obvious
mistake (= versus ==).
Especially when the code is this close together, I would think that we
would want to use a consistent convention.
Or maybe you just don't want to change the style of existing code at the
same time as adding new code, correct?


Thanks,

Mike

On 3/5/2014 7:00 AM, j...@apache.org wrote:

Author: jim
Date: Wed Mar  5 15:00:56 2014
New Revision: 1574518

URL: http://svn.apache.org/r1574518
Log:
ensure cookies have name/value

Modified:
 httpd/httpd/trunk/modules/loggers/mod_log_config.c

Modified: httpd/httpd/trunk/modules/loggers/mod_log_config.c
URL: 
http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/loggers/mod_log_config.c?rev=1574518&r1=1574517&r2=1574518&view=diff
==
--- httpd/httpd/trunk/modules/loggers/mod_log_config.c (original)
+++ httpd/httpd/trunk/modules/loggers/mod_log_config.c Wed Mar  5 15:00:56 2014
@@ -542,8 +542,9 @@ static const char *log_cookie(request_re
         char *cookies = apr_pstrdup(r->pool, cookies_entry);
 
         while ((cookie = apr_strtok(cookies, ";", &last1))) {
-            char *name = apr_strtok(cookie, "=", &last2);
-            if (name) {
+            char *name;
+            if (strchr(cookie, '=') &&
+                (name = apr_strtok(cookie, "=", &last2)) != NULL) {
                 char *value = name + strlen(name) + 1;
                 apr_collapse_spaces(name, name);
 
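
To make the r1574518 fix concrete, here is a standalone sketch that uses POSIX strtok_r() in place of apr_strtok() (both take the string, the separator set, and a save pointer). In the old code, a cookie token without '=' such as "flag" would come back whole from the inner tokenizer, and name + strlen(name) + 1 would then point just past the end of that token; the strchr() guard skips such tokens. find_cookie is a hypothetical helper, the leading-space skip is only a crude stand-in for apr_collapse_spaces(), and every pointer test is written explicitly against NULL, which is the consistent convention discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of cookie `want` from a mutable
 * "a=1; b=2; flag" style string, or NULL if absent. Tokens without '='
 * are skipped, mirroring the strchr() guard added in r1574518.
 */
static const char *find_cookie(char *cookies, const char *want)
{
    char *last1 = NULL, *last2 = NULL;
    char *cookie;

    while ((cookie = strtok_r(cookies, ";", &last1)) != NULL) {
        char *name;

        cookies = NULL;  /* continue the outer tokenization */
        if (strchr(cookie, '=') != NULL &&
            (name = strtok_r(cookie, "=", &last2)) != NULL) {
            /* Safe only because the token is known to contain '='. */
            char *value = name + strlen(name) + 1;

            while (*name == ' ')  /* crude stand-in for apr_collapse_spaces() */
                name++;
            if (strcmp(name, want) == 0)
                return value;
        }
    }
    return NULL;
}
```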


RE: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT support

2014-03-05 Thread Lu, Yingqi
Hi Yann,

Thanks very much for your email.

1. If I understand correctly (please correct me if not), you suggest
duplicating the listen sockets inside the child processes with SO_REUSEPORT
enabled? Yes, I agree this would be a cleaner implementation, and I actually
tried that before. However, I encountered connection reset errors because
the number of child processes changes over time. I searched online and found
this issue discussed at http://lwn.net/Articles/542629/.

2. Then, I decided to do the socket duplication in the parent process. The
goal of this change is to extend CPU thread scalability on systems with a
large thread count. Therefore, I simply defined
number_of_listen_buckets = total_number_active_thread / 8, and each listen
bucket has a dedicated listener. I do not want to over-duplicate the socket;
otherwise, it would create too many child processes at startup. Each listen
bucket should start with at least one child process. However, this is only
my understanding and it may not be correct or complete. If you have other
ideas, please share them with us. Feedback and comments are very welcome
here :)
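
The bucket sizing described in point 2 might look like the sketch below. The division by 8 comes from the message; clamping to at least one bucket and to no more buckets than StartServers, and assigning children to buckets by slot % num_buckets, are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>

/*
 * Illustrative sketch of the sizing rule described above:
 *   number_of_listen_buckets = total_number_active_thread / 8
 * clamped (assumption) to at least one bucket and to no more buckets
 * than children started, since each bucket needs at least one child.
 */
static int num_listen_buckets(int active_threads, int start_servers)
{
    int n = active_threads / 8;

    assert(start_servers >= 1);
    if (n < 1)
        n = 1;
    if (n > start_servers)
        n = start_servers;
    return n;
}

/* Hypothetical: the child in scoreboard slot `slot` serves bucket slot % n. */
static int bucket_for_child(int slot, int num_buckets)
{
    assert(num_buckets > 0 && slot >= 0);
    return slot % num_buckets;
}
```

With 64 hardware threads and StartServers 64, as in the configuration posted at the start of the thread, this yields 8 buckets of 8 children each.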

3. I have also been going back and forth on whether the with- and
without-SO_REUSEPORT changes should be two different patches. The only
reason I put them together is that they both use the concept of listen
buckets. If you think it would make more sense to separate them into two
patches, I can certainly do that. Also, I am a little confused by your
comment "On the other hand, each child is dedicated, won't one have to
multiply the configured ServerLimit by the number of Listen to achieve the
same (maximum theoretical) scalability with regard to all the listeners?".
Could you please explain this a little more? I would really appreciate it.

This is our first patch to the open source and Apache community, and we are
still learning about a lot of things. Your feedback and comments really
help us!

Please let me know if you have any further questions.

Thanks,
Yingqi



Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT support

2014-03-05 Thread William A. Rowe Jr.
Yingqi,

as one of the 'Windows folks' here, your idea is very intriguing, and
I'm sorry that other issues have distracted me from giving it the
attention it deserves.

If you want to truly re-architect the MPM, by all means, propose it as
another MPM module.  If it isn't adopted here, please don't hesitate
to offer it to interested users as separate source (although I hope we
find a way to adopt it.)

The idea of different MPMs was that they were swappable. MPM foo
isn't MPM bar. E.g., worker, prefork, and event each have their own tree.
Likewise, there is nothing stopping us from having 2 or 3 MPMs on
Windows, and there is nothing stopping us from stating that a particular
MPM has a prerequisite of Linux 3.1 kernels or Windows 2008+.

The Windows build system hasn't been so flexible, but this can be
remediated with cmake, which folks have spent many hours accomplishing.
I understand you are probably relying on functions authored entirely
for the winnt_mpm, and we can re-factor those on trunk out to the
os/win32/ directory so that MPMs may share them.

The definition of the word prefork is a single-threaded process which
handles a request. Please don't misuse the term; without reviewing your
code, I'll presume that is what you meant.

I don't doubt your benchmarking results, but please note that only
Windows Server OSes can actually be used to perform any benchmarks.
Any 'desktop' enterprise, professional or home editions are deliberately
hobbled, and IMHO the project should make no accommodation for vendor
stupidity.

In terms of benchmarking, I don't know how you measured, but if you
can peg a machine at 95% total utilization yet httpd shows itself
consuming only 70% or 60%, that means it is kernel-bound.  That is
usually a good thing, that the app is operating optimally and is only
constrained by the architecture.

I think I understand where you are going with reuseport. That doesn't
equate to the Unix OSes... they can distribute the already opened
listener to an unlimited number of forks. On Windows, we also
distribute the listener through a write/stdin channel to the child
process. What doesn't work well is for parallel Windows children to
share certain resources such as the error log, access log, etc. But we
can contend with that issue. What we can't contend with is what 3rd
party modules have chosen to do, and almost any patch you offer is not
going to be suitable for binary compatibility with 3rd party httpd 2.4
modules compiled for Windows, so your patch presented for the 2.4
branch is rejected.

That said, we should endeavor to solve this for 2.6 (or 3.0 or
whatever we call the 'next httpd').  We are all out of fresh ideas, so
proposals such as yours are a welcome sight!!!

Finally, please do have patience; large patches require time for us to
digest, and we have limited amounts of that resource. As I mentioned,
adding a whole new MPM directory to trunk, alone, should meet very
little resistance for any architecture.

Thank you for your posts, and please do not feel ignored.  There are a
handful of people active and we all have many details to attend to.

Yours,

Bill


Re: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT support

2014-03-05 Thread Arkadiusz Miśkiewicz
On Thursday 06 of March 2014, William A. Rowe Jr. wrote:

 If you want to truly re-architect the MPM, by all means, propose it as
 another MPM module.  If it isn't adopted here, please don't hesitate
 to offer it to interested users as separate source (although I hope we
 find a way to adopt it.)
 
 The idea of different MPM's was that they were swappable.  MPM foo
 isn't MPM bar.  E.g., worker, prefork, event each have their own tree.
  Likewise, there is nothing stopping us from having 2, or 3 MPM's on
 Windows, and there is nothing stopping us from stating that there is a
 prerequisite on a particular MPM of Linux 3.1 kernels or Windows
 2008+.

I dislike the idea of yet another MPM. More MPMs means each MPM gets fewer
developer resources and less testing (and an external MPM distributed
outside Apache gets almost no developers and no testing).

Fewer MPMs is better IMO, so it is better to improve the existing ones than
to invent a new one.

 (although I hope we find a way to adopt it.)

+1

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


RE: [PATCH ASF bugzilla# 55897]prefork_mpm patch with SO_REUSEPORT support

2014-03-05 Thread Lu, Yingqi
Hi Bill,

Thanks very much for your email. I am really happy to have received so much
good feedback on the mailing list.

The patch was created only for the Linux prefork MPM, so it should not
impact winnt_mpm. I may be misunderstanding you here, but do you mean that
in order to adopt the patch, we need to extend it to winnt_mpm?

Regarding the test results, what we provided was based on RHEL 6.2 (server
version) with kernel 3.10.4. We measured throughput in operations/sec, as
well as response time, defined as the time from when the client sends a
request until it receives the response. It is a three-tier web server
workload. We measured the throughput on the frontend web server tier
(Apache httpd with prefork + PHP as libphp5.so under httpd/modules).

Thanks,
Yingqi 


Re: http/2, spdy and bears, oh my!

2014-03-05 Thread Pierre Joye
hi,

On Wed, Feb 5, 2014 at 8:09 PM, Jim Jagielski j...@jagunet.com wrote:
 With http/2 becoming closer and closer, and spdy being
 in place as we speak, it seems that we should really
 ramp up development on trunk to support these new techs.

 Lets get serious on what needs to be done w/ trunk
 to get there, and what our wish-list is for new capability
 and architecture.

 Taking a page from mod_spdy, breaking the connection-request
 singularity looks like an interesting 1st step, maybe via
 some sort of virtual connection which a real connection
 can spin up/down and which corresponds to the request's
 actual connection...

By the way, I was wondering what the Apache strategy is here. Do you plan
to implement your own httpbis stack or use an existing library like
https://github.com/tatsuhiro-t/nghttp2?

-- 
Pierre

@pierrejoye | http://www.libgd.org