Re: Keepalives

2005-06-20 Thread Greg Ames

Greg Ames wrote:

Brian Akins wrote:


We've been doing some testing with the current 2.1 implementation, and
it works; it just currently doesn't offer much advantage over worker
for us.  If num keepalives == maxclients, you can't accept any more
connections.
 
that's a surprise, and it sounds like a bug.  I'll investigate.  


the event mpm in httpd-2.1 trunk is working fine for me.  running a specweb99 
mini-benchmark with MaxClients 50 and Listen 8092, I see:


[EMAIL PROTECTED] built]$ while true; do netstat -an | grep -c "8092.*ESTABLISHED"; done

146
163
162
188
164
166
149
145
157
152
163

...so with this setup, I have roughly 3 connections for every worker thread, 
including the idle threads.


here is a server-status: 
http://people.apache.org/~gregames/event-server-status.html


how are you counting connections?

Greg



Re: Keepalives

2005-06-20 Thread Akins, Brian






On 6/20/05 3:14 PM, Greg Ames [EMAIL PROTECTED] wrote:



 ...so with this setup, I have roughly 3 connections for every worker thread,
 including the idle threads.


Cool. Maybe I just need the latest version. Or I could have just screwed up
my test...


Anyway, there should be a way to limit how many idle connections to keep
around. 


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies






Re: Keepalives

2005-06-17 Thread William A. Rowe, Jr.
At 08:11 AM 6/17/2005, Akins, Brian wrote:

If you want to use keepalives, all of your workers (threads/procs/whatever) 
can become busy just waiting on another request on a keepalive connection. 
Raising MaxClients does not help. 

No, it doesn't :)  But lowering the keepalive threshold to three
to five seconds does.  We are lowering the 'example' keepalive
timeout in the next releases.
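
For illustration, that tuning is just a couple of lines in httpd.conf
(KeepAliveTimeout and MaxKeepAliveRequests are the standard directives;
the values here are only an example, not a recommendation):

  KeepAlive On
  KeepAliveTimeout 3
  MaxKeepAliveRequests 100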

Keepalives as originally implemented were to help users with
loading additional images (and now, css stylesheets.)  And the
time was set to allow a user to grab 'another page' if they
quickly clicked through.  But the latter use is not a good use
of child slots, and the former use case has a -much- lower
window these days, as most browsers are quite fast to compose
the base document and determine what images are required.

With the relative disappearance of 1200-baud dialup, 15 seconds
for the client to sit and think about grabbing more documents
to compose the current page is very silly :)

The Event MPM does not seem to really help this situation.  It seems to 
only make each keepalive connection cheaper.  It can still allow all 
workers to be blocking on keepalives. 

Short Term solution: 

This is what we did.  We use the worker MPM.  We wrote a simple module that 
keeps track of how many keepalive connections are active.  When a threshold 
is reached, it does not allow any more keepalives.  (Basically it sets 
r->connection->keepalive = AP_CONN_CLOSE.)  This works for us, but the limit 
is per process and only works for threaded MPMs. 
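
As a rough sketch, the forced-close part of such a module could look
something like the following (the module name, the threshold, and the
counter are hypothetical, and the bookkeeping that actually increments
and decrements the count as connections enter and leave the keepalive
state is omitted):

  #include "httpd.h"
  #include "http_config.h"
  #include "http_request.h"
  #include "apr_atomic.h"

  #define KA_LIMIT_MAX 768        /* hypothetical per-process threshold */

  static apr_uint32_t ka_count;   /* updated elsewhere as connections
                                     enter/leave the keepalive state */

  static int ka_limit_fixups(request_rec *r)
  {
      /* If too many connections in this process are already idling in
         keepalive, force this one to close after the response. */
      if (apr_atomic_read32(&ka_count) >= KA_LIMIT_MAX) {
          r->connection->keepalive = AP_CONN_CLOSE;
      }
      return DECLINED;
  }

  static void ka_limit_register_hooks(apr_pool_t *p)
  {
      ap_hook_fixups(ka_limit_fixups, NULL, NULL, APR_HOOK_LAST);
  }

  module AP_MODULE_DECLARE_DATA ka_limit_module = {
      STANDARD20_MODULE_STUFF,
      NULL, NULL, NULL, NULL, NULL,
      ka_limit_register_hooks
  };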

Long Term solution: 

Keep track of keepalives in the scoreboard (or somewhere else).  Allow 
admins to set a threshold for keepalives: 

MaxClients 1024 
MaxConcurrentKeepalives 768 

Or something like that. 

Thoughts?  I am willing to write the code if this seems desirable.  Should 
this just be another module or in the http core? 

If you experiment with setting the keepalive window to 3 seconds
or so, how does that affect your test?  Also, I'd be very concerned
about additional load - clients who are retrieving many gifs (with
no pause at all) in a pipelined fashion will end up hurting the
overall resource usage if you force them back to HTTP/1.0 behavior.

Bill




Re: Keepalives

2005-06-17 Thread Brian Akins

William A. Rowe, Jr. wrote:


No, it doesn't :)  But lowering the keepalive threshold to three
to five seconds does.  


For us, in heavy loads, that's 3-5 seconds that a thread cannot process 
a new client.  Under normal circumstances, the 15 seconds is fine, but 
when we are stressed, we need to free threads as quickly as possible.




 Also, I'd be very concerned
about additional load - clients who are retrieving many gifs (with
no pause at all) in a pipelined fashion will end up hurting the
overall resource usage if you force them back to HTTP/1.0 behavior.


Yes, but if all threads are waiting for x seconds for keepalives (even 
if it is 3-5 seconds), the server cannot service any new clients.  I'm 
willing to take an overall resource hit (and inconvenience some 
clients) to maintain the overall availability of the server.


Does that make any sense?  It does to me, but I may not be explaining 
our problem well.



--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Keepalives

2005-06-17 Thread Nick Kew
Akins, Brian wrote:

 Short Term solution:
 
 This is what we did.  We use the worker MPM.  We wrote a simple module that
 keeps track of how many keepalive connections are active.  When a threshold
 is reached, it does not allow any more keepalives.  (Basically it sets
 r->connection->keepalive = AP_CONN_CLOSE.)  This works for us, but the limit
 is per process and only works for threaded MPMs.

Could that be done dynamically?  As in, make the max keepalive time a
function of how near the server is to running out of spare workers?

Oh, and is the default still ridiculously high?  ISTR it being 15 secs
at one time - not sure if that ever got changed.

Also, have you looked into making keepalive dependent on resource type?
E.g. use them for HTML docs - which typically have inline contents - but
not for other media types unless REFERER is a local HTML page?


 Long Term solution:
 
 Keep track of keepalives in the scoreboard (or somewhere else).  Allow
 admins to set a threshold for keepalives:
 
 MaxClients 1024
 MaxConcurrentKeepalives 768
 
 Or something like that.
 
 
 Thoughts?  I am willing to write the code if this seems desirable.  Should
 this just be another module or in the http core?

Is that a candidate application for the monitor hook?  Other things
being equal, I'd make it a module.

-- 
Nick Kew


Re: Keepalives

2005-06-17 Thread Bill Stoddard

Akins, Brian wrote:

Here's the problem:

If you want to use keepalives, all of your workers (threads/procs/whatever)
can become busy just waiting on another request on a keepalive connection.
Raising MaxClients does not help.

The Event MPM does not seem to really help this situation.  It seems to
only make each keepalive connection cheaper.  It can still allow all
workers to be blocking on keepalives.


If the event MPM is working properly, then a worker thread should not be blocking waiting for the next ka 
request. You still have the overhead of the tcp connection and some storage used by httpd to manage connection 
events but both of those are small compared to a blocking thread.





Short Term solution:

This is what we did.  We use the worker MPM.  We wrote a simple module that
keeps track of how many keepalive connections are active.  When a threshold
is reached, it does not allow any more keepalives.  (Basically it sets
r->connection->keepalive = AP_CONN_CLOSE.)  This works for us, but the limit
is per process and only works for threaded MPMs.


Long Term solution:

Keep track of keepalives in the scoreboard (or somewhere else).  Allow
admins to set a threshold for keepalives:

MaxClients 1024
MaxConcurrentKeepalives 768

Or something like that.


Thoughts?  


Both approaches sound pragmatic (+.5), although I would like to think the best long term solution is to 
completely decouple TCP connections from worker threads.  The event MPM is an experiment in that direction, but 
it still has a long way to go.  The earliest I could see this happening is in the 2.4 timeframe.


Bill



Re: Keepalives

2005-06-17 Thread Brian Akins

Nick Kew wrote:

Could that be done dynamically?  As in, make the max keepalive time a
function of how near the server is to running out of spare workers?


Sure.  I'd have to poke around a bit to see the best way to do it. 
Speed is of utmost concern for us.  I guess I could dynamically change 
r->server->keep_alive_max or r->server->keep_alive_timeout?  Maybe make 
the timeout a sliding timeout, something like:


/* calculate max_clients by querying the MPM */
/* is there a good, fast way to get idle workers? */
/* store keep_alive_timeout somewhere */

r->server->keep_alive_timeout = keepalive_timeout /
    (max_clients / idle_workers);



Thoughts?



Also, have you looked into making keepalive dependent on resource type?
E.g. use them for HTML docs - which typically have inline contents - but
not for other media types unless REFERER is a local HTML page?


Sounds horribly slow... Also, in our case, HTML and other content come 
from separate server pools.  But most pages are made up of a few HTML 
pages.  (You have to look at the HTML source to see what I mean).


Also, we have some app servers that often have all connections tied up 
in keepalive because the front ends open tons of keepalives (I have no 
direct control of them).


I was hoping for a more generic solution that would maybe help others. 
I'm sure there are others with similar situations.




--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Keepalives

2005-06-17 Thread William A. Rowe, Jr.
At 09:27 AM 6/17/2005, Brian Akins wrote:
Also, I'd be very concerned
about additional load - clients who are retrieving many gifs (with
no pause at all) in a pipelined fashion will end up hurting the
overall resource usage if you force them back to HTTP/1.0 behavior.

Yes, but if all threads are waiting for x seconds for keepalives (even if it 
is 3-5 seconds), the server cannot service any new clients.  I'm willing to 
take an overall resource hit (and inconvenience some clients) to maintain 
the overall availability of the server.

Does that make any sense?  It does to me, but I may not be explaining our 
problem well.

Yes it makes sense.  But I'd encourage you to consider dropping that
keepalive time and see if the problem isn't significantly mitigated.

We have a schema today to create 'parallel' scoreboards, but perhaps
in the core we should offer this as a public API to module authors,
to keep it very simple?

I believe keepalive-blocked read should be able to be determined
from the scoreboard.  As far as 'counting' states, that would be
somewhat interesting.  Right now, it does take cycles to walk the
scoreboard to determine the number in a given state (and this is
somewhat fuzzy since values are flipping as you walk along the
list of workers.)  Adding an indexed list of 'counts' would be
very lightweight, and one atomic increment and decrement per state
change.  This would probably be more efficient than walking the
entire list.

In any case, I would simply extend counts for all registered
request states in the scoreboard, rather than a one-off for
every state someone becomes interested in.
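
A minimal sketch of what such per-state counters might look like (the
array and the helper below are hypothetical, not existing scoreboard API,
and in a real implementation the counts would have to live in the shared
scoreboard segment; worker_score and SERVER_NUM_STATUS come from
scoreboard.h):

  #include "apr_atomic.h"
  #include "scoreboard.h"

  /* one counter per scoreboard state (SERVER_READY, SERVER_BUSY_READ, ...) */
  static apr_uint32_t state_counts[SERVER_NUM_STATUS];

  static void update_state(worker_score *ws, int new_state)
  {
      /* decrement the old state first, then increment the new one, so a
         concurrent sum can only under-report a state, never over-report it */
      apr_atomic_dec32(&state_counts[ws->status]);
      ws->status = new_state;
      apr_atomic_inc32(&state_counts[new_state]);
  }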

Bill  



Re: Keepalives

2005-06-17 Thread Brian Akins

Bill Stoddard wrote:
If the event MPM is working properly, then a worker thread should not be 
blocking waiting for the next ka
request. You still have the overhead of the tcp connection and some 
storage used by httpd to manage connection

events but both of those are small compared to a blocking thread.


Should there be an upper limit on how many connections to have in 
keepalive, even when using event?  Say you have 100 worker threads; you 
wouldn't want to have 8192 keepalive connections.  So you would want 
some limit.


Both approaches sound pragmatic (+.5) although I would like to think the 
best long term solution is to
completely decouple TCP connections from worker threads. 


I really like the event mpm, but I still think there has to be an upper 
limit on how many connections to allow to keepalive.



is an experiment in that direction but
it still has a long way to go.  The earliest I could see this happening is 
in the 2.4 timeframe.


We've been doing some testing with the current 2.1 implementation, and 
it works; it just currently doesn't offer much advantage over worker for 
us.  If num keepalives == maxclients, you can't accept any more 
connections.  I want to be able to limit the total number of keepalives.




--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Keepalives

2005-06-17 Thread Brian Akins

William A. Rowe, Jr. wrote:

Yes it makes sense.  But I'd encourage you to consider dropping that
keepalive time and see if the problem isn't significantly mitigated.


It is mitigated somewhat, but we still hit maxclients without our hack 
in place.




Right now, it does take cycles to walk the
scoreboard to determine the number in a given state (and this is
somewhat fuzzy since values are flipping as you walk along the
list of workers.)  



I know the worker MPM, for example, keeps a count of idle workers 
internally.  Maybe just an MPM query to retrieve that value would be 
good?  All MPMs keep track of this in some fashion because they all 
know when MaxClients is reached.




Adding an indexed list of 'counts' would be
very lightweight, and one atomic increment and decrement per state
change.  This would probably be more efficient than walking the
entire list.


Sounds good.  Of course, when changing from one state to another you 
would always have to decrement the previous state and increment the new 
one.  The way the core seems to be now, that would require some careful 
examination of the code to ensure all the state changes were covered.




--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Keepalives

2005-06-17 Thread Paul Querna
(Snipping all the other issues, which are largely valid and do 
contain some good ideas.)


Akins, Brian wrote:


Here's the problem:

If you want to use keepalives, all of your workers 
(threads/procs/whatever)
can become busy just waiting on another request on a keepalive 
connection.

Raising MaxClients does not help.

The Event MPM does not seem to really help this situation.  It seems to
only make each keepalive connection cheaper.  It can still allow all
workers to be blocking on keepalives.



Can you be more detailed on this?  It really _should_ help, and my 
testing says it does.  What are you seeing behavior-wise?  Any changes 
you would like made?


Re: Keepalives

2005-06-17 Thread William A. Rowe, Jr.
At 10:12 AM 6/17/2005, Brian Akins wrote:
Adding an indexed list of 'counts' would be
very lightweight, and one atomic increment and decrement per state
change.  This would probably be more efficient than walking the
entire list.

Sounds good.  Of course, when changing from one state to another you would 
always have to decrement the previous state and increment the new one.  The 
way the core seems to be now, that would require some careful examination of 
the code to ensure all the state changes were covered.

In that exact order :)  Much better to 'under-report' a given state,
and consider that under-reporting (if you sum during an update) to be
a product of state changes.

I think ++, -- (incrementing the new state before decrementing the old)
would be much more misleading, since the server would appear to be
taking more actions than could possibly occur at once.

Bill




Re: Keepalives

2005-06-17 Thread Bill Stoddard

Brian Akins wrote:

Bill Stoddard wrote:

If the event MPM is working properly, then a worker thread should not 
be blocking waiting for the next ka
request. You still have the overhead of the tcp connection and some 
storage used by httpd to manage connection

events but both of those are small compared to a blocking thread.



Should there be an upper limit on how many connections to have in 
keepalive, even when using event?  Say you have 100 worker threads; you 
wouldn't want to have 8192 keepalive connections.  So you would want 
some limit.


Both approaches sound pragmatic (+.5) although I would like to think 
the best long term solution is to
completely decouple TCP connections from worker threads. 



I really like the event mpm, but I still think there has to be an upper 
limit on how many connections to allow to keepalive.



is an experiment in that direction but
it still has a long way to go.  The earliest I could see this happening is 
in the 2.4 timeframe.



We've been doing some testing with the current 2.1 implementation, and 
it works; it just currently doesn't offer much advantage over worker for 
us.  If num keepalives == maxclients, you can't accept any more 
connections.  
Interesting point; it's been a while since I looked at the event MPM, but I thought (mistakenly) that 
maxclients accounting was adjusted to reflect the max number of concurrently active worker threads rather than 
active TCP connections.  I agree we need some kind of upper limit on the max number of TCP connections into a 
running instance of httpd, regardless of whether those connections are associated with a worker thread or not.


Bill



Re: Keepalives

2005-06-17 Thread Brian Akins

Any interest/objections to adding another MPM query

AP_MPMQ_IDLE_WORKERS

(or some other name)

In worker.c, we could just add this to ap_mpm_query:


    case AP_MPMQ_IDLE_WORKERS:
        *result = ap_idle_thread_count;
        return APR_SUCCESS;


and in perform_idle_server_maintenance we would update ap_idle_thread_count.
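
Callers (a module like the keepalive limiter, for instance) could then
retrieve the value with the standard ap_mpm_query() call, something like
this (assuming the new query code existed):

  #include "ap_mpm.h"

  int idle_workers = 0;

  if (ap_mpm_query(AP_MPMQ_IDLE_WORKERS, &idle_workers) == APR_SUCCESS) {
      /* e.g. scale the keepalive timeout or stop granting keepalives */
  }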

I can submit a patch if anyone thinks this has a chance of being committed.



--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies


Re: Keepalives

2005-06-17 Thread Greg Ames

Brian Akins wrote:

Bill Stoddard wrote:

If the event MPM is working properly, then a worker thread should not 
be blocking waiting for the next ka
request. You still have the overhead of the tcp connection and some 
storage used by httpd to manage connection

events but both of those are small compared to a blocking thread.



Should there be an upper limit on how many connections to have in 
keepalive, even when using event? Say you have 100 worker threads, you 
wouldn't want to have 8192 keepalive connections.  So you would want 
some limit.


I really like the event mpm, but I still think there has to be an upper 
limit on how many connections to allow to keepalive.


I'm pleased to hear you've tried the event mpm.

not sure why there has to be a limit.  are you talking about connections per 
worker process?  except for the size of the pollset, I didn't see a need to put 
a limit on the number of connections per worker process back when I was stress 
testing it with specweb99.  when a worker process was saturated with active 
threads, the listener thread would block in ap_queue_info_wait_for_idler() until 
a worker thread freed up.  in the meantime, other processes would grab the new 
connections.  so it was sort of self-balancing as far as distributing 
connections among processes.


not sure if the current code still behaves that way.  I plan to find out soon 
though.


We've been doing some testing with the current 2.1 implementation, and 
it works; it just currently doesn't offer much advantage over worker for 
us.  If num keepalives == maxclients, you can't accept any more 
connections.  


that's a surprise, and it sounds like a bug.  I'll investigate.  it used to be 
that maxclients was really max worker threads and you could have far more 
connections than threads.


thanks for the feedback.

Greg