Re: Keepalives
Greg Ames wrote:
> Brian Akins wrote:
>> We've been doing some testing with the current 2.1 implementation, and it works; it just currently doesn't offer much advantage over worker for us. If num keepalives == maxclients, you can't accept any more connections.
>
> that's a surprise, and it sounds like a bug. I'll investigate.

the event mpm in httpd-2.1 trunk is working fine for me. running a specweb99 mini-benchmark with MaxClients 50 and Listen 8092, I see:

    [EMAIL PROTECTED] built]$ while true; do netstat -an | grep -c 8092.*ESTABLISHED; done
    146
    163
    162
    188
    164
    166
    149
    145
    157
    152
    163

...so with this setup, I have roughly 3 connections for every worker thread, including the idle threads. here is a server-status:

http://people.apache.org/~gregames/event-server-status.html

how are you counting connections?

Greg
Re: Keepalives
On 6/20/05 3:14 PM, Greg Ames [EMAIL PROTECTED] wrote:
> ...so with this setup, I have roughly 3 connections for every worker thread, including the idle threads.

Cool. Maybe I just need the latest version. Or I could have just screwed up my test... Anyway, there should be a way to limit how many idle connections to keep around.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Keepalives
At 08:11 AM 6/17/2005, Akins, Brian wrote:
> If you want to use keepalives, all of your workers (threads/procs/whatever) can become busy just waiting on another request on a keepalive connection. Raising MaxClients does not help.

No, it doesn't :) But lowering the keepalive threshold to three to five seconds does. We are lowering the 'example' keepalive timeout in the next releases.

Keepalives as originally implemented were to help users with loading additional images (and now, css stylesheets). And the time was set to allow a user to grab 'another page' if they quickly clicked through. But the latter use is not a good use of child slots, and the former use case has a -much- smaller window these days, as most browsers are quite fast to compose the base document and determine what images are required. With the relative disappearance of 1200-baud dialup, 15 seconds for the client to sit and think about grabbing more documents to compose the current page is very silly :)

> The Event MPM does not seem to really help this situation. It seems to only make each keepalive connection cheaper. It can still allow all workers to be blocking on keepalives.
>
> Short Term solution: This is what we did. We use the worker MPM. We wrote a simple module that keeps track of how many keepalive connections are active. When a threshold is reached, it does not allow any more keepalives (basically sets r->connection->keepalive = AP_CONN_CLOSE). This works for us, but the limit is per process and only works for threaded MPMs.
>
> Long Term solution: Keep track of keepalives in the scoreboard (or somewhere else). Allow admins to set a threshold for keepalives:
>
>     MaxClients 1024
>     MaxConcurrentKeepalives 768
>
> Or something like that. Thoughts? I am willing to write the code if this seems desirable. Should this just be another module or in the http core?

If you experiment with setting the keepalive window to 3 seconds or so, how does that affect your test?

Also, I'd be very concerned about additional load - clients who are retrieving many gifs (with no pause at all) in a pipelined fashion will end up hurting the overall resource usage if you force them back to HTTP/1.0 behavior.

Bill
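For reference, a rough sketch of the sort of module Brian describes. His version keeps a per-process keepalive counter; this variant walks the scoreboard for workers in the keepalive state instead, which is easier to show (and gives a server-wide count rather than a per-process one) but costs a scan per request. The module name is invented, and the 768 default just echoes Brian's MaxConcurrentKeepalives example:

    /* mod_keepalive_limit (hypothetical): refuse further keepalives once
     * too many workers are parked in the keepalive state. */
    #include "httpd.h"
    #include "http_config.h"
    #include "http_request.h"
    #include "ap_mpm.h"
    #include "scoreboard.h"

    static int max_keepalives = 768;    /* invented default */

    static int count_keepalive_workers(void)
    {
        int i, j, count = 0;
        int daemons = 0, threads = 0;

        ap_mpm_query(AP_MPMQ_HARD_LIMIT_DAEMONS, &daemons);
        ap_mpm_query(AP_MPMQ_HARD_LIMIT_THREADS, &threads);

        for (i = 0; i < daemons; i++) {
            for (j = 0; j < threads; j++) {
                worker_score *ws = ap_get_scoreboard_worker(i, j);
                if (ws->status == SERVER_BUSY_KEEPALIVE) {
                    count++;
                }
            }
        }
        return count;
    }

    static int keepalive_limit_fixup(request_rec *r)
    {
        if (count_keepalive_workers() >= max_keepalives) {
            /* Answer this request, but force Connection: close after. */
            r->connection->keepalive = AP_CONN_CLOSE;
        }
        return DECLINED;
    }

    static void keepalive_limit_hooks(apr_pool_t *p)
    {
        ap_hook_fixups(keepalive_limit_fixup, NULL, NULL, APR_HOOK_MIDDLE);
    }

    module AP_MODULE_DECLARE_DATA keepalive_limit_module = {
        STANDARD20_MODULE_STUFF,
        NULL, NULL, NULL, NULL,
        NULL,    /* a real module would add a directive to set the limit */
        keepalive_limit_hooks
    };

Setting r->connection->keepalive = AP_CONN_CLOSE is exactly the mechanism Brian names; everything around it is scaffolding.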
Re: Keepalives
William A. Rowe, Jr. wrote:
> No, it doesn't :) But lowering the keepalive threshold to three to five seconds does.

For us, under heavy load, that's 3-5 seconds that a thread cannot process a new client. Under normal circumstances, the 15 seconds is fine, but when we are stressed, we need to free threads as quickly as possible.

> Also, I'd be very concerned about additional load - clients who are retrieving many gifs (with no pause at all) in a pipelined fashion will end up hurting the overall resource usage if you force them back to HTTP/1.0 behavior.

Yes, but if all threads are waiting for x seconds for keepalives (even if it is 3-5 seconds), the server cannot service any new clients. I'm willing to take an overall resource hit (and inconvenience some clients) to maintain the overall availability of the server.

Does that make any sense? It does to me, but I may not be explaining our problem well.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Keepalives
Akins, Brian wrote:
> Short Term solution: This is what we did. We use the worker MPM. We wrote a simple module that keeps track of how many keepalive connections are active. When a threshold is reached, it does not allow any more keepalives (basically sets r->connection->keepalive = AP_CONN_CLOSE). This works for us, but the limit is per process and only works for threaded MPMs.

Could that be done dynamically? As in, make the max keepalive time a function of how near the server is to running out of spare workers?

Oh, and is the default still ridiculously high? ISTR it being 15 secs at one time - not sure if that ever got changed.

Also, have you looked into making keepalive dependent on resource type? E.g. use them for HTML docs - which typically have inline contents - but not for other media types unless REFERER is a local HTML page?

> Long Term solution: Keep track of keepalives in the scoreboard (or somewhere else). Allow admins to set a threshold for keepalives:
>
>     MaxClients 1024
>     MaxConcurrentKeepalives 768
>
> Or something like that. Thoughts? I am willing to write the code if this seems desirable. Should this just be another module or in the http core?

Is that a candidate application for the monitor hook? Other things being equal, I'd make it a module.

--
Nick Kew
Re: Keepalives
Akins, Brian wrote:
> Here's the problem: If you want to use keepalives, all of your workers (threads/procs/whatever) can become busy just waiting on another request on a keepalive connection. Raising MaxClients does not help.
>
> The Event MPM does not seem to really help this situation. It seems to only make each keepalive connection cheaper. It can still allow all workers to be blocking on keepalives.

If the event MPM is working properly, then a worker thread should not be blocking waiting for the next ka request. You still have the overhead of the tcp connection and some storage used by httpd to manage connection events, but both of those are small compared to a blocking thread.

> Short Term solution: This is what we did. We use the worker MPM. We wrote a simple module that keeps track of how many keepalive connections are active. When a threshold is reached, it does not allow any more keepalives (basically sets r->connection->keepalive = AP_CONN_CLOSE). This works for us, but the limit is per process and only works for threaded MPMs.
>
> Long Term solution: Keep track of keepalives in the scoreboard (or somewhere else). Allow admins to set a threshold for keepalives:
>
>     MaxClients 1024
>     MaxConcurrentKeepalives 768
>
> Or something like that. Thoughts?

Both approaches sound pragmatic (+.5), although I would like to think the best long term solution is to completely decouple TCP connections from worker threads. The event MPM is an experiment in that direction, but it still has a long way to go. Earliest I could see this happening is in the v2.4 timeframe.

Bill
Re: Keepalives
Nick Kew wrote:
> Could that be done dynamically? As in, make the max keepalive time a function of how near the server is to running out of spare workers?

Sure. I'd have to poke around a bit to see the best way to do it. Speed is of utmost concern for us. I guess I could dynamically change r->server->keep_alive_max or r->server->keep_alive_timeout? Maybe make the timeout a sliding timeout, something like:

    /* calculate max_clients by querying the MPM */
    /* is there a good, fast way to get idle workers? */
    /* store keep_alive_timeout somewhere */
    r->server->keep_alive_timeout =
        keepalive_timeout / (max_clients / idle_workers);

Thoughts?

> Also, have you looked into making keepalive dependent on resource type? E.g. use them for HTML docs - which typically have inline contents - but not for other media types unless REFERER is a local HTML page?

Sounds horribly slow... Also, in our case, HTML and other content come from separate server pools. But most pages are made up of a few HTML pages. (You have to look at the HTML source to see what I mean.) Also, we have some app servers that often have all connections tied up in keepalive because the front ends open tons of keepalives (I have no direct control of them). I was hoping for a more generic solution that would maybe help others. I'm sure there are others with similar situations.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
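A rough sketch of that sliding timeout as a fixup, with the obvious guards added. Idle workers are counted from the scoreboard here (the SERVER_READY state), since no MPM query for them exists yet; and note that r->server points at shared configuration, so writing the scaled timeout back there leaks beyond the current connection - this is illustration, not production code:

    /* Hypothetical sketch of Brian's sliding keepalive timeout. */
    #include "httpd.h"
    #include "http_config.h"
    #include "http_request.h"
    #include "ap_mpm.h"
    #include "scoreboard.h"

    static apr_interval_time_t base_timeout; /* keep_alive_timeout, saved at startup */
    static int max_clients;                  /* MaxClients, saved at startup */

    static int sliding_keepalive_fixup(request_rec *r)
    {
        int i, j, idle = 0, divisor;
        int daemons = 0, threads = 0;

        ap_mpm_query(AP_MPMQ_HARD_LIMIT_DAEMONS, &daemons);
        ap_mpm_query(AP_MPMQ_HARD_LIMIT_THREADS, &threads);

        /* Count workers sitting idle in the "ready" state. */
        for (i = 0; i < daemons; i++) {
            for (j = 0; j < threads; j++) {
                if (ap_get_scoreboard_worker(i, j)->status == SERVER_READY) {
                    idle++;
                }
            }
        }

        if (idle == 0) {
            /* No spare workers left: stop keepalives outright. */
            r->connection->keepalive = AP_CONN_CLOSE;
            return DECLINED;
        }

        /* Shrink the timeout as idle workers get scarce.  Caveat:
         * r->server is shared config, so this write is visible to
         * other requests on the same vhost. */
        divisor = max_clients / idle;
        if (divisor < 1) {
            divisor = 1;
        }
        r->server->keep_alive_timeout = base_timeout / divisor;

        return DECLINED;
    }

Hook registration and module boilerplate would match the earlier sketch. The per-request scoreboard walk is exactly the cost Bill Rowe raises in the next message - there is no fast way to get the idle count yet, which is what Brian's comment asks about.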
Re: Keepalives
At 09:27 AM 6/17/2005, Brian Akins wrote:
>> Also, I'd be very concerned about additional load - clients who are retrieving many gifs (with no pause at all) in a pipelined fashion will end up hurting the overall resource usage if you force them back to HTTP/1.0 behavior.
>
> Yes, but if all threads are waiting for x seconds for keepalives (even if it is 3-5 seconds), the server cannot service any new clients. I'm willing to take an overall resource hit (and inconvenience some clients) to maintain the overall availability of the server.
>
> Does that make any sense? It does to me, but I may not be explaining our problem well.

Yes, it makes sense. But I'd encourage you to consider dropping that keepalive time and see if the problem isn't significantly mitigated.

We have a schema today to create 'parallel' scoreboards, but perhaps in the core we should offer this as a public API to module authors, to keep it very simple? I believe keepalive-blocked read should be able to be determined from the scoreboard.

As far as 'counting' states, that would be somewhat interesting. Right now, it does take cycles to walk the scoreboard to determine the number in a given state (and this is somewhat fuzzy, since values are flipping as you walk along the list of workers). Adding an indexed list of 'counts' would be very lightweight - one atomic increment and decrement per state change. This would probably be more efficient than walking the entire list. In any case, I would simply extend counts for all registered request states in the scoreboard, rather than a one-off for every state someone becomes interested in.

Bill
Re: Keepalives
Bill Stoddard wrote:
> If the event MPM is working properly, then a worker thread should not be blocking waiting for the next ka request. You still have the overhead of the tcp connection and some storage used by httpd to manage connection events, but both of those are small compared to a blocking thread.

Should there be an upper limit on how many connections to hold in keepalive, even when using event? Say you have 100 worker threads; you wouldn't want to have 8192 keepalive connections. So you would want some limit.

> Both approaches sound pragmatic (+.5), although I would like to think the best long term solution is to completely decouple TCP connections from worker threads.

I really like the event mpm, but I still think there has to be an upper limit on how many connections to allow to keepalive.

> [The event MPM] is an experiment in that direction, but it still has a long way to go. Earliest I could see this happening is in the v2.4 timeframe.

We've been doing some testing with the current 2.1 implementation, and it works; it just currently doesn't offer much advantage over worker for us. If num keepalives == maxclients, you can't accept any more connections. I want to be able to limit the total number of keepalives.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Keepalives
William A. Rowe, Jr. wrote:
> Yes, it makes sense. But I'd encourage you to consider dropping that keepalive time and see if the problem isn't significantly mitigated.

It is mitigated somewhat, but we still hit MaxClients without our hack in place.

> Right now, it does take cycles to walk the scoreboard to determine the number in a given state (and this is somewhat fuzzy, since values are flipping as you walk along the list of workers).

I know the worker MPM, for example, keeps a count of idle workers internally. Maybe just an MPM query to retrieve that value would be good? All MPMs keep track of this in some fashion, because they all know when MaxClients is reached.

> Adding an indexed list of 'counts' would be very lightweight - one atomic increment and decrement per state change. This would probably be more efficient than walking the entire list.

Sounds good. Of course, when changing from one state to another you would always have to decrement the previous state and increment the new one. The way the core seems to be now, that would require some careful examination of the code to ensure all the state changes were covered.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
Re: Keepalives
Snipping all the other issues, which are largely valid and do contain some good ideas.

Akins, Brian wrote:
> Here's the problem: If you want to use keepalives, all of your workers (threads/procs/whatever) can become busy just waiting on another request on a keepalive connection. Raising MaxClients does not help.
>
> The Event MPM does not seem to really help this situation. It seems to only make each keepalive connection cheaper. It can still allow all workers to be blocking on keepalives.

Can you be more detailed on this? It really _should_ help, and my testing says it does. What are you seeing, behavior-wise? Any changes you would like made?
Re: Keepalives
At 10:12 AM 6/17/2005, Brian Akins wrote:
>> Adding an indexed list of 'counts' would be very lightweight - one atomic increment and decrement per state change. This would probably be more efficient than walking the entire list.
>
> Sounds good. Of course, when changing from one state to another you would always have to decrement the previous state and increment the new one. The way the core seems to be now, that would require some careful examination of the code to ensure all the state changes were covered.

In that exact order :) Much better to 'under-report' a given state, and consider that under-reporting (if you sum during an update) to be a product of state changes. I think ++, -- would be much more misleading, since the server would appear to be taking more actions than could possibly occur at once.

Bill
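A sketch of the indexed counts Bill describes, using APR 1.x atomics; the array and function names are invented. Decrementing the old state before incrementing the new one means a reader summing the array mid-update can only see too few workers in a state, never too many - the under-reporting Bill prefers:

    /* Sketch of per-state scoreboard counts (names invented).  Each
     * state change costs one atomic decrement plus one increment. */
    #include "apr_atomic.h"
    #include "scoreboard.h"    /* SERVER_NUM_STATUS, SERVER_* codes */

    static volatile apr_uint32_t state_counts[SERVER_NUM_STATUS];

    static void update_state_count(int old_status, int new_status)
    {
        /* Decrement first, then increment: a concurrent reader sees
         * one worker too few, never one too many. */
        apr_atomic_dec32(&state_counts[old_status]);
        apr_atomic_inc32(&state_counts[new_status]);
    }

    /* Any state's population is then an O(1) read instead of a walk: */
    static apr_uint32_t count_in_state(int status)
    {
        return apr_atomic_read32(&state_counts[status]);
    }

count_in_state(SERVER_BUSY_KEEPALIVE) would give Brian's module its threshold test without touching every worker slot.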
Re: Keepalives
Brian Akins wrote:
> Bill Stoddard wrote:
>> If the event MPM is working properly, then a worker thread should not be blocking waiting for the next ka request. You still have the overhead of the tcp connection and some storage used by httpd to manage connection events, but both of those are small compared to a blocking thread.
>
> Should there be an upper limit on how many connections to hold in keepalive, even when using event? Say you have 100 worker threads; you wouldn't want to have 8192 keepalive connections. So you would want some limit.
>
>> Both approaches sound pragmatic (+.5), although I would like to think the best long term solution is to completely decouple TCP connections from worker threads.
>
> I really like the event mpm, but I still think there has to be an upper limit on how many connections to allow to keepalive.
>
>> [The event MPM] is an experiment in that direction, but it still has a long way to go. Earliest I could see this happening is in the v2.4 timeframe.
>
> We've been doing some testing with the current 2.1 implementation, and it works; it just currently doesn't offer much advantage over worker for us. If num keepalives == maxclients, you can't accept any more connections.

Interesting point; it's been a while since I looked at the event MPM, but I thought (mistakenly) that maxclients accounting was adjusted to reflect the max number of concurrently active worker threads rather than active tcp connections. I agree we need some kind of upper limit on the max number of TCP connections into a running instance of httpd, regardless of whether those connections are associated with a worker thread or not.

Bill
Re: Keepalives
Any interest in (or objections to) adding another MPM query, AP_MPMQ_IDLE_WORKERS (or some other name)? In worker.c, we could just add this to ap_mpm_query:

    case AP_MPMQ_IDLE_WORKERS:
        *result = ap_idle_thread_count;
        return APR_SUCCESS;

and in perform_idle_server_maintenance we would update ap_idle_thread_count. I can submit a patch if anyone thinks this has a chance of being committed.

--
Brian Akins
Lead Systems Engineer
CNN Internet Technologies
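From the caller's side, a module could then fetch the idle count cheaply instead of walking the scoreboard. AP_MPMQ_IDLE_WORKERS is Brian's proposed query code, not an existing one, so this sketch assumes his patch is applied:

    /* Usage sketch for the proposed query; AP_MPMQ_IDLE_WORKERS does
     * not exist in current httpd. */
    #include "httpd.h"
    #include "ap_mpm.h"

    static int get_idle_workers(void)
    {
        int idle = 0;

        if (ap_mpm_query(AP_MPMQ_IDLE_WORKERS, &idle) != APR_SUCCESS) {
            return -1;  /* the running MPM doesn't support this query */
        }
        return idle;
    }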
Re: Keepalives
Brian Akins wrote:
> Bill Stoddard wrote:
>> If the event MPM is working properly, then a worker thread should not be blocking waiting for the next ka request. You still have the overhead of the tcp connection and some storage used by httpd to manage connection events, but both of those are small compared to a blocking thread.
>
> Should there be an upper limit on how many connections to hold in keepalive, even when using event? Say you have 100 worker threads; you wouldn't want to have 8192 keepalive connections. So you would want some limit.
>
> I really like the event mpm, but I still think there has to be an upper limit on how many connections to allow to keepalive.

I'm pleased to hear you've tried the event mpm. not sure why there has to be a limit. are you talking about connections per worker process? except for the size of the pollset, I didn't see a need to put a limit on the number of connections per worker process back when I was stress testing it with specweb99. when a worker process was saturated with active threads, the listener thread would block in ap_queue_info_wait_for_idler() until a worker thread freed up. in the meantime, other processes would grab the new connections. so it was sort of self-balancing as far as distributing connections among processes. not sure if the current code still behaves that way. I plan to find out soon though.

> We've been doing some testing with the current 2.1 implementation, and it works; it just currently doesn't offer much advantage over worker for us. If num keepalives == maxclients, you can't accept any more connections.

that's a surprise, and it sounds like a bug. I'll investigate. it used to be that maxclients was really max worker threads, and you could have far more connections than threads.

thanks for the feedback.

Greg