On Sun, Mar 21, 2010 at 12:05:16AM -0400, Greg Gard wrote:
> Thanks Holger,
> 
> I did some research and was able to find more on mongrel and queuing,
> so that helps to clarify. I am unsure what I will do vis-a-vis checks
> in the end, as we have some long-running requests that are frankly
> the bane of my existence and complicate load balancing. We need to
> refactor as part of the solution.
> 
> Just to be complete, are there any plans to have health checks get queued?

We'll see how we can do that when the checks are reworked, but quite
frankly, mongrel users are the *only* ones who need it, and when you
have a server which can take one minute to respond to a single
request, you have far more important trouble to worry about than
whether health checks get queued or not! If a server can only do one
thing at a time, you must design it to do that one thing extremely
fast. In your case, someone could ruin your day just by sending a few
repeated clicks to your server, feeding it enough work for one full
day... There's obviously something wrong!

As Holger indicated, mongrel can queue requests, so if your server
occasionally had long response times of one SECOND, the check would
simply be queued and processed transparently. But at some point,
infrastructure elements can't work around bad code, and nothing but
fixing the code will bring response times your users are willing to
wait for. Just imagine someone posting a link to your site on a forum
or any regularly visited site: your site would then be permanently
down...

As a user, when I see that a site does not respond within 5-7
seconds, I first check my internet connectivity. After that, I
declare the site dead and go somewhere else, which is especially
true with online stores. You don't even know if *any* of your users
have ever waited for your site to respond to the "60+ sec" requests.

Also, you should consider the cost of fixing the code versus paying
the electricity bill... Assuming your server consumes 400W, at 60s
per request it burns 24 kJ per click!!! The equivalent of a 60W light
bulb running for 7 minutes. For 20000 clicks, which haproxy will pass
along in about one second, you'll get two full weeks of work on your
server, or about 133 kWh!!! SO YES, THERE IS DEFINITELY SOMETHING
WRONG IN HAVING A WEB SERVER TAKE 60+ SEC TO PROCESS ONE REQUEST.
And excuse me for being so crude, but if you don't fix that, your
site is doomed to fail long before it reaches even a minimal
audience.
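
Spelled out, with the same 400 W / 60 s figures as above :

    400 W x 60 s           =    24,000 J   ~  24 kJ per request
    24,000 J / 60 W        =       400 s   ~   7 min of a 60 W bulb
    20,000 req x 60 s      = 1,200,000 s   ~   2 weeks of server time
    400 W x 1,200,000 s    =    480 MJ     ~ 133 kWh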

In the meantime, the only thing I can suggest is to use very large
check timeouts (larger than the longest supposedly valid request),
with a low retry count so that declaring a server down does not take
too long, and probably to make use of the "observe", "on-error" and
"error-limit" server options so that a server is marked down as soon
as it returns a 5xx response to a client.
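
Something along these lines might be a starting point (the backend
name, addresses and timing values below are only examples, adjust
them to your own setup) :

    backend mongrels
        # check timeout larger than the longest legitimate request
        timeout check 90s
        # fall 2 : only two failed checks before the server is seen as down
        default-server inter 10s fall 2 rise 2
        # observe layer7 + on-error mark-down + error-limit 1 :
        # mark the server down as soon as it returns one 5xx to a client
        server app1 10.0.0.1:3000 check observe layer7 on-error mark-down error-limit 1
        server app2 10.0.0.2:3000 check observe layer7 on-error mark-down error-limit 1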

Hoping this helps,
Willy

