Hi everyone,
We are using Pound 2.4 in production to load balance some web servers.
We currently have a load problem on the web servers (we are working on a fix)
that sometimes leaves them unresponsive during peaks (connection
timeouts from Pound's perspective).
When Pound gets connection timeouts from them, it marks them as dead.
We have set the "Alive" value to 2 seconds, so we expected a web
backend to be considered alive again within that time once it has
recovered.
The reality is completely different. Backends that start responding
again are not considered alive until several minutes have passed.
I'm not a C expert, but looking at the 2.6 source code, it looks like
the alive/dead logic is quite wrong; please correct me if I am mistaken
(in do_resurect(void)):
for(svc = services; svc; svc = svc->next) {
    for(modified = 0, be = svc->backends; be; be = be->next) {
        ....
        if(connect_nb(sock, addr, be->conn_to) == 0) {
            be->resurrect = 1;
            modified = 1;
        }
    }
    if(modified) {
        ...
        if(be->resurrect) {
            be->alive = 1;
            str_be(buf, MAXBUF - 1, be);
            logmsg(LOG_NOTICE, "BackEnd %s resurrect", buf);
        }
        ...
    }
    ...
}
This means that Pound first tries to connect to every dead backend,
only recording the result, and starts resurrecting servers afterwards.
If some servers are still dead, this first pass takes up to
NUMBER_OF_DEAD_BACKEND * TIMEOUT seconds (each unresponsive backend
costs a full connection timeout, be->conn_to), and only after that are
recovered servers resurrected.
This explains what we see: the resurrection code has to wait for the
timeouts of all dead backends before doing its job.
This behaviour looks wrong: a backend should be resurrected as soon as
it is seen alive, without having to wait for the timeouts of the other
backends.
Am I right?
If so, is there any chance you could provide a patch soon?
Regards,
Vincent Miszczak
--
To unsubscribe send an email with subject unsubscribe to pound@apsis.ch.
Please contact ro...@apsis.ch for questions.