On 06.03.2009 09:22, Mladen Turk wrote:
> BZ 46808 is valid, but it brings us back to where we were before.
> It solves (well, doesn't actually) one thing,
> but breaks the sticky sessions.
> Unless the patch reliably detects the cause of failure,
> I'm -1 on committing it. The problem is that the return
> codes from socket operations are platform specific, and
> then what if someone pulls the cable in the middle of an
> AJP request? The possibilities are endless, and trying to
> solve each and every one is over-engineering, though.
> We should keep things simple:
> "If there are open connections to the backend, do not
> mark that worker in error"
> We simply do not know the status of the network beyond
> the current connection. It might be Tomcat rejecting
> new connections, the firewall reaching its limit, ulimit,
> an ssh tunnel restart, etc., etc. ...
> So although our connection fails, others might still
> work. If we could detect something as an absolute
> failure for sure, then we could save some concurrent client
> cycles, but then again we have a watchdog that can
> check all the client connections in the background.
I don't want to simply commit the patch from BZ 46808; I first want to
fully understand the construction and implications of the local error
states. I did understand the concept in general (even before 1.2.27 was
released), but I nevertheless have some questions:
1) New busy introduced in r647085
=================================
We have three busy counters:
a) one for the lb in total
b) one for each lb sub
c) one for each ajp worker
In the status worker we use only a) and c). In lb we use a) and b). Your
comment on BZ 46808 seems to indicate that using c) instead of b) in lb
would be better. We could then remove b) again.
Right?
So we could drop the rec->s->busy++ and --, because that's already done
for the ajp busyness in jk_ajp_common.c, and we would test against
aw->s->busy instead of rec->s->busy.
OK?
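For clarity, here is the change I mean as a standalone sketch. The structs are simplified stand-ins I made up for this illustration, not the real mod_jk types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the shared-memory
 * records; the real ajp_worker_t / lb_sub_worker_t carry many more fields. */
typedef struct {
    int busy;            /* c) maintained per request in jk_ajp_common.c */
} ajp_shm_t;

typedef struct {
    int busy;            /* b) the duplicate counter we could drop */
    ajp_shm_t *ajp;      /* link to the underlying ajp worker */
} sub_worker_t;

/* Proposed check: read counter c) directly instead of maintaining b). */
static int sub_worker_busy(const sub_worker_t *rec)
{
    return rec->ajp->busy > 0;
}
```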
2) Local states vs. global states
=================================
I went through all the places where we use and set states in lb. I have
some comments:
a) Setting local states and global states to different values
-------------------------------------------------------------
Of course the whole idea of the local state is to allow situations
where we behave differently for a single request and for the rest of the
system. On the other hand, if we have too many such differences, the
system gets very complex, because, as you can see further down, we need
to carefully decide whether to test against the global or
against the local state.
First the easier cases (I think)
(i) JK_CLIENT_ERROR
1258         else if (service_stat == JK_CLIENT_ERROR) {
1259             /*
1260              * Client error !!!
1261              * Since this is bad request do not fail over.
1262              */
1263             rec->s->state = JK_LB_STATE_OK;
1264             p->states[rec->i] = JK_LB_STATE_ERROR;
1265             rec->s->error_time = 0;
1266             rc = JK_CLIENT_ERROR;
1267             recoverable = JK_FALSE;
1268         }
recoverable=JK_FALSE means we no longer loop in service(), and thus the
states are no longer used. I would therefore suggest keeping the two
states equal ("OK") here. It doesn't matter for the logic, but it makes
keeping track of the differences easier.
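As a sketch, the branch would then look like this (simplified stand-in types and assumed constant values, not the actual mod_jk code):

```c
#include <assert.h>

/* Assumed, simplified versions of the mod_jk constants. */
enum { JK_LB_STATE_OK = 0, JK_LB_STATE_ERROR = 2 };
enum { JK_CLIENT_ERROR = -2 };

typedef struct {
    int  state;        /* global (shared-memory) state */
    long error_time;
} worker_shm_t;

/* Bad request from the client: the backend is fine. Since
 * recoverable=JK_FALSE stops the retry loop anyway, keep the local
 * state equal to the global one instead of marking it ERROR. */
static int handle_client_error(worker_shm_t *s, int *local_state)
{
    s->state      = JK_LB_STATE_OK;
    *local_state  = JK_LB_STATE_OK;   /* previously JK_LB_STATE_ERROR */
    s->error_time = 0;
    return JK_CLIENT_ERROR;
}
```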
(ii) JK_STATUS_FATAL_ERROR
1291         else if (service_stat == JK_STATUS_FATAL_ERROR) {
1292             /*
1293              * Status code configured as service is down.
1294              * Mark the node as bad.
1295              * Failing over to another node could help.
1296              */
1297             rec->s->errors++;
1298             if (rec->s->busy) {
1299                 rec->s->state = JK_LB_STATE_OK;
1300             }
1301             else {
1302                 rec->s->state = JK_LB_STATE_ERROR;
1303             }
1304             p->states[rec->i] = JK_LB_STATE_ERROR;
1305             rec->s->error_time = time(NULL);
1306             rc = JK_FALSE;
1307         }
This is the situation where fail_on_status was triggered with a
positive status code, i.e. the admin explicitly configured this status
to be treated as an ERROR. And busyness aside, we actually got a
response back. So I propose not making a distinction based on busyness
here and setting the global state to ERROR too.
OK?
(iii) JK_REPLY_TIMEOUT
1308         else if (service_stat == JK_REPLY_TIMEOUT) {
1309             if (aw->s->reply_timeouts > (unsigned)p->worker->max_reply_timeouts) {
1310                 /*
1311                  * Service failed - to many reply timeouts
1312                  * Take this node out of service.
1313                  */
1314                 rec->s->errors++;
1315                 if (rec->s->busy) {
1316                     rec->s->state = JK_LB_STATE_OK;
1317                 }
1318                 else {
1319                     rec->s->state = JK_LB_STATE_ERROR;
1320                 }
1321                 p->states[rec->i] = JK_LB_STATE_ERROR;
1322                 rec->s->error_time = time(NULL);
We are already over max_reply_timeouts, so this should trigger a global
ERROR, even when busy. The whole point of max_reply_timeouts is to
prevent isolated reply timeouts from triggering ERROR. If we really go
over that limit, we should go into ERROR without further conditions. So
I propose a global ERROR here, without checking busy.
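Both (ii) and (iii) would then collapse to the same shape; a hedged sketch with simplified stand-in types and assumed constant values:

```c
#include <assert.h>
#include <time.h>

/* Assumed, simplified versions of the jk_lb_worker.h state constants. */
enum { JK_LB_STATE_OK = 0, JK_LB_STATE_ERROR = 2 };

typedef struct {
    int    state;        /* global (shared-memory) state */
    int    errors;
    time_t error_time;
} worker_shm_t;

/* Proposed handling for fail_on_status matches and for exceeding
 * max_reply_timeouts: the condition was explicitly configured (or the
 * limit explicitly crossed), so mark the node bad unconditionally,
 * with no busy check. */
static void mark_error_unconditionally(worker_shm_t *s, int *local_state)
{
    s->errors++;
    s->state      = JK_LB_STATE_ERROR;
    *local_state  = JK_LB_STATE_ERROR;
    s->error_time = time(NULL);
}
```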
Now the more difficult cases:
(iv) JK_SERVER_ERROR
We only get this if a memory allocation fails.
1269         else if (service_stat == JK_SERVER_ERROR) {
1270             /*
1271              * Internal JK server error.
1272              * Don't mark the node as bad.
1273              * Failing over to another node could help.
1274              */
1275             rec->s->state = JK_LB_STATE_OK;
1276             p->states[rec->i] = JK_LB_STATE_ERROR;
1277             rec->s->error_time = 0;
1278             rc = JK_FALSE;
1279         }
I'm fine with what you decided, although I actually see no reason why
allocation should work better or worse for any particular lb member.
We can leave it as is, but to keep track of the differences it would be
easier to set the local state to OK too.
(v) the "else"
        else {
            /*
             * Service failed !!!
             * Time for fault tolerance (if possible)...
             */
            rec->s->errors++;
            if (rec->s->busy) {
                rec->s->state = JK_LB_STATE_OK;
            }
            else {
                rec->s->state = JK_LB_STATE_ERROR;
            }
            p->states[rec->i] = JK_LB_STATE_ERROR;
            rec->s->error_time = time(NULL);
            rc = JK_FALSE;
        }
This is the hard case. It handles connect problems, protocol failures,
and a lot of other assorted problems.
I agree that it is hard to decide whether an error here should be
handled as a global failover decision. On the other hand, in production
it is likely that busy>0, so we would never set a global error in these
"else" cases.
Of course, we are now in good shape with cping/cpong (if a connection
exists) and the new socket connect timeout (if not) to detect problems
on the network layer really fast, before the request is wasted and
unrecoverable.
My idea would be: leave this as is, but add a second check based on
error_time. We still set it, even when the global state is OK. In the
other cases, where you use the local state simply to prevent using the
same member twice in the service loop, we don't set it. So it's a good
indicator for this "else" branch.
Something along the lines of:
Index: jk_lb_worker.h
===================================================================
--- jk_lb_worker.h (revision 750052)
+++ jk_lb_worker.h (working copy)
@@ -123,6 +123,9 @@
 /* Time to wait before retry. */
 #define WAIT_BEFORE_RECOVER (60)
 
+/* Time after a local error switches to global error (seconds) */
+#define MAX_LOCAL_ERROR_TIME 10
+
 /* We accept doing global maintenance if we are */
 /* JK_LB_MAINTAIN_TOLERANCE seconds early. */
 #define JK_LB_MAINTAIN_TOLERANCE (2)
Index: jk_lb_worker.c
===================================================================
--- jk_lb_worker.c (revision 750069)
+++ jk_lb_worker.c (working copy)
@@ -1338,15 +1338,17 @@
              * Service failed !!!
              * Time for fault tolerance (if possible)...
              */
+            time_t now = time(NULL);
             rec->s->errors++;
-            if (rec->s->busy) {
-                rec->s->state = JK_LB_STATE_OK;
-            }
-            else {
+            if (rec->s->busy == 0 ||
+                (rec->s->error_time > 0 &&
+                 (int)difftime(now, rec->s->error_time) > MAX_LOCAL_ERROR_TIME)) {
                 rec->s->state = JK_LB_STATE_ERROR;
             }
             p->states[rec->i] = JK_LB_STATE_ERROR;
-            rec->s->error_time = time(NULL);
+            if (rec->s->error_time == 0) {
+                rec->s->error_time = now;
+            }
             rc = JK_FALSE;
         }
         if (p->worker->lblock == JK_LB_LOCK_PESSIMISTIC)
This also has another important side effect: it prevents an established
global error state from being switched back to OK when we end up in
this "else" case with busy > 0.
What do you think?
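The intended escalation rule, pulled out as a standalone condition (constants as proposed above, types simplified for the sketch):

```c
#include <assert.h>
#include <time.h>

#define MAX_LOCAL_ERROR_TIME 10   /* seconds, as proposed above */

/* Returns 1 if a failure in the "else" branch should now become a
 * global ERROR: either nothing else is in flight (busy == 0), or the
 * first local error on this member is older than MAX_LOCAL_ERROR_TIME. */
static int escalate_to_global_error(int busy, time_t first_error_time, time_t now)
{
    return busy == 0 ||
           (first_error_time > 0 &&
            (int)difftime(now, first_error_time) > MAX_LOCAL_ERROR_TIME);
}
```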
b) Local states and forced recovery
-----------------------------------
I think we should test against the local state in forced recovery
(whenever the global state is ERROR, the local one is as well, but the
local state can be ERROR without the global one). Since we use forced
recovery both in maintenance (where we have no local states) and in the
normal service loop, I suggest:
Index: jk_lb_worker.c
===================================================================
--- jk_lb_worker.c (revision 750069)
+++ jk_lb_worker.c (working copy)
@@ -570,17 +570,23 @@
 }
 
 static int force_recovery(lb_worker_t *p,
+                          int *states,
                           jk_logger_t *l)
 {
     unsigned int i;
     int forced = 0;
+    int state;
     lb_sub_worker_t *w = NULL;
     ajp_worker_t *aw = NULL;
 
     JK_TRACE_ENTER(l);
     for (i = 0; i < p->num_of_workers; i++) {
         w = &p->lb_workers[i];
-        if (w->s->state == JK_LB_STATE_ERROR) {
+        if (states == NULL)
+            state = w->s->state;
+        else
+            state = states[i];
+        if (state == JK_LB_STATE_ERROR) {
             if (JK_IS_DEBUG_LEVEL(l))
                 jk_log(l, JK_LOG_INFO,
                        "worker %s is marked for forced recovery",
@@ -588,6 +594,8 @@
             aw = (ajp_worker_t *)w->worker->worker_private;
             aw->s->reply_timeouts = 0;
             w->s->state = JK_LB_STATE_FORCE;
+            if (states != NULL)
+                states[i] = JK_LB_STATE_FORCE;
             forced++;
         }
     }
@@ -659,7 +667,7 @@
                    JK_LB_DECAY_MULT * delta / lb->maintain_time);
         curmax = decay_load(lb, JK_LB_DECAY_MULT * delta /
                             lb->maintain_time, l);
         if (!recover_workers(lb, curmax, now, l)) {
-            force_recovery(lb, l);
+            force_recovery(lb, NULL, l);
         }
     }
@@ -1396,7 +1406,7 @@
                  * If it still fails, Tomcat is still disconnected.
                  */
                 jk_shm_lock();
-                nf = force_recovery(p->worker, l);
+                nf = force_recovery(p->worker, states, l);
                 jk_shm_unlock();
                 was_forced = 1;
                 if (nf) {
OK?
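As a reduced model (mock types and globals, no shared memory or logging), the proposed signature change behaves like this:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified state constants. */
enum { JK_LB_STATE_OK = 0, JK_LB_STATE_ERROR = 2, JK_LB_STATE_FORCE = 3 };

#define NUM_WORKERS 2

/* Global (shared) states of the lb members; stands in for w->s->state. */
static int global_states[NUM_WORKERS];

/* Proposed shape: when called from the service loop, 'states' carries
 * the request-local view; from maintenance it is NULL and we fall back
 * to the global state, exactly as before. */
static int force_recovery_sketch(int *states)
{
    int forced = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        int state = (states == NULL) ? global_states[i] : states[i];
        if (state == JK_LB_STATE_ERROR) {
            global_states[i] = JK_LB_STATE_FORCE;
            if (states != NULL)
                states[i] = JK_LB_STATE_FORCE;
            forced++;
        }
    }
    return forced;
}
```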
Regards,
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org