On 06.03.2009 09:22, Mladen Turk wrote:
> BZ 46808 is valid, but it brings us back to where we were before.
> It solves (well, doesn't actually) one thing,
> but breaks sticky sessions.
> Unless the patch reliably detects the cause of failure,
> I'm -1 for committing it. The problem is that the return
> codes from socket operations are platform specific, and
> then what if someone pulls the cable in the middle of an
> AJP request? The possibilities are endless, and trying to
> solve each and every one is over-engineering, though.
> We should keep things simple:
>
> "If there are opened connections to the backend, do not
> mark that worker in error."
> We simply do not know the status of the network beyond
> the current connection. It might be Tomcat rejecting
> new connections, a firewall reaching its limit, ulimit,
> an ssh tunnel restart, etc, etc ...
> So although our connection fails, others might still
> work. If we can detect something as an absolute
> failure for sure, then we could save some concurrent client
> cycles, but then again we have a watchdog that can
> check all the client connections in the background.

I don't want to simply commit the patch from BZ 46808; I first want to fully understand the construction and implications of the local error states. I did understand the concept in general (even before 1.2.27 was released), but nevertheless I have some questions:

1) New busy introduced in r647085
=================================

We have three busy counters:

a) one for the lb in total
b) one for each lb sub
c) one for each ajp worker

In the status worker we use only a) and c). In lb we use a) and b). Your comment on BZ 46808 seems to indicate that using c) instead of b) in lb would be better. We could then remove b) again.

Right?

So we could drop the rec->s->busy++ and --, because that's already done for the ajp busyness in jk_ajp_common.c, and we would test against aw->s->busy instead of rec->s->busy.
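Untested sketch of what I mean; aw would be fetched via worker_private the same way we already do it elsewhere in jk_lb_worker.c:

    /* Sketch only: test the busyness of the underlying ajp worker
     * instead of maintaining a separate counter on the lb sub worker. */
    ajp_worker_t *aw = (ajp_worker_t *)rec->worker->worker_private;
    if (aw->s->busy) {
        /* this member is currently handling requests */
        ...
    }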

OK?

2) Local states vs. global states
=================================

I went through all the places where we use and set states in lb. I have some comments:

a) Setting local states and global states to different values
-------------------------------------------------------------

Of course the whole idea of the local state is to allow situations where we behave differently for a single request and for the rest of the system. On the other hand, if we have too many such differences, the system gets very complex, because, as you can see further down, we need to carefully decide whether to test against the global or against the local state.

First the easier cases (I think):

(i) JK_CLIENT_ERROR

                else if (service_stat == JK_CLIENT_ERROR) {
                    /*
                     * Client error !!!
                     * Since this is a bad request do not fail over.
                     */
                    rec->s->state  = JK_LB_STATE_OK;
                    p->states[rec->i] = JK_LB_STATE_ERROR;
                    rec->s->error_time = 0;
                    rc = JK_CLIENT_ERROR;
                    recoverable = JK_FALSE;
                }

recoverable=JK_FALSE means we no longer loop in service(), and thus the states are no longer used. I would therefore suggest keeping the two states equal, both "OK", here. It doesn't matter for the logic, but it makes keeping track of the differences easier.
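In diff form, the suggestion is just this one line (untested sketch):

                     rec->s->state  = JK_LB_STATE_OK;
-                    p->states[rec->i] = JK_LB_STATE_ERROR;
+                    p->states[rec->i] = JK_LB_STATE_OK;
                     rec->s->error_time = 0;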

(ii) JK_STATUS_FATAL_ERROR

                else if (service_stat == JK_STATUS_FATAL_ERROR) {
                    /*
                     * Status code configured as service is down.
                     * Mark the node as bad.
                     * Failing over to another node could help.
                     */
                    rec->s->errors++;
                    if (rec->s->busy) {
                        rec->s->state = JK_LB_STATE_OK;
                    }
                    else {
                        rec->s->state = JK_LB_STATE_ERROR;
                    }
                    p->states[rec->i] = JK_LB_STATE_ERROR;
                    rec->s->error_time = time(NULL);
                    rc = JK_FALSE;
                }

This is the situation where fail_on_status was triggered with a positive status code, i.e. the admin configured this status code to be treated as an ERROR. And despite busyness, we actually got a response back. So I propose not to make a distinction based on busyness here, and to set the global state to ERROR too.
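Again as an untested sketch:

                     rec->s->errors++;
-                    if (rec->s->busy) {
-                        rec->s->state = JK_LB_STATE_OK;
-                    }
-                    else {
-                        rec->s->state = JK_LB_STATE_ERROR;
-                    }
+                    rec->s->state = JK_LB_STATE_ERROR;
                     p->states[rec->i] = JK_LB_STATE_ERROR;
                     rec->s->error_time = time(NULL);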

OK?

(iii) JK_REPLY_TIMEOUT

                else if (service_stat == JK_REPLY_TIMEOUT) {
                    if (aw->s->reply_timeouts > (unsigned)p->worker->max_reply_timeouts) {
                        /*
                         * Service failed - too many reply timeouts
                         * Take this node out of service.
                         */
                        rec->s->errors++;
                        if (rec->s->busy) {
                            rec->s->state = JK_LB_STATE_OK;
                        }
                        else {
                            rec->s->state = JK_LB_STATE_ERROR;
                        }
                        p->states[rec->i] = JK_LB_STATE_ERROR;
                        rec->s->error_time = time(NULL);

Here we are already over max_reply_timeouts, so this should trigger a global ERROR, even when busy. The whole point of max_reply_timeouts is to prevent isolated reply timeouts from triggering ERROR. If we really go over that limit, we should go into ERROR without further conditions. So I propose a global ERROR here, without checking busy.
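After that change the branch would simply read (sketch only):

                        rec->s->errors++;
                        rec->s->state = JK_LB_STATE_ERROR;
                        p->states[rec->i] = JK_LB_STATE_ERROR;
                        rec->s->error_time = time(NULL);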

Now the more difficult cases:

(iv) JK_SERVER_ERROR

We only get this if a memory allocation fails.

                else if (service_stat == JK_SERVER_ERROR) {
                    /*
                     * Internal JK server error.
                     * Don't mark the node as bad.
                     * Failing over to another node could help.
                     */
                    rec->s->state  = JK_LB_STATE_OK;
                    p->states[rec->i] = JK_LB_STATE_ERROR;
                    rec->s->error_time = 0;
                    rc = JK_FALSE;
                }

I'm fine with what you decided, although I actually see no reason why allocation should work better or worse for one lb member than for another.

We can leave it as is, but to keep track of the differences it would be easier to set the local state to OK here too (the same one-line change as sketched for (i) above).

(v) the "else"

                else {
                    /*
                     * Service failed !!!
                     * Time for fault tolerance (if possible)...
                     */
                    rec->s->errors++;
                    if (rec->s->busy) {
                        rec->s->state = JK_LB_STATE_OK;
                    }
                    else {
                        rec->s->state = JK_LB_STATE_ERROR;
                    }
                    p->states[rec->i] = JK_LB_STATE_ERROR;
                    rec->s->error_time = time(NULL);
                    rc = JK_FALSE;
                }

This is the hard case. It handles connect problems, protocol failures, and a variety of other problems.

I agree that it is hard to decide whether an error here should be handled as a global failover decision. On the other hand, in production it is likely that busy > 0, so we would never set a global error in these "else" cases.

Of course we are now in good shape with cping/cpong (if a connection exists) and the new socket connect timeout (if not) to detect problems on the network layer really fast, before the request is wasted and unrecoverable.
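For reference, a worker configuration that enables those checks could look like this (property names as in the mod_jk docs, values are only examples):

worker.node1.type=ajp13
worker.node1.host=localhost
worker.node1.port=8009
# fail fast if the TCP connect itself hangs (milliseconds)
worker.node1.socket_connect_timeout=5000
# cping/cpong after connect and before each request (milliseconds)
worker.node1.connect_timeout=10000
worker.node1.prepost_timeout=10000
# per-request reply timeout and the lb escalation limit
worker.node1.reply_timeout=120000
worker.node1.max_reply_timeouts=3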

My idea would be: leave this as is, but add a second check based on error_time. We still set it, even when the global state stays OK. In the other cases, where you use the local state simply to prevent using the same member twice in the service loop, we don't set it. So it's a good indicator for this "else" branch.

Something along the lines of:

Index: jk_lb_worker.h
===================================================================
--- jk_lb_worker.h      (revision 750052)
+++ jk_lb_worker.h      (working copy)
@@ -123,6 +123,9 @@

 /* Time to wait before retry. */
 #define WAIT_BEFORE_RECOVER   (60)
+/* Time after which a local error switches to a global error (seconds) */
+#define MAX_LOCAL_ERROR_TIME 10
+
 /* We accept doing global maintenance if we are */
 /* JK_LB_MAINTAIN_TOLERANCE seconds early. */
 #define JK_LB_MAINTAIN_TOLERANCE (2)
Index: jk_lb_worker.c
===================================================================
--- jk_lb_worker.c      (revision 750069)
+++ jk_lb_worker.c      (working copy)
@@ -1338,15 +1338,17 @@
                      * Service failed !!!
                      * Time for fault tolerance (if possible)...
                      */
+                    time_t now = time(NULL);
                     rec->s->errors++;
-                    if (rec->s->busy) {
-                        rec->s->state = JK_LB_STATE_OK;
-                    }
-                    else {
+                    if (rec->s->busy == 0 ||
+                        (rec->s->error_time > 0 &&
+                         (int)difftime(now, rec->s->error_time) > MAX_LOCAL_ERROR_TIME)) {
                         rec->s->state = JK_LB_STATE_ERROR;
                     }
                     p->states[rec->i] = JK_LB_STATE_ERROR;
-                    rec->s->error_time = time(NULL);
+                    if (rec->s->error_time == 0) {
+                        rec->s->error_time = now;
+                    }
                     rc = JK_FALSE;
                 }
                 if (p->worker->lblock == JK_LB_LOCK_PESSIMISTIC)

This also has another important side effect: it prevents an established global error state from being switched back to OK when we land in this "else" case with busy > 0.

What do you think?

b) Local states and forced recovery
-----------------------------------

I think we should test against the local state in forced recovery (whenever the global state is ERROR, the local one is as well, but the local state can be ERROR without the global one being so). Since we use forced recovery both during maintenance (where we have no local states) and in normal service, I suggest:

Index: jk_lb_worker.c
===================================================================
--- jk_lb_worker.c      (revision 750069)
+++ jk_lb_worker.c      (working copy)
@@ -570,17 +570,23 @@
 }

 static int force_recovery(lb_worker_t *p,
+                          int *states,
                           jk_logger_t *l)
 {
     unsigned int i;
     int forced = 0;
+    int state;
     lb_sub_worker_t *w = NULL;
     ajp_worker_t *aw = NULL;
     JK_TRACE_ENTER(l);

     for (i = 0; i < p->num_of_workers; i++) {
         w = &p->lb_workers[i];
-        if (w->s->state == JK_LB_STATE_ERROR) {
+        if (states == NULL)
+            state = w->s->state;
+        else
+            state = states[i];
+        if (state == JK_LB_STATE_ERROR) {
             if (JK_IS_DEBUG_LEVEL(l))
                 jk_log(l, JK_LOG_INFO,
                        "worker %s is marked for forced recovery",
@@ -588,6 +594,8 @@
             aw = (ajp_worker_t *)w->worker->worker_private;
             aw->s->reply_timeouts = 0;
             w->s->state = JK_LB_STATE_FORCE;
+            if (states != NULL)
+                states[i] = JK_LB_STATE_FORCE;
             forced++;
         }
     }
@@ -659,7 +667,7 @@
                        JK_LB_DECAY_MULT * delta / lb->maintain_time);
             curmax = decay_load(lb, JK_LB_DECAY_MULT * delta / lb->maintain_time, l);
             if (!recover_workers(lb, curmax, now, l)) {
-                force_recovery(lb, l);
+                force_recovery(lb, NULL, l);
             }
         }

@@ -1396,7 +1406,7 @@
                  * If it still fails, Tomcat is still disconnected.
                  */
                 jk_shm_lock();
-                nf = force_recovery(p->worker, l);
+                nf = force_recovery(p->worker, states, l);
                 jk_shm_unlock();
                 was_forced = 1;
                 if (nf) {


OK?

Regards,

Rainer
