https://issues.apache.org/bugzilla/show_bug.cgi?id=52567

             Bug #: 52567
           Summary: Worker recovery state does not properly persist if no
                    traffic is received
           Product: Tomcat Connectors
           Version: 1.2.32
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Common
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: aogb...@redhat.com
    Classification: Unclassified


I've noticed an issue with the worker recovery state.  If the worker receives
no traffic after it goes into recovery mode, the worker will flip back into
full error mode with the next worker maintenance call.  This can be
problematic in low-traffic setups without session replication/failover where
several httpd servers load balance through mod_jk.

If traffic is unlucky enough to hit the worker only after it has flipped back
into error mode, the worker doesn't get a chance to recover.  Checking the
relevant code, I see the cause of this behavior in recover_workers:

        else if (w->s->error_time > 0 &&
                 (int)difftime(now, w->s->error_time) >= p->error_escalation_time) {
            if (JK_IS_DEBUG_LEVEL(l))
                jk_log(l, JK_LOG_DEBUG,
                       "worker %s escalating local error to global error",
                       w->name);
            w->s->state = JK_LB_STATE_ERROR;
        }

A worker in recovery mode still has error_time set, and the difftime against
it is at least error_escalation_time, so it falls into the "escalating
local error to global error" block and moves back to full error mode.  This
issue can probably be worked around through other config options or
administrative action through jkstatus, but it is inconsistent with the
expected/intended behavior and looks like an easy fix.  It seems this could be
corrected with an additional check to confirm that the worker state is not
JK_LB_STATE_RECOVER, for example:

        else if (w->s->error_time > 0 &&
                 (int)difftime(now, w->s->error_time) >= p->error_escalation_time) {
            if (w->s->state != JK_LB_STATE_RECOVER) {
                if (JK_IS_DEBUG_LEVEL(l))
                    jk_log(l, JK_LOG_DEBUG,
                           "worker %s escalating local error to global error",
                           w->name);
                w->s->state = JK_LB_STATE_ERROR;
            }
        }
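
To illustrate the difference, here is a minimal standalone sketch of just the
escalation branch.  It is not mod_jk code: the sketch_* names, the simplified
worker struct, and the time values are all made up for illustration, and the
guard_recover flag only exists to compare the current behavior with the
proposed check.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the mod_jk worker states. */
typedef enum { ST_OK, ST_RECOVER, ST_ERROR } sketch_state;

typedef struct {
    sketch_state state;
    time_t error_time;   /* time of the first error, 0 if none */
} sketch_worker;

/* Models only the escalation branch quoted above.  guard_recover selects
 * between the current behavior (0) and the proposed extra check (1). */
static void sketch_escalate(sketch_worker *w, time_t now,
                            int error_escalation_time, int guard_recover)
{
    assert(w != NULL);
    if (w->error_time > 0 &&
        (int)difftime(now, w->error_time) >= error_escalation_time) {
        if (guard_recover && w->state == ST_RECOVER)
            return;              /* leave a recovering worker alone */
        w->state = ST_ERROR;     /* escalate local error to global error */
    }
}

/* Runs one maintenance pass over an idle worker that is already in
 * recovery and returns the state it ends up in. */
static sketch_state sketch_idle_pass(int guard_recover)
{
    sketch_worker w = { ST_RECOVER, (time_t)1000 };
    /* 120 s after the error, with a 60 s escalation threshold
     * (arbitrary values chosen for the example). */
    sketch_escalate(&w, (time_t)1120, 60, guard_recover);
    return w.state;
}
```

With these assumptions, sketch_idle_pass(0) ends in ST_ERROR, which mirrors
the flip back into error mode described above, while sketch_idle_pass(1)
leaves the worker in ST_RECOVER until traffic arrives.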
