proxy_handler() calls ap_proxy_pre_request() inside a do loop over
balanced workers.
This in turn calls proxy_balancer_pre_request() which does
(*worker)->s->busy++.
Correspondingly proxy_balancer_post_request() does:
    if (worker && worker->s->busy)
        worker->s->busy--;
Unfortunately, proxy_handler only calls proxy_run_post_request() and
thus proxy_balancer_post_request() outside the do loop. Thus the "busy"
count of workers that currently cannot take requests (e.g. dead
workers) increases without bound due to retries -- and is never
reset.
Does anyone (i.e. anyone more familiar with this code) have suggestions
for how this should be fixed? If not, I can take a swing at it.
Similarly, when retrying workers in various routines in
mod_proxy_balancer.c, those workers' lbstatus is incremented. If the
retry fails, however, the lbstatus is never reset. This issue also
leads to an lbstatus that increases without bound. Just because a
worker was dead for 8 hours does not mean it can handle all the work
load now. It needs to start fresh -- not 8 hours in the hole. This
issue also creates an unduly huge impact when doing
    mycandidate->s->lbstatus -= total_factor;
We're seeing the load balancing be thrown dramatically off in this case.
Does anyone have suggestions for how this should be fixed? If not,
again I can take a swing at this, e.g. resetting lbstatus to 0 in
ap_proxy_retry_worker().
It *seems* like both of these issues center on the handling of dead
workers, especially when there are multiple dead workers and/or
workers that are dead for long periods of time.
I've not yet checked whether mod_jk (where I believe these basic
algorithms came from) has similar issues.
--
Jess Holle