proxy_handler() calls ap_proxy_pre_request() inside a do loop over balanced workers.

This in turn calls proxy_balancer_pre_request(), which does:

   (*worker)->s->busy++;

Correspondingly, proxy_balancer_post_request() does:

       if (worker && worker->s->busy)
           worker->s->busy--;

Unfortunately, proxy_handler() only calls proxy_run_post_request(), and thus proxy_balancer_post_request(), outside the do loop. As a result, the "busy" count of workers that currently cannot take requests (e.g. workers that are currently dead) increases without bound due to retries -- and is never reset.
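
For what it's worth, here's a tiny standalone simulation of that imbalance. This is not httpd code -- the struct and function names below are made up and only loosely mirror proxy_worker, the pre_request hook, and the post_request hook -- but it shows how busy climbs when only the final post_request runs outside the retry loop:

    #include <stdio.h>

    /* Toy stand-ins for the mod_proxy structures -- names made up. */
    struct toy_worker {
        int busy;    /* mirrors worker->s->busy */
        int alive;   /* 0 = dead, 1 = can take requests */
    };

    /* Rough analogue of proxy_balancer_pre_request(): charge the worker. */
    static void toy_pre_request(struct toy_worker *w)  { w->busy++; }

    /* Rough analogue of proxy_balancer_post_request(): release the charge. */
    static void toy_post_request(struct toy_worker *w) { if (w->busy) w->busy--; }

    /* Rough analogue of proxy_handler(): pre_request on every attempt in the
     * retry loop, post_request only once after it. */
    static void toy_handle_request(struct toy_worker *w, int max_attempts)
    {
        int attempt;
        for (attempt = 0; attempt < max_attempts; attempt++) {
            toy_pre_request(w);
            if (w->alive)
                break;          /* request served */
            /* attempt failed; the loop retries with no matching decrement.
             * Decrementing busy here (or running the post_request hook per
             * failed attempt) would be one way to keep the count balanced. */
        }
        toy_post_request(w);    /* runs once, outside the loop */
    }

    int main(void)
    {
        struct toy_worker dead = { 0, 0 };
        int i;

        for (i = 1; i <= 5; i++) {
            toy_handle_request(&dead, 3);   /* 3 attempts per request */
            printf("after request %d: busy = %d\n", i, dead.busy);
        }
        return 0;
    }

Each request leaves busy 2 higher than before (3 failed attempts, 1 decrement), even though the dead worker never served anything.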

Does anyone (i.e. someone more familiar with this code than I am) have suggestions for how this should be fixed? If not, I can take a swing at it.

Similarly, when workers are retried in various routines in mod_proxy_balancer.c, those workers' lbstatus is incremented. If the retry fails, however, the lbstatus is never reset, so it too increases without bound. Just because a worker was dead for 8 hours does not mean it can handle the whole workload now. It needs to start fresh -- not 8 hours in the hole. This accumulation also has an unduly large impact when doing

   mycandidate->s->lbstatus -= total_factor;

We're seeing load balancing thrown dramatically off in this case.

Does anyone have suggestions for how this should be fixed? If not, I can again take a swing at this, e.g. resetting lbstatus to 0 in ap_proxy_retry_worker().
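
To make the impact concrete, here's another standalone toy (again, not httpd code -- the selection loop is only a simplified imitation of the byrequests idea, and all names are made up). One worker stays dead for a while and keeps having its lbstatus bumped with nothing resetting it; once it comes back it soaks up all the traffic unless its lbstatus is zeroed at recovery, as proposed above:

    #include <stdio.h>

    /* Toy stand-ins -- fields only loosely mirror s->lbfactor / s->lbstatus. */
    struct toy_worker {
        const char *name;
        int lbfactor;
        int lbstatus;
        int alive;
    };

    /* Simplified imitation of a byrequests-style pick: bump each usable
     * worker's lbstatus by its lbfactor, take the one with the highest
     * lbstatus, then charge it the total factor. */
    static struct toy_worker *toy_pick(struct toy_worker w[], int n)
    {
        struct toy_worker *best = NULL;
        int total_factor = 0, i;

        for (i = 0; i < n; i++) {
            if (!w[i].alive)
                continue;                   /* dead workers are skipped */
            w[i].lbstatus += w[i].lbfactor;
            total_factor  += w[i].lbfactor;
            if (!best || w[i].lbstatus > best->lbstatus)
                best = &w[i];
        }
        if (best)
            best->lbstatus -= total_factor; /* the line quoted above */
        return best;
    }

    int main(void)
    {
        struct toy_worker w[3] = {
            { "A", 1, 0, 1 }, { "B", 1, 0, 1 }, { "C", 1, 0, 0 } /* C dead */
        };
        int i, counts[3] = { 0, 0, 0 };

        /* While C is dead, each failed retry bumps its lbstatus and nothing
         * ever resets it -- modeled here as one extra bump per round. */
        for (i = 0; i < 1000; i++) {
            w[2].lbstatus += w[2].lbfactor;
            toy_pick(w, 3);
        }

        w[2].alive = 1;                     /* C finally comes back... */
        /* w[2].lbstatus = 0; */            /* ...the proposed reset */

        for (i = 0; i < 300; i++)
            counts[toy_pick(w, 3) - w]++;

        for (i = 0; i < 3; i++)
            printf("%s served %d of the 300 requests after C recovered\n",
                   w[i].name, counts[i]);
        return 0;
    }

As written, C soaks up all 300 requests after it recovers; uncommenting the reset line spreads them roughly evenly across A, B, and C again.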

It *seems* like both of these issues center on the handling of dead workers, especially having multiple dead workers and/or workers that are dead for long periods of time.

I've not yet checked whether mod_jk (where I believe these basic algorithms came from) has similar issues.

--
Jess Holle
