[RADIATOR] Infinite retries in AuthByLOADBALANCE

Frank Danielson Wed, 04 Nov 2020 09:11:13 -0800

Good Day All-

We’ve been running AuthByLOADBALANCE for some time now and have noticed that a 
request that never gets a response from the downstream hosts is retried 
indefinitely. This not only keeps the request around forever, but each failed 
attempt also increments the failure count for the target host, which makes the 
hosts more likely to be marked unavailable and causes delivery problems for 
other requests.


For example, a malformed request may be sent by an upstream client and handled 
by AuthByLOADBALANCE, where the target hosts simply do not respond to the 
proxied request because they don’t like it. handle_timeout() retries the 
request on the current host Retries times and then hands it to failed(), which 
tracks MaxFailedRequests for the host and marks the host unavailable if 
applicable. failed() then hands the request to forward(), which calls 
chooseHost() to find the next available host. The stock chooseHost() in 
AuthByRADIUS tracks whether the request has reached the end of the host list, 
but chooseHost() in AuthByLOADBALANCE always returns a host if one is 
available. It may even return the same host as the last try if 
MaxFailedRequests has not been reached for that host. The end result is that 
the request is retried forever, incrementing the failure counts for the 
downstream hosts each time and causing them to be marked unavailable.
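
The loop described above can be modelled in a few lines of standalone Perl. This is only a toy sketch, not Radiator code: choose_host() and simulate() are hypothetical stand-ins for chooseHost() and the handle_timeout()/failed()/forward() cycle, the values of $Retries and $MaxFailedRequests are assumed, and $MaxSends is an artificial cap added so that the demonstration terminates.

```perl
use strict;
use warnings;

# Toy model only: none of this is Radiator code. The subs below imitate
# the behaviour described in the text; their bodies are assumptions.
my $Retries           = 2;      # per-host retransmissions (assumed value)
my $MaxFailedRequests = 100;    # failures before a host is marked down (assumed)
my $MaxSends          = 20;     # artificial cap so this demo terminates

my @hosts = map { +{ name => $_, failed => 0, up => 1 } } qw(hostA hostB);

# Load-balancing choice: any available host qualifies, including the one
# that just failed. Here: the available host with the fewest failures.
sub choose_host {
    my @up = grep { $_->{up} } @hosts;
    return unless @up;
    my ($best) = sort { $a->{failed} <=> $b->{failed} } @up;
    return $best;
}

# Forward one request that no host ever answers.
sub simulate {
    my $sends = 0;
    while ($sends < $MaxSends) {
        my $host = choose_host() or last;    # no host left: request fails
        # Initial send plus $Retries retransmissions, all timing out.
        $sends += 1 + $Retries;
        # failed(): count against the host and maybe mark it unavailable.
        $host->{failed}++;
        $host->{up} = 0 if $host->{failed} >= $MaxFailedRequests;
    }
    return $sends;
}

my $sends = simulate();
printf "%d sends and the request has still not failed\n", $sends;
printf "%s failure count: %d\n", $_->{name}, $_->{failed} for @hosts;
```

Without the $MaxSends cap, the loop in this model ends only when every host has been marked unavailable, which is the side effect on other requests described above.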

After some looking at the code I think I could override failed() to track the 
number of unique hosts to which a request has been forwarded with something like

$fp->{retryHosts}->{$host}++

and then add a couple of checks in chooseHost() that are similar to the 
original one-

if (keys(%{$fp->{retryHosts}}) < @{$self->{Hosts}})
{
    foreach $host (@{$self->{Hosts}})
    {
        # skip hosts this request has already been tried on
        next if ($fp->{retryHosts}->{$host});
        …

The end result would be that the request is tried Retries times on each host, 
with the next best candidate chosen by the volume algorithm each time, until 
all hosts have been tried, at which point the request fails. That may not be the 
optimal behavior but it beats trying forever.
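
As a rough, standalone illustration of that idea (again not Radiator code), the sketch below keeps a per-request record of hosts already tried and refuses to choose any of them again. The names note_failed() and choose_untried_host() are hypothetical, hosts are plain hashes rather than Radiator host objects, and the "load" field is only a stand-in for whatever the volume algorithm actually compares. The record is keyed on the host name here rather than on the host reference itself.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: plain hashes model the hosts and the
# per-request forwarding state ($fp) for illustration only.
my @hosts = map { +{ name => $_, load => 0, up => 1 } } qw(hostA hostB hostC);

# Record that this request has exhausted its retries on $host.
sub note_failed {
    my ($fp, $host) = @_;
    $fp->{retryHosts}{ $host->{name} }++;
    return;
}

# Pick the least-loaded available host that this request has not yet
# tried. Returns undef once every host has been tried, so the caller can
# fail the request instead of forwarding it again.
sub choose_untried_host {
    my ($fp) = @_;
    my $tried = $fp->{retryHosts} ||= {};
    return if keys(%$tried) >= @hosts;
    my @candidates = grep { $_->{up} && !$tried->{ $_->{name} } } @hosts;
    my ($best) = sort { $a->{load} <=> $b->{load} } @candidates;
    return $best;
}

# Drive one unanswerable request through the model.
my $fp = {};
my @order;
while (my $host = choose_untried_host($fp)) {
    push @order, $host->{name};
    note_failed($fp, $host);
}
printf "tried %d hosts (%s), then the request fails\n",
    scalar(@order), join(', ', @order);
```

Each host is visited at most once per request in this model, so the request's lifetime is bounded by the number of hosts times the per-host retry count.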

Before doing that and bearing the burden of maintaining a custom AuthBy I 
figured I’d send it to the list and see if someone else has already solved this 
problem or if Open Systems would be willing to revisit the AuthByLOADBALANCE 
logic. Perhaps the interpretation of Retries could change to mean the total 
number of times a request is retried, instead of a per-host count, so that a 
request has a finite lifetime? In that case chooseHost() could be called for 
each retry in handle_timeout() to increase the chances of success.
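
To make that alternative concrete, here is a small standalone model, with no claim that Radiator works or would work this way. It assumes Retries bounds the total number of retransmissions for a request and that a host is chosen afresh for every attempt. forward_with_budget() and the "sent" counter are hypothetical, and every attempt is assumed to time out.

```perl
use strict;
use warnings;

# Hypothetical model: Retries is a per-request budget, not a per-host one.
my $Retries = 3;    # assumed value
my @hosts = map { +{ name => $_, sent => 0 } } qw(hostA hostB);

# Stand-in for the load-balancing choice: host with the fewest sends so far.
sub choose_host {
    my ($best) = sort { $a->{sent} <=> $b->{sent} } @hosts;
    return $best;
}

# Returns the names of the hosts tried before the request is given up on.
sub forward_with_budget {
    my @attempts;
    for my $attempt (0 .. $Retries) {    # attempt 0 is the initial send
        my $host = choose_host();
        $host->{sent}++;
        push @attempts, $host->{name};
        # A real implementation would stop here on a reply; this model
        # assumes every attempt times out.
    }
    return @attempts;
}

my @attempts = forward_with_budget();
printf "%d sends (%s), then the request fails\n",
    scalar(@attempts), join(', ', @attempts);
```

In this model a request is sent at most 1 + Retries times regardless of how many hosts are configured, and the retransmissions are spread across hosts rather than repeated against one.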

Regards-

Frank Danielson | S.V.P. Engineering

_______________________________________________
radiator mailing list
[email protected]
https://lists.open.com.au/mailman/listinfo/radiator
