Graeme,
Hmm, it does appear to be a bit hard to find documentation on the stats
page :)
If I look at the stats page, on one of my backends I'm seeing these
values under the warning column: retr (344) and redis (172). The
backend has 8 servers and only 1 has non-zero values for this column.
I believe that they are retries and redispatches. Judging by the
numbers, a server is retried twice before the connection is
redispatched (2 * 172 = 344).
Can someone explain what the numbers mean? I've tried poking through
the documentation, but nothing stands out.
The retries value is "the number of times a connection attempt should be
retried on a server when a connection either is refused or times out";
you can search the doc for "retries <value>".
In the docs, I also found the following text for redispatch: "In HTTP
mode, if a server designated by a cookie is down, clients may definitely
stick to it because they cannot flush the cookie, so they will not be
able to access the service anymore." I'm not certain that this is
completely applicable in your case, though.
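If it helps to watch these counters over time, the stats page can also
be exported as CSV, where I believe the two values appear as the
"wretr" and "wredis" columns. Below is a rough Python sketch that picks
out the rows with non-zero counters. The function name and the sample
data are made up for illustration, and the sample only contains the
columns used here (a real export has many more).

```python
import csv
import io


def retry_counters(csv_text):
    """Return {(proxy, server): (wretr, wredis)} for rows with non-zero counters."""
    text = csv_text.lstrip()
    # The header line of the CSV export starts with "# "; drop that prefix
    # so csv.DictReader sees plain column names.
    if text.startswith("#"):
        text = text[1:].lstrip(" ")
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Some rows (frontends, for example) leave these fields empty.
        retr = int(row.get("wretr") or 0)
        redis = int(row.get("wredis") or 0)
        if retr or redis:
            result[(row["pxname"], row["svname"])] = (retr, redis)
    return result


# Made-up sample data, not output from a real HAProxy instance.
SAMPLE = """# pxname,svname,wretr,wredis
app,web1,0,0
app,web2,344,172
app,BACKEND,344,172
"""

for (proxy, server), (retr, redis) in retry_counters(SAMPLE).items():
    print(proxy, server, "retries:", retr, "redispatches:", redis)
```

Dividing retries by redispatches for the affected server gives the
"retried twice per redispatch" ratio mentioned above.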
It seems that your one server is either actively rejecting connections
or not receiving them. I assume they are all local servers with the same
configuration on the same network. I would recommend that you check the
network interface to make sure there are no issues with it. If there are
no problems at that level, it may be that the config is not the same on
every backend server, and the affected one has a lower number of
permitted connections or similar.
If you enable TCP logging ("option tcplog"), I believe you will be able
to see more precise information about why a given request has failed.
This could let you determine whether you are experiencing timeouts
(network or server config related, probably more likely the former) or
connection refused messages (almost certainly server config related).
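To tell the two cases apart in the logs, the field to look at is the
termination state. As far as I can tell from the logging section of the
docs, "SC" means the connection was explicitly refused and "sC" means
the connect timeout expired first. Here is a small Python sketch of that
mapping; the function name is mine, and it expects you to have already
pulled the termination state out of the log line.

```python
# First two characters of the termination state, as I read the HAProxy
# logging docs. Anything else is lumped together as "other".
CONNECT_FAILURES = {
    "SC": "connection refused",
    "sC": "connect timeout",
}


def classify_termination(state):
    """Map a termination state such as 'SC' or 'sC--' to a failure kind."""
    return CONNECT_FAILURES.get(state[:2], "other")


print(classify_termination("SC"))
print(classify_termination("sC"))
```

Counting these per server over a few minutes of logs should show whether
the one bad server is mostly refusing connections or mostly timing out.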
-JohnF