At 12:21 PM 10/29/00 +0000, David Hodgkinson wrote:
>Gunther Birznieks <[EMAIL PROTECTED]> writes:
>
> > I am also concerned that the original question brings up the notion of
> > failover. mod_backhand is not a failover solution. Backhand does have some
> > facilities to do some failover (e.g. ByAge weeding) but it's not failover in
> > the traditional sense. Backhand is for load balancing, not failover.
>
>Are we talking about failing "out" a server that's lost the plot, or
>bringing a new server "in" as well? Isn't it just a case of defaulting
>the apparent load of a failed machine up really high (like infinite)?
This question gets into territory that I am not really well qualified to
answer. However, I think the way it works is that there are several
candidacy functions that slowly whittle down the list of servers to which a
given request may be directed.
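To make the "whittling down" idea concrete, here is a minimal Python model of a candidacy chain. It assumes, for illustration only, that each candidacy function takes the current list of candidate servers and returns a shorter or reordered list, and that the functions run in order. The names `run_chain`, `drop_odd` and `first_two` are hypothetical; this is not mod_backhand's actual C interface.

```python
from typing import Callable, List

# Hypothetical model: a candidacy function maps a candidate list to a new list.
Candidacy = Callable[[List[int]], List[int]]

def run_chain(servers: List[int], chain: List[Candidacy]) -> List[int]:
    """Apply each candidacy function in order, feeding it the previous output."""
    candidates = list(servers)
    for func in chain:
        candidates = func(candidates)
    return candidates

# Two toy candidacy functions, purely for illustration.
def drop_odd(servers: List[int]) -> List[int]:
    return [s for s in servers if s % 2 == 0]

def first_two(servers: List[int]) -> List[int]:
    return servers[:2]

print(run_chain([1, 2, 3, 4, 5, 6, 7, 8], [drop_odd, first_two]))  # [2, 4]
```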
Simulating a failed machine by defaulting it to infinite load is a bit odd
in mod_backhand for a couple of reasons.
1) The ByLoad candidacy function relies on resource information that has
been broadcast by the potential backhand destinations themselves, not on
anything collected by the backhand origin.
Should a backhand destination server go down, it simply stops broadcasting,
and ByLoad receives no resource update reflecting that. In my experience, few
servers know they are going down before something catastrophic happens.
They may complain about something, but they don't know they are going
down. Of course, there are cases when a machine knows it is on the doomed
list, but I would argue that this is, unfortunately, a rare case.
In other words, the way mod_backhand's ByLoad function works would require
mod_psychic to be compiled as well. :)
2) This leads to the natural thing that you were probably thinking
(?)... which is that ByLoad might ping the destination server to make sure
it is up before sending load to it.
Unfortunately, I don't think that this is in ByLoad. Or at least it's not
documented at http://www.backhand.org/mod_backhand/
Also, Theo's slide http://www.backhand.org/ApacheCon2000/EU/img4.htm
explicitly x'ed out the failover part of mod_backhand as a solution.
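Point 1 above, that the origin only knows what the destinations broadcast about themselves, can be modeled with a small table that is updated only when a broadcast arrives. This is a hypothetical sketch: `Status`, `table` and `on_broadcast` are illustrative names, and the loads and timestamps are made up. It shows why a dead server does not show up with "infinite" load; its last announcement simply stays in the table.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Status:
    load: float       # load the destination reported about itself
    received: float   # origin-side receive time, in seconds

# Hypothetical origin-side table; only the destinations' own broadcasts update it.
table: Dict[int, Status] = {}

def on_broadcast(server: int, load: float, now: float) -> None:
    """Record a destination's self-reported status. The origin never polls."""
    table[server] = Status(load, now)

# t=0: all eight destinations announce themselves.
for s in range(1, 9):
    on_broadcast(s, load=float(s), now=0.0)

# t=1: servers 5-8 lose power, so only 1-4 keep announcing.
for s in range(1, 5):
    on_broadcast(s, load=float(s), now=1.0)

# Server 5 is dead, yet its entry still advertises its last reported load.
print(table[5])  # Status(load=5.0, received=0.0)
```

The only signal that server 5 is gone is that its `received` time stops advancing, which is what an age-based check has to rely on.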
However, the question is at what point in the chain a ping candidacy
function would run. Put it too early and you waste time pinging every
server; put it too late and there may be too few candidates left to find
one that is up. Let's go through an example of this.
Destination servers 1 through 8 are mod_backhand'ed. Let's assume that the
load is lowest on the lowest-numbered server and highest on the
highest-numbered server.
A reasonable chain of candidacy functions is the following: ByAge,
ByRandom, ByLog, ByLoad. Let's assume that servers 5-8 have just gone down
because someone decided to buy one big UPS for all 4 servers instead of
separate ones, and the UPS just burned out and shorted out the power,
taking all 4 servers down with it.
So let's say 5 seconds have gone by, with requests still coming in.
1. ByAge sees that all of them have responded within the last 20 seconds
(the default).
This provides some failover, but 20 seconds can be a long time for a
server to be down and not weeded out. In this case, only 5 seconds have
gone by, so backhand still sees all 8 as up.
2. ByRandom randomizes the list (1,2,3,4,5,6,7,8); let's say it becomes
8,6,5,1,3,2,7,4.
3. ByLog strips everything but the first log2(n) servers (where n is the
number of elements in the list). For 8 elements, log2(8) = 3, so we are
left with 8,6,5.
4. ByLoad checks the load and sends the request to 5, which has the lowest
load of the three. But whoops, 5 is down. Remember, 5-8 went down.
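The four steps above can be replayed with a minimal Python sketch. The functions are hypothetical stand-ins named after the ones in this walkthrough, not mod_backhand's C code; in particular, rounding log2(n) down and keeping at least one candidate in `by_log` is an assumption. The shuffle is pinned to the order used in the example so the result is reproducible.

```python
import math
import random
from typing import Dict, List

MAX_AGE = 20.0  # seconds; the ByAge default described above

def by_age(servers: List[int], last_seen: Dict[int, float], now: float) -> List[int]:
    """Step 1: drop servers whose last announcement is older than MAX_AGE."""
    return [s for s in servers if now - last_seen[s] <= MAX_AGE]

def by_random(servers: List[int], rng: random.Random) -> List[int]:
    """Step 2: return a shuffled copy; each server still appears exactly once."""
    shuffled = list(servers)
    rng.shuffle(shuffled)
    return shuffled

def by_log(servers: List[int]) -> List[int]:
    """Step 3: keep the first floor(log2(n)) candidates (at least one)."""
    if not servers:
        return []
    return servers[:max(1, int(math.log2(len(servers))))]

def by_load(servers: List[int], load: Dict[int, float]) -> List[int]:
    """Step 4: order by last-announced load, lowest first."""
    return sorted(servers, key=lambda s: load[s])

# Scenario: all eight announced at t=0, load grows with server number,
# and servers 5-8 lost power right afterwards.
servers = list(range(1, 9))
last_seen = {s: 0.0 for s in servers}
load = {s: float(s) for s in servers}
alive = {1, 2, 3, 4}

step1 = by_age(servers, last_seen, now=5.0)   # 5 s < 20 s, so all 8 survive
shuffled = by_random(step1, random.Random())
assert sorted(shuffled) == step1              # a shuffle never repeats or drops a server
step2 = [8, 6, 5, 1, 3, 2, 7, 4]              # pin the order used in the example instead
step3 = by_log(step2)
winner = by_load(step3, load)[0]
print(step1 == servers, step3, winner, winner in alive)  # True [8, 6, 5] 5 False
```

Nothing in the chain ever looks at `alive`: every decision is made from announced data, which is why the dead server 5 wins.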
Now, it would be smart to build a ping into ByLoad, but that still wouldn't
help here, because all three servers left after step 3 (8, 6 and 5) are down.
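A sketch of what "a ping inside ByLoad" could look like, to show why it cannot rescue this case. `by_load_with_ping` is a hypothetical name, and `is_up` is a stand-in for a real health check (no actual ping is sent); the loads and the set of live servers follow the example.

```python
from typing import Callable, Dict, List, Optional

def by_load_with_ping(servers: List[int], load: Dict[int, float],
                      is_up: Callable[[int], bool]) -> Optional[int]:
    """Try candidates lowest-load first; return the first one that answers."""
    for s in sorted(servers, key=lambda s: load[s]):
        if is_up(s):
            return s
    return None  # every remaining candidate is down

load = {s: float(s) for s in range(1, 9)}
alive = {1, 2, 3, 4}
print(by_load_with_ping([8, 6, 5], load, lambda s: s in alive))  # None
```

The ping only helps if at least one live server survives the earlier steps; here ByLog has already discarded all of 1-4.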
You also can't write a ByPing candidacy function that runs first in the
chain, because that would mean every request generates a ping to every
server asking whether it is working. That would be quite intense, and it
would defeat the performance advantage of multicasting the status data.
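The cost argument is simple arithmetic: a ping-first function scales with traffic, while status announcements scale only with the number of servers. The traffic figure (200 requests/s) and the one-second announcement interval below are hypothetical numbers chosen for illustration, not mod_backhand settings.

```python
def ping_messages_per_second(requests_per_second: float, servers: int) -> float:
    """A ping-first candidacy function probes every destination on every request."""
    return requests_per_second * servers

def broadcast_messages_per_second(servers: int, interval_s: float) -> float:
    """Each destination announces itself once per interval, independent of traffic."""
    return servers / interval_s

# Hypothetical numbers: 200 requests/s, 8 destinations, 1-second announcements.
print(ping_messages_per_second(200, 8))       # 1600
print(broadcast_messages_per_second(8, 1.0))  # 8.0
```

Each of those pings is also a round trip the request has to wait for, so the latency cost lands on every request as well.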
The moral is that, to be more accurate, mod_backhand would actually have to
build something into the candidacy functions that wipes a failed server off
the list and starts all of the candidacy functions over from scratch.
If all the servers are down, then there's nothing to be done. But if at
least one is up, it should be the one chosen.
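A minimal sketch of that restart idea, under stated assumptions: `choose_with_restart` and `example_chain` are hypothetical names, `is_up` stands in for a real health check, and the chain is a deterministic stand-in for ByRandom, ByLog and ByLoad (a fixed "random" order, a floor(log2(n)) window, then lowest load first, with load equal to the server number). In a real deployment each restart would reshuffle.

```python
import math
from typing import Callable, List, Optional

ORDER = [8, 6, 5, 1, 3, 2, 7, 4]  # the "random" order from the example, pinned

def example_chain(servers: List[int]) -> List[int]:
    shuffled = [s for s in ORDER if s in servers]              # stand-in for ByRandom
    window = shuffled[:max(1, int(math.log2(len(shuffled))))]  # stand-in for ByLog
    return sorted(window)                                      # ByLoad: load == number

def choose_with_restart(servers: List[int],
                        chain: Callable[[List[int]], List[int]],
                        is_up: Callable[[int], bool]) -> Optional[int]:
    """Run the whole chain; if the winner is down, drop it and start over."""
    remaining = list(servers)
    while remaining:
        candidates = chain(remaining)
        if not candidates:
            return None
        winner = candidates[0]
        if is_up(winner):
            return winner
        remaining.remove(winner)  # wipe the dead server and rerun from scratch
    return None  # every server is down: nothing to be done

alive = {1, 2, 3, 4}
print(choose_with_restart(list(range(1, 9)), example_chain, lambda s: s in alive))  # 1
```

Here the chain picks 5, then 6, and on the third pass picks 1, which is up. Only the winner is checked on each pass, so the extra cost is one check per dead server actually hit rather than one per server per request.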
However, my understanding from the mod_backhand talk and the documentation
is that failover is not a stated goal of mod_backhand, and that other
products are recommended for it, such as Alteon/BIG-IP/whatever switches or
other failover products.
Anyway, I think it does make some sense that, within the context of the
original mod_backhand server distributing the connections, there should be
some failover for the destinations, backing up the ByAge function at the
very end of all the candidacy function processing.
Does anyone know if this facility exists in mod_backhand? Sorry for the
long-winded mail. I am writing a lot of this because I want to make sure I
understand mod_backhand properly myself by thinking it through. So if I am
wrong, someone is surely going to point it out. :)
Later,
Gunther