At 12:21 PM 10/29/00 +0000, David Hodgkinson wrote:
>Gunther Birznieks <[EMAIL PROTECTED]> writes:
>
> > I am also concerned that the original question brings up the notion of
> > failover. mod_backhand is not a failover solution. Backhand does have some
> > facilities to do some failover (eg ByAge weeding) but it's not failover in
> > the traditional sense. Backhand is for load balance not failover.
>
>Are we talking about failing "out" a server that's lost the plot, or
>bringing a new server "in" as well? Isn't it just a case of defaulting
>the apparent load of a failed machine up really high (like infinite)?

This question gets into the realm of stuff that I am not really well 
qualified to answer. However, I think the way it works is that there are 
several candidacy functions that progressively whittle down the list of 
servers that a given request can be directed to.

Simulating a failed machine by defaulting its load to infinity is a bit 
awkward in mod_backhand, for a couple of reasons.

1) The ByLoad candidacy function relies on resource information that has 
been broadcast by the potential backhand destinations, not on something 
that is collected by the backhand origin.

Should a backhand destination server go down, it will simply stop 
broadcasting, and ByLoad will never receive an update saying so. In my 
experience, few servers ever know they are going down before something 
catastrophic happens. They may complain about something, but they don't 
know they are going down. Of course, there are cases when a machine knows 
it is on the doomed list, but I would argue that this is unfortunately a 
rare case.

In other words, the way mod_backhand's ByLoad function works would require 
mod_psychic to be compiled as well. :)
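
To make point 1 concrete, here is a rough Python sketch of the idea. It is 
not mod_backhand's code; the Status record, the table, and the numbers are 
all made up for illustration. The origin only ever sees what each 
destination last said about itself, so a server that has just died still 
looks like a fine candidate:

```python
from dataclasses import dataclass

@dataclass
class Status:
    load: float        # load the destination last reported about itself
    last_heard: float  # time (seconds) the origin received that report

def by_load(table, candidates):
    """Pick the candidate with the lowest *reported* load."""
    return min(candidates, key=lambda s: table[s].load)

# Hypothetical snapshot at t=100. Server 2 crashed at t=97, but its last
# broadcast (a nice low load) is still sitting in the origin's table.
table = {
    1: Status(load=0.9, last_heard=99.0),
    2: Status(load=0.1, last_heard=96.0),
    3: Status(load=0.5, last_heard=99.5),
}

choice = by_load(table, [1, 2, 3])
print(choice)  # the dead server, because nothing ever told us it died
```

Nothing in the table changes when server 2 dies; the only signal is that 
its last_heard value stops advancing, which is what ByAge looks at.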

2) This then leads to the natural thing that you were probably thinking 
(?)... which is that ByLoad might end up pinging the destination server to 
make sure it is up before distributing the load to it.

Unfortunately, I don't think that this is in ByLoad. Or at least it's not 
documented at http://www.backhand.org/mod_backhand/

Also, Theo's slide http://www.backhand.org/ApacheCon2000/EU/img4.htm 
explicitly x'ed out the fail-over part of mod_backhand as a solution.

However, the question is at what point in the chain a ping candidacy 
function would be placed. If you run it too early, you waste time pinging 
all the servers. If you run it too late, you might have too few machines 
left to test. Let's go through an example of this.

Destination servers 1,2,3,4,5,6,7,8 are mod_backhand'ed. Let's assume 
that the load is lowest on the lowest-numbered server and highest on the 
highest-numbered server.

A reasonable example chain of candidacy functions is the following: ByAge, 
ByRandom, ByLog, ByLoad. Let's assume that servers 5-8 have just gone down 
because someone decided to purchase one big UPS for all 4 servers instead 
of separate ones, and the UPS just burned out and shorted out the power 
when this happened, taking all 4 servers down with it.

So let's say 5 seconds have gone by since then, with requests still coming in.

1. ByAge says that they have all been heard from within the last 20 
seconds (this is the default).

Now, this provides some failover, but 20 seconds can be a long time for a 
server to be down and not weeded out. In this case, only 5 seconds have 
gone by, so all 8 are still seen by backhand as being up.

2. ByRandom randomizes the list (1,2,3,4,5,6,7,8). Let's say this becomes 
8,6,5,1,3,2,7,4

3. ByLog strips everything but the first log2(n) servers (where n is the 
number of elements in the list). Thus, for 8 elements, we keep 3: 8,6,5

4. ByLoad checks the load of those three and sends the request to 5, which 
has the lowest load. But whoops... 5 is down. Remember, 5-8 went down.

Now, it would be smart to build a ping into ByLoad, but that still wouldn't 
help here, because 8, 6 and 5, the servers left after step 3, are all down.
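
For anyone who wants to play with the walk-through above, here is a small 
Python simulation of it. This is only my reading of the four steps, not 
mod_backhand's real code: the function names, the fixed shuffle order 
(starting 8,6,5 as in the example), and the rule that ByLog always keeps at 
least one server are all my own assumptions.

```python
import math

MAX_AGE = 20  # seconds; the ByAge default mentioned above

def by_age(servers, age, max_age=MAX_AGE):
    """Keep servers whose last status broadcast is recent enough."""
    return [s for s in servers if age[s] <= max_age]

def by_log(servers):
    """Keep the first log2(n) servers (at least one: my assumption)."""
    keep = max(1, int(math.log2(len(servers))))
    return servers[:keep]

def by_load(servers, load):
    """Pick the server with the lowest reported load."""
    return min(servers, key=load.get)

load = {s: s for s in range(1, 9)}  # load rises with the server number
age = {s: 5 for s in range(1, 9)}   # everyone was heard from 5 s ago
down = {5, 6, 7, 8}                 # the shared UPS just died

candidates = by_age(list(range(1, 9)), age)  # 5 s < 20 s, so all 8 pass
# Stand-in for ByRandom: a fixed order so the run is repeatable.
shuffled = [s for s in [8, 6, 5, 1, 3, 2, 7, 4] if s in candidates]
window = by_log(shuffled)
chosen = by_load(window, load)

print(window, chosen, chosen in down)  # [8, 6, 5] 5 True
```

Changing the fixed order so that a live server lands in the first three 
slots shows how much the outcome depends on the luck of the shuffle.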

You also can't write a ByPing candidacy function that runs first in the 
chain, because it would mean that every request generates pings to every 
server asking whether it is working. That would be quite intense, and it 
would defeat the performance advantage of the multicast broadcast of 
status data.

The moral is that, to be more accurate, mod_backhand would actually have to 
build something in so that when the chosen server turns out to be down, it 
wipes that server off the list and starts all the candidacy functions over 
from scratch.
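
As a sketch of what I mean (in Python, and purely hypothetical: I don't 
know of any such hook in mod_backhand, and run_chain, pick_with_restart 
and is_up are names I made up), the restart logic could look like this. 
Note that is_up is only ever called on the one server the chain picked, so 
a healthy cluster pays for one check per request rather than one per 
server.

```python
import math
import random

def run_chain(servers, load, rng):
    """Shuffle, keep the first log2(n) (at least one), pick lowest load."""
    shuffled = list(servers)
    rng.shuffle(shuffled)
    keep = max(1, int(math.log2(len(shuffled))))
    return min(shuffled[:keep], key=load.get)

def pick_with_restart(servers, load, is_up, rng):
    """Re-run the whole chain, dropping each choice that proves dead."""
    remaining = list(servers)
    while remaining:
        choice = run_chain(remaining, load, rng)
        if is_up(choice):          # e.g. the connect attempt succeeded
            return choice
        remaining.remove(choice)   # wipe it off the list and start over
    return None                    # every server is down

load = {s: s for s in range(1, 9)}
down = {5, 6, 7, 8}

result = pick_with_restart(range(1, 9), load,
                           lambda s: s not in down, random.Random(0))
print(result)  # always one of 1-4, whichever the shuffles favour
```

In the worst case this tries every dead server once before finding a live 
one, so a real version would probably want to remember failures across 
requests instead of rediscovering them each time.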

If all the servers are down, then there's nothing to be done. But as long 
as at least one is up, that should be the chosen one.

However, my understanding from the mod_backhand talk and the documentation 
is that fail over is not an issue that is discussed as a goal of 
mod_backhand and that there are other products to recommend such as 
Alteon/BIGip/whatever switches or other such fail over products.

Anyway, I think that to some degree it does make sense that within the 
context of the original mod_backhand server distributing the connections, 
there should be some fail over for the destinations to back up the ByAge 
function at the very end of all the candidacy function processing.

Does anyone know if this facility exists in mod_backhand? Sorry for the 
long-winded mail. A lot of this I am writing here because I want to make 
sure I am understanding mod_backhand properly myself by thinking it 
through. So if I am wrong, someone is surely going to point it out. :)

Later,
    Gunther
