Re: Replication using totem protocol

Jules Gosnell Mon, 16 Jan 2006 16:12:34 -0800

lichtner wrote:

On Mon, 16 Jan 2006, Jules Gosnell wrote:

2. When an HTTP request arrives, if the cluster which received it does not
have R copies then it blocks (it waits until there are). This should work in
data centers because partitions are likely to be very short-lived (aka
virtual partitions, which are due to congestion, not to any hardware
issue.)
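Point 2 can be sketched in Java with a hypothetical per-session `ReplicaGate`. The replication layer updates it whenever a copy is gained or lost, and a request thread blocks on it until R copies exist. The class name, the method names and the timeout are illustrative assumptions, not part of any existing API. The timeout is there so that a partition that turns out not to be short-lived does not hang requests forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical per-session gate: a request thread blocks until the session
 * has at least `required` (R) live copies, or until a timeout expires.
 */
final class ReplicaGate {
    private final int required;                       // R
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition enoughReplicas = lock.newCondition();
    private int replicas;                             // current number of live copies

    ReplicaGate(int required, int initialReplicas) {
        this.required = required;
        this.replicas = initialReplicas;
    }

    /** Called by the replication layer whenever a copy is added or lost. */
    void setReplicaCount(int count) {
        lock.lock();
        try {
            replicas = count;
            if (replicas >= required) {
                enoughReplicas.signalAll();           // release any held requests
            }
        } finally {
            lock.unlock();
        }
    }

    /** Called on request arrival: true once R copies exist, false on timeout. */
    boolean awaitReplicas(long timeout, TimeUnit unit) throws InterruptedException {
        long remaining = unit.toNanos(timeout);
        lock.lock();
        try {
            while (replicas < required) {
                if (remaining <= 0L) {
                    return false;                     // partition did not heal in time
                }
                remaining = enoughReplicas.awaitNanos(remaining);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```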

Interesting. I was intending to actively repopulate the cluster
fragment, as soon as the split was detected. I figure that
- the longer that sessions spend without their full complement of
backups, the more likely that a further failure may result in data loss.
- the split is an exceptional circumstance in which you would expect to
pay an exceptional cost (regenerating missing primaries from backups and
vice-versa)

by waiting for a request to arrive for a session before ensuring it has
its correct complement of backups, you extend the time during which it
is 'at risk'. By doing this 'lazily', you will also have to perform an
additional check on every request arrival, which you would not have to
do if you had regenerated missing state at the point that you noticed
the split.
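Here is a sketch of the eager approach. It assumes a hypothetical `EagerRepair` class that the group-membership layer calls with the set of still-reachable nodes as soon as it detects the split. Every session left with fewer than R copies is queued for regeneration at that moment, so no per-request check is needed. The names and the in-memory bookkeeping are illustrative only. The actual copying of state is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical eager repair: when a split is detected, every session that has
 * lost copies is queued for regeneration immediately, rather than waiting
 * for a request to touch it.
 */
final class EagerRepair {
    private final int required;                                    // R
    private final Map<String, Set<String>> replicasBySession = new HashMap<>();
    private final Deque<String> repairQueue = new ArrayDeque<>();

    EagerRepair(int required) {
        this.required = required;
    }

    void addSession(String sessionId, Set<String> nodes) {
        replicasBySession.put(sessionId, new HashSet<>(nodes));
    }

    /** Called by the membership layer with the nodes still reachable after a view change. */
    void onViewChange(Set<String> liveNodes) {
        for (Map.Entry<String, Set<String>> e : replicasBySession.entrySet()) {
            Set<String> copies = e.getValue();
            copies.retainAll(liveNodes);                 // forget copies on unreachable nodes
            // Note: a session with zero surviving copies cannot be regenerated;
            // this sketch queues it anyway and does not handle that case.
            if (copies.size() < required && !repairQueue.contains(e.getKey())) {
                repairQueue.addLast(e.getKey());         // schedule regeneration now
            }
        }
    }

    /** Sessions awaiting regeneration of missing primaries/backups, oldest first. */
    List<String> pendingRepairs() {
        return new ArrayList<>(repairQueue);
    }
}
```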


Actually I didn't mean to say that you should do it lazily. You should most
definitely do it aggressively, but I would not try to do _all_ the state
transfer ASAP, because this can kill availability.

Ah - OK, my misunderstanding - so you do it aggressively, but there is still the possibility of a request arriving before you have finished regenerating, so you handle that by holding it up - got you. I agree.

If I had to do the state transfer using totem I would use priority queues,
so that you know that while the system is doing state transfer it is still
operating at, say, 80% efficiency.
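One way to get the "still operating at 80%" property is sketched below with a hypothetical `ThrottledSender`. It keeps normal traffic and state-transfer traffic in separate queues and reserves one send slot in every `period` for state transfer. With `period = 5`, state transfer gets roughly 20% of the slots when both queues are busy, and all of the idle capacity when normal traffic is absent. This is a weighted scheme layered on two queues rather than a plain priority queue, which on its own would not bound state transfer's share. It is an assumption for illustration, not a feature of any totem implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical two-class sender: normal traffic is preferred, but one slot in
 * every `period` goes to state transfer, so neither class starves.
 */
final class ThrottledSender {
    private final Queue<String> normal = new ArrayDeque<>();
    private final Queue<String> stateTransfer = new ArrayDeque<>();
    private final int period;
    private long slot;                          // number of messages sent so far

    ThrottledSender(int period) {
        this.period = period;
    }

    void submitNormal(String msg) { normal.add(msg); }

    void submitStateTransfer(String msg) { stateTransfer.add(msg); }

    /** Picks the next message to multicast, or null if both queues are empty. */
    String next() {
        boolean transferSlot = (slot % period) == period - 1;
        Queue<String> first = transferSlot ? stateTransfer : normal;
        Queue<String> second = transferSlot ? normal : stateTransfer;
        String msg = first.poll();
        if (msg == null) {
            msg = second.poll();                // idle capacity is not wasted
        }
        if (msg != null) {
            slot++;
        }
        return msg;
    }
}
```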

It was not about lazy vs. greedy.

I believe that if you put some spare capacity in your cluster you will get
good availability. For example, if your minimum R is 2 and the normal
operating value is 4, when a node fails you will not be frantically doing
state transfer.

OK - so your system is a little more relaxed about the exact number of replicants. You specify upper and lower bounds rather than an absolute number, then you move towards the upper bound when you have the capacity?
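The lower/upper-bound idea can be sketched with a hypothetical `ReplicaPolicy`. Below the minimum (R = 2 in lichtner's example), requests are held and repair is urgent. Between the minimum and the normal operating value (4), extra copies are added only when there is spare capacity. The three-way decision and its names are assumptions for illustration.

```java
/**
 * Hypothetical replica-count policy with a lower bound (the minimum R below
 * which requests are held) and a normal operating target.
 */
final class ReplicaPolicy {
    enum Action { HOLD_REQUESTS_AND_REPAIR, TOP_UP_WHEN_IDLE, NONE }

    private final int minimum;   // e.g. 2
    private final int target;    // e.g. 4

    ReplicaPolicy(int minimum, int target) {
        if (minimum < 1 || target < minimum) {
            throw new IllegalArgumentException("need 1 <= minimum <= target");
        }
        this.minimum = minimum;
        this.target = target;
    }

    Action decide(int liveReplicas, boolean spareCapacity) {
        if (liveReplicas < minimum) {
            return Action.HOLD_REQUESTS_AND_REPAIR;   // below R: urgent
        }
        if (liveReplicas < target && spareCapacity) {
            return Action.TOP_UP_WHEN_IDLE;           // between bounds: background work
        }
        return Action.NONE;
    }
}
```

With those numbers, a single node failure takes a session from 4 copies to 3. That yields `TOP_UP_WHEN_IDLE`, or `NONE` if the cluster is busy, rather than a frantic transfer.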

3. If at any time an HTTP request reaches a server which does not itself have a
replica of the session, it sends a client redirect to a node which does.

WADI can relocate the request to the session, as you suggest (via redirect
or proxy), or the session to the request, by migration. Relocating the
request should scale better, since requests are generally smaller and, in
the web tier, may run concurrently through the same session, whereas
sessions are generally larger and can only be migrated serially (since only
one copy at a time may be 'active').


I would also just send a redirect. I don't think it's worth relocating a
session.

If you can communicate the session's location to the load-balancer, then I agree, but some load-balancers are pretty dumb :-)

and possibly migration of some session for
proper load balancing.

forcing the balancing of state around the cluster is something that I
have considered with WADI, but not yet tried to implement. The type of
load-balancer that is being used has a big impact here. If you cannot
communicate a change of session location satisfactorily to the HTTP load
balancer, then you just have to go with wherever it decides a session is
located... With SFSBs we should have much more control at the client
side, so this becomes a real option.


In my opinion load balancing is not something that a cluster api can
address effectively. Half the problem is evaluating how busy the system is
in the first place.

agreed

all in all, though, it sounds like we see pretty much eye to eye :-)


Better than the other way ..

the lazy partition regeneration is an interesting idea and this is the
second time it has been suggested to me, so I will give it some serious
thought.


Again, I wasn't advocating lazy state transfer. But perhaps it has
applications somewhere.

understood - and I think a hybrid approach will probably just incur the costs of both the other approaches - but I may still kick it around.



Jules

Thanks for taking the time to share your thoughts,


No problem.



--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
