[jboss-user] [Clustering/JBoss] - Providing Safe Buddy Replication Failover

dmurphy Mon, 05 May 2008 15:14:57 -0700

Hi - these questions relate to establishing the safe operation of buddy 
replication under AS 4.0.5.


Selection of Buddies
Say we have nodes a1, a2 and a3 and they are booted in that order. What we see 
is that when a2 starts it forms a buddy pair with a1. Then when a3 starts a1 
becomes the backup for a3. So in this scenario a1 is backing up two nodes and 
a3 is backing up zero nodes. 

So the memory utilization across the nodes is unbalanced. (We now have logging 
around the session replication listener to analyse this behaviour.)

This seems to be broken. Is this the way buddy replication should select 
buddies, or do we have a config problem somewhere? What we need is for each 
node to carry the same backup overhead (memory, CPU etc.) so that the cost of 
providing replication is spread evenly across the cluster. 
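
To make the imbalance concrete, here is a minimal Python sketch that counts how 
many nodes each node ends up backing up. The "observed" mapping is my reading of 
the behaviour described above; the ring layout is only a hypothetical 
illustration of the balanced outcome we are after, not JBoss code or its actual 
buddy selection algorithm.

```python
from collections import Counter


def ring_buddies(nodes):
    """Hypothetical ring layout: each node is backed up by the next node in the list."""
    return {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}


def backup_load(assignment, nodes):
    """Return, for every node, how many other nodes it is backing up.

    `assignment` maps a data owner to the node holding its backup.
    """
    counts = Counter(assignment.values())
    return {n: counts.get(n, 0) for n in nodes}


nodes = ["a1", "a2", "a3"]

# Owner -> backup node, as described in the post: a1 and a2 pair up,
# then a1 also becomes the backup for a3.
observed = {"a1": "a2", "a2": "a1", "a3": "a1"}

print("observed:", backup_load(observed, nodes))
print("ring:    ", backup_load(ring_buddies(nodes), nodes))
```

In the observed layout a1 carries two backups and a3 none, while the ring gives 
every node exactly one.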

Failover Operation
Having read the JBoss docs I still need to understand more about the basic 
operation of failover. Currently we have no replication, so if an app server 
node goes down we lose ~25% of users but the other 75% stay pretty much 
operational. In practice, we will have a cluster of 6 app servers and during 
peak times we would see 2000-3000 users per node, so up to 18K concurrent users 
in all.

What I am concerned about with buddy replication is that if a node goes down 
we could send other nodes down as well, as they have to rapidly take over the 
work of the node that failed (a sort of domino effect). After reading the docs I 
still don't have a solid understanding of how this process works or the risks we 
might face.
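
As a rough back-of-envelope for the domino concern, the sketch below works out 
the user load per surviving node at the peak figures above. It assumes the load 
balancer spreads the failed nodes' users evenly over the survivors and that no 
users drop off; the function name is made up for illustration, and it says 
nothing about the extra replication cost.

```python
def load_after_failure(nodes, users_per_node, failed):
    """Users per surviving node if the failed nodes' users are spread evenly."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    total_users = nodes * users_per_node
    return total_users / survivors


PEAK_USERS = 3000  # upper end of the 2000-3000 users per node range
CLUSTER_SIZE = 6

for failed in (1, 2):
    per_node = load_after_failure(CLUSTER_SIZE, PEAK_USERS, failed)
    increase = (per_node / PEAK_USERS - 1) * 100
    print(f"{failed} node(s) down: {per_node:.0f} users per node (+{increase:.0f}%)")
```

Under these assumptions each node would need enough headroom to absorb the 
extra users before any replication overhead is counted.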

Assume a2 backs up a1, a3 backs up a2 and a1 backs up a3. This is buddy 
replication with one backup buddy. All nodes are fronted by an F5 load balancer 
that provides sticky sessions and will redirect a user to a random node if the 
node with its original session fails.

So what, in detail, happens if a1 goes down? After the failure of a1 the F5 
will direct some of a1's users to a2 and some to a3.

1) How does the cluster determine who is the new primary owner of a1's session 
data? Hopefully it will decide to use a2 since it already has a copy of a1's 
session cache.

2) For users directed to a3 by the F5 - how does a3 now populate its session 
cache to service those newly arriving users?

3) I assume the cluster also now picks a new buddy for a3 since it lost its 
buddy a1. In this case it will have to be a2 since there are no other nodes. So 
the question is - what is the impact (network, CPU etc.) on a2 and a3 while the 
new buddy relationship is established? What we are worried about is that both 
a2 and a3 now suddenly have a large group of new users to support as well as 
taking the resource hit of replicating each other's session state.
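
To put numbers on this worry, here is a toy model of the three-node ring above 
after a1 fails. It is only a sketch under my own assumptions, not a description 
of what JBoss actually does: the F5 splits a1's users evenly, every session 
keeps exactly one backup copy, all sessions stay alive, and a2 and a3 end up 
backing each other up. The function and key names are hypothetical.

```python
def three_node_failover(users_per_node):
    """Toy model of the a1/a2/a3 ring after a1 fails (assumptions in the lead-in)."""
    u = users_per_node
    redirected = u / 2           # a1's users landing on each survivor
    primary = u + redirected     # sessions each survivor now serves
    backup = primary             # a2 and a3 hold each other's backup
    return {
        "primary_per_node": primary,
        "backup_per_node": backup,
        # before the failure: own sessions plus one buddy's backup copy
        "held_before": 2 * u,
        "held_after": primary + backup,
        # only a2 held a1's backup, so users sent to a3 need their state moved there
        "sessions_to_fetch_on_a3": redirected,
        # a3's old buddy (a1) is gone, so a3's original sessions need a new backup on a2
        "sessions_to_rebackup_on_a2": u,
    }


for name, value in three_node_failover(3000).items():
    print(f"{name}: {value:.0f}")
```

In this model each survivor goes from holding 2x to 3x a single node's sessions 
in memory, on top of the one-off transfers needed to set up the new buddy pair.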

Failover Best Practices
What are the buddy replication 'best practices' that we should follow to 
provide safe and reliable failover in a heavily loaded cluster?

View the original post : 
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4148704#4148704
