I thought I would present my current design for state replication. Please see
the enclosed diagram.
First, some terms:
n - the number of nodes in the cluster
s - a single copy of one node's state (a small coloured box)
b - the number of 'buddies' (b <= n) in a partition, and therefore the number of copies of each 's'
(1)
n=3 b=3
State is replicated clockwise: Red's state is backed up by Green and Yellow, Green's state is backed up by Yellow and Red, etc.
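To make the clockwise rule concrete, here is a minimal Java sketch (the class and method names are mine, not part of the design): given the ring and a value of b, it lists which nodes hold a copy of each node's state.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the clockwise buddy assignment described above: each
// node's state is held by itself plus the next (b-1) nodes around the ring.
public class BuddyRing {

    // For the node at 'index', return the b nodes that hold a copy of its
    // state: the node itself (the primary) followed by its clockwise buddies.
    static List<String> buddiesOf(List<String> ring, int index, int b) {
        List<String> group = new ArrayList<>();
        for (int i = 0; i < b; i++) {
            group.add(ring.get((index + i) % ring.size()));
        }
        return group;
    }

    public static void main(String[] args) {
        List<String> ring = List.of("Red", "Green", "Yellow");
        int b = 3;
        for (int i = 0; i < ring.size(); i++) {
            List<String> group = buddiesOf(ring, i, b);
            System.out.println(group.get(0) + "'s state is backed up by " + group.subList(1, b));
        }
        // Prints: Red's state is backed up by [Green, Yellow], and so on.
    }
}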
(2) introduction of a new node
n=4 b=3
Blue joins cluster
Yellow and Red take responsibility for backing up Blue's state (currently empty).
Blue takes responsibility for backing up Green and Red's state.
A copy of Green's state 'migrates' from Red to Blue
A copy of Red's state 'migrates' to Blue
Since Blue entered with no state, the total migration cost is 2s
(3) node loss
Blue leaves the cluster. The lost copy of Red's state is replicated from Green to Yellow. The lost copy of Green's state is replicated from Yellow to Red. The lost copy of Blue's state is replicated from Red/Yellow to Green. The total migration cost is 3s
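Scenarios (2) and (3) can be checked with a rough sketch (the helper names below are my own): treat a placement as the set of (primary -> holder) pairs implied by the ring, and count how many copies of 's' have to move when the membership changes.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MigrationCost {

    // The (primary -> holder) pairs implied by a ring and a value of b.
    static Set<String> placement(List<String> ring, int b) {
        Set<String> pairs = new HashSet<>();
        for (int i = 0; i < ring.size(); i++) {
            for (int j = 0; j < b; j++) {
                pairs.add(ring.get(i) + "->" + ring.get((i + j) % ring.size()));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int b = 3;
        Set<String> three = placement(List.of("Red", "Green", "Yellow"), b);
        Set<String> four  = placement(List.of("Red", "Green", "Blue", "Yellow"), b); // Blue joins between Green and Yellow

        // (2) Blue joins empty: every new pair is a copy that must migrate,
        //     except pairs whose primary is Blue, since Blue has no state yet.
        long join = four.stream()
                .filter(p -> !three.contains(p) && !p.startsWith("Blue->"))
                .count();
        System.out.println("join cost  = " + join + "s");   // 2s, i.e. (b-1)*s

        // (3) Blue leaves: the survivors restore the copies Blue was holding...
        long leave = three.stream().filter(p -> !four.contains(p)).count();
        // ...and Blue's own (now non-empty) state, left on only Yellow and Red,
        // needs one more copy (on Green) to get back to b copies.
        leave += 1;
        System.out.println("leave cost = " + leave + "s");  // 3s, i.e. b*s
    }
}

Running this prints a join cost of 2s and a leave cost of 3s, matching the figures above.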
In summary:
a node entering the cluster costs (b-1)*s
a node leaving the cluster costs b*s
these costs are constant, regardless of the value of n (number of nodes in cluster)
Breaking each node's state into more than the single piece suggested here and scattering those pieces more widely across a larger cluster would complicate the topology without reducing these costs.
You'll note that in (3) each node is now carrying the state of 4 nodes, not 3, and, furthermore, that if there were other nodes in the cluster they would not be sharing this burden.
The solution is to balance the state around the cluster.
How/Whether this can be done depends upon your load-balancer as discussed in a previous posting.
If you can explicitly tell your load-balancer to 'unstick' a session from one node and 'restick' it to another, then you have a mechanism by which you could split the state in the blue boxes into smaller pieces, migrate them equally to all nodes in the cluster, and notify the load-balancer of their new locations (i.e. divide the contents of the blue boxes and subsume them into the R, G and Y boxes).
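As a concrete sketch of that rebalancing step (assuming, purely as an illustration, a load-balancer that exposes explicit unstick/restick operations, which is not a given), the orphaned sessions in the blue boxes are dealt out evenly to the survivors and the load-balancer is told where each one now lives:

import java.util.List;
import java.util.Map;

class Rebalancer {

    interface LoadBalancer {
        void restick(String sessionId, String newNode);    // hypothetical operation
    }

    interface Cluster {
        void migrate(String sessionId, byte[] marshalledState, String targetNode); // hypothetical operation
    }

    // Deal the orphaned sessions out round-robin across the survivors,
    // moving the state first and only then repointing the load-balancer.
    void rebalance(Map<String, byte[]> orphanedSessions,   // Blue's state, keyed by session id
                   List<String> survivors,                 // e.g. [Red, Green, Yellow]
                   Cluster cluster,
                   LoadBalancer loadBalancer) {
        int i = 0;
        for (Map.Entry<String, byte[]> e : orphanedSessions.entrySet()) {
            String target = survivors.get(i++ % survivors.size());
            cluster.migrate(e.getKey(), e.getValue(), target);
            loadBalancer.restick(e.getKey(), target);
        }
    }
}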
If you cannot coordinate with your load-balancer like this, you may have to wait for it to try to deliver a request to Blue and fail over to another node. If you can predict which node this will be, you can proactively migrate the state to it beforehand. If you cannot predict it, you have to wait for the request to arrive and then either reactively migrate the state in underneath it or forward/redirect the request to the node that has the state.
This piece of decision-making needs to be abstracted into e.g. a LoadBalancerPolicy class, which can be plugged in to provide the correct behaviour for your deployment environment.
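Something like the interface below could capture those decisions; the method names are only my guesses at its shape, the point being that the deployment-specific behaviour hides behind one pluggable type.

import java.util.Optional;

interface LoadBalancerPolicy {

    // Can we tell the load-balancer to move a session from one node to another?
    boolean supportsExplicitRestick();

    // When a node fails, can we predict which node the load-balancer will fail
    // over to, so that state can be migrated there proactively?
    Optional<String> predictFailoverTarget(String failedNode);

    // Otherwise: when a request lands on a node that lacks the state, should
    // that node pull the state in underneath the request, or pass the request
    // on to the node that already has it?
    enum FallbackAction { MIGRATE_STATE_IN, FORWARD_REQUEST, REDIRECT_CLIENT }
    FallbackAction onRequestWithoutLocalState(String sessionId);
}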
The passivation of aging sessions into a shared store can mitigate migration costs by reducing the number of active (in-vm) sessions that make up each node's state, whilst maintaining their availability.
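For illustration, a passivation sweep might look like this (the Session and SharedStore types are placeholders of my own, not part of the design): sessions idle for longer than a threshold are flattened into the shared store and dropped from in-vm state, so there is simply less to replicate or migrate.

import java.util.Iterator;
import java.util.Map;

class Passivator {

    interface SharedStore {                        // e.g. a database or shared file system
        void write(String sessionId, byte[] marshalledState);
    }

    interface Session {
        String id();
        long lastAccessed();                       // millis since epoch
        byte[] marshal();                          // flatten the session into a byte[]
    }

    void passivateAged(Map<String, Session> active, SharedStore store, long maxIdleMillis) {
        long now = System.currentTimeMillis();
        for (Iterator<Session> it = active.values().iterator(); it.hasNext(); ) {
            Session s = it.next();
            if (now - s.lastAccessed() > maxIdleMillis) {
                store.write(s.id(), s.marshal());  // persist to the shared store...
                it.remove();                       // ...then drop it from in-vm state
            }
        }
    }
}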
Migration should always be sourced from a non-primary node, since there will be less contention on backed-up state (which will only be receiving writes) than on primary state (which will be the subject of reads and writes). Furthermore, using the scheme I presented for lazy demarshalling, you can see that the cost of marshalling backed-up state will be significantly less than that of marshalling primary state, as potentially large object trees will already have been flattened into a byte[].
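A small illustration of why the backup is the cheaper source (the types here are placeholders of my own): under the lazy-demarshalling scheme the backup already holds the session as a flattened byte[], whereas the primary must marshal a live object tree that is also servicing reads and writes.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MigrationSource {

    // Backup copy: kept in the form it arrived in, already marshalled.
    static byte[] fromBackup(byte[] marshalledCopy) {
        return marshalledCopy;                     // nothing to do, just forward the bytes
    }

    // Primary copy: a live object tree that must be marshalled first.
    static byte[] fromPrimary(Serializable liveSession) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(liveSession);          // potentially a large graph
        }
        return bytes.toByteArray();
    }
}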
The example above has only considered the web tier. If this is colocated with e.g. an EJB tier, then the cost of migration goes up, since web state is likely to have associations with EJB state which are best maintained locally. This means that if you migrate web state you should consider migrating the related EJB state along with it. So a holistic approach which considers all tiers together is required.
I hope that the model that I am proposing here is sufficiently generic to be of use for the clustering of other tiers/services as well.
That's enough for now,
Jules
--
/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * http://www.coredevelopers.net
 *************************************/
<<inline: Cluster.jpg>>
