Alex,
we seem to be using the same terms to mean very different things and this is leading to confusion.
I'll go first:
shared-store - many nodes storing their state in a common service.
replication - each node storing copies of its state somewhere off-node (probably not in the same place as all the other nodes). I am suggesting that this 'other-place' is on the back of another web-container.
The implementation of the storage in either situation is not specified - I'm doing a logical design.
Affinity is a load-balancing policy which allows you to optimise caching (and possibly forget about session distribution entirely, if you can live with SPoFs), because you can predict where subsequent requests for a session will fall.
How do these compare to your definitions ?
Once we talk the same language, we can go forward :-)
Jules
Alex Blewitt wrote:
Let's number your suggestions 1,2,3 and look at them
(1) I call this one shared store. I think 'replication' is better than shared store for the following reasons :
- replication IS shared store, it's just the stores are in each and every node.
The difference between 'shared store' and 'replicated store' is that the latter doesn't scale as well as you add more nodes to the system. If I have 1 node and 1 db, then it makes no difference, but if I have 20 nodes and 1 db then a replicated store is going to generate 20x as much traffic as the db variant.
Usually the DB is on two machines (and clustered) so there isn't a SPOF.
- because your store is already on-node, you save a round trip to the remote store with every request for a session
(or loading it). There's a tradeoff between having the SPOF and replication, for sure.
- replication allows your cluster to be made up of homogeneous nodes rather than heterogeneous ones (i.e. web servers and session servers) - clusters are complex enough already and a management nightmare. My aim is to make them easy, out-of-the-box deployments....
Don't understand why replication allows this specifically. Do you mean that there's no need for a DB server to hold the store data? If so, I agree with you -- but in almost all situations there will be a DB that can be piggy-backed for this.
(2) This is affinity/sticky sessions right ?
The one node holding the session becomes a SPoF for the client who owns the session. As soon as you add backing up the session off-node to the equation you are back at the replication (my approach) or shared store (1) approach. Except that you have the optimisation you describe in (3)??
Yes, the one node does become a SPOF for that client. This may not be acceptable (but it may be).
(3) I'm not clear on exactly what is going on in this one :-) It looks like an optimisation (in the form of affinity) that you might use to prevent continually pulling the same unchanged session across a network from a shared store - right ? I will be doing exactly the same in my model. Mod_JK will be set up (dynamically and automagically) to route requests for sessions in a particular bucket to ANY node in the partition in which the bucket resides. This is affinity at the partition level. I don't have to do it at the node level, since I can guarantee that the session is in-vm on every node in the partition. There is also, as far as I can tell, no way at the moment to tell Mod_JK to route to a node but fail over within a partition. (Any mod_jk people reading this list ?).
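To make the partition-level affinity idea concrete, here is a minimal sketch of the routing decision only. It is not Mod_JK configuration, and everything in it is an assumption for illustration: the class name PartitionRouter, hashing the session id to pick a bucket, and assigning buckets to partitions by modulo are all hypothetical choices.

```java
import java.util.List;

// Hypothetical sketch: maps a session id to a bucket, and a bucket to a partition.
// Any node in the returned partition may serve the request, since the session
// is assumed to be in-vm on every node of that partition.
class PartitionRouter {
    private final List<List<String>> partitions; // each partition is a list of node names
    private final int bucketCount;

    PartitionRouter(List<List<String>> partitions, int bucketCount) {
        this.partitions = partitions;
        this.bucketCount = bucketCount;
    }

    // Assumption: buckets are chosen by hashing the session id.
    int bucketFor(String sessionId) {
        return Math.floorMod(sessionId.hashCode(), bucketCount);
    }

    // Assumption: buckets are spread over partitions by simple modulo.
    List<String> candidateNodes(String sessionId) {
        return partitions.get(bucketFor(sessionId) % partitions.size());
    }
}
```

The load balancer would then pick any entry from candidateNodes(id), which is what gives fail-over within the partition without node-level stickiness.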
It still had the SPOF concept, but instead of replicating the session data it merely proxied to the session store on the remote server.
So, actually, we are not far adrift :-) If you really like the DB idea, perhaps we could abstract away enough from what we are doing to allow a store to be a DB...
I think I like enough of the various possibilities to make it worthwhile to abstract an interface to allow this to happen in several ways. So, you could (say) start the DB-Session-Store or the Affinity-Store or the Replicated-Store depending on your deployment.
For a developer, for example, I'd probably go with the Affinity-Store/Replicated-Store on one machine. For multiple nodes, it might be easier to use the Replicated/DB-Stores, but IMHO the Replicated one may give a lot of headaches to implement :-)
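As a rough illustration of what such an abstraction might look like, here is a minimal sketch. The names SessionStore, LocalSessionStore and SessionStores are hypothetical and do not come from any existing container; only the purely local (affinity-style) backend is filled in, and the DB and replicated variants are deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface that a DB-, affinity- or replication-backed store could implement.
interface SessionStore {
    Object getAttribute(String sessionId, String name);
    void setAttribute(String sessionId, String name, Object value);
    void remove(String sessionId);
}

// Simplest backend: state lives only in this VM, so the node is a SPoF for its sessions.
class LocalSessionStore implements SessionStore {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    public Object getAttribute(String sessionId, String name) {
        Map<String, Object> attrs = sessions.get(sessionId);
        return attrs == null ? null : attrs.get(name);
    }

    // Note: ConcurrentHashMap rejects null values, so null attributes are not supported here.
    public void setAttribute(String sessionId, String name, Object value) {
        sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>()).put(name, value);
    }

    public void remove(String sessionId) {
        sessions.remove(sessionId);
    }
}

// Picks a backend by deployment name; only "affinity" is sketched.
class SessionStores {
    static SessionStore forDeployment(String kind) {
        switch (kind) {
            case "affinity":
                return new LocalSessionStore();
            default:
                throw new IllegalArgumentException("no store sketched for: " + kind);
        }
    }
}
```

The point of the factory is only that the web-container code sees SessionStore and nothing else, so the choice of store becomes a deployment decision.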
One thing that probably is worth stating explicitly is that when you're using various stores, it may not be necessary to load the session in its entirety each time; just proxy session.getAttribute("thing") through to the underlying store. The store may then choose to load the session as a whole, or may choose to only load attribute 'thing'.
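A minimal sketch of that proxying, assuming the underlying store can be represented as a function from attribute name to value. LazySession and its loader parameter are hypothetical names; a real store might equally decide to fetch the whole session on first access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical session facade that fetches each attribute from the store on first use.
class LazySession {
    private final Function<String, Object> loader; // stands in for the underlying store
    private final Map<String, Object> loaded = new HashMap<>();

    LazySession(Function<String, Object> loader) {
        this.loader = loader;
    }

    // Only the requested attribute is pulled from the store; later calls hit the local copy.
    Object getAttribute(String name) {
        if (!loaded.containsKey(name)) {
            loaded.put(name, loader.apply(name));
        }
        return loaded.get(name);
    }

    // Number of attributes fetched so far.
    int loadedCount() {
        return loaded.size();
    }
}
```

This sketch never invalidates its local copy, so it only makes sense where affinity (or something like it) guarantees that nobody else is changing the session underneath.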
It might also be worth doing a further abstracted layer to provide a generic 'replicated Map' which the Session then uses. Might come in handy for other things, who knows ...
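For what it's worth, the shape of such a 'replicated Map' could be as small as the sketch below. It is an in-process toy under stated assumptions: ReplicatedMap and addPeer are hypothetical names, peers are plain object references rather than remote nodes, and there is no threading, failure handling or conflict resolution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy: every put is applied locally and pushed to each known peer.
class ReplicatedMap<K, V> {
    private final Map<K, V> local = new HashMap<>();
    private final List<ReplicatedMap<K, V>> peers = new ArrayList<>();

    void addPeer(ReplicatedMap<K, V> peer) {
        peers.add(peer);
    }

    void put(K key, V value) {
        local.put(key, value);
        for (ReplicatedMap<K, V> peer : peers) {
            peer.applyRemote(key, value); // in a real cluster this would be a network call
        }
    }

    // Reads are always served from the local copy.
    V get(K key) {
        return local.get(key);
    }

    // Applies an update received from a peer without re-broadcasting it.
    private void applyRemote(K key, V value) {
        local.put(key, value);
    }
}
```

A Session could then keep its attributes in one of these, and the same class would be reusable for anything else that needs the same state on several nodes.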
Alex.
--
/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * http://www.coredevelopers.net
 *************************************/
