Hi Sanjaya, Sorry for not replying sooner. Comments inline as usual marked with [RA].
Regards,
Rajith

On 2/8/07, Sanjaya Karunasena <[EMAIL PROTECTED]> wrote:
Hi Rajith,

Please find my comments inline.

Regards,
Sanjaya

On Thursday 08 February 2007 09:10, Rajith Attapattu wrote:
> Hey Sanjaya,
>
> It is indeed turning out to be a good conversation. Comments inline.
>
> Regards,
> Rajith
>
> On 2/7/07, Sanjaya Karunasena <[EMAIL PROTECTED]> wrote:
> > [SK] So why not use Synapse? Of course there is the option of embedding a
> > simple but fast load balancer as the default load balancer. It is always
> > good to have different configurations available for different requirements
> > when it comes to application development. The only thing required is some
> > good documentation.
>
> [RA] Synapse is certainly an option.
>
> > [SK] Certainly, starting with small steps is always important and works.
> > But let's keep the discussion going so that we keep an eye on the final
> > goal while doing that.
>
> [RA] Totally agree.
>
> > For message ordering provided by the communication framework to work, it
> > should be notified of all the dependent events. However, there is a cost
> > associated with this. The question is: where do you invest? Which approach
> > handles concurrency with the least cost?
> >
> > Let me explain how total ordering is going to work. In total ordering,
> > message m is delivered to all the recipients before message m+1 is
> > delivered. When event execution is synchronous, the event will be
> > automatically blocked until the message is delivered.
> >
> > This way, if a write happens at time t and a read starts concurrently at
> > time t+1, the event will be automatically blocked until the write is
> > delivered to all the recipients. Which event occurred first (happened
> > before) can be determined using Lamport's algorithm.
>
> [RA] If we block reads, then to be sure that nobody is writing while we are
> reading we need to wait until we have the "virtual token", since no node
> can write until it acquires the token. This will be very slow, won't it?
> This may be acceptable for writes, but for obvious performance reasons we
> will have to live with dirty reads. Also, blocking the service from reading
> or writing cannot be done without modifications/impact to the kernel, which
> is going to be shot down for sure :)
>
> I am already getting a beating for performance :)

[SK] I can see that in the mailing list :-). But if you analyze it carefully, with locking you do the same thing without having any reliability guarantees, plus the additional burden of having to tackle distributed locking. However, IMO whether to live with dirty reads or not should be the choice of the application developer. We have no right to lock them into something; obviously they will then look for some other solution where they have more flexibility. Like I stated earlier, performance is important, but if you want scalability, reliability, etc., you have to give some of it up. You can't eat your cake and have it too. :-) But yes, when there is no clustering, the new code added should not degrade performance. What we need to understand is that a reliable group communication framework allows us to make some assumptions about reliability, message ordering, etc., which help us simplify the algorithms in the upper layers.
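The "happened before" determination mentioned above can be sketched with Lamport logical clocks. This is a minimal illustration of the idea, not tied to any of the frameworks discussed in the thread:

```java
// Minimal sketch of a Lamport logical clock. Each process keeps a
// counter; it is incremented on local events and merged (max + 1)
// when a message is received, so that if event a happened before
// event b, then clock(a) < clock(b). Note the converse does not
// hold: concurrent events may still get ordered timestamps.
final class LamportClock {
    private long time = 0;

    // A local event (e.g. a state write) advances the clock.
    synchronized long tick() {
        return ++time;
    }

    // On message receipt, merge the sender's timestamp.
    synchronized long onReceive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }

    synchronized long current() {
        return time;
    }
}
```

In a total-ordering protocol these timestamps (with a tie-break on node id) give every node the same delivery order without any distributed lock.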
[RA] That was exactly my point: keep the upper levels as simple as possible and allow the infrastructure to do the magic. That is why I was opposed to blocking reads and updates at the Axis2 level to enforce total data integrity. If the underlying implementation can do some magic, then great, but let's keep the core Axis2 code simple. So there is a saving on the initial cost we incur. Certainly, we should evaluate the return on investment.
[RA] I agree that we shouldn't impose a locking strategy or anything else on the end user, and we need to provide a choice. Currently (as per the discussions we had on this subject) there are a couple of decisions we are leaving to the end user. These options can be combined to create an acceptable solution. See if these are acceptable to you.

1) Choice of replication/group communication framework. If we implement the ClusterManager with several group communication frameworks (Ricochet, Tribes, Evs4J, etc.) then the end user can choose a strategy that best fits their need. All these frameworks have different levels of guarantees for reliability, scalability, performance, etc.

2) Choice of replication strategy.
   a) Container managed - we have predetermined replication points at which to replicate state. Currently this is at the end of an invocation.
   b) Service managed - the service author decides the replication points and the frequency with which they are called.

In a sticky-session use case, (a) would be acceptable, as the same service won't be accessed concurrently on two nodes (hopefully :). Also, in an active/passive use case, (a) is the most reasonable choice. If the "business logic" is executed asynchronously, or if the invocation is long running, or if there are no sticky sessions, then (b) would be a safer bet. The service author can call updateState or flush whenever he/she thinks it appropriate. That could be after every property change, or based on some other criterion. This, coupled with a group communication framework that implements total ordering, would ensure the desired result (or at least something close).
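The two replication strategies above could look roughly like the following. This is a hypothetical sketch: the `StateReplicator` interface and class names are illustrative, not the actual Axis2 ClusterManager API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the chosen group communication
// framework (Tribes, Ricochet, Evs4J, ...). With a total-ordering
// transport, every node applies these updates in the same sequence.
interface StateReplicator {
    void replicate(Map<String, Object> changedProperties);
}

// Strategy (a), container managed: changes are buffered and flushed
// once, at a predetermined replication point -- here, the end of an
// invocation. Suitable for sticky-session or active/passive setups.
final class ContainerManagedReplication {
    private final StateReplicator replicator;
    private final Map<String, Object> dirty = new HashMap<>();

    ContainerManagedReplication(StateReplicator replicator) {
        this.replicator = replicator;
    }

    void setProperty(String key, Object value) {
        dirty.put(key, value);   // just record the change; no network I/O
    }

    void endOfInvocation() {     // the single replication point
        if (!dirty.isEmpty()) {
            replicator.replicate(new HashMap<>(dirty));
            dirty.clear();
        }
    }
}
```

Strategy (b) would simply expose the flush to the service author (an `updateState()`/`flush()` call on the service context) so he/she can replicate after every property change or on whatever criterion fits the service.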
> > A more relaxed approach is to use causal ordering of messages, where the
> > causal order of events can be determined. There, events for which the
> > order cannot be determined are treated as independent, and no ordering is
> > enforced.
>
> [RA] The paper on Totem claims the same performance as causal ordering or
> even FIFO delivery. But I am not sure how accurate that claim is.
>
> > Sounds very expensive, ha... :-) But if you really look at it, locking
> > techniques essentially do the same while giving you the additional
> > overhead of tackling distributed deadlocks.
>
> [RA] Well, the research paper says so :)
> This approach is good if we replicate attributes as and when a change
> occurs. But if a service does too many writes during an invocation, it will
> be a big performance issue and will increase network chatter considerably.
> If they update the same variable several times during an operation, it
> would be a waste of resources.
> If we replicate at the end of an invocation, the chances of conflicts go
> up. In such a case, distributed locking may be a more viable solution.

[SK] The multicasting techniques used by group communication frameworks are not like IP multicasting. Messages only get delivered to the group. An algorithm which tackles distributed locking needs to do something very similar. I couldn't find any research work on this. Well, this could be the one :-). But you are right, this is not going to work if we replicate things at the end of an invocation. At the same time, we need to evaluate the question raised by Chamikara.
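The causal-ordering relaxation described above (events whose order cannot be determined are treated as independent) is commonly implemented with vector clocks. A minimal sketch, not tied to Appia, Tribes, or any other framework from the thread:

```java
// Minimal vector-clock sketch for causal ordering: event a causally
// precedes event b iff a's vector is <= b's in every slot and strictly
// less in at least one. Otherwise the events are concurrent
// (independent) and no delivery ordering needs to be enforced.
final class VectorClock {
    private final int[] clock;
    private final int self;   // this node's slot

    VectorClock(int numNodes, int selfIndex) {
        this.clock = new int[numNodes];
        this.self = selfIndex;
    }

    void tick() {             // local event
        clock[self]++;
    }

    void merge(int[] other) { // message receipt: element-wise max, then tick
        for (int i = 0; i < clock.length; i++) {
            clock[i] = Math.max(clock[i], other[i]);
        }
        clock[self]++;
    }

    int[] snapshot() {
        return clock.clone();
    }

    // true iff the event stamped `a` happened before the event stamped `b`
    static boolean happenedBefore(int[] a, int[] b) {
        boolean strictlyLess = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyLess = true;
        }
        return strictlyLess;
    }
}
```

The cost trade-off discussed in the thread shows up directly here: the vector grows with group size, but concurrent writes never block each other, unlike a token- or lock-based scheme.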
[RA] Yes, so WADI seems to be the only infrastructure that does locking. I am not for or against distributed locking or total ordering. I am against making those decisions at the Axis2 level (refer to my previous emails). I think the strategy should come from the replication framework we use.
> [SK] OK, I think I got your point. But then it nullifies the ability to
> make use of the real power provided by the underlying messaging
> infrastructure. It will only be used as a multicasting channel, and we will
> have to come up with techniques to tackle everything else.
>
> [RA] Not sure I understand you here (as to how it nullifies the ability to
> leverage ...). Can you explain this a bit more?

[SK] As you may have already read, the following are some of the attractive properties of a reliable group communication environment:

* Virtual synchrony
* Reliable multicast with message ordering
* Group membership services

Due to the many layers they implement to provide these, the moment you employ one of them you are absorbing a cost. So we should then seek to get the maximum out of it. But let me read a bit more of the work which has already been done in this area.
The JBoss clustering implementation may be worth looking at. The following two papers are also worth reading:

http://citeseer.ist.psu.edu/amir94robust.html
http://citeseer.ist.psu.edu/327241.html
[RA] As I said, I am not against total ordering :) Tribes is not IP multicasting either; it's a group membership communication environment built on peer-to-peer communication. I personally don't know whether IP multicasting or peer-to-peer is best (I have heard all types of arguments on this topic). JGroups doesn't implement virtual synchrony either. It does have an experimental version based on Totem, which is not production quality; besides, the license is LGPL. Ricochet does not have group membership, and it is based on IP multicasting (Chamikara, correct me if I am wrong). WADI is built on top of Tribes (or other group communication frameworks) and provides distributed locking. I am interested in all these technologies and am not for or against any. Let's experiment with them as time permits and let the end user choose what they want depending on their situation.
> > [SK] Have you checked Appia and the stuff developed at Cornell? As I told
> > you, we may get away with causal ordering too.
>
> [RA] We talked with Prof. Ken Birman and looked at Ricochet. That's what
> they recommended to us.
> The problem with Ricochet is that it doesn't have membership. But it does
> have some interesting guarantees about performance, especially when the
> number of nodes goes up. But this was a year ago; I am thinking about
> restarting the discussion. They may have added membership to Ricochet.
> I am actually interested in doing another clustering implementation with
> Ricochet (now that we have some ground work in place).

[SK] Do you mean the membership service?
[RA] Yes.

Anyway, I am talking about a different way of doing this stuff, so it certainly needs some research investment.
[RA] What did you mean by a different way? I am sorry, I don't think I understood the context clearly.
Regards,
Rajith

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
