Hi Sanjaya, Sorry for not replying sooner. Comments inline as usual marked with [RA].
Regards,
Rajith

On 2/8/07, Sanjaya Karunasena <[EMAIL PROTECTED]> wrote:
Hi Rajith,

Please find my comments inline.

Regards,
Sanjaya

On Thursday 08 February 2007 09:10, Rajith Attapattu wrote:
> Hey Sanjaya,
>
> It is indeed turning out to be a good conversation. Comments inline.
>
> Regards,
> Rajith
>
> On 2/7/07, Sanjaya Karunasena <[EMAIL PROTECTED]> wrote:
> > [SK] So why not use Synapse? Of course there is the option of embedding a
> > simple but fast load balancer as the default load balancer. It is always
> > good to have different configurations available for different requirements
> > when it comes to application development. The only thing required is some
> > good documentation.
>
> [RA] Synapse is certainly an option.
>
> > [SK] Certainly, starting with small steps is always important and works.
> > But let's keep the discussion going so that we keep an eye on the final
> > goal while doing that.
>
> [RA] Totally agree.
>
> > For message ordering provided by the communication framework to work, it
> > should be notified of all the dependent events. However, there is a cost
> > associated with this. The question is: where do you invest? Which approach
> > handles concurrency with the least cost?
> >
> > Let me explain how total ordering is going to work. In total ordering,
> > message m is delivered to all the recipients before message m+1 is
> > delivered. When event execution is synchronous, the event will be
> > automatically blocked until the message is delivered.
> >
> > This way, if a write happens at time t and a read starts concurrently at
> > time t+1, the event will be automatically blocked until the write is
> > delivered to all the recipients. Which event occurred first (happened
> > before) can be determined using Lamport's algorithm.
>
> [RA] If we block reads, then to be sure that nobody is writing while we are
> reading we need to wait until we have the "virtual token", since no node
> can write until it acquires the token. This will be very slow, won't it?
> This may be acceptable for writes, but for obvious performance reasons we
> will have to live with dirty reads. Also, blocking the service from reading
> or writing cannot be done without modifications/impact to the kernel, which
> is going to be shot down for sure :)
>
> I am already getting a beating for performance :)

[SK] I can see that in the mailing list :-). But if you analyze it carefully, with locking you do the same thing without having any reliability guarantees, plus the additional burden of having to tackle distributed locking. However, IMO whether to live with dirty reads or not should be the choice of the application developer. We have no right to lock them into something; obviously they will then look for some other solution where they have more flexibility. Like I stated earlier, performance is important, but if you want scalability, reliability, etc., you have to give some of it up. You can't eat your cake and have it too. :-) But yes, when there is no clustering, the new code added should not degrade performance. What we need to understand is that a reliable group communication framework allows us to make some assumptions about reliability, message ordering, etc., which help us simplify the algorithms in the upper layers.
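The "happened before" determination mentioned above can be sketched with Lamport logical clocks. This is a minimal illustration of the idea, not tied to any of the frameworks discussed in the thread:

```java
// Minimal sketch of a Lamport logical clock. Each process keeps a
// counter; it is incremented on local events and merged (max + 1)
// when a message is received, so that if event a happened before
// event b, then clock(a) < clock(b). Note the converse does not
// hold: concurrent events may still get ordered timestamps.
final class LamportClock {
    private long time = 0;

    // A local event (e.g. a state write) advances the clock.
    synchronized long tick() {
        return ++time;
    }

    // On message receipt, merge the sender's timestamp.
    synchronized long onReceive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }

    synchronized long current() {
        return time;
    }
}
```

In a total-ordering protocol these timestamps (with a tie-break on node id) give every node the same delivery order without any distributed lock.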
[RA] That was exactly my point: keep the upper levels as simple as possible and allow the infrastructure to do the magic. That is why I was opposed to blocking reads and updates at the Axis2 level to enforce total data integrity. If the underlying implementation can do some magic, then great, but let's keep the core Axis2 code simple. So there is a saving on the initial cost we incur. Certainly, we should evaluate the return on investment.
[RA] I agree that we shouldn't impose a locking strategy or anything else on the end user, and we need to provide a choice. Currently (as per the discussions we had on this subject) there are a couple of decisions we are leaving to the end user. These options can be combined to create an acceptable solution. See if these are acceptable to you.

1) Choice of replication/group communication framework. If we implement the ClusterManager with several group communication frameworks (Ricochet, Tribes, Evs4J, etc.) then the end user can choose a strategy that best fits their need. All these frameworks have different levels of guarantees for reliability, scalability, performance, etc.

2) Choice of replication strategy.
   a) Container managed - we have predetermined replication points at which to replicate state. Currently this is at the end of an invocation.
   b) Service managed - the service author decides the replication points and the frequency with which they are called.

In a sticky-session use case, (a) would be acceptable, as the same service won't be accessed concurrently on two nodes (hopefully :). Also, in an active/passive use case, (a) is the most reasonable choice. If the "business logic" is executed asynchronously, or if the invocation is long running, or if there are no sticky sessions, then (b) would be a safer bet. The service author can call updateState or flush whenever he/she thinks it appropriate. That could be after every property change, or based on some other criterion. This, coupled with a group communication framework that implements total ordering, would ensure the desired result (or at least something close).
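The two replication strategies above could look roughly like the following. This is a hypothetical sketch: the `StateReplicator` interface and class names are illustrative, not the actual Axis2 ClusterManager API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the chosen group communication
// framework (Tribes, Ricochet, Evs4J, ...). With a total-ordering
// transport, every node applies these updates in the same sequence.
interface StateReplicator {
    void replicate(Map<String, Object> changedProperties);
}

// Strategy (a), container managed: changes are buffered and flushed
// once, at a predetermined replication point -- here, the end of an
// invocation. Suitable for sticky-session or active/passive setups.
final class ContainerManagedReplication {
    private final StateReplicator replicator;
    private final Map<String, Object> dirty = new HashMap<>();

    ContainerManagedReplication(StateReplicator replicator) {
        this.replicator = replicator;
    }

    void setProperty(String key, Object value) {
        dirty.put(key, value);   // just record the change; no network I/O
    }

    void endOfInvocation() {     // the single replication point
        if (!dirty.isEmpty()) {
            replicator.replicate(new HashMap<>(dirty));
            dirty.clear();
        }
    }
}
```

Strategy (b) would simply expose the flush to the service author (an `updateState()`/`flush()` call on the service context) so he/she can replicate after every property change or on whatever criterion fits the service.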
> > A more relaxed approach is to use causal ordering of messages, where the
> > causal order of events can be determined. There, events for which the
> > order cannot be determined are treated as independent, and no ordering is
> > enforced.
>
> [RA] The paper on Totem claims the same performance as causal ordering or
> even FIFO delivery. But I am not sure how accurate that claim is.
>
> > Sounds very expensive, ha... :-) But if you really look at it, locking
> > techniques essentially do the same while giving you the additional
> > overhead of tackling distributed deadlocks.
>
> [RA] Well, the research paper says so :)
> This approach is good if we replicate attributes as and when a change
> occurs. But if a service does too many writes during an invocation, it will
> be a big performance issue and will increase network chatter considerably.
> If they update the same variable several times during an operation, it
> would be a waste of resources.
> If we replicate at the end of an invocation, the chances of conflicts go
> up. In such a case, distributed locking may be a more viable solution.

[SK] The multicasting techniques used by group communication frameworks are not like IP multicasting. Messages only get delivered to the group. An algorithm which tackles distributed locking needs to do something very similar. I couldn't find any research work on this. Well, this could be the one :-). But you are right, this is not going to work if we replicate things at the end of an invocation. At the same time, we need to evaluate the question raised by Chamikara.
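The causal-ordering relaxation described above (events whose order cannot be determined are treated as independent) is commonly implemented with vector clocks. A minimal sketch, not tied to Appia, Tribes, or any other framework from the thread:

```java
// Minimal vector-clock sketch for causal ordering: event a causally
// precedes event b iff a's vector is <= b's in every slot and strictly
// less in at least one. Otherwise the events are concurrent
// (independent) and no delivery ordering needs to be enforced.
final class VectorClock {
    private final int[] clock;
    private final int self;   // this node's slot

    VectorClock(int numNodes, int selfIndex) {
        this.clock = new int[numNodes];
        this.self = selfIndex;
    }

    void tick() {             // local event
        clock[self]++;
    }

    void merge(int[] other) { // message receipt: element-wise max, then tick
        for (int i = 0; i < clock.length; i++) {
            clock[i] = Math.max(clock[i], other[i]);
        }
        clock[self]++;
    }

    int[] snapshot() {
        return clock.clone();
    }

    // true iff the event stamped `a` happened before the event stamped `b`
    static boolean happenedBefore(int[] a, int[] b) {
        boolean strictlyLess = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyLess = true;
        }
        return strictlyLess;
    }
}
```

The cost trade-off discussed in the thread shows up directly here: the vector grows with group size, but concurrent writes never block each other, unlike a token- or lock-based scheme.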
[RA] Yes, so WADI seems to be the only infrastructure that does locking. I am not for or against distributed locking or total ordering. I am against making those decisions at the Axis2 level (refer to my previous emails). I think the strategy should come from the replication framework we use.
> [SK] OK, I think I got your point. But then it nullifies the ability to
> make use of the real power provided by the underlying messaging
> infrastructure. It will only be used as a multicasting channel, and we will
> have to come up with techniques to tackle everything else.
>
> [RA] Not sure I understand you here (as to how it nullifies the ability to
> leverage ...). Can you explain this a bit more?

[SK] As you may have already read, the following are some of the attractive properties of a reliable group communication environment:

* Virtual synchrony
* Reliable multicast with message ordering
* Group membership services

Due to the many layers they implement to provide these, the moment you employ one of them you are absorbing a cost. So we should then seek to get the maximum out of it. But let me read a bit more of the work which has already been done in this area.
The JBoss clustering implementation may be worth looking at. The following two papers are also worth reading:

http://citeseer.ist.psu.edu/amir94robust.html
http://citeseer.ist.psu.edu/327241.html
[RA] As I said, I am not against total ordering :) Tribes is not IP multicasting either; it's a group membership communication environment built on peer-to-peer communication. I personally don't know whether IP multicasting or peer-to-peer is best (I have heard all types of arguments on this topic). JGroups doesn't implement virtual synchrony either. It does have an experimental version based on Totem, which is not production quality; besides, the license is LGPL. Ricochet does not have group membership, and it is based on IP multicasting (Chamikara, correct me if I am wrong). WADI is built on top of Tribes (or other group communication frameworks) and provides distributed locking. I am interested in all these technologies and am not for or against any. Let's experiment with them as time permits and let the end user choose what they want depending on their situation.
> > [SK] Have you checked Appia and the stuff developed at Cornell? As I told
> > you, we may get away with causal ordering too.
>
> [RA] We talked with Prof. Ken Birman and looked at Ricochet. That's what
> they recommended to us.
> The problem with Ricochet is that it doesn't have membership. But it does
> have some interesting guarantees about performance, especially when the
> number of nodes goes up. But this was a year ago; I am thinking about
> restarting the discussion. They may have added membership to Ricochet.
> I am actually interested in doing another clustering implementation with
> Ricochet (now that we have some ground work in place).

[SK] Do you mean the membership service?
[RA] Yes.

Anyway, I am talking about a different way of doing this stuff, so it certainly needs some research investment.
[RA] What did you mean by a different way? I am sorry, I don't think I understood the context clearly.
Regards,
Rajith

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
