Re: MemberFault event is lost forever when MB is down

Udara Liyanage Tue, 29 Jul 2014 21:27:13 -0700

Hi Akila,

+1
The core reason for this is that Stratos depends heavily on the
message broker. Losing messages, or the MB being unavailable, causes the
system to go into a problematic state. $subject is one example scenario.
We should have a health monitoring system which does not depend on the MB.
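
As a rough illustration of what such a check could look like, here is a
minimal sketch in Java. It is not existing Stratos code: the class and
method names (MemberLivenessChecker, recordHealthStat, findFaultyMembers)
and the timeout value are hypothetical. The idea is to compare the member
IDs in the topology against the time each member was last heard from, so a
missing member is detected on every pass instead of relying on a single
MemberFault message.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not part of Stratos: tracks when each member was
// last heard from and reports topology members that have gone silent.
class MemberLivenessChecker {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    MemberLivenessChecker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Call whenever a health stat (or any liveness signal) arrives.
    void recordHealthStat(String memberId, long nowMillis) {
        lastSeenMillis.put(memberId, nowMillis);
    }

    // Returns topology members with no signal within the timeout.
    // Assumption: a member that was never seen is also treated as faulty;
    // a real implementation would need a grace period for new members.
    List<String> findFaultyMembers(Collection<String> topologyMemberIds,
                                   long nowMillis) {
        List<String> faulty = new ArrayList<>();
        for (String memberId : topologyMemberIds) {
            Long lastSeen = lastSeenMillis.get(memberId);
            if (lastSeen == null || nowMillis - lastSeen > timeoutMillis) {
                faulty.add(memberId);
            }
        }
        return faulty;
    }
}
```

Because the check is driven by the topology and repeats on every pass, a
fault missed while the MB is down would be picked up on a later pass. The
pass itself could be run periodically, for example from a
java.util.concurrent.ScheduledExecutorService.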



On Wed, Jul 30, 2014 at 9:45 AM, Akila Ravihansa Perera <[email protected]>
wrote:

> Hi Devs,
>
> The current Stratos architecture relies heavily on high availability
> of the message broker. We faced a situation where, when the MB is
> down, some of the published messages are lost forever and the system
> state never recovers.
>
> One such example: when a cartridge instance goes down, the CEP
> component identifies this event and publishes a MemberFault event to
> the MB's summarized-health-stat topic. The problem is that the CEP
> component builds its own list of cartridge instance members by
> looking at the health-stats published to the MB - it does not consider
> the topology. Hence, when a cartridge instance goes down, the
> MemberFault event is fired only once. If the MB is down at that time,
> the message is lost forever, resulting in an unstable system state in
> which Stratos thinks a member exists when in reality it does not.
>
> We can introduce a simple housekeeping task to check whether every
> member is alive. Ideally this should be the auto-scaler's
> responsibility. It will allow the system to recover by itself from an
> unstable situation. I think this is a critical bug and should be given
> high priority.
>
> Please share your thoughts.
>
> --
> Akila Ravihansa Perera
> Software Engineer
> WSO2 Inc.
> http://wso2.com
>
> Blog: http://ravihansa3000.blogspot.com
>



-- 

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware

web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897
