Hi Akila,

+1
The root cause is that Stratos depends heavily on the message broker (MB).
Lost messages, or the MB being unavailable, can leave the system in an
inconsistent state. $subject is one example of this.
We should have a health monitoring system that does not depend on the MB.
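
As a rough sketch of what such a check could look like (all class and
parameter names here are hypothetical, not taken from the Stratos code
base): a periodic task walks the members listed in the topology, probes
each one directly instead of going through the MB, and reports a member
as faulty after a configurable number of consecutive missed probes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical housekeeping task; names are illustrative, not Stratos APIs. */
public class MemberHousekeeper {
    private final Map<String, String> topology;  // memberId -> host, as listed in the topology
    private final Predicate<String> isAlive;     // direct probe of a host, bypassing the MB
    private final Consumer<String> onFault;      // called for each member declared faulty
    private final int maxMisses;                 // consecutive failed probes before reporting
    private final Map<String, Integer> misses = new HashMap<>();

    public MemberHousekeeper(Map<String, String> topology, Predicate<String> isAlive,
                             Consumer<String> onFault, int maxMisses) {
        this.topology = topology;
        this.isAlive = isAlive;
        this.onFault = onFault;
        this.maxMisses = maxMisses;
    }

    /** One sweep over the topology; returns the members reported as faulty in this sweep. */
    public synchronized List<String> sweep() {
        List<String> faulty = new ArrayList<>();
        // Iterate over a copy so that onFault may safely modify the topology.
        for (Map.Entry<String, String> entry : new HashMap<>(topology).entrySet()) {
            String memberId = entry.getKey();
            if (isAlive.test(entry.getValue())) {
                misses.remove(memberId);  // healthy again: reset the counter
                continue;
            }
            int count = misses.merge(memberId, 1, Integer::sum);
            if (count >= maxMisses) {
                // Reset the counter; if the member is still in the topology it
                // will be reported again after another maxMisses failed sweeps.
                misses.remove(memberId);
                faulty.add(memberId);
                onFault.accept(memberId);
            }
        }
        return faulty;
    }
}
```

sweep() could be run at a fixed interval, for example with
ScheduledExecutorService.scheduleAtFixedRate. Because the check starts
from the topology and repeats, a fault report that is lost once does not
leave a dead member registered forever.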


On Wed, Jul 30, 2014 at 9:45 AM, Akila Ravihansa Perera <raviha...@wso2.com>
wrote:

> Hi Devs,
>
> The current Stratos architecture relies heavily on high availability of
> the message broker. We hit a situation where, while the MB is down, some
> of the published messages are lost forever and the system state never
> recovers.
>
> One example: when a cartridge instance goes down, the CEP component
> detects this and publishes a MemberFault event to the MB's
> summarized-health-stat topic. The problem is that the CEP component
> builds its own list of cartridge instance members from the health
> stats published to the MB; it does not consult the topology. Hence,
> when a cartridge instance goes down, the MemberFault event is fired
> only once. If the MB is down at that moment, the message is lost
> forever, leaving the system in an unstable state in which Stratos
> thinks a member exists when in reality it does not.
>
> We can introduce a simple housekeeping task that checks whether every
> member is alive. Ideally this should be the auto-scaler's
> responsibility. It would allow the system to recover on its own from
> an unstable state. I think this is a critical bug and should be given
> high priority.
>
> Please share your thoughts.
>
> --
> Akila Ravihansa Perera
> Software Engineer
> WSO2 Inc.
> http://wso2.com
>
> Blog: http://ravihansa3000.blogspot.com
>



-- 

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware

web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897
