On Wed, Jul 30, 2014 at 9:45 AM, Akila Ravihansa Perera <raviha...@wso2.com>
wrote:

> Hi Devs,
>
> The current Stratos architecture relies heavily on the high availability
> of the message broker (MB). We hit a situation where, while the MB was
> down, some of the published messages were lost for good and the system
> state never recovered.
>
> One example: when a cartridge instance goes down, the CEP component
> detects this and publishes a MemberFault event to the MB's
> summarized-health-stat topic. The problem is that the CEP component
> builds its own list of cartridge instance members from the health
> stats published to the MB; it does not consult the topology. Hence,
> when a cartridge instance goes down, the MemberFault event is fired
> only once. If the MB is down at that moment, the message is lost for
> good, leaving the system in an unstable state in which Stratos thinks
> a member exists when in reality it does not.
>
> We can introduce a simple housekeeping task that checks whether every
> member is alive. Ideally this should be the auto-scaler's responsibility.
> It would allow the system to recover by itself from an unstable
> state. I think this is a critical bug and should be given high
> priority.
>
> Please share your thoughts.
>
+1. We would need to decide on the best method for this, though. If we
consider CEP the central point of decision making, another option is to
make it listen to the topology so that it reaches the correct decision.
Alternatively, we can use a health check mechanism for the MB that detects
when the MB is down and replays the messages that were lost. IMO this can
be very useful, since the primary communication mechanism in Stratos is
the MB.
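
To make the topology-driven check concrete, here is a minimal sketch of what a periodic reconciliation could look like. It is not existing Stratos code: the function name, the `last_seen` map of health-stat timestamps, and the timeout are all assumptions for illustration. The idea is to compare the members the topology knows about against the last health stat seen for each one, so that a dead member is reported on every run rather than only once.

```python
import time


def find_stale_members(topology_members, last_seen, timeout_s, now=None):
    """Return the topology members with no health stat in the last timeout_s seconds.

    topology_members: iterable of member ids taken from the topology (assumed input).
    last_seen: dict of member id -> timestamp of the last health stat (assumed input).
    """
    now = time.time() if now is None else now
    stale = []
    for member_id in topology_members:
        seen = last_seen.get(member_id)
        # Assumption: a member that never reported is treated the same as one
        # that stopped reporting. A real check may want a start-up grace period.
        if seen is None or now - seen > timeout_s:
            stale.append(member_id)
    return sorted(stale)
```

Because the result is computed from the current state on every run, a MemberFault that was lost while the MB was down would simply be raised again on the next run.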

One other important thing is to have fail-over/HA for the MB. There can be
many other occasions where, if the MB is down, the system ends up in an
undefined state due to loss of messages.
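
On the message-loss side, a rough sketch of publisher-side buffering with replay is below. Again, this is only an illustration under assumptions: `BufferingPublisher` and the `send` callable are hypothetical names, and `send` is assumed to raise `ConnectionError` while the MB is unreachable. Messages stay in a local queue until the broker accepts them, and are replayed in order once it is back.

```python
from collections import deque


class BufferingPublisher:
    """Keeps messages in memory until the broker accepts them (illustrative only)."""

    def __init__(self, send, max_buffered=1000):
        # send(topic, payload) is a hypothetical callable that raises
        # ConnectionError while the MB is unreachable.
        self._send = send
        # A bounded deque drops the oldest entry once max_buffered is reached.
        self._pending = deque(maxlen=max_buffered)

    def publish(self, topic, payload):
        """Queue the message, then try to deliver everything queued so far."""
        self._pending.append((topic, payload))
        return self.flush()

    def flush(self):
        """Send queued messages in order; stop at the first failure."""
        sent = 0
        while self._pending:
            topic, payload = self._pending[0]
            try:
                self._send(topic, payload)
            except ConnectionError:
                break  # MB still down: keep the message for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def pending_count(self):
        return len(self._pending)
```

A periodic call to `flush()` could double as the MB health check. Note that an in-memory buffer does not survive a restart of the publisher, so this would complement fail-over/HA for the MB rather than replace it.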

>
> --
> Akila Ravihansa Perera
> Software Engineer
> WSO2 Inc.
> http://wso2.com
>
> Blog: http://ravihansa3000.blogspot.com
>

--
Thanks and Regards,

Isuru H.
+94 716 358 048
