Hi Imesh,

Yes, no message will be communicated while the message broker is
unavailable.


On Wed, Jul 30, 2014 at 7:24 PM, Imesh Gunaratne <im...@apache.org> wrote:

> As I understand it, the Member Fault event is not the only one affected in
> this scenario; any event that CEP publishes to the message broker will
> encounter the same problem.
>
>
> On Wed, Jul 30, 2014 at 5:49 AM, Michiel Blokzijl (mblokzij) <
> mblok...@cisco.com> wrote:
>
>> +1.
>>
>> If Stratos, or any component it relies on, fails, and eventually returns
>> to service, Stratos should "orchestrate" the cloud back to the desired
>> state. If any cartridges went missing and after some time T (post failure)
>> Stratos hasn’t re-discovered them, they should be respawned.
>>
>> Best regards,
>>
>> Michiel
>>
>>
>> On 30 Jul 2014, at 05:51, Isuru Haththotuwa <isu...@apache.org> wrote:
>>
>>
>>
>>
>> On Wed, Jul 30, 2014 at 9:45 AM, Akila Ravihansa Perera <
>> raviha...@wso2.com> wrote:
>>
>>> Hi Devs,
>>>
>>> The current Stratos architecture relies heavily on high availability of
>>> the message broker (MB). We faced a situation where, while the MB is
>>> down, some of the published messages are lost forever and the system
>>> state never recovers.
>>>
>>> One such example: when a cartridge instance goes down, the CEP
>>> component identifies this and publishes a MemberFault event to the
>>> MB's summarized-health-stat topic. The problem is that the CEP
>>> component builds its own list of cartridge instance members from the
>>> health stats published to the MB; it does not consider the topology.
>>> Hence, when a cartridge instance goes down, the MemberFault event is
>>> fired only once. If the MB is down at that moment, the message is lost
>>> forever, leaving the system in an unstable state in which Stratos
>>> thinks a member exists when in reality it does not.
>>>
>>> We can introduce a simple housekeeping task that checks whether every
>>> member is alive. Ideally this should be the auto-scaler's
>>> responsibility. It would allow the system to recover on its own from an
>>> unstable state. I think this is a critical bug and should be given high
>>> priority.
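
A minimal sketch of such a housekeeping task, in Python for brevity. It
is only an illustration of the idea, not Stratos code: the three
callables (list_topology_members, last_health_stat_at, on_member_fault)
and the 60-second timeout are hypothetical names and values chosen for
the example. The point is that the check starts from the topology and
runs on every pass, so a MemberFault event that was lost once is
detected again on the next pass.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class HousekeepingTask:
    """Compares the members the topology lists with recent liveness data."""

    # All three callables are hypothetical hooks, not real Stratos APIs.
    list_topology_members: Callable[[], Iterable[str]]  # ids the topology believes exist
    last_health_stat_at: Callable[[str], float]         # last health-stat time, 0.0 if never seen
    on_member_fault: Callable[[str], None]              # what to do with a missing member
    timeout_seconds: float = 60.0                       # assumed grace period

    def run_once(self, now: Optional[float] = None) -> List[str]:
        """Run a single pass and return the member ids reported as faulty."""
        now = time.time() if now is None else now
        faulty = []
        for member_id in self.list_topology_members():
            # A member is treated as gone if no health stat arrived in time.
            if now - self.last_health_stat_at(member_id) > self.timeout_seconds:
                faulty.append(member_id)
                self.on_member_fault(member_id)
        return faulty
```

Calling run_once() from a periodic scheduler would keep reporting a
missing member until the topology no longer lists it.
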
>>>
>>> Please share your thoughts.
>>>
>> +1. We would need to decide on the best method for this, though. If
>> we consider CEP the central point of decision making, another option is to
>> make it listen to the topology so that it makes the correct decision. Or
>> else, we can use a health check mechanism for the MB that detects when the
>> MB is down and replays any of the messages. This IMO can be very useful,
>> since the primary communication mechanism in Stratos is the MB.
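
One way the replay idea could look on the publisher side, again as a
rough Python sketch rather than actual Stratos or CEP code. The class
name BufferingPublisher, the send callable, and the assumption that
send raises ConnectionError while the MB is down are all hypothetical.
Messages that cannot be delivered stay in a bounded in-memory queue and
are replayed in order on the next successful flush.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class BufferingPublisher:
    """Keeps undelivered messages and replays them once the broker is back."""

    def __init__(self, send: Callable[[str, str], None], max_buffered: int = 1000):
        # Hypothetical transport hook: assumed to raise ConnectionError
        # while the message broker is unreachable.
        self._send = send
        # Bounded buffer: when full, the oldest pending message is dropped.
        self._pending: Deque[Tuple[str, str]] = deque(maxlen=max_buffered)

    def publish(self, topic: str, message: str) -> bool:
        """Queue the message, then try to deliver everything pending."""
        self._pending.append((topic, message))
        return self.flush()

    def flush(self) -> bool:
        """Send pending messages in order; stop at the first failure."""
        while self._pending:
            topic, message = self._pending[0]
            try:
                self._send(topic, message)
            except ConnectionError:
                return False  # broker still down, keep the message queued
            self._pending.popleft()
        return True

    def pending_count(self) -> int:
        return len(self._pending)
```

A periodic MB health check could simply call flush(). Note that the
buffer lives in memory, so this does not help if the publisher itself
restarts while the MB is down.
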
>>
>> One other important thing is to have fail-over/HA for the MB. There can
>> be many other occasions where, if the MB is down, the system goes into an
>> undefined state due to loss of messages.
>>
>>>
>>> --
>>> Akila Ravihansa Perera
>>> Software Engineer
>>> WSO2 Inc.
>>> http://wso2.com
>>>
>>> Blog: http://ravihansa3000.blogspot.com
>>>
>>> --
>>> Thanks and Regards,
>>>
>>> Isuru H.
>>> +94 716 358 048
>>> <http://wso2.com/>
>>>
>>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PPMC Member, Apache Stratos
>



-- 

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware

web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897
