Hi Akila,

+1

The core reason for this is that Stratos depends heavily on the message broker. Lost messages, or an unavailable MB, can push the system into a problematic state, and $subject is one example of that. We should have a health monitoring system which does not depend on the MB.
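To make the idea concrete, here is a rough sketch of what such a check could look like. It compares the members the topology still lists against a direct liveness probe, so it does not need the MB at all. The class name, the method name, and the probe are all hypothetical and are not existing Stratos APIs; how a member is actually probed (port check, IaaS API call, etc.) is left open.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch only: none of these names exist in Stratos.
class MemberHousekeeper {

    /**
     * Returns the ids of members that the topology still lists
     * but that fail a direct liveness probe.
     *
     * @param topologyMemberIds member ids Stratos currently believes exist
     * @param isAlive           assumed probe that contacts the member directly,
     *                          without going through the message broker
     */
    static List<String> findStaleMembers(Collection<String> topologyMemberIds,
                                         Predicate<String> isAlive) {
        List<String> stale = new ArrayList<>();
        for (String memberId : topologyMemberIds) {
            if (!isAlive.test(memberId)) {
                // The topology says the member exists, but the probe disagrees.
                stale.add(memberId);
            }
        }
        return stale;
    }
}
```

A periodic task (for example one scheduled with a ScheduledExecutorService) could call findStaleMembers and treat each returned id the same way a MemberFault event is treated today. A single failed probe may be a false alarm, so in practice we would probably want a few consecutive failures before acting.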
On Wed, Jul 30, 2014 at 9:45 AM, Akila Ravihansa Perera <raviha...@wso2.com> wrote:

> Hi Devs,
>
> Current Stratos architecture relies heavily on high availability of
> the message broker. We faced a situation when MB is down, some of the
> messages published will get lost forever and the system state will
> never be recovered.
>
> One such example is, when a cartridge instance goes down the CEP
> component will identify this event and publish a MemberFault event to
> the MB's summarized-health-stat topic. But the problem is CEP
> component creates its own list of cartridge instance members by
> looking at health-stats published to MB - it does not consider the
> topology. Hence, when a cartridge instance goes down, MemberFault
> event will get fired only once. But if the MB is down at this time, it
> will cause this message to be lost forever resulting in an un-stable
> system state in which Stratos thinks a member exists but in reality it
> is not the case.
>
> We can introduce a simple house keeping task to check whether every
> member is alive. Ideally this should be auto-scaler's responsibility.
> It will allow the system to recover itself from an un-stable
> situation. I think this is a critical bug and should be given high
> priority.
>
> Please share your thoughts.
>
> --
> Akila Ravihansa Perera
> Software Engineer
> WSO2 Inc.
> http://wso2.com
>
> Blog: http://ravihansa3000.blogspot.com
>

--
Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware

web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897