Hi, I’m guessing the fix for [1] will be in 4.1.0, right? I’m glad you managed to resolve both issues with 1 fix!
Thanks and best regards,

Michiel

[1] https://issues.apache.org/jira/browse/STRATOS-795

On 12 Sep 2014, at 15:57, Lahiru Sandaruwan <lahi...@wso2.com> wrote:

> Ok cool, Let's resolve the Jira.
>
> On Fri, Sep 12, 2014 at 5:51 PM, Akila Ravihansa Perera <raviha...@wso2.com>
> wrote:
> Hi Lahiru,
>
> Yes, this is resolved now. Stratos will now check health stats against
> the member list published by CC to topology topic (CompleteTopology
> event). This will allow Stratos to recover from MB failures and also
> server unavailable situations.
>
> Thanks.
>
> On Fri, Sep 12, 2014 at 5:03 PM, Lahiru Sandaruwan <lahi...@wso2.com> wrote:
> > Hi Akila,
> >
> > Would [1] also be solved with the solution we talked here?
> >
> > Thanks.
> > [1] https://issues.apache.org/jira/browse/STRATOS-795
> >
> > On Thu, Aug 28, 2014 at 12:48 PM, Akila Ravihansa Perera
> > <raviha...@wso2.com> wrote:
> >>
> >> Hi,
> >>
> >> Since we're using WSO2 CEP for monitoring faulty members, it would
> >> make sense to enhance the Faulty Member window processor [1] to
> >> recover from a core component failure. I have made some improvements
> >> to this window processor and committed in [2].
> >>
> >> CEP will now have an additional dependency for Stratos messaging
> >> component (applicable only when using stand-alone CEP). Therefore it
> >> can now listen to the topology topic events published by CC. CEP will
> >> now check for cartridge agent health stats published by instances
> >> against the member list published by CC in complete topology event.
> >> Thus, even if the MemberFault event is lost in case of MB failure
> >> Stratos can recover itself since it will periodically check against
> >> member list published by CC. The code has been rigorously tested on
> >> EC2 and OpenStack.
> >>
> >> The other possible alternative (as opposed to dependency with
> >> messaging component) would be to create a new JMS input adaptor in CEP
> >> and listen to topology topic. But with this approach we will have to
> >> duplicate the messaging component model (topology structure) in CEP
> >> window processor. This is an un-necessary duplication IMHO.
> >>
> >> However, with this dependency for messaging component in CEP, if a
> >> user is deploying Stratos with a stand-alone CEP, then he will have to
> >> manually copy the messaging component artifacts to CEP plugins
> >> directory.
> >>
> >> Would appreciate your thoughts on this.
> >>
> >> [1]
> >> https://github.com/apache/stratos/blob/master/extensions/cep/stratos-cep-extension/src/main/java/org/apache/stratos/cep/extension/FaultHandlingWindowProcessor.java
> >> [2]
> >> https://github.com/apache/stratos/commit/05e1ddc20a871b73b721487a13a2547cf9b8768d
> >>
> >> Thanks.
> >>
> >> On Wed, Jul 30, 2014 at 7:32 PM, Udara Liyanage <ud...@wso2.com> wrote:
> >> > Hi Imesh,
> >> >
> >> > Yes any message will not be communicated when message broker is not
> >> > available.
> >> >
> >> > On Wed, Jul 30, 2014 at 7:24 PM, Imesh Gunaratne <im...@apache.org>
> >> > wrote:
> >> >>
> >> >> As I understood its not just the Member Fault event that is affected
> >> >> in this scenario, any event that CEP publishes to message broker will
> >> >> encounter the same problem.
> >> >>
> >> >> On Wed, Jul 30, 2014 at 5:49 AM, Michiel Blokzijl (mblokzij)
> >> >> <mblok...@cisco.com> wrote:
> >> >>>
> >> >>> +1.
> >> >>>
> >> >>> If Stratos, or any component it relies on, fails, and eventually
> >> >>> returns to service, Stratos should "orchestrate" the cloud back to
> >> >>> the desired state. If any cartridges went missing and after some
> >> >>> time T (post failure) Stratos hasn’t re-discovered them, they
> >> >>> should be respawned.
> >> >>>
> >> >>> Best regards,
> >> >>>
> >> >>> Michiel
> >> >>>
> >> >>> On 30 Jul 2014, at 05:51, Isuru Haththotuwa <isu...@apache.org> wrote:
> >> >>>
> >> >>> On Wed, Jul 30, 2014 at 9:45 AM, Akila Ravihansa Perera
> >> >>> <raviha...@wso2.com> wrote:
> >> >>>>
> >> >>>> Hi Devs,
> >> >>>>
> >> >>>> Current Stratos architecture relies heavily on high availability of
> >> >>>> the message broker. We faced a situation when MB is down, some of the
> >> >>>> messages published will get lost forever and the system state will
> >> >>>> never be recovered.
> >> >>>>
> >> >>>> One such example is, when a cartridge instance goes down the CEP
> >> >>>> component will identify this event and publish a MemberFault event to
> >> >>>> the MB's summarized-health-stat topic. But the problem is CEP
> >> >>>> component creates its own list of cartridge instance members by
> >> >>>> looking at health-stats published to MB - it does not consider the
> >> >>>> topology. Hence, when a cartridge instance goes down, MemberFault
> >> >>>> event will get fired only once. But if the MB is down at this time,
> >> >>>> it will cause this message to be lost forever resulting in an
> >> >>>> un-stable system state in which Stratos thinks a member exists but
> >> >>>> in reality it is not the case.
> >> >>>>
> >> >>>> We can introduce a simple house keeping task to check whether every
> >> >>>> member is alive. Ideally this should be auto-scaler's responsibility.
> >> >>>> It will allow the system to recover itself from an un-stable
> >> >>>> situation. I think this is a critical bug and should be given high
> >> >>>> priority.
> >> >>>>
> >> >>>> Please share your thoughts.
> >> >>>
> >> >>> +1. We would need to decide what is the best method for this though.
> >> >>> If we consider CEP the central point of decision making, another
> >> >>> option is to make it listen to topology and get the correct
> >> >>> decision. Or else, we can use a health check mechanism for the MB
> >> >>> which can detect if the MB is down and replay any of the messages.
> >> >>> This IMO can be very useful since the primary communication
> >> >>> mechanism in Stratos is the MB.
> >> >>>
> >> >>> One other important thing is to have fail-over/HA for MB. There can be
> >> >>> many other occasion if the MB is down, the system going to a undefined
> >> >>> state due to loss of messages.
> >> >>>>
> >> >>>> --
> >> >>>> Akila Ravihansa Perera
> >> >>>> Software Engineer
> >> >>>> WSO2 Inc.
> >> >>>> http://wso2.com
> >> >>>>
> >> >>>> Blog: http://ravihansa3000.blogspot.com
> >> >>>>
> >> >>>> --
> >> >>>> Thanks and Regards,
> >> >>>>
> >> >>>> Isuru H.
> >> >>>> +94 716 358 048
> >> >>>>
> >> >>
> >> >> --
> >> >> Imesh Gunaratne
> >> >>
> >> >> Technical Lead, WSO2
> >> >> Committer & PPMC Member, Apache Stratos
> >> >
> >> > --
> >> >
> >> > Udara Liyanage
> >> > Software Engineer
> >> > WSO2, Inc.: http://wso2.com
> >> > lean. enterprise. middleware
> >> >
> >> > web: http://udaraliyanage.wordpress.com
> >> > phone: +94 71 443 6897
> >>
> >> --
> >> Akila Ravihansa Perera
> >> WSO2 Inc
> >>
> >> Blog: http://ravihansa3000.blogspot.com
> >
> > --
> > --
> > Lahiru Sandaruwan
> > Committer and PMC member, Apache Stratos,
> > Senior Software Engineer,
> > WSO2 Inc., http://wso2.com
> > lean.enterprise.middleware
> >
> > email: lahi...@wso2.com cell: (+94) 773 325 954
> > blog: http://lahiruwrites.blogspot.com/
> > twitter: http://twitter.com/lahirus
> > linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>
> --
> Akila Ravihansa Perera
> Software Engineer, WSO2
>
> Blog: http://ravihansa3000.blogspot.com
>
> --
> --
> Lahiru Sandaruwan
> Committer and PMC member, Apache Stratos,
> Senior Software Engineer,
> WSO2 Inc., http://wso2.com
> lean.enterprise.middleware
>
> email: lahi...@wso2.com cell: (+94) 773 325 954
> blog: http://lahiruwrites.blogspot.com/
> twitter: http://twitter.com/lahirus
> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
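
For anyone reading this thread later, the approach described in the quoted messages (checking cartridge agent health stats against the member list from the CompleteTopology event, on a periodic basis) could be sketched roughly as below. This is only an illustration under assumptions, not the actual FaultHandlingWindowProcessor code: the class name `FaultDetector`, its method names, and the timeout handling are all hypothetical, and it is written in Python purely for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FaultDetector:
    """Hypothetical sketch; names and structure are NOT taken from Stratos."""

    timeout_seconds: float
    # member_id -> timestamp of the last health stat seen for that member
    last_seen: dict = field(default_factory=dict)

    def on_complete_topology(self, member_ids, now):
        """Sync tracked members with the member list from a topology event."""
        current = set(member_ids)
        # New members start their grace period at 'now'.
        for member_id in current:
            self.last_seen.setdefault(member_id, now)
        # Members that left the topology are no longer tracked.
        for member_id in list(self.last_seen):
            if member_id not in current:
                del self.last_seen[member_id]

    def on_health_stat(self, member_id, now):
        """Record a health stat, but only for members known from the topology."""
        if member_id in self.last_seen:
            self.last_seen[member_id] = now

    def find_faulty_members(self, now):
        """Return members whose last health stat is older than the timeout."""
        return sorted(
            member_id
            for member_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

The point of the design is that `find_faulty_members` would be called periodically and keeps reporting a silent member on every sweep until the topology stops listing it. A fault notification lost during a message broker outage would therefore be raised again on the next sweep, rather than being fired only once.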