Hi, I've fixed most of these cases we discussed here. To summarize;
- Registry exceptions should always be thrown from persist() method and should be handled by the methods that are calling it - If service name, cluster id or member id is not found in the Topology, a RuntimeException should be thrown instead of returning from the method. Please go through the changes in [1] and share your thoughts. [1] https://github.com/apache/stratos/commit/8d46fab0f921e37d5f6da11f8006b68b3743226d Thanks. On Thu, Oct 1, 2015 at 12:14 PM, Imesh Gunaratne <im...@apache.org> wrote: > A good finding Dinithi! May be you can fix this for 4.1.4. > > Thanks > > On Wed, Sep 30, 2015 at 9:22 PM, Dinithi De Silva <dinit...@wso2.com> > wrote: > >> Hi Akila, >> >> Thank you for bringing this up. +1 for this as this would lead to an >> unrecoverable state and we can not even find out where it went wrong. >> >> I found another place in CloudControllerServiceImpl class which we need >> to fix. >> >> private void onClusterRemoval(final String clusterId) { >> ClusterContext ctxt = >> CloudControllerContext.getInstance().getClusterContext(clusterId); >> TopologyBuilder.handleClusterRemoved(ctxt); >> CloudControllerContext.getInstance().removeClusterContext(clusterId); >> >> CloudControllerContext.getInstance().removeMemberContextsOfCluster(clusterId); >> CloudControllerContext.getInstance().persist(); >> } >> >> As Imesh suggested it would be better to scan the complete code base to >> find out the places we need to fix. >> >> Thanks. >> >> On Tue, Sep 29, 2015 at 8:08 PM, Imesh Gunaratne <im...@apache.org> >> wrote: >> >>> Yes indeed! As I found the problem is in CloudControllerUtil class: >>> >>> public static void persistTopology(Topology topology) { >>> try { >>> >>> RegistryManager.getInstance().persist(CloudControllerConstants.TOPOLOGY_RESOURCE, >>> topology); >>> } catch (RegistryException e) { >>> String msg = "Failed to persist the Topology in registry. "; >>> log.fatal(msg, e); >>> } >>> } >>> >>> We might need to scan the entire codebase for such occurrences. >>> >>> Thanks >>> >>> >>> On Tue, Sep 29, 2015 at 4:26 PM, Reka Thirunavukkarasu <r...@wso2.com> >>> wrote: >>> >>>> +1 for handing these dependent operations as atomic in order to avoid >>>> inconsistency. It is a good thought. >>>> >>>> Better have the same approach for application model also as it has to >>>> update monitor hierarchy and application model at the same time. >>>> >>>> Thanks, >>>> Reka >>>> >>>> On Tue, Sep 29, 2015 at 1:57 AM, Akila Ravihansa Perera < >>>> raviha...@wso2.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> This is to bring your attention to possible inconsistency that could >>>>> arise due to rare edge cases. For an eg: If you follow the piece of code >>>>> at >>>>> [1], handleMemberTerminated can return without successfully removing a >>>>> member from topology but the next step will get executed which is to >>>>> remove >>>>> the member context from CC's context, thus leading to inconsistencies. >>>>> This >>>>> could result in permanent inconsistent state and system will never >>>>> recover. >>>>> >>>>> I'm proposing the $subject to resolve this issue. I think simply >>>>> throwing an exception if an operation is not successful and handling those >>>>> exceptions gracefully should fix the problem rather than silently >>>>> returning >>>>> from the method call. Above is only one example and we may have to find >>>>> such occurrences throughout the CC component since that is the only module >>>>> which writes/updates the topology. >>>>> >>>>> However, we need to check whether same problem can occur for >>>>> Application model maintained by AS and Tenant model maintained by SM. >>>>> @Reka: Any thoughts? >>>>> >>>>> I've created the JIRA [2] to track this issue. >>>>> >>>>> [1] >>>>> https://github.com/ravihansa3000/stratos/blob/stratos-4.1.x/components/org.apache.stratos.cloud.controller/src/main/java/org/apache/stratos/cloud/controller/services/impl/CloudControllerServiceUtil.java#L57 >>>>> [2] https://issues.apache.org/jira/browse/STRATOS-1578 >>>>> >>>>> Thanks. >>>>> >>>>> -- >>>>> Akila Ravihansa Perera >>>>> WSO2 Inc.; http://wso2.com/ >>>>> >>>>> Blog: http://ravihansa3000.blogspot.com >>>>> >>>> >>>> >>>> >>>> -- >>>> Reka Thirunavukkarasu >>>> Senior Software Engineer, >>>> WSO2, Inc.:http://wso2.com, >>>> Mobile: +94776442007 >>>> >>>> >>>> >>> >>> >>> -- >>> Imesh Gunaratne >>> >>> Senior Technical Lead, WSO2 >>> Committer & PMC Member, Apache Stratos >>> >> >> >> >> -- >> *Dinithi De Silva* >> Associate Software Engineer, WSO2 Inc. >> m:+94716667655 | e:dinit...@wso2.com | w: www.wso2.com >> | a: #20, Palm Grove, Colombo 03 >> > > > > -- > Imesh Gunaratne > > Senior Technical Lead, WSO2 > Committer & PMC Member, Apache Stratos > -- Akila Ravihansa Perera WSO2 Inc.; http://wso2.com/ Blog: http://ravihansa3000.blogspot.com