Good finding, Dinithi! Maybe you can fix this for 4.1.4. Thanks
On Wed, Sep 30, 2015 at 9:22 PM, Dinithi De Silva <dinit...@wso2.com> wrote:
> Hi Akila,
>
> Thank you for bringing this up. +1 for this, as this would lead to an
> unrecoverable state and we cannot even find out where it went wrong.
>
> I found another place in the CloudControllerServiceImpl class which we need
> to fix:
>
> private void onClusterRemoval(final String clusterId) {
>     ClusterContext ctxt =
>             CloudControllerContext.getInstance().getClusterContext(clusterId);
>     TopologyBuilder.handleClusterRemoved(ctxt);
>     CloudControllerContext.getInstance().removeClusterContext(clusterId);
>     CloudControllerContext.getInstance().removeMemberContextsOfCluster(clusterId);
>     CloudControllerContext.getInstance().persist();
> }
>
> As Imesh suggested, it would be better to scan the complete code base to
> find the places we need to fix.
>
> Thanks.
>
> On Tue, Sep 29, 2015 at 8:08 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Yes indeed! As I found, the problem is in the CloudControllerUtil class:
>>
>> public static void persistTopology(Topology topology) {
>>     try {
>>         RegistryManager.getInstance().persist(
>>                 CloudControllerConstants.TOPOLOGY_RESOURCE, topology);
>>     } catch (RegistryException e) {
>>         String msg = "Failed to persist the Topology in registry. ";
>>         log.fatal(msg, e);
>>     }
>> }
>>
>> We might need to scan the entire codebase for such occurrences.
>>
>> Thanks
>>
>> On Tue, Sep 29, 2015 at 4:26 PM, Reka Thirunavukkarasu <r...@wso2.com>
>> wrote:
>>
>>> +1 for handling these dependent operations atomically in order to avoid
>>> inconsistency. It is a good thought.
>>>
>>> It would be better to take the same approach for the application model
>>> as well, since it has to update the monitor hierarchy and the
>>> application model at the same time.
>>>
>>> Thanks,
>>> Reka
>>>
>>> On Tue, Sep 29, 2015 at 1:57 AM, Akila Ravihansa Perera <
>>> raviha...@wso2.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is to bring to your attention a possible inconsistency that could
>>>> arise due to rare edge cases. For example, if you follow the piece of
>>>> code at [1], handleMemberTerminated can return without successfully
>>>> removing a member from the topology, but the next step, which removes
>>>> the member context from the CC's context, still gets executed, leading
>>>> to inconsistencies. This could result in a permanently inconsistent
>>>> state from which the system will never recover.
>>>>
>>>> I'm proposing the $subject to resolve this issue. I think simply
>>>> throwing an exception when an operation is not successful, and handling
>>>> those exceptions gracefully, should fix the problem, rather than
>>>> silently returning from the method call. The above is only one example;
>>>> we may have to find such occurrences throughout the CC component, since
>>>> that is the only module which writes/updates the topology.
>>>>
>>>> However, we need to check whether the same problem can occur for the
>>>> Application model maintained by AS and the Tenant model maintained by SM.
>>>> @Reka: Any thoughts?
>>>>
>>>> I've created the JIRA [2] to track this issue.
>>>>
>>>> [1]
>>>> https://github.com/ravihansa3000/stratos/blob/stratos-4.1.x/components/org.apache.stratos.cloud.controller/src/main/java/org/apache/stratos/cloud/controller/services/impl/CloudControllerServiceUtil.java#L57
>>>> [2] https://issues.apache.org/jira/browse/STRATOS-1578
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Akila Ravihansa Perera
>>>> WSO2 Inc.; http://wso2.com/
>>>>
>>>> Blog: http://ravihansa3000.blogspot.com
>>>
>>> --
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.: http://wso2.com,
>>> Mobile: +94776442007
>>
>> --
>> Imesh Gunaratne
>>
>> Senior Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>
> --
> Dinithi De Silva
> Associate Software Engineer, WSO2 Inc.
> m: +94716667655 | e: dinit...@wso2.com | w: www.wso2.com
> | a: #20, Palm Grove, Colombo 03

--
Imesh Gunaratne
Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
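A minimal sketch of the approach discussed in this thread, written against simplified stand-ins rather than the real Stratos classes. ClusterState, Registry, TopologyUpdateException and the "/topology" path are hypothetical names; the actual RegistryManager, CloudControllerContext and TopologyBuilder APIs are not modelled, and member contexts are left out. It shows two things: a persistTopology-style method that propagates the registry failure instead of only logging it at FATAL, and an onClusterRemoval-style method that fails loudly when the cluster is unknown and, if persisting fails, restores the in-memory state it already changed. That restore step is a compensating rollback chosen for illustration; the thread itself only proposes throwing exceptions and treating dependent updates as atomic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception: signals that a topology/context update did not complete.
class TopologyUpdateException extends Exception {
    TopologyUpdateException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for the registry; not the real RegistryManager API.
interface Registry {
    void persist(String resourcePath, Object value) throws Exception;
}

// Hypothetical, simplified stand-in for the topology plus the CC cluster contexts.
class ClusterState {
    private static final String TOPOLOGY_RESOURCE = "/topology"; // assumed path

    final Map<String, String> topologyClusters = new HashMap<>();
    final Map<String, String> clusterContexts = new HashMap<>();
    private final Registry registry;

    ClusterState(Registry registry) {
        this.registry = registry;
    }

    // Unlike a log-and-continue version, the failure is propagated to the caller.
    void persistTopology() throws TopologyUpdateException {
        try {
            registry.persist(TOPOLOGY_RESOURCE, new HashMap<>(topologyClusters));
        } catch (Exception e) {
            throw new TopologyUpdateException("Failed to persist the topology in registry", e);
        }
    }

    // Removes a cluster from the topology and from the cluster contexts together:
    // either both in-memory changes are kept and persisted, or both are rolled back.
    void onClusterRemoval(String clusterId) throws TopologyUpdateException {
        String topologyEntry = topologyClusters.get(clusterId);
        String context = clusterContexts.get(clusterId);
        if (topologyEntry == null) {
            // Fail loudly instead of silently returning and leaving the context behind.
            throw new TopologyUpdateException("Cluster not found in topology: " + clusterId, null);
        }
        topologyClusters.remove(clusterId);
        clusterContexts.remove(clusterId);
        try {
            persistTopology();
        } catch (TopologyUpdateException e) {
            // Compensate: put back what was removed so memory matches the persisted state.
            topologyClusters.put(clusterId, topologyEntry);
            if (context != null) {
                clusterContexts.put(clusterId, context);
            }
            throw e;
        }
    }
}
```

With a registry stub such as `(path, value) -> { throw new Exception("registry down"); }`, onClusterRemoval throws TopologyUpdateException and both maps still hold the cluster, so the caller can retry or report the error instead of ending up with a context that no longer matches the topology. A real fix would more likely snapshot or lock the whole Topology and CloudControllerContext, but the ordering (mutate, persist, roll back and rethrow on failure) is the point of the sketch.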