Hi Akila,

Thank you for bringing this up. +1 for this, as the current behaviour can
lead to an unrecoverable state where we cannot even find out what went wrong.

I found another place, in the CloudControllerServiceImpl class, that we need
to fix:

private void onClusterRemoval(final String clusterId) {
    ClusterContext ctxt =
            CloudControllerContext.getInstance().getClusterContext(clusterId);
    TopologyBuilder.handleClusterRemoved(ctxt);
    CloudControllerContext.getInstance().removeClusterContext(clusterId);
    CloudControllerContext.getInstance().removeMemberContextsOfCluster(clusterId);
    CloudControllerContext.getInstance().persist();
}

As Imesh suggested, it would be better to scan the complete code base to
find the places we need to fix.
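
To illustrate the kind of fix I have in mind, here is a rough,
self-contained sketch. It is not the actual Stratos code: FakeTopology,
FakeContext and TopologyUpdateException are hypothetical stand-ins for
TopologyBuilder, CloudControllerContext and whatever exception type we
agree on. The only point is that the topology update throws on failure, so
the context is never touched when the topology could not be updated.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical checked exception; not an existing Stratos class.
class TopologyUpdateException extends Exception {
    TopologyUpdateException(String message) {
        super(message);
    }
}

// Stand-in for the topology; can be told to fail to simulate a registry error.
class FakeTopology {
    final Set<String> clusters = new HashSet<>();
    boolean failNextUpdate = false;

    void removeCluster(String clusterId) throws TopologyUpdateException {
        if (failNextUpdate) {
            // Report the failure to the caller instead of logging and returning.
            throw new TopologyUpdateException(
                    "Could not remove cluster from topology: " + clusterId);
        }
        clusters.remove(clusterId);
    }
}

// Stand-in for the cloud controller context.
class FakeContext {
    final Set<String> clusterContexts = new HashSet<>();

    void removeClusterContext(String clusterId) {
        clusterContexts.remove(clusterId);
    }
}

class AtomicRemovalSketch {
    final FakeTopology topology = new FakeTopology();
    final FakeContext context = new FakeContext();

    void addCluster(String clusterId) {
        topology.clusters.add(clusterId);
        context.clusterContexts.add(clusterId);
    }

    // The context is modified only after the topology update has succeeded.
    void onClusterRemoval(String clusterId) throws TopologyUpdateException {
        topology.removeCluster(clusterId); // throws on failure
        context.removeClusterContext(clusterId);
    }

    public static void main(String[] args) {
        AtomicRemovalSketch sketch = new AtomicRemovalSketch();
        sketch.addCluster("cluster-1");
        sketch.topology.failNextUpdate = true; // simulate a registry failure
        try {
            sketch.onClusterRemoval("cluster-1");
        } catch (TopologyUpdateException e) {
            System.out.println("Removal aborted: " + e.getMessage());
        }
        System.out.println("Context still present: "
                + sketch.context.clusterContexts.contains("cluster-1"));
    }
}
```

With this shape the caller decides whether to retry or report the error,
and the topology and the context cannot drift apart silently.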

Thanks.

On Tue, Sep 29, 2015 at 8:08 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Yes indeed! From what I found, the problem is in the CloudControllerUtil
> class:
>
> public static void persistTopology(Topology topology) {
>     try {
>         RegistryManager.getInstance().persist(
>                 CloudControllerConstants.TOPOLOGY_RESOURCE, topology);
>     } catch (RegistryException e) {
>         String msg = "Failed to persist the Topology in registry. ";
>         log.fatal(msg, e);
>     }
> }
>
> We might need to scan the entire codebase for such occurrences.
>
> Thanks
>
>
> On Tue, Sep 29, 2015 at 4:26 PM, Reka Thirunavukkarasu <r...@wso2.com>
> wrote:
>
>> +1 for handling these dependent operations atomically in order to avoid
>> inconsistency. It is a good thought.
>>
>> It would be better to take the same approach for the application model
>> too, as it has to update the monitor hierarchy and the application model
>> at the same time.
>>
>> Thanks,
>> Reka
>>
>> On Tue, Sep 29, 2015 at 1:57 AM, Akila Ravihansa Perera <
>> raviha...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> This is to bring to your attention a possible inconsistency that could
>>> arise in rare edge cases. For example, if you follow the piece of code at
>>> [1], handleMemberTerminated can return without successfully removing a
>>> member from the topology, but the next step, which removes the member
>>> context from CC's context, will still be executed. This leaves the
>>> topology and the context inconsistent, possibly permanently, and the
>>> system will never recover.
>>>
>>> I'm proposing the $subject to resolve this issue. I think simply
>>> throwing an exception if an operation is not successful and handling those
>>> exceptions gracefully should fix the problem rather than silently returning
>>> from the method call. Above is only one example and we may have to find
>>> such occurrences throughout the CC component since that is the only module
>>> which writes/updates the topology.
>>>
>>> However, we need to check whether the same problem can occur for the
>>> Application model maintained by AS and the Tenant model maintained by SM.
>>> @Reka: Any thoughts?
>>>
>>> I've created the JIRA [2] to track this issue.
>>>
>>> [1]
>>> https://github.com/ravihansa3000/stratos/blob/stratos-4.1.x/components/org.apache.stratos.cloud.controller/src/main/java/org/apache/stratos/cloud/controller/services/impl/CloudControllerServiceUtil.java#L57
>>> [2] https://issues.apache.org/jira/browse/STRATOS-1578
>>>
>>> Thanks.
>>>
>>> --
>>> Akila Ravihansa Perera
>>> WSO2 Inc.;  http://wso2.com/
>>>
>>> Blog: http://ravihansa3000.blogspot.com
>>>
>>
>>
>>
>> --
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>> Mobile: +94776442007
>>
>>
>>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
*Dinithi De Silva*
Associate Software Engineer, WSO2 Inc.
m:+94716667655 | e:dinit...@wso2.com | w: www.wso2.com
| a: #20, Palm Grove, Colombo 03
