Good find, Dinithi! Maybe you can fix this for 4.1.4.
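
For reference, here is a rough sketch of the kind of fix discussed in this
thread for onClusterRemoval. Everything in it (ClusterRemovalSketch,
TopologyUpdateException, and the simplified Topology and Context classes) is
a hypothetical stand-in, not the actual Stratos code. The idea is only that
the topology update throws on failure instead of returning silently, and the
context is cleaned up only after the topology update has succeeded.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for the Stratos classes (assumption,
// not the real API). Only the ordering and error handling matter here.
class ClusterRemovalSketch {

    /** Hypothetical checked exception signalling a failed topology update. */
    static class TopologyUpdateException extends Exception {
        TopologyUpdateException(String message) {
            super(message);
        }
    }

    /** Simplified topology: just a set of cluster ids. */
    static class Topology {
        final Set<String> clusters = new HashSet<>();

        // Throws instead of logging the problem and returning silently.
        void removeCluster(String clusterId) throws TopologyUpdateException {
            if (!clusters.remove(clusterId)) {
                throw new TopologyUpdateException(
                        "Cluster not found in topology: " + clusterId);
            }
        }
    }

    /** Simplified cloud controller context: cluster id -> context data. */
    static class Context {
        final Map<String, String> clusterContexts = new HashMap<>();
    }

    final Topology topology = new Topology();
    final Context context = new Context();

    // The context entry is removed only if the topology update succeeded;
    // otherwise the exception propagates and the context stays untouched.
    void onClusterRemoval(String clusterId) throws TopologyUpdateException {
        topology.removeCluster(clusterId);
        context.clusterContexts.remove(clusterId);
    }

    public static void main(String[] args) {
        ClusterRemovalSketch sketch = new ClusterRemovalSketch();
        // The context knows the cluster, but the topology does not.
        sketch.context.clusterContexts.put("cluster-1", "ctx");
        try {
            sketch.onClusterRemoval("cluster-1");
        } catch (TopologyUpdateException e) {
            // The caller decides how to recover; the two models still agree.
            System.err.println("Cluster removal aborted: " + e.getMessage());
        }
        System.out.println("Context still present: "
                + sketch.context.clusterContexts.containsKey("cluster-1"));
    }
}
```

Whether a checked or an unchecked exception fits better would depend on how
the callers in CC are structured; the sketch uses a checked one only so that
callers are forced to handle the failure.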

Thanks

On Wed, Sep 30, 2015 at 9:22 PM, Dinithi De Silva <dinit...@wso2.com> wrote:

> Hi Akila,
>
> Thank you for bringing this up. +1 for this, since the problem could lead
> to an unrecoverable state and we cannot even find out where it went wrong.
>
> I found another place, in the CloudControllerServiceImpl class, that we
> need to fix:
>
> private void onClusterRemoval(final String clusterId) {
>     ClusterContext ctxt = CloudControllerContext.getInstance().getClusterContext(clusterId);
>     TopologyBuilder.handleClusterRemoved(ctxt);
>     CloudControllerContext.getInstance().removeClusterContext(clusterId);
>     CloudControllerContext.getInstance().removeMemberContextsOfCluster(clusterId);
>     CloudControllerContext.getInstance().persist();
> }
>
> As Imesh suggested, it would be better to scan the complete code base to
> find the places we need to fix.
>
> Thanks.
>
> On Tue, Sep 29, 2015 at 8:08 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Yes indeed! From what I found, the problem is in the CloudControllerUtil class:
>>
>> public static void persistTopology(Topology topology) {
>>     try {
>>         RegistryManager.getInstance().persist(CloudControllerConstants.TOPOLOGY_RESOURCE, topology);
>>     } catch (RegistryException e) {
>>         String msg = "Failed to persist the Topology in registry. ";
>>         log.fatal(msg, e);
>>     }
>> }
>>
>> We might need to scan the entire codebase for such occurrences.
>>
>> Thanks
>>
>>
>> On Tue, Sep 29, 2015 at 4:26 PM, Reka Thirunavukkarasu <r...@wso2.com>
>> wrote:
>>
>>> +1 for handling these dependent operations atomically in order to avoid
>>> inconsistency. It is a good thought.
>>>
>>> It would be better to take the same approach for the application model
>>> as well, since the monitor hierarchy and the application model have to be
>>> updated at the same time.
>>>
>>> Thanks,
>>> Reka
>>>
>>> On Tue, Sep 29, 2015 at 1:57 AM, Akila Ravihansa Perera <
>>> raviha...@wso2.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is to bring your attention to a possible inconsistency that could
>>>> arise in rare edge cases. For example, in the piece of code at [1],
>>>> handleMemberTerminated can return without successfully removing a member
>>>> from the topology, but the next step, which removes the member context
>>>> from CC's context, still gets executed, thus leading to inconsistencies.
>>>> This could result in a permanently inconsistent state from which the
>>>> system will never recover.
>>>>
>>>> I'm proposing the $subject to resolve this issue. I think that, rather
>>>> than silently returning from the method call, simply throwing an
>>>> exception when an operation is not successful and handling those
>>>> exceptions gracefully should fix the problem. The above is only one
>>>> example, and we may have to find such occurrences throughout the CC
>>>> component, since that is the only module which writes/updates the
>>>> topology.
>>>>
>>>> However, we need to check whether the same problem can occur for the
>>>> Application model maintained by AS and the Tenant model maintained by SM.
>>>> @Reka: Any thoughts?
>>>>
>>>> I've created the JIRA [2] to track this issue.
>>>>
>>>> [1]
>>>> https://github.com/ravihansa3000/stratos/blob/stratos-4.1.x/components/org.apache.stratos.cloud.controller/src/main/java/org/apache/stratos/cloud/controller/services/impl/CloudControllerServiceUtil.java#L57
>>>> [2] https://issues.apache.org/jira/browse/STRATOS-1578
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Akila Ravihansa Perera
>>>> WSO2 Inc.;  http://wso2.com/
>>>>
>>>> Blog: http://ravihansa3000.blogspot.com
>>>>
>>>
>>>
>>>
>>> --
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>> Mobile: +94776442007
>>>
>>>
>>>
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Senior Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> *Dinithi De Silva*
> Associate Software Engineer, WSO2 Inc.
> m:+94716667655 | e:dinit...@wso2.com | w: www.wso2.com
> | a: #20, Palm Grove, Colombo 03
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos
