[controller-dev] cluster - recovery from dual failure

Alfasi, Shlomi Sun, 22 Jan 2017 05:53:50 -0800

Hi All,

I configured a clustered setup with 3 nodes (the akka.conf of one of the 
nodes is attached).
At one point one of the cluster members was down, and then I restarted 
another node.
On the restarted node I see that it fails to read information from the 
datastore and repeatedly throws exceptions [1].
On the node that was always up, a log message appears every 10 seconds 
implying that the restarted node cannot join [2].


What is the expected behavior in this case? Is this state recoverable?

Shlomi

[1]
WARN  | ult-dispatcher-2 | DataStoreAppConfigMetadata       | 153 - org.opendaylight.controller.blueprint - 0.5.2.SNAPSHOT | org.opendaylight.netvirt.elanmanager-impl (elanConfig): Read of app config org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.elan.config.rev150710.ElanConfig failed - retrying
ReadFailedException{message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan:config?revision=2015-07-10)elan-config, errorList=[RpcError [message=Error executeRead ReadData for path /(urn:opendaylight:netvirt:elan:config?revision=2015-07-10)elan-config, severity=ERROR, errorType=APPLICATION, tag=operation-failed, applicationTag=null, info=null, cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException: Shard member-3-shard-default-config currently has no leader. Try again later.]]}

[2]
2017-01-22 15:19:56,290 | INFO  | lt-dispatcher-22 | kka://opendaylight-cluster-data) | 159 - com.typesafe.akka.slf4j - 2.4.7 | Cluster Node [akka.tcp://opendaylight-cluster-data@10.0.77.33:2550] - New incarnation of existing member [Member(address = akka.tcp://opendaylight-cluster-data@10.0.97.128:2550, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
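
A note on the "currently has no leader" message in [1]: the following is a 
minimal sketch, assuming that electing a shard leader requires a strict 
majority of the voting members (Raft-style). The function names are 
hypothetical and are not part of any OpenDaylight or Akka API.

```python
def majority(voting_members: int) -> int:
    """Smallest member count that forms a strict majority (assumed rule)."""
    return voting_members // 2 + 1


def can_elect_leader(voting_members: int, available: int) -> bool:
    """True if enough members are available to elect a shard leader."""
    return available >= majority(voting_members)


if __name__ == "__main__":
    total = 3  # three-node cluster, as in the setup described above
    for up in range(total + 1):
        possible = can_elect_leader(total, up)
        print(f"{up} of {total} members available -> leader possible: {possible}")
```

Under that assumption, while one member is down and the restarted member has 
not yet rejoined, only 1 of 3 members is available, which is below the 
majority of 2. That would be consistent with the error in [1], but it is not 
confirmed by the logs alone.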

Attachment: akka.conf

_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
