Re: [controller-dev] cluster - recovery from dual failure

Tom Pantelis Sun, 22 Jan 2017 06:30:03 -0800

This is a side effect of how Akka clustering works. Before a restarted node
can rejoin, all unreachable nodes must either become reachable again or have
their status changed to 'Down', either manually or by auto-downing. You can
enable auto-downing, but Akka doesn't recommend it in production (
http://doc.akka.io/docs/akka/current/java/cluster-usage.html).
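
For illustration only, a minimal sketch of how auto-downing might be enabled
in akka.conf. It assumes Akka 2.4.x and the usual odl-cluster-data wrapper
section; the 30s timeout is an arbitrary example value, not a recommendation,
and the rest of the file is omitted:

```
odl-cluster-data {
  akka {
    cluster {
      # Assumption: illustrative timeout. The Akka default is "off",
      # which means unreachable members are never downed automatically.
      auto-down-unreachable-after = 30s
    }
  }
}
```

With this set, the cluster leader marks a member 'Down' once it has been
unreachable for the configured period. During a network partition each side
can down the other and form its own cluster, which is why Akka advises
against it in production.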


On Sun, Jan 22, 2017 at 8:53 AM, Alfasi, Shlomi <shlomi.alf...@hpe.com>
wrote:

> Hi All,
>
>
>
> I configured a clustered setup with 3 nodes (attached the akka.conf of one
> of the nodes).
>
> At one point one of the cluster members was down, and I then restarted
> another node.
>
> On the restarted node, reads from the datastore fail and it repeatedly
> throws exceptions [1].
>
> On the node that stayed up, a message is logged every 10 seconds implying
> that the restarted node cannot join [2].
>
>
>
> What is the expected behavior in this case? Is this state recoverable?
>
>
>
> Shlomi
>
>
>
> [1]
>
> WARN  | ult-dispatcher-2 | DataStoreAppConfigMetadata       | 153 -
> org.opendaylight.controller.blueprint - 0.5.2.SNAPSHOT |
> org.opendaylight.netvirt.elanmanager-impl (elanConfig): Read of app
> config
> org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.elan.config.rev150710.ElanConfig
> failed - retrying
>
> ReadFailedException{message=Error executeRead ReadData for path
> /(urn:opendaylight:netvirt:elan:config?revision=2015-07-10)elan-config,
> errorList=[RpcError [message=Error executeRead ReadData for path
> /(urn:opendaylight:netvirt:elan:config?revision=2015-07-10)elan-config,
> severity=ERROR, errorType=APPLICATION, tag=operation-failed,
> applicationTag=null, info=null,
> cause=org.opendaylight.controller.md.sal.common.api.data.DataStoreUnavailableException:
> Shard member-3-shard-default-config currently has no leader. Try again later.]]}
>
>
>
> [2]
>
> 2017-01-22 15:19:56,290 | INFO  | lt-dispatcher-22 |
> kka://opendaylight-cluster-data) | 159 - com.typesafe.akka.slf4j - 2.4.7
> | Cluster Node [akka.tcp://opendaylight-cluster-data@10.0.77.33:2550] -
> New incarnation of existing member
> [Member(address = akka.tcp://opendaylight-cluster-data@10.0.97.128:2550,
> status = Down)] is trying to join. Existing will be removed from the
> cluster and then new member will be allowed to join.
>
>
>
> _______________________________________________
> controller-dev mailing list
> controller-dev@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/controller-dev
>
>
