Correct, Ajay. We have experimented a bit with increasing the timeout and have
not seen a recurrence since.
We considered the following options:
1) Recreate the affected shard
a. We tried this using a simulated shard kill: on observing the shard kill, the
Shard Manager recreates the shard, and it did work. However, we are staying
away from this solution mainly because of its DTCN side effects. We do not know
how many applications would tolerate receiving DTCNs suddenly and unexpectedly
(similar to a node restart). A silent shard restart implies that applications
must be thoroughly idempotent in handling DTCNs across shards, so that
recreating a single shard does not disturb whatever state they have built up
internally from DTCNs.
2) Restart the entire controller
a. A more intrusive option would be to stop bundle 0 (the OSGi system bundle)
in onPersistFailure and let the environment's restart logic (systemd,
Pacemaker, etc.) restart the node, since the system is useless anyway once a
shard stops completely. We are trying to get this working correctly but have
been unsuccessful so far.
Regards
Muthu
From: Ajay L [mailto:[email protected]]
Sent: Wednesday, November 15, 2017 1:46 AM
To: [email protected]
Cc: Muthukumaran K; Srini Seetharaman; Robert Varga; [email protected]; Sai
MarapaReddy
Subject: Re: [controller-dev] Circuit Breaker timed out
Hi All,
We are also seeing the "circuit breaker" error under heavy load. When this
happens, the affected shard is stopped and never restarted, and I think the only
way to recover is to restart the node. I have opened
https://jira.opendaylight.org/browse/CONTROLLER-1789 to request better recovery
behavior. Increasing the Akka journal persistence circuit-breaker call-timeout
value (default is 10s) does help make it more tolerant of outages.
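As a rough illustration of the call-timeout change mentioned above, the override
might look like this. It is a sketch that assumes the LevelDB journal plugin and
the stock controller akka.conf layout with its odl-cluster-data top-level
section. In Akka 2.4 each journal plugin inherits its circuit-breaker settings
(max-failures, call-timeout, reset-timeout) from
akka.persistence.journal-plugin-fallback, so only call-timeout needs to be
overridden. The 30s value is purely illustrative, not a tested recommendation.

```hocon
odl-cluster-data {
  akka {
    persistence {
      journal {
        leveldb {
          # Overrides the value inherited from
          # akka.persistence.journal-plugin-fallback (Akka default: 10s).
          # 30s is an illustrative value only.
          circuit-breaker {
            call-timeout = 30s
          }
        }
      }
    }
  }
}
```

A longer call-timeout only makes the breaker more tolerant of slow journal
writes. If a write still blocks past the new limit, the shard stops as before, so
this mitigates rather than fixes the recovery problem tracked in CONTROLLER-1789.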
Regards
Ajay
On Wed, Aug 16, 2017 at 2:23 AM, Robert Varga <[email protected]> wrote:
On 16/08/17 08:37, Muthukumaran K wrote:
> We have not tried the master branch (Nitrogen / Akka 2.5). We are not sure
> whether such an issue would go away with Akka 2.5, because the circuit
> breaker is primarily associated with the LevelDB plugin.
>
Nitrogen is on akka-2.4.18. akka-2.5.x (and others) are staged for Oxygen.
Bye,
Robert
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev