Tom and other controllerians,

While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for
https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently
(quote) "in scale testing, there are too many writes and reads over the
network, and sometimes these AskTimeout exceptions occur due to the load,
it is just that for sometime we are not able to reach the other side, but
the nodes are all healthy, and it comes back soon", and wanted to know:

1. is this still the case, or is that proposed change to master addressing
some known old problem that was meanwhile fixed in controller CDS infra?

2. does it seem right to you that application code handles this? Wouldn't
it be better if there were some configuration knob somewhere in controller
CDS to increase whatever timeout or retry counter is behind these
TransactionCommitFailedExceptions (caused by
akka.pattern.AskTimeoutException), to tune it to try harder/longer and not
throw any TransactionCommitFailedException at all?

3. when these do occur, is there really a "scenario where even though the
transaction throws a TransactionCommitFailedException (caused by
akka.pattern.AskTimeoutException) it eventually succeeds"? That's what
c/61526 proposes to add to the DataBrokerFailures test utility, to test
such logic in application code... in DataBrokerFailuresImpl, it simulates
a submit() that actually did go through and changed the DS (line 95,
super.submit().get()) but then returns
immediateFailedCheckedFuture(submitException) anyway. Is that really what
(under this scenario) could happen IRL in prod with CDS? That seems
weird and curious - so its transactions are not really (always) to be
trusted transactionally? ;)
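In other words, as I read it, the simulation boils down to the shape below:
the write really lands, but the caller is told it failed. Class and method
names here are simplified stand-ins for the real DataBrokerFailuresImpl, and
a CompletableFuture stands in for the Guava CheckedFuture:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class FailingSubmitSimulation {

    // Stand-in "datastore": records whether the write actually landed.
    static final AtomicBoolean dataStoreChanged = new AtomicBoolean(false);

    // Stand-in for akka.pattern.AskTimeoutException (not the real class)
    static class AskTimeoutException extends Exception {
    }

    // Simulates the c/61526 behaviour: perform the real commit, but report
    // failure to the caller anyway - the write succeeded, the reply was lost.
    static CompletableFuture<Void> submit() {
        dataStoreChanged.set(true); // the commit really went through
        CompletableFuture<Void> future = new CompletableFuture<>();
        future.completeExceptionally(new AskTimeoutException()); // caller sees a timeout
        return future;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> future = submit();
        System.out.println("future failed: " + future.isCompletedExceptionally());
        System.out.println("datastore changed: " + dataStoreChanged.get());
    }
}
```

If that is faithful to what CDS can really do, then application retry logic
has to be idempotent, because a "failed" submit may already have been applied.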

Tx,
M.
--
Michael Vorburger, Red Hat
[email protected] | IRC: vorburger @freenode | ~ = http://vorburger.ch
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev