Tom and other controllerians,

While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently (quote) "in scale testing, there are too many writes and reads over the network, and sometimes these AskTimeout exceptions occur due to the load, it is just that for sometime we are not able to reach the other side, but the nodes are all healthy, and it comes back soon", and wanted to know:
1. Is this still the case, or is that proposed change to master for some known old problem that was meanwhile fixed in the controller CDS infra?

2. Does it seem right to you that application code handles this? Wouldn't it be better if there were a configuration knob somewhere in controller CDS to increase whatever timeout or retry counter is behind these TransactionCommitFailedException caused by akka.pattern.AskTimeoutException, so it could be tuned to try harder/longer and not throw a TransactionCommitFailedException at all?

3. When these do occur, is there really a "scenario where even though the transaction throws a TransactionCommitFailedException (caused by akka.pattern.AskTimeoutException) it eventually succeeds"? That's what c/61526 proposes adding to the DataBrokerFailures test utility, to test such logic in application code... in DataBrokerFailuresImpl, it simulates a submit() that actually did go through and changed the DS (line 95, super.submit().get()) but then returns immediateFailedCheckedFuture(submitException) anyway. Is that really what (under this scenario) could happen IRL in prod from CDS? That seems... weird, curious - so transactions are not really (always) transactionally to be trusted? ;)

Tx,
M.
--
Michael Vorburger, Red Hat
[email protected] | IRC: vorburger @freenode | ~ = http://vorburger.ch
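P.S. To make the "succeeds but reports failure" behaviour from (3) concrete, here is a minimal self-contained sketch of that simulation pattern. This is plain JDK code with entirely hypothetical class names (FakeDataStore, FailingSubmitTransaction), NOT the actual ODL/DataBrokerFailuresImpl code; it only mirrors the shape of "apply the write for real, then hand the caller a failed future anyway":

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a datastore; names are illustrative, not ODL APIs.
class FakeDataStore {
    final Map<String, String> data = new HashMap<>();
}

class FailingSubmitTransaction {
    private final FakeDataStore store;
    private final Map<String, String> pendingWrites = new HashMap<>();

    FailingSubmitTransaction(FakeDataStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        pendingWrites.put(key, value);
    }

    // Mirrors the described simulation: the write really lands in the store
    // (like super.submit().get() in DataBrokerFailuresImpl), but the caller
    // still sees a failed future (like immediateFailedCheckedFuture(...)).
    CompletableFuture<Void> submit() {
        store.data.putAll(pendingWrites); // the commit actually happens...
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(
            new RuntimeException("simulated AskTimeoutException"));
        return failed; // ...yet a failure is reported to the application
    }
}
```

Application retry logic tested against such a broker has to cope with exactly this: re-submitting after a reported failure may re-apply a write that already took effect, so the retried write should be idempotent.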
_______________________________________________ controller-dev mailing list [email protected] https://lists.opendaylight.org/mailman/listinfo/controller-dev
