Hi Michael,

From what I have experienced, let me try to answer the questions:
>>> is this still the case, or is that propose change to master for some known
>>> old problem that was meanwhile fixed in controller CDS infra?

This is one of the objectives of moving to the tell-based protocol. More context here: https://bugs.opendaylight.org/show_bug.cgi?id=5280, which also contains the Gerrit topic link for the related changes. The tell-based protocol is disabled by default; see [1].

>>> does it seem right to you that application code handles this? Like wouldn't
>>> it be better if there was some configuration knob somewhere in controller
>>> CDS to increase whatever timeout or retry counter is behind when these
>>> TransactionCommitFailedException caused by akka.pattern.AskTimeoutException
>>> occur, to tune it to try harder/longer, and not throw any
>>> TransactionCommitFailed?

There are two situations that typically produce AskTimeouts: one from total idleness of the transaction itself, the other from the overall transaction commit timeout:

a) operation-timeout-in-seconds (default 5): this is very sporadic and almost never seen in the latest releases.

b) shard-transaction-commit-timeout-in-seconds (default 30): this is relatively more frequent in many scale cases, particularly in HA scenarios such as restarts with configurations.

Both of these parameters are part of [1].

[1] $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg

Regards,
Muthu

From: [email protected] [mailto:[email protected]] On Behalf Of Michael Vorburger
Sent: Thursday, November 09, 2017 1:50 AM
To: controller-dev
Cc: [email protected]; Kency Kurian
Subject: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?
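[Editor's note: for illustration, the two timeout parameters Muthu names would be tuned in the cfg file referenced as [1]. A sketch of that section, using the default values stated above; the exact key syntax in a given release should be checked against the shipped file:]

```properties
# $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
# (sketch; values shown are the defaults mentioned in this thread)

# Timeout for individual transaction operations; AskTimeouts from
# this are sporadic and rarely seen in recent releases.
operation-timeout-in-seconds=5

# Overall shard transaction commit timeout; AskTimeouts from this
# are more frequent at scale, e.g. HA restarts with configurations.
shard-transaction-commit-timeout-in-seconds=30
```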
Tom and other controllerians,

While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently (quote) "in scale testing, there are too many writes and reads over the network, and sometimes these AskTimeout exceptions occur due to the load, it is just that for sometime we are not able to reach the other side, but the nodes are all healthy, and it comes back soon", and wanted to know:

1. is this still the case, or is that proposed change to master for some known old problem that was meanwhile fixed in controller CDS infra?

2. does it seem right to you that application code handles this? Wouldn't it be better if there was some configuration knob somewhere in controller CDS to increase whatever timeout or retry counter is behind these TransactionCommitFailedExceptions caused by akka.pattern.AskTimeoutException, to tune it to try harder/longer and not throw any TransactionCommitFailedException at all?

3. when these do occur, is there really a "scenario where even though the transaction throws a TransactionCommitFailedException (caused by akka.pattern.AskTimeoutException) it eventually succeeds"? That's what c/61526 proposes to add to the DataBrokerFailures test utility, to test such logic in application code... In DataBrokerFailuresImpl, it simulates a submit() that actually did go through and changed the DS (line 95: super.submit().get()) but then returns immediateFailedCheckedFuture(submitException) anyway. Is that really what could happen in real life in production with CDS under this scenario? That seems... weird; curious, so transactions are not really (always) transactionally to be trusted? ;)

Tx,
M.
--
Michael Vorburger, Red Hat
[email protected]<mailto:[email protected]> | IRC: vorburger @freenode | ~ = http://vorburger.ch<http://vorburger.ch/>
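[Editor's note: the application-side retry pattern debated in this thread can be sketched roughly as below. This is an illustration, not ODL API: `SubmitRetry`, `isAskTimeout`, and `RETRIES` are hypothetical names, and the `Callable` stands in for a real transaction submit. Note the caveat from question 3: because the write may actually have committed despite the exception, such a retry is only safe if the write is idempotent.]

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of retrying a transaction submit a bounded number
// of times when the failure's cause chain looks like an (Ask)Timeout.
public class SubmitRetry {
    static final int RETRIES = 3; // illustrative retry budget

    // Walk the cause chain looking for a timeout-style failure. Matching
    // akka.pattern.AskTimeoutException by simple name avoids a compile-time
    // dependency on Akka in this self-contained sketch.
    static boolean isAskTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof TimeoutException
                    || c.getClass().getSimpleName().equals("AskTimeoutException")) {
                return true;
            }
        }
        return false;
    }

    // Retry only on timeouts; any other failure propagates immediately.
    // Caution: a timed-out submit may still have committed, so the retried
    // write must be idempotent for this to be safe.
    static <T> T submitWithRetry(Callable<T> submit) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= RETRIES; attempt++) {
            try {
                return submit.call();
            } catch (Exception e) {
                if (!isAskTimeout(e)) {
                    throw e; // not a timeout: fail fast
                }
                last = e; // timeout: remote side may come back, try again
            }
        }
        throw last; // retry budget exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulate a submit that times out twice, then succeeds.
        String result = submitWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException(new TimeoutException("ask timed out"));
            }
            return "committed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: committed after 3 attempts
    }
}
```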
_______________________________________________ controller-dev mailing list [email protected] https://lists.opendaylight.org/mailman/listinfo/controller-dev
