Hi Michael,

Based on my experience, let me try to answer the questions:

>>> is this still the case, or is that a proposed change to master for some known 
>>> old problem that was meanwhile fixed in the controller CDS infra?
This is one of the objectives of moving to the tell-based protocol. More context 
is here: https://bugs.opendaylight.org/show_bug.cgi?id=5280, which also contains 
the Gerrit topic link for the related changes.
The tell-based protocol is disabled by default; see [1].


>>> does it seem right to you that application code handles this? Like wouldn't 
>>> it be better if there was some configuration knob somewhere in controller 
>>> CDS to increase whatever timeout or retry counter is behind when these 
>>> TransactionCommitFailedException caused by akka.pattern.AskTimeoutException 
>>> occur, to tune it to try harder/longer, and not throw any 
>>> TransactionCommitFailed?
There are two predominant situations in which AskTimeouts occur: one for total 
idleness of the transaction itself, and the other for the overall transaction 
timeout:

a)      operation-timeout-in-seconds – default 5. This is very sporadic and 
almost never seen in the latest releases.

b)      shard-transaction-commit-timeout-in-seconds – default 30. This is 
relatively more frequent at scale, particularly in HA scenarios such as 
restarts with existing configuration.
Both of these parameters are part of [1].
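For context, the application-level retry being discussed can be sketched in a
library-agnostic way. This is a hypothetical helper, not the actual c/61526
code: it retries an operation a bounded number of times, with a simple backoff,
only when the failure is caused by a timeout-like exception somewhere in the
cause chain (standing in for akka.pattern.AskTimeoutException), on the
assumption that such failures are transient and the cluster recovers.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the retry-on-transient-timeout pattern discussed
// in this thread; names and backoff policy are illustrative only.
public class RetryOnTimeout {

    // Walk the cause chain looking for a timeout-like exception,
    // mirroring how an AskTimeoutException is typically wrapped inside
    // a TransactionCommitFailedException.
    static boolean isCausedByTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    // Retry the operation up to maxRetries extra times, but only for
    // timeout-caused failures; anything else is rethrown immediately.
    static <T> T submitWithRetries(Callable<T> op, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isCausedByTimeout(e)) {
                    throw e; // non-transient failure: do not retry
                }
                last = e;
                Thread.sleep(100L * (attempt + 1)); // simple linear backoff
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a commit that times out twice and then succeeds.
        int[] calls = {0};
        String result = submitWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("ask timed out",
                        new TimeoutException("simulated AskTimeout"));
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Whether retries belong in application code or in a CDS knob is exactly the
open question of this thread; the sketch only shows the shape of the
application-side option.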

[1] $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
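For reference, an illustrative fragment of what the relevant entries in that
file look like, using the property names and default values mentioned above
(the rest of the file is omitted):

```properties
# $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
# Illustrative fragment; values shown are the defaults discussed above.
operation-timeout-in-seconds=5
shard-transaction-commit-timeout-in-seconds=30
```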

Regards
Muthu



From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael 
Vorburger
Sent: Thursday, November 09, 2017 1:50 AM
To: controller-dev
Cc: [email protected]; Kency Kurian
Subject: [controller-dev] Should application code do retries on 
TransactionCommitFailedException caused by AskTimeoutException, or could CDS be 
configured to retry more?

Tom and other controllerians,

While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for 
https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently 
(quote) "in scale testing, there are too many writes and reads over the 
network, and sometimes these AskTimeout exceptions occur due to the load; it is 
just that for some time we are not able to reach the other side, but the nodes 
are all healthy, and it comes back soon", and wanted to know:

1. is this still the case, or is that a proposed change to master for some known 
old problem that was meanwhile fixed in the controller CDS infra?

2. does it seem right to you that application code handles this? Like wouldn't 
it be better if there was some configuration knob somewhere in controller CDS 
to increase whatever timeout or retry counter is behind these 
TransactionCommitFailedExceptions caused by akka.pattern.AskTimeoutException, 
to tune it to try harder/longer, and not throw any 
TransactionCommitFailedException?

3. when these do occur, is there really a "scenario where even though the 
transaction throws a TransactionCommitFailedException (caused by 
akka.pattern.AskTimeoutException) it eventually succeeds"? That's what c/61526 
proposes to add to the DataBrokerFailures test utility, to test such logic in 
application code... in DataBrokerFailuresImpl, it simulates a submit() that 
actually did go through and changed the DS (line 95, super.submit().get()) but 
then returns immediateFailedCheckedFuture(submitException) anyway. Is that 
really what could happen IRL in production from CDS (under this scenario)? 
That seems... weird, curious - so its transactions are not really (always) to 
be trusted transactionally? ;)

Tx,
M.
--
Michael Vorburger, Red Hat
[email protected] | IRC: vorburger @freenode | ~ = http://vorburger.ch
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev
