On Thu, Nov 9, 2017 at 7:51 AM, Muthukumaran K <[email protected]> wrote:
> Hi Michael,
>
> From what I have experienced, let me try to answer the questions
>
> >>> is this still the case, or is that proposed change to master for some
> known old problem that was meanwhile fixed in controller CDS infra?
>
> This is one of the objectives of moving to the tell-based protocol. More
> context here: https://bugs.opendaylight.org/show_bug.cgi?id=5280 which
> also contains the gerrit topic link related to the changes. This is
> disabled by default @ [1]
>

Muthu, thank you for replying with these details here - really learnt
something here!

Tom, in your reply you pointed us to
https://git.opendaylight.org/gerrit/#/c/61002/. If this tell-based thing
works (does it?), this seems useful - so then shall it be enabled by
default, now?

Robert, https://git.opendaylight.org/gerrit/#/c/61002/ was from you, were
you planning to pick this up, for Oxygen?

FYI in https://git.opendaylight.org/gerrit/#/c/61526/ DataBrokerFailures
testutil is being extended to be able to test application code which
handles this scenario.

> >>> does it seem right to you that application code handles this? Like
> wouldn't it be better if there was some configuration knob somewhere in
> controller CDS to increase whatever timeout or retry counter is behind
> when these TransactionCommitFailedException caused by
> akka.pattern.AskTimeoutException
> occur, to tune it to try harder/longer, and not throw any
> TransactionCommitFailed?
>
> There are two situations of AskTimeouts which are typically predominant:
> one for total idleness of the transaction itself and the other for overall
> transaction timeout
>
> a) operation-timeout-in-seconds – default 5. This is very sporadic
> and almost never seen in the latest releases
>
> b) shard-transaction-commit-timeout-in-seconds – default 30. This
> is relatively more frequent in many cases of scale, particularly in HA
> scenarios like restarts with configurations
>
> Both these parameters are part of [1]
>
> [1] $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
>
> Regards
> Muthu
>
> *From:* [email protected] [mailto:[email protected]]
> *On Behalf Of* Michael Vorburger
> *Sent:* Thursday, November 09, 2017 1:50 AM
> *To:* controller-dev
> *Cc:* [email protected]; Kency Kurian
> *Subject:* [controller-dev] Should application code persist do retries on
> TransactionCommitFailedException caused by AskTimeoutException or could
> CDS be configured to retry more?
>
> Tom and other controllerians,
>
> While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for
> https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently
> (quote) "in scale testing, there are too many writes and reads over the
> network, and sometimes these AskTimeout exceptions occur due to the load,
> it is just that for sometime we are not able to reach the other side, but
> the nodes are all healthy, and it comes back soon", and wanted to know:
>
> 1. is this still the case, or is that proposed change to master for some
> known old problem that was meanwhile fixed in controller CDS infra?
>
> 2. does it seem right to you that application code handles this? Like
> wouldn't it be better if there was some configuration knob somewhere in
> controller CDS to increase whatever timeout or retry counter is behind when
> these TransactionCommitFailedException caused by
> akka.pattern.AskTimeoutException
> occur, to tune it to try harder/longer, and not throw any
> TransactionCommitFailed?
>
> 3. when these do occur, is there really a "scenario where even though the
> transaction throws a TransactionCommitFailedException (caused by
> akka.pattern.AskTimeoutException) it eventually succeeds"? That's what
> in c/61526 is being proposed to be added to the DataBrokerFailures test
> utility, to test such logic in application code... in
> DataBrokerFailuresImpl, it simulates a submit() that actually did go
> through and changed the DS (line 95 super.submit().get()) but then returns
> immediateFailedCheckedFuture(submitException) anyway. Is that really what
> (under this scenario) could happen IRL at prod from CDS? That seems...
> weird, curious - so its transactions are not really (always)
> transactionally to be trusted? ;)
>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> [email protected] | IRC: vorburger @freenode | ~ = http://vorburger.ch
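
To make question 2 concrete, here is a minimal sketch of what an application-side retry might look like. It is not taken from c/61526 or from any OpenDaylight API: `CommitRetry`, `commitWithRetry`, `causedByAskTimeout` and the two nested exception classes are hypothetical stand-ins, defined locally so the example runs without OpenDaylight or Akka on the classpath. The retry count and backoff are arbitrary assumptions.

```java
import java.util.concurrent.Callable;

/** Sketch only: retries a commit when the failure was caused by an ask timeout. */
final class CommitRetry {

    /** Hypothetical stand-in for akka.pattern.AskTimeoutException. */
    static class AskTimeoutException extends RuntimeException {
        AskTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for TransactionCommitFailedException. */
    static class TransactionCommitFailedException extends Exception {
        TransactionCommitFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Walks the cause chain looking for an AskTimeoutException. */
    static boolean causedByAskTimeout(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof AskTimeoutException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Calls commit up to maxAttempts times. Only failures caused by an ask
     * timeout are retried; any other failure is rethrown immediately.
     */
    static <T> T commitWithRetry(Callable<T> commit, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return commit.call();
            } catch (TransactionCommitFailedException e) {
                if (!causedByAskTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Linear backoff before the next attempt (assumed policy).
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated commit: times out twice, then succeeds.
        String result = commitWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransactionCommitFailedException(
                        "commit failed", new AskTimeoutException("ask timed out"));
            }
            return "committed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In real code the callable would presumably have to build and submit a fresh transaction on each attempt. If the scenario from question 3 is real (the commit reports a failure but was in fact applied), the retried write also needs to be idempotent, otherwise the retry could apply the change twice.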
_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev
