Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

Michael Vorburger Thu, 16 Nov 2017 10:50:27 -0800

On Thu, Nov 9, 2017 at 7:51 AM, Muthukumaran K <[email protected]>
wrote:


> Hi Michael,
>
>
>
> From what I have experienced, let me try to answer the questions
>
>
>
> >>> is this still the case, or is that proposed change to master for some
> known old problem that was meanwhile fixed in controller CDS infra?
>
> This is one of the objectives of moving to the tell-based protocol. More
> context is in https://bugs.opendaylight.org/show_bug.cgi?id=5280, which
> also contains the gerrit topic link for the related changes. The tell-based
> protocol is disabled by default in [1].
>

Muthu, thank you for replying with these details here - really learnt
something here!

Tom, in your reply you pointed us to
https://git.opendaylight.org/gerrit/#/c/61002/. If this tell-based thing
works (does it?), it seems useful - so should it be enabled by default
now?

Robert, https://git.opendaylight.org/gerrit/#/c/61002/ was from you; were
you planning to pick this up for Oxygen?

FYI, in https://git.opendaylight.org/gerrit/#/c/61526/ the DataBrokerFailures
testutil is being extended to be able to test application code which handles
this scenario.
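
To make concrete what "application code which handles this scenario" could
look like, here is a minimal sketch of a retry loop that only retries when
the commit failure was caused by an ask timeout. Everything in it is an
assumption for illustration: FakeCommitFailedException and
FakeAskTimeoutException are hypothetical stand-ins for
TransactionCommitFailedException and akka.pattern.AskTimeoutException (so
that it runs without any ODL or Akka dependency), and retryOnAskTimeout is
not an existing controller or genius API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

final class RetryOnAskTimeoutSketch {

    /** Hypothetical stand-in for akka.pattern.AskTimeoutException. */
    static class FakeAskTimeoutException extends RuntimeException {
        FakeAskTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for TransactionCommitFailedException. */
    static class FakeCommitFailedException extends Exception {
        FakeCommitFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Walks the cause chain looking for the (stand-in) ask timeout. */
    static boolean causedByAskTimeout(Throwable failure) {
        for (Throwable c = failure.getCause(); c != null; c = c.getCause()) {
            if (c instanceof FakeAskTimeoutException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the commit up to maxAttempts times. Only commit failures caused
     * by an ask timeout are retried; anything else is rethrown immediately.
     */
    static <T> T retryOnAskTimeout(Callable<T> commit, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return commit.call();
            } catch (FakeCommitFailedException e) {
                if (!causedByAskTimeout(e)) {
                    throw e;
                }
                last = e; // transient by assumption, so try again
            }
        }
        throw last; // every attempt hit an ask timeout
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated commit: the first two attempts time out, the third succeeds.
        String result = retryOnAskTimeout(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new FakeCommitFailedException("commit failed",
                        new FakeAskTimeoutException("ask timed out"));
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls.get() + " attempts");
        // prints "committed after 3 attempts"
    }
}
```

Note that if a commit can be reported as failed and still go through (my
question 3 below), a loop like this is only safe when the retried write is
idempotent.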


> >>> does it seem right to you that application code handles this? Like
> wouldn't it be better if there was some configuration knob somewhere in
> controller CDS to increase whatever timeout or retry counter is behind
> when these TransactionCommitFailedException caused by
> akka.pattern.AskTimeoutException occur, to tune it to try harder/longer,
> and not throw any TransactionCommitFailed?
>
> There are two AskTimeout situations that typically predominate: one for
> total idleness of the transaction itself, and the other for the overall
> transaction timeout.
>
> a)      operation-timeout-in-seconds (default 5): this is very sporadic
> and almost never seen in the latest releases
>
> b)      shard-transaction-commit-timeout-in-seconds (default 30): this
> is relatively more frequent in many cases at scale, particularly in HA
> scenarios like restarts with configurations
>
> Both these parameters are part of [1]
>
>
>
> [1] $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
>
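
For anyone wanting to experiment with these two knobs, an override in [1]
might look roughly like the fragment below. The property names and defaults
are the ones Muthu lists above; the raised values are purely illustrative
assumptions, not tested recommendations.

```properties
# Sketch only: the defaults per this thread are 5 and 30 seconds.
# The values below are arbitrary examples of "try longer".
operation-timeout-in-seconds=10
shard-transaction-commit-timeout-in-seconds=60
```
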
>
>
> Regards
>
> Muthu
>
>
>
>
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Michael
> Vorburger
> *Sent:* Thursday, November 09, 2017 1:50 AM
> *To:* controller-dev
> *Cc:* [email protected]; Kency Kurian
> *Subject:* [controller-dev] Should application code persist do retries on
> TransactionCommitFailedException caused by AskTimeoutException or could
> CDS be configured to retry more?
>
>
>
> Tom and other controllerians,
>
>
>
> While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for
> https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently
> (quote) "in scale testing, there are too many writes and reads over the
> network, and sometimes these AskTimeout exceptions occur due to the load,
> it is just that for sometime we are not able to reach the other side, but
> the nodes are all healthy, and it comes back soon", and wanted to know:
>
>
>
> 1. is this still the case, or is that proposed change to master for some
> known old problem that was meanwhile fixed in controller CDS infra?
>
>
>
> 2. does it seem right to you that application code handles this? Like
> wouldn't it be better if there was some configuration knob somewhere in
> controller CDS to increase whatever timeout or retry counter is behind when
> these TransactionCommitFailedException caused by
> akka.pattern.AskTimeoutException occur, to tune it to try harder/longer,
> and not throw any TransactionCommitFailed?
>
>
>
> 3. when these do occur, is there really a "scenario where even though the
> transaction throws a TransactionCommitFailedException (caused by
> akka.pattern.AskTimeoutException) it eventually succeeds" ? That's what
> in c/61526 is being proposed to be added to the DataBrokerFailures test
> utility, to test such logic in application code... in
> DataBrokerFailuresImpl, it simulates a submit() that actually did go
> through and changed the DS (line 95 super.submit().get()) but then returns
> immediateFailedCheckedFuture(submitException) anyway. Is that really what
> (under this scenario) could happen IRL at prod from CDS? That seems...
> weird, curious - so its transactions are not really (always)
> transactionally to be trusted? ;)
>
>
>
> Tx,
>
> M.
>
> --
>
> Michael Vorburger, Red Hat
> [email protected] | IRC: vorburger @freenode | ~ = http://vorburger.ch
>

_______________________________________________
controller-dev mailing list
[email protected]
https://lists.opendaylight.org/mailman/listinfo/controller-dev
