On Tue, Aug 9, 2016 at 3:31 PM William Reade <william.re...@canonical.com> wrote:
> I feel obliged to note that we also have axw's operation queue, used in
> storageprovisioner, and that it's the only one which doesn't make the
> assumption that the code being retried is the only important thing that
> could possibly be happening in the calling context.
>
> All the other approaches discussed will leave us blocking 99 good
> machines behind a single failure -- and while that's harmless-ish in the
> current provisioner, because it has a strategy of waiting no more than
> 30s in total, it's going to bite us hard as soon as we switch to a
> strategy that actually backs off and keeps retrying long enough to be
> useful.
>
> Andrew: as I recall, that code is actually pretty general. Is there a
> strong reason it's tucked away in an /internal/ package?

Nope. I did propose creating a new repo a while back, but it fell by the
wayside: https://github.com/axw/juju-time. It should be a natural fit for
the machine provisioner.

Cheers,
Andrew

> On Tue, Aug 9, 2016 at 9:02 AM, roger peppe <roger.pe...@canonical.com>
> wrote:
>
>> On 9 August 2016 at 07:28, roger peppe <roger.pe...@canonical.com> wrote:
>> > On 9 August 2016 at 01:22, Katherine Cox-Buday
>> > <katherine.cox-bu...@canonical.com> wrote:
>> >> Hey All,
>> >>
>> >> We currently have three ways we're performing retries in Juju:
>> >>
>> >> 1. Archaic, custom, intra-package retry code.
>> >> 2. github.com/juju/utils::{Countdown,BackoffTimer}
>> >> 3. github.com/juju/retry (current best practice)
>> >
>> > There's a fourth way that you haven't mentioned, which fits
>> > somewhere in between 1 and 2, I think (it was the first
>> > explicit general retry code in Juju, AFAIK): utils.Attempt.
>> >
>> > I'd suggest that the way it's used matches the pattern
>> > you'd like to write quite well:
>> >
>> >     for attempt := strategy.State(); attempt.Next(); {
>> >         // Do something
>> >     }
>>
>> Oops; s/State/Start/ of course.
>
>> > AFAIR this pattern was arrived at after some discussion between
>> > myself and Gustavo. At the time I was thinking of some kind of
>> > callback-based scheme like the other schemes you mention, but
>> > I think that leaving the caller in control and the loop explicit
>> > is actually nicer - the logic is clear to the reader and more
>> > composable with other primitives (it is also a good match
>> > for the pattern you propose).
>> >
>> > How about making AttemptStrategy a little more flexible?
>> > It was always my intention to admit other kinds of strategy - if
>> > a BackoffFunc field were added to AttemptStrategy, and
>> > a Stop argument added to AttemptStrategy.Start, it would
>> > become as general as retry.Call, while keeping the nice
>> > usage pattern and general straightforwardness.
>> >
>> > cheers,
>> > rog.
>> >
>> >> I have just touched some code which fits #1, and when I was
>> >> attempting to retrofit it to #3, I found it a little obtuse.
>> >> Here's why:
>> >>
>> >> In my case (and I assume most), I wanted something like this:
>> >>
>> >>     for range retries { // Loop should break if our parent is cancelled
>> >>         // Do something
>> >>         // Success, retry, or fatal error
>> >>     }
>> >>
>> >> To implement this, I do something like this:
>> >>
>> >>     args := retry.CallArgs{
>> >>         Func: func() error {
>> >>             // Body of my loop
>> >>         },
>> >>         BackoffFunc: func(delay time.Duration, attempt int) time.Duration {
>> >>             if attempt == 1 {
>> >>                 return delay
>> >>             }
>> >>             return delay * factor
>> >>         },
>> >>         Stop: dying,
>> >>         // etc...
>> >>     }
>> >>
>> >>     Call(args)
>> >>
>> >> We attempt to encapsulate every scenario for retries in
>> >> github.com/juju/retry::CallArgs. We have variables to determine if
>> >> errors are fatal, to notify things, how to back off, etc. This
>> >> feels very heavy-weight to me, and a bit like the monolith model
>> >> rather than the unix model. I would prefer the logic be left to
>> >> the caller (i.e.
>> >> do I break out of this loop? do I notify something?), and the
>> >> interface into a retry strategy be much simpler - say, a channel,
>> >> as in time.Tick(time.Duration) <-chan time.Time.
>> >>
>> >> How would folks feel about something like this?
>> >>
>> >>     func BackoffTick(done <-chan struct{}, clock clock.Clock,
>> >>             delay time.Duration, factor int64) <-chan time.Time {
>> >>         signalStream := make(chan time.Time)
>> >>         go func() {
>> >>             defer close(signalStream)
>> >>             for {
>> >>                 select {
>> >>                 case <-done:
>> >>                     return
>> >>                 case t := <-clock.After(delay):
>> >>                     select {
>> >>                     case <-done:
>> >>                         return
>> >>                     case signalStream <- t:
>> >>                         delay = time.Duration(delay.Nanoseconds() * factor)
>> >>                     }
>> >>                 }
>> >>             }
>> >>         }()
>> >>         return signalStream
>> >>     }
>> >>
>> >> With this, the above becomes:
>> >>
>> >>     for range BackoffTick(dying, myClock, 1*time.Second, 2) {
>> >>         // Determination of fatality and notifications happen here
>> >>     }
>> >>
>> >> If you want a max-retry concept, you do this:
>> >>
>> >>     for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
>> >>         // Determination of fatality and notifications happen here
>> >>     }
>> >>
>> >> If you want a max duration, you do this:
>> >>
>> >>     done = Timeout(done, 5*time.Minute)
>> >>     for range Take(done, 3, BackoffTick(dying, myClock, 1*time.Second, 2)) {
>> >>         // Determination of fatality and notifications happen here
>> >>     }
>> >>
>> >> Functionally, what's in juju/retry is fine, and I can stuff
>> >> anything I want into the function. It just feels a little odd to
>> >> use in that I must put the body of my loop in a function, and I
>> >> dislike that the CallArgs struct attempts to encapsulate a bunch
>> >> of different scenarios.
>> >>
>> >> Thoughts?
>> >
>> > I'm not keen on the channel-based approach because if you want to
>> > terminate early, you need to shut down the goroutine - it's just
>> > one more resource to clean up. Also, the goroutine that's sending
>> > the events is one step removed from what's going on in the loop - the
>> > delay is calculated one step before the current iteration is
>> > complete.
>> >
>> > cheers,
>> > rog.
>> >
>> >> Also, is it on the roadmap to address the inconsistent retries
>> >> throughout Juju? I almost used utils.BackoffTimer until I started
>> >> looking deeper. I'm going to submit a PR to flag it as deprecated.
>> >>
>> >> --
>> >> Katherine
--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev