I'm not near a computer at the moment. Feel free to revert but please don't leave it at a place where it doesn't work for long running apps. Thanks On Feb 4, 2016 4:47 PM, "sebb" <seb...@gmail.com> wrote:
> On 4 February 2016 at 20:42, Sam Ruby <ru...@intertwingly.net> wrote: > > On Thu, Feb 4, 2016 at 3:15 PM, sebb <seb...@gmail.com> wrote: > >> On 4 February 2016 at 19:45, Sam Ruby <ru...@intertwingly.net> wrote: > >>> On Thu, Feb 4, 2016 at 12:55 PM, sebb <seb...@gmail.com> wrote: > >>>> On 4 February 2016 at 15:31, Sam Ruby <ru...@apache.org> wrote: > >>>>> Commit 60f24d0cab030cf6d64ce75c23d05b1ac33bdda7: > >>>>> if all hosts are down, give up > >>>>> > >>>>> Branch: refs/heads/master > >>>>> Author: Sam Ruby <ru...@intertwingly.net> > >>>>> Committer: Sam Ruby <ru...@intertwingly.net> > >>>>> Pusher: rubys <ru...@apache.org> > >>>>> > >>>>> ------------------------------------------------------------ > >>>>> lib/whimsy/asf/ldap.rb | ++ -- > >>>>> ------------------------------------------------------------ > >>>>> 4 changes: 2 additions, 2 deletions. > >>>>> ------------------------------------------------------------ > >>>>> > >>>>> > >>>>> diff --git a/lib/whimsy/asf/ldap.rb b/lib/whimsy/asf/ldap.rb > >>>>> index 0376c64..f73646b 100644 > >>>>> --- a/lib/whimsy/asf/ldap.rb > >>>>> +++ b/lib/whimsy/asf/ldap.rb > >>>>> @@ -73,7 +73,7 @@ def self.puppet_ldapservers > >>>>> > >>>>> # connect to LDAP > >>>>> def self.connect > >>>>> - loop do > >>>>> + host.length.times do > >>>> > >>>> -1 > >>>> > >>>> The original code works fine. > >>>> > >>>> The new loop tries to loop some 37 times (depending on the size of the > >>>> host name) and then fails with > >>>> > >>>> iteration reached an end (StopIteration) > >>>> > >>>> I think the original code works rather better. > >>> > >>> Can you explain how this works for me? > >>> > >>> What I see is a "loop do...end" with the only exit being on a success, > >>> and the code at the end of this method being unreachable. > >> > >> loop/end works in conjunction with next_host which is an enum. > >> enum.next generates StopIteration which is specifically caught by > loop/end. > >> > >> http://ruby-doc.org/core-1.9.3/StopIteration.html > > > > That feels a bit too tightly coupled to me. Having one method's > > implementation depend on another method's implementation and all that. > > Seems reasonable to me - it offers plenty of flexibility in designing > loops as the getter can be anywhere in the loop. > Otherwise the getter has to be part of the condition checking at the > start or end. > > > But be that as it may, I missed the fact that once you run out of your > > enumeration, it is game over. That doesn't work well with long > > running applications, like the board agenda. 37 failures and it is > > time to restart the server. > > In fact it's about 7 failures (or however many are in the host list) > > The loop maximum was 37 (approx) but it will only try each host once. > > > That doesn't work for me. > > > >>> I will say that whether there are 4 hosts or 50 hosts, trying exactly > >>> 37 times (depending on the length of an unrelated string) seems kinda > >>> weird. > >> > >> It does it 37 or so times because host holds the host URL, and that is > >> roughly its length. > > > > That seems totally random. > > > I could understand trying each host once. > > Which is what it does. > > > Or trying each host twice. Or trying something like [hosts.length, > > 3].min times. But trying a number of times based on a url length? > > See above, 37 approx is the *max* loop count. I think you > misunderstood me earlier. > > >> The host is the wrong variable to use. > > Hence the peculiar loop count. > > >> But so is the hosts array wrong, because that will always return the > >> original number of hosts, even if some have already been used up and > >> we are resuming the loop. > >> The loop will then fall off the end of the enum, generating a > >> StopIteration, which is not caught by the x.length.times do loop. > > > > I would rather next_host continuously provide hosts. > > OK, but this is a different design. > > My intention was to ensure that hosts that have already failed are not > tried again. > This makes sense for all the apps I have worked on. > > However I agree that does not sit well with long-running apps. > > > When it runs out > > of a batch, it starts a new set. I was too quick in scanning the code > > and presumed that's what the @he ||= was doing. > > > > Perhaps something like the following instead: > > > > @he = hosts.dup unless @he and not @he.empty? > > @he.pop > > > > The net effect is that it would try each host in turn. If there later > > is a failure, it will start over and loop back around if necessary > > until it finds a host that works. > > > > Thoughts? > > The host re-use behaviour should be selectable. > > There's no point looping round all the hosts again for the one-shot > apps that are going to be run again soon. > If all the hosts have failed, the likelihood is that they will do so > again if tried again immediately. > So just give up and hope the fault has cleared for the next run. > > For long-running apps, the situation is a bit different. > It's fine if each host is tried multiple times, so long as that > happens over a long period. > However there needs to be a guard against repeatedly trying all the > hosts in rapid succession. > > So I don't think simply resetting the list will do; there needs to be > a time delay. > Nor does it make sense to always pause before restarting. > Perhaps it would work to store the time when the list was last > started, and add a wait if necessary before restarting. > This would stop endless rapid recycling, but would still not be ideal > behaviour e.g. if the external network was down. > > Is there a stage when even the long-running apps need to give up? > And if so, how is that detected? > > Note: there was no auto retry at all until recently, so what do the > long-running apps do currently? > Do they try to recover from LDAP queries? > > If so, it seems more likely that the current design (StopIteration > generated and not caught by connect) is less likely to play well with > any recovery. > > So whatever the case, I think the change needs to be reverted. > > We can then improve the host re-use strategy for the long-running apps. > > > - Sam Ruby > > > >>> And failing with a StopIteration is less desirable than producing a > >>> clean message, such as "Failed to connect to any LDAP host". I can > >>> see the advantage of raising an exception over returning nil, but I > >>> would prefer a more meaningful message. > >> > >> It only *fails* with StopIteration with the updated code. > >> That is because the loop count is too large, so next_host is called > >> more times than there are hosts. > >> > >> With the previous code, next_host is also called one extra time, but > >> the generated StopIteration is caught by the loop/do, as designed. > >> > >> http://ruby-doc.org/core-1.9.3/StopIteration.html > >> > >> Try defining single failing URL in your .whimsy file and you will see > >> what I mean. > >> > >>> - Sam Ruby > >>> > >>> > >>>>> host = next_host > >>>>> Wunderbar.info "Connecting to LDAP server: #{host}" > >>>>> > >>>>> @@ -97,8 +97,8 @@ def self.connect > >>>>> Wunderbar.warn "Error connecting to LDAP server #{host}: > " + > >>>>> re.message + " (continuing)" > >>>>> end > >>>>> - > >>>>> end > >>>>> + > >>>>> Wunderbar.error "Failed to connect to any LDAP host" > >>>>> return nil > >>>>> end >