I guess this breaks into two items:
- detecting a failed puppet run when triggered via script/external apply
- how many times to retry

For the former, you could try "--detailed-exitcodes", which should force a non-zero exit code on error; your script could detect that and act accordingly. Note that with this option an exit code of 2 means changes were applied without failures, while 4 and 6 indicate failures, so the script should not treat every non-zero code as an error. I remember seeing a bug a while back mentioning that you needed to pass that parameter on apply to force puppet to return non-zero on error. Not sure if it still exists, or what version you are running, but it is probably safe to try.

As for the number of retries, every app/service could be different. The only specific point of view I would offer is this: given that the puppet apply has all the data/attributes it needs to converge successfully, after two failed attempts you can safely assume it has failed, and then resort to a log check to see what the issue could be.

One other aspect to consider is that the puppet converge could succeed but something outside it causes a failure right after. Depending on the resiliency you need, you would want your process/other monitor to assert after a successful run and then restart the whole converge run, or just notify, etc.

Does that help?

-----Original Message-----
From: Konstantin Boudnik [mailto:[email protected]]
Sent: Wednesday, December 10, 2014 4:08 PM
To: [email protected]
Cc: [email protected]; Nate D'Amico; Rich
Subject: Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Rob,

following on our IRC chat I will Cc here two guys from the community who know Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can chime in on the topic - it'd be great!

To reiterate it: you are looking for a way to automatically tell if a recipe has failed and repeat it, if required, right?

On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
> Thanks Cos,
>
> This would be something that I would want to automate as it would be
> running many times across many different clusters. Ideally I would fix
> any issues causing the puppet scripts to not complete properly, but I
> don't know how realistic that is in the short term so I would like to
> setup retry logic if that is the recommended way of doing things.
> That's why I was hoping for some direction on how often to run the retry.
>
> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <[email protected]> wrote:
>
> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> >> Thanks Roman,
> >>
> >> I actually fixed the problem. I had an existing process monitoring
> >>the daemon and restarting it if it terminated. However, puppet
> >>encapsulates this so it is no longer needed. Also, this process was
> >>causing the namenode service to terminate once. I removed my
> >>existing monitoring process and everything is working fine.
> >>
> >> That being said is there a recommended number of times we should
> >>retry the puppet scripts on failure?
> >
> >Good to see you're coming through! As for the retries: if something
> >doesn't work I usually check the logs immediately. Sometimes after a
> >second re-run.
> >
> >Cos
> >
>
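
For reference, here is a minimal sketch of the detect-and-retry idea from the reply at the top of this thread, written in Python. It is an illustration under assumptions, not a tested recipe: the names run_puppet and converge, the 30-second delay, and the manifest path are hypothetical, and the exit-code meanings (0 = no changes, 2 = changes applied, 4 = failures, 6 = changes and failures) are taken from the puppet documentation for --detailed-exitcodes, so please verify them against the version you are running.

```python
import subprocess
import time

# Exit codes treated as success, assuming the documented meaning of
# `puppet apply --detailed-exitcodes`:
#   0 = no changes, 2 = changes applied,
#   4 = failures,   6 = changes and failures.
# Anything else (including 1) is treated as a failed run here.
SUCCESS_CODES = {0, 2}


def run_puppet(manifest):
    """Run `puppet apply` once and return its exit code."""
    return subprocess.call(
        ["puppet", "apply", "--detailed-exitcodes", manifest]
    )


def converge(manifest, max_attempts=2, delay_seconds=30,
             run=run_puppet, sleep=time.sleep):
    """Try to converge, giving up after max_attempts failed runs.

    Returns (succeeded, exit_codes). The `run` and `sleep` arguments
    exist only so the logic can be exercised without puppet installed.
    """
    exit_codes = []
    for attempt in range(1, max_attempts + 1):
        code = run(manifest)
        exit_codes.append(code)
        if code in SUCCESS_CODES:
            return True, exit_codes
        if attempt < max_attempts:
            # Hypothetical pause before retrying; tune or remove as needed.
            sleep(delay_seconds)
    # Out of attempts: time to go and read the logs.
    return False, exit_codes
```

A wrapper script could call converge("/path/to/site.pp") (the path is a placeholder) and notify or exit non-zero when the first element of the result is False. The default of two attempts simply mirrors the "after two failed attempts, assume failed" suggestion above; it is a parameter so that each service can pick its own limit.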
