I'd break this into two items:

- detecting a failed puppet run when triggered via script/external apply
- how many times to retry

For the former, you could try "--detailed-exitcodes", which makes puppet apply 
report failures through its exit code (4 or 6 when resources fail, 1 when the 
run itself fails), so your script can detect that and act accordingly. Note 
that 2 means changes were applied successfully, so don't treat every non-zero 
code as a failure. I remember seeing a bug a while back saying you needed to 
pass that parameter on apply to get a non-zero exit on error. I'm not sure it 
still exists, or what version you are running, but it is probably safe to try.
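
A rough sketch of how a wrapper script could read those exit codes, in Python. 
The function names and the manifest argument are made up for illustration; 
the only puppet-specific piece is the documented meaning of the codes 
(0 = no changes, 1 = run failed, 2 = changes applied, 4 = failures, 
6 = changes and failures).

```python
import subprocess


def run_failed(exit_code):
    """Return True if a --detailed-exitcodes status signals a failed run."""
    # 0 (nothing to do) and 2 (changes applied) are the only success codes;
    # anything else, including unexpected values, is treated as a failure.
    return exit_code not in (0, 2)


def puppet_apply(manifest):
    """Run puppet apply on `manifest` (hypothetical path), return its exit code."""
    result = subprocess.run(
        ["puppet", "apply", "--detailed-exitcodes", manifest]
    )
    return result.returncode
```

The script would then call run_failed(puppet_apply(...)) and branch on the 
result.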

As for the number of retries, every app/service could be different. The only 
specific view I'd offer: provided the puppet apply has all the 
data/attributes it needs to converge successfully, after two failed attempts 
you can safely assume it has failed, and then check the logs to see what the 
issue could be.
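
The retry part could look something like this in Python. It is only a sketch: 
run_once stands for whatever your script uses to start the apply, and it is 
assumed to return a --detailed-exitcodes style status where 0 and 2 mean 
success. The default of two attempts is just the rule of thumb above, not a 
puppet setting.

```python
import time


def converge_with_retries(run_once, max_attempts=2, delay_seconds=0):
    """Call run_once() until it succeeds or max_attempts is used up.

    run_once is any callable returning a --detailed-exitcodes style status.
    Returns (succeeded, exit_codes_seen) so the caller can log every attempt.
    """
    seen = []
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        seen.append(code)
        if code in (0, 2):  # 0 = no changes, 2 = changes applied
            return True, seen
        # Optionally pause before the next attempt, but not after the last one.
        if attempt < max_attempts and delay_seconds:
            time.sleep(delay_seconds)
    return False, seen
```

When it returns False, that is the point to stop and go read the logs rather 
than keep looping.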

One other aspect to consider is that the puppet converge could succeed but 
something outside it causes a failure right after. Depending on the resiliency 
you need, you would want your process or another monitor to check the node 
after a successful run and then restart the whole converge run, or just 
notify, and so on.
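
As a sketch of that last idea in Python: converge, is_healthy and notify are 
hypothetical callables you would supply yourself (for example a wrapper around 
the apply, a check that the daemon is still up, and whatever alerting you 
use). Nothing here is puppet API.

```python
def converge_and_verify(converge, is_healthy, notify, max_rounds=2):
    """Rerun converge() while the post-run health check keeps failing.

    converge() and is_healthy() return True on success; notify(message)
    is called once for every round that does not end healthy.
    Returns True once a round both converges and passes the check.
    """
    for round_number in range(1, max_rounds + 1):
        # is_healthy() is only consulted when the converge itself succeeded.
        if converge() and is_healthy():
            return True
        notify("round %d of %d did not leave the node healthy"
               % (round_number, max_rounds))
    return False
```

Whether a failed check should trigger a full rerun or only a notification 
depends on the service, so max_rounds=1 with just the notify call is an 
equally valid setup.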

Does that help?


-----Original Message-----
From: Konstantin Boudnik [mailto:[email protected]] 
Sent: Wednesday, December 10, 2014 4:08 PM
To: [email protected]
Cc: [email protected]; Nate D'Amico; Rich
Subject: Re: Problem using puppet scripts to configure bigtop on AmazonLinux

Rob,

following on our IRC chat I will Cc here two guys from the community who know 
Puppet the best. Nate and Rich are likely to have the answer. Guys, if you can 
chime in on the topic - it'd be great!

To reiterate: you are looking for a way to automatically tell if a recipe has 
failed and repeat it, if required, right?

On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
> Thanks Cos,
> 
> This would be something that I would want to automate as it would be 
> running many times across many different clusters. Ideally I would fix 
> any issues causing the puppet scripts to not complete properly, but I 
> don't know how realistic that is in the short term so I would like to 
> setup retry logic if that is the recommended way of doing things. 
> That's why I was hoping for some direction on how often to run the retry.
> 
> On 11/29/14, 5:12 PM, "Konstantin Boudnik" <[email protected]> wrote:
> 
> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
> >> Thanks Roman,
> >> 
> >> I actually fixed the problem. I had an existing process monitoring 
> >>the  daemon and restarting it if it terminated. However, puppet 
> >>encapsulates this  so it is no longer needed. Also, this process was 
> >>causing the namenode  service to terminate once. I removed my 
> >>existing monitoring process and  everything is working fine.
> >> 
> >> That being said is there a recommended number of times we should 
> >>retry the  puppet scripts on failure?
> >
> >Good to see you're coming through! As for the retries: if something 
> >doesn't work I usually check the logs immediately. Sometimes after a 
> >second re-run.
> >
> >Cos
> >
> 
