Rob, Puppet itself does not provide this type of capability, but
like most other config management solutions it can be used to install
and configure packages that do. So if a service has a configuration
topology that handles some high availability mode, Puppet can configure
this. Similarly, as an example, if you wanted to use a process manager
solution like monit, you can write or leverage Puppet modules that
configure it to manage and monitor the daemons you want to protect
better.
A general way to describe what most configuration management systems
do with respect to high availability is that they are not involved
in a loop of detecting errors and events and responding with
configuration changes, although some systems are starting to tackle
things like "configuration triggers", where configuration changes can
be triggered based on detected events. My view, however, is that in
most cases it is much better to do this using the underlying service's
own mechanism, process management solutions, or other infrastructure
focused on high availability, if available.
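
On the retry question quoted further down the thread, a wrapper script
around puppet apply could look roughly like the sketch below. It is only
an illustration, not something shipped with Bigtop: the function names,
the 30 second delay and the manifest path are made up, and the limit of
two attempts simply follows Nate's suggestion. It relies on the
documented behaviour of --detailed-exitcodes, where 0 means no changes,
2 means changes were applied, and 1, 4 or 6 indicate some kind of
failure.

```python
import subprocess
import time

# Exit codes documented for `puppet apply --detailed-exitcodes`:
# 0 = nothing to change, 2 = changes applied successfully.
# 1, 4 and 6 all indicate that something failed.
SUCCESS_CODES = {0, 2}


def run_puppet_apply(manifest):
    """Run puppet once and return its exit code.

    `manifest` is a path to a manifest file (hypothetical in this sketch).
    """
    return subprocess.call(
        ["puppet", "apply", "--detailed-exitcodes", manifest]
    )


def converge(manifest, max_attempts=2, delay_seconds=30,
             runner=run_puppet_apply, sleep=time.sleep):
    """Retry the run until it succeeds or max_attempts is reached.

    Returns (succeeded, exit_codes) so the caller can log every attempt.
    `runner` and `sleep` are injectable so the logic can be tested
    without puppet installed.
    """
    codes = []
    for attempt in range(1, max_attempts + 1):
        code = runner(manifest)
        codes.append(code)
        if code in SUCCESS_CODES:
            return True, codes
        if attempt < max_attempts:
            sleep(delay_seconds)  # give transient problems time to clear
    return False, codes
```

Calling converge("site.pp") would then return a flag plus the list of
exit codes seen, and a False result is the point at which to go and read
the logs rather than retry again.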
-Rich

----- Original Message -----
From: "Leidle Rob" 
To:"[email protected]" , "Konstantin Boudnik" ,
"[email protected]" 
Cc:"Rich" 
Sent:Thu, 11 Dec 2014 17:37:41 +0000
Subject:Re: Problem using puppet scripts to configure bigtop on
AmazonLinux

 Thanks Nate, this is exactly what I was looking for. One more question:
 does puppet have any mechanism for monitoring service daemons and
 restarting them in the case where they have a catastrophic
 failure/crash? How do others in the Bigtop world deal with high
 availability and ensuring that processes are restarted when they
 inappropriately terminate? Does anyone have this kind of need?

 On 12/11/14, 12:26 AM, "Nate D'Amico"  wrote:

 >Guess breaking into two items:
 >
 >-detecting a failed puppet run when triggered via script/external apply
 >-how many times to retry
 >
 >For the former, you could try to use "--detailed-exitcodes", which should
 >force a non-zero exit code; your script could detect that and act
 >accordingly. I remember seeing a bug a while back that mentioned you
 >needed to assert that param on apply to force puppet to return non-zero
 >on error. Not sure if it still exists, or what version you are running,
 >but it is probably safe to try.
 >
 >As far as number of retries, all apps/services/etc. could be different.
 >The only specific point of view I would offer is that, given the puppet
 >apply has all data/attributes it needs to successfully converge, after
 >two failed attempts you can safely assume it failed, and then resort to
 >a log check to see what the issue could be.
 >
 >One other aspect to consider is that the puppet converge could succeed
 >but something outside causes a failure right after. Depending on
 >resiliency, you would want your process/other monitor to assert after a
 >successful run and restart the whole converge run again, or just
 >notify, etc.
 >
 >Does that help?
 >
 >
 >-----Original Message-----
 >From: Konstantin Boudnik [mailto:[email protected]] 
 >Sent: Wednesday, December 10, 2014 4:08 PM
 >To: [email protected]
 >Cc: [email protected]; Nate D'Amico; Rich
 >Subject: Re: Problem using puppet scripts to configure bigtop on 
 >AmazonLinux
 >
 >Rob,
 >
 >following on our IRC chat I will Cc here two guys from the community who
 >know Puppet the best. Nate and Rich are likely to have the answer. Guys,
 >if you can chime in on the topic - it'd be great!
 >
 >To reiterate it: you are looking for a way to automatically tell if a
 >recipe has failed and repeat it, if required, right?
 >
 >On Sun, Nov 30, 2014 at 09:50PM, Leidle, Rob wrote:
 >> Thanks Cos,
 >> 
 >> This would be something that I would want to automate as it would be
 >> running many times across many different clusters. Ideally I would fix
 >> any issues causing the puppet scripts to not complete properly, but I
 >> don't know how realistic that is in the short term so I would like to
 >> set up retry logic if that is the recommended way of doing things.
 >> That's why I was hoping for some direction on how often to run the
 >> retry.
 >> 
 >> On 11/29/14, 5:12 PM, "Konstantin Boudnik"  wrote:
 >> 
 >> >On Sun, Nov 30, 2014 at 12:50AM, Leidle, Rob wrote:
 >> >> Thanks Roman,
 >> >> 
 >> >> I actually fixed the problem. I had an existing process monitoring
 >> >>the daemon and restarting it if it terminated. However, puppet
 >> >>encapsulates this so it is no longer needed. Also, this process was
 >> >>causing the namenode service to terminate once. I removed my
 >> >>existing monitoring process and everything is working fine.
 >> >>
 >> >> That being said is there a recommended number of times we should
 >> >>retry the puppet scripts on failure?
 >> >
 >> >Good to see you're coming through! As for the retries: if something
 >> >doesn't work I usually check the logs immediately. Sometimes after a
 >> >second re-run.
 >> >
 >> >Cos
 >> >
 >> 
 >
