Re: [Nagios-users] nohup and check_nrpe and timeout

2008-12-05 Thread Thomas Guyot-Sionnest
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 05/12/08 01:36 PM, David Shapiro wrote:
> 
> Okay, I tried to compile setsid.c, but it wanted nls.h.  I tried
> to get nls.h, and it wanted types.h, etc.  I searched around for
> a version for Solaris, which is what I am using, but I had no luck.
> I

Yep... it's part of util-linux... I'm sure there's a way on Solaris
though; unfortunately I don't have time to try right now.

> ended up abandoning this idea and tried check_by_ssh, and this
> worked fine, so I am good for now, I think.  The one thing I am not
> certain about is the comment on the -f option, which disassociates
> the process from a tty: it says something like "if ssh is
> successful it always returns a success".  Is it trying to say that it
> will ignore my exit codes if I use the -f option?  That will not
> work for me if that is the case.

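Yes, it will. With -f, ssh returns as soon as the remote command has
been started, and you can't get the return code of a process that is
still running anyway. You could possibly try check_by_ssh the normal way
(without -f) to see if it behaves any better in your case...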

Although it seems to me that you're trying to do two separate
things... check a service and restart it if it's down?

The proper way to do it is to have a normal check and trigger an event
handler to restart the service (or do anything else). Note though that
you must fork in your event handler before doing anything, because nagios
will wait until the event handler finishes executing (the fork option of
check_by_ssh could work most of the time, but if the host does not
answer to ssh it will hang anyway).
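
To illustrate the "fork first" point, here is a minimal sketch of such an
event handler in Python. It is only an assumption of how one might write it,
not something from the nagios distribution: should_restart, run_in_background
and main are hypothetical names, and it assumes the command definition passes
$SERVICESTATE$ and $SERVICESTATETYPE$ as the first two arguments, followed by
the restart command. The handler double-forks, so it returns to nagios right
away while the restart carries on in its own session.

```python
import os
import sys


def should_restart(state, state_type):
    """Only act once the problem is confirmed (a HARD CRITICAL state)."""
    return state == "CRITICAL" and state_type == "HARD"


def run_in_background(argv):
    """Double-fork so the caller returns without waiting for argv to finish."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)  # the first child exits at once; reap it
        return
    try:
        os.setsid()  # new session, no controlling tty
        if os.fork() > 0:
            os._exit(0)  # first child is done; the grandchild carries on
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)  # stdin/stdout/stderr now point at /dev/null
        os.execvp(argv[0], argv)  # only comes back (by raising) on failure
    finally:
        os._exit(127)  # never fall back into the caller's code in a child


def main(argv=None):
    """Hypothetical usage: handler.py STATE STATETYPE COMMAND [ARGS...]"""
    args = sys.argv[1:] if argv is None else argv
    if len(args) < 3:
        return 1
    state, state_type, command = args[0], args[1], args[2:]
    if should_restart(state, state_type):
        run_in_background(command)
    return 0
```

The script would end with sys.exit(main()). The restart command itself (for
example a weblogic start script) is whatever you pass after the two state
arguments; nothing here waits for it or looks at its exit code.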

Maybe you could describe the whole flow you want instead of the forking
problem alone ;)


> 
> Anyway, Thomas, you are absolutely great with passing me this 
> information.  I highly appreciate the fact you stepped up and did so.  I
> wish you happy holidays and the best in case I do not hear back from 
> you.

You too, thanks!

PS: please don't top-post (it breaks discussion flow), and try to
word-wrap your emails (otherwise I have to redo it myself...)... thanks!


- --
Thomas
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOah46dZ+Kt5BchYRAtB6AKCelnYEbn9LHKT3x63qEmBPi3Fy8QCgt6KE
N6Lqib6sMhFOK0M7yb5Ri1M=
=IoLG
-END PGP SIGNATURE-

--
SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas, Nevada.
The future of the web can't happen without you.  Join us at MIX09 to help
pave the way to the Next Web now. Learn more and register at
http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] nohup and check_nrpe and timeout

2008-12-05 Thread David Shapiro
Okay, I tried to compile setsid.c, but it wanted nls.h.  I tried to get 
nls.h, and it wanted types.h, etc.  I searched around for a version for 
Solaris, which is what I am using, but I had no luck.  I ended up abandoning 
this idea and tried check_by_ssh, and this worked fine, so I am good for 
now, I think.  The one thing I am not certain about is the comment on the -f 
option, which disassociates the process from a tty: it says something like 
"if ssh is successful it always returns a success".  Is it trying to say 
that it will ignore my exit codes if I use the -f option?  That will not work 
for me if that is the case.

Anyway, Thomas, you are absolutely great with passing me this information.  I 
highly appreciate the fact you stepped up and did so.  I wish you happy 
holidays and the best in case I do not hear back from you.

David

-Original Message-
From: Thomas Guyot-Sionnest [mailto:[EMAIL PROTECTED] 
Sent: Friday, December 05, 2008 9:14 AM
To: David Shapiro
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] nohup and check_nrpe and timeout

On 05/12/08 08:37 AM, David Shapiro wrote:
> Thank you Thomas, this looks like good info.  I do not seem to have an 
> executable called setsid on Solaris, though.  It is listed as a C function.  
> Nrpe.cfg does in fact let you increase the timeout, but I was thinking that 
> will not help because my program remains running in a loop.  Each iteration 
> has it check the logs that it is generating.  If it is not seen as running, as 
> I mentioned, it will just start it again.  However, since it is in a loop, 
> I am thinking that nrpe will time out no matter what my timeout is set to. 
> The setsid idea looked interesting, but unfortunately I do not see it on my 
> server.  The last one, I think, was using bash to close stdout, stdin, and 
> stderr, but it also used setsid in your example (sigh).  The alarm handler 
> idea did not work.  That leaves the ssh check agent.  I will look into that 
> today.
> 

Try this maybe...

http://www.google.com/codesearch?hl=en&q=linux+setsid.c+show:SQhth2SDUWk:_rHMxvw0UiI:7BWeGpnuS2M&sa=N&cd=5&ct=rc&cs_p=ftp://ftp.kernel.org/pub/linux/utils/util-linux/testing/util-linux-2.13-pre7.tar.gz&cs_f=util-linux-2.13-pre7/sys-utils/setsid.c

-- 
Thomas




Re: [Nagios-users] nohup and check_nrpe and timeout

2008-12-05 Thread Thomas Guyot-Sionnest
On 05/12/08 08:37 AM, David Shapiro wrote:
> Thank you Thomas, this looks like good info.  I do not seem to have an 
> executable called setsid on Solaris, though.  It is listed as a C function.  
> Nrpe.cfg does in fact let you increase the timeout, but I was thinking that 
> will not help because my program remains running in a loop.  Each iteration 
> has it check the logs that it is generating.  If it is not seen as running, as 
> I mentioned, it will just start it again.  However, since it is in a loop, 
> I am thinking that nrpe will time out no matter what my timeout is set to. 
> The setsid idea looked interesting, but unfortunately I do not see it on my 
> server.  The last one, I think, was using bash to close stdout, stdin, and 
> stderr, but it also used setsid in your example (sigh).  The alarm handler 
> idea did not work.  That leaves the ssh check agent.  I will look into that 
> today.
> 

Try this maybe...

http://www.google.com/codesearch?hl=en&q=linux+setsid.c+show:SQhth2SDUWk:_rHMxvw0UiI:7BWeGpnuS2M&sa=N&cd=5&ct=rc&cs_p=ftp://ftp.kernel.org/pub/linux/utils/util-linux/testing/util-linux-2.13-pre7.tar.gz&cs_f=util-linux-2.13-pre7/sys-utils/setsid.c

-- 
Thomas



Re: [Nagios-users] nohup and check_nrpe and timeout

2008-12-05 Thread David Shapiro
Thank you Thomas, this looks like good info.  I do not seem to have an 
executable called setsid on Solaris, though.  It is listed as a C function.  
Nrpe.cfg does in fact let you increase the timeout, but I was thinking that 
will not help because my program remains running in a loop.  Each iteration 
has it check the logs that it is generating.  If it is not seen as running, as 
I mentioned, it will just start it again.  However, since it is in a loop, I am 
thinking that nrpe will time out no matter what my timeout is set to.  The 
setsid idea looked interesting, but unfortunately I do not see it on my server. 
The last one, I think, was using bash to close stdout, stdin, and stderr, but it 
also used setsid in your example (sigh).  The alarm handler idea did not work.  
That leaves the ssh check agent.  I will look into that today.

David

-Original Message-
From: Thomas Guyot-Sionnest [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 04, 2008 11:16 PM
To: David Shapiro
Cc: nagios-users@lists.sourceforge.net
Subject: Re: [Nagios-users] nohup and check_nrpe and timeout

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/12/08 01:27 PM, David Shapiro wrote:
> Hello,
> 
> I wrote something that starts a python script to check weblogic; if 
> it sees it is not running, it starts it.  The problem is that it takes 
> several minutes to start.  I tried in my script to just nohup and 
> background the process, and the script also says it is starting and 
> exits with a 1.  For some reason, even though I do a nohup and 
> background, it does not behave as if run with nohup.  It times out.  Is 
> there a way to do this?  Why is check_nrpe maxed at 60 seconds?  Why 
> will it not recognize that I just used a nohup and background?

nrpe is definitely not the best way to do it, but here are some insights
on what you could try:

1. I think the 60 second timeout is configurable in nrpe.cfg

2. You can start nohup with setsid to run it in a new session. You can
possibly avoid the use of nohup altogether by closing stdin/out/err (in
bash: exec </dev/null >/dev/null; exec 2>/dev/null; setsid <command>)

3. It's possible that nrpe starts the alarm handler before exec'ing the
plugin; try resetting it before running. You could do that in perl:
perl -e 'alarm(0); exec <command>' (actually I think exec in perl will
invoke the shell which will get the signal, so alarm(0) is useless
anyway)

4. check_by_ssh has a mode to start the remote program/script and
return. You will need to set up an ssh keypair for this to work (don't
forget that nagios will run it as the nagios user, so you'll have to set
up the keys for that user)


- --
Thomas
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOKow6dZ+Kt5BchYRAnxsAKDRiYNHxGkoLGLww6YG5qnGNBbrIACfW6fY
WHe4BSKXgC5AH56aTWGWpW0=
=sdKI
-END PGP SIGNATURE-




Re: [Nagios-users] nohup and check_nrpe and timeout

2008-12-04 Thread Thomas Guyot-Sionnest
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/12/08 01:27 PM, David Shapiro wrote:
> Hello,
> 
> I wrote something that starts a python script to check weblogic; if 
> it sees it is not running, it starts it.  The problem is that it takes 
> several minutes to start.  I tried in my script to just nohup and 
> background the process, and the script also says it is starting and 
> exits with a 1.  For some reason, even though I do a nohup and 
> background, it does not behave as if run with nohup.  It times out.  Is 
> there a way to do this?  Why is check_nrpe maxed at 60 seconds?  Why 
> will it not recognize that I just used a nohup and background?

nrpe is definitely not the best way to do it, but here are some insights
on what you could try:

1. I think the 60 second timeout is configurable in nrpe.cfg

2. You can start nohup with setsid to run it in a new session. You can
possibly avoid the use of nohup altogether by closing stdin/out/err (in
bash: exec </dev/null >/dev/null; exec 2>/dev/null; setsid <command>)

3. It's possible that nrpe starts the alarm handler before exec'ing the
plugin; try resetting it before running. You could do that in perl:
perl -e 'alarm(0); exec <command>' (actually I think exec in perl will
invoke the shell which will get the signal, so alarm(0) is useless
anyway)

4. check_by_ssh has a mode to start the remote program/script and
return. You will need to set up an ssh keypair for this to work (don't
forget that nagios will run it as the nagios user, so you'll have to set
up the keys for that user)
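
Where no setsid executable is available, the idea in item 2 (new session,
stdin/stdout/stderr closed) can probably be done from the plugin script
itself. The following is a minimal sketch, assuming Python 3.2 or later on
the monitored host (start_new_session is not available before that);
start_detached, is_session_leader and report_starting are hypothetical names,
and the message text is made up. The slow job is started in its own session
with its standard streams on /dev/null, so it should no longer hold the pipe
the plugin output is read from, and the plugin can return at once.

```python
import os
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def start_detached(argv):
    """Start argv in its own session, with stdio on /dev/null.

    Returns the PID of the new process without waiting for it.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec
        close_fds=True,
    )
    return proc.pid


def is_session_leader(pid):
    """True if pid leads its own session, i.e. the detach took effect."""
    return os.getsid(pid) == pid


def report_starting(argv):
    """Kick off the slow start-up job and return a plugin status at once."""
    pid = start_detached(argv)
    print("WARNING: service was down, start requested (pid %d)" % pid)
    return WARNING
```

The plugin would call report_starting() with the real start command and exit
with the returned code. Note that the exit code of the started job is never
seen by the plugin, so a later, normal check still has to confirm that the
service actually came up.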


- --
Thomas
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOKow6dZ+Kt5BchYRAnxsAKDRiYNHxGkoLGLww6YG5qnGNBbrIACfW6fY
WHe4BSKXgC5AH56aTWGWpW0=
=sdKI
-END PGP SIGNATURE-



[Nagios-users] nohup and check_nrpe and timeout

2008-12-04 Thread David Shapiro
Hello,

I wrote something that starts a python script to check weblogic; if it sees 
it is not running, it starts it.  The problem is that it takes several minutes 
to start.  I tried in my script to just nohup and background the process, and 
the script also says it is starting and exits with a 1.  For some reason, even 
though I do a nohup and background, it does not behave as if run with nohup.  
It times out.  Is there a way to do this?  Why is check_nrpe maxed at 60 
seconds?  Why will it not recognize that I just used a nohup and background?

David

