Hi Jan-Henrik,

I created a sample script to check that this actually works, and I can
confirm it does with that simple script. As the logs show, the issue
appears to be the result of a double notification. The script ran so long
that monit killed it, but the timeout expired at exactly the time of the
next occurrence:
[EDT Jul 13 03:15:51] error    : 'myscript' program timed out after 7230 seconds. Killing program with pid 4407
[EDT Jul 13 03:15:51] error    : 'myscript' Sun Microsystems Inc.     SunOS 5.10      Generic January 2005
You have new mail.

The first is a real error, but from the myscript logs I can see that at
03:15 it did start and ran correctly until it suddenly stopped,
presumably because monit killed it. So my best guess at this moment would
be:
1. Monit receives the previous myscript timeout notification at the same
time as the current myscript run events
2. Monit kills both instances
3. Monit alerts on the timeout and on the killed process; for the latter
there is nothing in stderr, so monit defaults to stdout

I do have a workaround: setting a timeout shorter than the script's run
cycle (2 hours in this script's case).
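
In the control file, that workaround might look roughly like the fragment
below. It reuses the check from my earlier message; the value 7000 is only
an illustrative assumption (anything comfortably under the 7200 seconds of
a 2 hour cycle), not a tested recommendation.

```
# Assumption: the script is scheduled every 2 hours (7200 seconds), so the
# timeout is kept below that. 7000 is an illustrative figure only.
check program myscript with path "/usr/local/bin/myscript.sh"
    with timeout 7000 seconds
    if status != 0 then alert
```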

On a side note/question, I noticed monit switches to "waiting" for the next
occurrence of the script instead of staying in failed status. I would like
to run 'monit summary' and know whether the script failed last time or not
(and not rely solely on an alert). Is this a feature to be considered? You
can see this easily by scheduling a simple bash script and forcing it to
exit with status=1, for example.
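
To reproduce this, a minimal sketch of such a failing script could look
like the following. The function name failing_check and the messages are
made up for illustration. The last lines only show, locally, what ends up
on stderr and which status comes back; they are not how monit itself
collects the output.

```shell
#!/bin/sh
# Hypothetical stand-in for a "check program" script that always fails.
failing_check() {
    echo "routine log output"               # stdout: ordinary logging
    echo "forced failure for testing" >&2   # stderr: short reason for the alert
    return 1                                # non-zero status, like exit 1
}

# Capture only stderr and the exit status of the failing check.
status=0
err=$(failing_check 2>&1 >/dev/null) || status=$?
echo "status=$status stderr=$err"
```

In a script actually run by monit, the body of failing_check would sit at
the top level and end with "exit 1", so that a check using
"if status != 0 then alert" fires on every run.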

Thanks!
- Nestor


On Sat, Jul 13, 2013 at 9:02 AM, Jan-Henrik Haukeland
<[email protected]> wrote:

> On 13 Jul 2013, at 13:39, Nestor Urquiza <[email protected]> wrote:
>
> > check program myscript with path "/usr/local/bin/myscript.sh" with
> > timeout 1000 seconds if status != 0 then alert
> > When it fails I get the stdout in the alert instead of stderr. There is
> > a lot of logging in the script and monit is collecting only the first few
> > lines so the real cause of the issue is not coming up. This is happening on
> > Solaris running 5.5.1.
>
> Monit first reads from the script's stderr, if there is nothing there
> _then_ it reads from stdout. Please make sure that your script really writes
> to stderr if needed. The output (if any) is part of the alert message and
> to avoid too long messages only 255 chars are read. Maybe your script could
> do some processing of the error and only write the relevant part to stderr?
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
>