Some more information: I removed the timeout and restarted monit. I noticed
that the script hung during an rsync and that the default timeout of 5
minutes did not apply. As a consequence, monit reported an error with
stdout content (the stdout content of the next execution, as explained below).

I will now set the timeout explicitly to 30 minutes (the script runs every
two hours) and will report back tomorrow on any issues.
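
A minimal sketch of what that check could look like, reusing the check name
and path from my original message quoted below. 1800 seconds is the 30
minutes mentioned above; how the check is scheduled is left out, since that
part of the configuration is not shown in this thread:

```
# Sketch only: same check as quoted below, with the timeout set to 30 minutes.
check program myscript with path "/usr/local/bin/myscript.sh" with timeout 1800 seconds
    if status != 0 then alert
```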


On Sat, Jul 13, 2013 at 6:14 PM, Nestor Urquiza <[email protected]> wrote:

> Hi Jan-Henrik,
>
> I went ahead and created a sample script to make sure this actually works
> and I can confirm it does with that simple script. The issue, as the logs
> show, is apparently the result of a double notification. The script took so
> long that monit killed it, but the timeout was exactly equal to the time of
> the next occurrence:
> [EDT Jul 13 03:15:51] error    : 'myscript' program timed out after 7230
> seconds. Killing program with pid 4407
> [EDT Jul 13 03:15:51] error    : 'myscript' Sun Microsystems Inc.
> SunOS 5.10      Generic January 2005
> You have new mail.
>
> The first is a real error, but from the myscript logs I can see that at
> 03:15 it did start and was running correctly until it suddenly stopped,
> presumably because monit killed it. So my best guess at this moment would
> be:
> 1. Monit receives previous myscript timeout notification at the same time
> as current myscript run events
> 2. Monit kills both instances
> 3. Monit alerts on the timeout and on the killed process, however on the
> latter there is nothing in stderr so monit defaults to stdout
>
> Clearly I have a workaround, which is setting a timeout shorter than the
> script's run cycle (2 hours in this script's case).
>
> On a side note/question I noticed monit switches to "waiting" for the next
> occurrence of the script instead of staying in failed status. After all I
> would like to run 'monit summary' and make sure I know if the script failed
> last time or not (and not rely solely on an alert). Is this a feature to
> be considered? You can see this easily by scheduling a simple bash
> script and forcing it to exit with status=1, for example.
>
> Thanks!
> - Nestor
>
>
> On Sat, Jul 13, 2013 at 9:02 AM, Jan-Henrik Haukeland <[email protected]> wrote:
>
>> On 13 Jul 2013, at 13:39, Nestor Urquiza <[email protected]> wrote:
>>
>> > check program myscript with path "/usr/local/bin/myscript.sh" with
>> > timeout 1000 seconds if status != 0 then alert
>> > When it fails I get the stdout in the alert instead of stderr. There is
>> > a lot of logging in the script and monit is collecting only the first few
>> > lines, so the real cause of the issue is not coming up. This is happening
>> > on Solaris running 5.5.1.
>>
>> Monit first reads from the script's stderr; if there is nothing there,
>> _then_ it reads from stdout. Please make sure that your script really writes
>> to stderr if needed. The output (if any) is part of the alert message and,
>> to avoid overly long messages, only 255 chars are read. Maybe your script could
>> do some processing of the error and only write the relevant part to stderr?
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
>>
>
>
