The new timeout did not work as expected:
[EDT Jul 15 03:15:14] error    : 'myscript' program timed out after 7157 seconds. Killing program with pid 28899
[EDT Jul 15 03:15:14] error    : 'myscript' STARTED ON Mon Jul 15 01:15:58 EDT 2013

First, you can see that the timeout is still 2 hours (apparently the timeout
matches the cycle time) even though I added the timeout:
check program myscript with path "/usr/myuser/myscript" with timeout 1800 seconds
  every "15 1,3,5,7,9,11,13,15,17,19,21,23 * * *"
  if status != 0 then alert
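
Until the timeout directive behaves as configured, one possible stopgap is to have the script limit its own long-running steps. The following is only a minimal sketch, assuming a POSIX shell; run_with_timeout is a hypothetical helper and not part of monit, and the 1800 seconds simply mirrors the value configured above.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): run a command and send it SIGTERM
# if it is still running after a number of seconds.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
# Note: POSIX sh has no local variables, so the names below are global.
run_with_timeout() {
    limit=$1
    shift

    # Start the command in the background and remember its pid.
    "$@" &
    cmd_pid=$!

    # Watchdog: after the limit, signal only the pid started above.
    # Output is discarded so the watchdog never holds the caller's stdout.
    (
        sleep "$limit"
        kill "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Collect the command's exit status (143 if it was ended by SIGTERM).
    rc=0
    wait "$cmd_pid" || rc=$?

    # The command is done, so stop the watchdog shell. Its sleep child may
    # linger until the limit elapses, but it no longer kills anything.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$rc"
}
```

Inside the script, a slow step could then be wrapped as `run_with_timeout 1800 some_long_step`, where `some_long_step` is a placeholder for whatever command tends to hang. A timed-out step returns a non-zero status, so the script can still exit non-zero and trigger the alert.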

Second, you can see that the second log line above contains the script's
stdout, because nothing was actually written to stderr. From the myscript log,
all I can see is that the script stops running without any indication of why;
IMO monit simply kills it as well.
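
Given Jan-Henrik's note further down the thread (monit reads stderr first and only 255 chars of output), a script could keep its verbose logging out of stdout and emit one short line on stderr when it fails. This is a minimal sketch under assumptions: the log path and the log/fail helper names are hypothetical and would need adapting to the real script.

```shell
#!/bin/sh
# Hypothetical log location; point this at wherever the script already logs.
LOG_FILE=${LOG_FILE:-/tmp/myscript.log}

# Detailed, timestamped logging goes to the log file instead of stdout.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# On failure, record the error in the log, then write a single short line
# to stderr (cut to 255 characters) and exit with a non-zero status.
fail() {
    log "ERROR: $*"
    printf '%.255s\n' "$*" >&2
    exit 1
}
```

A step in the script would then read something like `some_step || fail "some_step failed"`, with `some_step` standing in for the real command, so the alert carries the cause rather than the first lines of stdout.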

It looks like monit kills all myscript processes when the timeout occurs
instead of just the correct pid. It also looks like the timeout directive is
not being honored. This is happening in the following environment:

$ uname -a
SunOS genevastby 5.10 Generic_139556-08 i86pc i386 i86pc
$ monit -V
This is Monit version 5.5.1
$ which monit
/usr/bin/monit
$ ls -al /usr/bin/monit
lrwxrwxrwx   1 root     root          20 Sep 25  2012 /usr/bin/monit -> /usr/local/bin/monit
$ ps -ef|grep monit|grep -v grep
    root 18343     1   0   Jun 25 ?           5:08 /usr/local/bin/monit -Ic /usr/local/etc/monitrc

Is this issue related to a bug?

Thanks!
- Nestor





On Sun, Jul 14, 2013 at 11:11 AM, Nestor Urquiza
<[email protected]> wrote:

> Some more information. I removed the timeout and restarted monit. I
> noticed that the script hung during an rsync and that the default timeout
> of 5 minutes did not apply. As a consequence, monit reported an error
> with stdout content (the stdout content of the next execution, as explained).
>
> I will set the timeout explicitly to 30 minutes (the script
> runs every two hours) and report back tomorrow on any issues.
>
>
> On Sat, Jul 13, 2013 at 6:14 PM, Nestor Urquiza 
> <[email protected]> wrote:
>
>> Hi Jan-Henrik,
>>
>> I went ahead and created a sample script to make sure this actually works,
>> and I can confirm it does with that simple script. As the logs show, the
>> issue is apparently the result of a double notification. The script took so
>> long that monit killed it, but the timeout was exactly equal to the time of
>> the next occurrence:
>> [EDT Jul 13 03:15:51] error    : 'myscript' program timed out after 7230
>> seconds. Killing program with pid 4407
>> [EDT Jul 13 03:15:51] error    : 'myscript' Sun Microsystems Inc.
>> SunOS 5.10      Generic January 2005
>> You have new mail.
>>
>> The first is a real error, but from the myscript logs I can see that at
>> 03:15 it did start and was running correctly until it suddenly stopped,
>> presumably because monit killed it. So my best guess at this moment would
>> be:
>> 1. Monit receives previous myscript timeout notification at the same time
>> as current myscript run events
>> 2. Monit kills both instances
>> 3. Monit alerts on the timeout and on the killed process, however on the
>> latter there is nothing in stderr so monit defaults to stdout
>>
>> Clearly I have a workaround, which is setting a timeout shorter than the
>> script's run cycle (2 hours in this script's case).
>>
>> On a side note/question, I noticed monit switches to "waiting" for the
>> next occurrence of the script instead of staying in failed status. After
>> all, I would like to run 'monit summary' and know whether the script
>> failed last time or not (and not rely solely on an alert). Is this a
>> feature to be considered? You can see this easily by scheduling a simple
>> bash script and forcing it to exit with status=1, for example.
>>
>> Thanks!
>> - Nestor
>>
>>
>> On Sat, Jul 13, 2013 at 9:02 AM, Jan-Henrik Haukeland <
>> [email protected]> wrote:
>>
>>> On 13 Jul 2013, at 13:39, Nestor Urquiza <[email protected]>
>>> wrote:
>>>
>>> > check program myscript with path "/usr/local/bin/myscript.sh" with
>>> > timeout 1000 seconds if status != 0 then alert
>>> > When it fails I get the stdout in the alert instead of stderr. There
>>> > is a lot of logging in the script and monit is collecting only the first
>>> > few lines, so the real cause of the issue is not coming up. This is
>>> > happening on Solaris running 5.5.1.
>>>
>>> Monit first reads from the script's stderr; if there is nothing there,
>>> _then_ it reads from stdout. Please make sure that your script really writes
>>> to stderr if needed. The output (if any) is part of the alert message, and
>>> to avoid overly long messages only 255 chars are read. Maybe your script could
>>> do some processing of the error and only write the relevant part to stderr?
>>> --
>>> To unsubscribe:
>>> https://lists.nongnu.org/mailman/listinfo/monit-general
>>>
>>
>>
>