Some more information. I removed the timeout and restarted monit. I noticed that the script hung during an rsync, and the default timeout of 5 minutes did not apply. As a consequence, monit reported an error with stdout content (the stdout content of the next execution, as explained).
I will set the timeout explicitly to 30 minutes (the script runs every two hours) and report back tomorrow on any issues.

On Sat, Jul 13, 2013 at 6:14 PM, Nestor Urquiza <[email protected]> wrote:

> Hi Jan-Henrik,
>
> I went ahead and created a sample script to make sure this actually
> works, and I can confirm it does with that simple script. The issue, as
> the logs show, is apparently the result of a double notification. The
> script took so long that monit killed it, but the timeout was exactly
> equal to the time of the next occurrence:
> [EDT Jul 13 03:15:51] error : 'myscript' program timed out after 7230 seconds. Killing program with pid 4407
> [EDT Jul 13 03:15:51] error : 'myscript' Sun Microsystems Inc. SunOS 5.10 Generic January 2005
> You have new mail.
>
> The first is a real error, but from the myscript logs I can see that at
> 03:15 it did start and it was running correctly until it suddenly
> stopped, presumably because monit killed it. So my best guess at this
> moment would be:
> 1. Monit receives the previous myscript timeout notification at the
>    same time as the current myscript run events
> 2. Monit kills both instances
> 3. Monit alerts on the timeout and on the killed process; however, on
>    the latter there is nothing in stderr, so monit defaults to stdout
>
> Clearly I have a workaround, which is setting a timeout shorter than
> the script run cycle (2 hours in this script's case).
>
> On a side note/question, I noticed monit switches to "waiting" for the
> next occurrence of the script instead of staying in failed status.
> After all, I would like to run 'monit summary' and know whether the
> script failed last time or not (and not rely solely on an alert). Is
> this a feature to be considered? You can see this easily by scheduling
> a simple bash script and forcing it to exit with status=1, for example.
>
> Thanks!
> - Nestor
>
>
> On Sat, Jul 13, 2013 at 9:02 AM, Jan-Henrik Haukeland <[email protected]> wrote:
>
>> On 13 Jul 2013, at 13:39, Nestor Urquiza <[email protected]> wrote:
>>
>> > check program myscript with path "/usr/local/bin/myscript.sh" with timeout 1000 seconds
>> >     if status != 0 then alert
>> > When it fails I get the stdout in the alert instead of stderr. There
>> > is a lot of logging in the script and monit is collecting only the
>> > first few lines, so the real cause of the issue is not coming up.
>> > This is happening in Solaris running 5.5.1.
>>
>> Monit first reads from the script's stderr; if there is nothing there,
>> _then_ it reads from stdout. Please make sure that your script really
>> writes to stderr if needed. The output (if any) is part of the alert
>> message, and to avoid overly long messages only 255 chars are read.
>> Maybe your script could do some processing of the error and only write
>> the relevant part to stderr?
>> --
>> To unsubscribe:
>> https://lists.nongnu.org/mailman/listinfo/monit-general
>>
>
>
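As a rough sketch of the suggestion quoted above (have the script write only the relevant part of the error to stderr), a wrapper along these lines might work. The function name run_and_summarize, the choice of the last 5 log lines, and the log file argument are assumptions for illustration and are not part of the original script. Only the 255-character limit comes from the reply above. It assumes a POSIX shell with tail, tr and cut available.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original myscript.sh.
# Usage: run_and_summarize LOGFILE COMMAND [ARGS...]
run_and_summarize() {
    log=$1
    shift
    # Send all of the job's stdout and stderr to the log file, so nothing
    # noisy reaches monit directly.
    "$@" >"$log" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        # On failure, write only the last few log lines to stderr, joined
        # into a single line and cut to the 255 characters monit reads.
        tail -n 5 "$log" | tr '\n' ' ' | cut -c1-255 >&2
    fi
    # Pass the job's exit status through so "if status != 0" still works.
    return "$status"
}
```

For example, a small wrapper script could end with `run_and_summarize /var/tmp/myscript.log /usr/local/bin/myscript.sh` (the log path is hypothetical), and the `check program` entry would then point at the wrapper instead of at myscript.sh. Because the function returns the job's exit status, the wrapper exits with the same status when this call is its last command.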
