>> On Mon, Jun 20, 2011 at 9:24 AM, Rai Ricafrente <maill...@ricafrente.com> wrote:
>> > Hi everyone,
>> >
>> > I just installed a fresh Nagios v3.2.3 with about 150 hosts and 600
>> > services. From time to time, hosts report a "Return code of 141 is
>> > out of bounds" status, which eventually goes away. I don't know if
>> > this has anything to do with the plugin, since the status returns to
>> > OK without intervention, which suggests that the check_icmp plugin
>> > works just fine.
>> >
>> > I'm confused by this error, and it did not show up when we were
>> > using Nagios v2. Has anyone seen the same issue?
>> >
>> > Big thanks,
>> >
>> > Rai
> On Mon, Jun 20, 2011 at 10:16 AM, Yueh-Hung Liu <yuehung....@gmail.com> wrote:
>>
>> Nagios only accepts the integers 0-3 as plugin return codes.
>> Try manually executing the command of the service in question (as the
>> user Nagios runs as) and check the output.

On Sun, Jun 19, 2011 at 19:24, Rai Ricafrente <maill...@ricafrente.com> wrote:
> The output returns OK status when run manually. It seems that the error
> occurs at random times but, as mentioned, eventually goes away. If the
> plugin were the issue, the error should be persistent. In my case, it
> happens from time to time. I have only experienced this with Nagios
> 3.2.3; it never happened with Nagios v2.6.

(Quick reminder: mailing list: don't top-post)

Rai, the logic of "it never happened before on 2.6 so it would have never happened on 2.6, therefore 3.2.3 is in error" is like "we've never had an oil rig explode in the Gulf of Mexico before" :)

Really, the way to find out who is to blame is similar to Yueh-Hung Liu's suggestion, but make a wrapper for the script instead. The wrapper should record the environment offered to the script and the parameters, then check the return code and store the results under a filename based on that return code, for example by renaming a temporary file used for collection. In /bin/bash, you could store all content in a file /tmp/nagios-tmp.$$ and then, based on the $? (exit status) of the script execution, "mv /tmp/nagios-tmp.$$ /tmp/ret.$?" or some such.

To explain what this offers, consider that you may have the return codes 0, 1, 2, 3 and 141, and you're using "/tmp/ret" as a base filename. Once the wrapper has run for a while and you have a few successful results plus a "141" return code, compare any of /tmp/ret.0, /tmp/ret.1, /tmp/ret.2 or /tmp/ret.3 with the contents of /tmp/ret.141. You only need to keep the last occurrence of each (since they should be similar), which keeps you from running out of disk.
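A minimal sketch of the wrapper described above, under some assumptions: the function name `run_wrapped` and the `RET_BASE` variable are made up for illustration, and the record layout (parameters, then environment, then plugin output) is just one reasonable choice. It passes the plugin's output and return code through unchanged so Nagios still sees the original result.

```shell
#!/bin/bash
# Hypothetical wrapper sketch; run_wrapped and RET_BASE are illustrative
# names, not part of Nagios or its plugins.

# Base filename for the per-return-code records (/tmp/ret.0, /tmp/ret.141, ...).
RET_BASE="${RET_BASE:-/tmp/ret}"

run_wrapped() {
    local tmp output rc
    tmp="/tmp/nagios-tmp.$$"

    # Record the parameters and the environment handed to the plugin.
    {
        echo "=== parameters ==="
        printf '%s\n' "$@"
        echo "=== environment ==="
        env | sort
    } > "$tmp"

    # Run the real plugin, capturing stdout and stderr plus its exit status.
    rc=0
    output=$("$@" 2>&1) || rc=$?

    {
        echo "=== plugin output ==="
        printf '%s\n' "$output"
        echo "=== return code: $rc ==="
    } >> "$tmp"

    # Keep only the most recent record for each return code.
    mv "$tmp" "$RET_BASE.$rc"

    # Hand the original output and return code back to the caller (Nagios).
    printf '%s\n' "$output"
    return "$rc"
}

# When invoked with arguments, treat them as the plugin command line.
if [ "$#" -gt 0 ]; then
    run_wrapped "$@"
    exit $?
fi
```

Saved as, say, /usr/local/bin/nagios-wrap.sh (a hypothetical path), it would be put in front of the existing plugin command line in the command definition, leaving the plugin's own arguments as they are. Afterwards, "diff /tmp/ret.0 /tmp/ret.141" shows what differed between a good run and a bad one.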
You can run this overnight without crushing your monitoring system's disk; there is no big difference apart from the file I/O you've added. Then, when you compare the wrapper output in the 141 case to the 0-3 case (i.e. "diff /tmp/ret.0 /tmp/ret.141"), you'll see whether the input environment or parameters are different.

If the input is roughly the same either way, then have the wrapper turn on some debugging or tracing when it executes the wrapped script. The output will still be collected, but you'll have verbose debug information to dig through to see why the script returns 141. Alternatively, if the input does change, you'll be able to see what Nagios is doing differently between executions.

Allan

--
all...@chickenandporn.com
"金鱼"
http://linkedin.com/in/goldfish

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null