Andrea:
I've never used Mon, but I've spent a good deal of time configuring
systems for HA (High Availability). It's not uncommon for a service to
be flagged as down by a periodic check when it is actually fine.
Reducing those false positives takes some give and take between how
quickly you detect a real outage and how many false alarms you accept.
For example, when I was writing code to monitor a Sybase engine, I would
run a check every X seconds. If the server failed to respond Y times in
a row, it was assumed to be down. For other applications, we might have
polled every J seconds and required more or fewer consecutive failures.
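
To make the idea concrete, here is a minimal sketch in Python. It is not
Mon's implementation and not the Sybase code I mentioned. The names
FailureCounter and tcp_check are made up for illustration, and the check
is assumed to be a plain TCP connect with a timeout.

```python
import socket


class FailureCounter:
    """Treat a service as down only after `threshold` failed checks in a row."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # the "Y" in the text above
        self.failures = 0           # current run of consecutive failures

    def record(self, ok: bool) -> bool:
        """Record one check result; return True while the service counts as up."""
        if ok:
            self.failures = 0       # any success resets the run
        else:
            self.failures += 1
        return self.failures < self.threshold


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Hypothetical check: succeed if a TCP connection opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A polling loop would call counter.record(tcp_check(host, port)) every X
seconds and raise an alarm only when record() returns False. With a
threshold of 3, a single timeout followed by a success never alarms.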
If there is a way to configure Mon to report a service as down only
after a number of consecutive failures, then that is my recommendation.
Just because a service fails a test once doesn't mean that it's down. It
could just be busy.
David
Andrea Cerrito wrote:
>
> Hi to all,
>
> I have a server farm with pop3 / smtp / ftp services running on Linux and
> served by tcpserver. My monitoring software is Mon, and sometimes I'm
> receiving alarms about these services: they are always false alarms.
>
> For example:
>
> =======SERVICE IS MARKED AS DOWN==========
> Summary output : **** Time Out
>
> Group : pop3-a.frontend.int
> Service : smtp
> Time noticed : Tue Jun 12 13:27:10 2001
> Secs until next alert :
> Members : pop3-a.frontend.int
>
> Detailed text (if any) follows:
> -------------------------------
> pop3-a.frontend.int
>
> ========SERVICE IS MARKED AS UP==========
> Summary output : **** Time Out
>
> Group : pop3-a.frontend.int
> Service : smtp
> Time noticed : Tue Jun 12 13:28:16 2001
> Secs until next alert :
> Members : pop3-a.frontend.int
>
> Detailed text (if any) follows:
> -------------------------------
> pop3-a.frontend.int
>
> Just one minute (and I'm running the test every minute)... I'm trying to
> understand why I get these false alarms only on services running under
> tcpserver on Linux. I mean, if the service is running under tcpserver on
> Solaris, or the service is running on Linux without tcpserver, I get no
> errors (i.e., qmail on Solaris and Apache on Linux).
>
> Looking at the logs, I see no errors.
>
> What could the problem be? What should I look for?
>
> Thanks
>
> PS I couldn't find a list for ucspi-tcp: if I wrote to the wrong list,
> please tell me which is the correct one :)
> ---
> Cordiali saluti / Best regards
> Andrea Cerrito
> ^^^^^^^^^^^^^^
> Net.Admin @ Centro MultiMediale di Terni S.p.A.
> P.zzale Bosco 3A
> 05100 Terni IT
> Tel. +39 744 5441330
> Fax. +39 744 5441372