[EMAIL PROTECTED] wrote:
Well, I looked through the logs and found it tended to occur in blocks... you'd have a day or two (I think the longest stretch was ~5 days) when it would occur often, and then weeks at a time when it wouldn't at all.

It didn't seem to be a 100% go/no-go thing that I can see... for example, we had someone who triggered a message spurt nine times over a ~30 minute period during the last problem block, and six went normally, three failed.

The other problem is that I can't know it's failing WHEN it happens, only after... basically I do "grep sendmail error_log | wc -l" and as long as the number hasn't gone up, we're fine. If it's redlining something for 10 seconds at a time, it's back to normal before I can go and fire up any measurements.

Maybe it should email me if it can't send email? :D :D :D

Of course, it hasn't done it in like 28 hours now, but it matters to me that I can know what's wrong, in the event it happens again.

Cook up some scripts to do the monitoring and write their output to log files. You won't know in real time, but you'll be able to read the logs later and see what was going on when it failed. What to monitor is the big question, and answering it comes down to educated guessing and trial and error.
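As a rough sketch of that kind of script, something like the following could be a starting point. It reuses the "grep sendmail error_log" count from earlier in the thread and appends one timestamped line per sample, so a jump in the count can be lined up with the load at that moment. The output file name (mailwatch.log), the 5 second interval, and the choice of uptime as the thing to sample are all assumptions of mine, not anything known about this setup.

```shell
#!/bin/sh
# Sketch only: paths, interval and sampled metrics are assumptions.
ERROR_LOG="${ERROR_LOG:-error_log}"    # log named in the thread; adjust the path
OUT_LOG="${OUT_LOG:-mailwatch.log}"    # hypothetical output file
INTERVAL="${INTERVAL:-5}"              # seconds between samples (a guess)

# Same number as: grep sendmail error_log | wc -l
# Prints 0 if the file is missing or has no matching lines.
count_errors() {
    if [ -r "$1" ]; then
        grep -c sendmail "$1" || true
    else
        echo 0
    fi
}

# Append one timestamped sample (error count plus load) to OUT_LOG.
sample_once() {
    now=$(date '+%Y-%m-%d %H:%M:%S')
    errors=$(count_errors "$ERROR_LOG")
    load=$(uptime 2>/dev/null || echo "uptime unavailable")
    echo "$now sendmail_errors=$errors load:$load" >> "$OUT_LOG"
}

# Loop forever only when started with the "run" argument.
if [ "${1:-}" = "run" ]; then
    while true; do
        sample_once
        sleep "$INTERVAL"
    done
fi
```

Saved as, say, mailwatch.sh (a made-up name), it could be started with "sh mailwatch.sh run &" and left alone. When the error count goes up, look in mailwatch.log for the first line where sendmail_errors changed and read the load figures around it. If load turns out not to be the interesting number, swap uptime for whatever you suspect next.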

Maybe try something like Nagios or Hobbit to monitor all the necessary parts of the process. You might end up noticing other things that are odd around those times, and it could give you clues to go on.

http://www.nagios.org/
http://hobbitmon.sourceforge.net/

Adding a performance monitoring agent like Ganglia might also tell you more about what's going on at the times you're having issues.

http://ganglia.info/

alex

---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
