We have a mysterious problem with Orion that appears to be getting worse. We
have used Orion to serve hundreds of thousands of hits per day at JavaLobby
and other sites for about 2 years now, and we use NetSaint as our site
monitoring tool. NetSaint hits selected URLs at each of our sites every 60
seconds and will alert us if there are problems.

We're not sure which version of Orion it began with, but we get a handful of
"connection refused" messages from NetSaint almost every single hour. We
have reproduced the same thing with "wget" as well, so we don't think
NetSaint is falsely reporting a connection refused.
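For anyone trying to reproduce this outside NetSaint, a probe along these lines could log each refused connection with a timestamp, so refusals can later be lined up against traffic. This is a hypothetical Java sketch, not a tool we actually run. The class name, the localhost:80 defaults and the 10-second connect timeout are all assumptions. It only tests the TCP connect, not the HTTP response, because "connection refused" happens at that level:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.time.Instant;

// Hypothetical probe: classifies one TCP connect attempt and logs failures.
public class RefusalProbe {
    enum Outcome { OK, REFUSED, TIMEOUT, OTHER_ERROR }

    /** Tries a single TCP connect to host:port and classifies the result. */
    static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.OK;
        } catch (ConnectException e) {
            // Usually "Connection refused": the remote side answered with a TCP RST.
            return Outcome.REFUSED;
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMillis.
            return Outcome.TIMEOUT;
        } catch (IOException e) {
            // Anything else (unknown host, no route, etc.).
            return Outcome.OTHER_ERROR;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost"; // assumed default
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 80; // assumed default
        long intervalMillis = 60_000; // same 60-second interval NetSaint uses
        while (true) {
            Outcome o = probe(host, port, 10_000);
            if (o != Outcome.OK) {
                System.out.println(Instant.now() + " " + host + ":" + port + " " + o);
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

Running it with something like `java RefusalProbe www.example.com 80` from a separate machine would show whether refusals cluster at particular times of day.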

We've tried to rule out all the possible physical causes by swapping
hardware, network cabling, ethernet switches, etc. No matter what we do,
there are still connections refused from time to time.

The problem may correlate with traffic load, but we haven't been able to
confirm that either. We get no messages whatsoever in our logs; it seems to
happen at such a low level that it never even reaches any of our code.
Usually after a connection is refused the next one is accepted, but
sometimes several consecutive checks at 1-minute intervals fail. That
triggers a beeper page (which can be really annoying at 3AM), and another
one shortly thereafter when the condition finally clears.
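The paging behavior described above (isolated refusals stay quiet, a streak of failed checks pages, and the next success pages again) can be modeled roughly as follows. This is a hypothetical Java sketch: the threshold of 3 consecutive failures and the PROBLEM/RECOVERY labels are assumptions for illustration, not NetSaint's actual configuration or notification logic:

```java
// Hypothetical model of a monitor that pages after N consecutive failed
// checks and sends one recovery page when a check succeeds again.
public class PageTracker {
    private final int threshold;
    private int consecutiveFailures = 0;
    private boolean paged = false;

    public PageTracker(int threshold) {
        this.threshold = threshold;
    }

    /** Records one check result; returns "PROBLEM", "RECOVERY", or null. */
    public String record(boolean ok) {
        if (ok) {
            consecutiveFailures = 0;
            if (paged) {
                paged = false;
                return "RECOVERY"; // the second page, when the condition clears
            }
            return null; // an isolated failure never paged, so nothing to clear
        }
        consecutiveFailures++;
        if (!paged && consecutiveFailures >= threshold) {
            paged = true;
            return "PROBLEM"; // first page, sent once per outage
        }
        return null;
    }

    public static void main(String[] args) {
        PageTracker t = new PageTracker(3); // assumed threshold
        boolean[] checks = {false, true, false, false, false, false, true};
        for (int i = 0; i < checks.length; i++) {
            String note = t.record(checks[i]);
            if (note != null) {
                System.out.println("check " + i + ": " + note);
            }
        }
    }
}
```

With that sequence, the single refusal at check 0 is absorbed silently, while the run of four failures produces one PROBLEM page at check 4 and a RECOVERY page at check 6.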

What can we do? We'd love to find and crush this one, but it is tough to
chase down and we don't know what further steps to take. Has anyone else
experienced anything like this with Orion? We'd appreciate it if anyone can
shed some light; this one is a bitch!

Thanks,
Rick Ross

