Our datacenter Mon instance runs flawlessly on Linux, but the Mon instance
I'm setting up for our office network is not running correctly.
Mon version is mon-0-99-1.8, on Solaris 8 (SPARC). When I start Mon as
the "mon" user in /usr/local/mon (recursively owned by "mon"), it forks
off as many copies of the mon process as the system will allow, until
memory is exhausted.
Perl is 5.6.0 and the Mon client is Mon-0.11. I normally run an M4
config, but when this started happening I stripped example.cf down to a
single host with HTTP and ping checks, and got the same results.
A truss of the parent Mon process (following forks) shows all the
children blocking for some unknown reason:
26500: waitid(P_ALL, 0, 0xFFBEFA30, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
26500: poll(0xFFBEFAB0, 2, 1000) = 0
26500: time() = 1001534981
26500: time() = 1001534981
26500: poll(0xFFBEFA70, 0, 0) = 0
26500: time() = 1001534981
26500: waitid(P_ALL, 0, 0xFFBEFA30, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
26500: poll(0xFFBEFAB0, 2, 1000) = 0
26500: time() = 1001534982
26500: time() = 1001534982
26500: poll(0xFFBEFA70, 0, 0) = 0
26500: time() = 1001534982
26500: waitid(P_ALL, 0, 0xFFBEFA30, WEXITED|WTRAPPED|WNOHANG) Err#10 ECHILD
26500: poll(0xFFBEFAB0, 2, 1000) = 0
26500: time() = 1001534983
26500: time() = 1001534983
26500: poll(0xFFBEFA70, 0, 0) = 0
26500: time() = 1001534983
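Mon itself is Perl, so this is not its code. It is a minimal Python sketch
(Unix only, assuming Python 3.3+ for os.waitid) of the call the traced
process keeps repeating. waitid() with WNOHANG returns immediately: when
it fails with ECHILD (printed by truss as Err#10) the calling process has
no child processes at all, and when children exist but none has exited
yet it returns nothing. WTRAPPED is Solaris-specific and is left out. The
helper name reap_once and its return strings are hypothetical.

```python
import os
import time


def reap_once():
    """One non-blocking reap, shaped like the traced
    waitid(P_ALL, 0, ..., WEXITED|WTRAPPED|WNOHANG) call (minus WTRAPPED).

    Returns:
      "no-children"   if waitid fails with ECHILD (truss: Err#10 ECHILD),
      "none-exited"   if children exist but none has exited yet,
      the child's pid if a child was reaped.
    """
    try:
        info = os.waitid(os.P_ALL, 0, os.WEXITED | os.WNOHANG)
    except ChildProcessError:  # Python raises this for errno ECHILD
        return "no-children"
    if info is None:  # WNOHANG was given and no child is waitable yet
        return "none-exited"
    return info.si_pid


if __name__ == "__main__":
    # A freshly started script has not forked anything yet.
    print("before fork:", reap_once())
    pid = os.fork()
    if pid == 0:
        time.sleep(0.2)  # child: stand-in for a monitor run, then exit
        os._exit(0)
    status = reap_once()
    print("just after fork:", status)  # normally "none-exited": child still sleeping
    while status == "none-exited":  # poll again, like the loop in the trace
        time.sleep(0.05)
        status = reap_once()
    print("reaped child:", status == pid)
```

Running it shows the ECHILD case before the fork and a successful reap
once the child exits.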
I've installed the required Perl modules and generated the *.ph files.
The mon program itself compiles and passes a syntax check with
"perl -c", so I don't know what's left.
What else should I check?
I got all of Lycos (where I work) to switch from BB to Mon, but if this
shows up on other SPARC boxes when the rest of the organization
switches, it will look bad :(
--
Nate