Today, Leif Nixon wrote forth saying...

> When I got here today I was met by a somewhat irritated coworker who
> had shut down gmond (2.5.0) on our 200 node cluster due to excessive
> network and CPU utilization. I haven't looked at it yet, though.

let's find the source of the problem and make sure that it doesn't happen
again. can you do a "grep gmond /var/log/messages" on the machines and see
if there are any error messages?

if this problem happens again, can you do a "kill -11 <gmond pid>" to
generate a core file? (you might need to do a "ulimit -c unlimited" in
order to have gmond dump a core file.. do a "man ulimit" for details.)

you can also drop the line...

  debug_level 1

into /etc/gmond.conf to force gmond to not go into the background and to
print loads of debugging information to the terminal.

i don't want this bug in 2.5.1.. we'll stomp it out.

-matt
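
for convenience, the log check and the core dump steps above could be wrapped up roughly like this. it's only a sketch: the function names and the LOG variable are made up for illustration and are not part of ganglia. it also assumes that gmond inherits its core size limit from the shell that starts it, so "ulimit -c unlimited" has to be run in that shell before gmond is launched.

```shell
#!/bin/sh
# Sketch of the debugging steps described above.
# gmond_log_errors, gmond_dump_core and LOG are hypothetical names
# chosen for this example; they are not part of ganglia.

# System log to search; override LOG if your syslog writes elsewhere.
LOG="${LOG:-/var/log/messages}"

# Print every line in the system log that mentions gmond.
gmond_log_errors() {
    grep gmond "$LOG"
}

# Send signal 11 (SIGSEGV) to a running gmond so that it dumps core.
# Assumption: the core size limit was raised with "ulimit -c unlimited"
# in the shell that started gmond, since the limit is inherited at
# start time and cannot be changed from another shell afterwards.
gmond_dump_core() {
    pid="$1"
    kill -11 "$pid"
}
```

source the file, run "gmond_log_errors" on each node, and call "gmond_dump_core <gmond pid>" the next time gmond starts eating the network and CPU.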