Garrick Staples wrote:
On Fri, Sep 28, 2007 at 09:31:01AM +0200, Jan Ploski alleged:
Hello,
diagnose -n on my system gives the following message for quite a few
nodes:
WARNING: node 'node38' has been idle for 8:25:00 but load is HIGH. load:
3.020 (check for runaway processes?)
However, the node is running three jobs:
node38 Idle 4:4 7988:7988 1:1 15314:15314
1.00 linux [NONE] DEF 3.00 003 [dgiseq_4:4][verylong_4:4][sma [DEFAULT]
[dual][eth]
node38
state = free
PBS is reporting the node as "free", not "busy", and Maui is warning you
about that. It treats a "free" node with a high load average as a sign that
something is misconfigured. If everything is working fine, then just ignore
it.
If you want to fix it, then configure pbs_mom's $max_load and $ideal_load so
that nodes get reported as "busy".
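As a rough sketch only: both directives go in pbs_mom's config file
(mom_priv/config under the TORQUE home directory). The values below are
assumptions chosen for illustration, not recommendations for this cluster.
pbs_mom reports the node as "busy" once the load average exceeds $max_load
and clears that state when the load drops below $ideal_load. pbs_mom
normally has to be restarted or sent a HUP to re-read the file.

```
# mom_priv/config (illustrative values, assumed for a 4-slot node)
# High-water mark: node is reported "busy" above this load average
$max_load 4.0
# Low-water mark: "busy" is cleared once load drops below this
$ideal_load 3.0
```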
Thanks for the tips. The node was indeed "free" in the sense that only 3
out of 4 possible jobs were running on it (actual load < max_load == 4).
However, it is difficult to say whether the warning can or should be
ignored. In particular, I am concerned about the 4:4 indications, which I
suppose are "class initializers". If these were inconsistent with the
actual number of running jobs, then more than the configured number of
jobs would be able to start in the given class (if I understand the
concept correctly).
Best regards,
Jan Ploski