Garrick Staples wrote:
On Fri, Sep 28, 2007 at 09:31:01AM +0200, Jan Ploski alleged:
Hello,

diagnose -n on my system gives the following message for quite a few nodes:

WARNING: node 'node38' has been idle for 8:25:00 but load is HIGH. load: 3.020 (check for runaway processes?)

However, the node is running three jobs:

node38 Idle 4:4 7988:7988 1:1 15314:15314 1.00 linux [NONE] DEF 3.00 003 [dgiseq_4:4][verylong_4:4][sma [DEFAULT] [dual][eth] node38
     state = free

PBS is reporting the node as "free", not "busy", and maui is giving you a
warning about this.  It treats "free" combined with a high load average as a
sign that something is misconfigured.  If everything is working fine, then
just ignore it.

If you want to fix it, then configure pbs_mom's $max_load and $ideal_load so
that nodes get reported as "busy".
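
As a rough sketch, the two directives go into pbs_mom's config file
(mom_priv/config under the TORQUE home directory on the compute node). The
values below are assumptions picked only to show the syntax, not a
recommendation for this cluster. As far as I know, the mom reports "busy" once
the load average exceeds $max_load and goes back to "free" once it drops below
$ideal_load.

```
# mom_priv/config on the compute node (illustrative values, adjust per site)
# load below which the node is reported as "free" again
$ideal_load 2.0
# load above which the node is reported as "busy"
$max_load   2.5
```

pbs_mom needs to be restarted (or sent a HUP) to reread the file. Keep in mind
that a node reported as "busy" should not be handed further jobs, so thresholds
this low would also keep the fourth slot on a node like node38 from being used.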

Thanks for the tips. The node was indeed "free" in the sense that only 3 out of 4 possible jobs were running on it (actual load < max_load == 4). However, it is difficult to say whether the warning can or should be ignored. In particular, I am concerned about the 4:4 indications, which I suppose are "class initializers". If these were inconsistent with the actual number of running jobs, then more than the configured number of jobs could start in the given class (if I understand the concept correctly).

Best regards,
Jan Ploski
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
