Hi,

My problem is that less than half the capacity of our nodes is being used.

We have two 64-core nodes. At any one time fewer than ~50 jobs are scheduled.

Looking at a job that's queued I get:


$ checkjob 22360

checking job 22360

State: Idle
Creds:  user:bckhouse  group:minos  class:minos  qos:DEFAULT
WallTime: 00:00:00 of   INFINITY
SubmitTime: Tue May 14 06:11:10
   (Time Queued  Total: 5:24:43  Eligible: 5:24:43)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Dedicated Resources Per Task: PROCS: 1  MEM: 2000M

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

Reservation '22360' (00:00:00 ->   INFINITY  Duration:   INFINITY)
PE:  1.00  StartPriority:  324
job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 1)


So, the problem is there are no idle procs.

Checking one of the nodes:


$ checknode node078

checking node node078

State:      Busy  (in current state for 00:32:08)
Expected State:  Running   SyncDeadline: Tue May 14 11:37:10
Configured Resources: PROCS: 64  MEM: 252G  SWAP: 252G  DISK: 558G
Utilized   Resources: PROCS: 64  SWAP: 17G  DISK: 32M
Dedicated  Resources: PROCS: 24  MEM: 46G
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:      23.180
Network:    [DEFAULT]
Features:   [amd][MEM256G]
Attributes: [Batch]
Classes:    [batch 64:64][minos 40:64]

Total Time:   INFINITY  Up:   INFINITY (99.66%)  Active: 27:15:36:09 (20.67%)

<snip a bunch of reservations corresponding to the running jobs>


So it knows it has only dedicated 24 procs (one per running job), but it 
reports that utilization is the full 64.

This obviously isn't true.

node078:~$ uptime
  11:38:13 up 16:45,  2 users,  load average: 22.97, 23.24, 23.97

So the load reflects only the 24 jobs that are running, not 64. Likewise, 
the node looks half-loaded in ganglia.


So, how is the "utilized" number determined? The docs make it sound like 
it's just the load average, but that doesn't seem to be the case.
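
(A diagnostic aside, for anyone comparing: as I understand it, Maui takes 
its node utilization figures from whatever the resource manager reports 
rather than measuring anything itself. So, assuming TORQUE is the resource 
manager, it may be worth checking what pbs_mom itself claims for the node:

$ pbsnodes node078

and comparing the "status" line there (loadave, availmem, etc.) against 
what checknode shows, to see which side the bogus 64 comes from.)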


I set "NODEAVAILABILITYPOLICY DEDICATED:PROCS" in maui.cfg on the head 
node, in an attempt to get Maui to ignore the "utilized" number, but it 
doesn't seem to have had any effect.
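
To be explicit, the exact maui.cfg line (on the head node) is:

# per the Maui docs, base node availability on dedicated procs only,
# ignoring the utilization reported by the resource manager
NODEAVAILABILITYPOLICY  DEDICATED:PROCS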


Additional information: I can get the nodes full if I "surprise" them by 
submitting 128 jobs simultaneously. Those jobs run happily. That's the 
state I want to be able to obtain in the general case.

Watching the load in ganglia, it looks like the jobs drain out of the 
system as they complete, and then every 30 minutes the nodes are topped 
back up to the ~half-utilization level. My guess is that this is driven 
by the "defer" mechanism.
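
In case the 30-minute cycle is the clue, these are the maui.cfg 
parameters I understand to control that timing (the values shown are 
illustrative, not what we actually run):

RMPOLLINTERVAL  00:00:30   # how often maui re-polls the resource manager
DEFERTIME       0:30:00    # how long a deferred job waits before retry
DEFERCOUNT      24         # deferrals allowed before a job is held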

Sorry if this problem has been answered before. I found several 
instances of what sound like the same thing in the mail archives, but no 
solutions :(

Thanks - Chris
_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers