Dear David,

I'm sorry to bother you again with this issue, but the problem still exists.

Please have a look at this example:

- I submitted a job like this: 

qsub -q fwd -l nodes=1:ppn=4 -I -l walltime=12:00:00

- maui.log tells me that the job cannot be started:

03/21 14:30:11 MRMJobStart(784238,Msg,SC)
03/21 14:30:11 MPBSJobStart(784238,base,Msg,SC)
03/21 14:30:11 ERROR:    job '784238' cannot be started: (rc: 15046  errmsg: 
'Resource temporarily unavailable MSG=job allocation request exceeds currently 
available cluster nodes, 1 requested, 0 available'  hostlist: 'fluid001:ppn=4')
03/21 14:30:11 ERROR:    cannot start job '784238' in partition DEFAULT
03/21 14:30:11 MJobPReserve(784238,DEFAULT,ResCount,ResCountRej)
03/21 14:30:30 job '784238'  State:        Idle  EState:       Idle   
QueueTime:       Tue Mar 21 14:29:50

- checknode reports that this particular node has 16 CPU cores and thinks 
that 9 are in use:

checking node fluid001

State:   Running  (in current state for 00:00:00)
Expected State:     Idle   SyncDeadline: Sat Oct 24 14:26:40
Configured Resources: PROCS: 16  MEM: 62G  SWAP: 62G  DISK: 1M
Utilized   Resources: SWAP: 10G
Dedicated  Resources: PROCS: 9
Opsys:        ubuntu  Arch:         x64
Speed:      1.00  Load:      15.030
Network:    [DEFAULT]
Features:   [NONE]
Attributes: [Batch]
Classes:    [default 16:16][fwd 7:16][fwi 16:16][short 16:16][long 
16:16][benchmark 16:16][fwo 16:16]

Total Time:   INFINITY  Up:   INFINITY (98.92%)  Active:   INFINITY (93.87%)

Reservations:
  Job '772551'(x1)  -6:05:29:39 -> 2:02:30:21 (8:08:00:00)
  Job '772553'(x1)  -6:05:29:39 -> 2:02:30:21 (8:08:00:00)
  Job '772555'(x1)  -6:05:29:39 -> 2:02:30:21 (8:08:00:00)
  Job '772557'(x1)  -6:05:29:39 -> 2:02:30:21 (8:08:00:00)
  Job '779684'(x1)  -2:20:22:38 -> 5:11:37:22 (8:08:00:00)
  Job '779685'(x1)  -2:20:22:38 -> 5:11:37:22 (8:08:00:00)
  Job '781758'(x1)  -1:19:54:49 -> 6:12:05:11 (8:08:00:00)
  Job '783132'(x1)  -1:00:19:39 -> 7:07:40:21 (8:08:00:00)
  Job '783909'(x1)  -6:19:42 -> 8:01:40:18 (8:08:00:00)
  User 'fluid.0.0'(x1)  -00:03:52 ->   INFINITY (  INFINITY)
    Blocked Resources@00:00:00    Procs: 7/16 (43.75%)
    Blocked Resources@2:02:30:21  Procs: 11/16 (68.75%)
    Blocked Resources@5:11:37:22  Procs: 13/16 (81.25%)
    Blocked Resources@6:12:05:11  Procs: 14/16 (87.50%)
    Blocked Resources@7:07:40:21  Procs: 15/16 (93.75%)
    Blocked Resources@8:01:40:18  Procs: 16/16 (100.00%)
JobList:  772551,772553,772555,772557,779684,779685,781758,783132,783909

- with qstat I can see that there is only one free slot on the node and 15 are 
used by the jobs:

qstat -ae -n | grep fluid001
   fluid001/0
   fluid001/9
   fluid001/11
   fluid001/13
   fluid001/5,7
   fluid001/14-15
   fluid001/1
   fluid001/2-4,6
   fluid001/8,10

- The node has 9 running jobs, but Maui still misparses the allocation 
syntax: its Dedicated PROCS figure of 9 matches the number of jobs rather 
than the 15 cores actually allocated.
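For illustration, expanding the slot ranges from the qstat output above makes the mismatch visible: there are 9 allocation entries (matching Maui's Dedicated PROCS count), but the expanded ranges cover 15 cores. A minimal Python sketch, not Maui code; `expand_cores` is a hypothetical helper:

```python
def expand_cores(spec):
    """Expand a Torque-style core spec like '2-4,6' into [2, 3, 4, 6]."""
    cores = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores

# The per-job slot lists shown by qstat for fluid001 above:
allocations = ["0", "9", "11", "13", "5,7", "14-15", "1", "2-4,6", "8,10"]

used = sorted(c for spec in allocations for c in expand_cores(spec))
print(len(allocations), "jobs,", len(used), "cores in use")
# 9 jobs, 15 cores in use
```

A scheduler that treats each entry as a single processor instead of expanding the ranges would arrive at exactly the 9/16 figure checknode shows.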

Do I have to switch to a newer version of Torque? Currently I am using version 
5.1.1.

Thanks in advance,
Henrik

> On 22.08.2016 at 18:21, David Beer <db...@adaptivecomputing.com> wrote:
> 
> This incompatibility exists for all versions of Torque > 5. It has been fixed 
> in the Maui source, but no official release has been made. You can grab the 
> new source from svn:
> 
> svn co svn://opensvn.adaptivecomputing.com/maui
> 
> After that you can build it as you would a normal tarball.
> 
> On Sat, Aug 20, 2016 at 3:59 AM, Guangping Zhang <zgp...@126.com> wrote:
> Dear all,
> 
> I have found that Torque 6.0.2 does not work properly with Maui 3.3.1 from 
> time to time.
> 
> And I found in the log file of maui that
> 
> 08/20 17:14:12 INFO:     PBS node node04 set to state Idle (free)
> 08/20 17:14:12 INFO:     node 'node04' changed states from Running to Idle
> 08/20 17:14:12 MPBSNodeUpdate(node04,node04,Idle,NODE00)
> 08/20 17:14:12 INFO:     node node04 has joblist '0-9/248.node00'
> 08/20 17:14:12 ALERT:    cannot locate PBS job '0-9' (running on node node04)
> 
> where '0-9' is not a job but the allocated procs for job 248.node00. So, 
> will this prevent Torque from working well with Maui?
> 
> Thanks for your discussion.
> 
> /Guangping
> 
> 
> _______________________________________________
> torqueusers mailing list
> torqueus...@supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> 
> 
> 
> -- 
> David Beer | Torque Architect
> Adaptive Computing


_______________________________________________
mauiusers mailing list
mauiusers@supercluster.org
http://www.supercluster.org/mailman/listinfo/mauiusers
