On May 27, 2008, at 12:20 AM, Yuriy wrote:
Hi,
What determines the limit on the number of jobs that can be submitted
through a single <job> description?
There is no limit, other than what is available in the cluster/LRM
that GRAM is interfacing with.
We have a 10-node cluster with two quad-core processors per node, and
when the number of jobs is greater than 160 there seems to be an
increasing probability of getting the following error:
/bin/sh: /home/grid-bestgrid/.globus/90bbca80-2ba4-11dd-95fc-8fae74568b88/scheduler_pbs_cmd_script: No such file or directory
This error does not happen all the time, but the probability increases
as the number of jobs increases, and I haven't been able to trigger
this error with number of processes < number of cores * 2.
I've seen a similar error on TeraGrid's UC/ANL cluster, due to NFS
scalability issues between the cluster's head node and compute nodes.
What happens is that GRAM creates the scheduler_pbs_cmd_script for the
job. From the main (first) compute node allocated, GRAM will rsh to
each of the other compute nodes allocated and run that same command
script. When all compute nodes in the job access that file at the
same time, some fail even though the file is there. Maybe we need to
add some reliability in there to retry (since we know that script
should be there). Or maybe there is a better way to handle this
situation. I'll have to think about this some.
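One simple form that retry logic could take is to poll for the command script with backoff before giving up, to ride out NFS attribute-cache lag on the compute node. This is a hypothetical sketch in Python, not GRAM code; the function name, attempt count, and delay are illustrative:

```python
import os
import time

def wait_for_nfs_file(path, attempts=10, delay=0.5):
    """Poll for a file the submit side is known to have written,
    retrying with exponential backoff to ride out NFS cache lag."""
    for attempt in range(attempts):
        if os.path.exists(path):
            return True
        time.sleep(delay * (2 ** attempt))
    return False

# e.g. before each compute node runs the command script:
# if not wait_for_nfs_file(script_path):
#     raise RuntimeError("command script never became visible over NFS")
```

Since we know the script was created before the job started, retrying a few times is safe; the failure mode is just a delayed start rather than a spurious "No such file or directory".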
One place to look to improve this would be to optimize your NFS
configuration/setup. I am not an expert here and cannot offer much
advice, but it would be good to have some helpful hints on this for
people who run into it. So please share any information if you find
ways to improve the situation.
At what scale do problems occur with this? By that I mean, how many
PBS processes/nodes are trying to access that file at (nearly) the
same time when errors begin to occur?
Also, the <count> tag seems to have no effect on the number of jobs
executed, other than that if it is equal to one, all jobs execute on a
single node.
Here is the 4.0 doc on extension handling:
http://www.globus.org/toolkit/docs/4.0/execution/wsgram/WS_GRAM_Job_Desc_Extensions.html#r-wsgram-extensions-constructs
This is not well documented, but for PBS, when the
resourceAllocationGroup extension is used, count is ignored, as you
can see in the if-then-else below that chooses what sets the PBS
nodes directive.
From PBS.pm >>>>>>>>
if (defined $description->nodes())
{
    # Generated by ExtensionsHandler.pm from resourceAllocationGroup elements
    print JOB '#PBS -l nodes=', $description->nodes(), "\n";
}
elsif ($description->host_count() != 0)
{
    print JOB '#PBS -l nodes=', $description->host_count(), "\n";
}
elsif ($cluster && $cpu_per_node != 0)
{
    print JOB '#PBS -l nodes=',
        myceil($description->count() / $cpu_per_node), "\n";
}
<<<<<<<<
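The precedence in that if/elsif chain can be sketched in Python as follows (a minimal illustration of the Perl logic above, not GRAM code; the function and parameter names are mine):

```python
import math

def pbs_nodes_directive(nodes=None, host_count=0, cluster=True,
                        cpu_per_node=0, count=1):
    """A nodes value derived from resourceAllocationGroup wins, then
    hostCount; count is consulted only when neither is set."""
    if nodes is not None:
        return nodes                                  # from resourceAllocationGroup
    elif host_count != 0:
        return host_count
    elif cluster and cpu_per_node != 0:
        return math.ceil(count / cpu_per_node)        # the myceil() branch
    return None

# With the extension present, count=200 is ignored:
print(pbs_nodes_directive(nodes=10, count=200))        # 10
# Without it, count drives the node request:
print(pbs_nodes_directive(cpu_per_node=8, count=200))  # 25
```

This is why setting <count> alongside a resourceAllocationGroup, as in the job description below, has no visible effect.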
Example job description:
<job>
  <factoryEndpoint
      xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
      xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>
      https://ng2test.auckland.ac.nz:8443/wsrf/services/ManagedJobFactoryService
    </wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>PBS</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <executable>/bin/hostname</executable>
  <count>200</count>
  <queue>[EMAIL PROTECTED]</queue>
  <jobType>multiple</jobType>
  <extensions>
    <resourceAllocationGroup>
      <hostCount>10</hostCount>
      <cpusPerHost>8</cpusPerHost>
      <processCount>162</processCount>
    </resourceAllocationGroup>
  </extensions>
</job>
For MPI jobs the limit seems to be 20 * the number of cores; for a
larger number of processes I see errors like this:
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[compute-1.local:23438] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[compute-1.local:23438] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1164
[compute-1.local:23438] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpiexec noticed that job rank 8 with PID 22257 on node compute-10 exited on signal 15 (Terminated).
[compute-1.local:23438] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[compute-1.local:23438] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1196
--------------------------------------------------------------------------
Again, this does not happen all the time.
Example job description:
<job>
  <factoryEndpoint
      xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
      xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <wsa:Address>
      https://ng2test.auckland.ac.nz:9443/wsrf/services/ManagedJobFactoryService
    </wsa:Address>
    <wsa:ReferenceProperties>
      <gram:ResourceID>PBS</gram:ResourceID>
    </wsa:ReferenceProperties>
  </factoryEndpoint>
  <executable>test</executable>
  <directory>/home/grid-bestgrid/MPI/</directory>
  <queue>[EMAIL PROTECTED]</queue>
  <jobType>mpi</jobType>
  <extensions>
    <resourceAllocationGroup>
      <hostCount>5</hostCount>
      <cpusPerHost>8</cpusPerHost>
      <processCount>900</processCount>
    </resourceAllocationGroup>
  </extensions>
</job>
Can anyone explain what is going on here?
Regards,
Yuriy