...and your version of Slurm?
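
Something like this should show it, along with every launcher-related
variable in the environment (sinfo -V prints the Slurm version, and srun
--version works too):

$ sinfo -V
$ env | grep -E '^(SLURM|MOAB|PBS)' | sort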

On Feb 12, 2014, at 7:19 AM, Ralph Castain <r...@open-mpi.org> wrote:

> What is your SLURM_TASKS_PER_NODE?
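> 
> That is, from inside the allocation:
> 
> $ echo "$SLURM_TASKS_PER_NODE"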
> 
> On Feb 12, 2014, at 6:58 AM, Adrian Reber <adr...@lisas.de> wrote:
> 
>> No, the system has only a few MOAB_* variables and many SLURM_*
>> variables:
>> 
>> $BASH             $IFS              $SECONDS                     $SLURM_PTY_PORT
>> $BASHOPTS         $LINENO           $SHELL                       $SLURM_PTY_WIN_COL
>> $BASHPID          $LINES            $SHELLOPTS                   $SLURM_PTY_WIN_ROW
>> $BASH_ALIASES     $MACHTYPE         $SHLVL                       $SLURM_SRUN_COMM_HOST
>> $BASH_ARGC        $MAILCHECK        $SLURMD_NODENAME             $SLURM_SRUN_COMM_PORT
>> $BASH_ARGV        $MOAB_CLASS       $SLURM_CHECKPOINT_IMAGE_DIR  $SLURM_STEPID
>> $BASH_CMDS        $MOAB_GROUP       $SLURM_CONF                  $SLURM_STEP_ID
>> $BASH_COMMAND     $MOAB_JOBID       $SLURM_CPUS_ON_NODE          $SLURM_STEP_LAUNCHER_PORT
>> $BASH_LINENO      $MOAB_NODECOUNT   $SLURM_DISTRIBUTION          $SLURM_STEP_NODELIST
>> $BASH_SOURCE      $MOAB_PARTITION   $SLURM_GTIDS                 $SLURM_STEP_NUM_NODES
>> $BASH_SUBSHELL    $MOAB_PROCCOUNT   $SLURM_JOBID                 $SLURM_STEP_NUM_TASKS
>> $BASH_VERSINFO    $MOAB_SUBMITDIR   $SLURM_JOB_CPUS_PER_NODE     $SLURM_STEP_TASKS_PER_NODE
>> $BASH_VERSION     $MOAB_USER        $SLURM_JOB_ID                $SLURM_SUBMIT_DIR
>> $COLUMNS          $OPTERR           $SLURM_JOB_NODELIST          $SLURM_SUBMIT_HOST
>> $COMP_WORDBREAKS  $OPTIND           $SLURM_JOB_NUM_NODES         $SLURM_TASKS_PER_NODE
>> $DIRSTACK         $OSTYPE           $SLURM_LAUNCH_NODE_IPADDR    $SLURM_TASK_PID
>> $EUID             $PATH             $SLURM_LOCALID               $SLURM_TOPOLOGY_ADDR
>> $GROUPS           $POSIXLY_CORRECT  $SLURM_NNODES                $SLURM_TOPOLOGY_ADDR_PATTERN
>> $HISTCMD          $PPID             $SLURM_NODEID                $SRUN_DEBUG
>> $HISTFILE         $PS1              $SLURM_NODELIST              $TERM
>> $HISTFILESIZE     $PS2              $SLURM_NPROCS                $TMPDIR
>> $HISTSIZE         $PS4              $SLURM_NTASKS                $UID
>> $HOSTNAME         $PWD              $SLURM_PRIO_PROCESS          $_
>> $HOSTTYPE         $RANDOM           $SLURM_PROCID
>> 
>> 
>> 
>> On Wed, Feb 12, 2014 at 06:12:45AM -0800, Ralph Castain wrote:
>>> Seems rather odd: since this is managed by Moab, you shouldn't be seeing
>>> SLURM envars at all. What you should see are PBS_* envars, including a
>>> PBS_NODEFILE that actually contains the allocation.
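>>> 
>>> If the Moab wrapper were doing its usual thing, a quick check would
>>> confirm it (PBS_NODEFILE is the standard PBS/TORQUE variable, one
>>> hostname per slot):
>>> 
>>> $ echo $PBS_NODEFILE
>>> $ cat $PBS_NODEFILE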
>>> 
>>> 
>>> On Feb 12, 2014, at 4:42 AM, Adrian Reber <adr...@lisas.de> wrote:
>>> 
>>>> I tried the nightly snapshot (openmpi-1.7.5a1r30692.tar.gz) on a system
>>>> with Slurm and Moab. I requested an interactive session using:
>>>> 
>>>> msub -I -l nodes=3:ppn=8
>>>> 
>>>> and started a simple test case which fails:
>>>> 
>>>> $ mpirun -np 2 ./mpi-test 1
>>>> --------------------------------------------------------------------------
>>>> There are not enough slots available in the system to satisfy the 2 slots
>>>> that were requested by the application:
>>>>   ./mpi-test
>>>> 
>>>> Either request fewer slots for your application, or make more slots available
>>>> for use.
>>>> --------------------------------------------------------------------------
>>>> srun: error: xxxx108: task 1: Exited with exit code 1
>>>> srun: Terminating job step 131823.4
>>>> srun: error: xxxx107: task 0: Exited with exit code 1
>>>> srun: Job step aborted
>>>> slurmd[xxxx108]: *** STEP 131823.4 KILLED AT 2014-02-12T13:30:32 WITH SIGNAL 9 ***
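>>>> 
>>>> If it helps, I can rerun with the allocation display turned on; this is
>>>> what I would try next (assuming these options behave in 1.7.5 as
>>>> documented):
>>>> 
>>>> $ mpirun --display-allocation -np 2 ./mpi-test 1
>>>> $ mpirun --mca ras_base_verbose 10 -np 2 ./mpi-test 1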
>>>> 
>>>> 
>>>> Requesting only one core works:
>>>> 
>>>> $ mpirun  ./mpi-test 1
>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 1: 0.000000
>>>> 
>>>> 
>>>> Using openmpi-1.6.5 works with multiple cores:
>>>> 
>>>> $ mpirun -np 24 ./mpi-test 2
>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 0 on xxxx106 out of 24: 0.000000
>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 12 on xxxx106 out of 24: 12.000000
>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 11 on xxxx108 out of 24: 11.000000
>>>> 4.4.7 20120313 (Red Hat 4.4.7-4):Process 18 on xxxx106 out of 24: 18.000000
>>>> 
>>>> $ echo $SLURM_JOB_CPUS_PER_NODE 
>>>> 8(x3)
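>>>> 
>>>> If I read the Slurm docs right, 8(x3) is compressed notation for 8 CPUs
>>>> on each of 3 nodes, i.e. 24 slots in total, which matches the -np 24 run
>>>> above. A small bash sketch that expands the compressed counts (assuming
>>>> only the N(xM) form shown here):
>>>> 
>>>> for tok in ${SLURM_JOB_CPUS_PER_NODE//,/ }; do  # entries are comma-separated
>>>>   if [[ $tok == *"(x"* ]]; then
>>>>     n=${tok%%(*}            # CPUs per node, e.g. 8
>>>>     r=${tok##*x}; r=${r%)}  # repeat count, e.g. 3
>>>>     for ((i = 0; i < r; i++)); do echo "$n"; done
>>>>   else
>>>>     echo "$tok"             # plain count, no repeat suffix
>>>>   fi
>>>> done
>>>> 
>>>> which should print 8 three times for this allocation.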
>>>> 
>>>> I have never used Slurm before, so this could also be a user error on my
>>>> side. But since 1.6.5 works, it seems something has changed, and I wanted
>>>> to let you know in case it was not intentional.
>>>> 
>>>>            Adrian
>>> 
>> 
>>              Adrian
>> 
>> -- 
>> Adrian Reber <adr...@lisas.de>            http://lisas.de/~adrian/
>> "Let us all bask in television's warm glowing warming glow." -- Homer Simpson
> 
