Hi Team,

I am using QLogic InfiniBand with Open MPI 1.5.3. I can run jobs from the
command line without any issues, but when I submit them through the Torque
scheduler I get the error below. The full error file is attached.

*Overview of the problem:*

node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Failure in initializing endpoint
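
The same binary runs fine interactively, so the link itself appears to be
active. As a basic sanity check on the compute node, something like the
following could confirm the device nodes and the port state (ibstat comes
from infiniband-diags; these commands are only a sketch, not taken from the
error output):

    ls -l /dev/ipath*        # device nodes should exist and be readable
    ibstat | grep -i state   # port state should report "Active"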



I went through the suggestion at
http://www.open-mpi.org/community/lists/users/2011/12/17888.php and followed
it, but with no luck.

I exported the variable in my submit script (export
PSM_SHAREDCONTEXTS_MAX=16) and resubmitted the job.

The submit script is:

#!/bin/bash
#PBS -N matmul
#PBS -l nodes=1:ppn=1

# Total process count = nodes x processes-per-node
node=1
ppn=1
nprocs=`expr ${node} \* ${ppn}`

echo "--- PBS_NODEFILE CONTENT ---"
cat $PBS_NODEFILE

# Workaround suggested on the Open MPI list for the PSM endpoint error
export PSM_SHAREDCONTEXTS_MAX=16

mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter
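
In case the variable set inside the script does not propagate to the MPI
processes, I understand it can also be exported through mpirun itself with
the -x option (this variant is only a sketch; I have not confirmed it fixes
the problem):

    mpirun -x PSM_SHAREDCONTEXTS_MAX=16 -np ${nprocs} \
        --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter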



Please let me know whether I am doing this correctly, and suggest how best to
proceed.

Regards,

Bhagya Raju K

--- ATTACHED ERROR FILE ---
node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Failure in initializing endpoint
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[node1.ibab.ac.in:5910] Abort before MPI_INIT completed successfully; not able 
to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 5910 on
node node1.ibab.ac.in exiting improperly. There are two reasons this could 
occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[node1.ibab.ac.in:05909] 1 more process has sent help message help-mtl-psm.txt 
/ unable to open endpoint
[node1.ibab.ac.in:05909] Set MCA parameter "orte_base_help_aggregate" to 0 to 
see all help / error messages
