Hi Jeffrey,

Thanks for that; I will contact them. As I mentioned earlier, the Open MPI developers suggested that we need to set a value for PSM_SHAREDCONTEXTS_MAX="some value".
I put it in the input file as export PSM_SHAREDCONTEXTS_MAX=16. Please correct me if it should be done some other way.

Regards,
Raju

On Thu, Mar 29, 2012 at 8:58 PM, Jeffrey Squyres <jsquy...@cisco.com> wrote:

> This looks like a PSM problem (PSM is the layer that runs below Open MPI
> on QLogic NICs). You might need to contact QLogic tech support to find out
> how to solve it.
>
>
> On Mar 29, 2012, at 11:26 AM, Raju wrote:
>
> > Hi Ralph,
> >
> > I recompiled OMPI with the --with-tm option, but I still see the same
> > issue. I changed the input file as below. Please let me know what I have
> > to fine-tune and verify.
> >
> > #!/bin/bash
> > #PBS -N matmul
> > #PBS -l nodes=1:ppn=1
> > node=1
> > ppn=1
> > nprocs=`expr ${node} \* ${ppn}`
> > export PSM_SHAREDCONTEXTS_MAX=16
> >
> > mpirun -np ${nprocs} /home/khan/a.out < /home/khan/iter
> >
> > Regards,
> > Raju
> >
> > On Thu, Mar 29, 2012 at 8:49 PM, Raju <bra...@gmail.com> wrote:
> > Hi Ralph,
> >
> > Thanks for the very quick response. I am recompiling with the --with-tm
> > option now; once it is done I will get back to you.
> >
> > Thanks,
> > Raju
> >
> >
> > On Thu, Mar 29, 2012 at 8:29 PM, Ralph Castain <r...@open-mpi.org> wrote:
> > One thing stands out right away: why are you specifying a hostfile? Did
> > you remember to configure OMPI with --with-tm so we launch via Torque? If
> > not, then you could hit issues as you are actually attempting to launch via
> > ssh, which has implications on a Torque-based system.
> >
> >
> > On Mar 29, 2012, at 8:51 AM, Raju wrote:
> >
> >> Hi Team,
> >>
> >> I am using QLogic InfiniBand and Open MPI 1.5.3. I am able to run jobs
> >> from the command line without any issues, but when I submit them through
> >> the Torque scheduler I hit the issue below.
> >>
> >> I am facing an issue while submitting jobs through the Torque scheduler.
> >> The error file is attached.
> >>
> >> Overview of the problem:
> >>
> >> node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
> >> --------------------------------------------------------------------------
> >> PSM was unable to open an endpoint. Please make sure that the network link is
> >> active on the node and the hardware is functioning.
> >>
> >> Error: Failure in initializing endpoint
> >>
> >> I went through the link
> >> http://www.open-mpi.org/community/lists/users/2011/12/17888.php for a
> >> solution and followed it, but had no luck.
> >>
> >> I exported the value in my submit script as export
> >> PSM_SHAREDCONTEXTS_MAX=16 and submitted the job.
> >>
> >> The sample input file is:
> >>
> >> #!/bin/bash
> >> #PBS -N matmul
> >> #PBS -l nodes=1:ppn=1
> >> node=1
> >> ppn=1
> >> nprocs=`expr ${node} \* ${ppn}`
> >> echo "--- PBS_NODEFILE CONTENT ---"
> >> cat $PBS_NODEFILE
> >> export PSM_SHAREDCONTEXTS_MAX=16
> >>
> >> mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter
> >>
> >> Please let me know whether I am doing this correctly, and suggest the
> >> best approach.
> >>
> >> Regards,
> >> Bhagya Raju K
> >> <errfile.txt>_______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
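
For reference, here is a minimal sketch of how the submit script discussed in this thread might look once the hostfile is dropped, as Ralph suggested. It is not a confirmed fix for the PSM endpoint error. The value 16 is simply the one used above, and whether it suits a given node is an assumption. The -x option is the mpirun flag for exporting an environment variable to the launched processes. The script only prints the command it would run, so it can be tried outside a Torque job.

```shell
#!/bin/bash
#PBS -N matmul
#PBS -l nodes=1:ppn=1

# Value taken from this thread; the right number for a given node is an assumption.
export PSM_SHAREDCONTEXTS_MAX=16

# Under Torque, $PBS_NODEFILE holds one line per allocated slot.
# Fall back to a single process when run outside a job (e.g. for a dry run).
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE}" ]; then
    nprocs=$(wc -l < "${PBS_NODEFILE}" | tr -d '[:space:]')
else
    nprocs=1
fi

# -x asks mpirun to export the named variable to the launched processes.
# No --hostfile here: with a --with-tm build, mpirun gets the nodes from Torque.
cmd="mpirun -np ${nprocs} -x PSM_SHAREDCONTEXTS_MAX /home/khan/a.out"

# Dry run: print what would be launched.
echo "PSM_SHAREDCONTEXTS_MAX=${PSM_SHAREDCONTEXTS_MAX}"
echo "${cmd}"

# To launch for real, replace the two echo lines with:
#   ${cmd} < /home/khan/iter
```

Whether the installed Open MPI build actually picked up Torque support can usually be checked by looking for tm components in the output of ompi_info (for example, ompi_info | grep tm).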