[OMPI devel] opal/mca/common: you can remove this directory
FYI: The opal/mca/common directory had been functionally empty for a while, so I "svn rm"'ed it last week or so. However, if you svn up, SVN will likely still leave that directory around because it probably contains a Makefile and Makefile.in. It is safe to rm -rf this entire tree and then re-autogen / configure / make.

-- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
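The cleanup described above could be sketched roughly as follows. The function name remove_stale_common is made up for illustration, and the name of the autogen script (autogen.sh or autogen.pl) depends on the branch you have checked out, so treat that part as an assumption.

```shell
#!/bin/sh
# remove_stale_common CHECKOUT_DIR
# Hypothetical helper: deletes the leftover opal/mca/common tree that
# "svn up" may leave behind because of generated Makefile/Makefile.in files.
remove_stale_common() {
    # Refuse to run without an argument so we never touch /opal/mca/common.
    [ -n "$1" ] || { echo "usage: remove_stale_common CHECKOUT_DIR" >&2; return 1; }
    tree="$1/opal/mca/common"
    if [ -d "$tree" ]; then
        rm -rf "$tree"
        echo "removed $tree"
    else
        echo "nothing to remove at $tree"
    fi
}

# Afterwards, from the top of the checkout (script name is an assumption):
#   ./autogen.sh && ./configure && make
```

Calling remove_stale_common with the path to your working copy is enough; the rebuild steps are left as comments because configure options differ from site to site.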
[OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
Hi Team,

I am using QLogic InfiniBand and Open MPI 1.5.3. I can run jobs from the command line without any issues, but when I submit them through the Torque scheduler I hit the error below. The error file is attached.

*Overview of the problem:*

node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
--
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

Error: Failure in initializing endpoint

I went through http://www.open-mpi.org/community/lists/users/2011/12/17888.php for a solution and followed it, but with no luck.

I exported the value in my submit script as export PSM_SHAREDCONTEXTS_MAX=16 and submitted the job.

The sample input file is:

#!/bin/bash
#PBS -N matmul
#PBS -l nodes=1:ppn=1
node=1
ppn=1
nprocs=`expr ${node} \* ${ppn}`
echo "--- PBS_NODEFILE CONTENT ---"
cat $PBS_NODEFILE
export PSM_SHAREDCONTEXTS_MAX=16

mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out < /home/khan/iter

Please let me know whether I am doing this correctly, and advise on the best approach.

Regards,
Bhagya Raju K

node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
--
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

Error: Failure in initializing endpoint
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Error" (-1) instead of "Success" (0)
--
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[node1.ibab.ac.in:5910] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--
mpirun has exited due to process rank 0 with PID 5910 on
node node1.ibab.ac.in exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
[node1.ibab.ac.in:05909] 1 more process has sent help message help-mtl-psm.txt / unable to open endpoint
[node1.ibab.ac.in:05909] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
One thing stands out right away: why are you specifying a hostfile? Did you remember to configure OMPI with --with-tm so we launch via Torque? If not, then you could hit issues as you are actually attempting to launch via ssh, which has implications on a Torque-based system.

On Mar 29, 2012, at 8:51 AM, Raju wrote:

> mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out <
> /home/khan/iter
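One way to check whether an existing installation was built with Torque (tm) support is to look for tm components in the output of ompi_info. The sketch below assumes the components are listed on lines of the form "MCA plm: tm" or "MCA ras: tm"; the helper name has_tm_support is hypothetical.

```shell
#!/bin/sh
# has_tm_support TEXT
# Hypothetical helper: prints "yes" if TEXT (the captured output of
# ompi_info) lists a tm component for plm or ras, otherwise prints "no".
# Assumption: components appear on lines such as "MCA ras: tm (...)".
has_tm_support() {
    if printf '%s\n' "$1" | grep -Eq 'MCA (plm|ras): tm'; then
        echo yes
    else
        echo no
    fi
}

# Typical use on a node where ompi_info is in PATH:
#   has_tm_support "$(ompi_info)"
```

If this prints "no", the build was most likely configured without --with-tm and mpirun will fall back to another launcher such as ssh.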
Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
Hi Ralph,

Thanks for the very quick response. I am recompiling with the --with-tm option now; once it is done I will report back.

Thanks
Raju

On Thu, Mar 29, 2012 at 8:29 PM, Ralph Castain wrote:

> One thing stands out right away: why are you specifying a hostfile? Did
> you remember to configure OMPI with --with-tm so we launch via Torque? If
> not, then you could hit issues as you are actually attempting to launch via
> ssh, which has implications on a Torque-based system.
Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
Hi Ralph,

I recompiled OMPI with the --with-tm option, but I still see the same issue. I changed the input file as shown below. Please let me know what I should tune and verify.

#!/bin/bash
#PBS -N matmul
#PBS -l nodes=1:ppn=1
node=1
ppn=1
nprocs=`expr ${node} \* ${ppn}`
export PSM_SHAREDCONTEXTS_MAX=16

mpirun -np ${nprocs} /home/khan/a.out < /home/khan/iter

Regards,
Raju

On Thu, Mar 29, 2012 at 8:49 PM, Raju wrote:

> Thanks for the very quick response
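The job script above hard-codes node and ppn and multiplies them by hand. As a rough alternative, the process count can be derived from the node file that Torque provides. This is only a sketch: count_slots is a hypothetical helper, and it assumes $PBS_NODEFILE contains one line per allocated slot.

```shell
#!/bin/sh
# count_slots FILE
# Hypothetical helper: prints the number of non-empty lines in FILE.
# Assumption: Torque writes one line per allocated slot to $PBS_NODEFILE.
count_slots() {
    grep -c . "$1"
}

# Inside a job script this could replace the node/ppn arithmetic:
#   nprocs=$(count_slots "$PBS_NODEFILE")
#   mpirun -np "$nprocs" /home/khan/a.out < /home/khan/iter
```

With a tm-enabled build, mpirun can usually also be started without -np, in which case it uses the slots of the Torque allocation; the explicit count is mainly useful for logging or sanity checks.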
Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
This looks like a PSM problem (PSM is the layer that runs below Open MPI on QLogic NICs). You might need to contact QLogic tech support to find out how to solve it.

On Mar 29, 2012, at 11:26 AM, Raju wrote:

> mpirun -np ${nprocs} /home/khan/a.out < /home/khan/iter

-- Jeff Squyres jsquy...@cisco.com
Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
Hi Jeffrey,

Thanks for that, I will contact them. As I mentioned earlier, the Open MPI developers suggested a solution: set PSM_SHAREDCONTEXTS_MAX to some value.

I set it in my input file with export PSM_SHAREDCONTEXTS_MAX=16. Please correct me if I should set it some other way.

Regards
Raju

On Thu, Mar 29, 2012 at 8:58 PM, Jeffrey Squyres wrote:

> You might need to contact QLogic tech support to find out
> how to solve it.
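On the question of how to set the variable: besides exporting it in the job script, mpirun has a -x option that forwards an environment variable to the launched processes. The sketch below is a small diagnostic, not a fix for the PSM error itself; show_psm_contexts is a hypothetical helper that reports what a process actually sees.

```shell
#!/bin/sh
# show_psm_contexts
# Hypothetical diagnostic: prints the value of PSM_SHAREDCONTEXTS_MAX as
# seen by the current process, or "unset" if it is not in the environment.
show_psm_contexts() {
    echo "PSM_SHAREDCONTEXTS_MAX=${PSM_SHAREDCONTEXTS_MAX:-unset}"
}

# In a job script, the variable can be forwarded explicitly with -x:
#   export PSM_SHAREDCONTEXTS_MAX=16
#   mpirun -x PSM_SHAREDCONTEXTS_MAX -np 1 /home/khan/a.out < /home/khan/iter
#
# To see what a launched process receives, run a one-line shell command
# under mpirun instead of the application:
#   mpirun -x PSM_SHAREDCONTEXTS_MAX -np 1 \
#       sh -c 'echo "PSM_SHAREDCONTEXTS_MAX=${PSM_SHAREDCONTEXTS_MAX:-unset}"'
```

If the diagnostic prints "unset" under Torque but the expected value on the command line, the variable is not reaching the MPI processes, which would be worth mentioning when contacting QLogic.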
Re: [OMPI devel] Openmpi-1.5.3 issue " initialization failure on /dev/ipath (err=23)"
I didn't realize from your text that the SHAREDCONTEXTS_MAX value made it work. If so, I would assume that is a good solution. But I don't know for sure; you might well need to contact QLogic and ask.

On Mar 29, 2012, at 11:34 AM, Raju wrote:

> PSM_SHAREDCONTEXTS_MAX="some value"

-- Jeff Squyres jsquy...@cisco.com