Fully expected - if ORTE can’t start one or more daemons, then the MPI job itself will never be executed.
There was an SGE integration issue in the 2.0 series - I fixed it, but IIRC it didn’t quite make the 2.0.2 release. In fact, I just checked and it did indeed miss that release.

You have three choices:

1. you could apply the patch to the 2.0.2 source code yourself - it is at https://github.com/open-mpi/ompi/pull/3162

2. download a copy of the latest nightly 2.0.3 tarball - hasn’t been officially released yet, but includes the patch

3. upgrade to the nightly 2.1.1 tarball - expected to be officially released soon and also includes the patch

Hopefully, one of those options will fix the problem.
Ralph

> On Apr 19, 2017, at 4:57 PM, Kevin Buckley
> <kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:
>
> On 19 April 2017 at 18:35, Kevin Buckley
> <kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:
>
>> If I compile against 2.0.2 the same command works at the command line
>> but not in the "SGE" job submission, where I see a complaint about
>>
>> =================================
>> Host key verification failed.
>> --------------------------------------------------------------------------
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>> .... blah, blah, blah ...
>> =================================
>
> Just to add that if I add in some basic debugging
>
>   --mca btl_base_verbose 30
>
> then when running at the command line, I get a swathe of info
> from the MCA, however within the SGE environment, I still only
> get the "ORTE was unable .." message ?
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel