Fully expected - if ORTE can’t start one or more daemons, then the MPI job 
itself will never be executed.

There was an SGE integration issue in the 2.0 series - I fixed it, but IIRC it 
didn’t quite make the 2.0.2 release. In fact, I just checked and it did indeed 
miss that release.

You have three choices:

1. you could apply the patch to the 2.0.2 source code yourself - it is at 
https://github.com/open-mpi/ompi/pull/3162

2. download a copy of the latest nightly 2.0.3 tarball - hasn’t been officially 
released yet, but includes the patch

3. upgrade to the nightly 2.1.1 tarball - expected to be officially released 
soon and also includes the patch
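
For option 1, a rough sketch of fetching and applying the patch might look 
like the following. It relies on GitHub serving a plain-text patch when 
".patch" is appended to a pull request URL. The function names and the 
./openmpi-2.0.2 directory are only illustrative assumptions, and I have not 
verified that the patch applies cleanly to the 2.0.2 tarball, hence the dry 
run first.

```shell
#!/bin/sh
# Sketch only: function names and directory layout are assumptions.

# Build the plain-text patch URL for a GitHub pull request
# (GitHub serves a patch when ".patch" is appended to the PR URL).
pr_patch_url() {
    printf '%s.patch\n' "$1"
}

# Download the patch and apply it to an unpacked source tree.
#   $1 = pull request URL
#   $2 = path to the unpacked source tree (e.g. ./openmpi-2.0.2, hypothetical)
apply_pr_patch() {
    url=$(pr_patch_url "$1")
    srcdir=$2
    curl -fsSL -o "$srcdir/pr.patch" "$url" || return 1
    # Check that the patch applies cleanly before touching any files.
    (cd "$srcdir" && patch -p1 --dry-run < pr.patch) || return 1
    (cd "$srcdir" && patch -p1 < pr.patch)
}

# Example invocation (not run here):
#   apply_pr_patch https://github.com/open-mpi/ompi/pull/3162 ./openmpi-2.0.2
```

After patching you would need to rebuild and reinstall Open MPI from that 
tree before resubmitting the SGE job.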

Hopefully, one of those options will fix the problem.
Ralph


> On Apr 19, 2017, at 4:57 PM, Kevin Buckley 
> <kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:
> 
> On 19 April 2017 at 18:35, Kevin Buckley
> <kevin.buckley.ecs.vuw.ac...@gmail.com> wrote:
> 
>> If I compile against 2.0.2 the same command works at the command line
>> but not in the "SGE" job submission, where I see a complaint about
>> 
>> =================================
>> Host key verification failed.
>> --------------------------------------------------------------------------
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>> .... blah, blah, blah ...
>> =================================
> 
> Just to add that if I add in some basic debugging
> 
> --mca btl_base_verbose 30
> 
> then when running at the command line, I get a swathe of info
> from the MCA, however within the SGE environment, I still only
> get the "ORTE was unable .." message ?
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
