By the way, there was a change between 2.x and 3.0.x:

2.x:

Hello, world, I am 0 of 1, (Open MPI v2.1.2a1, package: Open MPI bbarrett@ip-172-31-64-10 Distribution, ident: 2.1.2a1, repo rev: v2.1.1-59-gdc049e4, Unreleased developer copy, 148)
Hello, world, I am 0 of 1, (Open MPI v2.1.2a1, package: Open MPI bbarrett@ip-172-31-64-10 Distribution, ident: 2.1.2a1, repo rev: v2.1.1-59-gdc049e4, Unreleased developer copy, 148)


3.0.x:

% srun -n 2 ./hello_c
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[ip-172-31-64-100:72545] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[ip-172-31-64-100:72546] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: ip-172-31-64-100: tasks 0-1: Exited with exit code 1
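
For reference, the test program is presumably along the lines of the hello_c example shipped in Open MPI's examples/ directory; here is a minimal sketch that matches the 2.x output above (the exact printf format is inferred from that output, not confirmed):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, len;
    char version[MPI_MAX_LIBRARY_VERSION_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Produces the "(Open MPI v2.1.2a1, package: ..., 148)" text above;
       the trailing number is the version string length */
    MPI_Get_library_version(version, &len);
    printf("Hello, world, I am %d of %d, (%s, %d)\n", rank, size, version, len);
    MPI_Finalize();
    return 0;
}

Note that under 2.x each srun task came up as an MPI singleton (hence "0 of 1" printed twice), whereas 3.0.x aborts in MPI_Init instead.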

I don’t think it really matters, since v2.x probably wasn’t what the customer wanted anyway.

Brian

On Jun 19, 2017, at 7:18 AM, Howard Pritchard <hpprit...@gmail.com> wrote:

Hi Ralph

I think the alternative you mention below should suffice.

Howard

r...@open-mpi.org <r...@open-mpi.org> wrote on Mon., Jun 19, 2017 at 07:24:
So what you guys want is for me to detect that no opal/pmix framework components could run, detect that we are in a SLURM job, and then print an error message saying “hey dummy - you didn’t configure us with SLURM PMI support”?

It means embedding SLURM job detection code in the heart of ORTE (as opposed to in a component), which bothers me a bit.

As an alternative, what if I print a generic “you didn’t configure us with PMI support for this environment” message instead of the “pmix select failed” one? I can explain how to configure the support in general terms, and it avoids having to embed SLURM detection into ORTE outside of a component.
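
Something like the following, perhaps (a rough sketch only, riding on the existing opal_show_help() mechanism; the help file name, topic name, and the exact spot where the selection failure is caught are all hypothetical placeholders):

/* Rough sketch: if no opal/pmix component can be selected, emit a
 * generic "no PMI support for this environment" message instead of
 * the bare "pmix select failed" error.  The help file and topic
 * names below are hypothetical. */
if (OPAL_SUCCESS != (ret = opal_pmix_base_select())) {
    opal_show_help("help-orte-runtime.txt",
                   "orte_init:startup:no-pmi-support", true);
    return ORTE_ERR_SILENT;
}

The help text itself would then explain, in general terms, how to rebuild with --with-pmi (or PMIx) support for the resource manager in use.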

> On Jun 16, 2017, at 8:39 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
> +1 on the error message.
>
>
>
>> On Jun 16, 2017, at 10:06 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>
>> Hi Ralph
>>
>> I think a helpful error message would suffice.
>>
>> Howard
>>
>> r...@open-mpi.org <r...@open-mpi.org> wrote on Tue., Jun 13, 2017 at 11:15:
>> Hey folks
>>
>> Brian brought this up today on the call, so I spent a little time 
>> investigating. After installing SLURM 17.02 (configured with just 
>> --prefix), I configured OMPI with only --prefix as well. Getting an 
>> allocation and then executing “srun ./hello” failed, as expected.
>>
>> However, configuring OMPI with --with-pmi=<path-to-slurm> resolved the 
>> problem. SLURM continues to default to PMI-1, so we pick that up and use 
>> it. Everything works fine.
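>>
>> For concreteness, the two builds were along these lines (the install 
>> paths here are just placeholders):
>>
>>   % ./configure --prefix=/opt/ompi                         # fails under srun
>>   % ./configure --prefix=/opt/ompi --with-pmi=/opt/slurm   # works via PMI-1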
>>
>> FWIW: I also went back and checked with SLURM 15.08 and saw identical 
>> behavior.
>>
>> So the issue is: we don’t pick up PMI support by default, and never 
>> have, due to the SLURM license issue. Thus, we have always required that 
>> the user explicitly configure --with-pmi so that they take responsibility 
>> for the license. This is an acknowledged way of keeping the GPL from 
>> pulling OMPI under its umbrella, since it is the user, and not the OMPI 
>> community, who makes the link.
>>
>> I’m not sure there is anything we can or should do about this, other than 
>> perhaps providing a nicer error message. Thoughts?
>> Ralph
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>


_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
