Re: [OMPI users] Status of SLURM integration

2012-01-11 Thread Andrew Senin
Ralph, Jeff, thanks!

I managed to make it work with the following configure options:

 ./configure --with-pmi=/usr/ --with-slurm=/usr/ --without-psm \
     --prefix=`pwd`/install
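To double-check that a build like this really picked up PMI support, one option is to look for a `pmi` component in the `ompi_info` listing. The helper below is a minimal sketch. It assumes that a PMI-enabled 1.5.x build lists a line such as `MCA ess: pmi (...)`, and the function name `has_pmi_support` is hypothetical.

```shell
#!/bin/sh
# has_pmi_support (hypothetical helper): reads `ompi_info` output on stdin
# and succeeds (exit 0) if any MCA component named "pmi" is listed.
# Assumption: a PMI-enabled build shows lines such as
#   "MCA ess: pmi (MCA v2.0, API v2.0, Component v1.5.5)"
has_pmi_support() {
    # Match "MCA <framework>: pmi" followed by a non-word character or end of line
    grep -Eq 'MCA [a-z_]+: pmi([^a-z0-9_]|$)'
}
```

For example, `ompi_info | has_pmi_support && echo "PMI support is built in"` prints the message only when such a component shows up.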

Regards,
Andrew Senin

On Wed, Jan 11, 2012 at 7:17 PM, Ralph Castain wrote:
> Well, yes - but it isn't quite that simple. :-/
>
> If you want to direct-launch on slurm without using the resv_ports option, 
> you need to build OMPI to include PMI support by including --with-pmi on your 
> configure cmd line. You may need to point to where pmi.h resides (e.g., 
> --with-pmi=/opt/slurm/include).
>
> We don't do that automatically because slurm's pmi.h is GPL, and so the 
> resulting binary is GPL. This isn't an issue if you are just using the binary 
> and not distributing it, but we chose to not surprise anyone.
>
> If you build the PMI support, then you can just srun your app without using 
> resv_ports.
>
> HTH
> Ralph



Re: [OMPI users] Status of SLURM integration

2012-01-11 Thread Ralph Castain
Well, yes - but it isn't quite that simple. :-/

If you want to direct-launch on slurm without using the resv_ports option, you 
need to build OMPI to include PMI support by including --with-pmi on your 
configure cmd line. You may need to point to where pmi.h resides (e.g., 
--with-pmi=/opt/slurm/include).

We don't do that automatically because slurm's pmi.h is GPL, and so the 
resulting binary is GPL. This isn't an issue if you are just using the binary 
and not distributing it, but we chose to not surprise anyone.

If you build the PMI support, then you can just srun your app without using 
resv_ports.
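The two launch modes can be sketched as follows. `srun_launch_cmd` is a hypothetical helper that only prints the command line rather than running it. The `--resv-ports` branch assumes that the cluster's slurm.conf defines a port range via `MpiParams=ports=...`, which the srun documentation lists as a prerequisite for reserving ports.

```shell
#!/bin/sh
# srun_launch_cmd NP APP PMI_BUILT (hypothetical helper)
# Prints the srun command line for launching APP on NP tasks.
# PMI_BUILT is "yes" if Open MPI was configured with --with-pmi.
srun_launch_cmd() {
    np=$1
    app=$2
    pmi_built=$3
    if [ "$pmi_built" = "yes" ]; then
        # PMI support built in: a plain srun direct launch is enough.
        echo "srun -n $np $app"
    else
        # No PMI: ask SLURM to reserve ports for the step.
        # Assumes slurm.conf sets MpiParams=ports=<range>.
        echo "srun --resv-ports -n $np $app"
    fi
}
```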

HTH
Ralph

On Jan 11, 2012, at 6:04 AM, Jeff Squyres wrote:

> The latest -- 1.5.5rc2 (just released last night) -- has direct "srun 
> my_mpi_application" integration.  It's not in a final release yet, but as you 
> can probably guess by the version number, it'll be in the final version of 
> 1.5.5.
> 
> We have 1-2 bugs remaining in 1.5.5 that are actively being worked on.  Once 
> those are fixed (hopefully, in the Very Near Future), 1.5.5 will be released.




Re: [OMPI users] Status of SLURM integration

2012-01-11 Thread Jeff Squyres
The latest -- 1.5.5rc2 (just released last night) -- has direct "srun 
my_mpi_application" integration.  It's not in a final release yet, but as you 
can probably guess by the version number, it'll be in the final version of 
1.5.5.

We have 1-2 bugs remaining in 1.5.5 that are actively being worked on.  Once those 
are fixed (hopefully, in the Very Near Future), 1.5.5 will be released.


On Jan 10, 2012, at 11:38 PM, Andrew Senin wrote:

> Hi,
> 
> Could you please describe the current status of SLURM integration? I
> was under the impression that srun supports direct launch of Open MPI
> applications (without mpirun) compiled with the 1.5 branch.  At least
> one of my colleagues succeeded at that.
> 
> But when I installed SLURM and the head revision of the Open MPI 1.5
> branch, I did not manage to run it without setting the
> SLURM_STEP_RESV_PORTS environment variable. I get the following error:
> 
>  orte_grpcomm_modex failed
>  --> Returned "A message is attempting to be sent to a process whose
> contact information is unknown" (-117) instead of "Success" (0)
> --
> [mir9:25477] *** An error occurred in MPI_Init
> [mir9:25477] *** on a NULL communicator
> [mir9:25477] *** Unknown error
> [mir9:25477] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
> 
> So I have 2 questions:
> 1. Is SLURM support in the head revision of the 1.5 branch stable
> enough to use in the lab?
> 2. Does direct launch of MPI applications require setting the
> SLURM_STEP_RESV_PORTS environment variable?
> 
> Thanks,
> Andrew Senin.
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI users] Status of SLURM integration

2012-01-11 Thread Andrew Senin
Hi,

Could you please describe the current status of SLURM integration? I
was under the impression that srun supports direct launch of Open MPI
applications (without mpirun) compiled with the 1.5 branch.  At least
one of my colleagues succeeded at that.

But when I installed SLURM and the head revision of the Open MPI 1.5
branch, I did not manage to run it without setting the
SLURM_STEP_RESV_PORTS environment variable. I get the following error:

  orte_grpcomm_modex failed
  --> Returned "A message is attempting to be sent to a process whose
contact information is unknown" (-117) instead of "Success" (0)
--
[mir9:25477] *** An error occurred in MPI_Init
[mir9:25477] *** on a NULL communicator
[mir9:25477] *** Unknown error
[mir9:25477] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

So I have 2 questions:
1. Is SLURM support in the head revision of the 1.5 branch stable
enough to use in the lab?
2. Does direct launch of MPI applications require setting the
SLURM_STEP_RESV_PORTS environment variable?

Thanks,
Andrew Senin.