[OMPI users] Questions about integration with resource distribution systems

2017-07-25 Thread Kulshrestha, Vipul
I have several questions about integration of openmpi with resource queuing 
systems.

1.
I understand that openmpi supports integration with various resource 
distribution systems such as SGE, LSF, torque etc.

I need to build an openmpi application that can interact with a variety of 
different resource distribution systems, since different customers have 
different systems. Based on my research, it seems that I need to build a 
different openmpi installation for each one, e.g. create one installation of 
openmpi with grid and a different installation of openmpi with LSF. Is there a 
way to build a generic installation of openmpi that can be used with more than 
one distribution system through some generic mechanism?

2.
For integration with LSF/grid, how would I specify the memory (RAM) requirement 
(or some other parameter) to bsub/qsub when launching the mpirun command? Will 
something like the command below ensure that each of the 8 copies of a.out has 
40 GB of memory reserved for it by grid engine?

qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out

3.
Some of our customers use a custom distribution engine (a 
non-industry-standard distribution engine). How can I integrate my openmpi 
application with such a system? I would think that this should be possible if 
openmpi launched/managed the interaction with the distribution engine through 
some kind of generic mechanism (say, a configurable command to launch, monitor, 
and kill a job, plus a plugin that defines these operations with commands 
specific to the distribution engine in use). Does such integration exist in 
openmpi?


Thanks,
Vipul



Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-25 Thread r...@open-mpi.org

> On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul 
>  wrote:
> 
> I have several questions about integration of openmpi with resource queuing 
> systems.
>  
> 1.
> I understand that openmpi supports integration with various resource 
> distribution systems such as SGE, LSF, torque etc.
>  
> I need to build an openmpi application that can interact with variety of 
> different resource distribution systems, since different customers have 
> different systems. Based on my research, it seems that I need to build a 
> different openmpi installation to work, e.g. create an installation of opempi 
> with grid and create a different installation of openmpi with LSF. Is there a 
> way to build a generic installation of openmpi that can be used with more 
> than 1 distribution system by using some generic mechanism?

Just to be clear: your application doesn’t depend on the environment in any 
way. Only mpirun does - so if you are distributing an _application_, then your 
question is irrelevant. 

If you are distributing OMPI itself, and therefore mpirun, then you can build 
the various components if you first install the headers for that environment on 
your system. It means that you need one machine where all those resource 
managers at least have their headers installed on it. Then configure OMPI 
--with-xxx pointing to each of the RM’s headers so all the components get 
built. When the binary hits your customer’s machine, only those components that 
have active libraries present will execute.
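
For illustration, a single configure invocation along these lines should build 
the components for several resource managers at once (the paths below are 
placeholders; check ./configure --help for the exact option names in your 
Open MPI version):

$ ./configure --prefix=/opt/openmpi \
      --with-sge \
      --with-lsf=/opt/lsf/10.1 --with-lsf-libdir=/opt/lsf/10.1/lib \
      --with-tm=/opt/torque
$ make -j 8
$ make install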

>  
> 2.
> For integration with LSF/grid, how would I specify the memory (RAM) 
> requirement (or some other parameter) to bsub/qsub, when launching mpirun 
> command? Will something like below work to ensure that each of the 8 copies 
> of a.out have 40 GB memory reserved for them by grid engine?
>  
> qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out

You’ll have to provide something that is environment dependent, I’m afraid - 
there is no standard out there.

>  
> 3.
> Some of our customers use custom distribution engine (some 
> non-industry-standard distribution engine). How can I integrate my openmpi  
> application with such system? I would think that it should be possible to do 
> that if openmpi launched/managed interaction with the distribution engine 
> using some kind of generic mechanism (say, use a configurable command to 
> launch, monitor, kill a job and then allow specification of a plugin define 
> these operations with commands specific to the distribution engine being in 
> use). Does such integration exist in openmpi?

Easiest solution is to write a script that reads the allocation and dumps it 
into a file, and then provide that file as your hostfile on the mpirun cmd line 
(or in the environment). We will then use ssh to perform the launch. Otherwise, 
you’ll need to write at least an orte/mca/ras component to get the allocation, 
and possibly an orte/mca/plm component if you want to use the native launch 
mechanism in place of ssh.
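
As a rough sketch of that hostfile route (the MY_SCHED_NODES variable and its 
"host:slots" format are hypothetical stand-ins for whatever your custom engine 
actually provides):

#!/bin/sh
# Hypothetical: the custom engine exposes its allocation as
#   MY_SCHED_NODES="node1:16 node2:16"
# Convert that into an Open MPI hostfile ("host slots=N" per line).
for entry in $MY_SCHED_NODES; do
    host=${entry%%:*}
    slots=${entry##*:}
    echo "$host slots=$slots"
done > my_hostfile

mpirun --hostfile my_hostfile -np 8 ./a.out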

>  
>  
> Thanks,
> Vipul
>  
>  

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
Hi,

> On 26.07.2017 at 00:48, Kulshrestha, Vipul wrote:
> 
> I have several questions about integration of openmpi with resource queuing 
> systems.
>  
> 1.
> I understand that openmpi supports integration with various resource 
> distribution systems such as SGE, LSF, torque etc.
>  
> I need to build an openmpi application that can interact with variety of 
> different resource distribution systems, since different customers have 
> different systems. Based on my research, it seems that I need to build a 
> different openmpi installation to work, e.g. create an installation of opempi 
> with grid and create a different installation of openmpi with LSF. Is there a 
> way to build a generic installation of openmpi that can be used with more 
> than 1 distribution system by using some generic mechanism?
>  
> 2.
> For integration with LSF/grid, how would I specify the memory (RAM) 
> requirement (or some other parameter) to bsub/qsub, when launching mpirun 
> command? Will something like below work to ensure that each of the 8 copies 
> of a.out have 40 GB memory reserved for them by grid engine?
>  
> qsub –pe orte 8 –b y –V –l m_mem_free=40G –cwd mpirun –np 8 a.out

m_mem_free is part of Univa SGE (but not of the various free SGE forks, AFAIK).

Also: this syntax is for SGE; in LSF it's different.
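
For comparison, an LSF request of the same kind might look roughly like the 
sketch below (LSF's mem value is in MB by default, and whether the rusage 
reservation applies per job, per slot, or per host depends on the cluster 
configuration, so check with your admin):

$ bsub -n 8 -R "rusage[mem=40960]" -o out.%J mpirun -np 8 ./a.out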

To keep this independent of the actual queuing system, one could look into 
DRMAA v2 (unfortunately not many systems support this version of DRMAA).

-- Reuti

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
Hi,

> On 26.07.2017 at 02:16, r...@open-mpi.org wrote:
> 
> 
>> On Jul 25, 2017, at 3:48 PM, Kulshrestha, Vipul 
>>  wrote:
>> 
>> I have several questions about integration of openmpi with resource queuing 
>> systems.
>>  
>> 1.
>> I understand that openmpi supports integration with various resource 
>> distribution systems such as SGE, LSF, torque etc.
>>  
>> I need to build an openmpi application that can interact with variety of 
>> different resource distribution systems, since different customers have 
>> different systems. Based on my research, it seems that I need to build a 
>> different openmpi installation to work, e.g. create an installation of 
>> opempi with grid and create a different installation of openmpi with LSF. Is 
>> there a way to build a generic installation of openmpi that can be used with 
>> more than 1 distribution system by using some generic mechanism?
> 
> Just to be clear: your application doesn’t depend on the environment in any 
> way. Only mpirun does - so if you are distributing an _application_, then 
> your question is irrelevant. 
> 
> If you are distributing OMPI itself, and therefore mpirun, then you can build 
> the various components if you first install the headers for that environment 
> on your system. It means that you need one machine where all those resource 
> managers at least have their headers installed on it. Then configure OMPI 
> --with-xxx pointing to each of the RM’s headers so all the components get 
> built. When the binary hits your customer’s machine, only those components 
> that have active libraries present will execute.

Just note that the SGE integration doesn't need any library. It is activated 
when Open MPI finds certain environment variables set to sensible values.

-- Reuti

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Kulshrestha, Vipul
Thanks for a quick response.

I will try building OMPI as suggested.

On the integration with unsupported distribution systems: we cannot use a 
script-based approach, because these machines often don't have ssh permission 
in the customer environment. I will explore the path of writing an ORTE 
component, although at this stage I don't understand the effort involved.

I guess my question 2 was not understood correctly. I used the command below as 
an example for SGE and want to understand the expected behavior for such a 
command. With the command below, I expect 8 copies of a.out to be launched, with 
each copy having access to 40 GB of memory. Is that correct? I am doubtful, 
because I don't understand how mpirun would get access to the information about 
the RAM requirement.

qsub -pe orte 8 -b y -V -l m_mem_free=40G -cwd mpirun -np 8 a.out


Regards,
Vipul




Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
mpirun doesn’t get access to that requirement, nor does it need to do so. SGE 
will use the requirement when determining the nodes to allocate. mpirun just 
uses the nodes that SGE provides.

What your cmd line does is restrict the entire operation on each node (daemon + 
8 procs) to 40GB of memory. OMPI does not support per-process restrictions 
other than binding to cpus.
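
For what it's worth, that cpu binding is requested on the mpirun command line, 
e.g. (a sketch; the option spelling varies a bit across Open MPI releases):

$ mpirun -np 8 --bind-to core --report-bindings ./a.out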



Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti
Hi,

> On 26.07.2017 at 15:03, Kulshrestha, Vipul wrote:
> 
> Thanks for a quick response.
>  
> I will try building OMPI as suggested.
>  
> On the integration with unsupported distribution systems, we cannot use 
> script based approach, because often these machines don’t have ssh permission 
> in customer environment. I will explore the path of writing orte component. 
> At this stage, I don’t understand the effort for the same.
>  
> I guess my question 2 was not understood correctly. I used the below command 
> as an example for SGE and want to understand the expected behavior for such a 
> command. With the below command, I expect to have 8 copies of a.out launched

Yep.


> with each copy having access to 40GB of memory. Is that correct?

SGE will grant the memory, not Open MPI. This works because of SGE's tight 
integration: the slave tasks are started by `qrsh -inherit …` rather than by a 
plain `ssh`, so SGE can keep track of the started processes.
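
For reference, this tight integration relies on the parallel environment being 
configured to allow it; a typical PE for Open MPI might look roughly like the 
trimmed sketch below (not a complete listing):

$ qconf -sp orte
pe_name            orte
slots              9999
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE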


> I am doubtful, because I don’t understand how mpirun gets access to 
> information about RAM requirement.
>  
> qsub –pe orte 8 –b y –V –l m_mem_free=40G –cwd mpirun –np 8 a.out

In case your application relies on the actual value of "m_mem_free", the 
application has to request this information itself. This may differ between the 
various queuing systems, though. In SGE one could either use `qstat` and `grep` 
the information, or (instead of a direct `mpirun`) use a job script that also 
feeds this value into an environment variable, which you can then access 
directly in your application. On the command line it would be:

$ qsub -pe orte 8 -b y -v m_mem_free=40G -l m_mem_free=40G -cwd mpirun -np 8 a.out

1. -V might forward too many variables. Usually I suggest forwarding only the 
environment variables that are necessary for the job. A user could set some 
environment variable by accident and then wonder why a job that only starts a 
couple of days later crashes, while submitting exactly the same job again 
succeeds.

2. The 40G ends up as a string in the environment variable; you may want to put 
the plain value in bytes there instead.
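
As a small sketch of the job-script variant (NSLOTS is set by SGE; the variable 
name APP_MEM_BYTES and the fixed 40 GB value are just placeholders for whatever 
your application expects):

#!/bin/sh
#$ -pe orte 8
#$ -l m_mem_free=40G
#$ -cwd
# forward the request to the application as plain bytes (cf. point 2 above)
export APP_MEM_BYTES=$((40 * 1024 * 1024 * 1024))
mpirun -np $NSLOTS ./a.out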

-- Reuti

Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Reuti

> On 26.07.2017 at 15:09, r...@open-mpi.org wrote:
> 
> mpirun doesn’t get access to that requirement, nor does it need to do so. SGE 
> will use the requirement when determining the nodes to allocate.

m_mem_free appears to come from Univa GE and is not part of the open source 
versions, so I can't comment on this for sure, but it seems to set the memory 
limit via cgroups as well.

-- Reuti



Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread Kulshrestha, Vipul
Thanks Reuti & RHC for your responses.

My application does not rely on the actual value of m_mem_free; I used it only 
as an example. In an open source SGE environment, we use the mem_free resource.

Now I understand that SGE will allocate the requested resources (based on the 
qsub options) and then launch mpirun, which starts "a.out" on the allocated 
resources using 'qrsh -inherit', so that SGE can keep track of all the launched 
processes.

I assume LSF integration works in a similar way.

Regards,
Vipul



Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-26 Thread r...@open-mpi.org
Oh no, that's not right. Mpirun launches daemons using qrsh and those daemons 
spawn the app's procs. SGE has no visibility of the app at all

Sent from my iPad


Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-27 Thread Dave Love
"r...@open-mpi.org"  writes:

> Oh no, that's not right. Mpirun launches daemons using qrsh and those
> daemons spawn the app's procs. SGE has no visibility of the app at all

Oh no, that's not right.

The whole point of tight integration with remote startup using qrsh is
to report resource usage and provide control over the job.  I'm somewhat
familiar with this.
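
As an illustration of that reporting (a sketch; the job id is a placeholder), 
the usage of all tightly integrated tasks is accounted under the job once it 
has finished:

$ qacct -j 1234567 | egrep 'cpu|maxvmem'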


Re: [OMPI users] Questions about integration with resource distribution systems

2017-07-31 Thread Gilles Gouaillardet

Dave,

Unless you are doing direct launch (for example, using 'srun' instead of 
'mpirun' under SLURM), this is the way Open MPI works: mpirun will use whatever 
the resource manager provides in order to spawn the remote orted (tm with PBS, 
qrsh with SGE, srun with SLURM, ...). Then mpirun/orted will fork&exec the MPI 
tasks.

Direct launch provides the tightest integration, but it requires that some 
capabilities (a PMI(x) server) be provided by the resource manager. Hopefully 
the resource manager will report memory consumption and so on not only for the 
spawned process (e.g. orted) but also for its children (e.g. the MPI tasks).

Back to SGE: if I understand correctly, memory is requested per task on the 
qsub command line. I am not sure what is done then ... this requirement is 
either ignored, or it is applied per orted (and once again, I do not know 
whether the limit covers only the orted process or its children too).

Bottom line: unless SGE natively provides PMI(x) capabilities, the current 
"tight integration" is imho the best we can do.




Cheers,


Gilles






Re: [OMPI users] Questions about integration with resource distribution systems

2017-08-01 Thread Dave Love
Gilles Gouaillardet  writes:

> Dave,
>
>
> unless you are doing direct launch (for example, use 'srun' instead of
> 'mpirun' under SLURM),
>
> this is the way Open MPI is working : mpirun will use whatever the
> resource manager provides
>
> in order to spawn the remote orted (tm with PBS, qrsh with SGE, srun
> with SLURM, ...).
>
>
> then mpirun/orted will fork&exec the MPI tasks.

I know quite well how SGE works with openmpi, which isn't special --
I've done enough work on it.  SGE tracks the process tree under orted
just like under bash, even if things daemonize.  The OP was correct.

I should qualify that by noting that ENABLE_ADDGRP_KILL has apparently
never propagated through remote startup, so killing those orphans after
VASP crashes may fail, though resource reporting works.  (I never
installed a fix for want of a test system, but it's not needed with
Linux cpusets.)


Re: [OMPI users] Questions about integration with resource distribution systems

2017-08-01 Thread Reuti
Hi,

> On 01.08.2017 at 18:36, Dave Love wrote:
> 
> Gilles Gouaillardet  writes:
> 
>> Dave,
>> 
>> 
>> unless you are doing direct launch (for example, use 'srun' instead of
>> 'mpirun' under SLURM),
>> 
>> this is the way Open MPI is working : mpirun will use whatever the
>> resource manager provides
>> 
>> in order to spawn the remote orted (tm with PBS, qrsh with SGE, srun
>> with SLURM, ...).
>> 
>> 
>> then mpirun/orted will fork&exec the MPI tasks.
> 
> I know quite well how SGE works with openmpi, which isn't special --
> I've done enough work on it.  SGE tracks the process tree under orted
> just like under bash, even if things daemonize.  The OP was correct.
> 
> I should qualify that by noting that ENABLE_ADDGRP_KILL has apparently
> never propagated through remote startup,

Isn't it a setting inside SGE which the sge_execd is aware of? I never exported 
any environment variable for this purpose.
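
For what it's worth, a sketch of where that setting would live (it is part of 
execd_params in the cluster configuration, which qconf can show):

$ qconf -sconf | grep execd_params
execd_params                 ENABLE_ADDGRP_KILL=true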

-- Reuti


> so killing those orphans after
> VASP crashes may fail, though resource reporting works.  (I never
> installed a fix for want of a test system, but it's not needed with
> Linux cpusets.)


Re: [OMPI users] Questions about integration with resource distribution systems

2017-08-02 Thread Dave Love
Reuti  writes:

>> I should qualify that by noting that ENABLE_ADDGRP_KILL has apparently
>> never propagated through remote startup,
>
> Isn't it a setting inside SGE which the sge_execd is aware of? I never
> exported any environment variable for this purpose.

Yes, but this is surely off-topic, even though
 mentions openmpi.