Re: [OMPI devel] Hints for using an own pmix server

2018-10-18 Thread Ralph H Castain


> On Oct 17, 2018, at 3:32 AM, Stephan Krempel  wrote:
> 
> 
> Hi Ralph.
> 
>>>> One point that remains open and is interesting for me is if I can
>>>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>>>> possible to configure it as there were the "--with-ompi-pmix-rte"
>>>> switch from version 4.x?
>>> 
>>> I’m afraid we didn’t backport that capability to the v3.x branches.
>>> I’ll ask the relevant release managers if they’d like us to do so.
>> 
>> I checked and we will not be backporting this to the v3.x series. It
>> will begin with v4.x.
> 
> Thanks for checking out. I need to check with our users if supporting
> OpenMPI 4 will be sufficient for them, else for sure I will come back
> soon with some more questions regarding how to manage supporting
> OpenMPI 3.

If it becomes an issue, I can probably provide a patch for OMPI v3 that you
could install locally.

> 
> Thank you again for the assistance.
> 
> Best regards
> 
> Stephan
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] Hints for using an own pmix server

2018-10-17 Thread Stephan Krempel

Hi Ralph.

> > > One point that remains open and is interesting for me is if I can
> > > achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
> > > possible to configure it as there were the "--with-ompi-pmix-rte"
> > > switch from version 4.x?
> > 
> > I’m afraid we didn’t backport that capability to the v3.x branches.
> > I’ll ask the relevant release managers if they’d like us to do so.
> 
> I checked and we will not be backporting this to the v3.x series. It
> will begin with v4.x.

Thanks for checking. I need to ask our users whether supporting
OpenMPI 4 will be sufficient for them; otherwise I will certainly come back
soon with some more questions about how to support
OpenMPI 3.

Thank you again for the assistance.

Best regards

Stephan

Re: [OMPI devel] Hints for using an own pmix server

2018-10-14 Thread Ralph H Castain


> On Oct 12, 2018, at 6:15 AM, Ralph H Castain  wrote:
> 
>> One point that remains open and is interesting for me is if I can
>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>> possible to configure it as there were the "--with-ompi-pmix-rte"
>> switch from version 4.x?
> 
> I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
> the relevant release managers if they’d like us to do so.

I checked and we will not be backporting this to the v3.x series. It will begin 
with v4.x.
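
Since the option only exists from v4.x onward, a build script could guard on the major version before passing it. The following is a minimal sketch; the function name `ompi_supports_pmix_rte` is made up for illustration, and the check relies only on the statement above that the capability begins with v4.x.

```shell
# Hypothetical helper (not part of Open MPI): succeeds when the given
# Open MPI version string is 4.x or newer, i.e. when --with-ompi-pmix-rte
# can be expected to exist according to this thread.
ompi_supports_pmix_rte() {
    major=${1%%.*}                 # "4.0.0rc3" -> "4", "3.1.2" -> "3"
    case $major in
        ''|*[!0-9]*) return 2 ;;   # not a numeric major version
    esac
    [ "$major" -ge 4 ]
}

for v in 3.1.2 4.0.0rc3; do
    if ompi_supports_pmix_rte "$v"; then
        echo "$v: --with-ompi-pmix-rte should be available"
    else
        echo "$v: --with-ompi-pmix-rte is not available before v4.x"
    fi
done
```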

Ralph


Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
I took a look at the following:

>> A remark to pmix at this point: pmix_bfrops_base_value_load() does
>> silently not handle PMIX_DATA_ARRAY type leading to not working makros
>> PMIX_VALUE_LOAD and PMIX_INFO_LOAD with that type. I think this is
>> unlucky and took me a while to figure out why it comes to a segfault
>> when pmix tried to process my PMIX_PROC_DATA infos.

It appears that this was true in the v2.x release series, but it has since been
fixed, so the v3.x series is okay. I’ll backport the support to the v2.x
series for its next releases.

Thanks for pointing it out!
Ralph

> On Oct 12, 2018, at 6:15 AM, Ralph H Castain  wrote:
> 
> Hi Stephan
> 
> 
>> On Oct 12, 2018, at 2:25 AM, Stephan Krempel wrote:
>> 
>> Hallo Ralph,
>> 
>>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>>> is --with-ompi-pmix-rte?
>> 
>> You were right, this was a typo, with the correct option I now managed
>> to start an MPI helloworld program using OpenMPI and our own process
>> manager with pmix server.
> 
> Hooray! If you want me to show support for your PM on our web site, please 
> send me a little info about it. You are welcome to send it off-list if you 
> prefer.
> 
>> 
>>> It all looks okay to me for the client, but I wonder if you
>>> remembered to call register_nspace and register_client on your server
>>> prior to starting the client? If not, the connection will be dropped
>>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>>> see the detailed connection handshake.
>> 
>> This has been a point that I could finally figure out from the prrte
>> code. To make it working you do not only need to call register_nspace
>> but also pass some specific information to it that OpenMPI considers to
>> be available (e.g. proc info with lrank).
> 
> My apologies - we will document this better on the PMIx web site and provide 
> some link to it on the OMPI web site. We actually do publish the info OMPI is 
> expecting, but it isn’t in an obvious enough place.
> 
>> 
>> A remark to pmix at this point: pmix_bfrops_base_value_load() does
>> silently not handle PMIX_DATA_ARRAY type leading to not working makros
>> PMIX_VALUE_LOAD and PMIX_INFO_LOAD with that type. I think this is
>> unlucky and took me a while to figure out why it comes to a segfault
>> when pmix tried to process my PMIX_PROC_DATA infos.
> 
> I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely 
> just an oversight. Regardless, it should return an error if it isn’t doing it.
> 
>> 
>> So thank you again for your help so far.
>> 
>> 
>> One point that remains open and is interesting for me is if I can
>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>> possible to configure it as there were the "--with-ompi-pmix-rte"
>> switch from version 4.x?
> 
> I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
> the relevant release managers if they’d like us to do so.
> 
> Ralph
> 
>> 
>> Regards,
>> 
>> Stephan
>> 
>> 
>>> 
>>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel
>>>> wrote:
>>>>
>>>> Hi Ralf,
>>>>
>>>> After studying prrte a little bit, I tried something new and
>>>> followed
>>>> the description here using openmpi 4:
>>>> https://pmix.org/code/building-the-pmix-reference-server/
>>>>
>>>> I configured openmpi 4.0.0rc3:
>>>>
>>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>>  --with-libevent=/usr --with-ompi-mpix-rte
>>>>
>>>> (I also tried to set --with-orte=no, but it then claims not to have
>>>> a
>>>> suitable rte and does not finish)
>>>>
>>>> I then started my own PMIx and spawned a client compiled with mpicc
>>>> of
>>>> the new openmpi installation with this environment:
>>>>
>>>> PMIX_NAMESPACE=namespace_3228_0_0
>>>> PMIX_RANK=0
>>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SECURITY_MODE=native,none
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_GDS_MODULE=ds12,hash
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>>>
>>>> The client is not connecting to my pmix server and it's environment
>>>> after MPI_Init looks like that:
>>>>
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_RANK=0
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
>>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SECURITY_MODE=native,no

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
Hi Stephan


> On Oct 12, 2018, at 2:25 AM, Stephan Krempel  wrote:
> 
> Hallo Ralph,
> 
>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>> is --with-ompi-pmix-rte?
> 
> You were right, this was a typo, with the correct option I now managed
> to start an MPI helloworld program using OpenMPI and our own process
> manager with pmix server.

Hooray! If you want me to show support for your PM on our web site, please send 
me a little info about it. You are welcome to send it off-list if you prefer.

> 
>> It all looks okay to me for the client, but I wonder if you
>> remembered to call register_nspace and register_client on your server
>> prior to starting the client? If not, the connection will be dropped
>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>> see the detailed connection handshake.
> 
> This has been a point that I could finally figure out from the prrte
> code. To make it working you do not only need to call register_nspace
> but also pass some specific information to it that OpenMPI considers to
> be available (e.g. proc info with lrank).

My apologies - we will document this better on the PMIx web site and provide 
some link to it on the OMPI web site. We actually do publish the info OMPI is 
expecting, but it isn’t in an obvious enough place.

> 
> A remark to pmix at this point: pmix_bfrops_base_value_load() does
> silently not handle PMIX_DATA_ARRAY type leading to not working makros
> PMIX_VALUE_LOAD and PMIX_INFO_LOAD with that type. I think this is
> unlucky and took me a while to figure out why it comes to a segfault
> when pmix tried to process my PMIX_PROC_DATA infos.

I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely 
just an oversight. Regardless, it should return an error if it isn’t doing it.

> 
> So thank you again for your help so far.
> 
> 
> One point that remains open and is interesting for me is if I can
> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
> possible to configure it as there were the "--with-ompi-pmix-rte"
> switch from version 4.x?

I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
the relevant release managers if they’d like us to do so.

Ralph

> 
> Regards,
> 
> Stephan
> 
> 
>> 
>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel 
>>> wrote:
>>> 
>>> Hi Ralf,
>>> 
>>> After studying prrte a little bit, I tried something new and
>>> followed
>>> the description here using openmpi 4:
>>> https://pmix.org/code/building-the-pmix-reference-server/
>>> 
>>> I configured openmpi 4.0.0rc3:
>>> 
>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>  --with-libevent=/usr --with-ompi-mpix-rte
>>> 
>>> (I also tried to set --with-orte=no, but it then claims not to have
>>> a
>>> suitable rte and does not finish)
>>> 
>>> I then started my own PMIx and spawned a client compiled with mpicc
>>> of
>>> the new openmpi installation with this environment:
>>> 
>>> PMIX_NAMESPACE=namespace_3228_0_0
>>> PMIX_RANK=0
>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_SECURITY_MODE=native,none
>>> PMIX_PTL_MODULE=tcp,usock
>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>> PMIX_GDS_MODULE=ds12,hash
>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>> 
>>> The client is not connecting to my pmix server and it's environment
>>> after MPI_Init looks like that:
>>> 
>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_RANK=0
>>> PMIX_PTL_MODULE=tcp,usock
>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_ds
>>> tor_
>>> 3243
>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
>>> PMIX_SECURITY_MODE=native,none
>>> PMIX_NAMESPACE=864157697
>>> PMIX_GDS_MODULE=ds12,hash
>>> ORTE_SCHIZO_DETECTION=ORTE
>>> OMPI_COMMAND=./hello_env
>>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-
>>> d92c0e73869e1cfa
>>> OMPI_MCA_orte_launch=1
>>> OMPI_APP_CTX_NUM_PROCS=1
>>> OMPI_MCA_pmix=^s1,s2,cray,isolated
>>> OMPI_MCA_ess=singleton
>>> OMPI_MCA_orte_ess_num_procs=1
>>> 
>>> So something goes wrong but I do not have an idea what I am
>>> missing. Do
>>> you have an idea what I need to change? Do I have to set an MCA
>>> parameter to tell OpenMPI not to start orted, or does it need
>>> another
>>> hint in the client environment beside the stuff comming from the
>>> PMIx
>>> server helper library?
>>> 
>>> 
>>> Stephan
>>> 
>>> 
>>> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>>>> Hi Stephan
>>>>
>>>> Thanks for the clarification - that helps a great deal. You are
>>>> correct that OMPI’s orted dae

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Stephan Krempel
Hello Ralph,

> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
> is --with-ompi-pmix-rte?

You were right, this was a typo. With the correct option I have now managed
to start an MPI helloworld program using OpenMPI and our own process
manager with its pmix server.

> It all looks okay to me for the client, but I wonder if you
> remembered to call register_nspace and register_client on your server
> prior to starting the client? If not, the connection will be dropped
> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
> see the detailed connection handshake.

This is a point that I could finally figure out from the prrte
code. To make it work, you not only need to call register_nspace
but must also pass it some specific information that OpenMPI expects to
be available (e.g. proc info with lrank).

A remark on pmix at this point: pmix_bfrops_base_value_load() silently
fails to handle the PMIX_DATA_ARRAY type, so the macros
PMIX_VALUE_LOAD and PMIX_INFO_LOAD do not work with that type. I think
this is unfortunate, and it took me a while to figure out why pmix
segfaulted when it tried to process my PMIX_PROC_DATA infos.

So thank you again for your help so far.


One point that remains open and is of interest to me is whether I can
achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
possible to configure it as if it had the "--with-ompi-pmix-rte"
switch from version 4.x?

Regards,

Stephan


> 
> > On Oct 9, 2018, at 3:14 PM, Stephan Krempel 
> > wrote:
> > 
> > Hi Ralf,
> > 
> > After studying prrte a little bit, I tried something new and
> > followed
> > the description here using openmpi 4:
> > https://pmix.org/code/building-the-pmix-reference-server/
> > 
> > I configured openmpi 4.0.0rc3:
> > 
> > ../configure --enable-debug --prefix [...] --with-pmix=[...] \
> >  --with-libevent=/usr --with-ompi-mpix-rte
> > 
> > (I also tried to set --with-orte=no, but it then claims not to have
> > a
> > suitable rte and does not finish)
> > 
> > I then started my own PMIx and spawned a client compiled with mpicc
> > of
> > the new openmpi installation with this environment:
> > 
> > PMIX_NAMESPACE=namespace_3228_0_0
> > PMIX_RANK=0
> > PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SECURITY_MODE=native,none
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_GDS_MODULE=ds12,hash
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> > 
> > The client is not connecting to my pmix server and it's environment
> > after MPI_Init looks like that:
> > 
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_RANK=0
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_MCA_mca_base_component_show_load_errors=1
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_ds
> > tor_
> > 3243
> > PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SECURITY_MODE=native,none
> > PMIX_NAMESPACE=864157697
> > PMIX_GDS_MODULE=ds12,hash
> > ORTE_SCHIZO_DETECTION=ORTE
> > OMPI_COMMAND=./hello_env
> > OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-
> > d92c0e73869e1cfa
> > OMPI_MCA_orte_launch=1
> > OMPI_APP_CTX_NUM_PROCS=1
> > OMPI_MCA_pmix=^s1,s2,cray,isolated
> > OMPI_MCA_ess=singleton
> > OMPI_MCA_orte_ess_num_procs=1
> > 
> > So something goes wrong but I do not have an idea what I am
> > missing. Do
> > you have an idea what I need to change? Do I have to set an MCA
> > parameter to tell OpenMPI not to start orted, or does it need
> > another
> > hint in the client environment beside the stuff comming from the
> > PMIx
> > server helper library?
> > 
> > 
> > Stephan
> > 
> > 
> > On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
> > > Hi Stephan
> > > 
> > > Thanks for the clarification - that helps a great deal. You are
> > > correct that OMPI’s orted daemons do more than just host the PMIx
> > > server library. However, they are only active if you launch the
> > > OMPI
> > > processes using mpirun. This is probably the source of the
> > > trouble
> > > you are seeing.
> > > 
> > > Since you have a process launcher and have integrated the PMIx
> > > server
> > > support into your RM’s daemons, you really have no need for
> > > mpirun at
> > > all. You should just be able to launch the processes directly
> > > using
> > > your own launcher. The PMIx support will take care of the startup
> > > requirements. The application procs will not use the orted in
> > > such
> > > cases.
> > > 
> > > So if your system is working fine with the PMIx example programs,
> > > then just launch the OMPI apps the same way and it should just
> > > work.
> > > 
> > > On the Sl

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
I assume this (--with-ompi-mpix-rte) is a typo, as the correct option is
--with-ompi-pmix-rte?
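
For reference, the configure invocation quoted further down, with the option spelled correctly, might look like the following sketch. It only prints the command instead of running it; `PREFIX` and `PMIX_DIR` are placeholder variables standing in for the paths elided as `[...]` in the original mail.

```shell
# Placeholders for the paths that were elided in the original mail;
# set them to your own install locations.
PREFIX=${PREFIX:-/path/to/install}
PMIX_DIR=${PMIX_DIR:-/path/to/pmix}

# Print (do not run) the configure line with the corrected option name.
print_configure_cmd() {
    printf '%s\n' "../configure --enable-debug --prefix=$PREFIX --with-pmix=$PMIX_DIR --with-libevent=/usr --with-ompi-pmix-rte"
}

print_configure_cmd
```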

It all looks okay to me for the client, but I wonder if you remembered to call 
register_nspace and register_client on your server prior to starting the 
client? If not, the connection will be dropped - you could add 
PMIX_MCA_ptl_base_verbose=100 to your environment to see the detailed 
connection handshake.
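
One way to apply this is to set the variable only for the client being debugged, so that other processes stay quiet. A minimal sketch; the wrapper name `with_ptl_verbose` is hypothetical, and `./hello_env` in the usage comment is the test program named in the quoted environment below.

```shell
# Hypothetical wrapper: run a command with PMIx transport-layer (ptl)
# verbosity raised to 100, without changing the caller's environment.
with_ptl_verbose() {
    env PMIX_MCA_ptl_base_verbose=100 "$@"
}

# Usage from your own launcher (not executed here):
#   with_ptl_verbose ./hello_env
```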

> On Oct 9, 2018, at 3:14 PM, Stephan Krempel  wrote:
> 
> Hi Ralf,
> 
> After studying prrte a little bit, I tried something new and followed
> the description here using openmpi 4:
> https://pmix.org/code/building-the-pmix-reference-server/
> 
> I configured openmpi 4.0.0rc3:
> 
> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>  --with-libevent=/usr --with-ompi-mpix-rte
> 
> (I also tried to set --with-orte=no, but it then claims not to have a
> suitable rte and does not finish)
> 
> I then started my own PMIx and spawned a client compiled with mpicc of
> the new openmpi installation with this environment:
> 
> PMIX_NAMESPACE=namespace_3228_0_0
> PMIX_RANK=0
> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_SECURITY_MODE=native,none
> PMIX_PTL_MODULE=tcp,usock
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_GDS_MODULE=ds12,hash
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> 
> The client is not connecting to my pmix server and it's environment
> after MPI_Init looks like that:
> 
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_RANK=0
> PMIX_PTL_MODULE=tcp,usock
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_MCA_mca_base_component_show_load_errors=1
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_
> 3243
> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SECURITY_MODE=native,none
> PMIX_NAMESPACE=864157697
> PMIX_GDS_MODULE=ds12,hash
> ORTE_SCHIZO_DETECTION=ORTE
> OMPI_COMMAND=./hello_env
> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> OMPI_MCA_orte_launch=1
> OMPI_APP_CTX_NUM_PROCS=1
> OMPI_MCA_pmix=^s1,s2,cray,isolated
> OMPI_MCA_ess=singleton
> OMPI_MCA_orte_ess_num_procs=1
> 
> So something goes wrong but I do not have an idea what I am missing. Do
> you have an idea what I need to change? Do I have to set an MCA
> parameter to tell OpenMPI not to start orted, or does it need another
> hint in the client environment beside the stuff comming from the PMIx
> server helper library?
> 
> 
> Stephan
> 
> 
> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>> Hi Stephan
>> 
>> Thanks for the clarification - that helps a great deal. You are
>> correct that OMPI’s orted daemons do more than just host the PMIx
>> server library. However, they are only active if you launch the OMPI
>> processes using mpirun. This is probably the source of the trouble
>> you are seeing.
>> 
>> Since you have a process launcher and have integrated the PMIx server
>> support into your RM’s daemons, you really have no need for mpirun at
>> all. You should just be able to launch the processes directly using
>> your own launcher. The PMIx support will take care of the startup
>> requirements. The application procs will not use the orted in such
>> cases.
>> 
>> So if your system is working fine with the PMIx example programs,
>> then just launch the OMPI apps the same way and it should just work.
>> 
>> On the Slurm side: I’m surprised that it doesn’t work without the
>> --with-slurm option. An application proc doesn’t care about any of the
>> Slurm-related code if PMIx is available. I might have access to a
>> machine where I can check it…
>> 
>> Ralph
>> 
>> 
>>> On Oct 9, 2018, at 3:26 AM, Stephan Krempel 
>>> wrote:
>>> 
>>> Ralph, Gilles,
>>> 
>>> thanks for your input.
>>> 
>>> Before I answer, let me shortly explain what my general intention
>>> is.
>>> We do have our own resource manager and process launcher that
>>> supports
>>> different MPI implementations in different ways. I want to adapt it
>>> to
>>> PMIx to cleanly support OpenMPI and hopefully other MPI
>>> implementation
>>> supporting PMIx in the future, too. 
>>> 
>>>> It sounds like what you really want to do is replace the orted,
>>>> and
>>>> have your orted open your PMIx server? In other words, you want
>>>> to
>>>> use the PMIx reference library to handle all the PMIx stuff, and
>>>> provide your own backend functions to support the PMIx server
>>>> calls?
>>> 
>>> You are right, that was my original plan, and I already did it so
>>> far.
>>> In my environment I already can launch processes that successfully
>>> call
>>> PMIx client functions like put, get, fence and so on, all handled
>>> by my
>>> servers using the PMIx server helper library

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Stephan Krempel
Hi Ralph,

After studying prrte a little bit, I tried something new and followed
the description here using openmpi 4:
https://pmix.org/code/building-the-pmix-reference-server/

I configured openmpi 4.0.0rc3:

../configure --enable-debug --prefix [...] --with-pmix=[...] \
  --with-libevent=/usr --with-ompi-mpix-rte

(I also tried to set --with-orte=no, but configure then claims not to
have a suitable rte and does not finish.)

I then started my own PMIx server and spawned a client, compiled with the
mpicc of the new openmpi installation, with this environment:

PMIX_NAMESPACE=namespace_3228_0_0
PMIX_RANK=0
PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
PMIX_SECURITY_MODE=native,none
PMIX_PTL_MODULE=tcp,usock
PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
PMIX_GDS_MODULE=ds12,hash
PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234

The client does not connect to my pmix server, and its environment
after MPI_Init looks like this:

PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
PMIX_RANK=0
PMIX_PTL_MODULE=tcp,usock
PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
PMIX_MCA_mca_base_component_show_load_errors=1
PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
PMIX_SECURITY_MODE=native,none
PMIX_NAMESPACE=864157697
PMIX_GDS_MODULE=ds12,hash
ORTE_SCHIZO_DETECTION=ORTE
OMPI_COMMAND=./hello_env
OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
OMPI_MCA_orte_launch=1
OMPI_APP_CTX_NUM_PROCS=1
OMPI_MCA_pmix=^s1,s2,cray,isolated
OMPI_MCA_ess=singleton
OMPI_MCA_orte_ess_num_procs=1

So something goes wrong, but I have no idea what I am missing. Do
you have an idea what I need to change? Do I have to set an MCA
parameter to tell OpenMPI not to start orted, or does it need another
hint in the client environment besides what comes from the PMIx
server helper library?
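
The two environment dumps above already hint at what happened: the namespace changed and `OMPI_MCA_ess=singleton` appeared, which suggests the process initialized as a singleton rather than attaching to the external server. Below is a rough sketch of a check for that symptom, assuming an environment dump in `KEY=VALUE` form on stdin. The function name is made up, and treating `OMPI_MCA_ess=singleton` as the marker is an interpretation of the dump above, not documented behavior.

```shell
# Hypothetical check: read KEY=VALUE lines on stdin and report whether the
# process appears to have fallen back to Open MPI's singleton start-up.
# $1 is the namespace the external PMIx server handed to the client.
check_pmix_env() {
    expected_ns=$1
    ns=''
    ess=''
    while IFS= read -r line; do
        case $line in
            PMIX_NAMESPACE=*) ns=${line#PMIX_NAMESPACE=} ;;
            OMPI_MCA_ess=*)   ess=${line#OMPI_MCA_ess=} ;;
        esac
    done
    if [ "$ess" = singleton ] || [ "$ns" != "$expected_ns" ]; then
        echo "mismatch: namespace='$ns' ess='$ess'"
        return 1
    fi
    echo "ok: namespace='$ns'"
}

# Usage (not executed here), assuming the client prints its environment
# after MPI_Init as the test program in this mail seems to do:
#   ./hello_env | check_pmix_env namespace_3228_0_0
```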


Stephan


On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
> Hi Stephan
> 
> Thanks for the clarification - that helps a great deal. You are
> correct that OMPI’s orted daemons do more than just host the PMIx
> server library. However, they are only active if you launch the OMPI
> processes using mpirun. This is probably the source of the trouble
> you are seeing.
> 
> Since you have a process launcher and have integrated the PMIx server
> support into your RM’s daemons, you really have no need for mpirun at
> all. You should just be able to launch the processes directly using
> your own launcher. The PMIx support will take care of the startup
> requirements. The application procs will not use the orted in such
> cases.
> 
> So if your system is working fine with the PMIx example programs,
> then just launch the OMPI apps the same way and it should just work.
> 
> On the Slurm side: I’m surprised that it doesn’t work without the
> --with-slurm option. An application proc doesn’t care about any of the
> Slurm-related code if PMIx is available. I might have access to a
> machine where I can check it…
> 
> Ralph
> 
> 
> > On Oct 9, 2018, at 3:26 AM, Stephan Krempel 
> > wrote:
> > 
> > Ralph, Gilles,
> > 
> > thanks for your input.
> > 
> > Before I answer, let me shortly explain what my general intention
> > is.
> > We do have our own resource manager and process launcher that
> > supports
> > different MPI implementations in different ways. I want to adapt it
> > to
> > PMIx to cleanly support OpenMPI and hopefully other MPI
> > implementation
> > supporting PMIx in the future, too. 
> > 
> > > It sounds like what you really want to do is replace the orted,
> > > and
> > > have your orted open your PMIx server? In other words, you want
> > > to
> > > use the PMIx reference library to handle all the PMIx stuff, and
> > > provide your own backend functions to support the PMIx server
> > > calls? 
> > 
> > You are right, that was my original plan, and I already did it so
> > far.
> > In my environment I already can launch processes that successfully
> > call
> > PMIx client functions like put, get, fence and so on, all handled
> > by my
> > servers using the PMIx server helper library. As far as I
> > implemented
> > the server functions now, all the example programs coming with the
> > pmix
> > library are working fine.
> > 
> > Then I tried to use that with OpenMPI and stumbled.
> > My first idea was to simply replace orted but after taking a closer
> > look into OpenMPI it seems to me, that it uses/needs orted not only
> > for
> > spawning and exchange of process information, but also for its
> > general
> > communication and collectives. Am I wrong with that?
> > 
> > So replacing it completely is perhaps not what I want since I do
> > not
> > intent to replace OpenMPIs whole comm

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Ralph H Castain
Hi Stephan

Thanks for the clarification - that helps a great deal. You are correct that 
OMPI’s orted daemons do more than just host the PMIx server library. However, 
they are only active if you launch the OMPI processes using mpirun. This is 
probably the source of the trouble you are seeing.

Since you have a process launcher and have integrated the PMIx server support 
into your RM’s daemons, you really have no need for mpirun at all. You should 
just be able to launch the processes directly using your own launcher. The PMIx 
support will take care of the startup requirements. The application procs will 
not use the orted in such cases.

So if your system is working fine with the PMIx example programs, then just 
launch the OMPI apps the same way and it should just work.
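
In shell terms, launching directly just means starting the application with the `PMIX_*` variables from your server in its environment and no `mpirun` in between. A minimal sketch, assuming those variables have been written to a file as `KEY=VALUE` lines; in a real process manager they would come from the PMIx server library, and the function and file names here are hypothetical.

```shell
# Hypothetical direct launcher: export every PMIX_* line from an
# environment file, then exec the given command. Runs in a subshell so
# the caller's own environment is left untouched.
launch_with_pmix_env() {
    envfile=$1
    shift
    (
        while IFS= read -r line; do
            case $line in
                PMIX_*=*) export "$line" ;;   # quoted, so ';' in URIs is safe
            esac
        done < "$envfile"
        exec "$@"
    )
}

# Usage (not executed here):
#   launch_with_pmix_env /path/to/client.env ./hello_env
```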

On the Slurm side: I’m surprised that it doesn’t work without the --with-slurm
option. An application proc doesn’t care about any of the Slurm-related code if
PMIx is available. I might have access to a machine where I can check it…

Ralph


> On Oct 9, 2018, at 3:26 AM, Stephan Krempel  wrote:
> 
> Ralph, Gilles,
> 
> thanks for your input.
> 
> Before I answer, let me shortly explain what my general intention is.
> We do have our own resource manager and process launcher that supports
> different MPI implementations in different ways. I want to adapt it to
> PMIx to cleanly support OpenMPI and hopefully other MPI implementation
> supporting PMIx in the future, too. 
> 
>> It sounds like what you really want to do is replace the orted, and
>> have your orted open your PMIx server? In other words, you want to
>> use the PMIx reference library to handle all the PMIx stuff, and
>> provide your own backend functions to support the PMIx server calls? 
> 
> You are right, that was my original plan, and I already did it so far.
> In my environment I already can launch processes that successfully call
> PMIx client functions like put, get, fence and so on, all handled by my
> servers using the PMIx server helper library. As far as I implemented
> the server functions now, all the example programs coming with the pmix
> library are working fine.
> 
> Then I tried to use that with OpenMPI and stumbled.
> My first idea was to simply replace orted, but after taking a closer
> look at OpenMPI it seems to me that it uses/needs orted not only for
> spawning and exchange of process information, but also for its general
> communication and collectives. Am I wrong about that?
> 
> So replacing it completely is perhaps not what I want, since I do not
> intend to replace OpenMPI's whole communication layer. But perhaps I am
> mixing up orte and orted here; I am not certain about that.
> 
>> If so, then your best bet would be to edit the PRRTE code in
>> orte/orted/pmix and replace it with your code. You’ll have to deal
>> with the ORTE data objects and PRRTE’s launch procedure, but that is
>> likely easier than trying to write your own version of “orted” from
>> scratch.
> 
> I think one problem here is that I do not really understand what
> purposes orted serves overall, especially besides implementing the PMIx
> server side. Can you please give me a short overview?
> 
>> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
>> implements the server backend functions, and the Slurm daemons “host”
>> the plugin. What you would need to do is replace that plugin with
>> your own.
> 
> I understand that, but it also seems to need some special support from
> the various slurm modules on the OpenMPI side that I do not understand
> yet. At least when I tried OpenMPI without slurm support and
> `srun --mpi=pmix_v2`, it did not work but printed a message that
> slurm support in openmpi is missing.
> 
> 
> 
> Stephan
> 
> 
> 

Re: [OMPI devel] Hints for using an own pmix server

2018-10-09 Thread Stephan Krempel
Ralph, Gilles,

thanks for your input.

Before I answer, let me briefly explain what my general intention is.
We have our own resource manager and process launcher that supports
different MPI implementations in different ways. I want to adapt it to
PMIx to cleanly support OpenMPI and, hopefully, other MPI implementations
supporting PMIx in the future, too.

> It sounds like what you really want to do is replace the orted, and
> have your orted open your PMIx server? In other words, you want to
> use the PMIx reference library to handle all the PMIx stuff, and
> provide your own backend functions to support the PMIx server calls? 

You are right, that was my original plan, and I have already done that.
In my environment I can already launch processes that successfully call
PMIx client functions like put, get, fence and so on, all handled by my
servers using the PMIx server helper library. With the server functions I
have implemented so far, all the example programs that come with the PMIx
library are working fine.
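
The put, get and fence calls mentioned above follow a simple exchange
pattern: each process stages its own data, commits it to its server, and
can read the data of its peers once everyone has passed the fence. The
following is a toy, single-process Python model of that pattern only. It
is not the PMIx API: ToyServer, ToyClient and all method names are
hypothetical, and a real fence would block or complete asynchronously
instead of returning a flag.

```python
# Toy, in-memory model of the put / commit / fence / get pattern.
# All names are hypothetical and do not mirror the PMIx C API.


class ToyServer:
    """Stands in for the launcher-side server that the clients talk to."""

    def __init__(self, nprocs):
        self.nprocs = nprocs
        self.committed = {}   # rank -> {key: value}, not yet visible to peers
        self.arrived = set()  # ranks that have entered the current fence
        self.published = {}   # data released by a completed fence

    def commit(self, rank, data):
        self.committed[rank] = dict(data)

    def fence(self, rank):
        """Record arrival; release the data once every rank has arrived."""
        self.arrived.add(rank)
        if len(self.arrived) == self.nprocs:
            self.published = {r: dict(d) for r, d in self.committed.items()}
            self.arrived.clear()
            return True
        return False

    def get(self, rank, key):
        return self.published[rank][key]


class ToyClient:
    def __init__(self, server, rank):
        self.server = server
        self.rank = rank
        self.staged = {}

    def put(self, key, value):
        self.staged[key] = value  # local only until commit()

    def commit(self):
        self.server.commit(self.rank, self.staged)

    def fence(self):
        return self.server.fence(self.rank)

    def get(self, peer, key):
        return self.server.get(peer, key)


server = ToyServer(nprocs=2)
c0, c1 = ToyClient(server, 0), ToyClient(server, 1)
c0.put("endpoint", "node-a:5000")
c1.put("endpoint", "node-b:5001")
c0.commit()
c1.commit()
first = c0.fence()   # rank 1 has not arrived yet
second = c1.fence()  # all ranks have arrived, data is released
peer_endpoint = c0.get(1, "endpoint")
```

In a real setup the server side of this exchange is what the resource
manager has to provide across nodes; the model above keeps everything in
one process purely to show the ordering.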

Then I tried to use that with OpenMPI and stumbled.
My first idea was to simply replace orted, but after taking a closer
look at OpenMPI it seems to me that it uses/needs orted not only for
spawning and exchanging process information, but also for its general
communication and collectives. Am I wrong about that?

So replacing it completely is perhaps not what I want, since I do not
intend to replace OpenMPI's whole communication layer. But perhaps I am
mixing up orte and orted here; I am not certain about that.

> If so, then your best bet would be to edit the PRRTE code in
> orte/orted/pmix and replace it with your code. You’ll have to deal
> with the ORTE data objects and PRRTE’s launch procedure, but that is
> likely easier than trying to write your own version of “orted” from
> scratch.

I think one problem here is that I do not really understand which
purposes orted serves overall, especially besides implementing the PMIx
server side. Can you please give me a short overview?

> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
> implements the server backend functions, and the Slurm daemons “host”
> the plugin. What you would need to do is replace that plugin with
> your own.

I understand that, but it also seems to need some special support from
the various slurm modules on the OpenMPI side, which I do not understand
yet. At least when I tried OpenMPI without slurm support and
`srun --mpi=pmix_v2`, it did not work but generated a message that
slurm support in openmpi is missing.



Stephan




Re: [OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Ralph H Castain
Even PRRTE won’t allow you to stop the orted from initializing its PMIx server. 
I’m not sure I really understand your objective. Remember, PMIx is just a 
library - the orted opens it and uses it to interface to its client application 
procs. It makes no sense to have some other process perform that role as it 
won’t know any job-level information.

It sounds like what you really want to do is replace the orted, and have your 
orted open your PMIx server? In other words, you want to use the PMIx reference 
library to handle all the PMIx stuff, and provide your own backend functions to 
support the PMIx server calls? If so, then your best bet would be to edit the 
PRRTE code in orte/orted/pmix and replace it with your code. You’ll have to 
deal with the ORTE data objects and PRRTE’s launch procedure, but that is 
likely easier than trying to write your own version of “orted” from scratch.
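
As a rough illustration of that split (the library handles the client
protocol, the host supplies the backend functions), here is a toy Python
model. It is only a sketch under stated assumptions: ToyServerLibrary,
MyResourceManager and the string status values are hypothetical. In the
real C API the host hands a pmix_server_module_t of function pointers to
PMIx_server_init(); none of that is reproduced here.

```python
# Hypothetical model of a library calling into host-provided backend
# functions. Names are illustrative; the real PMIx interface is a C
# struct of callbacks.


class ToyServerLibrary:
    """Plays the role of the reference library: it receives client
    requests and forwards them to whatever the host registered."""

    def __init__(self, host_callbacks):
        self.host = host_callbacks

    def handle(self, request, *args):
        callback = self.host.get(request)
        if callback is None:
            return "NOT_SUPPORTED"  # host did not provide this function
        return callback(*args)


class MyResourceManager:
    """Plays the role of the host daemon, which owns job-level data."""

    def __init__(self, job_size):
        self.job_size = job_size
        self.connected = []

    def client_connected(self, rank):
        self.connected.append(rank)
        return "SUCCESS"

    def fence(self, ranks):
        # Only the host knows how many processes the job has.
        return "SUCCESS" if len(ranks) == self.job_size else "PENDING"


rm = MyResourceManager(job_size=2)
library = ToyServerLibrary({
    "client_connected": rm.client_connected,
    "fence": rm.fence,
})

results = [
    library.handle("client_connected", 0),
    library.handle("fence", [0]),
    library.handle("fence", [0, 1]),
    library.handle("spawn", "a.out"),  # not provided by this host
]
```

The point of the model is the direction of the calls: the library never
knows the job size itself, which is why some other process cannot take
over that role without the job-level information.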

As for Slurm: it behaves the same way as PRRTE. It has a plugin that implements 
the server backend functions, and the Slurm daemons “host” the plugin. What you 
would need to do is replace that plugin with your own.
Ralph



Re: [OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Gilles Gouaillardet

Stephan,


Have you already checked https://github.com/pmix/prrte ?


This is the PMIx Reference RunTime Environment (PRRTE), which was built 
on top of orted.


Long story short, it deploys the PMIx server and then you start your MPI 
app with prun.
An example is available at 
https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh



Cheers,

Gilles



Re: [OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Stephan Krempel
Hello again,

I just want to add the versions I am using:

OpenMPI 3.1.2 with external PMIx 2.1.3

Sorry that I missed that in the first mail.

Regards,

Stephan



[OMPI devel] Hints for using an own pmix server

2018-10-08 Thread Stephan Krempel
Hello everyone,

I am currently implementing a PMIx server and am trying to use it with
OpenMPI. I have my own mpiexec, which starts my PMIx server and
launches the processes.

If I launch an executable linked against OpenMPI, during MPI_Init() the
ORTE layer starts another PMIx server and overrides my PMIX_*
environment so this new server is used instead of mine.
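
One way to make this override visible is to snapshot the PMIX_*
variables as the launcher sets them and compare them with what the
process sees after MPI_Init(). The following is a minimal Python sketch
of such a comparison. The helper names pmix_env and diff_env and the
variable names in the sample data are hypothetical and chosen only for
illustration; only the PMIX_ prefix is taken from the description above.

```python
import os


def pmix_env(environ=None):
    """Return only the variables whose names start with PMIX_."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k.startswith("PMIX_")}


def diff_env(before, after):
    """Map each changed, added, or removed variable to (old, new)."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical values for illustration only.
set_by_launcher = pmix_env({
    "PMIX_EXAMPLE_URI": "launcher-server",
    "PMIX_EXAMPLE_RANK": "0",
    "HOME": "/home/user",
})
seen_after_init = pmix_env({
    "PMIX_EXAMPLE_URI": "other-server",
    "PMIX_EXAMPLE_RANK": "0",
    "HOME": "/home/user",
})
changes = diff_env(set_by_launcher, seen_after_init)
```

Called without arguments, pmix_env() reads the current process
environment, so the same helper can be used on both sides of the launch.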

So I am looking for a method to prevent orte(d) from starting a PMIx
server.

I already tried to understand what the slurm support is doing, since
this is (at least in part) what I think I need. Somehow, when starting
a job with srun --mpi=pmix_v2, the ess module pmi is used, but I was
not able to enforce that manually by setting an MCA parameter (ess
should be the correct one?!?).
And I do not yet have a clue how the slurm support works.

Does anyone have a hint for me on where I can find documentation or
information about this, or is there an easy way to achieve what I
am trying to do that I missed?

Thank you in advance.

Regards,

Stephan