Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
I took a look at the following:

>> A remark on pmix at this point: pmix_bfrops_base_value_load()
>> silently fails to handle the PMIX_DATA_ARRAY type, so the macros
>> PMIX_VALUE_LOAD and PMIX_INFO_LOAD do not work with that type. I think
>> this is unfortunate, and it took me a while to figure out why a segfault
>> occurred when pmix tried to process my PMIX_PROC_DATA infos.

It appears that this was true in the v2.x release series, but it has since
been fixed, so the v3.x series is okay. I’ll backport the support to the v2.x
series for its next release.

Thanks for pointing it out!
Ralph

> On Oct 12, 2018, at 6:15 AM, Ralph H Castain wrote:
> 
> Hi Stephan
> 
> 
>> On Oct 12, 2018, at 2:25 AM, Stephan Krempel wrote:
>> 
>> Hello Ralph,
>> 
>>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>>> is --with-ompi-pmix-rte?
>> 
>> You were right, this was a typo. With the correct option I have now
>> managed to start an MPI helloworld program using OpenMPI and our own
>> process manager with pmix server.
> 
> Hooray! If you want me to show support for your PM on our web site, please 
> send me a little info about it. You are welcome to send it off-list if you 
> prefer.
> 
>> 
>>> It all looks okay to me for the client, but I wonder if you
>>> remembered to call register_nspace and register_client on your server
>>> prior to starting the client? If not, the connection will be dropped
>>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>>> see the detailed connection handshake.
>> 
>> This is a point that I was finally able to figure out from the prrte
>> code. To make it work, you need not only to call register_nspace but
>> also to pass it some specific information that OpenMPI expects to be
>> available (e.g. proc info with lrank).
> 
> My apologies - we will document this better on the PMIx web site and provide 
> some link to it on the OMPI web site. We actually do publish the info OMPI is 
> expecting, but it isn’t in an obvious enough place.
> 
>> 
>> A remark on pmix at this point: pmix_bfrops_base_value_load()
>> silently fails to handle the PMIX_DATA_ARRAY type, so the macros
>> PMIX_VALUE_LOAD and PMIX_INFO_LOAD do not work with that type. I think
>> this is unfortunate, and it took me a while to figure out why a segfault
>> occurred when pmix tried to process my PMIX_PROC_DATA infos.
> 
> I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely 
> just an oversight. Regardless, it should return an error if it isn’t doing it.
> 
>> 
>> So thank you again for your help so far.
>> 
>> 
>> One point that remains open and is of interest to me is whether I can
>> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
>> possible to configure it as if the "--with-ompi-pmix-rte" switch from
>> version 4.x were available?
> 
> I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
> the relevant release managers if they’d like us to do so.
> 
> Ralph
> 
>> 
>> Regards,
>> 
>> Stephan
>> 
>> 
>>> 
>>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel wrote:
>>>> 
>>>> Hi Ralf,
>>>> 
>>>> After studying prrte a little bit, I tried something new and followed
>>>> the description here using openmpi 4:
>>>> https://pmix.org/code/building-the-pmix-reference-server/
>>>> 
>>>> I configured openmpi 4.0.0rc3:
>>>> 
>>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>>  --with-libevent=/usr --with-ompi-mpix-rte
>>>> 
>>>> (I also tried to set --with-orte=no, but it then claims not to have a
>>>> suitable rte and does not finish)
>>>> 
>>>> I then started my own PMIx and spawned a client compiled with mpicc of
>>>> the new openmpi installation with this environment:
>>>> 
>>>> PMIX_NAMESPACE=namespace_3228_0_0
>>>> PMIX_RANK=0
>>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_SECURITY_MODE=native,none
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_GDS_MODULE=ds12,hash
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>>> 
>>>> The client is not connecting to my pmix server and its environment
>>>> after MPI_Init looks like this:
>>>> 
>>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_RANK=0
>>>> PMIX_PTL_MODULE=tcp,usock
>>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
>>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
 

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Ralph H Castain
Hi Stephan


> On Oct 12, 2018, at 2:25 AM, Stephan Krempel wrote:
> 
> Hello Ralph,
> 
>> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
>> is --with-ompi-pmix-rte?
> 
> You were right, this was a typo. With the correct option I have now
> managed to start an MPI helloworld program using OpenMPI and our own
> process manager with pmix server.

Hooray! If you want me to show support for your PM on our web site, please send 
me a little info about it. You are welcome to send it off-list if you prefer.

> 
>> It all looks okay to me for the client, but I wonder if you
>> remembered to call register_nspace and register_client on your server
>> prior to starting the client? If not, the connection will be dropped
>> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
>> see the detailed connection handshake.
> 
> This is a point that I was finally able to figure out from the prrte
> code. To make it work, you need not only to call register_nspace but
> also to pass it some specific information that OpenMPI expects to be
> available (e.g. proc info with lrank).

My apologies - we will document this better on the PMIx web site and provide 
some link to it on the OMPI web site. We actually do publish the info OMPI is 
expecting, but it isn’t in an obvious enough place.

> 
> A remark on pmix at this point: pmix_bfrops_base_value_load()
> silently fails to handle the PMIX_DATA_ARRAY type, so the macros
> PMIX_VALUE_LOAD and PMIX_INFO_LOAD do not work with that type. I think
> this is unfortunate, and it took me a while to figure out why a segfault
> occurred when pmix tried to process my PMIX_PROC_DATA infos.

I’ll check that out - I don’t know why we wouldn’t handle it, so it is likely 
just an oversight. Regardless, it should return an error if it isn’t doing it.

> 
> So thank you again for your help so far.
> 
> 
> One point that remains open and is of interest to me is whether I can
> achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
> possible to configure it as if the "--with-ompi-pmix-rte" switch from
> version 4.x were available?

I’m afraid we didn’t backport that capability to the v3.x branches. I’ll ask 
the relevant release managers if they’d like us to do so.

Ralph

> 
> Regards,
> 
> Stephan
> 
> 
>> 
>>> On Oct 9, 2018, at 3:14 PM, Stephan Krempel 
>>> wrote:
>>> 
>>> Hi Ralf,
>>> 
>>> After studying prrte a little bit, I tried something new and followed
>>> the description here using openmpi 4:
>>> https://pmix.org/code/building-the-pmix-reference-server/
>>> 
>>> I configured openmpi 4.0.0rc3:
>>> 
>>> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>>>  --with-libevent=/usr --with-ompi-mpix-rte
>>> 
>>> (I also tried to set --with-orte=no, but it then claims not to have a
>>> suitable rte and does not finish)
>>> 
>>> I then started my own PMIx and spawned a client compiled with mpicc of
>>> the new openmpi installation with this environment:
>>> 
>>> PMIX_NAMESPACE=namespace_3228_0_0
>>> PMIX_RANK=0
>>> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
>>> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_SECURITY_MODE=native,none
>>> PMIX_PTL_MODULE=tcp,usock
>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>> PMIX_GDS_MODULE=ds12,hash
>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>>> 
>>> The client is not connecting to my pmix server and its environment
>>> after MPI_Init looks like this:
>>> 
>>> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_RANK=0
>>> PMIX_PTL_MODULE=tcp,usock
>>> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
>>> PMIX_MCA_mca_base_component_show_load_errors=1
>>> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
>>> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
>>> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
>>> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
>>> PMIX_SECURITY_MODE=native,none
>>> PMIX_NAMESPACE=864157697
>>> PMIX_GDS_MODULE=ds12,hash
>>> ORTE_SCHIZO_DETECTION=ORTE
>>> OMPI_COMMAND=./hello_env
>>> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
>>> OMPI_MCA_orte_launch=1
>>> OMPI_APP_CTX_NUM_PROCS=1
>>> OMPI_MCA_pmix=^s1,s2,cray,isolated
>>> OMPI_MCA_ess=singleton
>>> OMPI_MCA_orte_ess_num_procs=1
>>> 
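For reference, the two PMIX_* listings above can be compared mechanically. The following is only an illustrative Python sketch: the values are copied from the listings, the wrapped dstore path is joined back into one line, and the helper name diff_env is made up for this example. It shows that the namespace and the v2 server URIs no longer match what the custom server handed out, and that OMPI_MCA_ess=singleton appears, which suggests the client came up as a singleton instead of connecting to the supplied server.

```python
# Environment the custom PMIx server handed to the client (from the listing).
before = {
    "PMIX_NAMESPACE": "namespace_3228_0_0",
    "PMIX_RANK": "0",
    "PMIX_SERVER_URI2": "pmix-server.3234;tcp4://127.0.0.1:49637",
    "PMIX_SERVER_URI21": "pmix-server.3234;tcp4://127.0.0.1:49637",
    "PMIX_SERVER_URI": "pmix-server:3234:/tmp/pmix-3234",
    "PMIX_SERVER_URI2USOCK": "pmix-server:3234:/tmp/pmix-3234",
    "PMIX_SECURITY_MODE": "native,none",
    "PMIX_PTL_MODULE": "tcp,usock",
    "PMIX_BFROP_BUFFER_TYPE": "PMIX_BFROP_BUFFER_FULLY_DESC",
    "PMIX_GDS_MODULE": "ds12,hash",
    "PMIX_DSTORE_ESH_BASE_PATH": "/tmp/pmix_dstor_3234",
}

# Relevant part of the client environment after MPI_Init (from the listing).
after = {
    "PMIX_NAMESPACE": "864157697",
    "PMIX_RANK": "0",
    "PMIX_SERVER_URI2": "864157696.0;tcp4://127.0.0.1:33619",
    "PMIX_SERVER_URI21": "864157696.0;tcp4://127.0.0.1:33619",
    "PMIX_SERVER_URI": "pmix-server:3234:/tmp/pmix-3234",
    "PMIX_SERVER_URI2USOCK": "pmix-server:3234:/tmp/pmix-3234",
    "PMIX_SECURITY_MODE": "native,none",
    "PMIX_PTL_MODULE": "tcp,usock",
    "PMIX_BFROP_BUFFER_TYPE": "PMIX_BFROP_BUFFER_FULLY_DESC",
    "PMIX_GDS_MODULE": "ds12,hash",
    "PMIX_DSTORE_ESH_BASE_PATH":
        "/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243",
    "PMIX_MCA_mca_base_component_show_load_errors": "1",
    "OMPI_MCA_ess": "singleton",
}


def diff_env(old, new):
    """Return (changed, added, removed) between two environment dicts.

    Hypothetical helper for this illustration only.
    """
    changed = {k: (old[k], new[k])
               for k in old if k in new and old[k] != new[k]}
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    return changed, added, removed


changed, added, removed = diff_env(before, after)
for key in sorted(changed):
    print(f"{key}: {changed[key][0]} -> {changed[key][1]}")
print("added:", ", ".join(added))
```
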
>>> So something goes wrong, but I have no idea what I am missing. Do
>>> you have an idea what I need to change? Do I have to set an MCA
>>> parameter to tell OpenMPI not to start orted, or does it need another
>>> hint in the client environment besides the stuff coming from the PMIx
>>> server helper library?
>>> 
>>> 
>>> Stephan
>>> 
>>> 
>>> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>>>> Hi Stephan
>>>> 
>>>> Thanks for the clarification - that helps a great deal. You are
>>>> correct that OMPI’s orted 

[OMPI devel] GPUDirect RDMA/Async for DL Acceleration (MPI)

2018-10-12 Thread Jaco Joubert
Good day All,

My hypothesis is that with a SmartNIC offloading the CPU, some benefits of
Infiniband can also be achieved with Ethernet and I am looking for
information regarding fully supporting GPUDirect on the NIC's side.

I was able to DMA between a SmartNIC and a V100 GPU through PCIe. However,
to make this useful and more general it should work transparently with
things like MPI (and NCCL). Most resources I've found explain CUDA-Aware
MPI from a user's point of view, but I have not yet found information
about what needs to be implemented on the NIC's side.

I've seen that there are MCA BTL parameters to set tcp, sm, self, openib
etc. I believe some development needs to be done in order to enable MPI to
make use of the SmartNIC, perhaps adding another BTL option? AFAIU, the
data that needs to be sent (and destination rank?), should be copied to the
TX Queue of the NIC. The NIC can then encapsulate the raw data with the
relevant headers and forward it over a network without any CPU involvement.

Can anyone please point me to documentation, code, or give advice on how to
approach the integration between MPI and NIC?

Regards
Jaco
-- 
*Jaco Joubert*
*Software Engineer*

*Netronome* | 1st Floor, Southdowns Ridge Office Park, Cnr John Vorster &
  Nellmapius Street, Irene, Centurion 0157, South Africa
Phone: +27 (012) 665-4427 | Skype: jaco.joubert12 |
www.netronome.com
___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Hints for using an own pmix server

2018-10-12 Thread Stephan Krempel
Hello Ralph,

> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
> is --with-ompi-pmix-rte?

You were right, this was a typo. With the correct option I have now
managed to start an MPI helloworld program using OpenMPI and our own
process manager with pmix server.

> It all looks okay to me for the client, but I wonder if you
> remembered to call register_nspace and register_client on your server
> prior to starting the client? If not, the connection will be dropped
> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
> see the detailed connection handshake.

This is a point that I was finally able to figure out from the prrte
code. To make it work, you need not only to call register_nspace but
also to pass it some specific information that OpenMPI expects to be
available (e.g. proc info with lrank).
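
For anyone debugging the same connection problem, the PMIX_MCA_ptl_base_verbose=100 suggestion quoted above can be applied when the launcher spawns the client. This is a minimal Python sketch, assuming a launcher that already knows the PMIX_* variables for the client; the function names client_env and launch_client are hypothetical and not part of PMIx or OpenMPI.

```python
import os
import subprocess


def client_env(server_env, verbose=100):
    """Build the environment for a directly launched PMIx client.

    server_env: the PMIX_* variables provided by the PMIx server side
                (assumed to be available as a dict in this sketch).
    verbose:    value for PMIX_MCA_ptl_base_verbose, the setting
                suggested in this thread for tracing the handshake.
    """
    env = dict(os.environ)        # start from the launcher's environment
    env.update(server_env)        # add the server-provided variables
    env["PMIX_MCA_ptl_base_verbose"] = str(verbose)
    return env


def launch_client(argv, server_env):
    """Start the client process directly, without mpirun."""
    return subprocess.Popen(argv, env=client_env(server_env))


# Example: inspect the environment without actually starting a process.
env = client_env({"PMIX_NAMESPACE": "namespace_3228_0_0", "PMIX_RANK": "0"})
print(env["PMIX_MCA_ptl_base_verbose"])
```

Keeping the environment construction separate from the launch makes it easy to check what the client will see before any process is started.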

A remark on pmix at this point: pmix_bfrops_base_value_load() silently
fails to handle the PMIX_DATA_ARRAY type, so the macros PMIX_VALUE_LOAD
and PMIX_INFO_LOAD do not work with that type. I think this is
unfortunate, and it took me a while to figure out why a segfault
occurred when pmix tried to process my PMIX_PROC_DATA infos.

So thank you again for your help so far.


One point that remains open and is of interest to me is whether I can
achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
possible to configure it as if the "--with-ompi-pmix-rte" switch from
version 4.x were available?

Regards,

Stephan


> 
> > On Oct 9, 2018, at 3:14 PM, Stephan Krempel 
> > wrote:
> > 
> > Hi Ralf,
> > 
> > After studying prrte a little bit, I tried something new and followed
> > the description here using openmpi 4:
> > https://pmix.org/code/building-the-pmix-reference-server/
> > 
> > I configured openmpi 4.0.0rc3:
> > 
> > ../configure --enable-debug --prefix [...] --with-pmix=[...] \
> >  --with-libevent=/usr --with-ompi-mpix-rte
> > 
> > (I also tried to set --with-orte=no, but it then claims not to have a
> > suitable rte and does not finish)
> > 
> > I then started my own PMIx and spawned a client compiled with mpicc of
> > the new openmpi installation with this environment:
> > 
> > PMIX_NAMESPACE=namespace_3228_0_0
> > PMIX_RANK=0
> > PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SECURITY_MODE=native,none
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_GDS_MODULE=ds12,hash
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> > 
> > The client is not connecting to my pmix server and its environment
> > after MPI_Init looks like this:
> > 
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_RANK=0
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_MCA_mca_base_component_show_load_errors=1
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
> > PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SECURITY_MODE=native,none
> > PMIX_NAMESPACE=864157697
> > PMIX_GDS_MODULE=ds12,hash
> > ORTE_SCHIZO_DETECTION=ORTE
> > OMPI_COMMAND=./hello_env
> > OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> > OMPI_MCA_orte_launch=1
> > OMPI_APP_CTX_NUM_PROCS=1
> > OMPI_MCA_pmix=^s1,s2,cray,isolated
> > OMPI_MCA_ess=singleton
> > OMPI_MCA_orte_ess_num_procs=1
> > 
> > So something goes wrong, but I have no idea what I am missing. Do
> > you have an idea what I need to change? Do I have to set an MCA
> > parameter to tell OpenMPI not to start orted, or does it need another
> > hint in the client environment besides the stuff coming from the PMIx
> > server helper library?
> > 
> > 
> > Stephan
> > 
> > 
> > On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
> > > Hi Stephan
> > > 
> > > > Thanks for the clarification - that helps a great deal. You are
> > > > correct that OMPI’s orted daemons do more than just host the PMIx
> > > > server library. However, they are only active if you launch the
> > > > OMPI processes using mpirun. This is probably the source of the
> > > > trouble you are seeing.
> > > > 
> > > > Since you have a process launcher and have integrated the PMIx
> > > > server support into your RM’s daemons, you really have no need for
> > > > mpirun at all. You should just be able to launch the processes
> > > > directly using your own launcher. The PMIx support will take care
> > > > of the startup requirements. The application procs will not use the
> > > > orted in such cases.
> > > > 
> > > > So if your system is working fine with the PMIx example programs,
> > > > then just launch the OMPI apps the same way and it should just
> > > > work.
> > > 
> > > On the