I assume this (--with-ompi-mpix-rte) is a typo, and you actually used the
correct option, --with-ompi-pmix-rte?

The client side all looks okay to me, but did you remember to call
PMIx_server_register_nspace and PMIx_server_register_client on your server
before starting the client? If not, the server will drop the connection. You
can add PMIX_MCA_ptl_base_verbose=100 to your environment to see the
detailed connection handshake.

> On Oct 9, 2018, at 3:14 PM, Stephan Krempel <krem...@par-tec.com> wrote:
> 
> Hi Ralph,
> 
> After studying PRRTE a little, I tried something new and followed
> the description here, using OpenMPI 4:
> https://pmix.org/code/building-the-pmix-reference-server/
> 
> I configured OpenMPI 4.0.0rc3 as follows:
> 
> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>  --with-libevent=/usr --with-ompi-mpix-rte
> 
> (I also tried setting --with-orte=no, but configure then complains that
> no suitable RTE is available and does not finish.)
> 
> I then started my own PMIx server and spawned a client, compiled with
> the mpicc of the new OpenMPI installation, with this environment:
> 
> PMIX_NAMESPACE=namespace_3228_0_0
> PMIX_RANK=0
> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_SECURITY_MODE=native,none
> PMIX_PTL_MODULE=tcp,usock
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_GDS_MODULE=ds12,hash
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> 
> The client does not connect to my PMIx server, and its environment
> after MPI_Init looks like this:
> 
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_RANK=0
> PMIX_PTL_MODULE=tcp,usock
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_MCA_mca_base_component_show_load_errors=1
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SECURITY_MODE=native,none
> PMIX_NAMESPACE=864157697
> PMIX_GDS_MODULE=ds12,hash
> ORTE_SCHIZO_DETECTION=ORTE
> OMPI_COMMAND=./hello_env
> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> OMPI_MCA_orte_launch=1
> OMPI_APP_CTX_NUM_PROCS=1
> OMPI_MCA_pmix=^s1,s2,cray,isolated
> OMPI_MCA_ess=singleton
> OMPI_MCA_orte_ess_num_procs=1
> 
> So something is going wrong, but I have no idea what I am missing. Do
> you know what I need to change? Do I have to set an MCA parameter to
> tell OpenMPI not to start orted, or does it need another hint in the
> client environment besides what comes from the PMIx server helper
> library?
> 
> 
> Stephan
> 
> 
> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>> Hi Stephan
>> 
>> Thanks for the clarification - that helps a great deal. You are
>> correct that OMPI’s orted daemons do more than just host the PMIx
>> server library. However, they are only active if you launch the OMPI
>> processes using mpirun. This is probably the source of the trouble
>> you are seeing.
>> 
>> Since you have a process launcher and have integrated the PMIx server
>> support into your RM’s daemons, you really have no need for mpirun at
>> all. You should just be able to launch the processes directly using
>> your own launcher. The PMIx support will take care of the startup
>> requirements. The application procs will not use the orted in such
>> cases.
>> 
>> So if your system is working fine with the PMIx example programs,
>> then just launch the OMPI apps the same way and it should just work.
>> 
>> On the Slurm side: I’m surprised that it doesn’t work without the
>> --with-slurm option. An application proc doesn’t care about any of the
>> Slurm-related code if PMIx is available. I might have access to a
>> machine where I can check it…
>> 
>> Ralph
>> 
>> 
>>> On Oct 9, 2018, at 3:26 AM, Stephan Krempel <krem...@par-tec.com> wrote:
>>> 
>>> Ralph, Gilles,
>>> 
>>> thanks for your input.
>>> 
>>> Before I answer, let me briefly explain my general intention.
>>> We have our own resource manager and process launcher that
>>> supports different MPI implementations in different ways. I want
>>> to adapt it to PMIx to cleanly support OpenMPI, and hopefully
>>> other MPI implementations that support PMIx in the future, too.
>>> 
>>>> It sounds like what you really want to do is replace the orted,
>>>> and have your orted open your PMIx server? In other words, you
>>>> want to use the PMIx reference library to handle all the PMIx
>>>> stuff, and provide your own backend functions to support the
>>>> PMIx server calls?
>>> 
>>> You are right, that was my original plan, and I have already done
>>> that much. In my environment I can already launch processes that
>>> successfully call PMIx client functions like put, get, fence, and
>>> so on, all handled by my servers using the PMIx server helper
>>> library. With the server functions I have implemented so far, all
>>> the example programs shipped with the PMIx library work fine.
>>> 
>>> Then I tried to use it with OpenMPI and ran into trouble.
>>> My first idea was simply to replace orted, but after taking a
>>> closer look at OpenMPI it seems to me that it uses/needs orted not
>>> only for spawning and exchanging process information, but also for
>>> its general communication and collectives. Am I wrong about that?
>>> 
>>> So replacing it completely is perhaps not what I want, since I do
>>> not intend to replace OpenMPI's whole communication layer. But
>>> perhaps I am mixing up ORTE and orted here; I am not certain about
>>> that.
>>> 
>>>> If so, then your best bet would be to edit the PRRTE code in
>>>> orte/orted/pmix and replace it with your code. You’ll have to deal
>>>> with the ORTE data objects and PRRTE’s launch procedure, but that
>>>> is likely easier than trying to write your own version of “orted”
>>>> from scratch.
>>> 
>>> I think one problem here is that I do not really understand which
>>> purposes orted fulfills overall, especially besides implementing
>>> the PMIx server side. Could you give me a short overview?
>>> 
>>>> As for Slurm: it behaves the same way as PRRTE. It has a plugin
>>>> that implements the server backend functions, and the Slurm
>>>> daemons “host” the plugin. What you would need to do is replace
>>>> that plugin with your own.
>>> 
>>> I understand that, but it also seems to need some special support
>>> from the various Slurm modules on the OpenMPI side, which I do not
>>> understand yet. At least when I tried OpenMPI without Slurm support
>>> and `srun --mpi=pmix_v2`, it did not work but printed a message
>>> that Slurm support in OpenMPI is missing.
>>> 
>>> 
>>> 
>>> Stephan
>>> 
>>> 
>>> 
>>>>> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>> 
>>>>> Stephan,
>>>>> 
>>>>> 
>>>>> Have you already checked https://github.com/pmix/prrte ?
>>>>> 
>>>>> 
>>>>> This is the PMIx Reference RunTime Environment (PRRTE), which
>>>>> was built on top of orted.
>>>>> 
>>>>> Long story short, it deploys the PMIx server, and then you start
>>>>> your MPI app with prun.
>>>>> An example is available at
>>>>> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> 
>>>>> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>>>>>> Hello everyone,
>>>>>> 
>>>>>> I am currently implementing a PMIx server and am trying to use
>>>>>> it with OpenMPI. I have my own mpiexec, which starts my PMIx
>>>>>> server and launches the processes.
>>>>>> 
>>>>>> If I launch an executable linked against OpenMPI, the ORTE layer
>>>>>> starts another PMIx server during MPI_Init() and overrides my
>>>>>> PMIX_* environment, so this new server is used instead of mine.
>>>>>> 
>>>>>> So I am looking for a method to prevent orte(d) from starting a
>>>>>> PMIx server.
>>>>>> 
>>>>>> I already tried to understand what the Slurm support is doing,
>>>>>> since this is (at least in part) what I think I need. Somehow,
>>>>>> when starting a job with srun --mpi=pmix_v2, the ess module pmi
>>>>>> is used, but I was not able to force that manually by setting an
>>>>>> MCA parameter (ess should be the correct one?!?).
>>>>>> And I do not yet have a clue how the Slurm support works.
>>>>>> 
>>>>>> So does anyone have a hint where I can find documentation or
>>>>>> information about this, or is there an easy way to achieve what
>>>>>> I am trying to do that I have missed?
>>>>>> 
>>>>>> Thank you in advance.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Stephan
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel@lists.open-mpi.org
>>>>>> https://lists.open-mpi.org/mailman/listinfo/devel
>>>>>> 