Hello Ralph,

> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
> is --with-ompi-pmix-rte?

You were right, this was a typo. With the correct option I have now
managed to start an MPI hello-world program using OpenMPI and our own
process manager with its PMIx server.
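For anyone hitting the same symptom: in the failing run quoted below,
PMIX_SERVER_URI2 changed after MPI_Init from my server's
pmix-server.3234;tcp4://127.0.0.1:49637 to
864157696.0;tcp4://127.0.0.1:33619, and OMPI_MCA_ess=singleton appeared.
That suggests the client had given up on my server and fallen back to a
server started by ORTE. A quick way to spot this from a test client is to
compare the namespace part of that variable with the one your own server
uses. This is only a sketch: it assumes, based solely on the values quoted
in this thread, that the variable has the form
<nspace>.<rank>;<transport-uri>, and split_server_uri() and
uses_expected_server() are hypothetical helpers, not PMIx API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the PMIx API). Splits a value of the
 * ASSUMED form "<nspace>.<rank>;<transport-uri>", for example
 * "pmix-server.3234;tcp4://127.0.0.1:49637". The rank is taken after the
 * LAST '.' before the ';' so that dotted namespaces would still parse.
 * Returns 0 on success, -1 if the value does not have that shape or a
 * buffer is too small. */
static int split_server_uri(const char *uri, char *nspace, size_t nslen,
                            long *rank, char *addr, size_t addrlen)
{
    const char *semi = strchr(uri, ';');
    const char *dot = NULL;
    char *end;
    size_t n;

    if (semi == NULL)
        return -1;
    for (const char *p = uri; p < semi; p++)
        if (*p == '.')
            dot = p;
    if (dot == NULL || dot == uri || dot + 1 == semi)
        return -1;

    n = (size_t)(dot - uri);
    if (n >= nslen || strlen(semi + 1) >= addrlen)
        return -1;
    memcpy(nspace, uri, n);
    nspace[n] = '\0';

    *rank = strtol(dot + 1, &end, 10);
    if (end != semi)
        return -1; /* something other than digits between '.' and ';' */

    strcpy(addr, semi + 1);
    return 0;
}

/* Hypothetical check: returns 1 if PMIX_SERVER_URI2 in this process still
 * names the server namespace we launched with, 0 otherwise (unset,
 * unparsable, or pointing at some other server). */
static int uses_expected_server(const char *expected_server_nspace)
{
    const char *uri = getenv("PMIX_SERVER_URI2");
    char ns[256], addr[256];
    long rank;

    if (uri == NULL ||
        split_server_uri(uri, ns, sizeof ns, &rank, addr, sizeof addr) != 0)
        return 0;
    return strcmp(ns, expected_server_nspace) == 0;
}
```

Calling uses_expected_server("pmix-server") right after MPI_Init would
have returned 0 in the failing run and 1 once the client really talked to
my server.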

> It all looks okay to me for the client, but I wonder if you
> remembered to call register_nspace and register_client on your server
> prior to starting the client? If not, the connection will be dropped
> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
> see the detailed connection handshake.

That was indeed the point, and I finally figured it out from the PRRTE
code. To make it work you not only need to call register_nspace but also
have to pass it specific information that OpenMPI expects to be
available (e.g. the per-process info including the local rank).
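For other process-manager authors: as far as I understand it, the local
rank is a process's position among the processes of the same job on the
same node. Assuming the launcher knows which node each global rank of a
single job is placed on, it can be computed as below before building the
per-process entries (e.g. PMIX_LOCAL_RANK inside each PMIX_PROC_DATA) for
register_nspace. compute_local_ranks() and the node index array are
hypothetical, not PMIx API; the per-node counts come out as a by-product.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not PMIx API). node_of[r] is the index
 * (0..nnodes-1) of the node that global rank r of ONE job is placed on.
 * On return, local_rank[r] is r's position among that job's processes on
 * the same node, counted in increasing global-rank order, and
 * local_size[n] is the number of the job's processes on node n.
 * Returns 0, or -1 on an invalid node index. */
static int compute_local_ranks(const int *node_of, size_t nprocs,
                               uint16_t *local_rank,
                               uint16_t *local_size, size_t nnodes)
{
    for (size_t n = 0; n < nnodes; n++)
        local_size[n] = 0;

    for (size_t r = 0; r < nprocs; r++) {
        int node = node_of[r];
        if (node < 0 || (size_t)node >= nnodes)
            return -1;
        /* The next free local slot on this node becomes r's local rank. */
        local_rank[r] = local_size[node]++;
    }
    return 0;
}
```

With ranks 0..4 placed on nodes {0, 0, 1, 1, 0}, the local ranks are
{0, 1, 0, 1, 2} and the per-node counts {3, 2}.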

A remark about PMIx at this point: pmix_bfrops_base_value_load() silently
does not handle the PMIX_DATA_ARRAY type, so the PMIX_VALUE_LOAD and
PMIX_INFO_LOAD macros do not work with that type. I think this is
unfortunate; it took me a while to figure out why I got a segfault when
PMIx tried to process my PMIX_PROC_DATA info entries.

So thank you again for your help so far.


One point that remains open and is of interest to me is whether I can
achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
possible to configure it as if the "--with-ompi-pmix-rte" switch from
version 4.x were available?

Regards,

Stephan


> 
> > On Oct 9, 2018, at 3:14 PM, Stephan Krempel <krem...@par-tec.com>
> > wrote:
> > 
> > Hi Ralph,
> > 
> > After studying prrte a little bit, I tried something new and followed
> > the description here using openmpi 4:
> > https://pmix.org/code/building-the-pmix-reference-server/
> > 
> > I configured openmpi 4.0.0rc3:
> > 
> > ../configure --enable-debug --prefix [...] --with-pmix=[...] \
> >  --with-libevent=/usr --with-ompi-mpix-rte
> > 
> > (I also tried to set --with-orte=no, but it then claims not to have a
> > suitable rte and does not finish)
> > 
> > I then started my own PMIx server and spawned a client, compiled with
> > the mpicc of the new openmpi installation, with this environment:
> > 
> > PMIX_NAMESPACE=namespace_3228_0_0
> > PMIX_RANK=0
> > PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SECURITY_MODE=native,none
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_GDS_MODULE=ds12,hash
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> > 
> > The client is not connecting to my PMIx server, and its environment
> > after MPI_Init looks like this:
> > 
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_RANK=0
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_MCA_mca_base_component_show_load_errors=1
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
> > PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SECURITY_MODE=native,none
> > PMIX_NAMESPACE=864157697
> > PMIX_GDS_MODULE=ds12,hash
> > ORTE_SCHIZO_DETECTION=ORTE
> > OMPI_COMMAND=./hello_env
> > OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> > OMPI_MCA_orte_launch=1
> > OMPI_APP_CTX_NUM_PROCS=1
> > OMPI_MCA_pmix=^s1,s2,cray,isolated
> > OMPI_MCA_ess=singleton
> > OMPI_MCA_orte_ess_num_procs=1
> > 
> > So something goes wrong, but I have no idea what I am missing. Do
> > you have an idea what I need to change? Do I have to set an MCA
> > parameter to tell OpenMPI not to start orted, or does it need another
> > hint in the client environment besides the variables coming from the
> > PMIx server helper library?
> > 
> > 
> > Stephan
> > 
> > 
> > On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
> > > Hi Stephan
> > > 
> > > Thanks for the clarification - that helps a great deal. You are
> > > correct that OMPI’s orted daemons do more than just host the PMIx
> > > server library. However, they are only active if you launch the
> > > OMPI
> > > processes using mpirun. This is probably the source of the
> > > trouble
> > > you are seeing.
> > > 
> > > Since you have a process launcher and have integrated the PMIx server
> > > support into your RM’s daemons, you really have no need for mpirun at
> > > all. You should just be able to launch the processes directly using
> > > your own launcher. The PMIx support will take care of the startup
> > > requirements. The application procs will not use the orted in such
> > > cases.
> > > 
> > > So if your system is working fine with the PMIx example programs,
> > > then just launch the OMPI apps the same way and it should just work.
> > > 
> > > On the Slurm side: I’m surprised that it doesn’t work without the
> > > --with-slurm option. An application proc doesn’t care about any of the
> > > Slurm-related code if PMIx is available. I might have access to a
> > > machine where I can check it…
> > > 
> > > Ralph
> > > 
> > > 
> > > > On Oct 9, 2018, at 3:26 AM, Stephan Krempel <krem...@par-tec.com>
> > > > wrote:
> > > > 
> > > > Ralph, Gilles,
> > > > 
> > > > thanks for your input.
> > > > 
> > > > Before I answer, let me briefly explain what my general intention
> > > > is. We have our own resource manager and process launcher that
> > > > supports different MPI implementations in different ways. I want to
> > > > adapt it to PMIx to cleanly support OpenMPI and, hopefully, other MPI
> > > > implementations supporting PMIx in the future, too.
> > > > 
> > > > > It sounds like what you really want to do is replace the orted, and
> > > > > have your orted open your PMIx server? In other words, you want to
> > > > > use the PMIx reference library to handle all the PMIx stuff, and
> > > > > provide your own backend functions to support the PMIx server
> > > > > calls?
> > > > 
> > > > You are right, that was my original plan, and I have already
> > > > implemented it up to a point. In my environment I can already launch
> > > > processes that successfully call PMIx client functions like put, get,
> > > > fence and so on, all handled by my servers using the PMIx server
> > > > helper library. With the server functions I have implemented so far,
> > > > all the example programs that come with the PMIx library work fine.
> > > > 
> > > > Then I tried to use that with OpenMPI and stumbled.
> > > > My first idea was to simply replace orted, but after taking a closer
> > > > look into OpenMPI it seems to me that it uses/needs orted not only
> > > > for spawning and exchanging process information, but also for its
> > > > general communication and collectives. Am I wrong about that?
> > > > 
> > > > So replacing it completely is perhaps not what I want, since I do
> > > > not intend to replace OpenMPI's whole communication stack. But
> > > > perhaps I am mixing up orte and orted here; I am not certain about
> > > > that.
> > > > 
> > > > > If so, then your best bet would be to edit the PRRTE code in
> > > > > orte/orted/pmix and replace it with your code. You’ll have to deal
> > > > > with the ORTE data objects and PRRTE’s launch procedure, but that is
> > > > > likely easier than trying to write your own version of “orted” from
> > > > > scratch.
> > > > 
> > > > I think one problem here is that I do not really understand which
> > > > purposes orted fulfills overall, especially beyond implementing the
> > > > PMIx server side. Can you please give me a short overview?
> > > > 
> > > > > As for Slurm: it behaves the same way as PRRTE. It has a plugin that
> > > > > implements the server backend functions, and the Slurm daemons
> > > > > “host” the plugin. What you would need to do is replace that plugin
> > > > > with your own.
> > > > 
> > > > I understand that, but it also seems to need some special support
> > > > from the various Slurm modules on the OpenMPI side that I do not
> > > > understand yet. At least when I tried OpenMPI without Slurm support
> > > > and `srun --mpi=pmix_v2`, it did not work but generated a message
> > > > that Slurm support in OpenMPI is missing.
> > > > 
> > > > 
> > > > 
> > > > Stephan
> > > > 
> > > > 
> > > > 
> > > > > > On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gilles@rist.or.jp>
> > > > > > wrote:
> > > > > > 
> > > > > > Stephan,
> > > > > > 
> > > > > > 
> > > > > > Have you already checked https://github.com/pmix/prrte ?
> > > > > > 
> > > > > > 
> > > > > > This is the PMIx Reference RunTime Environment (PRRTE), which was
> > > > > > built on top of orted.
> > > > > > 
> > > > > > Long story short, it deploys the PMIx server and then you start
> > > > > > your MPI app with prun.
> > > > > > An example is available at
> > > > > > https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
> > > > > > 
> > > > > > 
> > > > > > Cheers,
> > > > > > 
> > > > > > Gilles
> > > > > > 
> > > > > > 
> > > > > > On 10/9/2018 8:45 AM, Stephan Krempel wrote:
> > > > > > > Hello everyone,
> > > > > > > 
> > > > > > > I am currently implementing a PMIx server and trying to use it
> > > > > > > with OpenMPI. I have my own mpiexec which starts my PMIx server
> > > > > > > and launches the processes.
> > > > > > > 
> > > > > > > If I launch an executable linked against OpenMPI, during
> > > > > > > MPI_Init() the ORTE layer starts another PMIx server and overrides
> > > > > > > my PMIX_* environment so this new server is used instead of mine.
> > > > > > > 
> > > > > > > So I am looking for a way to prevent orte(d) from starting a PMIx
> > > > > > > server.
> > > > > > > 
> > > > > > > I already tried to understand what the Slurm support is doing,
> > > > > > > since this is (at least in part) what I think I need. Somehow,
> > > > > > > when starting a job with srun --mpi=pmix_v2, the ess module pmi is
> > > > > > > started, but I was not able to enforce that manually by setting an
> > > > > > > MCA parameter (ess should be the correct one?!?).
> > > > > > > And I do not yet have a clue how the Slurm support works.
> > > > > > > 
> > > > > > > So does anyone have a hint where I can find documentation or
> > > > > > > information about that, or is there an easy way to achieve what I
> > > > > > > am trying to do that I missed?
> > > > > > > 
> > > > > > > Thank you in advance.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > 
> > > > > > > Stephan
> > > > > > > _______________________________________________
> > > > > > > devel mailing list
> > > > > > > devel@lists.open-mpi.org
> > > > > > > https://lists.open-mpi.org/mailman/listinfo/devel
> > > > > > > 
> > > > > > 
-- 
Stephan Krempel
HPC Software Engineer

ParTec Cluster Competence Center GmbH
Possartstraße 20
81679 München, Germany
