Hello Ralph,

> I assume this (--with-ompi-mpix-rte) is a typo as the correct option
> is --with-ompi-pmix-rte?

You were right, this was a typo. With the correct option I have now
managed to start an MPI hello-world program using OpenMPI and our own
process manager with its PMIx server.

> It all looks okay to me for the client, but I wonder if you
> remembered to call register_nspace and register_client on your server
> prior to starting the client? If not, the connection will be dropped
> - you could add PMIX_MCA_ptl_base_verbose=100 to your environment to
> see the detailed connection handshake.

This is a point I was finally able to figure out from the PRRTE code.
To make it work, it is not enough to call register_nspace; you also
have to pass it some specific information that OpenMPI expects to be
available (e.g. proc info with lrank).

A remark on PMIx at this point: pmix_bfrops_base_value_load() silently
does not handle the PMIX_DATA_ARRAY type, so the macros PMIX_VALUE_LOAD
and PMIX_INFO_LOAD do not work with that type. I find this unfortunate;
it took me a while to figure out why PMIx segfaulted when it tried to
process my PMIX_PROC_DATA infos.

So thank you again for your help so far.

One point that remains open, and that interests me, is whether I can
achieve the same with the 3.1.2 release of OpenMPI. Is it somehow
possible to configure it as if it had the "--with-ompi-pmix-rte" switch
from version 4.x?

Regards,

Stephan

>
> > On Oct 9, 2018, at 3:14 PM, Stephan Krempel <krem...@par-tec.com>
> > wrote:
> >
> > Hi Ralph,
> >
> > After studying prrte a little bit, I tried something new and followed
> > the description here using openmpi 4:
> > https://pmix.org/code/building-the-pmix-reference-server/
> >
> > I configured openmpi 4.0.0rc3:
> >
> > ../configure --enable-debug --prefix [...] --with-pmix=[...] \
> >     --with-libevent=/usr --with-ompi-mpix-rte
> >
> > (I also tried to set --with-orte=no, but it then claims not to have a
> > suitable rte and does not finish)
> >
> > I then started my own PMIx server and spawned a client compiled with
> > mpicc of the new openmpi installation with this environment:
> >
> > PMIX_NAMESPACE=namespace_3228_0_0
> > PMIX_RANK=0
> > PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_SECURITY_MODE=native,none
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_GDS_MODULE=ds12,hash
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
> >
> > The client is not connecting to my pmix server, and its environment
> > after MPI_Init looks like this:
> >
> > PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> > PMIX_RANK=0
> > PMIX_PTL_MODULE=tcp,usock
> > PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> > PMIX_MCA_mca_base_component_show_load_errors=1
> > PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> > PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
> > PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> > PMIX_SECURITY_MODE=native,none
> > PMIX_NAMESPACE=864157697
> > PMIX_GDS_MODULE=ds12,hash
> > ORTE_SCHIZO_DETECTION=ORTE
> > OMPI_COMMAND=./hello_env
> > OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> > OMPI_MCA_orte_launch=1
> > OMPI_APP_CTX_NUM_PROCS=1
> > OMPI_MCA_pmix=^s1,s2,cray,isolated
> > OMPI_MCA_ess=singleton
> > OMPI_MCA_orte_ess_num_procs=1
> >
> > So something goes wrong, but I have no idea what I am missing. Do you
> > have an idea what I need to change?
> > Do I have to set an MCA parameter to tell OpenMPI not to start orted,
> > or does it need another hint in the client environment besides the
> > stuff coming from the PMIx server helper library?
> >
> >
> > Stephan
> >
> >
> > On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
> > > Hi Stephan
> > >
> > > Thanks for the clarification - that helps a great deal. You are
> > > correct that OMPI’s orted daemons do more than just host the PMIx
> > > server library. However, they are only active if you launch the
> > > OMPI processes using mpirun. This is probably the source of the
> > > trouble you are seeing.
> > >
> > > Since you have a process launcher and have integrated the PMIx
> > > server support into your RM’s daemons, you really have no need for
> > > mpirun at all. You should just be able to launch the processes
> > > directly using your own launcher. The PMIx support will take care
> > > of the startup requirements. The application procs will not use the
> > > orted in such cases.
> > >
> > > So if your system is working fine with the PMIx example programs,
> > > then just launch the OMPI apps the same way and it should just
> > > work.
> > >
> > > On the Slurm side: I’m surprised that it doesn’t work without the
> > > --with-slurm option. An application proc doesn’t care about any of
> > > the Slurm-related code if PMIx is available. I might have access to
> > > a machine where I can check it…
> > >
> > > Ralph
> > >
> > >
> > > > On Oct 9, 2018, at 3:26 AM, Stephan Krempel
> > > > <krem...@par-tec.com> wrote:
> > > >
> > > > Ralph, Gilles,
> > > >
> > > > thanks for your input.
> > > >
> > > > Before I answer, let me briefly explain what my general intention
> > > > is. We do have our own resource manager and process launcher that
> > > > supports different MPI implementations in different ways.
> > > > I want to adapt it to PMIx to cleanly support OpenMPI and,
> > > > hopefully, other MPI implementations supporting PMIx in the
> > > > future, too.
> > > >
> > > > > It sounds like what you really want to do is replace the orted,
> > > > > and have your orted open your PMIx server? In other words, you
> > > > > want to use the PMIx reference library to handle all the PMIx
> > > > > stuff, and provide your own backend functions to support the
> > > > > PMIx server calls?
> > > >
> > > > You are right, that was my original plan, and I have already done
> > > > that. In my environment I can already launch processes that
> > > > successfully call PMIx client functions like put, get, fence and
> > > > so on, all handled by my servers using the PMIx server helper
> > > > library. As far as I have implemented the server functions so
> > > > far, all the example programs coming with the pmix library are
> > > > working fine.
> > > >
> > > > Then I tried to use that with OpenMPI and stumbled.
> > > > My first idea was to simply replace orted, but after taking a
> > > > closer look into OpenMPI it seems to me that it uses/needs orted
> > > > not only for spawning and exchange of process information, but
> > > > also for its general communication and collectives. Am I wrong
> > > > about that?
> > > >
> > > > So replacing it completely is perhaps not what I want, since I do
> > > > not intend to replace OpenMPI's whole communication stuff. But
> > > > perhaps I am mixing up orte and orted here; I am not certain
> > > > about that.
> > > >
> > > > > If so, then your best bet would be to edit the PRRTE code in
> > > > > orte/orted/pmix and replace it with your code.
> > > > > You’ll have to deal with the ORTE data objects and PRRTE’s
> > > > > launch procedure, but that is likely easier than trying to
> > > > > write your own version of “orted” from scratch.
> > > >
> > > > I think one problem here is that I do not really understand which
> > > > purposes orted fulfills overall, especially besides implementing
> > > > the PMIx server side. Can you please give me a short overview?
> > > >
> > > > > As for Slurm: it behaves the same way as PRRTE. It has a plugin
> > > > > that implements the server backend functions, and the Slurm
> > > > > daemons “host” the plugin. What you would need to do is replace
> > > > > that plugin with your own.
> > > >
> > > > I understand that, but it also seems to need some special support
> > > > by the several slurm modules on the OpenMPI side that I do not
> > > > understand yet. At least when I tried OpenMPI without slurm
> > > > support and `srun --mpi=pmix_v2`, it did not work but generated a
> > > > message that slurm support in openmpi is missing.
> > > >
> > > >
> > > > Stephan
> > > >
> > > >
> > > > > > On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet
> > > > > > <gilles@rist.or.jp> wrote:
> > > > > >
> > > > > > Stephan,
> > > > > >
> > > > > >
> > > > > > Have you already checked https://github.com/pmix/prrte ?
> > > > > >
> > > > > >
> > > > > > This is the PMIx Reference RunTime Environment (PRRTE), which
> > > > > > was built on top of orted.
> > > > > >
> > > > > > Long story short, it deploys the PMIx server and then you
> > > > > > start your MPI app with prun.
> > > > > > An example is available at
> > > > > > https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Gilles
> > > > > >
> > > > > >
> > > > > > On 10/9/2018 8:45 AM, Stephan Krempel wrote:
> > > > > > > Hello everyone,
> > > > > > >
> > > > > > > I am currently implementing a PMIx server and I am trying
> > > > > > > to use it with OpenMPI. I have my own mpiexec which starts
> > > > > > > my PMIx server and launches the processes.
> > > > > > >
> > > > > > > If I launch an executable linked against OpenMPI, during
> > > > > > > MPI_Init() the ORTE layer starts another PMIx server and
> > > > > > > overrides my PMIX_* environment so this new server is used
> > > > > > > instead of mine.
> > > > > > >
> > > > > > > So I am looking for a method to prevent orte(d) from
> > > > > > > starting a PMIx server.
> > > > > > >
> > > > > > > I already tried to understand what the slurm support is
> > > > > > > doing, since this is (at least in parts) what I think I
> > > > > > > need. Somehow when starting a job with srun --mpi=pmix_v2
> > > > > > > the ess module pmi is started, but I was not able to
> > > > > > > enforce that manually by setting an MCA parameter (ess
> > > > > > > should be the correct one?!?)
> > > > > > > And I do not yet have a clue how the slurm support is
> > > > > > > working.
> > > > > > >
> > > > > > > So does anyone have a hint for me where I can find
> > > > > > > documentation or information concerning that, or is there
> > > > > > > an easy way to achieve what I am trying to do that I
> > > > > > > missed?
> > > > > > >
> > > > > > > Thank you in advance.
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Stephan
> > > > > > > _______________________________________________
> > > > > > > devel mailing list
> > > > > > > devel@lists.open-mpi.org
> > > > > > > https://lists.open-mpi.org/mailman/listinfo/devel

--
Stephan Krempel
HPC Software Engineer
ParTec Cluster Competence Center GmbH
Possartstraße 20
81679 München, Germany
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel
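
Note on the two PMIX_* listings quoted in the thread (the environment set by the launcher and the environment seen after MPI_Init): the two can be compared mechanically to see which variables were overridden. The following is only a minimal sketch, not part of OpenMPI or PMIx. The function name changed_pmix_vars is made up for illustration, and it assumes that each listing has been saved to a file with one NAME=value pair per line, as printed by env.

```shell
#!/bin/sh
# Hypothetical helper (not part of PMIx or OpenMPI): print the name of
# every PMIX_* variable from the first dump whose value differs in the
# second dump. Each dump is assumed to hold one NAME=value pair per line.
changed_pmix_vars() {
    before=$1
    after=$2
    while IFS='=' read -r name value; do
        case $name in
            PMIX_*) ;;          # only PMIx-related variables are of interest
            *) continue ;;
        esac
        # Value of the same variable in the second dump (empty if absent).
        new=$(sed -n "s/^${name}=//p" "$after" | head -n 1)
        if [ "$new" != "$value" ]; then
            printf '%s\n' "$name"
        fi
    done < "$before"
}
```

Called as `changed_pmix_vars before.env after.env` (both file names are placeholders) on the two listings from the thread, this would report PMIX_NAMESPACE, PMIX_SERVER_URI2, PMIX_SERVER_URI21 and PMIX_DSTORE_ESH_BASE_PATH, i.e. the variables that point at the server started by the ORTE layer instead of the launcher's own one. Variables that only appear in the second listing are not reported.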