I assume this (--with-ompi-mpix-rte) is a typo, as the correct option is --with-ompi-pmix-rte?
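For reference, here is a minimal sketch of how the configure line might look with the option name spelled as pmix rather than mpix. The install locations are elided in the quoted message, so the PREFIX and PMIX_DIR variables (and their defaults) are hypothetical placeholders, and configure_args is just an illustrative helper name:

```shell
#!/bin/sh
# Placeholder locations (assumptions: the quoted message elides both paths).
PREFIX="${PREFIX:-$HOME/ompi-install}"
PMIX_DIR="${PMIX_DIR:-$HOME/pmix-install}"

# Emit one configure argument per line, using the option name
# --with-ompi-pmix-rte (pmix, not mpix).
configure_args() {
    printf '%s\n' \
        --enable-debug \
        "--prefix=$PREFIX" \
        "--with-pmix=$PMIX_DIR" \
        --with-libevent=/usr \
        --with-ompi-pmix-rte
}

# Print the command that would be run from the build directory.
echo "../configure $(configure_args | tr '\n' ' ')"
```

This only prints the command, so the arguments can be checked before configure is actually run.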
It all looks okay to me for the client, but I wonder if you remembered to call register_nspace and register_client on your server prior to starting the client? If not, the connection will be dropped. You could add PMIX_MCA_ptl_base_verbose=100 to your environment to see the detailed connection handshake.

> On Oct 9, 2018, at 3:14 PM, Stephan Krempel <krem...@par-tec.com> wrote:
>
> Hi Ralph,
>
> After studying prrte a little bit, I tried something new and followed
> the description here using openmpi 4:
> https://pmix.org/code/building-the-pmix-reference-server/
>
> I configured openmpi 4.0.0rc3:
>
> ../configure --enable-debug --prefix [...] --with-pmix=[...] \
>              --with-libevent=/usr --with-ompi-mpix-rte
>
> (I also tried to set --with-orte=no, but it then claims not to have a
> suitable rte and does not finish)
>
> I then started my own PMIx and spawned a client compiled with mpicc of
> the new openmpi installation with this environment:
>
> PMIX_NAMESPACE=namespace_3228_0_0
> PMIX_RANK=0
> PMIX_SERVER_URI2=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI21=pmix-server.3234;tcp4://127.0.0.1:49637
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_SECURITY_MODE=native,none
> PMIX_PTL_MODULE=tcp,usock
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_GDS_MODULE=ds12,hash
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/pmix_dstor_3234
>
> The client is not connecting to my pmix server and its environment
> after MPI_Init looks like this:
>
> PMIX_SERVER_URI2USOCK=pmix-server:3234:/tmp/pmix-3234
> PMIX_RANK=0
> PMIX_PTL_MODULE=tcp,usock
> PMIX_SERVER_URI=pmix-server:3234:/tmp/pmix-3234
> PMIX_MCA_mca_base_component_show_load_errors=1
> PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_FULLY_DESC
> PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.landroval.1001/pid.3243/pmix_dstor_3243
> PMIX_SERVER_URI2=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SERVER_URI21=864157696.0;tcp4://127.0.0.1:33619
> PMIX_SECURITY_MODE=native,none
> PMIX_NAMESPACE=864157697
> PMIX_GDS_MODULE=ds12,hash
> ORTE_SCHIZO_DETECTION=ORTE
> OMPI_COMMAND=./hello_env
> OMPI_MCA_orte_precondition_transports=f28d6577f6b6ac08-d92c0e73869e1cfa
> OMPI_MCA_orte_launch=1
> OMPI_APP_CTX_NUM_PROCS=1
> OMPI_MCA_pmix=^s1,s2,cray,isolated
> OMPI_MCA_ess=singleton
> OMPI_MCA_orte_ess_num_procs=1
>
> So something goes wrong, but I do not have an idea what I am missing. Do
> you have an idea what I need to change? Do I have to set an MCA
> parameter to tell OpenMPI not to start orted, or does it need another
> hint in the client environment besides the stuff coming from the PMIx
> server helper library?
>
>
> Stephan
>
>
> On Tuesday, Oct 10 2018, 08:33 -0700 Ralph H Castain wrote:
>> Hi Stephan
>>
>> Thanks for the clarification - that helps a great deal. You are
>> correct that OMPI’s orted daemons do more than just host the PMIx
>> server library. However, they are only active if you launch the OMPI
>> processes using mpirun. This is probably the source of the trouble
>> you are seeing.
>>
>> Since you have a process launcher and have integrated the PMIx server
>> support into your RM’s daemons, you really have no need for mpirun at
>> all. You should just be able to launch the processes directly using
>> your own launcher. The PMIx support will take care of the startup
>> requirements. The application procs will not use the orted in such
>> cases.
>>
>> So if your system is working fine with the PMIx example programs,
>> then just launch the OMPI apps the same way and it should just work.
>>
>> On the Slurm side: I’m surprised that it doesn’t work without the
>> --with-slurm option. An application proc doesn’t care about any of the
>> Slurm-related code if PMIx is available. I might have access to a
>> machine where I can check it…
>>
>> Ralph
>>
>>
>>> On Oct 9, 2018, at 3:26 AM, Stephan Krempel <krem...@par-tec.com>
>>> wrote:
>>>
>>> Ralph, Gilles,
>>>
>>> thanks for your input.
>>>
>>> Before I answer, let me briefly explain what my general intention is.
>>> We do have our own resource manager and process launcher that supports
>>> different MPI implementations in different ways. I want to adapt it to
>>> PMIx to cleanly support OpenMPI and hopefully other MPI implementations
>>> supporting PMIx in the future, too.
>>>
>>>> It sounds like what you really want to do is replace the orted, and
>>>> have your orted open your PMIx server? In other words, you want to
>>>> use the PMIx reference library to handle all the PMIx stuff, and
>>>> provide your own backend functions to support the PMIx server calls?
>>>
>>> You are right, that was my original plan, and I already did it so far.
>>> In my environment I already can launch processes that successfully call
>>> PMIx client functions like put, get, fence and so on, all handled by my
>>> servers using the PMIx server helper library. As far as I implemented
>>> the server functions now, all the example programs coming with the pmix
>>> library are working fine.
>>>
>>> Then I tried to use that with OpenMPI and stumbled.
>>> My first idea was to simply replace orted, but after taking a closer
>>> look into OpenMPI it seems to me that it uses/needs orted not only for
>>> spawning and exchange of process information, but also for its general
>>> communication and collectives. Am I wrong with that?
>>>
>>> So replacing it completely is perhaps not what I want since I do not
>>> intend to replace OpenMPI's whole communication stuff. But perhaps I do
>>> mix up orte and orted here, not certain about that.
>>>
>>>> If so, then your best bet would be to edit the PRRTE code in
>>>> orte/orted/pmix and replace it with your code. You’ll have to deal
>>>> with the ORTE data objects and PRRTE’s launch procedure, but that is
>>>> likely easier than trying to write your own version of “orted” from
>>>> scratch.
>>>
>>> I think one problem here is that I do not really understand which
>>> purposes orted fulfills overall, especially besides implementing the
>>> PMIx server side. Can you please give me a short overview?
>>>
>>>> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
>>>> implements the server backend functions, and the Slurm daemons “host”
>>>> the plugin. What you would need to do is replace that plugin with
>>>> your own.
>>>
>>> I understand that, but it also seems to need some special support by
>>> the several slurm modules on the OpenMPI side that I do not understand
>>> yet. At least when I tried OpenMPI without slurm support and
>>> `srun --mpi=pmix_v2` it does not work but generates a message that
>>> slurm support in openmpi is missing.
>>>
>>>
>>>
>>> Stephan
>>>
>>>
>>>
>>>>> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>>>>> wrote:
>>>>>
>>>>> Stephan,
>>>>>
>>>>>
>>>>> Have you already checked https://github.com/pmix/prrte ?
>>>>>
>>>>>
>>>>> This is the PMIx Reference RunTime Environment (PRRTE), which was
>>>>> built on top of orted.
>>>>>
>>>>> Long story short, it deploys the PMIx server and then you start
>>>>> your MPI app with prun
>>>>> An example is available at
>>>>> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>>>>>> Hello everyone,
>>>>>>
>>>>>> I am currently implementing a PMIx server and I try to use it with
>>>>>> OpenMPI. I do have an own mpiexec which starts my PMIx server and
>>>>>> launches the processes.
>>>>>>
>>>>>> If I launch an executable linked against OpenMPI, during MPI_Init() the
>>>>>> ORTE layer starts another PMIx server and overrides my PMIX_*
>>>>>> environment so this new server is used instead of mine.
>>>>>>
>>>>>> So I am looking for a method to prevent orte(d) from starting a PMIx
>>>>>> server.
>>>>>>
>>>>>> I already tried to understand what the slurm support is doing, since
>>>>>> this is (at least in parts) what I think I need. Somehow when starting
>>>>>> a job with srun --mpi=pmix_v2 the ess module pmi is started, but I was
>>>>>> not able to enforce that manually by setting an MCA parameter (oss
>>>>>> should be the correct one?!?)
>>>>>> And I do not yet have a clue how the slurm support is working.
>>>>>>
>>>>>> So does anyone have a hint for me where I can find documentation or
>>>>>> information concerning that, or is there an easy way to achieve what I
>>>>>> am trying to do that I missed?
>>>>>>
>>>>>> Thank you in advance.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Stephan
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel@lists.open-mpi.org
>>>>>> https://lists.open-mpi.org/mailman/listinfo/devel

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel