Hi Stephan,

Thanks for the clarification - that helps a great deal. You are correct that OMPI’s orted daemons do more than just host the PMIx server library. However, they are only active if you launch the OMPI processes using mpirun. This is probably the source of the trouble you are seeing.
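
One way to confirm that symptom is to compare the PMIX_* variables the launcher handed to a process with the ones the process sees later. The following is only a minimal sketch under that assumption: the helper names `pmix_env` and `diff_pmix_env` are hypothetical, and the only thing taken from the thread is that the relevant variables share the `PMIX_` prefix.

```python
def pmix_env(environ):
    """Return only the PMIX_* entries of an environment mapping."""
    return {k: v for k, v in environ.items() if k.startswith("PMIX_")}


def diff_pmix_env(before, after):
    """Compare the PMIX_* entries of two environment mappings.

    Returns the names that were added, removed, or changed between
    the 'before' and 'after' snapshots (hypothetical helper).
    """
    b, a = pmix_env(before), pmix_env(after)
    return {
        "added": sorted(set(a) - set(b)),
        "removed": sorted(set(b) - set(a)),
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

The two mappings are supplied by the caller, for example the environment the launcher passed to the child and a dump the application writes after MPI_Init(). A non-empty "changed" list would suggest that something replaced the launcher's PMIx settings.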

Since you have a process launcher and have integrated the PMIx server support into your RM’s daemons, you really have no need for mpirun at all. You should be able to launch the processes directly using your own launcher. The PMIx support will take care of the startup requirements, and the application procs will not use the orted in such cases. So if your system is working fine with the PMIx example programs, then launch the OMPI apps the same way and it should just work.

On the Slurm side: I’m surprised that it doesn’t work without the --with-slurm option. An application proc doesn’t care about any of the Slurm-related code if PMIx is available. I might have access to a machine where I can check it…

Ralph

> On Oct 9, 2018, at 3:26 AM, Stephan Krempel <krem...@par-tec.com> wrote:
>
> Ralph, Gilles,
>
> thanks for your input.
>
> Before I answer, let me briefly explain what my general intention is.
> We have our own resource manager and process launcher that supports
> different MPI implementations in different ways. I want to adapt it to
> PMIx to cleanly support OpenMPI and hopefully other MPI implementations
> supporting PMIx in the future, too.
>
>> It sounds like what you really want to do is replace the orted, and
>> have your orted open your PMIx server? In other words, you want to
>> use the PMIx reference library to handle all the PMIx stuff, and
>> provide your own backend functions to support the PMIx server calls?
>
> You are right, that was my original plan, and I have already done it so
> far. In my environment I can already launch processes that successfully
> call PMIx client functions like put, get, fence and so on, all handled
> by my servers using the PMIx server helper library. As far as I have
> implemented the server functions so far, all the example programs coming
> with the pmix library are working fine.
>
> Then I tried to use that with OpenMPI and stumbled.
> My first idea was to simply replace orted, but after taking a closer
> look into OpenMPI it seems to me that it uses/needs orted not only for
> spawning and exchange of process information, but also for its general
> communication and collectives. Am I wrong about that?
>
> So replacing it completely is perhaps not what I want, since I do not
> intend to replace OpenMPI's whole communication stuff. But perhaps I am
> mixing up orte and orted here, not certain about that.
>
>> If so, then your best bet would be to edit the PRRTE code in
>> orte/orted/pmix and replace it with your code. You’ll have to deal
>> with the ORTE data objects and PRRTE’s launch procedure, but that is
>> likely easier than trying to write your own version of “orted” from
>> scratch.
>
> I think one problem here is that I do not really understand which
> purposes orted fulfills overall, especially besides implementing the
> PMIx server side. Can you please give me a short overview?
>
>> As for Slurm: it behaves the same way as PRRTE. It has a plugin that
>> implements the server backend functions, and the Slurm daemons “host”
>> the plugin. What you would need to do is replace that plugin with
>> your own.
>
> I understand that, but it also seems to need some special support by
> the several slurm modules on the OpenMPI side that I do not understand
> yet. At least when I tried OpenMPI without slurm support and
> `srun --mpi=pmix_v2`, it does not work but generates a message that
> slurm support in OpenMPI is missing.
>
>
> Stephan
>
>
>>
>>> On Oct 8, 2018, at 5:36 PM, Gilles Gouaillardet <gil...@rist.or.jp>
>>> wrote:
>>>
>>> Stephan,
>>>
>>>
>>> Have you already checked https://github.com/pmix/prrte ?
>>>
>>>
>>> This is the PMIx Reference RunTime Environment (PRRTE), which was
>>> built on top of orted.
>>>
>>> Long story short, it deploys the PMIx server and then you start
>>> your MPI app with prun.
>>> An example is available at
>>> https://github.com/pmix/prrte/blob/master/contrib/travis/test_client.sh
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On 10/9/2018 8:45 AM, Stephan Krempel wrote:
>>>> Hello everyone,
>>>>
>>>> I am currently implementing a PMIx server and I am trying to use it
>>>> with OpenMPI. I have my own mpiexec, which starts my PMIx server and
>>>> launches the processes.
>>>>
>>>> If I launch an executable linked against OpenMPI, during MPI_Init()
>>>> the ORTE layer starts another PMIx server and overrides my PMIX_*
>>>> environment so this new server is used instead of mine.
>>>>
>>>> So I am looking for a method to prevent orte(d) from starting a
>>>> PMIx server.
>>>>
>>>> I already tried to understand what the slurm support is doing,
>>>> since this is (at least in parts) what I think I need. Somehow when
>>>> starting a job with srun --mpi=pmix_v2 the ess module pmi is
>>>> started, but I was not able to enforce that manually by setting an
>>>> MCA parameter (oss should be the correct one?!?)
>>>> And I do not yet have a clue how the slurm support is working.
>>>>
>>>> So does anyone have a hint for me where I can find documentation or
>>>> information concerning that, or is there an easy way to achieve
>>>> what I am trying to do that I missed?
>>>>
>>>> Thank you in advance.
>>>>
>>>> Regards,
>>>>
>>>> Stephan
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel@lists.open-mpi.org
>>>> https://lists.open-mpi.org/mailman/listinfo/devel