I should have also stated one other difference in the "tool" setup: the tool does not create a session directory (you can override that if you need one). This simplifies cleanup of the tool and helps with tools like orte-clean. Note that the session directory's main function is to house the shared memory backing files and other temporary files that typically aren't used by tools, as they are more MPI-specific.

So the default behavior is to -not- create the session dir - like I said, though, you can override that if you need it.

Ralph
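Purely as an illustration of that default, here is a minimal sketch of a tool's init/finalize lifecycle. The only name taken from this thread is orte_tool_init; the finalize call and the session-directory override flag are hypothetical placeholders for whatever mechanism is finally exposed, not the actual API.

#include <stdio.h>

/* Hypothetical prototypes for the proposed tool-support calls; only
 * orte_tool_init is named anywhere in this thread. */
extern int orte_tool_init(int flags);   /* proposed lightweight tool init       */
extern int orte_tool_finalize(void);    /* assumed counterpart to the init call */

#define TOOL_WANT_SESSION_DIR 0x1       /* hypothetical override: ask for a session dir */

int main(void)
{
    /* Default (flags == 0): no session directory is created, so there is
     * nothing for orte-clean to tidy up on the tool's behalf.  A tool that
     * really needs the shared-memory backing files could pass the
     * hypothetical override flag instead. */
    if (orte_tool_init(0 /* or TOOL_WANT_SESSION_DIR */) != 0) {
        fprintf(stderr, "tool init failed\n");
        return 1;
    }

    /* ... tool-specific work ... */

    orte_tool_finalize();
    return 0;
}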
On 1/16/08 9:25 AM, "Ralph Castain" <r...@lanl.gov> wrote:

> Hi Josh
>
> I already converted orte-ps and orte-clean in the tmp/rhc-step2b branch on OMPI's svn repository. Shouldn't be hard to convert the checkpoint/restart tools to use it too - I may have already done some of that work, but I may not be remembering it correctly.
>
> I'll do some cleanup on the code in my private repository and put the rest of the implementation in the svn repository next week. I mostly just needed to talk to Jeff this morning about setting up the comm library - he pointed out that if I create a special "orte_tool_init" function that only calls what is needed, then the linker won't bring everything else into the executable, so a separate "library" may not be required. Still needs to be tested to ensure I can make that work as neatly as desired.
>
> Appreciate the feedback
> Ralph
>
> On 1/16/08 8:58 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:
>
>> Ralph,
>>
>> This looks interesting. Can you point me to the header files and any ORTE tools that you may have converted to use this library already (e.g., orte-ps)? I can port the checkpoint/restart tools to this library and start sending you some feedback on the API.
>>
>> Cheers,
>> Josh
>>
>> On Jan 16, 2008, at 7:47 AM, Ralph Castain wrote:
>>
>>> Hello all
>>>
>>> Summary: this note provides a brief overview of how various tools can interface to OMPI applications once the next version of ORTE is integrated into the trunk. It includes a request for input regarding any needs (e.g., additional commands to be supported in the interface) that have not been adequately addressed.
>>>
>>> As many of you know, I have been working on a tmp branch to complete the revamp of ORTE that has been in progress for quite some time. Among other things, this revamp is intended to simplify the system, provide enhanced scalability, and improve reliability.
>>>
>>> As part of that effort, I have extensively revised the support for external tools. In the past, tools such as the Eclipse PTP could only interact with Open MPI-based applications via ORTE APIs, thus exposing the tool to any changes in those APIs. Most tools, however, do not require the level of control provided by the APIs and can benefit from a simplified interface.
>>>
>>> Accordingly, the revamped ORTE now offers alternative methods of interaction. The primary change has been the creation of a communications library with a simple serial protocol for interacting with OMPI jobs. Thus, tools now have three choices for interacting with OMPI jobs:
>>>
>>> 1. I have created a new communications library that tools can link against. It does not include all of the ORTE or OMPI libraries, so it has a very small memory footprint. Besides the usual calls to initialize and finalize, the library contains utilities for finding all of the OMPI jobs running on that HNP (i.e., all OMPI jobs whose mpirun was executed from that host); querying the status of a job (provides the job map plus all proc states); querying the status of nodes (provides node names, status, and the list of procs on each node, including their state); querying the status of a specific process; spawning a new job; and terminating a job.
>>> In addition, you can attach to the output streams of any process, specifying stdout, stderr, or both - this "tees" the specified streams, so it won't interfere with the job's normal output flow.
>>>
>>> I could also create a utility to allow attachment to the input stream of a process. However, I'm a little concerned about possible conflicts with whatever is already flowing across that stream. I would appreciate any suggestions as to whether or not to provide that capability.
>>>
>>> Note: we removed the concept of the ORTE "universe", so a tool can now talk to any mpirun without complications. Thus, tools can simultaneously "connect" to and monitor multiple mpiruns, if desired.
>>>
>>> 2. Link against all of OMPI or ORTE and execute a standalone program. In this mode, your tool would act as a surrogate for mpirun by directly spawning the user's application. This provides some flexibility, but it does mean that both the tool and the job -must- end together, and that the tool may need to be revised whenever OMPI/ORTE APIs are updated.
>>>
>>> 3. Link against all of OMPI or ORTE, executing as a distributed set of processes. In this mode, you would execute your tool via "mpirun -pernode ./my_tool" (or whatever command is appropriate - this example would launch one tool process on every node in the allocation). If the tool processes need to communicate with each other, they can call MPI_Init or orte_init, depending upon the level of communication desired. Note that the tool job will be completely standalone from the application job and must be terminated separately.
>>>
>>> In all of these cases, it is possible for tool processes to connect (via MPI and/or ORTE-RML) to a job's processes, provided that the application supports it.
>>>
>>> I can provide more details, of course, to anyone wishing them. What I would appreciate, though, is any feedback about desired commands, modes of operation, etc. that I might have missed or that people would prefer be changed. This code is all in a private repository for my tmp branch, but I expect that to merge with the trunk fairly soon. I have provided a couple of example tools in that code to illustrate the above modes of operation.
>>>
>>> Thanks
>>> Ralph
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
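To make the first of the three modes above more concrete, here is a minimal sketch of what a monitoring tool built on the proposed comm library might look like. Apart from orte_tool_init, which is the name floated earlier in this thread, every identifier below (the query/attach calls, the job-info struct, the flags argument) is a hypothetical placeholder rather than the actual API.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical declarations standing in for the proposed tool comm library. */
typedef struct {
    int         jobid;       /* placeholder job identifier            */
    int         num_procs;   /* placeholder proc count                */
    const char *state;       /* placeholder textual job state         */
} tool_job_info_t;

extern int orte_tool_init(int flags);                               /* named in this thread  */
extern int orte_tool_finalize(void);                                /* assumed counterpart   */
extern int tool_query_jobs(tool_job_info_t **jobs, int *njobs);     /* hypothetical          */
extern int tool_attach_output(int jobid, int want_stdout,
                              int want_stderr);                     /* hypothetical "tee"    */

int main(void)
{
    tool_job_info_t *jobs = NULL;
    int njobs = 0, i;

    if (orte_tool_init(0) != 0) {
        fprintf(stderr, "failed to initialize the tool comm library\n");
        return 1;
    }

    /* Find every OMPI job whose mpirun was started on this HNP and report it. */
    if (tool_query_jobs(&jobs, &njobs) == 0) {
        for (i = 0; i < njobs; i++) {
            printf("job %d: %d procs, state %s\n",
                   jobs[i].jobid, jobs[i].num_procs, jobs[i].state);

            /* "Tee" the job's stdout and stderr without disturbing its normal output flow. */
            tool_attach_output(jobs[i].jobid, 1, 1);
        }
        free(jobs);
    }

    orte_tool_finalize();
    return 0;
}

The point of the sketch is the shape of the interaction - initialize, discover jobs, query or attach, finalize - not the exact signatures, which were still being settled at the time of this thread.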