Hi Josh

I already converted orte-ps and orte-clean in the tmp/rhc-step2b branch on OMPI's svn repository. It shouldn't be hard to convert the checkpoint/restart tools to use it too - I may have already done some of that work, but I don't remember for certain.

I'll do some cleanup on the code in my private repository and put the rest of the implementation in the svn repository next week. I mostly just needed to talk to Jeff this morning about setting up the comm library - he pointed out that if I create a special "orte_tool_init" function that only calls what is needed, then the linker won't bring everything else into the executable, so a separate "library" may not be required. That still needs to be tested to make sure it works as cleanly as I'd like.
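To make that concrete, here is a rough, untested sketch of what a converted tool might look like against the comm library. Everything in it except ORTE_SUCCESS is a placeholder of mine - orte_tool_init, orte_tool_get_job_info, orte_tool_job_info_t, orte_tool_finalize, and the header path are not the final API and may well change before this reaches the trunk:

    /* Rough sketch of a tool built against the proposed comm library.
     * CAUTION: every orte_tool_* name below is a placeholder, not the
     * final API - this only illustrates the intended usage model. */
    #include <stdio.h>
    #include <stdlib.h>

    #include "orte/tools/comm/orte_tool_comm.h"   /* hypothetical header */

    int main(int argc, char *argv[])
    {
        orte_tool_job_info_t *jobs = NULL;   /* hypothetical job-info struct */
        size_t i, num_jobs = 0;
        int rc;

        /* Pull in only the frameworks a tool actually needs, so the linker
         * does not drag the rest of ORTE/OMPI into the executable. */
        if (ORTE_SUCCESS != (rc = orte_tool_init(&argc, &argv))) {
            fprintf(stderr, "tool init failed: %d\n", rc);
            return EXIT_FAILURE;
        }

        /* Find all OMPI jobs whose mpirun was started on this host and
         * report each job's current state. */
        if (ORTE_SUCCESS == (rc = orte_tool_get_job_info(&jobs, &num_jobs))) {
            for (i = 0; i < num_jobs; i++) {
                printf("job %s: %s\n", jobs[i].name, jobs[i].state);
            }
            free(jobs);
        }

        orte_tool_finalize();
        return EXIT_SUCCESS;
    }

The checkpoint/restart tools should be able to follow the same basic pattern.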
Appreciate the feedback
Ralph


On 1/16/08 8:58 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

> Ralph,
>
> This looks interesting. Can you point me to the header files and any ORTE tools that you may have converted to use this library already (e.g., orte-ps)? I can port the checkpoint/restart tools to this library and start sending you some feedback on the API.
>
> Cheers,
> Josh
>
> On Jan 16, 2008, at 7:47 AM, Ralph Castain wrote:
>
>> Hello all
>>
>> Summary: this note provides a brief overview of how various tools can interface to OMPI applications once the next version of ORTE is integrated into the trunk. It includes a request for input regarding any needs (e.g., additional commands to be supported in the interface) that have not been adequately addressed.
>>
>> As many of you know, I have been working on a tmp branch to complete the revamp of ORTE that has been in progress for quite some time. Among other things, this revamp is intended to simplify the system, provide enhanced scalability, and improve reliability.
>>
>> As part of that effort, I have extensively revised the support for external tools. In the past, tools such as the Eclipse PTP could only interact with Open MPI-based applications via ORTE APIs, thus exposing the tool to any changes in those APIs. Most tools, however, do not require the level of control provided by the APIs and can benefit from a simplified interface.
>>
>> Accordingly, the revamped ORTE now offers alternative methods of interaction. The primary change has been the creation of a communications library with a simple serial protocol for interacting with OMPI jobs. Tools now have three choices for interacting with OMPI jobs:
>>
>> 1. Link against a new communications library that I have created. It does not include all of the ORTE or OMPI libraries, so it has a very small memory footprint. Besides the usual calls to initialize and finalize, the library contains utilities for finding all of the OMPI jobs running on that HNP (i.e., all OMPI jobs whose mpirun was executed from that host); querying the status of a job (provides the job map plus all proc states); querying the status of nodes (provides node names, status, and the list of procs on each node, including their state); querying the status of a specific process; spawning a new job; and terminating a job. In addition, you can attach to the output streams of any process, specifying stdout, stderr, or both - this "tees" the specified streams, so it won't interfere with the job's normal output flow.
>>
>> I could also create a utility to allow attachment to the input stream of a process. However, I'm a little concerned about possible conflicts with whatever is already flowing across that stream. I would appreciate any suggestions as to whether or not to provide that capability.
>>
>> Note: we removed the concept of the ORTE "universe", so a tool can now talk to any mpirun without complications. Thus, tools can simultaneously "connect" to and monitor multiple mpiruns, if desired.
>>
>> 2. Link against all of OMPI or ORTE and execute a standalone program. In this mode, your tool would act as a surrogate for mpirun by directly spawning the user's application. This provides some flexibility, but it does mean that both the tool and the job -must- end together, and that the tool may need to be revised whenever OMPI/ORTE APIs are updated.
>>
>> 3. Link against all of OMPI or ORTE, executing as a distributed set of processes. In this mode, you would execute your tool via "mpirun -pernode ./my_tool" (or whatever command is appropriate - this example would launch one tool process on every node in the allocation). If the tool processes need to communicate with each other, they can call MPI_Init or orte_init, depending upon the level of desired communication. Note that the tool job will be completely standalone from the application job and must be terminated separately.
>>
>> In all of these cases, it is possible for tool processes to connect (via MPI and/or ORTE-RML) to a job's processes, provided that the application supports it.
>>
>> I can provide more details, of course, to anyone wishing them. What I would appreciate, though, is any feedback about desired commands, modes of operation, etc. that I might have missed or that people would prefer be changed. This code is all in a private repository for my tmp branch, but I expect that to merge with the trunk fairly soon. I have provided a couple of example tools in that code to illustrate the above modes of operation.
>>
>> Thanks
>> Ralph
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel