Okay, you should have it in r24929. Use: orte-ps --parseable
to get the new output.

On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:

> Gar - have to eat my words a bit. The jobid requested by orte-ps is just the
> "local" jobid - i.e., it is expecting you to provide a number from 0-N, as I
> described below (copied here):
>
>> A jobid of 1 indicates the primary application, 2 and above would specify
>> comm_spawned jobs.
>
> Not providing the jobid at all corresponds to wildcard and returns the status
> of all jobs under that mpirun.
>
> To specify which mpirun you want info on, you use the --pid option. It is
> this option that isn't working properly - orte-ps returns info from all
> mpiruns and doesn't check to provide only data from the given pid.
>
> I'll fix that part, and implement the parsable output.
>
>
> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>
>>
>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>>
>>> Hi Ralph,
>>>
>>> I'd like three things :-)
>>>
>>> a) A --report-jobid option that prints the jobid on the first line in a
>>> form that can be passed to the -jobid option on ompi-ps. Probably tagging
>>> it in the output if -tag-output is enabled (e.g. jobid:<jobid>) would be a
>>> good idea.
>>>
>>> b) The orte-ps command output to use the same jobid format.
>>
>> I started looking at the above, and found that orte-ps is just plain wrong
>> in the way it handles jobid. The jobid consists of two fields: a 16-bit
>> number indicating the mpirun, and a 16-bit number indicating the job within
>> that mpirun. Unfortunately, orte-ps sends a data request to every mpirun out
>> there instead of only to the one corresponding to that jobid.
>>
>> What we probably should do is have you indicate the mpirun of interest via
>> the -pid option, and then let jobid tell us which job you want within that
>> mpirun. A jobid of 1 indicates the primary application, 2 and above would
>> specify comm_spawned jobs. A jobid of -1 would return the status of all jobs
>> under that mpirun.
>>
>> If multiple mpiruns are being reported, then the "jobid" in the report
>> should again be the "local" jobid within that mpirun.
>>
>> After all, you don't really care what the orte-internal 16-bit identifier is
>> for that mpirun.
>>
>>>
>>> c) A more easily parsable output format from ompi-ps. It doesn't need to be
>>> a full blown XML format, just something like the following would suffice:
>>>
>>> jobid:719585280:state:Running:slots:1:num procs:4
>>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>>> jobid:345346663:state:running:slots:1:num procs:2
>>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>>
>> Shouldn't be too hard to do - bunch of if-then-else statements required,
>> though.
>>
>>>
>>> I'd be happy to help with any or all of these.
>>
>> Appreciate the offer - let me see how hard this proves to be...
>>
>>>
>>> Cheers,
>>> Greg
>>>
>>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>>>
>>>> Hmmm...well, it looks like we could have made this nicer than we did :-/
>>>>
>>>> If you add --report-uri to the mpirun command line, you'll get back the
>>>> uri for that mpirun. This has the form of <jobid>:<uri>. As the -h option
>>>> indicates:
>>>>
>>>> -report-uri | --report-uri <arg0>
>>>>     Printout URI on stdout [-], stderr [+], or a file
>>>>     [anything else]
>>>>
>>>> The "jobid" required by the orte-ps command is the one reported there. We
>>>> could easily add a --report-jobid option if that makes things easier.
>>>>
>>>> As to the difference in how orte-ps shows the jobid...well, that's
>>>> probably historical. orte-ps uses an orte utility function to print the
>>>> jobid, and that utility always shows the jobid in component form.
>>>> Again, could add or just use the integer version.
>>>>
>>>>
>>>> On Jul 22, 2011, at 7:01 AM, Greg Watson wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Does anyone know if it's possible to get the orte jobid from the mpirun
>>>>> command? If not, how are you supposed to get it to use with orte-ps?
>>>>> Also, orte-ps reports the jobid in [x,y] notation, but the jobid argument
>>>>> seems to be an integer. How does that work?
>>>>>
>>>>> Thanks,
>>>>> Greg
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
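
For readers following the jobid discussion above: the thread describes the jobid as two 16-bit fields, one identifying the mpirun and one identifying the job within that mpirun. A minimal sketch of that split is below. It assumes the mpirun field occupies the upper 16 bits and the "local" jobid the lower 16 bits; the thread does not state the order, and the function names split_jobid and join_jobid are hypothetical, not part of ORTE.

```python
from typing import Tuple


def split_jobid(jobid: int) -> Tuple[int, int]:
    """Split an integer jobid into (mpirun_field, local_jobid).

    Assumption (not confirmed in the thread): the mpirun field is the
    upper 16 bits and the local jobid is the lower 16 bits.
    """
    jobid &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit integer
    return (jobid >> 16) & 0xFFFF, jobid & 0xFFFF


def join_jobid(mpirun_field: int, local_jobid: int) -> int:
    """Inverse of split_jobid under the same layout assumption."""
    return ((mpirun_field & 0xFFFF) << 16) | (local_jobid & 0xFFFF)


# The first jobid from Greg's sample output in the thread.
print(split_jobid(719585280))  # (10980, 0)
```

Under that assumption, the integer form and a two-component form carry the same information, which is why a tool could print either one.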
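
For anyone consuming the colon-delimited format Greg proposed in (c), a rough parsing sketch follows. It assumes each line is a flat sequence of key:value pairs, that a "jobid" line starts a new job, and that "process_name" lines belong to the most recent job. The output actually emitted by orte-ps --parseable in r24929 may differ from Greg's proposal, and the function names here are hypothetical.

```python
def parse_record(line):
    """Turn 'k1:v1:k2:v2:...' into a dict. Values containing ':' are not handled."""
    fields = line.strip().split(":")
    if len(fields) % 2 != 0:
        raise ValueError("odd number of fields in line: %r" % line)
    return dict(zip(fields[0::2], fields[1::2]))


def parse_ps_output(text):
    """Group process records under the job record that precedes them."""
    jobs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = parse_record(line)
        if "jobid" in record:
            record["procs"] = []
            jobs.append(record)
        elif "process_name" in record:
            if not jobs:
                raise ValueError("process record before any job record")
            jobs[-1]["procs"].append(record)
        else:
            raise ValueError("unrecognized record: %r" % line)
    return jobs


# Sample taken from the proposal quoted in the thread.
SAMPLE = """\
jobid:719585280:state:Running:slots:1:num procs:4
process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
jobid:345346663:state:running:slots:1:num procs:2
process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
"""

for job in parse_ps_output(SAMPLE):
    print(job["jobid"], job["state"], len(job["procs"]))
```

All values are kept as strings; a caller that needs numeric ranks or pids would convert them after parsing.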