Hmmm....no, can't imagine why. I'll fix - thanks!

On Jul 27, 2011, at 3:14 PM, Greg Watson wrote:
> Ralph,
>
> Looking good so far. I did notice that ompi-ps always seems to have an exit
> code of 243. Is that on purpose?
>
> Greg
>
> On Jul 25, 2011, at 4:44 PM, Ralph Castain wrote:
>
>> r24944 - let me know how it works!
>>
>>
>> On Jul 25, 2011, at 1:01 PM, Greg Watson wrote:
>>
>>> That would probably be more intuitive.
>>>
>>> Thanks,
>>> Greg
>>>
>>> On Jul 25, 2011, at 2:28 PM, Ralph Castain wrote:
>>>
>>>> job 0 is mpirun and its daemons - I can have it ignore that job as I doubt
>>>> users care :-)
>>>>
>>>> On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:
>>>>
>>>>> Ralph,
>>>>>
>>>>> The output format looks good, but I'm not sure it's quite correct. If I
>>>>> run the mpirun command, I see the following:
>>>>>
>>>>> mpirun:47520:num nodes:1:num jobs:2
>>>>> jobid:0:state:RUNNING:slots:0:num procs:0
>>>>> jobid:1:state:RUNNING:slots:1:num procs:4
>>>>> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
>>>>> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
>>>>> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
>>>>> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
>>>>>
>>>>> Seems to indicate there are two jobs, but one of them has 0 procs. Is
>>>>> that expected? Not a huge problem, since I can just ignore the job with 0
>>>>> procs.
>>>>>
>>>>> Greg
>>>>>
>>>>>
>>>>> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
>>>>>
>>>>>> Okay, you should have it in r24929. Use:
>>>>>>
>>>>>> orte-ps --parseable
>>>>>>
>>>>>> to get the new output.
>>>>>>
>>>>>>
>>>>>> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
>>>>>>
>>>>>>> Gar - have to eat my words a bit. The jobid requested by orte-ps is
>>>>>>> just the "local" jobid - i.e., it is expecting you to provide a number
>>>>>>> from 0-N, as I described below (copied here):
>>>>>>>
>>>>>>>> A jobid of 1 indicates the primary application, 2 and above would
>>>>>>>> specify comm_spawned jobs.
>>>>>>>
>>>>>>> Not providing the jobid at all corresponds to wildcard and returns the
>>>>>>> status of all jobs under that mpirun.
>>>>>>>
>>>>>>> To specify which mpirun you want info on, you use the --pid option. It
>>>>>>> is this option that isn't working properly - orte-ps returns info from
>>>>>>> all mpiruns and doesn't check to provide only data from the given pid.
>>>>>>>
>>>>>>> I'll fix that part, and implement the parsable output.
>>>>>>>
>>>>>>>
>>>>>>> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>>>>>>>>
>>>>>>>>> Hi Ralph,
>>>>>>>>>
>>>>>>>>> I'd like three things :-)
>>>>>>>>>
>>>>>>>>> a) A --report-jobid option that prints the jobid on the first line in
>>>>>>>>> a form that can be passed to the -jobid option on ompi-ps. Probably
>>>>>>>>> tagging it in the output if -tag-output is enabled (e.g.
>>>>>>>>> jobid:<jobid>) would be a good idea.
>>>>>>>>>
>>>>>>>>> b) The orte-ps command output to use the same jobid format.
>>>>>>>>
>>>>>>>> I started looking at the above, and found that orte-ps is just plain
>>>>>>>> wrong in the way it handles jobid. The jobid consists of two fields: a
>>>>>>>> 16-bit number indicating the mpirun, and a 16-bit number indicating
>>>>>>>> the job within that mpirun. Unfortunately, orte-ps sends a data
>>>>>>>> request to every mpirun out there instead of only to the one
>>>>>>>> corresponding to that jobid.
>>>>>>>>
>>>>>>>> What we probably should do is have you indicate the mpirun of interest
>>>>>>>> via the -pid option, and then let jobid tell us which job you want
>>>>>>>> within that mpirun. A jobid of 1 indicates the primary application, 2
>>>>>>>> and above would specify comm_spawned jobs. A jobid of -1 would return
>>>>>>>> the status of all jobs under that mpirun.
>>>>>>>>
>>>>>>>> If multiple mpiruns are being reported, then the "jobid" in the report
>>>>>>>> should again be the "local" jobid within that mpirun.
>>>>>>>>
>>>>>>>> After all, you don't really care what the orte-internal 16-bit
>>>>>>>> identifier is for that mpirun.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> c) A more easily parsable output format from ompi-ps. It doesn't need
>>>>>>>>> to be a full blown XML format, just something like the following
>>>>>>>>> would suffice:
>>>>>>>>>
>>>>>>>>> jobid:719585280:state:Running:slots:1:num procs:4
>>>>>>>>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>>>>>>>>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>>>>>>>>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>>>>>>>>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>>>>>>>>> jobid:345346663:state:running:slots:1:num procs:2
>>>>>>>>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>>>>>>>>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>>>>>>>>
>>>>>>>> Shouldn't be too hard to do - bunch of if-then-else statements
>>>>>>>> required, though.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'd be happy to help with any or all of these.
>>>>>>>>
>>>>>>>> Appreciate the offer - let me see how hard this proves to be...
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Greg
>>>>>>>>>
>>>>>>>>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>>>>>>>>>
>>>>>>>>>> Hmmm...well, it looks like we could have made this nicer than we did
>>>>>>>>>> :-/
>>>>>>>>>>
>>>>>>>>>> If you add --report-uri to the mpirun command line, you'll get back
>>>>>>>>>> the uri for that mpirun. This has the form of <jobid>:<uri>. As the
>>>>>>>>>> -h option indicates:
>>>>>>>>>>
>>>>>>>>>> -report-uri | --report-uri <arg0>
>>>>>>>>>>     Printout URI on stdout [-], stderr [+], or a file
>>>>>>>>>>     [anything else]
>>>>>>>>>>
>>>>>>>>>> The "jobid" required by the orte-ps command is the one reported
>>>>>>>>>> there. We could easily add a --report-jobid option if that makes
>>>>>>>>>> things easier.
>>>>>>>>>>
>>>>>>>>>> As to the difference in how orte-ps shows the jobid...well, that's
>>>>>>>>>> probably historical. orte-ps uses an orte utility function to print
>>>>>>>>>> the jobid, and that utility always shows the jobid in component
>>>>>>>>>> form. Again, could add or just use the integer version.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jul 22, 2011, at 7:01 AM, Greg Watson wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Does anyone know if it's possible to get the orte jobid from the
>>>>>>>>>>> mpirun command? If not, how are you supposed to get it to use with
>>>>>>>>>>> orte-ps? Also, orte-ps reports the jobid in [x,y] notation, but the
>>>>>>>>>>> jobid argument seems to be an integer. How does that work?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Greg
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
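
For anyone consuming the "orte-ps --parseable" output discussed in this thread, here is a minimal Python sketch of a parser. It assumes the format is exactly what Greg reported from r24929: one record per line, alternating key:value fields separated by colons, with "mpirun", "jobid" and "process" lines nested in that order. The names parse_record, parse_orte_ps and user_jobs are hypothetical helpers made up for illustration; they are not part of Open MPI. The sketch also assumes that no key or value contains a colon, which the thread does not guarantee.

```python
# Sample copied from the output Greg reported in this thread (r24929).
SAMPLE = """\
mpirun:47520:num nodes:1:num jobs:2
jobid:0:state:RUNNING:slots:0:num procs:0
jobid:1:state:RUNNING:slots:1:num procs:4
process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
"""


def parse_record(line):
    """Turn one 'key:value:key:value...' line into a dict.

    Assumption: neither keys nor values contain ':' themselves.
    """
    fields = line.strip().split(":")
    if len(fields) % 2 != 0:
        raise ValueError(f"odd number of fields: {line!r}")
    return dict(zip(fields[0::2], fields[1::2]))


def parse_orte_ps(text):
    """Group records as mpirun -> jobs -> procs, keyed on each line's first field."""
    mpiruns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        record = parse_record(line)
        kind = line.split(":", 1)[0]
        if kind == "mpirun":
            record["jobs"] = []
            mpiruns.append(record)
        elif kind == "jobid" and mpiruns:
            record["procs"] = []
            mpiruns[-1]["jobs"].append(record)
        elif kind == "process" and mpiruns and mpiruns[-1]["jobs"]:
            mpiruns[-1]["jobs"][-1]["procs"].append(record)
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return mpiruns


def user_jobs(mpirun):
    """Drop jobs that report zero procs (job 0, i.e. mpirun and its daemons)."""
    return [job for job in mpirun["jobs"] if job["num procs"] != "0"]


if __name__ == "__main__":
    for mpirun in parse_orte_ps(SAMPLE):
        for job in user_jobs(mpirun):
            pids = [proc["pid"] for proc in job["procs"]]
            print(mpirun["mpirun"], job["jobid"], job["state"], pids)
```

All values are kept as strings on purpose, since states such as "SYNC REGISTERED" contain spaces and the thread does not pin down which fields are always numeric. The user_jobs filter mirrors Greg's workaround of ignoring the job with 0 procs; it should be harmless on builds where orte-ps already omits job 0.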