Hmmm....no, can't imagine why. I'll fix - thanks!

On Jul 27, 2011, at 3:14 PM, Greg Watson wrote:

> Ralph,
> 
> Looking good so far. I did notice that ompi-ps always seems to have an exit 
> code of 243. Is that on purpose?
> 
> Greg
> 
> On Jul 25, 2011, at 4:44 PM, Ralph Castain wrote:
> 
>> r24944 - let me know how it works!
>> 
>> 
>> On Jul 25, 2011, at 1:01 PM, Greg Watson wrote:
>> 
>>> That would probably be more intuitive.
>>> 
>>> Thanks,
>>> Greg
>>> 
>>> On Jul 25, 2011, at 2:28 PM, Ralph Castain wrote:
>>> 
>>>> job 0 is mpirun and its daemons - I can have it ignore that job as I doubt 
>>>> users care :-)
>>>> 
>>>> On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:
>>>> 
>>>>> Ralph,
>>>>> 
>>>>> The output format looks good, but I'm not sure it's quite correct. If I 
>>>>> run the mpirun command, I see the following:
>>>>> 
>>>>> mpirun:47520:num nodes:1:num jobs:2
>>>>> jobid:0:state:RUNNING:slots:0:num procs:0
>>>>> jobid:1:state:RUNNING:slots:1:num procs:4
>>>>> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
>>>>> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
>>>>> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
>>>>> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
>>>>> 
>>>>> Seems to indicate there are two jobs, but one of them has 0 procs. Is 
>>>>> that expected? Not a huge problem, since I can just ignore the job with 0 
>>>>> procs.
>>>>> 
>>>>> Greg
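The `orte-ps --parseable` output shown above is a flat run of colon-separated key:value pairs, so a client can read it without a real grammar. The Python sketch below is a minimal illustration. It assumes that every line alternates key and value, that the first key names the record type ("mpirun", "jobid" or "process", as in the sample), and that no value contains a colon. The function names are hypothetical and are not part of Open MPI. It drops job 0 (mpirun plus its daemons, which reports 0 procs) by default, which matches the workaround described above.

```python
# Sketch: parse the colon-delimited output of `orte-ps --parseable`.
# Assumptions (not an Open MPI API): each line is a flat run of key:value
# pairs, the first key names the record type, and no value contains a colon.


def parse_record(line):
    """Return (record_type, {key: value}) for one output line."""
    fields = line.strip().split(":")
    if len(fields) % 2:
        raise ValueError(f"unpaired field in line: {line!r}")
    return fields[0], dict(zip(fields[0::2], fields[1::2]))


def parse_ps(text, skip_daemon_job=True):
    """Group process records under the job line that precedes them.

    Job 0 is mpirun plus its daemons (it reports 0 procs), so it is
    dropped by default.
    """
    jobs = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, rec = parse_record(line)
        if kind == "jobid":
            current = {
                "jobid": int(rec["jobid"]),
                "state": rec["state"],
                "num_procs": int(rec["num procs"]),
                "procs": [],
            }
            jobs.append(current)
        elif kind == "process" and current is not None:
            current["procs"].append({
                "rank": int(rec["rank"]),
                "pid": int(rec["pid"]),
                "node": rec["node"],
                "state": rec["state"],
            })
        # Other record types (e.g. the leading "mpirun" line) are ignored.
    if skip_daemon_job:
        jobs = [job for job in jobs if job["jobid"] != 0]
    return jobs
```

Fed the sample output above, `parse_ps` returns a single job (jobid 1) that holds the four ranks with pids 47522 through 47525. Because values are split only on colons, a state such as "SYNC REGISTERED" keeps its embedded space.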
>>>>> 
>>>>> 
>>>>> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
>>>>> 
>>>>>> Okay, you should have it in r24929. Use:
>>>>>> 
>>>>>> orte-ps --parseable
>>>>>> 
>>>>>> to get the new output.
>>>>>> 
>>>>>> 
>>>>>> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
>>>>>> 
>>>>>>> Gar - have to eat my words a bit. The jobid requested by orte-ps is 
>>>>>>> just the "local" jobid - i.e., it is expecting you to provide a number 
>>>>>>> from 0-N, as I described below (copied here):
>>>>>>> 
>>>>>>>> A jobid of 1 indicates the primary application, 2 and above would 
>>>>>>>> specify comm_spawned jobs. 
>>>>>>> 
>>>>>>> Not providing the jobid at all corresponds to wildcard and returns the 
>>>>>>> status of all jobs under that mpirun.
>>>>>>> 
>>>>>>> To specify which mpirun you want info on, you use the --pid option. It 
>>>>>>> is this option that isn't working properly - orte-ps returns info from 
>>>>>>> all mpiruns and doesn't check to provide only data from the given pid.
>>>>>>> 
>>>>>>> I'll fix that part, and implement the parsable output.
>>>>>>> 
>>>>>>> 
>>>>>>> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>>>>>>>> 
>>>>>>>>> Hi Ralph,
>>>>>>>>> 
>>>>>>>>> I'd like three things :-)
>>>>>>>>> 
>>>>>>>>> a) A --report-jobid option that prints the jobid on the first line in 
>>>>>>>>> a form that can be passed to the -jobid option on ompi-ps. Probably 
>>>>>>>>> tagging it in the output if -tag-output is enabled (e.g. 
>>>>>>>>> jobid:<jobid>) would be a good idea.
>>>>>>>>> 
>>>>>>>>> b) The orte-ps command output to use the same jobid format.
>>>>>>>> 
>>>>>>>> I started looking at the above, and found that orte-ps is just plain 
>>>>>>>> wrong in the way it handles jobid. The jobid consists of two fields: a 
>>>>>>>> 16-bit number indicating the mpirun, and a 16-bit number indicating 
>>>>>>>> the job within that mpirun. Unfortunately, orte-ps sends a data 
>>>>>>>> request to every mpirun out there instead of only to the one 
>>>>>>>> corresponding to that jobid.
>>>>>>>> 
>>>>>>>> What we probably should do is have you indicate the mpirun of interest 
>>>>>>>> via the -pid option, and then let jobid tell us which job you want 
>>>>>>>> within that mpirun. A jobid of 1 indicates the primary application, 2 
>>>>>>>> and above would specify comm_spawned jobs. A jobid of -1 would return 
>>>>>>>> the status of all jobs under that mpirun.
>>>>>>>> 
>>>>>>>> If multiple mpiruns are being reported, then the "jobid" in the report 
>>>>>>>> should again be the "local" jobid within that mpirun.
>>>>>>>> 
>>>>>>>> After all, you don't really care what the orte-internal 16-bit 
>>>>>>>> identifier is for that mpirun.
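The two-field jobid layout described above can be sketched in Python. This is a minimal illustration that assumes two things. First, the mpirun field occupies the upper 16 bits and the local jobid the lower 16 bits. The description gives the field widths but not their order. Second, the "[x,y]" notation that orte-ps prints lists the mpirun field first. The helper names are hypothetical and are not ORTE functions.

```python
# Sketch: split a 32-bit ORTE jobid into its two 16-bit fields.
# Assumption: the upper 16 bits identify the mpirun and the lower 16 bits
# are the local jobid within that mpirun.

MASK16 = 0xFFFF


def split_jobid(jobid):
    """Return (mpirun_id, local_jobid) for a 32-bit jobid."""
    if not 0 <= jobid <= 0xFFFFFFFF:
        raise ValueError("jobid must fit in 32 bits")
    return (jobid >> 16) & MASK16, jobid & MASK16


def make_jobid(mpirun_id, local_jobid):
    """Pack the two 16-bit fields back into a single integer jobid."""
    if not (0 <= mpirun_id <= MASK16 and 0 <= local_jobid <= MASK16):
        raise ValueError("each field must fit in 16 bits")
    return (mpirun_id << 16) | local_jobid


def format_jobid(jobid):
    """Render the jobid in the bracketed "[x,y]" component form (assumed order)."""
    mpirun_id, local_jobid = split_jobid(jobid)
    return f"[{mpirun_id},{local_jobid}]"
```

Under these assumptions, `split_jobid(719585280)` returns `(10980, 0)`, and `make_jobid(10980, 1)` gives the primary application of that same mpirun. This explains why users only need to supply the local part once the mpirun itself is identified some other way, such as by its pid.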
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> c) A more easily parsable output format from ompi-ps. It doesn't need 
>>>>>>>>> to be a full blown XML format, just something like the following 
>>>>>>>>> would suffice:
>>>>>>>>> 
>>>>>>>>> jobid:719585280:state:Running:slots:1:num procs:4
>>>>>>>>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>>>>>>>>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>>>>>>>>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>>>>>>>>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>>>>>>>>> jobid:345346663:state:Running:slots:1:num procs:2
>>>>>>>>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>>>>>>>>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>>>>>>>> 
>>>>>>>> Shouldn't be too hard to do - bunch of if-then-else statements 
>>>>>>>> required, though.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I'd be happy to help with any or all of these.
>>>>>>>> 
>>>>>>>> Appreciate the offer - let me see how hard this proves to be...
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Greg
>>>>>>>>> 
>>>>>>>>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>>>>>>>>> 
>>>>>>>>>> Hmmm...well, it looks like we could have made this nicer than we did 
>>>>>>>>>> :-/
>>>>>>>>>> 
>>>>>>>>>> If you add --report-uri to the mpirun command line, you'll get back 
>>>>>>>>>> the uri for that mpirun. This has the form of <jobid>:<uri>. As the 
>>>>>>>>>> -h option indicates:
>>>>>>>>>> 
>>>>>>>>>> -report-uri | --report-uri <arg0>  
>>>>>>>>>>                Printout URI on stdout [-], stderr [+], or a file
>>>>>>>>>>                [anything else]
>>>>>>>>>> 
>>>>>>>>>> The "jobid" required by the orte-ps command is the one reported 
>>>>>>>>>> there. We could easily add a --report-jobid option if that makes 
>>>>>>>>>> things easier.
>>>>>>>>>> 
>>>>>>>>>> As to the difference in how orte-ps shows the jobid...well, that's 
>>>>>>>>>> probably historical. orte-ps uses an orte utility function to print 
>>>>>>>>>> the jobid, and that utility always shows the jobid in component 
>>>>>>>>>> form. Again, we could add an option, or just use the integer version.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Jul 22, 2011, at 7:01 AM, Greg Watson wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> Does anyone know if it's possible to get the orte jobid from the 
>>>>>>>>>>> mpirun command? If not, how are you supposed to get it to use with 
>>>>>>>>>>> orte-ps? Also, orte-ps reports the jobid in [x,y] notation, but the 
>>>>>>>>>>> jobid argument seems to be an integer. How does that work?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Greg
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

