Re: [OMPI devel] shmem error msg

2011-07-25 Thread Samuel K. Gutierrez

Hi Ralph,

It seems as if this issue is related to a missing shm_unlink wrapper  
within Valgrind.  I'm going to disable posix by default and commit  
later today.


Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jul 23, 2011, at 8:54 PM, Samuel K. Gutierrez wrote:


Hi Ralph,

That's mine - I'll take a look.

Thanks,

Sam

Whenever I run valgrind on orterun (or any OMPI tool), I get the following
error msg:

--
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

 Local host:  Ralph
 System call: shm_unlink(2)
 Error:   Function not implemented (errno 78)
--

It's coming out of open-rte/help-opal-shmem-posix.txt.

Everything continues, so I'm not sure what this is all about. Anyone
recognize this???

It's on the trunk, running on a Mac, vanilla configure.
Ralph




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] shmem error msg

2011-07-25 Thread Ralph Castain

On Jul 25, 2011, at 10:16 AM, Samuel K. Gutierrez wrote:

> Hi Ralph,
> 
> It seems as if this issue is related to a missing shm_unlink wrapper within 
> Valgrind.  I'm going to disable posix by default and commit later today.

Is that the right solution? If the problem is something in valgrind, then let's 
not disable something just for their problem. Is there a way we can wrap it 
ourselves so the error doesn't cause the message?

Like I said, everything worked just fine - the message just implied the proc 
would die, and it doesn't.
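
A minimal sketch of one way the posix component could swallow just that case, assuming it checks the return of shm_unlink() directly; the function and message handling below are illustrative, not the actual opal/shmem/posix code:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>   /* shm_unlink() */

/* Illustrative only -- not the real component code. */
int posix_unlink_sketch(const char *seg_name)
{
    if (0 != shm_unlink(seg_name)) {
        if (ENOSYS == errno) {
            /* Call not implemented in this environment (e.g. a Valgrind
             * build without a shm_unlink wrapper).  Treat it as benign
             * instead of emitting the help message that implies the
             * job will abort. */
            return 0;
        }
        /* Real failure: this is where the help-opal-shmem-posix.txt
         * message would be emitted, as it is today. */
        fprintf(stderr, "shm_unlink(%s): %s (errno %d)\n",
                seg_name, strerror(errno), errno);
        return -1;
    }
    return 0;
}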

> 
> Thanks,
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
> 
> On Jul 23, 2011, at 8:54 PM, Samuel K. Gutierrez wrote:
> 
>> Hi Ralph,
>> 
>> That's mine - I'll take a look.
>> 
>> Thanks,
>> 
>> Sam
>> 
>>> Whenever I run valgrind on orterun (or any OMPI tool), I get the following
>>> error msg:
>>> 
>>> --
>>> A system call failed during shared memory initialization that should
>>> not have.  It is likely that your MPI job will now either abort or
>>> experience performance degradation.
>>> 
>>> Local host:  Ralph
>>> System call: shm_unlink(2)
>>> Error:   Function not implemented (errno 78)
>>> --
>>> 
>>> It's coming out of open-rte/help-opal-shmem-posix.txt.
>>> 
>>> Everything continues, so I'm not sure what this is all about. Anyone
>>> recognize this???
>>> 
>>> It's on the trunk, running on a Mac, vanilla configure.
>>> Ralph
>>> 
>>> 




Re: [OMPI devel] shmem error msg

2011-07-25 Thread Samuel K. Gutierrez

Hi Ralph,


On Jul 25, 2011, at 11:05 AM, Ralph Castain wrote:



On Jul 25, 2011, at 10:16 AM, Samuel K. Gutierrez wrote:


Hi Ralph,

It seems as if this issue is related to a missing shm_unlink  
wrapper within Valgrind.  I'm going to disable posix by default and  
commit later today.


Is that the right solution?


No, not really.

If the problem is something in valgrind, then let's not disable  
something just for their problem. Is there a way we can wrap it  
ourselves so the error doesn't cause the message?


I think so.  They outline the procedure in  
README_MISSING_SYSCALL_OR_IOCTL, so I'll take a look.
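
In the meantime, a tiny standalone reproducer makes it easy to check whether a given Valgrind build forwards shm_unlink; the segment name below is arbitrary, not the one OMPI generates:

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    const char *name = "/shm_unlink_probe";   /* arbitrary test segment */
    int fd = shm_open(name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);

    if (fd < 0) {
        perror("shm_open");
        return 1;
    }
    if (0 != shm_unlink(name)) {
        /* Under a Valgrind that lacks the wrapper this reports
         * "Function not implemented" (ENOSYS), matching the message above. */
        fprintf(stderr, "shm_unlink: %s (errno %d)\n", strerror(errno), errno);
        return 2;
    }
    puts("shm_unlink OK");
    return 0;
}

Build it (add -lrt on Linux) and run it both natively and under valgrind; if only the valgrind run reports ENOSYS, the wrapper is still missing.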


Stay tuned,

Sam



Like I said, everything worked just fine - the message just implied  
the proc would die, and it doesn't.




Thanks,
--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Jul 23, 2011, at 8:54 PM, Samuel K. Gutierrez wrote:


Hi Ralph,

That's mine - I'll take a look.

Thanks,

Sam

Whenever I run valgrind on orterun (or any OMPI tool), I get the following
error msg:

--
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

Local host:  Ralph
System call: shm_unlink(2)
Error:   Function not implemented (errno 78)
--

It's coming out of open-rte/help-opal-shmem-posix.txt.

Everything continues, so I'm not sure what this is all about. Anyone
recognize this???

It's on the trunk, running on a Mac, vanilla configure.
Ralph





___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] orte question

2011-07-25 Thread Greg Watson
Ralph,

The output format looks good, but I'm not sure it's quite correct. If I run the 
mpirun command, I see the following:

mpirun:47520:num nodes:1:num jobs:2
jobid:0:state:RUNNING:slots:0:num procs:0
jobid:1:state:RUNNING:slots:1:num procs:4
process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED

Seems to indicate there are two jobs, but one of them has 0 procs. Is that 
expected? Not a huge problem, since I can just ignore the job with 0 procs.
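
For reference, a minimal sketch of parsing one of those records, assuming the format stays strictly colon-delimited with alternating keys and values (keys like "num procs" contain spaces but never colons); illustrative code, not part of any tool:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* One record copied from the output above. */
    char line[] = "jobid:1:state:RUNNING:slots:1:num procs:4";
    char *save = NULL;
    char *key  = strtok_r(line, ":", &save);

    while (NULL != key) {
        char *val = strtok_r(NULL, ":", &save);
        printf("%-10s = %s\n", key, val ? val : "");
        key = strtok_r(NULL, ":", &save);
    }
    return 0;
}

The one caveat is that this only holds as long as values (node names in particular) never contain a colon themselves.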

Greg


On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:

> Okay, you should have it in r24929. Use:
> 
> orte-ps --parseable
> 
> to get the new output.
> 
> 
> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
> 
>> Gar - have to eat my words a bit. The jobid requested by orte-ps is just the 
>> "local" jobid - i.e., it is expecting you to provide a number from 0-N, as I 
>> described below (copied here):
>> 
>>> A jobid of 1 indicates the primary application, 2 and above would specify 
>>> comm_spawned jobs. 
>> 
>> Not providing the jobid at all corresponds to wildcard and returns the 
>> status of all jobs under that mpirun.
>> 
>> To specify which mpirun you want info on, you use the --pid option. It is 
>> this option that isn't working properly - orte-ps returns info from all 
>> mpiruns and doesn't check to provide only data from the given pid.
>> 
>> I'll fix that part, and implement the parsable output.
>> 
>> 
>> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>> 
>>> 
>>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>>> 
 Hi Ralph,
 
 I'd like three things :-)
 
 a) A --report-jobid option that prints the jobid on the first line in a 
 form that can be passed to the -jobid option on ompi-ps. Probably tagging 
 it in the output if -tag-output is enabled (e.g. jobid:) would be a 
 good idea.
 
 b) The orte-ps command output to use the same jobid format.
>>> 
>>> I started looking at the above, and found that orte-ps is just plain wrong 
>>> in the way it handles jobid. The jobid consists of two fields: a 16-bit 
>>> number indicating the mpirun, and a 16-bit number indicating the job within 
>>> that mpirun. Unfortunately, orte-ps sends a data request to every mpirun 
>>> out there instead of only to the one corresponding to that jobid.
>>> 
>>> What we probably should do is have you indicate the mpirun of interest via 
>>> the -pid option, and then let jobid tell us which job you want within that 
>>> mpirun. A jobid of 1 indicates the primary application, 2 and above would 
>>> specify comm_spawned jobs. A jobid of -1 would return the status of all 
>>> jobs under that mpirun.
>>> 
>>> If multiple mpiruns are being reported, then the "jobid" in the report 
>>> should again be the "local" jobid within that mpirun.
>>> 
>>> After all, you don't really care what the orte-internal 16-bit identifier 
>>> is for that mpirun.
>>> 
 
 c) A more easily parsable output format from ompi-ps. It doesn't need to 
 be a full blown XML format, just something like the following would 
 suffice:
 
 jobid:719585280:state:Running:slots:1:num procs:4
 process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
 process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
 process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
 process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
 jobid:345346663:state:running:slots:1:num procs:2
 process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
 process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>>> 
>>> Shouldn't be too hard to do - bunch of if-then-else statements required, 
>>> though.
>>> 
 
 I'd be happy to help with any or all of these.
>>> 
>>> Appreciate the offer - let me see how hard this proves to be...
>>> 
 
 Cheers,
 Greg
 
 On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
 
> Hmmm...well, it looks like we could have made this nicer than we did :-/
> 
> If you add --report-uri to the mpirun command line, you'll get back the 
> uri for that mpirun. This has the form of :. As the -h option 
> indicates:
> 
> -report-uri | --report-uri   
> Printout URI on stdout [-], stderr [+], or a file
> [anything else]
> 
> The "jobid" required by the orte-ps command is the one reported there. We 
> could easily add a --report-jobid option if that makes things easier.
> 
> As to the difference in how orte-ps shows the jobid...well, that's 
> probably historical. orte-ps uses an orte utility function to print the 
> jobid, and that utility always shows the jobid in component form. Again, 
> could add or just use the integer version.
> 
> 

Re: [OMPI devel] orte question

2011-07-25 Thread Ralph Castain
job 0 is mpirun and its daemons - I can have it ignore that job as I doubt 
users care :-)
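
To connect that to the jobid layout described in the quoted text below (a 16-bit field identifying the mpirun plus a 16-bit local job number), a small illustration; the macro names and the sample value are made up here, not the actual ORTE definitions:

#include <stdio.h>
#include <stdint.h>

/* Illustrative split of the 32-bit jobid: upper 16 bits identify the
 * mpirun, lower 16 bits the job within that mpirun. */
#define MPIRUN_FIELD(jobid)  (((jobid) >> 16) & 0xffffu)
#define LOCAL_JOBID(jobid)   ((jobid) & 0xffffu)

int main(void)
{
    uint32_t jobid = (42u << 16) | 1;   /* hypothetical value */

    printf("jobid %u -> mpirun field %u, local job %u\n",
           jobid, MPIRUN_FIELD(jobid), LOCAL_JOBID(jobid));
    /* Local job 0 is mpirun and its daemons, 1 the primary application,
     * and 2 and up are comm_spawned jobs, per the thread. */
    return 0;
}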

On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:

> Ralph,
> 
> The output format looks good, but I'm not sure it's quite correct. If I run 
> the mpirun command, I see the following:
> 
> mpirun:47520:num nodes:1:num jobs:2
> jobid:0:state:RUNNING:slots:0:num procs:0
> jobid:1:state:RUNNING:slots:1:num procs:4
> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
> 
> Seems to indicate there are two jobs, but one of them has 0 procs. Is that 
> expected? Not a huge problem, since I can just ignore the job with 0 procs.
> 
> Greg
> 
> 
> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
> 
>> Okay, you should have it in r24929. Use:
>> 
>> orte-ps --parseable
>> 
>> to get the new output.
>> 
>> 
>> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
>> 
>>> Gar - have to eat my words a bit. The jobid requested by orte-ps is just 
>>> the "local" jobid - i.e., it is expecting you to provide a number from 0-N, 
>>> as I described below (copied here):
>>> 
 A jobid of 1 indicates the primary application, 2 and above would specify 
 comm_spawned jobs. 
>>> 
>>> Not providing the jobid at all corresponds to wildcard and returns the 
>>> status of all jobs under that mpirun.
>>> 
>>> To specify which mpirun you want info on, you use the --pid option. It is 
>>> this option that isn't working properly - orte-ps returns info from all 
>>> mpiruns and doesn't check to provide only data from the given pid.
>>> 
>>> I'll fix that part, and implement the parsable output.
>>> 
>>> 
>>> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
>>> 
 
 On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
 
> Hi Ralph,
> 
> I'd like three things :-)
> 
> a) A --report-jobid option that prints the jobid on the first line in a 
> form that can be passed to the -jobid option on ompi-ps. Probably tagging 
> it in the output if -tag-output is enabled (e.g. jobid:) would be 
> a good idea.
> 
> b) The orte-ps command output to use the same jobid format.
 
 I started looking at the above, and found that orte-ps is just plain wrong 
 in the way it handles jobid. The jobid consists of two fields: a 16-bit 
 number indicating the mpirun, and a 16-bit number indicating the job 
 within that mpirun. Unfortunately, orte-ps sends a data request to every 
 mpirun out there instead of only to the one corresponding to that jobid.
 
 What we probably should do is have you indicate the mpirun of interest via 
 the -pid option, and then let jobid tell us which job you want within that 
 mpirun. A jobid of 1 indicates the primary application, 2 and above would 
 specify comm_spawned jobs. A jobid of -1 would return the status of all 
 jobs under that mpirun.
 
 If multiple mpiruns are being reported, then the "jobid" in the report 
 should again be the "local" jobid within that mpirun.
 
 After all, you don't really care what the orte-internal 16-bit identifier 
 is for that mpirun.
 
> 
> c) A more easily parsable output format from ompi-ps. It doesn't need to 
> be a full blown XML format, just something like the following would 
> suffice:
> 
> jobid:719585280:state:Running:slots:1:num procs:4
> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
> jobid:345346663:state:running:slots:1:num procs:2
> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
 
 Shouldn't be too hard to do - bunch of if-then-else statements required, 
 though.
 
> 
> I'd be happy to help with any or all of these.
 
 Appreciate the offer - let me see how hard this proves to be...
 
> 
> Cheers,
> Greg
> 
> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
> 
>> Hmmm...well, it looks like we could have made this nicer than we did :-/
>> 
>> If you add --report-uri to the mpirun command line, you'll get back the 
>> uri for that mpirun. This has the form of :. As the -h 
>> option indicates:
>> 
>> -report-uri | --report-uri   
>>Printout URI on stdout [-], stderr [+], or a file
>>[anything else]
>> 
>> The "jobid" required by the orte-ps command is the one reported there. 
>> We could easily add a --report-jobid option if that makes things easier.
>> 

Re: [OMPI devel] orte question

2011-07-25 Thread Greg Watson
That would probably be more intuitive.

Thanks,
Greg

On Jul 25, 2011, at 2:28 PM, Ralph Castain wrote:

> job 0 is mpirun and its daemons - I can have it ignore that job as I doubt 
> users care :-)
> 
> On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:
> 
>> Ralph,
>> 
>> The output format looks good, but I'm not sure it's quite correct. If I run 
>> the mpirun command, I see the following:
>> 
>> mpirun:47520:num nodes:1:num jobs:2
>> jobid:0:state:RUNNING:slots:0:num procs:0
>> jobid:1:state:RUNNING:slots:1:num procs:4
>> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
>> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
>> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
>> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
>> 
>> Seems to indicate there are two jobs, but one of them has 0 procs. Is that 
>> expected? Not a huge problem, since I can just ignore the job with 0 procs.
>> 
>> Greg
>> 
>> 
>> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
>> 
>>> Okay, you should have it in r24929. Use:
>>> 
>>> orte-ps --parseable
>>> 
>>> to get the new output.
>>> 
>>> 
>>> On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
>>> 
 Gar - have to eat my words a bit. The jobid requested by orte-ps is just 
 the "local" jobid - i.e., it is expecting you to provide a number from 
 0-N, as I described below (copied here):
 
> A jobid of 1 indicates the primary application, 2 and above would specify 
> comm_spawned jobs. 
 
 Not providing the jobid at all corresponds to wildcard and returns the 
 status of all jobs under that mpirun.
 
 To specify which mpirun you want info on, you use the --pid option. It is 
 this option that isn't working properly - orte-ps returns info from all 
 mpiruns and doesn't check to provide only data from the given pid.
 
 I'll fix that part, and implement the parsable output.
 
 
 On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
 
> 
> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
> 
>> Hi Ralph,
>> 
>> I'd like three things :-)
>> 
>> a) A --report-jobid option that prints the jobid on the first line in a 
>> form that can be passed to the -jobid option on ompi-ps. Probably 
>> tagging it in the output if -tag-output is enabled (e.g. jobid:) 
>> would be a good idea.
>> 
>> b) The orte-ps command output to use the same jobid format.
> 
> I started looking at the above, and found that orte-ps is just plain 
> wrong in the way it handles jobid. The jobid consists of two fields: a 
> 16-bit number indicating the mpirun, and a 16-bit number indicating the 
> job within that mpirun. Unfortunately, orte-ps sends a data request to 
> every mpirun out there instead of only to the one corresponding to that 
> jobid.
> 
> What we probably should do is have you indicate the mpirun of interest 
> via the -pid option, and then let jobid tell us which job you want within 
> that mpirun. A jobid of 1 indicates the primary application, 2 and above 
> would specify comm_spawned jobs. A jobid of -1 would return the status of 
> all jobs under that mpirun.
> 
> If multiple mpiruns are being reported, then the "jobid" in the report 
> should again be the "local" jobid within that mpirun.
> 
> After all, you don't really care what the orte-internal 16-bit identifier 
> is for that mpirun.
> 
>> 
>> c) A more easily parsable output format from ompi-ps. It doesn't need to 
>> be a full blown XML format, just something like the following would 
>> suffice:
>> 
>> jobid:719585280:state:Running:slots:1:num procs:4
>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>> jobid:345346663:state:running:slots:1:num procs:2
>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
> 
> Shouldn't be too hard to do - bunch of if-then-else statements required, 
> though.
> 
>> 
>> I'd be happy to help with any or all of these.
> 
> Appreciate the offer - let me see how hard this proves to be...
> 
>> 
>> Cheers,
>> Greg
>> 
>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>> 
>>> Hmmm...well, it looks like we could have made this nicer than we did :-/
>>> 
>>> If you add --report-uri to the mpirun command line, you'll get back the 
>>> uri for that mpirun. This has the form of :. As the -h 
>>> option indicates:
>>> 
>>> -report-uri | --report-uri   
>>>   Printout URI on stdout [-], stderr [+], or a file
>>

Re: [OMPI devel] orte question

2011-07-25 Thread Ralph Castain
r24944 - let me know how it works!


On Jul 25, 2011, at 1:01 PM, Greg Watson wrote:

> That would probably be more intuitive.
> 
> Thanks,
> Greg
> 
> On Jul 25, 2011, at 2:28 PM, Ralph Castain wrote:
> 
>> job 0 is mpirun and its daemons - I can have it ignore that job as I doubt 
>> users care :-)
>> 
>> On Jul 25, 2011, at 12:25 PM, Greg Watson wrote:
>> 
>>> Ralph,
>>> 
>>> The output format looks good, but I'm not sure it's quite correct. If I run 
>>> the mpirun command, I see the following:
>>> 
>>> mpirun:47520:num nodes:1:num jobs:2
>>> jobid:0:state:RUNNING:slots:0:num procs:0
>>> jobid:1:state:RUNNING:slots:1:num procs:4
>>> process:x:rank:0:pid:47522:node:greg.local:state:SYNC REGISTERED
>>> process:x:rank:1:pid:47523:node:greg.local:state:SYNC REGISTERED
>>> process:x:rank:2:pid:47524:node:greg.local:state:SYNC REGISTERED
>>> process:x:rank:3:pid:47525:node:greg.local:state:SYNC REGISTERED
>>> 
>>> Seems to indicate there are two jobs, but one of them has 0 procs. Is that 
>>> expected? Not a huge problem, since I can just ignore the job with 0 procs.
>>> 
>>> Greg
>>> 
>>> 
>>> On Jul 23, 2011, at 6:24 PM, Ralph Castain wrote:
>>> 
 Okay, you should have it in r24929. Use:
 
 orte-ps --parseable
 
 to get the new output.
 
 
 On Jul 23, 2011, at 11:43 AM, Ralph Castain wrote:
 
> Gar - have to eat my words a bit. The jobid requested by orte-ps is just 
> the "local" jobid - i.e., it is expecting you to provide a number from 
> 0-N, as I described below (copied here):
> 
>> A jobid of 1 indicates the primary application, 2 and above would 
>> specify comm_spawned jobs. 
> 
> Not providing the jobid at all corresponds to wildcard and returns the 
> status of all jobs under that mpirun.
> 
> To specify which mpirun you want info on, you use the --pid option. It is 
> this option that isn't working properly - orte-ps returns info from all 
> mpiruns and doesn't check to provide only data from the given pid.
> 
> I'll fix that part, and implement the parsable output.
> 
> 
> On Jul 22, 2011, at 8:55 PM, Ralph Castain wrote:
> 
>> 
>> On Jul 22, 2011, at 3:57 PM, Greg Watson wrote:
>> 
>>> Hi Ralph,
>>> 
>>> I'd like three things :-)
>>> 
>>> a) A --report-jobid option that prints the jobid on the first line in a 
>>> form that can be passed to the -jobid option on ompi-ps. Probably 
>>> tagging it in the output if -tag-output is enabled (e.g. jobid:) 
>>> would be a good idea.
>>> 
>>> b) The orte-ps command output to use the same jobid format.
>> 
>> I started looking at the above, and found that orte-ps is just plain 
>> wrong in the way it handles jobid. The jobid consists of two fields: a 
>> 16-bit number indicating the mpirun, and a 16-bit number indicating the 
>> job within that mpirun. Unfortunately, orte-ps sends a data request to 
>> every mpirun out there instead of only to the one corresponding to that 
>> jobid.
>> 
>> What we probably should do is have you indicate the mpirun of interest 
>> via the -pid option, and then let jobid tell us which job you want 
>> within that mpirun. A jobid of 1 indicates the primary application, 2 
>> and above would specify comm_spawned jobs. A jobid of -1 would return 
>> the status of all jobs under that mpirun.
>> 
>> If multiple mpiruns are being reported, then the "jobid" in the report 
>> should again be the "local" jobid within that mpirun.
>> 
>> After all, you don't really care what the orte-internal 16-bit 
>> identifier is for that mpirun.
>> 
>>> 
>>> c) A more easily parsable output format from ompi-ps. It doesn't need 
>>> to be a full blown XML format, just something like the following would 
>>> suffice:
>>> 
>>> jobid:719585280:state:Running:slots:1:num procs:4
>>> process_name:./x:rank:0:pid:3082:node:node1.com:state:Running
>>> process_name:./x:rank:1:pid:4567:node:node5.com:state:Running
>>> process_name:./x:rank:2:pid:2343:node:node4.com:state:Running
>>> process_name:./x:rank:3:pid:3422:node:node7.com:state:Running
>>> jobid:345346663:state:running:slots:1:num procs:2
>>> process_name:./x:rank:0:pid:5563:node:node2.com:state:Running
>>> process_name:./x:rank:1:pid:6677:node:node3.com:state:Running
>> 
>> Shouldn't be too hard to do - bunch of if-then-else statements required, 
>> though.
>> 
>>> 
>>> I'd be happy to help with any or all of these.
>> 
>> Appreciate the offer - let me see how hard this proves to be...
>> 
>>> 
>>> Cheers,
>>> Greg
>>> 
>>> On Jul 22, 2011, at 10:18 AM, Ralph Castain wrote:
>>> 
 Hmmm...well, it looks like we could have made this nicer than we did 
 :-/
 

[OMPI devel] Open MPI + HWLOC + Static build issue

2011-07-25 Thread Shamis, Pavel
Hello,

I have been trying to build a static version of Open MPI (trunk) with hwloc, which
is enabled by default in the trunk.
The build platform is an AMD machine that has only a dynamic version of libnuma.

Problem:
Open MPI fails to link orted because it can't find a static version of libnuma.

Workaround:
add --without-hwloc

Real solution:
Is there a way to keep hwloc enabled when a static libnuma isn't present on the
system? If there is, I would like to know how to enable it.
Otherwise, I think the configure script should handle this scenario, i.e.,
disable hwloc and fall back to some other alternative.

Regards,

Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory