Re: [OMPI devel] RFC: make predefined handles extern to pointers

2009-01-16 Thread Terry Dontje
Just wanted to give an update.  On a workspace with just the predefined 
communicators converted to opaque pointers, I've run netpipe and HPCC 
performance tests and compared the results before and after the 
changes.  The differences in performance across 10 sample runs were 
undetectable.


I've also verified, using comm_world, that an a.out compiled and linked 
with a non-debug version of the library can then be run successfully 
with a debug version of the library.  At a simple level this proves 
that the change actually does what we believe it should.
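
For concreteness, a minimal test program of the sort described might 
look like this (an illustrative sketch, not the actual test):

    #include <stdio.h>
    #include <mpi.h>

    /* Touches MPI_COMM_WORLD, the predefined handle under test. */
    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d ok\n", rank);
        MPI_Finalize();
        return 0;
    }

Compiled and linked against the non-debug install and then run with the 
debug library on the library search path, it should behave identically 
if the handle change is ABI-safe.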


I will be completing the rest of the handles in the next couple of 
days.  Upon completion I will rerun the same tests above and test 
running HPCC with a debug and non-debug version of the library without 
recompiling.


I believe I am on track to put this back to the trunk by the end of 
next week, so if anyone has any issues with this, please speak up.


thanks,

--td

Graham, Richard L. wrote:

No specific test, just an idea of how this might impact an app.  I am guessing it 
won't even be noticeable.

Rich

- Original Message -
From: devel-boun...@open-mpi.org 
To: Open MPI Developers 
Sent: Thu Dec 18 07:13:08 2008
Subject: Re: [OMPI devel] RFC: make predefined handles extern to pointers

Richard Graham wrote:
  

Terry,
  Is there any way you can quantify the cost?  This seems reasonable, but
it would be nice to get an idea of what the performance cost is (and not
within a tight loop where everything stays in cache).

Rich


  

Ok, I guess that would eliminate any of the simple perf tests like IMB, 
netperf, and such.  So do you have something else in mind, maybe HPCC? 


--td
  

On 12/16/08 10:41 AM, "Terry D. Dontje"  wrote:

  


WHAT:  To make the predefined handles extern pointers instead of
addresses of extern structures.

WHY:  To make OMPI more backwards compatible with regard to changes to
the structures that define the predefined handles.

WHERE:  In the trunk.  ompi/include/mpi.h.in and places in ompi that
directly use the predefined handles.

WHEN:  01/24/2009

TIMEOUT:  01/10/2009




The point of this change is to improve the odds that an MPI application
does not have to be recompiled when changes are made to the OMPI
library; in this case, specifically, changes to the structures behind
the predefined handles for communicators, groups, ops, datatypes, error
handlers, win, file, and info.

An example of the changes for the communicator predefined handles can be
found in the hg tmp workspace at
ssh://www.open-mpi.org/~tdd/hg/predefcompat.

Note: the one downside that Jeff and I could think of is that this
potentially adds one level of indirection.  I believe the overhead will
be small, and if you use one of the predefined handles repetitively
(like in a loop) the address will probably be loaded into a register
once, so no additional overhead should be seen due to this change.
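
To make the mechanics concrete, here is a minimal sketch of the before
and after (illustrative declaration names only; the real declarations
in mpi.h.in differ in detail):

    /* Before: the handle is the address of an extern structure, so an
     * application binds to the structure's size and layout at compile
     * time; changing the structure breaks compiled applications. */
    extern struct ompi_communicator_t ompi_mpi_comm_world;
    #define MPI_COMM_WORLD (&ompi_mpi_comm_world)

    /* After: the handle is an extern pointer, so an application binds
     * only to a pointer-sized symbol; the structure behind it can
     * change without recompiling the application. */
    extern struct ompi_communicator_t *ompi_mpi_comm_world_ptr;
    #define MPI_COMM_WORLD ompi_mpi_comm_world_ptr

With the pointer form, each use of MPI_COMM_WORLD costs the one extra
load mentioned above; in a loop a compiler will typically keep it in a
register.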



Re: [OMPI devel] RFC: make predefined handles extern to pointers

2009-01-16 Thread Jeff Squyres

w00t.

(translation: I believe that this is a Good Thing and should be put  
back to the trunk when ready)



On Jan 16, 2009, at 7:19 AM, Terry Dontje wrote:

Just wanted to give an update.  On a workspace with just the predefined 
communicators converted to opaque pointers, I've run netpipe and HPCC 
performance tests and compared the results before and after the 
changes.  The differences in performance across 10 sample runs were 
undetectable.




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Open MPI v1.3rc7 has been posted

2009-01-16 Thread Jeff Squyres

All looks good from my overnight runs with 1.3rc7.

Thumbs up for release.


On Jan 15, 2009, at 10:52 PM, Jeff Squyres wrote:

I did a large MTT run (about 7k tests) with the openib patch on rc6  
and all came out good.  I'll do the same run on rc7 -- the results  
should be identical.  Will post results tomorrow morning.


On Jan 15, 2009, at 5:24 PM, Tim Mattox wrote:


Hi All,
The seventh release candidate of Open MPI v1.3 is now available:

http://www.open-mpi.org/software/ompi/v1.3/

Please run it through its paces as best you can.
Anticipated release of 1.3 is Friday...  of what month or year, I  
don't know...


This differs from rc6 only by an openib change... see ticket #1753:
https://svn.open-mpi.org/trac/ompi/ticket/1753
--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
  I'm a bright... http://www.the-brights.net/



--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] This is why we test

2009-01-16 Thread Jeff Squyres
We fixed the openib segv, but I forgot to follow up about the timeouts  
that I mentioned in my original mail.


The timeouts were from poorly configured spawn tests.  That is, I had  
8 cores in the job and ran the spawn test on all 8 cores (all  
aggressively polling).  The spawn test then spawned N more MPI  
processes, each of which also [attempts to] poll heavily.  This causes  
obvious thrashage, and the test doesn't complete before the timeout.


These are obviously poorly configured tests on my part and not a real  
problem (I confirmed by re-running the tests with <8 original MPI  
procs).  So as I mentioned in my prior mail, thumbs up for the v1.3  
release from my perspective.




On Jan 15, 2009, at 9:05 AM, Jeff Squyres wrote:


Unfortunately, I have to throw the flag on the v1.3 release.  :-(

I ran ~16k tests via MTT yesterday on the rc5 and rc6 tarballs.  I  
found the following:


Found test runs: 15962
Passed: 15785 (98.89%)
Failed: 83 (0.52%)
--> Openib failures: 80 (0.50%)
Skipped: 46 (0.29%)
Timedout: 48 (0.30%)

The 80 openib failures are all seemingly random segv's.  I repeated  
a much smaller run this morning (about 700 runs) and still found a  
non-zero percentage of failures of the same flavor.


The timeouts are a little worrisome as well.

This unfortunately requires investigation.  :-(

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] -display-map

2009-01-16 Thread Greg Watson

When I try to build trunk, it fails with:

i_f77.lax/libmpi_f77_pmpi.a/pwin_unlock_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwin_wait_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtick_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtime_f.o ../../../ompi/.libs/libmpi.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-rte.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-pal.0.0.0.dylib -install_name /usr/local/openmpi-1.4-devel/lib/libmpi_f77.0.dylib -compatibility_version 1 -current_version 1.0
ld: duplicate symbol _mpi_reduce_local_f in .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/preduce_local_f.o and .libs/reduce_local_f.o

collect2: ld returned 1 exit status
make[3]: *** [libmpi_f77.la] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

I'm using the default configure command (./configure --prefix=xxx) on  
Mac OS X 10.5. This works fine on the 1.3 branch.


Greg

On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:

Okay, it is in the trunk as of r20284 - I'll file the request to  
have it moved to 1.3.1.


Let me know if you get a chance to test the stdout/err stuff in the  
trunk - we should try and iterate it so any changes can make 1.3.1  
as well.


Thanks!
Ralph


On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:


Ralph,

I think the second form would be ideal and would simplify things  
greatly.


Greg

On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:

Here is what I was able to do - note that the resolve messages are  
associated with the specific hostname, not the overall map:

    [XML example lost in the archive]
Will that work for you? If you like, I can remove the name= field  
from the noderesolve element since the info is specific to the  
host element that contains it. In other words, I can make it look  
like this:

    [XML example lost in the archive]
if that would help.

Ralph
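
(Both snippets above were lost from the archive; purely as a
hypothetical reconstruction of the two shapes being described, with
invented attribute names and values:)

    <host name="node0">
      <noderesolve name="node0" resolved="node0.example.com"/>
    </host>

versus, with the redundant name= field dropped:

    <host name="node0">
      <noderesolve resolved="node0.example.com"/>
    </host>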


On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point.  
The problem will be the natural interleaving of stdout/err from  
the various procs due to the async behavior of MPI. Mpirun  
receives fragmented output in the forwarding system, limited by  
the buffer sizes and the amount of data we can read at any one  
"bite" from the pipes connecting us to the procs. So even though  
the user -thinks- they output a single large line of stuff, it  
may show up at mpirun as a series of fragments. Hence, it gets  
tricky to know how to put appropriate XML brackets around it.
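
For illustration, here is a sketch (not OMPI code; the tag name is a
guess) of the kind of per-line buffering mpirun would need so that a
tag never wraps a partial fragment:

    #include <stdio.h>

    #define LINE_MAX_LEN 8192

    /* One buffer is shown; a real implementation would keep one
     * buffer per child process. */
    static char linebuf[LINE_MAX_LEN];
    static size_t buffered = 0;

    /* Called for each fragment read from a child's pipe.  Bytes are
     * accumulated until a newline arrives, so the XML tag always
     * wraps a complete line. */
    static void forward_fragment(int rank, const char *frag, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (frag[i] == '\n') {
                linebuf[buffered] = '\0';
                printf("<stdout rank=\"%d\">%s</stdout>\n", rank, linebuf);
                buffered = 0;
            } else if (buffered < LINE_MAX_LEN - 1) {
                linebuf[buffered++] = frag[i];
            }
        }
    }

Even this simple scheme has to decide what to do with a line that never
ends, which is part of why the bracketing is tricky.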


Given this input about when you actually want resolved name info,  
I can at least do something about that area. Won't be in 1.3.0,  
but should make 1.3.1.


As for XML-tagged stdout/err: the OMPI community asked me not to  
turn that feature "on" for 1.3.0 as they felt it hasn't been  
adequately tested yet. The code is present, but cannot be  
activated in 1.3.0. However, I believe it is activated on the  
trunk when you do --xml --tagged-output, so perhaps some testing  
will help us debug and validate it adequately for 1.3.1?


Thanks
Ralph


On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:


Ralph,

The only time we use the resolved names is when we get a map, so  
we consider them part of the map output.


If quasi-XML is all that will ever be possible with 1.3, then  
you may as well leave as-is and we will attempt to clean it up  
in Eclipse. It would be nice if a future version of ompi could  
output correct XML (including stdout) as this would vastly  
simplify the parsing we need to do.


Regards,

Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm...well, I can't do either for 1.3.0 as it is departing  
this afternoon.


The first option would be very hard to do. I would have to  
expose the display-map option across the code base and check it  
prior to printing anything about resolving node names. I guess  
I should ask: do you only want noderesolve statements when we  
are displaying the map? Right now, I will output them regardless.


The second option could be done. I could check if any "display"  
option has been specified, and output the  root at that  
time (likewise for the end). Anything we output in-between  
would be encapsulated between the two, but that would include  
any user output to stdout and/or stderr - which for 1.3.0 is  
not in xml.


Any thoughts?

Ralph

PS. Guess I should clarify that I was not striving for true XML  
interaction here, but rather a quasi-XML format that would help  
you to filter the output. I have no problem trying to get to  
something more formally correct, but it could be tricky in some  
places to achieve it due to the inherent async nature of the  
beast.



On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem.  
To be valid, there needs to be only one root element, but currently  
you don't have any (or many).

Re: [OMPI devel] -display-map

2009-01-16 Thread Jeff Squyres


Er... whoops.  This looks like my mistake (I just recently added  
MPI_REDUCE_LOCAL to the trunk -- not v1.3).


I could have sworn that I tested this on a Mac, multiple times.  I'll  
test again...



On Jan 16, 2009, at 12:58 PM, Greg Watson wrote:


When I try to build trunk, it fails with:

ld: duplicate symbol _mpi_reduce_local_f in .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/preduce_local_f.o and .libs/reduce_local_f.o

[...]

Re: [OMPI devel] -display-map

2009-01-16 Thread Greg Watson
FYI, if I configure with --with-platform=contrib/platform/lanl/macosx-dynamic the build succeeds.


Greg

On Jan 16, 2009, at 1:08 PM, Jeff Squyres wrote:



Er... whoops.  This looks like my mistake (I just recently added  
MPI_REDUCE_LOCAL to the trunk -- not v1.3).


I could have sworn that I tested this on a Mac, multiple times.   
I'll test again...




Re: [OMPI devel] -display-map

2009-01-16 Thread Jeff Squyres

Fixed in r20288.  Thanks for the catch.

On Jan 16, 2009, at 2:04 PM, Greg Watson wrote:

FYI, if I configure with --with-platform=contrib/platform/lanl/macosx-dynamic the build succeeds.


Greg


Re: [OMPI devel] -display-map

2009-01-16 Thread Greg Watson

Ralph,

Is there something I need to do to enable stdout/err encapsulation  
(apart from -xml)? Here's what I see:


$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI

    [XML output lost in the archive]
n = 0
n = 0
n = 0
n = 0
n = 0

On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:


Ralph,

The XML is looking better now, but there is still one problem.  
To be valid, there needs to be only one root element, but  
currently you don't have any (or many).  So rather than:

    [XML example lost in the archive]
the XML should be:

    [XML example lost in the archive]
or:
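
(The two snippets and the rest of the message were lost from the
archive.  Purely as a hypothetical sketch of the single-root form being
asked for, with invented element names:)

    <mpirun>
      <map>
        <host name="node0"> ... </host>
      </map>
      <stdout rank="0">n = 0</stdout>
    </mpirun>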