Re: [OMPI devel] RFC: make predefined handles extern to pointers
Just wanted to give an update. On a workspace with just the predefined communicators converted to opaque pointers, I've run netpipe and hpcc performance tests and compared the results before and after the changes. The difference in performance over 10 sample runs was undetectable. I've also verified, using comm_world, that I can compile and link an a.out against a non-debug version of the library and then run that a.out successfully with a debug version of the library. At a simple level, this proves that the change actually does what we believe it should.

I will be completing the rest of the handles in the next couple of days. Upon completion I will rerun the same tests above, and also test running hpcc with a debug and a non-debug version of the library without recompiling. I believe I am on track to put this back to the trunk by the end of next week, so if anyone has any issues with this, please speak up.

thanks,
--td

Graham, Richard L. wrote:

No specific test, just an idea of how this might impact an app. I am guessing it won't even be noticeable.

Rich

----- Original Message -----
From: devel-boun...@open-mpi.org
To: Open MPI Developers
Sent: Thu Dec 18 07:13:08 2008
Subject: Re: [OMPI devel] RFC: make predefined handles extern to pointers

Richard Graham wrote:

Terry, is there any way you can quantify the cost? This seems reasonable, but it would be nice to get an idea of what the performance cost is (and not within a tight loop where everything stays in cache).

Rich

Ok, I guess that would eliminate any of the simple perf tests like IMB, netperf, and such. So do you have something else in mind, maybe HPCC?

--td

On 12/16/08 10:41 AM, "Terry D. Dontje" wrote:

WHAT: To make predefined handles extern pointers instead of addresses of extern structures.

WHY: To make OMPI more backwards compatible with regard to changes to the structures that define the predefined handles.

WHERE: In the trunk: ompi/include/mpi.h.in and the places in ompi that directly use the predefined handles.

WHEN: 01/24/2009

TIMEOUT: 01/10/2009

The point of this change is to improve the odds that an MPI application does not have to be recompiled when changes are made to the OMPI library -- specifically, in this case, the predefined handles that use the structures for communicators, groups, ops, datatypes, error handlers, win, file, and info. An example of the changes for the communicator predefined handles can be found in the hg tmp workspace at ssh://www.open-mpi.org/~tdd/hg/predefcompat.

Note: the one downside that Jeff and I could think of is that this potentially adds one level of indirection. I believe that will be a small overhead, and if you use one of the predefined handles repetitively (like in a loop), the address will probably be stored in a register once, so no additional overhead should be seen due to this change.
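[Editor's note: a minimal sketch of what the RFC means, assuming hypothetical symbol names -- the actual mpi.h.in diff lives in the hg workspace cited above:]

    /* Before: the predefined handle is the address of an extern structure,
     * so the structure's identity (and, via the header, its layout) is
     * baked into the application at compile/link time. */
    extern struct ompi_communicator_t ompi_mpi_comm_world;
    #define MPI_COMM_WORLD (&ompi_mpi_comm_world)

    /* After: the predefined handle is an extern pointer. The application
     * only ever sees an opaque pointer, so the structure behind it can
     * change between library versions without forcing a recompile. */
    extern struct ompi_communicator_t *ompi_mpi_comm_world_ptr;
    #define MPI_COMM_WORLD ompi_mpi_comm_world_ptr

The indirection cost discussed in the thread is the extra load of the pointer variable; in a loop, the compiler can typically hoist that load into a register.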
Re: [OMPI devel] RFC: make predefined handles extern to pointers
w00t. (Translation: I believe that this is a Good Thing and should be put back to the trunk when ready.)

On Jan 16, 2009, at 7:19 AM, Terry Dontje wrote:

Just wanted to give an update. [...]

-- Jeff Squyres
Cisco Systems
Re: [OMPI devel] Open MPI v1.3rc7 has been posted
All looks good from my overnight runs with 1.3rc7. Thumbs up for release.

On Jan 15, 2009, at 10:52 PM, Jeff Squyres wrote:

I did a large MTT run (about 7k tests) with the openib patch on rc6 and all came out good. I'll do the same run on rc7 -- the results should be identical. Will post results tomorrow morning.

On Jan 15, 2009, at 5:24 PM, Tim Mattox wrote:

Hi All, the seventh release candidate of Open MPI v1.3 is now available: http://www.open-mpi.org/software/ompi/v1.3/

Please run it through its paces as best you can. Anticipated release of 1.3 is Friday... of what month or year, I don't know... This only differs from rc6 by an openib change -- see ticket #1753: https://svn.open-mpi.org/trac/ompi/ticket/1753

-- Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/

-- Jeff Squyres
Cisco Systems
Re: [OMPI devel] This is why we test
We fixed the openib segv, but I forgot to follow up about the timeouts that I mentioned in my original mail. The timeouts were from poorly-configured spawn tests. That is, I had 8 cores in the job and ran the spawn test on all 8 cores (all aggressively polling). The spawn test then spawned N more MPI processes, each of which also [attempts to] poll heavily. This causes obvious thrashage, and the test doesn't complete before the timeout. This is obviously a poorly configured test on my part and not a real problem (I confirmed by re-running the tests with <8 original MPI procs). So as I mentioned in my prior mail, thumbs up for the v1.3 release from my perspective.

On Jan 15, 2009, at 9:05 AM, Jeff Squyres wrote:

Unfortunately, I have to throw the flag on the v1.3 release. :-( I ran ~16k tests via MTT yesterday on the rc5 and rc6 tarballs. I found the following:

Found test runs: 15962
Passed:   15785 (98.89%)
Failed:      83 (0.52%) --> Openib failures: 80 (0.50%)
Skipped:     46 (0.29%)
Timedout:    48 (0.30%)

The 80 openib failures are all seemingly random segvs. I repeated a much smaller run this morning (about 700 runs) and still found a non-zero percentage of failures of the same flavor. The timeouts are a little worrisome as well. This unfortunately requires investigation. :-(

-- Jeff Squyres
Cisco Systems
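[Editor's note: a minimal sketch of the kind of spawn test described, under assumed names -- this is not the actual MTT test:]

    /* spawn_parent.c -- illustrative MPI_Comm_spawn test. Launched on all
     * 8 cores of a node (mpirun -np 8 ./spawn_parent), it collectively
     * spawns 8 more processes, so 16 busy-polling MPI procs now share
     * 8 cores: the thrashing scenario described above. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm child;
        MPI_Init(&argc, &argv);
        /* Collective over MPI_COMM_WORLD: spawns 8 additional procs.
         * "./spawn_child" (hypothetical) would call MPI_Comm_get_parent
         * and the matching MPI_Comm_disconnect on its side. */
        MPI_Comm_spawn("./spawn_child", MPI_ARGV_NULL, 8, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
        MPI_Finalize();
        return 0;
    }

With every parent and child spinning in an aggressive progress loop, no process makes timely progress and the test blows its timeout, which is why re-running with fewer than 8 original procs passes.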
Re: [OMPI devel] -display-map
When I try to build the trunk, it fails with:

i_f77.lax/libmpi_f77_pmpi.a/pwin_unlock_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwin_wait_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtick_f.o .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/pwtime_f.o ../../../ompi/.libs/libmpi.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-rte.0.0.0.dylib /usr/local/openmpi-1.4-devel/lib/libopen-pal.0.0.0.dylib -install_name /usr/local/openmpi-1.4-devel/lib/libmpi_f77.0.dylib -compatibility_version 1 -current_version 1.0
ld: duplicate symbol _mpi_reduce_local_f in .libs/libmpi_f77.lax/libmpi_f77_pmpi.a/preduce_local_f.o and .libs/reduce_local_f.o
collect2: ld returned 1 exit status
make[3]: *** [libmpi_f77.la] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

I'm using the default configure command (./configure --prefix=xxx) on Mac OS X 10.5. This works fine on the 1.3 branch.

Greg

On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:

Okay, it is in the trunk as of r20284 - I'll file the request to have it moved to 1.3.1. Let me know if you get a chance to test the stdout/err stuff in the trunk - we should try and iterate it so any changes can make 1.3.1 as well. Thanks! Ralph [...]
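[Editor's note: the failure above is the generic "two objects in one link define the same global" error -- here, per the log, both preduce_local_f.o (the profiling layer) and reduce_local_f.o define _mpi_reduce_local_f. A toy reproduction, with hypothetical files unrelated to the OMPI build:]

    /* a.c -- name chosen to echo the error above */
    int mpi_reduce_local_f(void) { return 1; }
    int main(void) { return mpi_reduce_local_f(); }

    /* b.c -- second definition of the same global symbol */
    int mpi_reduce_local_f(void) { return 2; }

    /* $ cc a.c b.c
     * ld: duplicate symbol _mpi_reduce_local_f in a.o and b.o */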
Re: [OMPI devel] -display-map
Er... whoops. This looks like my mistake (I just recently added MPI_REDUCE_LOCAL to the trunk -- not v1.3). I could have sworn that I tested this on a Mac, multiple times. I'll test again...

On Jan 16, 2009, at 12:58 PM, Greg Watson wrote:

When I try to build the trunk, it fails with: [...]

-- Jeff Squyres
Cisco Systems
Re: [OMPI devel] -display-map
FYI, if I configure with --with-platform=contrib/platform/lanl/macosx-dynamic, the build succeeds.

Greg

On Jan 16, 2009, at 1:08 PM, Jeff Squyres wrote:

Er... whoops. This looks like my mistake (I just recently added MPI_REDUCE_LOCAL to the trunk -- not v1.3). I could have sworn that I tested this on a Mac, multiple times. I'll test again... [...]
Re: [OMPI devel] -display-map
Fixed in r20288. Thanks for the catch.

On Jan 16, 2009, at 2:04 PM, Greg Watson wrote:

FYI, if I configure with --with-platform=contrib/platform/lanl/macosx-dynamic, the build succeeds. [...]
Re: [OMPI devel] -display-map
Ralph,

Is there something I need to do to enable stdout/err encapsulation (apart from -xml)? Here's what I see:

$ mpirun -mca orte_show_resolved_nodenames 1 -xml -display-map -np 5 /Users/greg/Documents/workspace1/testMPI/Debug/testMPI
n = 0
n = 0
n = 0
n = 0
n = 0

On Jan 15, 2009, at 1:13 PM, Ralph Castain wrote:

Okay, it is in the trunk as of r20284 - I'll file the request to have it moved to 1.3.1. Let me know if you get a chance to test the stdout/err stuff in the trunk - we should try and iterate it so any changes can make 1.3.1 as well. Thanks! Ralph

On Jan 15, 2009, at 11:03 AM, Greg Watson wrote:

Ralph, I think the second form would be ideal and would simplify things greatly. Greg

On Jan 15, 2009, at 10:53 AM, Ralph Castain wrote:

Here is what I was able to do - note that the resolve messages are associated with the specific hostname, not the overall map: [XML example not preserved in the archive] Will that work for you? If you like, I can remove the name= field from the noderesolve element, since the info is specific to the host element that contains it. In other words, I can make it look like this: [XML example not preserved in the archive] if that would help. Ralph

On Jan 14, 2009, at 7:57 AM, Ralph Castain wrote:

We -may- be able to do a more formal XML output at some point. The problem will be the natural interleaving of stdout/err from the various procs due to the async behavior of MPI. Mpirun receives fragmented output in the forwarding system, limited by the buffer sizes and the amount of data we can read at any one "bite" from the pipes connecting us to the procs. So even though the user -thinks- they output a single large line of stuff, it may show up at mpirun as a series of fragments. Hence, it gets tricky to know how to put appropriate XML brackets around it.

Given this input about when you actually want resolved name info, I can at least do something about that area. Won't be in 1.3.0, but should make 1.3.1.

As for XML-tagged stdout/err: the OMPI community asked me not to turn that feature "on" for 1.3.0, as they felt it hasn't been adequately tested yet. The code is present, but cannot be activated in 1.3.0. However, I believe it is activated on the trunk when you do --xml --tagged-output, so perhaps some testing will help us debug and validate it adequately for 1.3.1?

Thanks
Ralph

On Jan 14, 2009, at 7:02 AM, Greg Watson wrote:

Ralph, the only time we use the resolved names is when we get a map, so we consider them part of the map output. If quasi-XML is all that will ever be possible with 1.3, then you may as well leave it as-is and we will attempt to clean it up in Eclipse. It would be nice if a future version of ompi could output correct XML (including stdout), as this would vastly simplify the parsing we need to do. Regards, Greg

On Jan 13, 2009, at 3:30 PM, Ralph Castain wrote:

Hmmm... well, I can't do either for 1.3.0, as it is departing this afternoon.

The first option would be very hard to do. I would have to expose the display-map option across the code base and check it prior to printing anything about resolving node names. I guess I should ask: do you only want noderesolve statements when we are displaying the map? Right now, I will output them regardless.

The second option could be done. I could check if any "display" option has been specified, and output the root at that time (likewise for the end). Anything we output in-between would be encapsulated between the two, but that would include any user output to stdout and/or stderr - which for 1.3.0 is not in xml.

Any thoughts?
Ralph

PS. Guess I should clarify that I was not striving for true XML interaction here, but rather a quasi-XML format that would help you to filter the output. I have no problem trying to get to something more formally correct, but it could be tricky in some places to achieve due to the inherent async nature of the beast.

On Jan 13, 2009, at 12:17 PM, Greg Watson wrote:

Ralph, the XML is looking better now, but there is still one problem. To be valid, there needs to be only one root element, but currently you don't have any (or many). So rather than: [XML example not preserved in the archive] the XML should be: [XML example not preserved in the archive] or: [XML example not preserved in the archive]
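[Editor's note: the XML snippets in this exchange were stripped by the list archive. As a purely illustrative reconstruction of Greg's point -- element and attribute names here are guesses drawn from the thread, not OMPI's actual -display-map output -- the complaint is about output with several top-level elements, like:]

    <map>
      <host name="node0">
        <noderesolve resolved="node0.example.com"/>
        <process rank="0"/>
      </host>
    </map>
    <stdout rank="0">n = 0</stdout>
    <stdout rank="1">n = 0</stdout>

[whereas a well-formed XML document wraps everything in a single root, so a parser such as Eclipse's can consume it directly:]

    <mpirun>
      <map>
        <host name="node0">
          <noderesolve resolved="node0.example.com"/>
          <process rank="0"/>
        </host>
      </map>
      <stdout rank="0">n = 0</stdout>
      <stdout rank="1">n = 0</stdout>
    </mpirun>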