[OMPI devel] openib currently broken
The freelist changes from yesterday appear to have broken the openib BTL. We didn't get lots of test failures in MTT last night only because there was a separate (unrelated) typo in the ofud BTL that prevented the nightly tarball from building on any IB-capable machines. :-)

Rich hopes to look into fixing the openib BTL problem today; he thinks it's a case of a simple oversight: the openib BTL is not using the new freelist init functions.

Rich: are there other places that are not using the new init functions that need to?

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] openib currently broken
On 11/2/07 12:21 PM, "Jeff Squyres" wrote:

> Rich hopes to look into fixing the openib BTL problem today; he
> thinks it's a case of a simple oversight: the openib BTL is not using
> the new freelist init functions.
>
> Rich: are there other places that are not using the new init
> functions that need to?

The ompi free list has two init functions, and I changed just one. The IB btl uses the one I have not yet changed, but the pml uses the one I did change.

Rich

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] openib currently broken
R16641 should have fixed the regression. Anyone using ompi_free_list_t_ex() and providing a memory allocator would have been bitten by this, since I did not update this function (which will be deprecated in favor of a version parallel to ompi_free_list_t_new) to initialize the newly defined fields. From looking through the btls, this seems to affect only the openib btl.

Rich

On 11/2/07 12:31 PM, "Richard Graham" wrote:

> The ompi free list has two init functions, I changed just one. The IB
> btl uses the one I have not yet changed, but the pml uses the one I
> did change.
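For readers following the thread, here is a minimal sketch of the kind of oversight described in this message: a structure gains new fields, and only one of its two init entry points is updated to set them. All names below (demo_free_list_t, demo_free_list_init_new, demo_free_list_init_ex, demo_check, and the two "new" fields) are hypothetical and chosen for illustration; this is not Open MPI's free list API and not the change made in r16641. It shows one way to avoid the problem, assuming the older entry point can simply forward to the newer one with default values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a free list; NOT the real ompi_free_list_t. */
typedef struct {
    size_t elem_size;      /* original field */
    size_t num_allocated;  /* original field */
    size_t payload_size;   /* "new" field added by a later change (assumed) */
    size_t payload_align;  /* "new" field added by a later change (assumed) */
} demo_free_list_t;

/* Newer entry point: sets every field, including the new ones. */
int demo_free_list_init_new(demo_free_list_t *fl, size_t elem_size,
                            size_t payload_size, size_t payload_align)
{
    if (NULL == fl || 0 == elem_size) {
        return -1;
    }
    fl->elem_size = elem_size;
    fl->num_allocated = 0;
    fl->payload_size = payload_size;
    fl->payload_align = payload_align;
    return 0;
}

/* Older entry point, kept for existing callers. It forwards to the newer
 * one with defaults, so it cannot leave the new fields uninitialized. */
int demo_free_list_init_ex(demo_free_list_t *fl, size_t elem_size)
{
    return demo_free_list_init_new(fl, elem_size, 0, 0);
}

/* Small self-check: fill the struct with garbage, initialize it through
 * the older entry point, and confirm the new fields hold defined values. */
int demo_check(void)
{
    demo_free_list_t fl;
    memset(&fl, 0xff, sizeof fl);
    if (0 != demo_free_list_init_ex(&fl, 64)) {
        return -1;
    }
    assert(0 == fl.payload_size);
    assert(0 == fl.payload_align);
    return 0;
}
```

Calling demo_check() from a main() exercises the older entry point. If demo_free_list_init_ex() instead set only the two original fields, payload_size and payload_align would keep whatever garbage was in memory, which is the general class of bug discussed here.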
Re: [OMPI devel] FreeBSD Support?
Hi everyone.

So I noticed, after doing a build of the latest ompi-trunk (rev. 16641) on a FreeBSD 6.2 machine, that the autogen.sh script is still dying due to a configure script permissions issue. I'm pasting a diff that solves the problem on my machine. Are there any foreseeable problems with committing this change to ompi-trunk?

---
Index: autogen.sh
===
--- autogen.sh (revision 16641)
+++ autogen.sh (working copy)
@@ -435,6 +435,7 @@
 pushd opal/libltdl > /dev/null 2>&1
 run_and_check $ompi_aclocal
 run_and_check $ompi_automake
+chmod u+w configure # Need this for FreeBSD.
 run_and_check $ompi_autoconf
 popd > /dev/null 2>&1
 unset indent
---

Thanks.

--
Karol Mroz
km...@cs.ubc.ca
Re: [OMPI devel] openib currently broken
Rich -

I'm not 100% sure it's fixed. I'm still seeing "out of memory" errors when running about 40-proc IMB over openib. But I ran out of time to investigate deeply...

Could you try running a nontrivial IMB to check?

-jms

-----Original Message-----
From: Richard Graham [mailto:rlgra...@ornl.gov]
Sent: Friday, November 02, 2007 02:07 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken

> R16641 should have fixed the regression.
Re: [OMPI devel] openib currently broken
Rich,

Do the ompi_free_list changes impact the sm btl? The Solaris SPARC sm btl seems to have had an issue starting with last night's putback, but I have not looked into it yet.

-DON

Richard Graham wrote:

> R16641 should have fixed the regression. Anyone using
> ompi_free_list_t_ex() and providing a memory allocator would have
> been bitten by this [...] From looking through the btls, this seems
> to be only the openib btl.
Re: [OMPI devel] openib currently broken
It does. I was able to run the point-to-point Intel tests with 3 procs using the sm and self BTLs, with both the ob1 and dr PMLs.

Rich

On 11/2/07 3:57 PM, "Don Kerr" wrote:

> Rich,
>
> Do the ompi_free_list changes impact the sm btl? Solaris SPARC sm btl
> seems to have an issue starting with last nights put back but I have not
> looked into it yet.
>
> -DON
Re: [OMPI devel] openib currently broken
Jeff,

I ran IMB on 60 procs with the openib and self btls, and all ran fine. The tests that were run were ping-pong, ping-ping, SendRecv, Exchange, Allreduce, Reduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Bcast, and Barrier. I also ran on 40 procs, and did several smaller runs.

If you can reproduce the problem and provide more details (I realize you ran out of time), I can take another look. I would expect a bug in the changes to cause one to walk over memory rather than to change the memory usage, but who knows.

I will be offline until late Sunday...

Rich

On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)" wrote:

> Rich -
>
> I'm not 100% sure it's fixed - I'm still seeing "out of memory" errors when
> running about 40-proc IMB over openib. But I ran out of time to investigate
> deeply...
Re: [OMPI devel] openib currently broken
Did you run with a higher number of procs?

-jms

-----Original Message-----
From: Richard Graham [mailto:rlgra...@ornl.gov]
Sent: Friday, November 02, 2007 04:05 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken

> It does. I was able to run the point-to-point intel tests with 3 procs
> using sm and self, ob1 and dr.
Re: [OMPI devel] openib currently broken
Ok. I'll dig a bit over the weekend. Thanks!

-jms

-----Original Message-----
From: Richard Graham [mailto:rlgra...@ornl.gov]
Sent: Friday, November 02, 2007 05:50 PM Eastern Standard Time
To: Open MPI Developers
Subject: Re: [OMPI devel] openib currently broken

> I ran IMB on 60 procs with the openib and self btls, and all ran fine.
> If you can reproduce and provide more details (I realize you ran out
> of time), I can take another look.