I can report that openmpi-v3.0.x-201803060306-c79e33b.tar.gz doesn’t show the problem.
I also reran all of the OSU benchmarks, and performance was generally in line with my 3.0.0 and 3.0.1rc3 builds.

Any chance of the fix making the 3.0.1 release (or is there a minimal recommended patch I can apply to 3.0.0)?

-Alan

On Fri, Mar 9, 2018 at 12:21 PM Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Specifically, Nathan is referring to:
>
>     https://www.open-mpi.org/nightly/v3.0.x/
>
> > On Mar 9, 2018, at 12:51 PM, Nathan Hjelm <hje...@me.com> wrote:
> >
> > Fixed in master and in the 3.0.x branch. Try the nightly tarball.
> >
> > On Mar 9, 2018, at 10:01 AM, Alan Wild <a...@madllama.net> wrote:
> >
> >> I’ve been running the OSU micro benchmarks
> >> ( http://mvapich.cse.ohio-state.edu/benchmarks/ ) on my various MPI
> >> installations. One test that has been consistently failing is
> >> osu_put_bibw when compiled with either openmpi 3.0.0 or openmpi
> >> 3.0.1rc3, when these builds have also linked in the Mellanox mxm,
> >> hcoll, and SHaRP libraries AND when running this two-rank test across
> >> two nodes communicating with EDR InfiniBand.
> >>
> >> Fortunately, the failure occurs in both optimized and debug builds of
> >> openmpi.
> >>
> >> Stepping into the code with Allinea DDT, I think I found the issue...
> >>
> >> MPI_Win_post is ultimately calling ompi_osc_rdma_post_atomic(), and on
> >> line 245 there’s an if statement that reads:
> >>
> >>     if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
> >>         return OMPI_ERR_OUT_OF_RESOURCE;
> >>     }
> >>
> >> (Sorry, I can’t easily cut and paste the code... my work PC can’t get
> >> to my personal email, so I have to post this from an iPad.)
> >>
> >> Anyway, if you look at the preceding ~16 lines of code, “ret” is never
> >> initialized or assigned to in any way (as far as I can tell). I’m not
> >> completely familiar with all the macros used, but it doesn’t appear
> >> that any of them are assigning to “ret”. I’m surprised this isn’t
> >> causing more chaos.
> >>
> >> If I’m “right”, is the right thing just to initialize ret to
> >> OMPI_SUCCESS, or perhaps should this condition just come out?
> >>
> >> Thoughts?
> >>
> >> -Alan
> >> a...@madllama.net
> >> --
> >> a...@madllama.net http://humbleville.blogspot.com
>
> --
> Jeff Squyres
> jsquy...@cisco.com

--
a...@madllama.net http://humbleville.blogspot.com
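
For readers following the thread, the pattern described in the quoted message (a status variable that is tested before anything assigns it) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Open MPI source and not necessarily the fix that went into the 3.0.x branch: the names SKETCH_SUCCESS, SKETCH_ERR_OUT_OF_RESOURCE, sketch_op_fn, sketch_op_ok, sketch_op_fail, and sketch_post are hypothetical stand-ins, and the numeric values are arbitrary. It shows the first option raised in the original message (initialize the variable); removing the check entirely would be the other.

```c
#include <stddef.h>

/* Hypothetical status codes standing in for OMPI_SUCCESS and
 * OMPI_ERR_OUT_OF_RESOURCE; these are NOT the real Open MPI values. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_OUT_OF_RESOURCE = -2 };

/* Hypothetical stand-in for whatever operation is meant to set the status. */
typedef int (*sketch_op_fn)(void);

static int sketch_op_ok(void)   { return SKETCH_SUCCESS; }
static int sketch_op_fail(void) { return -1; }

/* Problematic shape described in the thread, kept as a comment because
 * reading an uninitialized variable is undefined behavior in C:
 *
 *     int ret;
 *     ... nothing assigns ret ...
 *     if (SKETCH_SUCCESS != ret) return SKETCH_ERR_OUT_OF_RESOURCE;
 */
static int sketch_post(sketch_op_fn op)
{
    int ret = SKETCH_SUCCESS;   /* defined value even if no operation runs */

    if (op != NULL) {
        ret = op();             /* the check below now tests a real result */
    }

    if (SKETCH_SUCCESS != ret) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    return SKETCH_SUCCESS;
}
```

With the initializer in place, sketch_post(NULL) and sketch_post(sketch_op_ok) both return SKETCH_SUCCESS, and only sketch_post(sketch_op_fail) reports an error. Without it, the outcome would depend on whatever happened to be on the stack, which would be consistent with a failure that only shows up in some build and hardware configurations.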
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel