I can report that openmpi-v3.0.x-201803060306-c79e33b.tar.gz doesn’t
show the problem.

I also reran all of the osu benchmarks and performance was generally in line
with my 3.0.0 and 3.0.1rc3 builds.

Any chance of the fix making the 3.0.1 release (or a minimal recommended
patch I can apply to 3.0.0)?

-Alan

On Fri, Mar 9, 2018 at 12:21 PM Jeff Squyres (jsquyres) <jsquy...@cisco.com>
wrote:

> Specifically, Nathan is referring to:
>
>     https://www.open-mpi.org/nightly/v3.0.x/
>
>
> > On Mar 9, 2018, at 12:51 PM, Nathan Hjelm <hje...@me.com> wrote:
> >
> > Fixed in master and in the 3.0.x branch. Try the nightly tarball.
> >
> > On Mar 9, 2018, at 10:01 AM, Alan Wild <a...@madllama.net> wrote:
> >
> >> I’ve been running the OSU micro benchmarks (
> http://mvapich.cse.ohio-state.edu/benchmarks/ ) on my various MPI
> installations.  One test that has been consistently failing is osu_put_bibw
> when compiled with either openmpi 3.0.0 or openmpi 3.0.1rc3 when these
> builds have also linked in the Mellanox mxm, hcoll, and SHaRP libraries AND
> when running this two rank test across two nodes communicating with EDR
> Infiniband.
> >>
> >> Fortunately this failure was true for both optimized and debug builds
> of openmpi.
> >>
> >> Stepping into the code with Allinea DDT I think I found the issue...
> >>
> >> MPI_Win_post is ultimately calling ompi_osc_rdma_post_atomic() and on
> line 245 there’s an if statement that reads:
> >>
> >>         if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) {
> >>                 return OMPI_ERR_OUT_OF_RESOURCE;
> >>         }
> >>
> >> (Sorry can’t easily cut and paste the code... my work PC can’t get to
> my personal email so I have to post this from an iPad).
> >>
> >> Anyway, if you look at the preceding ~16 lines of code... “ret” is
> never initialized or assigned to in any way... (as far as I can tell).  I’m
> not completely familiar with all the macros used, but it doesn’t appear
> that any of them are assigning to “ret”.  Surprised this isn’t causing more
> chaos.
> >>
> >> If I’m “right”.. is the right thing just to initialize ret to
> OMPI_SUCCESS or perhaps should this condition just come out?
> >>
> >> Thoughts?
> >>
> >> -Alan
> >> a...@madllama.net
> >> --
> >> a...@madllama.net http://humbleville.blogspot.com
> >> _______________________________________________
> >> devel mailing list
> >> devel@lists.open-mpi.org
> >> https://lists.open-mpi.org/mailman/listinfo/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>

-- 
a...@madllama.net http://humbleville.blogspot.com
