I’ve been running the OSU micro-benchmarks (http://mvapich.cse.ohio-state.edu/benchmarks/) on my various MPI installations. One test that fails consistently is osu_put_bibw when compiled with either openmpi 3.0.0 or openmpi 3.0.1rc3, when those builds also link in the Mellanox mxm, hcoll, and SHaRP libraries AND this two-rank test runs across two nodes communicating over EDR InfiniBand.
Fortunately, the failure occurs in both optimized and debug builds of openmpi. Stepping into the code with Allinea DDT, I think I found the issue. MPI_Win_post ultimately calls ompi_osc_rdma_post_atomic(), and on line 245 there’s an if statement that reads:

    if (OPAL_UNLIKELY(OMPI_SUCCESS != ret)) { return OMPI_ERR_OUT_OF_RESOURCE; }

(Sorry, I can’t easily cut and paste the code... my work PC can’t get to my personal email, so I have to post this from an iPad.)

Anyway, if you look at the preceding ~16 lines of code, “ret” is never initialized or assigned to in any way (as far as I can tell). I’m not completely familiar with all the macros used, but it doesn’t appear that any of them assign to “ret”. I’m surprised this isn’t causing more chaos.

If I’m “right”, is the right thing just to initialize ret to OMPI_SUCCESS, or should this condition simply come out?

Thoughts?

-Alan
a...@madllama.net

--
a...@madllama.net
http://humbleville.blogspot.com
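P.S. To make the pattern concrete, here is a minimal standalone sketch in C. It is not the Open MPI source: DEMO_SUCCESS, DEMO_ERR_OUT_OF_RESOURCE, maybe_step() and post_fixed() are made-up names, and the idea that some step may or may not assign ret is an assumption for illustration only. The suspect shape is shown in a comment rather than compiled, since reading an uninitialized variable is undefined behavior.

```c
#include <assert.h>

/* Hypothetical status codes, stand-ins for OMPI_SUCCESS and friends. */
#define DEMO_SUCCESS 0
#define DEMO_ERR_OUT_OF_RESOURCE (-2)

/*
 * Suspect shape (comment only, never executed):
 *
 *     int ret;                        // declared, never assigned
 *     ... work that does not touch ret ...
 *     if (DEMO_SUCCESS != ret) {      // reads an indeterminate value
 *         return DEMO_ERR_OUT_OF_RESOURCE;
 *     }
 */

/* Hypothetical step that reports success or failure. */
int maybe_step(int should_fail)
{
    return should_fail ? -1 : DEMO_SUCCESS;
}

/*
 * One possible fix: give ret a defined starting value, so the error
 * check can only fire when a step actually reported a failure.
 */
int post_fixed(int run_step, int should_fail)
{
    int ret = DEMO_SUCCESS;            /* defined even if no step runs */

    if (run_step) {
        ret = maybe_step(should_fail);
    }
    if (DEMO_SUCCESS != ret) {
        return DEMO_ERR_OUT_OF_RESOURCE;
    }
    return DEMO_SUCCESS;
}

/* Exercises the three paths: step skipped, step succeeds, step fails. */
void post_fixed_self_check(void)
{
    assert(post_fixed(0, 0) == DEMO_SUCCESS);
    assert(post_fixed(1, 0) == DEMO_SUCCESS);
    assert(post_fixed(1, 1) == DEMO_ERR_OUT_OF_RESOURCE);
}
```

If nothing on the path ever assigns ret, initializing it makes the check dead code, which is why simply removing the condition may be the equivalent option.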
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel