> On Nov 6, 2014, at 1:39 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>
> On Thu, Nov 06, 2014 at 04:29:44PM -0500, Joshua Ladd wrote:
>> On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote:
>>
>> On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote:
>>> Nathan,
>>> Has this bug always been present in OpenIB or is this a recent
>>> addition? If this is a regression, I would also be inclined to say
>>> that this is a
>>
>> The bug is as old as the message coalescing feature in the openib
>> btl. When the feature was added the openib btl no longer supported
>> calling btl_free on descriptors allocated by sendi (a serious bug).
>>
>>> blocker for 1.8.4. This is a SIGNIFICANT bug. Both Howard and I were
>>> quite surprised that all the while this code has been in use at LANL
>>> in production systems, this issue was never discovered.
>>
>> Don't know why it suddenly came up but in 1.8.1 we added an inline send
>> optimization to the MPI_Isend path. The optimization uses the btl_sendi
>> function to get the fragment on the wire without allocating a send
>> request. If this fails the btl fragment returned by sendi is released
>> with btl_free, a send request is allocated, and the normal send path is
>> used. The optimization was tested with the openib btl so I don't know
>> why this wasn't caught earlier. My guess is some other change may have
>> triggered it.
>>
>> We fixed the bug in 1.8.4 by totally disabling message coalescing. The
>> feature is meant to game the osu_mbw_mr test and does next to nothing
>> for real apps. Additionally, the inline send optimization makes the
>> feature less of a win with osu_mbw_mr anyway. We beat mvapich handily on
>> LANL systems without message coalescing.
>>
>> [josh] Can you point to the PR, Nathan? I didn't realize this was already
>> addressed in the 1.8.4 prerelease. I would seek Howard's guidance as to
>> whether this is an acceptable solution for LANL. Regardless of your
>> opinion about the utility of MC, real decisions are made on the basis of
>> those benchmarks, so I'm not entirely convinced of your argument
>> here. OMPI, as we are all aware, tends to be a target on the basis of
>> these comparisons.
>
> This was already discussed here. On LANL systems the message rates are
> the same with and without the message coalescing "feature" so we are
> turning it off and disabling it for 1.8.4. As for the PR, it looks like
> Ralph has not merged it into 1.8.4 yet.
Correct - it is still pending review/comment. You’ll find it here:

https://github.com/open-mpi/ompi-release/pull/79

>
> -Nathan
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/11/16253.php
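
For anyone following along, the fallback Nathan describes in the quoted message (try btl_sendi; if that fails, release the returned fragment with btl_free, allocate a send request, and use the normal send path) can be sketched roughly as below. This is a self-contained mock in C, not Open MPI code: `frag_t`, `mock_sendi`, `mock_free`, `mock_send_with_request`, `isend_sketch`, and the two counters are all hypothetical names made up for illustration, and the real BTL signatures differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a BTL fragment/descriptor; not the real type. */
typedef struct frag { int payload; } frag_t;

int live_frags = 0;     /* fragments handed out and not yet freed */
int send_requests = 0;  /* full send requests allocated so far */

/* Mock inline send (assumed behavior). Returns 0 if the data went out
 * immediately. Otherwise returns -1 and may hand back a fragment that
 * the caller now owns. */
int mock_sendi(int can_send_now, frag_t **out)
{
    *out = NULL;
    if (can_send_now) {
        return 0;
    }
    *out = malloc(sizeof **out);
    if (*out != NULL) {
        (*out)->payload = 0;
        live_frags++;
    }
    return -1;
}

/* Mock of releasing a fragment that the inline send handed back. */
void mock_free(frag_t *frag)
{
    if (frag != NULL) {
        free(frag);
        live_frags--;
    }
}

/* Mock of the normal path: allocate a send request and send through it. */
int mock_send_with_request(void)
{
    send_requests++;
    return 0;
}

/* Returns 1 if the inline path was used, 0 if we fell back. */
int isend_sketch(int can_send_now)
{
    frag_t *frag = NULL;

    if (mock_sendi(can_send_now, &frag) == 0) {
        return 1;               /* on the wire, no send request needed */
    }

    mock_free(frag);            /* release whatever sendi handed back */
    assert(live_frags == 0);    /* the fallback must not leak a fragment */
    mock_send_with_request();   /* normal send path */
    return 0;
}
```

In this sketch the step that matters for the thread is the `mock_free` call: per Nathan's description, the openib btl stopped supporting btl_free on descriptors allocated by sendi once message coalescing was added, so the fallback path is where the bug would surface.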