> On Nov 6, 2014, at 1:39 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
> 
> On Thu, Nov 06, 2014 at 04:29:44PM -0500, Joshua Ladd wrote:
>>   On Thursday, November 6, 2014, Nathan Hjelm <hje...@lanl.gov> wrote:
>> 
>>     On Thu, Nov 06, 2014 at 04:06:23PM -0500, Joshua Ladd wrote:
>>>   Nathan,
>>>   Has this bug always been present in OpenIB or is this a recent addition?
>>>   If this is a regression, I would also be inclined to say that this is a
>> 
>>     The bug is as old as the message coalescing feature in the openib
>>     btl. When the feature was added the openib btl no longer supported
>>     calling btl_free on descriptors allocated by sendi (a serious bug).
>> 
>>>   blocker for 1.8.4. This is a SIGNIFICANT bug. Both Howard and I were quite
>>>   surprised that all the while this code has been in use at LANL
>>>   in production systems, this issue was never discovered.
>> 
>>     Don't know why it suddenly came up, but in 1.8.1 we added an inline send
>>     optimization to the MPI_Isend path. The optimization uses the btl_sendi
>>     function to get the fragment on the wire without allocating a send
>>     request. If this fails the btl fragment returned by sendi is released
>>     with btl_free, a send request is allocated, and the normal send path is
>>     used. The optimization was tested with the openib btl so I don't know
>>     why this wasn't caught earlier. My guess is some other change may have
>>     triggered it.
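
The fallback described above can be sketched as a toy model. This is only an illustration of the control flow (try sendi, release the returned descriptor on failure, then allocate a request and take the normal send path). The class ToyBtl, the function isend, and the log strings are hypothetical and are not the Open MPI API or its actual signatures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyBtl:
    """Hypothetical stand-in for a BTL module; not the Open MPI API."""
    inline_ok: bool                      # whether the inline attempt succeeds
    log: List[str] = field(default_factory=list)

    def sendi(self, payload: bytes) -> Tuple[bool, Optional[str]]:
        # Try to put the fragment on the wire immediately.
        if self.inline_ok:
            self.log.append("sendi: sent inline")
            return True, None
        # On failure, hand back a descriptor that the caller must release.
        self.log.append("sendi: failed, returned descriptor")
        return False, "descriptor"

    def free(self, descriptor: str) -> None:
        # Release a descriptor that sendi allocated.
        self.log.append("free: released " + descriptor)

    def send(self, payload: bytes) -> None:
        # Regular send path, used once a send request exists.
        self.log.append("send: normal path")


def isend(btl: ToyBtl, payload: bytes) -> str:
    """Attempt the inline send first; fall back to the request-based path."""
    ok, descriptor = btl.sendi(payload)
    if ok:
        return "inline"                  # no send request was needed
    if descriptor is not None:
        btl.free(descriptor)             # release what sendi handed back
    btl.log.append("request: allocated") # stands in for request allocation
    btl.send(payload)
    return "request"
```

In this model the free step is the one that matters for the thread: per the description above, the openib btl with message coalescing enabled did not support that call on descriptors allocated by sendi.
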
>> 
>>     We fixed the bug in 1.8.4 by totally disabling message coalescing. The
>>     feature is meant to game the osu_mbw_mr test and does next to nothing
>>     for real apps. Additionally, the inline send optimization makes the
>>     feature less of a win with osu_mbw_mr anyway. We beat mvapich handily on
>>     LANL systems without message coalescing.
>> 
>>   [josh] Can you point to the PR, Nathan? I didn't realize this was already
>>   addressed in the 1.8.4 prerelease. I would seek Howard's guidance as to
>>   whether this is an acceptable solution for LANL.  Regardless of your
>>   opinion about the utility of MC, real decisions are made on the basis of
>>   those benchmarks, so I'm not entirely convinced of your argument
>>   here.  OMPI, as we are all aware, tends to be a target on the basis of
>>   these comparisons. 
> 
> This was already discussed here. On LANL systems the message rates are
> the same with and without the message coalescing "feature" so we are
> turning it off and disabling it for 1.8.4. As for the PR, it looks like
> Ralph has not merged it into 1.8.4 yet.

Correct - it is still pending review/comment. You’ll find it here:

https://github.com/open-mpi/ompi-release/pull/79 


> 
> -Nathan
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16253.php
