Please do a "ompi_info --param btl sm" on your environment. The lazy_free 
direct the internals of the SM BTL not to release the memory fragments used to 
communicate until the lazy limit is reached. The default value was deemed as 
reasonable a while back when the number of default fragments was large. Lately 
there were some patches to reduce the memory footprint of the SM BTL and these 
might have lowered the available fragments to a limit where the default value 
for the lazy_free is now too large.
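
For reference, a minimal sketch of how to inspect and override that parameter from 
the command line (the process count, the restriction to the sm and self BTLs, and 
the application name are placeholders, not a prescription):

  # list the SM BTL parameters, including btl_sm_fifo_lazy_free
  shell$ ompi_info --param btl sm | grep lazy_free

  # try a smaller lazy-free value for a 2-process shared-memory run
  # ('./your_benchmark' is a placeholder for the actual benchmark binary)
  shell$ mpirun -np 2 --mca btl sm,self --mca btl_sm_fifo_lazy_free 1 ./your_benchmark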

  george.

On Mar 2, 2012, at 10:08 , Matthias Jurenz wrote:

> Thanks to the OTPO tool, I figured out that setting the MCA parameter 
> btl_sm_fifo_lazy_free to 1 (default is 120) improves the latency 
> significantly: 
> 0.88µs
> 
> But somehow I get the feeling that this doesn't eliminate the actual 
> problem...
> 
> Matthias
> 
> On Friday 02 March 2012 15:37:03 Matthias Jurenz wrote:
>> On Friday 02 March 2012 14:58:45 Jeffrey Squyres wrote:
>>> Ok.  Good that there's no oversubscription bug, at least.  :-)
>>> 
>>> Did you see my off-list mail to you yesterday about building with an
>>> external copy of hwloc 1.4 to see if that helps?
>> 
>> Yes, I did - I answered as well. Our mail server seems to be somewhat busy
>> today...
>> 
>> Just for the record: Using hwloc-1.4 makes no difference.
>> 
>> Matthias
>> 
>>> On Mar 2, 2012, at 8:26 AM, Matthias Jurenz wrote:
>>>> To exclude a possible bug within the LSF component, I rebuilt Open MPI
>>>> without support for LSF (--without-lsf).
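>>>>
>>>> For reference, roughly how such a rebuild looks (the install prefix is a
>>>> placeholder, and any further configure options are omitted):
>>>>
>>>>   # prefix below is a placeholder; add your usual configure options
>>>>   shell$ ./configure --without-lsf --prefix=$HOME/openmpi-1.5.5-nolsf
>>>>   shell$ make all install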
>>>> 
>>>> -> It makes no difference - the latency is still bad: ~1.1us.
>>>> 
>>>> Matthias
>>>> 
>>>> On Friday 02 March 2012 13:50:13 Matthias Jurenz wrote:
>>>>> SORRY, it was obviously a big mistake on my part. :-(
>>>>> 
>>>>> Open MPI 1.5.5 was built with LSF support, so when starting an LSF job
>>>>> it's necessary to request at least as many tasks/cores as are used
>>>>> for the subsequent mpirun command. That was not the case - I forgot
>>>>> bsub's '-n' option to specify the number of tasks, so only *one*
>>>>> task/core was requested.
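>>>>>
>>>>> For illustration, the difference in the submission (the benchmark name is
>>>>> only a placeholder):
>>>>>
>>>>>   # what happened: no '-n', so LSF granted a single slot
>>>>>   shell$ bsub mpirun -np 2 ./latency_bench
>>>>>   # what it should have been: request at least as many slots as MPI ranks
>>>>>   shell$ bsub -n 2 mpirun -np 2 ./latency_bench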
>>>>> 
>>>>> Open MPI 1.4.5 was built *without* LSF support, so the supposed
>>>>> misbehavior could not happen with it.
>>>>> 
>>>>> In short, there is no bug in Open MPI 1.5.x regarding the detection
>>>>> of oversubscription. Sorry for any confusion!
>>>>> 
>>>>> Matthias
>>>>> 
>>>>> On Tuesday 28 February 2012 13:36:56 Matthias Jurenz wrote:
>>>>>> When using Open MPI v1.4.5 I get ~1.1us. That's the same result as I
>>>>>> get with Open MPI v1.5.x using mpi_yield_when_idle=0.
>>>>>> So I think there is a bug in Open MPI (v1.5.4 and v1.5.5rc2)
>>>>>> regarding the automatic performance mode selection.
>>>>>> 
>>>>>> When enabling the degraded performance mode for Open MPI 1.4.5
>>>>>> (mpi_yield_when_idle=1) I get ~1.8us latencies.
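>>>>>>
>>>>>> For reference, a sketch of the two cases (the benchmark name is only a
>>>>>> placeholder; the latencies are the figures quoted above):
>>>>>>
>>>>>>   # mpi_yield_when_idle=0: ~1.1us as measured above
>>>>>>   shell$ mpirun -np 2 --mca mpi_yield_when_idle 0 ./latency_bench
>>>>>>   # degraded performance mode (=1): ~1.8us as measured above
>>>>>>   shell$ mpirun -np 2 --mca mpi_yield_when_idle 1 ./latency_bench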
>>>>>> 
>>>>>> Matthias
>>>>>> 
>>>>>> On Tuesday 28 February 2012 06:20:28 Christopher Samuel wrote:
>>>>>>> On 13/02/12 22:11, Matthias Jurenz wrote:
>>>>>>>> Do you have any idea? Please help!
>>>>>>> 
>>>>>>> Do you see the same bad latency in the old branch (1.4.5) ?
>>>>>>> 
>>>>>>> cheers,
>>>>>>> Chris