Re: [OMPI devel] poor btl sm latency

2012-03-12 Thread Matthias Jurenz
It's a SUSE Linux Enterprise Server 11 Service Pack 1 with kernel version 
2.6.32.49-0.3-default.

Matthias

On Friday 09 March 2012 16:36:41 you wrote:
> What OS are you using ?
> 
> Joshua
> 
> - Original Message -
> From: Matthias Jurenz [mailto:matthias.jur...@tu-dresden.de]
> Sent: Friday, March 09, 2012 08:50 AM
> To: Open MPI Developers 
> Cc: Mora, Joshua
> Subject: Re: [OMPI devel] poor btl sm latency
> 
> I just made an interesting observation:
> 
> When binding the processes to two neighboring cores (which share an L2
> cache), NetPIPE *sometimes* shows pretty good results: ~0.5us.
> 
> $ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4
> -n 10 -p 0 : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5 -u 4 -n
> 10 -p 0
> using object #0 depth 6 below cpuset 0x,0x
> using object #1 depth 6 below cpuset 0x,0x
> adding 0x0001 to 0x0
> adding 0x0001 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0001
> adding 0x0002 to 0x0
> adding 0x0002 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0002
> Using no perturbations
> 
> 0: n035
> Using no perturbations
> 
> 1: n035
> Now starting the main loop
>   0:   1 bytes 10 times -->  6.01 Mbps in   1.27 usec
>   1:   2 bytes 10 times --> 12.04 Mbps in   1.27 usec
>   2:   3 bytes 10 times --> 18.07 Mbps in   1.27 usec
>   3:   4 bytes 10 times --> 24.13 Mbps in   1.26 usec
> 
> $ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4
> -n 10 -p 0 : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5 -u 4 -n
> 10 -p 0
> using object #0 depth 6 below cpuset 0x,0x
> adding 0x0001 to 0x0
> adding 0x0001 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0001
> using object #1 depth 6 below cpuset 0x,0x
> adding 0x0002 to 0x0
> adding 0x0002 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0002
> Using no perturbations
> 
> 0: n035
> Using no perturbations
> 
> 1: n035
> Now starting the main loop
>   0:   1 bytes 10 times --> 12.96 Mbps in   0.59 usec
>   1:   2 bytes 10 times --> 25.78 Mbps in   0.59 usec
>   2:   3 bytes 10 times --> 38.62 Mbps in   0.59 usec
>   3:   4 bytes 10 times --> 52.88 Mbps in   0.58 usec
> 
> I can reproduce that approximately every tenth run.
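> 
> To catch the fast case it's easiest to just loop the benchmark a dozen
> times or so; a trivial sketch, reusing the exact command from above:
> 
> $ for i in $(seq 12); do
>     mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0 \
>       : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0
>   done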
> 
> When binding the processes to cores with exclusive L2 caches (e.g. cores 0
> and 2), I get constant latencies of ~1.1us.
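> 
> For completeness, the analogous exclusive-L2 run would look something like
> this (a sketch, assuming the same NetPIPE binary and options as above; the
> cache topology, i.e. which cores share an L2, can be confirmed with hwloc's
> lstopo utility):
> 
> $ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4
> -n 10 -p 0 : -np 1 hwloc-bind -v core:2 ./NPmpi_ompi1.5.5 -u 4 -n
> 10 -p 0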
> 
> Matthias
> 
> On Monday 05 March 2012 09:52:39 Matthias Jurenz wrote:
> > Here the SM BTL parameters:
> > 
> > $ ompi_info --param btl sm
> > MCA btl: parameter "btl_base_verbose" (current value: <0>, data source:
> > default value) Verbosity level of the BTL framework
> > MCA btl: parameter "btl" (current value: , data source:
> > file
> > [/sw/atlas/libraries/openmpi/1.5.5rc3/x86_64/etc/openmpi-mca-params.conf]
> > ) Default selection set of components for the btl framework ( means
> > use all components that can be found)
> > MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
> > default value) Whether this component supports the knem Linux kernel
> > module or not
> > MCA btl: parameter "btl_sm_use_knem" (current value: <-1>, data source:
> > default value) Whether knem support is desired or not (negative = try to
> > enable knem support, but continue even if it is not available, 0 = do not
> > enable knem support, positive = try to enable knem support and fail if it
> > is not available)
> > MCA btl: parameter "btl_sm_knem_dma_min" (current value: <0>, data
> > source: default value) Minimum message size (in bytes) to use the knem
> > DMA mode; ignored if knem does not support DMA mode (0 = do not use the
> > knem DMA mode) MCA btl: parameter "btl_sm_knem_max_simultaneous"
> > (current value: <0>, data source: default value) Max number of
> > simultaneous ongoing knem operations to support (0 = do everything
> > synchronously, which probably gives the best large message latency; >0
> > means to do all operations asynchronously, which supports better overlap
> > for simultaneous large message sends)
> > MCA btl: parameter "btl_sm_free_list_num" (current value: <8>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_free_list_max" (current value: <-1>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_free_list_inc" (current value: <64>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_max_procs" (current value: <-1>, data source:
> > default value)
> > MCA btl: parameter "btl_sm_mpool" (current value: , data source:
> > default value)
> > MCA btl: parameter "btl_sm_fifo_size" (current value: <4096>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_num_fifos" (current value: <1>, data source:
> > default val
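> > 
> > Any of these can be overridden at run time; for instance, to rule knem in
> > or out as a latency factor one could force it off. A hypothetical
> > invocation, reusing the NetPIPE binary from above:
> > 
> > $ mpirun -mca btl sm,self -mca btl_sm_use_knem 0 -np 2 ./NPmpi_ompi1.5.5
> > 
> > With the default of -1, knem is used when available but is not required.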

[OMPI devel] help-mpi-btl-openib.txt needs updating with real btl_openib_ib_min_rnr_timer and btl_openib_ib_timeout defaults

2012-03-12 Thread Chris Samuel
Hi all,

We've been trying to track down an IB issue here where a user's
code (Gromacs, run with OMPI 1.4.5) was dying with:

[[18115,1],2][btl_openib_component.c:3224:handle_wc] from bruce030 to: bruce130
error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for
wr_id 7406080 opcode 0 vendor error 129 qp_idx 2

The odd thing I've spotted, though, is that the error message says:

* btl_openib_ib_retry_count - The number of times the sender will attempt to
retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted to 10).
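
As an aside, both of these can be set explicitly at run time, e.g. (a
hypothetical invocation, with ./your_app standing in for the real
application):

  mpirun --mca btl_openib_ib_timeout 20 --mca btl_openib_ib_retry_count 7 ./your_app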

Those don't match the values compiled into OMPI 1.4.5:

ompi_info -a | egrep 'btl_openib_ib_min_rnr_timer|btl_openib_ib_timeout'
 MCA btl: parameter "btl_openib_ib_min_rnr_timer" (current value: "25", data source: default value)
 MCA btl: parameter "btl_openib_ib_timeout" (current value: "20", data source: default value)

It looks like the file:

 ompi/mca/btl/openib/help-mpi-btl-openib.txt

needs to be updated with the correct values.

We're stuck on 1.4 for the foreseeable future (too many apps to
recompile), so I don't know if 1.5+ has the same issue.

It's been there since at least 2009... :-)

http://www.open-mpi.org/community/lists/users/2009/03/8600.php

cheers!
Chris
-- 
   Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/


Re: [OMPI devel] help-mpi-btl-openib.txt needs updating with real btl_openib_ib_min_rnr_timer and btl_openib_ib_timeout defaults

2012-03-12 Thread Chris Samuel
On Tuesday 13 March 2012 10:06:43 Chris Samuel wrote:

> Those don't match the values compiled into OMPI 1.4.5:
> 
> ompi_info -a | egrep 'btl_openib_ib_min_rnr_timer|btl_openib_ib_timeout'
>  MCA btl: parameter "btl_openib_ib_min_rnr_timer" (current value: "25", data source: default value)
>  MCA btl: parameter "btl_openib_ib_timeout" (current value: "20", data source: default value)

Wrong command line, sigh...

# ompi_info -a | egrep 'MCA.*(btl_openib_ib_retry_count|btl_openib_ib_timeout)'
 MCA btl: parameter "btl_openib_ib_timeout" (current value: 
"20", data source: default value)
 MCA btl: parameter "btl_openib_ib_retry_count" (current value: 
"7", data source: default value)

Even more oddly, the second run of the user's job did print
7 and 20 as the defaults.

I suspect the user accidentally used an earlier version of OMPI,
not 1.4.5, for his first run.

Sorry for the noise!

cheers,
Chris
-- 
   Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/