It's SUSE Linux Enterprise Server 11 Service Pack 1 with kernel version
2.6.32.49-0.3-default.
Matthias
On Friday 09 March 2012 16:36:41 you wrote:
> What OS are you using?
>
> Joshua
>
> -----Original Message-----
> From: Matthias Jurenz [mailto:matthias.jur...@tu-dresden.de]
> Sent: Friday, March 09, 2012 08:50 AM
> To: Open MPI Developers
> Cc: Mora, Joshua
> Subject: Re: [OMPI devel] poor btl sm latency
>
> I just made an interesting observation:
>
> When binding the two processes to neighboring cores (which share an L2
> cache), NetPIPE *sometimes* shows pretty good results: ~0.5us
>
> $ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4
> -n 10 -p 0 : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0
> using object #0 depth 6 below cpuset 0x,0x
> using object #1 depth 6 below cpuset 0x,0x
> adding 0x0001 to 0x0
> adding 0x0001 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0001
> adding 0x0002 to 0x0
> adding 0x0002 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0002
> Using no perturbations
>
> 0: n035
> Using no perturbations
>
> 1: n035
> Now starting the main loop
> 0: 1 bytes 10 times --> 6.01 Mbps in 1.27 usec
> 1: 2 bytes 10 times --> 12.04 Mbps in 1.27 usec
> 2: 3 bytes 10 times --> 18.07 Mbps in 1.27 usec
> 3: 4 bytes 10 times --> 24.13 Mbps in 1.26 usec
>
> $ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4
> -n 10 -p 0 : -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0
> using object #0 depth 6 below cpuset 0x,0x
> adding 0x0001 to 0x0
> adding 0x0001 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0001
> using object #1 depth 6 below cpuset 0x,0x
> adding 0x0002 to 0x0
> adding 0x0002 to 0x0
> assuming the command starts at ./NPmpi_ompi1.5.5
> binding on cpu set 0x0002
> Using no perturbations
>
> 0: n035
> Using no perturbations
>
> 1: n035
> Now starting the main loop
> 0: 1 bytes 10 times --> 12.96 Mbps in 0.59 usec
> 1: 2 bytes 10 times --> 25.78 Mbps in 0.59 usec
> 2: 3 bytes 10 times --> 38.62 Mbps in 0.59 usec
> 3: 4 bytes 10 times --> 52.88 Mbps in 0.58 usec
>
> I can reproduce that approximately every tenth run.
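>
> For comparison, the same neighbor binding could presumably also be
> requested via mpirun itself instead of wrapping each rank in hwloc-bind.
> A sketch, assuming the default by-slot mapping places rank 0 on core 0
> and rank 1 on core 1:
>
> $ mpirun -mca btl sm,self -bind-to-core -np 2 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0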
>
> When binding the processes to cores with exclusive L2 caches (e.g. cores 0
> and 2), I get constant latencies of ~1.1us.
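>
> To double-check which cores actually share an L2, hwloc's lstopo can
> print the cache tree (a sketch; the exact layout depends on the
> machine):
>
> $ lstopo --of console
>
> On this layout one would expect Core L#0 and Core L#1 to appear under
> the same L2 object, with Core L#2 under a different one.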
>
> Matthias
>
> On Monday 05 March 2012 09:52:39 Matthias Jurenz wrote:
> > Here are the SM BTL parameters:
> >
> > $ ompi_info --param btl sm
> > MCA btl: parameter "btl_base_verbose" (current value: <0>, data source:
> > default value) Verbosity level of the BTL framework
> > MCA btl: parameter "btl" (current value: , data source:
> > file
> > [/sw/atlas/libraries/openmpi/1.5.5rc3/x86_64/etc/openmpi-mca-params.conf]
> > ) Default selection set of components for the btl framework ( means
> > use all components that can be found)
> > MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
> > default value) Whether this component supports the knem Linux kernel
> > module or not
> > MCA btl: parameter "btl_sm_use_knem" (current value: <-1>, data source:
> > default value) Whether knem support is desired or not (negative = try to
> > enable knem support, but continue even if it is not available, 0 = do not
> > enable knem support, positive = try to enable knem support and fail if it
> > is not available)
> > MCA btl: parameter "btl_sm_knem_dma_min" (current value: <0>, data
> > source: default value) Minimum message size (in bytes) to use the knem
> > DMA mode; ignored if knem does not support DMA mode (0 = do not use the
> > knem DMA mode)
> > MCA btl: parameter "btl_sm_knem_max_simultaneous"
> > (current value: <0>, data source: default value) Max number of
> > simultaneous ongoing knem operations to support (0 = do everything
> > synchronously, which probably gives the best large message latency; >0
> > means to do all operations asynchronously, which supports better overlap
> > for simultaneous large message sends)
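> >
> > As an aside, these knem knobs can be overridden per run to rule knem
> > in or out as a latency factor. A sketch (0 disables knem, as described
> > above):
> >
> > $ mpirun -mca btl sm,self -mca btl_sm_use_knem 0 -np 2 ./NPmpi_ompi1.5.5
> >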
> > MCA btl: parameter "btl_sm_free_list_num" (current value: <8>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_free_list_max" (current value: <-1>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_free_list_inc" (current value: <64>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_max_procs" (current value: <-1>, data source:
> > default value)
> > MCA btl: parameter "btl_sm_mpool" (current value: , data source:
> > default value)
> > MCA btl: parameter "btl_sm_fifo_size" (current value: <4096>, data
> > source: default value)
> > MCA btl: parameter "btl_sm_num_fifos" (current value: <1>, data source:
> > default value)