
MVAPICH implements multicore-optimized collectives, which perform substantially 
better than the default algorithms.
FYI, the ORNL team is working on a new high-performance collectives framework for OMPI. 
The framework provides a significant boost in collectives performance.


Pavel (Pasha) Shamis
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:

I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters, which is why I 
didn't use --bind-to-core. I checked, and using --bind-to-core improved 
the result compared to 1.5.4:
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000        84.96        85.08        85.02

So I guess with 1.5.5 the processes move from core to core within a node even 
though I use all cores, right? Then why does 1.5.4 behave differently?
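
One way to check whether ranks are actually pinned, rather than inferring it from timings, is to have each process print the cores it is allowed to run on. The following is only a minimal sketch, assuming a Linux node where the kernel exposes `Cpus_allowed_list` in `/proc/self/status`; the helper name `allowed_cpus` is made up for illustration.

```shell
#!/bin/sh
# Print the cores the calling process is allowed to run on (Linux only).
# Reads the Cpus_allowed_list field that the kernel exposes in /proc;
# the awk child inherits the affinity mask of the shell that starts it.
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status
}

# One line per process: host name and the allowed core list.
echo "$(hostname): allowed cores = $(allowed_cpus)"
```

If this is saved as a script and launched under the same mpirun line in place of IMB-MPI1, a single core per rank would suggest the rank is bound, while a range covering the whole node would mean the scheduler is free to migrate it.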

I need --bind-to-core in some cases, and that's why I need 1.5.5rc3 instead of 
the more stable 1.5.4. I know that I can use numactl explicitly, but --bind-to-core 
is more convenient :)

2012/3/23 Ralph Castain wrote:
I don't see where you told OMPI to --bind-to-core. We don't automatically bind, 
so you have to explicitly tell us to do so.

On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:

> Hello
> I'm doing some testing with IMB and discovered a strange thing:
> Since I have a system with the new AMD Opteron 6276 processors, I'm using 1.5.5rc3 
> because it supports binding to cores.
> But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>           598     15159.56     15211.05     15184.70
>  (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile 
> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca 
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256 
> openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 
> barrier)
> And with openmpi 1.5.4 the result is much better:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>          1000       113.23       113.33       113.28
> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile 
> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca 
> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256 
> openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 
> barrier)
> and still I couldn't come close to the result I got with mvapich:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>          1000        17.51        17.53        17.53
> (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1 -hostfile 
> hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2 -off_cache 16,64 
> -msglog 1:16 -npmin 256 barrier)
> I dunno if this is a bug or me doing something wrong. So is 
> there a way to improve my results?
> Best regards,
> Pavel Mezentsev
> _______________________________________________
> devel mailing list
