Re: [OMPI devel] barrier problem

Jeffrey Squyres Tue, 27 Mar 2012 16:24:53 -0400

FWIW:

1. There were definitely some issues with binding to cores and process layouts 
on Opterons that should be fixed in 1.5.5, which was finally released today.


2. It is strange that the performance of barrier is so much different between 
1.5.4 and 1.5.5.  Is there a reason you were choosing different algorithm 
numbers between the two?  (one of your command lines had 
"coll_tuned_barrier_algorithm 1", the other had "coll_tuned_barrier_algorithm 
3").
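
A like-for-like comparison could be scripted roughly as below. This is only a 
sketch: it prints the two command lines instead of running them, it reuses the 
paths and flags from the command lines quoted further down, and the choice of 
algorithm number 3 for both versions is an arbitrary assumption for 
illustration, not a recommendation.

```shell
#!/bin/sh
# Sketch: print (do not run) the IMB barrier command for each Open MPI
# version, with the SAME barrier algorithm number for both, so that the
# timings are comparable.
# ALGO=3 is an arbitrary illustrative choice; paths and flags are copied
# from the command lines quoted in this thread.
ALGO=3

build_cmd() {
    v="$1"    # Open MPI version string, e.g. 1.5.4 or 1.5.5rc3
    echo "/opt/openmpi-$v/intel12/bin/mpirun -x OMP_NUM_THREADS=1" \
         "-hostfile hosts_all2all_2 -npernode 32 --mca btl openib,sm,self" \
         "-mca coll_tuned_use_dynamic_rules 1" \
         "-mca coll_tuned_barrier_algorithm $ALGO -np 256" \
         "openmpi-$v/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16" \
         "-npmin 256 barrier"
}

# Emit one command line per version; only the version string differs.
for ver in 1.5.4 1.5.5rc3; do
    build_cmd "$ver"
done
```

The printed lines differ only in the version string, so any remaining gap in 
the barrier timings cannot be blamed on the algorithm selection.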


On Mar 23, 2012, at 10:11 AM, Shamis, Pavel wrote:

> Pavel,
> 
> MVAPICH implements multicore-optimized collectives, which perform 
> substantially better than the default algorithms.
> FYI, the ORNL team is working on a new high-performance collectives framework 
> for OMPI. The framework provides a significant boost in collectives 
> performance.
> 
> Regards,
> 
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> 
> 
> 
> 
> 
> 
> On Mar 23, 2012, at 9:17 AM, Pavel Mezentsev wrote:
> 
> I've been comparing 1.5.4 and 1.5.5rc3 with the same parameters; that's why I 
> didn't use --bind-to-core. I checked, and using --bind-to-core improved 
> the result compared to 1.5.4:
> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>         1000        84.96        85.08        85.02
> 
> So I guess with 1.5.5 the processes move from core to core within a node even 
> though I use all cores, right? Then why does 1.5.4 behave differently?
> 
> I need --bind-to-core in some cases, and that's why I need 1.5.5rc3 instead of 
> the more stable 1.5.4. I know that I can use numactl explicitly, but 
> --bind-to-core is more convenient :)
> 
> 2012/3/23 Ralph Castain <r...@open-mpi.org>
> I don't see where you told OMPI to --bind-to-core. We don't automatically 
> bind, so you have to explicitly tell us to do so.
> 
> On Mar 23, 2012, at 6:20 AM, Pavel Mezentsev wrote:
> 
>> Hello
>> 
>> I'm doing some testing with IMB and discovered a strange thing:
>> 
>> I have a system with new AMD Opteron 6276 processors, so I'm using 
>> 1.5.5rc3 because it supports binding to cores.
>> 
>> But when I run the barrier test from the Intel MPI Benchmarks, the best I get is:
>> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>          598     15159.56     15211.05     15184.70
>> (/opt/openmpi-1.5.5rc3/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile 
>> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca 
>> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 1 -np 256 
>> openmpi-1.5.5rc3/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 
>> barrier)
>> 
>> And with Open MPI 1.5.4 the result is much better:
>> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>         1000       113.23       113.33       113.28
>> 
>> (/opt/openmpi-1.5.4/intel12/bin/mpirun -x OMP_NUM_THREADS=1  -hostfile 
>> hosts_all2all_2 -npernode 32 --mca btl openib,sm,self -mca 
>> coll_tuned_use_dynamic_rules 1 -mca coll_tuned_barrier_algorithm 3 -np 256 
>> openmpi-1.5.4/intel12/IMB-MPI1 -off_cache 16,64 -msglog 1:16 -npmin 256 
>> barrier)
>> 
>> And still I couldn't come close to the result I got with MVAPICH:
>> #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>>         1000        17.51        17.53        17.53
>> 
>> (/opt/mvapich2-1.8/intel12/bin/mpiexec.hydra -env OMP_NUM_THREADS 1 
>> -hostfile hosts_all2all_2 -np 256 mvapich2-1.8/intel12/IMB-MPI1 -mem 2 
>> -off_cache 16,64 -msglog 1:16 -npmin 256 barrier)
>> 
>> I dunno if this is a bug or if I'm doing something the wrong way. So is 
>> there a way to improve my results?
>> 
>> Best regards,
>> Pavel Mezentsev
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
