Try manually specifying the collective component with "-mca coll tuned". You seem to be using the "sync" collective component; are there any stale MCA param files lying around?
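For example, something along these lines would show whether a parameter file is forcing the selection and let you override it for a single run (a quick sketch; the file locations are the usual defaults, so substitute your actual install prefix):

    # user- and system-level MCA parameter files that could be setting "coll"
    cat ~/.openmpi/mca-params.conf
    cat <install-prefix>/etc/openmpi-mca-params.conf

    # list the coll components/parameters Open MPI currently sees
    ompi_info --param coll all

    # override the component list explicitly for one run, e.g. excluding the sync wrapper
    mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
           -mca coll ^sync imb/src-1.4.2/IMB-MPI1 gather -npmin 64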
--Nysal

On Tue, Jan 11, 2011 at 6:28 PM, Doron Shoham <doron.o...@gmail.com> wrote:
> Hi
>
> All machines on the setup are iDataPlex with Nehalem, 12 cores per node,
> 24GB memory.
>
> *Problem 1 - OMPI 1.4.3 hangs in gather:*
>
> I'm trying to run IMB and the gather operation with OMPI 1.4.3 (vanilla).
> It happens when np >= 64 and the message size exceeds 4k:
>
> mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
>     imb/src-1.4.2/IMB-MPI1 gather -npmin 64
>
> voltairenodes consists of 64 machines.
>
> #----------------------------------------------------------------
> # Benchmarking Gather
> # #processes = 64
> #----------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>         0         1000         0.02         0.02         0.02
>         1          331        14.02        14.16        14.09
>         2          331        12.87        13.08        12.93
>         4          331        14.29        14.43        14.34
>         8          331        16.03        16.20        16.11
>        16          331        17.54        17.74        17.64
>        32          331        20.49        20.62        20.53
>        64          331        23.57        23.84        23.70
>       128          331        28.02        28.35        28.18
>       256          331        34.78        34.88        34.80
>       512          331        46.34        46.91        46.60
>      1024          331        63.96        64.71        64.33
>      2048          331       460.67       465.74       463.18
>      4096          331       637.33       643.99       640.75
>
> This is the padb output (padb -A -x -Ormgr=mpirun -tree):
>
> Warning, remote process state differs across ranks
> state : ranks
> R (running)  : [1,3-6,8,10-13,16-20,23-28,30-32,34-42,44-45,47-49,51-53,56-59,61-63]
> S (sleeping) : [0,2,7,9,14-15,21-22,29,33,43,46,50,54-55,60]
> Stack trace(s) for thread: 1
> -----------------
> [0-63] (64 processes)
> -----------------
> main() at ?:?
> IMB_init_buffers_iter() at ?:?
> IMB_gather() at ?:?
> PMPI_Gather() at pgather.c:175
> mca_coll_sync_gather() at coll_sync_gather.c:46
> ompi_coll_tuned_gather_intra_dec_fixed() at coll_tuned_decision_fixed.c:714
> -----------------
> [0,3-63] (62 processes)
> -----------------
> ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:248
> mca_pml_ob1_recv() at pml_ob1_irecv.c:104
> ompi_request_wait_completion() at ../../../../ompi/request/request.h:375
> opal_condition_wait() at ../../../../opal/threads/condition.h:99
> -----------------
> [1] (1 processes)
> -----------------
> ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:302
> mca_pml_ob1_send() at pml_ob1_isend.c:125
> ompi_request_wait_completion() at ../../../../ompi/request/request.h:375
> opal_condition_wait() at ../../../../opal/threads/condition.h:99
> -----------------
> [2] (1 processes)
> -----------------
> ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:315
> ompi_request_default_wait() at request/req_wait.c:37
> ompi_request_wait_completion() at ../ompi/request/request.h:375
> opal_condition_wait() at ../opal/threads/condition.h:99
> Stack trace(s) for thread: 2
> -----------------
> [0-63] (64 processes)
> -----------------
> start_thread() at ?:?
> btl_openib_async_thread() at btl_openib_async.c:344
> poll() at ?:?
> Stack trace(s) for thread: 3
> -----------------
> [0-63] (64 processes)
> -----------------
> start_thread() at ?:?
> service_thread_start() at btl_openib_fd.c:427
> select() at ?:?
>
> When I run padb again after a couple of minutes, the number of processes
> at each position stays the same, but different processes occupy those
> positions.
> For example, this is the diff between two padb outputs:
>
> Warning, remote process state differs across ranks
> state : ranks
> -R (running)  : [0,2-4,6-13,16-18,20-21,28-31,33-36,38-56,58,60,62-63]
> -S (sleeping) : [1,5,14-15,19,22-27,32,37,57,59,61]
> +R (running)  : [2,5-14,16-23,25,28-40,42-48,50-51,53-58,61,63]
> +S (sleeping) : [0-1,3-4,15,24,26-27,41,49,52,59-60,62]
> Stack trace(s) for thread: 1
> -----------------
> [0-63] (64 processes)
> @@ -13,21 +13,21 @@
> mca_coll_sync_gather() at coll_sync_gather.c:46
> ompi_coll_tuned_gather_intra_dec_fixed() at coll_tuned_decision_fixed.c:714
> -----------------
> - [0,3-63] (62 processes)
> + [0-5,8-63] (62 processes)
> -----------------
> ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:248
> mca_pml_ob1_recv() at pml_ob1_irecv.c:104
> ompi_request_wait_completion() at ../../../../ompi/request/request.h:375
> opal_condition_wait() at ../../../../opal/threads/condition.h:99
> -----------------
> - [1] (1 processes)
> + [6] (1 processes)
> -----------------
> ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:302
> mca_pml_ob1_send() at pml_ob1_isend.c:125
> ompi_request_wait_completion() at ../../../../ompi/request/request.h:375
> opal_condition_wait() at ../../../../opal/threads/condition.h:99
> -----------------
> - [2] (1 processes)
> + [7] (1 processes)
> -----------------
> ompi_coll_tuned_gather_intra_linear_sync() at coll_tuned_gather.c:315
> ompi_request_default_wait() at request/req_wait.c:37
>
> *Choosing a different gather algorithm seems to bypass the hang.*
> I've used the following MCA parameters:
> --mca coll_tuned_use_dynamic_rules 1
> --mca coll_tuned_gather_algorithm 1
>
> Actually, both dec_fixed and basic_linear work, while binomial and
> linear_sync don't.
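(For reference, a full workaround invocation combining those parameters with the original command would look roughly like the sketch below. If I remember correctly, coll_tuned_gather_algorithm 1 selects basic_linear in the 1.4 series; "ompi_info --param coll tuned" will list the exact mapping on your build.)

    mpirun -np 64 -machinefile voltairenodes -mca btl sm,self,openib \
           -mca coll_tuned_use_dynamic_rules 1 \
           -mca coll_tuned_gather_algorithm 1 \
           imb/src-1.4.2/IMB-MPI1 gather -npmin 64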
> With OMPI 1.5 it doesn't hang (with any of the gather algorithms) and it
> is much faster (the number of repetitions is much higher):
>
> #----------------------------------------------------------------
> # Benchmarking Gather
> # #processes = 64
> #----------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>         0         1000         0.02         0.03         0.02
>         1         1000        18.50        18.55        18.53
>         2         1000        18.17        18.25        18.22
>         4         1000        19.04        19.10        19.07
>         8         1000        19.60        19.67        19.64
>        16         1000        21.39        21.47        21.43
>        32         1000        24.83        24.91        24.87
>        64         1000        27.35        27.45        27.40
>       128         1000        33.23        33.34        33.29
>       256         1000        41.24        41.39        41.32
>       512         1000        52.62        52.81        52.71
>      1024         1000        73.20        73.46        73.32
>      2048         1000       416.36       418.04       417.22
>      4096         1000       638.54       640.70       639.65
>      8192         1000       506.26       506.97       506.63
>     16384         1000       600.63       601.40       601.02
>     32768         1000       639.52       640.34       639.93
>     65536          640       914.22       916.02       915.13
>    131072          320      2287.37      2295.18      2291.35
>    262144          160      4041.36      4070.58      4056.27
>    524288           80      7292.35      7463.27      7397.14
>   1048576           40     13647.15     14107.15     13905.29
>   2097152           20     30625.00     32635.45     31815.36
>   4194304           10     63543.01     70987.49     68680.48
>
> *Problem 2 - segmentation fault with OMPI 1.4.3/1.5 and IMB gather, np=768:*
>
> When trying to run the same command but with np=768, I get a segmentation
> fault:
>
> openmpi-1.4.3/bin/mpirun -np 768 -machinefile voltairenodes -mca btl sm,self,openib \
>     -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_gather_algorithm 1 \
>     imb/src/IMB-MPI1 gather -npmin 768 -mem 1.6
>
> This happens with both OMPI 1.4.3 and 1.5.
>
> [compa163:20249] *** Process received signal ***
> [compa163:20249] Signal: Segmentation fault (11)
> [compa163:20249] Signal code: Address not mapped (1)
> [compa163:20249] Failing at address: 0x2aab4a204000
> [compa163:20249] [ 0] /lib64/libpthread.so.0 [0x366aa0e7c0]
> [compa163:20249] [ 1] /gpfs/asrc/home/voltaire/install//openmpi-1.4.3/lib/libmpi.so.0(ompi_convertor_unpack+0x15f) [0x2b077882282e]
> [compa163:20249] [ 2] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9e1672]
> [compa163:20249] [ 3] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9dd0b6]
> [compa163:20249] [ 4] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_btl_sm.so [0x2b077c459d87]
> [compa163:20249] [ 5] /gpfs/asrc/home/voltaire/install//openmpi-1.4.3/lib/libopen-pal.so.0(opal_progress+0xbe) [0x2b0778d845b8]
> [compa163:20249] [ 6] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9d6d62]
> [compa163:20249] [ 7] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9d6ba7]
> [compa163:20249] [ 8] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_pml_ob1.so [0x2b077b9d6a90]
> [compa163:20249] [ 9] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so [0x2b077d298dc5]
> [compa163:20249] [10] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so [0x2b077d2990d3]
> [compa163:20249] [11] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so [0x2b077d286e9b]
> [compa163:20249] [12] /gpfs/asrc/home/voltaire/install/openmpi-1.4.3/lib/openmpi/mca_coll_sync.so [0x2b077d07e96c]
> [compa163:20249] [13] /gpfs/asrc/home/voltaire/install//openmpi-1.4.3/lib/libmpi.so.0(PMPI_Gather+0x55e) [0x2b077883ec9a]
> [compa163:20249] [14] imb/src/IMB-MPI1(IMB_gather+0xe8) [0x40a088]
> [compa163:20249] [15] imb/src/IMB-MPI1(IMB_init_buffers_iter+0x28a) [0x405baa]
> [compa163:20249] [16] imb/src/IMB-MPI1(main+0x30f) [0x40362f]
> [compa163:20249] [17] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3669e1d994]
> [compa163:20249] [18] imb/src/IMB-MPI1 [0x403269]
> [compa163:20249] *** End of error message ***
>
> Any ideas? More debugging tips?
>
> Thanks,
> Doron
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel