Hi all,

I ran into the "Unbelievable situation" error while running the IMB benchmark.

Command:

/home/USERS/lenny/OMPI_ORTE_LMC/bin/mpirun -np 96 --bynode -hostfile hostfile_ompi -mca btl_openib_max_lmc 1 ./IMB-MPI1 PingPong PingPing Sendrecv Exchange Allreduce Reduce Reduce_scatter Bcast Barrier

Output from the Allreduce test, followed by the error and a segmentation fault:

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 96
#----------------------------------------------------------------
#Benchmarking  #procs   #bytes  #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
Allreduce          96        0          1000         0.02         0.03         0.02
Allreduce          96        4          1000       297.88       298.07       297.95
Allreduce          96        8          1000       296.15       296.32       296.24
Allreduce          96       16          1000       297.99       298.17       298.09
Allreduce          96       32          1000       296.97       297.20       297.04
Allreduce          96       64          1000       298.43       298.64       298.49
Allreduce          96      128          1000       296.86       297.07       296.93
Allreduce          96      256          1000       298.00       298.30       298.09
Allreduce          96      512          1000       296.79       296.96       296.85
Allreduce          96     1024          1000       299.23       299.39       299.31
Allreduce          96     2048          1000       295.51       295.64       295.57
Allreduce          96     4096          1000       246.02       246.13       246.08
Allreduce          96     8192          1000       492.52       492.74       492.63
Allreduce          96    16384          1000      5380.59      5381.47      5381.10
Allreduce          96    32768          1000      5372.86      5373.69      5373.36
Allreduce          96    65536           640      5470.41      5471.88      5471.16
Allreduce          96   131072           320      5554.52      5556.82      5555.75
[witch24:15639] Unbelievable situation ... we got a duplicated fragment with seq number of 0 (expected 65534) from witch23
[witch24:15639] Unbelievable situation ... we got a duplicated fragment with seq number of 65116 (expected 65534) from witch23
[witch24:15639] *** Process received signal ***
[witch24:15639] Signal: Segmentation fault (11)
[witch24:15639] Signal code: Address not mapped (1)
[witch24:15639] Failing at address: 0x632457d0
[witch24:15639] [ 0] /lib64/libpthread.so.0 [0x2b7929a9bc10]
[witch24:15639] [ 1] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_allocator_bucket.so [0x2b792aa47d34]
[witch24:15639] [ 2] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_pml_ob1.so [0x2b792b172163]
[witch24:15639] [ 3] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_btl_openib.so [0x2b792b6b0772]
[witch24:15639] [ 4] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_btl_openib.so [0x2b792b6b15ff]
[witch24:15639] [ 5] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_bml_r2.so [0x2b792b38307f]
[witch24:15639] [ 6] /home/USERS/lenny/OMPI_ORTE_LMC/lib/libopen-pal.so.0(opal_progress+0x4a) [0x2b79294cd16a]
[witch24:15639] [ 7] /home/USERS/lenny/OMPI_ORTE_LMC/lib/libmpi.so.0 [0x2b79292163a8]
[witch24:15639] [ 8] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_coll_tuned.so [0x2b792c077cb7]
[witch24:15639] [ 9] /home/USERS/lenny/OMPI_ORTE_LMC/lib/openmpi/mca_coll_tuned.so [0x2b792c07b296]
[witch24:15639] [10] /home/USERS/lenny/OMPI_ORTE_LMC/lib/libmpi.so.0(PMPI_Allreduce+0x1e7) [0x2b7929229907]
[witch24:15639] [11] ./IMB-MPI1(IMB_allreduce+0x8e) [0x40764e]
[witch24:15639] [12] ./IMB-MPI1(main+0x3aa) [0x4034ea]
[witch24:15639] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b7929bc2154]
[witch24:15639] [14] ./IMB-MPI1 [0x4030a9]
[witch24:15639] *** End of error message ***
------------------------------------------------------------------------

--
Best Regards,
Lenny.