Hey, all,

I'm not sure if this is a known bug or some sort of limitation I'm unaware of, but I've been building and testing with the OFED 1.3 GA release on a small fabric that has a mix of Arbel-based and newer Connect-X HCAs.

What I've discovered is that mvapich and openmpi work fine across the entire fabric, but mvapich2 crashes when I use a mix of Arbels and Connect-X. The errors vary depending on the test program but here's an example:

[EMAIL PROTECTED] IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
.
.
.
(output snipped)
.
.
.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 3 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
      #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
           0         1000         3.51         3.51         3.51         0.00
           1         1000         3.63         3.63         3.63         0.52
           2         1000         3.67         3.67         3.67         1.04
           4         1000         3.64         3.64         3.64         2.09
           8         1000         3.67         3.67         3.67         4.16
          16         1000         3.67         3.67         3.67         8.31
          32         1000         3.74         3.74         3.74        16.32
          64         1000         3.90         3.90         3.90        31.28
         128         1000         4.75         4.75         4.75        51.39
         256         1000         5.21         5.21         5.21        93.79
         512         1000         5.96         5.96         5.96       163.77
        1024         1000         7.88         7.89         7.89       247.54
        2048         1000        11.42        11.42        11.42       342.00
        4096         1000        15.33        15.33        15.33       509.49
        8192         1000        22.19        22.20        22.20       703.83
       16384         1000        34.57        34.57        34.57       903.88
       32768         1000        51.32        51.32        51.32      1217.94
       65536          640        85.80        85.81        85.80      1456.74
      131072          320       155.23       155.24       155.24      1610.40
      262144          160       301.84       301.86       301.85      1656.39
      524288           80       598.62       598.69       598.66      1670.31
     1048576           40      1175.22      1175.30      1175.26      1701.69
     2097152           20      2309.05      2309.05      2309.05      1732.32
     4194304           10      4548.72      4548.98      4548.85      1758.64
[0] Abort: Got FATAL event 3
 at line 796 in file ibv_channel_manager.c
rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
  exit status of rank 0: killed by signal 9

If, however, I define my mpdring to contain only Connect-X systems OR only Arbel systems, IMB-MPI1 runs to completion.

Can anyone suggest a workaround, or is this a real bug with mvapich2?

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania