A co-worker here was seeing the following MPI error from his job:

[1] Abort: [ldev2:1] Got completion with error, code=1 at line 2148 in file viacheck.c
After some tracking down, he found that if he makes a "system" call [int system(const char *string)], the next MPI call fails. I have been able to reproduce this with the attached simple "hello" program. Perhaps someone has seen this type of error?

Here is the output from 2 runs:

[EMAIL PROTECTED]:~/ior-test 17:04:04 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello x
ldev1
[0] Abort: [ldev1:0] Got completion with error, code=1 at line 2148 in file viacheck.c
ldev2
mpirun_rsh: Abort signaled from [0]
done.

[EMAIL PROTECTED]:~/ior-test 17:05:23 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello
now = 0.000000
now = 0.000052
now = 0.000094
now = 0.000121
now = 0.000151
now = 0.001072
now = 0.001102
now = 0.001118
now = 0.001141
now = 0.001160
done.

We are running mvapich 0.9.7 and the openib trunk rev 6829.

Thanks,
Ira
hello.c
Description: Binary data