Okay, fixed and CMR'd to you.
On Mar 18, 2014, at 11:00 AM, Ralph Castain <r...@open-mpi.org> wrote:

>
> On Mar 18, 2014, at 10:54 AM, Dave Goodell (dgoodell) <dgood...@cisco.com> wrote:
>
>> Ralph,
>>
>> I'm seeing problems with MPIEXEC_TIMEOUT in v1.7 @ r31103 (fairly close to
>> HEAD):
>>
>> ----8<----
>> MPIEXEC_TIMEOUT=8 mpirun --mca btl usnic,sm,self -np 4 ./sleeper
>> --------------------------------------------------------------------------
>> The user-provided time limit for job execution has been
>> reached:
>>
>>   MPIEXEC_TIMEOUT: 8 seconds
>>
>> The job will now be aborted. Please check your code and/or
>> adjust/remove the job execution time limit (as specified
>> by MPIEXEC_TIMEOUT in your environment).
>>
>> --------------------------------------------------------------------------
>> srun: error: mpi015: task 0: Killed
>> srun: Terminating job step 689585.2
>> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
>> ^C[savbu-usnic-a:26668] [[14634,0],0]->[[14634,0],1] mca_oob_tcp_msg_send_bytes: write failed: Connection reset by peer (104) [sd = 16]
>> [savbu-usnic-a:26668] [[14634,0],0]-[[14634,0],1] mca_oob_tcp_peer_send_handler: unable to send header
>>
>> ^CAbort is in progress...hit ctrl-c again within 5 seconds to forcibly
>> terminate
>>
>> ^C
>> ----8<----
>>
>> Where each "^C" is a ctrl-c, with an arbitrary amount of time allowed to
>> pass beforehand (several minutes for the first two, <5s for the third).
>>
>> Where "sleeper" is just an MPI program that does:
>>
>> ----8<----
>> MPI_Init(&argc, &argv);
>> MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
>> MPI_Comm_size(MPI_COMM_WORLD, &wsize);
>>
>> while (1) {
>>     sleep(60);
>> }
>>
>> MPI_Finalize();
>> ----8<----
>>
>> It happens under slurm and SSH. If I launch on localhost (no
>> --host/--hostfile option, no slurm, etc.) then it exits just fine. The
>> example output I gave above used the "usnic" BTL, but "tcp" has identical
>> behavior.
>>
>> This worked fine in v1.7.4. I've bisected the change in behavior down to
>> r30981: https://svn.open-mpi.org/trac/ompi/changeset/30981
>>
>> Should I file a ticket?
>>
>
> Crud - no, I'll take a look in a little bit
>
>
>> -Dave
>>
>