Odd - I guess my machine is just consistently lucky, as was the CI’s when this went through. The field causing the problem is actually stale - we haven’t used it in years - so I simply removed it from orte_process_info.
https://github.com/open-mpi/ompi/pull/3741 should fix the problem.

> On Jun 23, 2017, at 3:38 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> Ralph,
>
> I got consistent segfaults during the infrastructure teardown in
> orterun (I noticed them on OS X). After digging a little bit, it turns out
> that the opal_buffer_t class has been cleaned up in orte_finalize before
> orte_proc_info_finalize is called, which leads to destructors being called on
> randomly initialized memory. If I change the order of the teardown to move
> orte_proc_info_finalize before orte_finalize, things work better, but I still
> get a very annoying warning about a "Bad file descriptor in select".
>
> Any better fix?
>
> George.
>
> PS: Here is the patch I am currently using to get rid of the segfaults
>
> diff --git a/orte/tools/orterun/orterun.c b/orte/tools/orterun/orterun.c
> index 85aba0a0f3..506b931d35 100644
> --- a/orte/tools/orterun/orterun.c
> +++ b/orte/tools/orterun/orterun.c
> @@ -222,10 +222,10 @@ int orterun(int argc, char *argv[])
>   DONE:
>      /* cleanup and leave */
>      orte_submit_finalize();
> -    orte_finalize();
> -    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
>      /* cleanup the process info */
>      orte_proc_info_finalize();
> +    orte_finalize();
> +    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
>
>      if (orte_debug_flag) {
>          fprintf(stderr, "exiting with status %d\n", orte_exit_status);
>
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel