Re: [OMPI devel] orterun busted

r...@open-mpi.org Fri, 23 Jun 2017 06:25:31 -0700

Odd - I guess my machine is just consistently lucky, as was the CI when this 
went through. The problem field is actually stale - we haven’t used it in 
years - so I simply removed it from orte_process_info.


https://github.com/open-mpi/ompi/pull/3741 

Should fix the problem.

> On Jun 23, 2017, at 3:38 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
> Ralph,
> 
> I got consistent segfaults during infrastructure teardown in orterun (I 
> noticed them on OS X). After digging a little, it turns out that the 
> opal_buffer_t class has already been cleaned up in orte_finalize before 
> orte_proc_info_finalize is called, so the destructors end up running on 
> memory with random contents. If I change the teardown order to move 
> orte_proc_info_finalize before orte_finalize, things work better, but I still 
> get a very annoying warning about a "Bad file descriptor in select".
> 
> Any better fix?
> 
> George.
> 
> PS: Here is the patch I am currently using to get rid of the segfaults:
> 
> diff --git a/orte/tools/orterun/orterun.c b/orte/tools/orterun/orterun.c
> index 85aba0a0f3..506b931d35 100644
> --- a/orte/tools/orterun/orterun.c
> +++ b/orte/tools/orterun/orterun.c
> @@ -222,10 +222,10 @@ int orterun(int argc, char *argv[])
>   DONE:
>      /* cleanup and leave */
>      orte_submit_finalize();
> -    orte_finalize();
> -    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
>      /* cleanup the process info */
>      orte_proc_info_finalize();
> +    orte_finalize();
> +    orte_session_dir_cleanup(ORTE_JOBID_WILDCARD);
> 
>      if (orte_debug_flag) {
>          fprintf(stderr, "exiting with status %d\n", orte_exit_status);
> 
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

