I guess (r32401).

  George.

On Fri, Aug 1, 2014 at 12:32 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I found the problem - the issue is that assert on the convertor. MPI apps
> are setting that convertor, but not non-MPI apps, and so the field is NULL.
> Can we remove that assert?
>
>
> On Aug 1, 2014, at 9:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> I missed the fact that the app doesn't force it. But if this is indeed the
> case then it is extremely weird that you are seeing someone else releasing
> your proc.
>
> Regarding the destruction of the proc, the OPAL layer only does it in a
> single place, when the local proc is set (opal_proc_local_set). Moreover,
> it does call OBJ_RETAIN when it does this, so the proc should not vanish
> without you having control over it.
>
> I looked at the code and noticed that it only crashes in apps, the place
> where the ORTE proc is not provided to the OPAL layer.
>
>   George.
>
>
>
> On Fri, Aug 1, 2014 at 12:12 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>
>> On Aug 1, 2014, at 8:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> This commit brings two things. One is the renaming suggested by Gilles.
>> The second one is forcing the ORTE process down on the OPAL. This doesn't
>> fit the current design of the BTL move. The current design assumes that the
>> local OPAL process is part of the local OMPI process.
>>
>>
>> Your statement isn't accurate - this commit sets the opal_proc_t for all
>> *non-MPI* processes. As the comment in ess_base_std_app.c notes, and the
>> commit message states, ORTE sets and controls the opal_proc_local structure
>> for all ORTE tools and non-MPI procs as (shockingly) they don't call
>> MPI_Init, and hence don't go thru ompi_proc_init, and were therefore
>> leaving the opal_proc_local structure set to the default "nothing" state.
>> This caused all the OPAL layer functions that access it to think nothing
>> had been set up yet.
>>
>> My destruct issue is caused by the OPAL layer destructing the object,
>> which seems odd to me but <shrug>
>>
>>
>>   George.
>>
>> PS: If it doesn't break loose everywhere, it is because the OMPI layer
>> resets its own process after the RTE (which explains why Ralph noticed
>> that his proc has been OBJ_DESTRUCT).
>>
>>
>>
>> On Fri, Aug 1, 2014 at 10:44 AM, <svn-commit-mai...@open-mpi.org> wrote:
>>
>>> Author: rhc (Ralph Castain)
>>> Date: 2014-08-01 10:44:11 EDT (Fri, 01 Aug 2014)
>>> New Revision: 32398
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32398
>>>
>>> Log:
>>> Some more cleanups. Remove direct references to ORTE by changing
>>> OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun,
>>> orted, tools) set the OPAL proc structure fields so OPAL knows what is
>>> going on and uses the correct print functions (still need to fix the
>>> problem for non-MPI apps). Properly return uint32_t from the opal utilities
>>> instead of int32_t as that is what the ORTE process name fields contain.
>>>
>>> Thanks to Gilles for pointing out some of the discrepancies.
>>>
>>> Text files modified:
>>>    trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c |  2
>>>    trunk/ompi/mca/coll/hierarch/coll_hierarch.c        |  2
>>>    trunk/ompi/mca/coll/sm/coll_sm_module.c             |  6 ++--
>>>    trunk/ompi/mca/dpm/orte/dpm_orte.c                  | 10 ++++----
>>>    trunk/ompi/mca/pml/bfo/pml_bfo_failover.c           |  6 ++--
>>>    trunk/ompi/mca/rte/orte/rte_orte.h                  |  2
>>>    trunk/ompi/proc/proc.c                              | 14 ++++++------
>>>    trunk/ompi/runtime/ompi_mpi_abort.c                 |  4 +-
>>>    trunk/ompi/runtime/ompi_mpi_init.c                  |  4 +-
>>>    trunk/opal/util/proc.c                              |  9 +++----
>>>    trunk/opal/util/proc.h                              |  4 +-
>>>    trunk/orte/mca/ess/base/ess_base_std_orted.c        |  9 ++++++++
>>>    trunk/orte/mca/ess/base/ess_base_std_tool.c         |  9 ++++++++
>>>    trunk/orte/mca/ess/hnp/ess_hnp_module.c             |  8 +++++++
>>>    trunk/orte/runtime/orte_init.c                      | 42 ++++++++++++++++++++++++++++++++++++++++
>>>    trunk/orte/util/proc_info.c                         |  6 +++++
>>>    trunk/orte/util/proc_info.h                         |  4 ++
>>>    17 files changed, 108 insertions(+), 33 deletions(-)
>>>
>>>
>>> Diff not shown due to size (21547 bytes).
>>> To see the diff, run the following command:
>>>
>>> svn diff -r 32397:32398 --no-diff-deleted
>>>
>>> _______________________________________________
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15456.php
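
For readers following the OBJ_RETAIN point in the thread: the argument is
that a setter which takes its own reference cannot make the caller's object
vanish, while the object it replaces may be destructed once the last
reference goes away. A minimal sketch of that pattern follows. All names
here (demo_proc_t, demo_retain, demo_release, demo_local_set) are
hypothetical stand-ins and are not the Open MPI API; the sketch only assumes
a plain integer reference count and is not taken from the OPAL sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted object such as a proc. */
typedef struct {
    int refcount;    /* number of live references */
    int destructed;  /* set to 1 once the last reference is dropped */
} demo_proc_t;

/* Hypothetical "current local proc" slot owned by the lower layer. */
demo_proc_t *demo_local_proc = NULL;

/* Take one more reference (plays the role of a retain macro). */
void demo_retain(demo_proc_t *p)
{
    p->refcount++;
}

/* Drop one reference; mark the object destructed when none remain. */
void demo_release(demo_proc_t *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        p->destructed = 1;
    }
}

/* Install a new local proc: release the previous one (if any) and
 * retain the new one, so the slot always holds its own reference. */
void demo_local_set(demo_proc_t *p)
{
    if (demo_local_proc != NULL) {
        demo_release(demo_local_proc);
    }
    demo_retain(p);
    demo_local_proc = p;
}
```

Under these assumptions, an object that was installed and then replaced is
only destructed if its original owner has also dropped its reference, which
is one way a layer could see "its" proc destructed after a later layer
installs a different one.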