I guess so (r32401).

  George.



On Fri, Aug 1, 2014 at 12:32 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I found the problem: the issue is that assert on the convertor. MPI apps
> set that convertor, but non-MPI apps do not, so the field is NULL.
> Can we remove that assert?
>
>
> On Aug 1, 2014, at 9:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> I missed the fact that the app doesn't force it. But if this is indeed the
> case, then it is extremely weird that you are seeing someone else releasing
> your proc.
>
> Regarding the destruction of the proc, the OPAL layer only does so in a
> single place, when the local proc is set (opal_proc_local_set). Moreover,
> it does call OBJ_RETAIN when it does this, so the proc should not vanish
> without you having control over it.
>
> I looked at the code and noticed that it only crashes in apps, which is
> where the ORTE proc is not provided to the OPAL layer.
>
>   George.
>
>
>
> On Fri, Aug 1, 2014 at 12:12 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>
>> On Aug 1, 2014, at 8:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> This commit brings two things. One is the renaming suggested by Gilles.
>> The second one is forcing the ORTE process down onto the OPAL layer. This
>> doesn't fit the current design of the BTL move. The current design assumes
>> that the local OPAL process is part of the local OMPI process.
>>
>>
>> Your statement isn't accurate: this commit sets the opal_proc_t for all
>> *non-MPI* processes. As the comment in ess_base_std_app.c notes, and the
>> commit message states, ORTE sets and controls the opal_proc_local structure
>> for all ORTE tools and non-MPI procs as (shockingly) they don't call
>> MPI_Init, and hence don't go thru ompi_proc_init, and were therefore
>> leaving the opal_proc_local structure set to the default "nothing" state.
>> This caused all the OPAL layer functions that access it to think nothing
>> had been setup yet.
>>
>> My destruct issue is caused by the OPAL layer destructing the object,
>> which seems odd to me but <shrug>
>>
>>
>>   George.
>>
>> PS: If it doesn't break loose everywhere, it is because the OMPI layer
>> resets its own process after the RTE does (which explains why Ralph noticed
>> that his proc has been OBJ_DESTRUCTed).
>>
>>
>>
>> On Fri, Aug 1, 2014 at 10:44 AM, <svn-commit-mai...@open-mpi.org> wrote:
>>
>>> Author: rhc (Ralph Castain)
>>> Date: 2014-08-01 10:44:11 EDT (Fri, 01 Aug 2014)
>>> New Revision: 32398
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32398
>>>
>>> Log:
>>> Some more cleanups. Remove direct references to ORTE by changing
>>> OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun,
>>> orted, tools) set the OPAL proc structure fields so OPAL knows what is
>>> going on and uses the correct print functions (still need to fix the
>>> problem for non-MPI apps). Properly return uint32_t from the opal utilities
>>> instead of int32_t as that is what the ORTE process name fields contain.
>>>
>>> Thanks to Gilles for pointing out some of the discrepancies.
>>>
>>> Text files modified:
>>>    trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c |     2
>>>    trunk/ompi/mca/coll/hierarch/coll_hierarch.c        |     2
>>>    trunk/ompi/mca/coll/sm/coll_sm_module.c             |     6 ++--
>>>    trunk/ompi/mca/dpm/orte/dpm_orte.c                  |    10 ++++----
>>>    trunk/ompi/mca/pml/bfo/pml_bfo_failover.c           |     6 ++--
>>>    trunk/ompi/mca/rte/orte/rte_orte.h                  |     2
>>>    trunk/ompi/proc/proc.c                              |    14 ++++++------
>>>    trunk/ompi/runtime/ompi_mpi_abort.c                 |     4 +-
>>>    trunk/ompi/runtime/ompi_mpi_init.c                  |     4 +-
>>>    trunk/opal/util/proc.c                              |     9 +++----
>>>    trunk/opal/util/proc.h                              |     4 +-
>>>    trunk/orte/mca/ess/base/ess_base_std_orted.c        |     9 ++++++++
>>>    trunk/orte/mca/ess/base/ess_base_std_tool.c         |     9 ++++++++
>>>    trunk/orte/mca/ess/hnp/ess_hnp_module.c             |     8 +++++++
>>>    trunk/orte/runtime/orte_init.c                      |    42 ++++++++++++++++++++++++++++++++++++++++
>>>    trunk/orte/util/proc_info.c                         |     6 +++++
>>>    trunk/orte/util/proc_info.h                         |     4 ++
>>>    17 files changed, 108 insertions(+), 33 deletions(-)
>>>
>>>
>>> Diff not shown due to size (21547 bytes).
>>> To see the diff, run the following command:
>>>
>>>         svn diff -r 32397:32398 --no-diff-deleted
>>>
>>> _______________________________________________
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15456.php
>>
>>
>>
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15457.php
>>
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15458.php
>
>
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15459.php
>
