Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain
I don't recommend our solution as a general approach - we moved the object 
instance to the framework base so it never goes out of memory.

Regardless, it seems to me that proper cleanup is the better solution, although 
it means work. I've asked that it be added to next week's telecon agenda so we 
can reach some resolution.


On Jul 18, 2014, at 9:35 AM, Gilles Gouaillardet 
 wrote:

> It would make sense, though I guess I always thought that was part of what 
> happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that 
> DEREGISTER would be the counter to INSTANCE, and I do want to keep this from 
> getting even more clunky - so maybe renaming INSTANCE to be REGISTER and 
> completing the initialization inside it would be the way to go. Or renaming 
> DEREGISTER to something more obviously the counter to INSTANCE?
> 
> 
> just so we are clear :
> 
> on one hand OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of a 
> function :
> It *statically* initializes a struct.
> 
> on the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked 
> inside a function.
> 
> using OBJ_CLASS_REGISTER is not only about renaming, it also requires to move 
> all these invokations into functions.
> 
> my idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is :
> - we do not need to move OBJ_CLASS_INSTANCE into functions
> - we can have two behaviours depending on OPAL_ENABLE_DEBUG :
> OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero (and 
> opal_class_initialize would still be invoked in opal_obj_new). that could 
> also be a bit faster than having only one OBJ_CLASS_REGISTER macro in 
> optimized mode.
> 
> that being said, i am also fine with simplifying this, remove 
> OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER
> 
> 
> about the bug you hit, did you already solve it and how ?
> a trivial workaround is not to dlclose the dynamic library (ok, that's 
> cheating ...)
> a simple workaround (if it is even doable) is to declare the class "somewhere 
> else" so the (library containing the) class struct is not dlclose'd before it 
> is invoked (ok, that's ugly ...).
> 
> what i wrote earlier was misleading :
> OBJ_CLASS_INSTANCE(class);
> foo = OBJ_NEW(class);
> then
> opal_class_t class_class = {...};
> foo->super.obj_class = _class;
> 
> class_class is no more accessible when the OBJ_RELEASE is called since the 
> library was dlclose'd, so you do not even get a change to invoke the 
> destructor ...
>  
> a possible workaround could be to malloc a copy of class_class, have 
> foo->super.obj_class point to it after each OBJ_NEW, and finally have its 
> cls_destruct_array point to NULL when closing the framework/component.
> (of course that causes a leak ...)
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15198.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain

On Jul 18, 2014, at 10:24 AM, George Bosilca  wrote:

> 1. If I remember correctly, this topic has already been raised in the Forum. 
> And the decision was to maintain the current behavior (tools and MPI 
> init/fini are independent/disconnected).
> 
> 2. Having to manually set a global flag in order to correctly finalize a 
> library is HORRIBLE by any reasonable CS standards.

As I said in my original note, we don't have to set a global flag. All you have 
to do is decrement the already-existing reference counter that tracks how many 
times we called init_util, indicating that you are done with it so it can go 
ahead and truly finalize on next invocation. This is a typical symmetrical 
operation. All we are doing is correctly communicating to the library that we 
don't want it to actually tear things down at this time.

> 
> 3. Let's not go in shadowy corners of the MPI_T usage, and stay mainstream. 
> Here is a partial snippet of the most usual way the tool interface is 
> supposed to be used.
> 
> MPI_T_init_thread(MPI_THREAD_SINGLE, );
> ...
> MPI_Init(, );
> MPI_Finalize();
>   
>   With the proposed patch, we clean up all OPAL memory as soon as we reach 
> the MPI_Finalize (aka. without the call to MPI_T_finalize).

Are you referring to Nathan's patch? In that case, your statement isn't correct 
- the destructor only gets run at the end of the user's program, and thus the 
OPAL memory will not be cleaned up until that time.

>  All MPI_T calls after MPI_Finalize will trigger a segfault.
> 
>   George.
> 
> 
> 
> On Thu, Jul 17, 2014 at 10:55 PM, Ralph Castain  wrote:
> As I said, I don't know which solution is the one to follow - they both have 
> significant "ick" factors, though I wouldn't go so far as to characterize 
> either of them as "horrible". Not being "clean" after calling MPI_Finalize 
> seems just as strange.
> 
> Nathan and I did discuss the init-after-finalize issue, and he intends to 
> raise it with the Forum as it doesn't seem a logical thing to do. So that 
> issue may go away. Still leaves us pondering the right solution, and 
> hopefully coming up with something better than either of the ones we have so 
> far.
> 
> 
> On Jul 17, 2014, at 7:48 PM, George Bosilca  wrote:
> 
>> I think Case #1 is only a partial solution, as it only solves the example 
>> attached to the ticket. Based on my reading the the tool chapter calling 
>> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the 
>> patch. But this is not the major issue I have with this patch. From a coding 
>> perspective, it makes the initialization of OPAL horribly unnatural, 
>> requiring any other layer using OPAL to make a horrible gymnastic just to 
>> tear it down correctly (setting opal_init_util_init_extra to the right 
>> value).
>> 
>>   George.
>> 
>> 
>> 
>> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r  
>> wrote:
>> HI Folks,
>> 
>> I vote for solution #1.  Doesn't change current behavior.  Doesn't open the 
>> door to becoming dependent on availability of
>> ctor/dtor feature in future toolchains.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Wednesday, July 16, 2014 9:08 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function 
>> to opal
>> 
>> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
>> > I discussed this over IM with Nathan to try and get a better understanding 
>> > of the options. Basically, we have two approaches available to us:
>> >
>> > 1. my solution resolves the segv problem and eliminates leaks so long as 
>> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize 
>> > functions. This method will still leak memory if the user doesn't use MPI 
>> > after calling the MPI_T functions, but does mean that all memory used by 
>> > MPI will be released upon MPI_Finalize. So if the user program continues 
>> > beyond MPI, they won't be carrying the MPI memory footprint with them. 
>> > This continues our current behavior.
>> >
>> > 2. the destructor method, which release the MPI memory footprint upon 
>> > final program termination instead of at MPI_Finalize. This also solves the 
>> > segv and leak problems, and ensures that someone calling only the MPI_T 
>> > init/finalize functions will be valgrind-clean, but means that a user 
>> > program that runs beyond MPI will carry the MPI memory footprint with 
>> > them. This is a change in our current behavior.
>> 
>> Correct. Though the only thing we will carry around until termination is the 
>> memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, 
>> opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not 
>> sure how much memory this is.
>> 
>> -Nathan
>> ___
>> 

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread George Bosilca
1. If I remember correctly, this topic has already been raised in the
Forum. And the decision was to maintain the current behavior (tools and MPI
init/fini are independent/disconnected).

2. Having to manually set a global flag in order to correctly finalize a
library is HORRIBLE by any reasonable CS standards.

3. Let's not go in shadowy corners of the MPI_T usage, and stay mainstream.
Here is a partial snippet of the most usual way the tool interface is
supposed to be used.

MPI_T_init_thread(MPI_THREAD_SINGLE, );
...
MPI_Init(, );
MPI_Finalize();

  With the proposed patch, we clean up all OPAL memory as soon as we reach
the MPI_Finalize (aka. without the call to MPI_T_finalize).  All MPI_T
calls after MPI_Finalize will trigger a segfault.

  George.



On Thu, Jul 17, 2014 at 10:55 PM, Ralph Castain  wrote:

> As I said, I don't know which solution is the one to follow - they both
> have significant "ick" factors, though I wouldn't go so far as to
> characterize either of them as "horrible". Not being "clean" after calling
> MPI_Finalize seems just as strange.
>
> Nathan and I did discuss the init-after-finalize issue, and he intends to
> raise it with the Forum as it doesn't seem a logical thing to do. So that
> issue may go away. Still leaves us pondering the right solution, and
> hopefully coming up with something better than either of the ones we have
> so far.
>
>
> On Jul 17, 2014, at 7:48 PM, George Bosilca  wrote:
>
> I think Case #1 is only a partial solution, as it only solves the example
> attached to the ticket. Based on my reading the the tool chapter calling
> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the
> patch. But this is not the major issue I have with this patch. From a
> coding perspective, it makes the initialization of OPAL horribly unnatural,
> requiring any other layer using OPAL to make a horrible gymnastic just to
> tear it down correctly (setting opal_init_util_init_extra to the right
> value).
>
>   George.
>
>
>
> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r 
> wrote:
>
>> HI Folks,
>>
>> I vote for solution #1.  Doesn't change current behavior.  Doesn't open
>> the door to becoming dependent on availability of
>> ctor/dtor feature in future toolchains.
>>
>> Howard
>>
>>
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Wednesday, July 16, 2014 9:08 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor))
>> function to opal
>>
>> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
>> > I discussed this over IM with Nathan to try and get a better
>> understanding of the options. Basically, we have two approaches available
>> to us:
>> >
>> > 1. my solution resolves the segv problem and eliminates leaks so long
>> as the user calls MPI_Init/Finalize after calling the MPI_T init/finalize
>> functions. This method will still leak memory if the user doesn't use MPI
>> after calling the MPI_T functions, but does mean that all memory used by
>> MPI will be released upon MPI_Finalize. So if the user program continues
>> beyond MPI, they won't be carrying the MPI memory footprint with them. This
>> continues our current behavior.
>> >
>> > 2. the destructor method, which release the MPI memory footprint upon
>> final program termination instead of at MPI_Finalize. This also solves the
>> segv and leak problems, and ensures that someone calling only the MPI_T
>> init/finalize functions will be valgrind-clean, but means that a user
>> program that runs beyond MPI will carry the MPI memory footprint with them.
>> This is a change in our current behavior.
>>
>> Correct. Though the only thing we will carry around until termination is
>> the memory associated with opal/mca/if, opal/mca/event, opal_net,
>> opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and
>> opal_class. Not sure how much memory this is.
>>
>> -Nathan
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15193.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15194.php
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Gilles Gouaillardet
>
> It would make sense, though I guess I always thought that was part of what
> happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that
> DEREGISTER would be the counter to INSTANCE, and I do want to keep this
> from getting even more clunky - so maybe renaming INSTANCE to be REGISTER
> and completing the initialization inside it would be the way to go. Or
> renaming DEREGISTER to something more obviously the counter to INSTANCE?
>
>
just so we are clear :

on one hand OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of
a function :
It *statically* initializes a struct.

on the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked
inside a function.

using OBJ_CLASS_REGISTER is not only about renaming, it also requires to
move all these invokations into functions.

my idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is :
- we do not need to move OBJ_CLASS_INSTANCE into functions
- we can have two behaviours depending on OPAL_ENABLE_DEBUG :
OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero
(and opal_class_initialize would still be invoked in opal_obj_new). that
could also be a bit faster than having only one OBJ_CLASS_REGISTER macro in
optimized mode.

that being said, i am also fine with simplifying this, remove
OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER


about the bug you hit, did you already solve it and how ?
a trivial workaround is not to dlclose the dynamic library (ok, that's
cheating ...)
a simple workaround (if it is even doable) is to declare the class
"somewhere else" so the (library containing the) class struct is not
dlclose'd before it is invoked (ok, that's ugly ...).

what i wrote earlier was misleading :
OBJ_CLASS_INSTANCE(class);
foo = OBJ_NEW(class);
then
opal_class_t class_class = {...};
foo->super.obj_class = _class;

class_class is no more accessible when the OBJ_RELEASE is called since the
library was dlclose'd, so you do not even get a change to invoke the
destructor ...

a possible workaround could be to malloc a copy of class_class, have
foo->super.obj_class point to it after each OBJ_NEW, and finally have its
cls_destruct_array point to NULL when closing the framework/component.
(of course that causes a leak ...)

Cheers,

Gilles


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain

On Jul 18, 2014, at 8:25 AM, Gilles Gouaillardet 
 wrote:

> +1 for the overall idea !
> 
> On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castain  wrote:
>> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be matched 
>> by deregister at close of the framework/component that instanced it. Of 
>> course, that requires that we protect the class system against someone 
>> releasing/deconstructing an object after the class was deregistered since we 
>> don't know who might be using that class outside of where it was created.
>> 
> 
> my understanding is that in theory, we already have an issue and fortunatly, 
> we do not hit it :
> let's consider a framework/component that instanciate a class 
> (OBJ_CLASS_INSTANCE) *with a destructor*, allocate an object of this class 
> (OBJ_NEW) and expects "someone else" will free it (OBJ_RELEASE)
> if this framework/component ends up in a dynamic library that is dlclose'd 
> when the framework/component is no more used, then OBJ_RELEASE will try to 
> call the destructor which is no more accessible (since the lib was dlclose'd)

FWIW: Intel has hit that exact scenario in our testing and got a glorious segv 
out of it. We now have an assert for NULL in the base object macro's to warn us 
to fix it (which I can provide for review if we want to include it in our repo).

> 
> i could not experience such a scenario yet, and of course, this does not mean 
> there is no problem. i experienced a "kind of" similar situation described in 
> http://www.open-mpi.org/community/lists/devel/2014/06/14937.php
> 
> back to OBJ_CLASS_DEREGISTER, what about an OBJ_CLASS_REGISTER in order to 
> make this symmetric and easier to debug ?
> 
> currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a 
> given class is allocated. from opal_obj_new :
> if (0 == cls->cls_initialized) opal_class_initialize(cls);
> 
> that could be replaced by an error if 0 == cls->cls_initialized
> and OBJ_CLASS_REGISTER would simply call opal_class_initialize

It would make sense, though I guess I always thought that was part of what 
happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that 
DEREGISTER would be the counter to INSTANCE, and I do want to keep this from 
getting even more clunky - so maybe renaming INSTANCE to be REGISTER and 
completing the initialization inside it would be the way to go. Or renaming 
DEREGISTER to something more obviously the counter to INSTANCE?


> 
> of course, this change could be implemented only when compiled
> with OPAL_ENABLE_DEBUG
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15196.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Gilles Gouaillardet
+1 for the overall idea !

On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castain  wrote:
>
> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be
> matched by deregister at close of the framework/component that instanced
> it. Of course, that requires that we protect the class system against
> someone releasing/deconstructing an object after the class was deregistered
> since we don't know who might be using that class outside of where it was
> created.
>
> my understanding is that in theory, we already have an issue and
fortunatly, we do not hit it :
let's consider a framework/component that instanciate a class
(OBJ_CLASS_INSTANCE) *with a destructor*, allocate an object of this class
(OBJ_NEW) and expects "someone else" will free it (OBJ_RELEASE)
if this framework/component ends up in a dynamic library that is dlclose'd
when the framework/component is no more used, then OBJ_RELEASE will try to
call the destructor which is no more accessible (since the lib was
dlclose'd)

i could not experience such a scenario yet, and of course, this does not
mean there is no problem. i experienced a "kind of" similar situation
described in http://www.open-mpi.org/community/lists/devel/2014/06/14937.php

back to OBJ_CLASS_DEREGISTER, what about an OBJ_CLASS_REGISTER in order to
make this symmetric and easier to debug ?

currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a
given class is allocated. from opal_obj_new :
if (0 == cls->cls_initialized) opal_class_initialize(cls);

that could be replaced by an error if 0 == cls->cls_initialized
and OBJ_CLASS_REGISTER would simply call opal_class_initialize

of course, this change could be implemented only when compiled
with OPAL_ENABLE_DEBUG

Cheers,

Gilles


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Ralph Castain
I'm going to resurface this suggestion. Is there some reason why this wouldn't 
be the way to resolve the problem?

> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be matched 
> by deregister at close of the framework/component that instanced it. Of 
> course, that requires that we protect the class system against someone 
> releasing/deconstructing an object after the class was deregistered since we 
> don't know who might be using that class outside of where it was created.
> 
> * ensure each framework/component "deregisters" every declared MCA param when 
> finalizing/closing
> 
> * ensure every framework close gets called, and that every framework properly 
> closes all its components. We especially need to ensure that components that 
> were opened but not selected get closed!

I'm asking because it is apparent that this issue of reinitializing is going to 
recur under a variety of scenarios. The two methods we've discussed so far are 
really just bandaids - all we are doing is avoiding actually "finalizing" via 
different mechanisms. The root problem is that we *don't* cleanly finalize OPAL.

So why not address that problem? There is no technical reason we can't cleanly 
finalize the OPAL layer - is the root issue that we're just unwilling to make 
the effort to do it?



On Jul 17, 2014, at 7:55 PM, Ralph Castain  wrote:

> As I said, I don't know which solution is the one to follow - they both have 
> significant "ick" factors, though I wouldn't go so far as to characterize 
> either of them as "horrible". Not being "clean" after calling MPI_Finalize 
> seems just as strange.
> 
> Nathan and I did discuss the init-after-finalize issue, and he intends to 
> raise it with the Forum as it doesn't seem a logical thing to do. So that 
> issue may go away. Still leaves us pondering the right solution, and 
> hopefully coming up with something better than either of the ones we have so 
> far.
> 
> 
> On Jul 17, 2014, at 7:48 PM, George Bosilca  wrote:
> 
>> I think Case #1 is only a partial solution, as it only solves the example 
>> attached to the ticket. Based on my reading the the tool chapter calling 
>> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the 
>> patch. But this is not the major issue I have with this patch. From a coding 
>> perspective, it makes the initialization of OPAL horribly unnatural, 
>> requiring any other layer using OPAL to make a horrible gymnastic just to 
>> tear it down correctly (setting opal_init_util_init_extra to the right 
>> value).
>> 
>>   George.
>> 
>> 
>> 
>> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r  
>> wrote:
>> HI Folks,
>> 
>> I vote for solution #1.  Doesn't change current behavior.  Doesn't open the 
>> door to becoming dependent on availability of
>> ctor/dtor feature in future toolchains.
>> 
>> Howard
>> 
>> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>> Sent: Wednesday, July 16, 2014 9:08 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function 
>> to opal
>> 
>> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
>> > I discussed this over IM with Nathan to try and get a better understanding 
>> > of the options. Basically, we have two approaches available to us:
>> >
>> > 1. my solution resolves the segv problem and eliminates leaks so long as 
>> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize 
>> > functions. This method will still leak memory if the user doesn't use MPI 
>> > after calling the MPI_T functions, but does mean that all memory used by 
>> > MPI will be released upon MPI_Finalize. So if the user program continues 
>> > beyond MPI, they won't be carrying the MPI memory footprint with them. 
>> > This continues our current behavior.
>> >
>> > 2. the destructor method, which release the MPI memory footprint upon 
>> > final program termination instead of at MPI_Finalize. This also solves the 
>> > segv and leak problems, and ensures that someone calling only the MPI_T 
>> > init/finalize functions will be valgrind-clean, but means that a user 
>> > program that runs beyond MPI will carry the MPI memory footprint with 
>> > them. This is a change in our current behavior.
>> 
>> Correct. Though the only thing we will carry around until termination is the 
>> memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, 
>> opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not 
>> sure how much memory this is.
>> 
>> -Nathan
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php
>> 
>> ___
>> devel mailing