Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
I don't recommend our solution as a general approach - we moved the object instance to the framework base so it never goes out of memory. Regardless, it seems to me that proper cleanup is the better solution, although it means work. I've asked that it be added to next week's telecon agenda so we can reach some resolution. On Jul 18, 2014, at 9:35 AM, Gilles Gouaillardetwrote: > It would make sense, though I guess I always thought that was part of what > happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that > DEREGISTER would be the counter to INSTANCE, and I do want to keep this from > getting even more clunky - so maybe renaming INSTANCE to be REGISTER and > completing the initialization inside it would be the way to go. Or renaming > DEREGISTER to something more obviously the counter to INSTANCE? > > > just so we are clear : > > on one hand OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of a > function : > It *statically* initializes a struct. > > on the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked > inside a function. > > using OBJ_CLASS_REGISTER is not only about renaming, it also requires to move > all these invokations into functions. > > my idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is : > - we do not need to move OBJ_CLASS_INSTANCE into functions > - we can have two behaviours depending on OPAL_ENABLE_DEBUG : > OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero (and > opal_class_initialize would still be invoked in opal_obj_new). that could > also be a bit faster than having only one OBJ_CLASS_REGISTER macro in > optimized mode. > > that being said, i am also fine with simplifying this, remove > OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER > > > about the bug you hit, did you already solve it and how ? > a trivial workaround is not to dlclose the dynamic library (ok, that's > cheating ...) > a simple workaround (if it is even doable) is to declare the class "somewhere > else" so the (library containing the) class struct is not dlclose'd before it > is invoked (ok, that's ugly ...). > > what i wrote earlier was misleading : > OBJ_CLASS_INSTANCE(class); > foo = OBJ_NEW(class); > then > opal_class_t class_class = {...}; > foo->super.obj_class = _class; > > class_class is no more accessible when the OBJ_RELEASE is called since the > library was dlclose'd, so you do not even get a change to invoke the > destructor ... > > a possible workaround could be to malloc a copy of class_class, have > foo->super.obj_class point to it after each OBJ_NEW, and finally have its > cls_destruct_array point to NULL when closing the framework/component. > (of course that causes a leak ...) > > Cheers, > > Gilles > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15198.php
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
On Jul 18, 2014, at 10:24 AM, George Bosilcawrote: > 1. If I remember correctly, this topic has already been raised in the Forum. > And the decision was to maintain the current behavior (tools and MPI > init/fini are independent/disconnected). > > 2. Having to manually set a global flag in order to correctly finalize a > library is HORRIBLE by any reasonable CS standards. As I said in my original note, we don't have to set a global flag. All you have to do is decrement the already-existing reference counter that tracks how many times we called init_util, indicating that you are done with it so it can go ahead and truly finalize on next invocation. This is a typical symmetrical operation. All we are doing is correctly communicating to the library that we don't want it to actually tear things down at this time. > > 3. Let's not go in shadowy corners of the MPI_T usage, and stay mainstream. > Here is a partial snippet of the most usual way the tool interface is > supposed to be used. > > MPI_T_init_thread(MPI_THREAD_SINGLE, ); > ... > MPI_Init(, ); > MPI_Finalize(); > > With the proposed patch, we clean up all OPAL memory as soon as we reach > the MPI_Finalize (aka. without the call to MPI_T_finalize). Are you referring to Nathan's patch? In that case, your statement isn't correct - the destructor only gets run at the end of the user's program, and thus the OPAL memory will not be cleaned up until that time. > All MPI_T calls after MPI_Finalize will trigger a segfault. > > George. > > > > On Thu, Jul 17, 2014 at 10:55 PM, Ralph Castain wrote: > As I said, I don't know which solution is the one to follow - they both have > significant "ick" factors, though I wouldn't go so far as to characterize > either of them as "horrible". Not being "clean" after calling MPI_Finalize > seems just as strange. > > Nathan and I did discuss the init-after-finalize issue, and he intends to > raise it with the Forum as it doesn't seem a logical thing to do. So that > issue may go away. Still leaves us pondering the right solution, and > hopefully coming up with something better than either of the ones we have so > far. > > > On Jul 17, 2014, at 7:48 PM, George Bosilca wrote: > >> I think Case #1 is only a partial solution, as it only solves the example >> attached to the ticket. Based on my reading the the tool chapter calling >> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the >> patch. But this is not the major issue I have with this patch. From a coding >> perspective, it makes the initialization of OPAL horribly unnatural, >> requiring any other layer using OPAL to make a horrible gymnastic just to >> tear it down correctly (setting opal_init_util_init_extra to the right >> value). >> >> George. >> >> >> >> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r >> wrote: >> HI Folks, >> >> I vote for solution #1. Doesn't change current behavior. Doesn't open the >> door to becoming dependent on availability of >> ctor/dtor feature in future toolchains. >> >> Howard >> >> >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm >> Sent: Wednesday, July 16, 2014 9:08 AM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function >> to opal >> >> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote: >> > I discussed this over IM with Nathan to try and get a better understanding >> > of the options. Basically, we have two approaches available to us: >> > >> > 1. my solution resolves the segv problem and eliminates leaks so long as >> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize >> > functions. This method will still leak memory if the user doesn't use MPI >> > after calling the MPI_T functions, but does mean that all memory used by >> > MPI will be released upon MPI_Finalize. So if the user program continues >> > beyond MPI, they won't be carrying the MPI memory footprint with them. >> > This continues our current behavior. >> > >> > 2. the destructor method, which release the MPI memory footprint upon >> > final program termination instead of at MPI_Finalize. This also solves the >> > segv and leak problems, and ensures that someone calling only the MPI_T >> > init/finalize functions will be valgrind-clean, but means that a user >> > program that runs beyond MPI will carry the MPI memory footprint with >> > them. This is a change in our current behavior. >> >> Correct. Though the only thing we will carry around until termination is the >> memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, >> opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not >> sure how much memory this is. >> >> -Nathan >> ___ >>
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
1. If I remember correctly, this topic has already been raised in the Forum. And the decision was to maintain the current behavior (tools and MPI init/fini are independent/disconnected). 2. Having to manually set a global flag in order to correctly finalize a library is HORRIBLE by any reasonable CS standards. 3. Let's not go in shadowy corners of the MPI_T usage, and stay mainstream. Here is a partial snippet of the most usual way the tool interface is supposed to be used. MPI_T_init_thread(MPI_THREAD_SINGLE, ); ... MPI_Init(, ); MPI_Finalize(); With the proposed patch, we clean up all OPAL memory as soon as we reach the MPI_Finalize (aka. without the call to MPI_T_finalize). All MPI_T calls after MPI_Finalize will trigger a segfault. George. On Thu, Jul 17, 2014 at 10:55 PM, Ralph Castainwrote: > As I said, I don't know which solution is the one to follow - they both > have significant "ick" factors, though I wouldn't go so far as to > characterize either of them as "horrible". Not being "clean" after calling > MPI_Finalize seems just as strange. > > Nathan and I did discuss the init-after-finalize issue, and he intends to > raise it with the Forum as it doesn't seem a logical thing to do. So that > issue may go away. Still leaves us pondering the right solution, and > hopefully coming up with something better than either of the ones we have > so far. > > > On Jul 17, 2014, at 7:48 PM, George Bosilca wrote: > > I think Case #1 is only a partial solution, as it only solves the example > attached to the ticket. Based on my reading the the tool chapter calling > MPI_T_init after MPI_Finalize is legit, and this case is not covered by the > patch. But this is not the major issue I have with this patch. From a > coding perspective, it makes the initialization of OPAL horribly unnatural, > requiring any other layer using OPAL to make a horrible gymnastic just to > tear it down correctly (setting opal_init_util_init_extra to the right > value). > > George. > > > > On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r > wrote: > >> HI Folks, >> >> I vote for solution #1. Doesn't change current behavior. Doesn't open >> the door to becoming dependent on availability of >> ctor/dtor feature in future toolchains. >> >> Howard >> >> >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm >> Sent: Wednesday, July 16, 2014 9:08 AM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) >> function to opal >> >> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote: >> > I discussed this over IM with Nathan to try and get a better >> understanding of the options. Basically, we have two approaches available >> to us: >> > >> > 1. my solution resolves the segv problem and eliminates leaks so long >> as the user calls MPI_Init/Finalize after calling the MPI_T init/finalize >> functions. This method will still leak memory if the user doesn't use MPI >> after calling the MPI_T functions, but does mean that all memory used by >> MPI will be released upon MPI_Finalize. So if the user program continues >> beyond MPI, they won't be carrying the MPI memory footprint with them. This >> continues our current behavior. >> > >> > 2. the destructor method, which release the MPI memory footprint upon >> final program termination instead of at MPI_Finalize. This also solves the >> segv and leak problems, and ensures that someone calling only the MPI_T >> init/finalize functions will be valgrind-clean, but means that a user >> program that runs beyond MPI will carry the MPI memory footprint with them. >> This is a change in our current behavior. >> >> Correct. Though the only thing we will carry around until termination is >> the memory associated with opal/mca/if, opal/mca/event, opal_net, >> opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and >> opal_class. Not sure how much memory this is. >> >> -Nathan >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php >> > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15193.php > > > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15194.php >
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
> > It would make sense, though I guess I always thought that was part of what > happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that > DEREGISTER would be the counter to INSTANCE, and I do want to keep this > from getting even more clunky - so maybe renaming INSTANCE to be REGISTER > and completing the initialization inside it would be the way to go. Or > renaming DEREGISTER to something more obviously the counter to INSTANCE? > > just so we are clear : on one hand OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of a function : It *statically* initializes a struct. on the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked inside a function. using OBJ_CLASS_REGISTER is not only about renaming, it also requires to move all these invokations into functions. my idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is : - we do not need to move OBJ_CLASS_INSTANCE into functions - we can have two behaviours depending on OPAL_ENABLE_DEBUG : OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero (and opal_class_initialize would still be invoked in opal_obj_new). that could also be a bit faster than having only one OBJ_CLASS_REGISTER macro in optimized mode. that being said, i am also fine with simplifying this, remove OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER about the bug you hit, did you already solve it and how ? a trivial workaround is not to dlclose the dynamic library (ok, that's cheating ...) a simple workaround (if it is even doable) is to declare the class "somewhere else" so the (library containing the) class struct is not dlclose'd before it is invoked (ok, that's ugly ...). what i wrote earlier was misleading : OBJ_CLASS_INSTANCE(class); foo = OBJ_NEW(class); then opal_class_t class_class = {...}; foo->super.obj_class = _class; class_class is no more accessible when the OBJ_RELEASE is called since the library was dlclose'd, so you do not even get a change to invoke the destructor ... a possible workaround could be to malloc a copy of class_class, have foo->super.obj_class point to it after each OBJ_NEW, and finally have its cls_destruct_array point to NULL when closing the framework/component. (of course that causes a leak ...) Cheers, Gilles
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
On Jul 18, 2014, at 8:25 AM, Gilles Gouaillardetwrote: > +1 for the overall idea ! > > On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castain wrote: >> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be matched >> by deregister at close of the framework/component that instanced it. Of >> course, that requires that we protect the class system against someone >> releasing/deconstructing an object after the class was deregistered since we >> don't know who might be using that class outside of where it was created. >> > > my understanding is that in theory, we already have an issue and fortunatly, > we do not hit it : > let's consider a framework/component that instanciate a class > (OBJ_CLASS_INSTANCE) *with a destructor*, allocate an object of this class > (OBJ_NEW) and expects "someone else" will free it (OBJ_RELEASE) > if this framework/component ends up in a dynamic library that is dlclose'd > when the framework/component is no more used, then OBJ_RELEASE will try to > call the destructor which is no more accessible (since the lib was dlclose'd) FWIW: Intel has hit that exact scenario in our testing and got a glorious segv out of it. We now have an assert for NULL in the base object macro's to warn us to fix it (which I can provide for review if we want to include it in our repo). > > i could not experience such a scenario yet, and of course, this does not mean > there is no problem. i experienced a "kind of" similar situation described in > http://www.open-mpi.org/community/lists/devel/2014/06/14937.php > > back to OBJ_CLASS_DEREGISTER, what about an OBJ_CLASS_REGISTER in order to > make this symmetric and easier to debug ? > > currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a > given class is allocated. from opal_obj_new : > if (0 == cls->cls_initialized) opal_class_initialize(cls); > > that could be replaced by an error if 0 == cls->cls_initialized > and OBJ_CLASS_REGISTER would simply call opal_class_initialize It would make sense, though I guess I always thought that was part of what happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that DEREGISTER would be the counter to INSTANCE, and I do want to keep this from getting even more clunky - so maybe renaming INSTANCE to be REGISTER and completing the initialization inside it would be the way to go. Or renaming DEREGISTER to something more obviously the counter to INSTANCE? > > of course, this change could be implemented only when compiled > with OPAL_ENABLE_DEBUG > > Cheers, > > Gilles > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15196.php
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
+1 for the overall idea ! On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castainwrote: > > * add an OBJ_CLASS_DEREGISTER and require that all instantiations be > matched by deregister at close of the framework/component that instanced > it. Of course, that requires that we protect the class system against > someone releasing/deconstructing an object after the class was deregistered > since we don't know who might be using that class outside of where it was > created. > > my understanding is that in theory, we already have an issue and fortunatly, we do not hit it : let's consider a framework/component that instanciate a class (OBJ_CLASS_INSTANCE) *with a destructor*, allocate an object of this class (OBJ_NEW) and expects "someone else" will free it (OBJ_RELEASE) if this framework/component ends up in a dynamic library that is dlclose'd when the framework/component is no more used, then OBJ_RELEASE will try to call the destructor which is no more accessible (since the lib was dlclose'd) i could not experience such a scenario yet, and of course, this does not mean there is no problem. i experienced a "kind of" similar situation described in http://www.open-mpi.org/community/lists/devel/2014/06/14937.php back to OBJ_CLASS_DEREGISTER, what about an OBJ_CLASS_REGISTER in order to make this symmetric and easier to debug ? currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a given class is allocated. from opal_obj_new : if (0 == cls->cls_initialized) opal_class_initialize(cls); that could be replaced by an error if 0 == cls->cls_initialized and OBJ_CLASS_REGISTER would simply call opal_class_initialize of course, this change could be implemented only when compiled with OPAL_ENABLE_DEBUG Cheers, Gilles
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
I'm going to resurface this suggestion. Is there some reason why this wouldn't be the way to resolve the problem? > * add an OBJ_CLASS_DEREGISTER and require that all instantiations be matched > by deregister at close of the framework/component that instanced it. Of > course, that requires that we protect the class system against someone > releasing/deconstructing an object after the class was deregistered since we > don't know who might be using that class outside of where it was created. > > * ensure each framework/component "deregisters" every declared MCA param when > finalizing/closing > > * ensure every framework close gets called, and that every framework properly > closes all its components. We especially need to ensure that components that > were opened but not selected get closed! I'm asking because it is apparent that this issue of reinitializing is going to recur under a variety of scenarios. The two methods we've discussed so far are really just bandaids - all we are doing is avoiding actually "finalizing" via different mechanisms. The root problem is that we *don't* cleanly finalize OPAL. So why not address that problem? There is no technical reason we can't cleanly finalize the OPAL layer - is the root issue that we're just unwilling to make the effort to do it? On Jul 17, 2014, at 7:55 PM, Ralph Castainwrote: > As I said, I don't know which solution is the one to follow - they both have > significant "ick" factors, though I wouldn't go so far as to characterize > either of them as "horrible". Not being "clean" after calling MPI_Finalize > seems just as strange. > > Nathan and I did discuss the init-after-finalize issue, and he intends to > raise it with the Forum as it doesn't seem a logical thing to do. So that > issue may go away. Still leaves us pondering the right solution, and > hopefully coming up with something better than either of the ones we have so > far. > > > On Jul 17, 2014, at 7:48 PM, George Bosilca wrote: > >> I think Case #1 is only a partial solution, as it only solves the example >> attached to the ticket. Based on my reading the the tool chapter calling >> MPI_T_init after MPI_Finalize is legit, and this case is not covered by the >> patch. But this is not the major issue I have with this patch. From a coding >> perspective, it makes the initialization of OPAL horribly unnatural, >> requiring any other layer using OPAL to make a horrible gymnastic just to >> tear it down correctly (setting opal_init_util_init_extra to the right >> value). >> >> George. >> >> >> >> On Wed, Jul 16, 2014 at 11:29 AM, Pritchard, Howard r >> wrote: >> HI Folks, >> >> I vote for solution #1. Doesn't change current behavior. Doesn't open the >> door to becoming dependent on availability of >> ctor/dtor feature in future toolchains. >> >> Howard >> >> >> -Original Message- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm >> Sent: Wednesday, July 16, 2014 9:08 AM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function >> to opal >> >> On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote: >> > I discussed this over IM with Nathan to try and get a better understanding >> > of the options. Basically, we have two approaches available to us: >> > >> > 1. my solution resolves the segv problem and eliminates leaks so long as >> > the user calls MPI_Init/Finalize after calling the MPI_T init/finalize >> > functions. This method will still leak memory if the user doesn't use MPI >> > after calling the MPI_T functions, but does mean that all memory used by >> > MPI will be released upon MPI_Finalize. So if the user program continues >> > beyond MPI, they won't be carrying the MPI memory footprint with them. >> > This continues our current behavior. >> > >> > 2. the destructor method, which release the MPI memory footprint upon >> > final program termination instead of at MPI_Finalize. This also solves the >> > segv and leak problems, and ensures that someone calling only the MPI_T >> > init/finalize functions will be valgrind-clean, but means that a user >> > program that runs beyond MPI will carry the MPI memory footprint with >> > them. This is a change in our current behavior. >> >> Correct. Though the only thing we will carry around until termination is the >> memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, >> opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not >> sure how much memory this is. >> >> -Nathan >> ___ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/07/15172.php >> >> ___ >> devel mailing