I see several main categories of leaks at the current head (I did not test before Ralph's changes):
- a bunch of RTE leaks --> which is think is what Ralph is trying to pare down - OMPI pre-defined attribute leaks --> This will take a little thinking to fix; it's complicated - OMPI pre-defined datatype leaks --> George says these can't be fixed, but I don't believe it :-) - BML cleanup leaks --> Not sure what's happening here yet - PML allocator leaks (ob1, csum, bfo) --> it looks like much is allocated (including the allocator) during component_open, but it is only freed during component_fini -- NOT during component_close. So unselected PMLs (e.g., csum, bfo) have leaks. This seems like a mistake -- should that which is freed in component_fini be moved to component_close? - some miscellaneous leaks --> I just committed fixes for these in r24865 On Jul 8, 2011, at 9:37 AM, George Bosilca wrote: > What memory leaks? My valgrind claims there are basically none at the MPI > level, so I'm wondering ... > > First, let's make sure everyone understand why there are two initialization > functions in OPAL. One has to call opal_init_util before anything else > otherwise no access to utilities dealing with the command line, > configurations file, output system, help and SOS stuff and so on will be > available. Initializing OPAL adds all the other frameworks: the MCA base, > memory, backtrace, carto and friends. > > Now let's look how this is used from the MPI perspective: > > ompi_mpi_init.c: 309 -> opal_init_util > > ompi_mpi_init.c:357 -> orte_init > > orte_init.c:81 -> opal_init > > opal_init.c:335 -> opal_init_util > > ------------------------------------ > > ompi_mpi_finalize.c:430 -> orte_finalize > > orte_finalize.c:42 -> opal_finalize > > opal_finalize.c:163 -> opal_finalize_util > > ompi_mpi_finalize.c:434 -> opal_finalize_util > > So we have opal_init * 1 and opal_util * 2. Clearly the opal util is not a > simple ON/OFF stuff. With Ralph patch the OPAL utilities will disappear as > soon as the OMPI layer call orte_fini. Luckily, today there is nothing > between the call to orte_fini and opal_finalize_util, so we're safe from a > segfault. > > Moreover, from a software engineering point of view there are two choices for > allowing library composition (ORTE using OPAL, OMPI using ORTE and OPAL, > something else using OMPI and ORTE and OPAL). Either you do the management at > the lowest level using counters, or you provide accessors to check the > init/fini state of the library and do the management at the upper level > (similar to the MPI library). In Open MPI and this for the last 7 years we > chose the first approach. And so far there was no compelling case to switch. > > george. > > On Jul 8, 2011, at 14:42 , Jeff Squyres wrote: > >> Developers -- >> >> Ralph and I talked about this one the other day; he's trying to plug some >> memory leaks during finalize. >> >> Before this commit, there was a counter that counts how many times >> opal_init*() is invoked. opal_finalize() decremented the counter, but >> didn't actually do anything else until the counter went to zero. The >> original intent was that we should call finalize exactly as many times as we >> call initialize, but it appears that we didn't do that (e.g., we call >> opal_init_util() and then opal_init(), but then only call opal_finalize() >> once). >> >> Does anyone have a reason NOT to convert the init/finalize counter to a >> simple boolean? (i.e., can you provide a reason to back this change out?) >> >> >> >> On Jul 8, 2011, at 2:43 AM, r...@osl.iu.edu wrote: >> >>> Author: rhc >>> Date: 2011-07-08 02:43:19 EDT (Fri, 08 Jul 2011) >>> New Revision: 24862 >>> URL: https://svn.open-mpi.org/trac/ompi/changeset/24862 >>> >>> Log: >>> Replace a useless counter with a boolean check to see if we have already >>> passed thru opal_finalize so we don't call finalize, and then don't pass >>> thru it (as was happening on several tools) >>> >>> Text files modified: >>> trunk/opal/runtime/opal_finalize.c | 16 ++++++---------- >>> >>> trunk/opal/runtime/opal_init.c | 16 ++++++---------- >>> >>> trunk/orte/tools/orterun/orterun.c | 4 ---- >>> >>> 3 files changed, 12 insertions(+), 24 deletions(-) >>> >>> Modified: trunk/opal/runtime/opal_finalize.c >>> ============================================================================== >>> --- trunk/opal/runtime/opal_finalize.c (original) >>> +++ trunk/opal/runtime/opal_finalize.c 2011-07-08 02:43:19 EDT (Fri, >>> 08 Jul 2011) >>> @@ -54,18 +54,16 @@ >>> #include "opal/runtime/opal_cr.h" >>> #include "opal/mca/crs/base/base.h" >>> >>> -extern int opal_initialized; >>> -extern int opal_util_initialized; >>> +extern bool opal_initialized; >>> +extern bool opal_util_initialized; >>> >>> int >>> opal_finalize_util(void) >>> { >>> - if( --opal_util_initialized != 0 ) { >>> - if( opal_util_initialized < 0 ) { >>> - return OPAL_ERROR; >>> - } >>> + if (!opal_util_initialized) { >>> return OPAL_SUCCESS; >>> } >>> + opal_util_initialized = false; >>> >>> /* Clear out all the registered MCA params */ >>> mca_base_param_finalize(); >>> @@ -111,12 +109,10 @@ >>> int >>> opal_finalize(void) >>> { >>> - if( --opal_initialized != 0 ) { >>> - if( opal_initialized < 0 ) { >>> - return OPAL_ERROR; >>> - } >>> + if (!opal_initialized) { >>> return OPAL_SUCCESS; >>> } >>> + opal_initialized = false; >>> >>> /* close the checkpoint and restart service */ >>> opal_cr_finalize(); >>> >>> Modified: trunk/opal/runtime/opal_init.c >>> ============================================================================== >>> --- trunk/opal/runtime/opal_init.c (original) >>> +++ trunk/opal/runtime/opal_init.c 2011-07-08 02:43:19 EDT (Fri, 08 Jul >>> 2011) >>> @@ -68,8 +68,8 @@ >>> #endif >>> const char opal_version_string[] = OPAL_IDENT_STRING; >>> >>> -int opal_initialized = 0; >>> -int opal_util_initialized = 0; >>> +bool opal_initialized = false; >>> +bool opal_util_initialized = false; >>> int opal_cache_line_size; >>> >>> static int >>> @@ -219,12 +219,10 @@ >>> int ret; >>> char *error = NULL; >>> >>> - if( ++opal_util_initialized != 1 ) { >>> - if( opal_util_initialized < 1 ) { >>> - return OPAL_ERROR; >>> - } >>> + if (opal_util_initialized) { >>> return OPAL_SUCCESS; >>> } >>> + opal_util_initialized = true; >>> >>> /* JMS See note in runtime/opal.h -- this is temporary; to be >>> replaced with real hwloc information soon (in trunk/v1.5 and >>> @@ -324,12 +322,10 @@ >>> int ret; >>> char *error = NULL; >>> >>> - if( ++opal_initialized != 1 ) { >>> - if( opal_initialized < 1 ) { >>> - return OPAL_ERROR; >>> - } >>> + if (opal_initialized) { >>> return OPAL_SUCCESS; >>> } >>> + opal_initialized = true; >>> >>> /* initialize util code */ >>> if (OPAL_SUCCESS != (ret = opal_init_util(pargc, pargv))) { >>> >>> Modified: trunk/orte/tools/orterun/orterun.c >>> ============================================================================== >>> --- trunk/orte/tools/orterun/orterun.c (original) >>> +++ trunk/orte/tools/orterun/orterun.c 2011-07-08 02:43:19 EDT (Fri, >>> 08 Jul 2011) >>> @@ -618,10 +618,6 @@ >>> ORTE_ERROR_LOG(rc); >>> return rc; >>> } >>> - /* finalize the OPAL utils. As they are opened again from >>> orte_init->opal_init >>> - * we continue to have a reference count on them. So we have to >>> finalize them twice... >>> - */ >>> - opal_finalize_util(); >>> >>> /* check for request to report uri */ >>> if (NULL != orterun_globals.report_uri) { >>> _______________________________________________ >>> svn-full mailing list >>> svn-f...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > _______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/