Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
*An enhancement to permit some form of delimiter escaping would probably still be nice, but is low priority.* [Josh] Not a problem, Dave. We will do this. On Wed, Jul 16, 2014 at 4:32 PM, Dave Goodell (dgoodell)wrote: > On Jul 16, 2014, at 3:08 PM, Joshua Ladd wrote: > > > Ralph warned me that no matter what decision we made, someone would > probably violently object. So, with that in mind, let me put my diplomat > hat on... > > FWIW, I don't think my objections here have been "violent". > > > Dave, I'm sorry you view this as a "crapification" of your mpirun user > interface. Your lament is duly noted and we are happy to work with you to > come to (yet another) happy compromise. It's unfortunate that the issue > wasn't raised during the dev meeting when these decisions were made (you > were sitting right next to me while I discussed this out loud with Jeff and > Ralph.) At the time, it seemed all interested parties had expressed their > concern and voiced their opinion on the matter, and the implementation you > see in the trunk today, including the decision to deprecate the "-x" > option, was the generally agreed upon consensus. > > I think I misunderstood that the "-x" option was going away altogether, so > sorry for the after-the-fact complaints. Probably poor listening on my > part, but I don't think that changes my opinion on the UI impact. > > I'm aware that I'm in the minority here. I don't think that I care enough > to continue arguing over this, so let's just let it be as-is. An > enhancement to permit some form of delimiter escaping would probably still > be nice, but is low priority. > > -Dave > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15185.php >
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
On Jul 16, 2014, at 3:08 PM, Joshua Laddwrote: > Ralph warned me that no matter what decision we made, someone would probably > violently object. So, with that in mind, let me put my diplomat hat on... FWIW, I don't think my objections here have been "violent". > Dave, I'm sorry you view this as a "crapification" of your mpirun user > interface. Your lament is duly noted and we are happy to work with you to > come to (yet another) happy compromise. It's unfortunate that the issue > wasn't raised during the dev meeting when these decisions were made (you were > sitting right next to me while I discussed this out loud with Jeff and > Ralph.) At the time, it seemed all interested parties had expressed their > concern and voiced their opinion on the matter, and the implementation you > see in the trunk today, including the decision to deprecate the "-x" option, > was the generally agreed upon consensus. I think I misunderstood that the "-x" option was going away altogether, so sorry for the after-the-fact complaints. Probably poor listening on my part, but I don't think that changes my opinion on the UI impact. I'm aware that I'm in the minority here. I don't think that I care enough to continue arguing over this, so let's just let it be as-is. An enhancement to permit some form of delimiter escaping would probably still be nice, but is low priority. -Dave
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
Ralph warned me that no matter what decision we made, someone would probably violently object. So, with that in mind, let me put my diplomat hat on... Dave, I'm sorry you view this as a "crapification" of your mpirun user interface. Your lament is duly noted and we are happy to work with you to come to (yet another) happy compromise. It's unfortunate that the issue wasn't raised during the dev meeting when these decisions were made (you were sitting right next to me while I discussed this out loud with Jeff and Ralph.) At the time, it seemed all interested parties had expressed their concern and voiced their opinion on the matter, and the implementation you see in the trunk today, including the decision to deprecate the "-x" option, was the generally agreed upon consensus. Josh On Wed, Jul 16, 2014 at 2:43 PM, Ralph Castainwrote: > > On Jul 16, 2014, at 11:34 AM, Dave Goodell (dgoodell) > wrote: > > > On Jul 16, 2014, at 12:15 PM, Mike Dubman > wrote: > > > >> we have a strong use-case for list of env variables passed as mca > params.(it was presented and discussed in the past). > > > > I'm not disputing your use case for "mca_base_env_list". I'm only > lamenting the crapification of our mpirun user interface. We had an > earlier change along these lines based on similar reasoning: the > easy-to-use-and-remember "--pernode N" was replaced with "--map-by > ppr:N:node", which I cannot ever seem to remember. > > Same reason, I'm afraid - no way to do the mapping > > > > >> we can rename opal_base_envlist as "-mca x var=val" for consistency. > >> also, "-x" param now is just an alias for "-mca opal_base_envlist > var=val" - so, we can keep it (w/o deprecation warning) as it re-uses same > infra. > > > > Mike, will you make this change? > > > > Thanks, > > -Dave > > ___ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15181.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15182.php >
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelmwrote: > Correction. xlc does support the destructor function attribute. The odd > one out is PGI. > Are the Solaris Studio compilers still of interest to the Open MPI community? If so, I've confirmed support using the following 5-line test on a Solaris-10/SPARC platform. #include int X = 0; __attribute__((__constructor__)) void hello(void) { X = 42; } __attribute__((__destructor__)) void goodbye(void) { printf("X = %d\n", X); } int main(void) { return 0; } -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
On Jul 16, 2014, at 11:34 AM, Dave Goodell (dgoodell)wrote: > On Jul 16, 2014, at 12:15 PM, Mike Dubman wrote: > >> we have a strong use-case for list of env variables passed as mca params.(it >> was presented and discussed in the past). > > I'm not disputing your use case for "mca_base_env_list". I'm only lamenting > the crapification of our mpirun user interface. We had an earlier change > along these lines based on similar reasoning: the easy-to-use-and-remember > "--pernode N" was replaced with "--map-by ppr:N:node", which I cannot ever > seem to remember. Same reason, I'm afraid - no way to do the mapping > >> we can rename opal_base_envlist as "-mca x var=val" for consistency. >> also, "-x" param now is just an alias for "-mca opal_base_envlist var=val" - >> so, we can keep it (w/o deprecation warning) as it re-uses same infra. > > Mike, will you make this change? > > Thanks, > -Dave > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15181.php
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
On Jul 16, 2014, at 12:15 PM, Mike Dubmanwrote: > we have a strong use-case for list of env variables passed as mca params.(it > was presented and discussed in the past). I'm not disputing your use case for "mca_base_env_list". I'm only lamenting the crapification of our mpirun user interface. We had an earlier change along these lines based on similar reasoning: the easy-to-use-and-remember "--pernode N" was replaced with "--map-by ppr:N:node", which I cannot ever seem to remember. > we can rename opal_base_envlist as "-mca x var=val" for consistency. > also, "-x" param now is just an alias for "-mca opal_base_envlist var=val" - > so, we can keep it (w/o deprecation warning) as it re-uses same infra. Mike, will you make this change? Thanks, -Dave
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
On Jul 16, 2014, at 12:27 PM, Joshua Laddwrote: > Dave, > > Your example will error out. If someone tries to set envars with both > mechanism, the job fails. The decision to do so was also made at the Dev > meeting and is so that we don't have to do this kind of checking. Hmm, indeed it does. I had mis-installed my OMPI build when I first tested this. I also didn't expect that setting "-mca mca_base_env_list foo=quux" on the command line also would cause the environment to be updated for the code I previously quoted. Apologies for the false bug alarm. -Dave
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
Dave, Your example will error out. If someone tries to set envars with both mechanism, the job fails. The decision to do so was also made at the Dev meeting and is so that we don't have to do this kind of checking. Josh On Wed, Jul 16, 2014 at 12:22 PM, Dave Goodell (dgoodell) < dgood...@cisco.com> wrote: > On Jul 15, 2014, at 2:03 PM, Mike Dubmanwrote: > > > these are two separate issues: > > > > 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same > way > > opal_base_envlist does the same as "-x" and can be used in the very same > fashion as -x > > > > 2. When list of vars is passed with help of opal_base_envlist, the > escaping is possible but escaped char should differ from delimiter char. > > That would be my preference (use something like "\" as the escape char). > Though we could always go with a scheme where a doubled delimiter means > "literal delimiter", sort of like "$$" in a Makefile. > > > I think -x should stay as shotrt-form alias for -mca opal_base_envlist > var=val and -mca opal_base_envlist var. > > on dev meeting it was decided to deprecate it as some point. > > Can we revisit this decision? Mike and I both seem to be in favor of > retaining "-x", at least for non-conflicting uses. Would someone who is > against retaining "-x" please speak up in favor of that position? > > Also, Mike, I just looked again at the code and I don't think it is > robustly checking for conflict cases. It's possible to do this and you > won't get an error with the current code, right? > > 8< > $ mpirun -mca mca_base_env_list foo=bar -x foo=baz ... > 8< > > See this code, which only looks at the environment when looking for > "mca_base_env_list": > > > Modified: trunk/orte/tools/orterun/orterun.c > > > == > > --- trunk/orte/tools/orterun/orterun.cTue Jul 8 20:10:04 2014 > (r32162) > > +++ trunk/orte/tools/orterun/orterun.c2014-07-08 20:38:25 EDT > (Tue, 08 Jul 2014) (r32163) > > @@ -1722,6 +1722,13 @@ > > > > /* Did the user request to export any environment variables on the > cmd line? */ > > if (opal_cmd_line_is_taken(_line, "x")) { > > +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list"); > > +if (NULL != env_set_flag) { > > +orte_show_help("help-orterun.txt", > "orterun:conflict-env-set", false); > > +return ORTE_ERR_FATAL; > > +} else { > > +orte_show_help("help-orterun.txt", > "orterun:deprecated-env-set", false); > > +} > > j = opal_cmd_line_get_ninsts(_line, "x"); > > for (i = 0; i < j; ++i) { > > param = opal_cmd_line_get_param(_line, "x", i, 0); > > > -Dave > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15173.php >
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
we have a strong use-case for list of env variables passed as mca params.(it was presented and discussed in the past). we can rename opal_base_envlist as "-mca x var=val" for consistency. also, "-x" param now is just an alias for "-mca opal_base_envlist var=val" - so, we can keep it (w/o deprecation warning) as it re-uses same infra. On Wed, Jul 16, 2014 at 8:10 PM, Ralph Castainwrote: > Don't look at me - I didn't care to begin with as I never use -x and won't > be using this param! :-) > > My point was only that having both is clunky and leads to potential > conflict. No good solution, so > > > On Jul 16, 2014, at 9:57 AM, Dave Goodell (dgoodell) > wrote: > > > On Jul 16, 2014, at 11:31 AM, Ralph Castain wrote: > > > >> Nobody was "against" retaining it. The issue is that "-x" isn't an MCA > parameter, nor does it get translated to one under the covers. So the > problem was one of how to insert it into the typical MCA param precedence > chain. > > > > I understand the combination of the two features is clunky and could > lead to odd corner cases, but the "-x" argument is a feature I actually use > on a fairly regular basis, but I am unlikely to use mca_base_env_list > unless given no other choice. It's just a worse, clunkier interface unless > one really needs to set that MCA parameter via environment variable. > > > > So can we just strike the deprecation warning that is currently issued > when "-x" is passed in the absence of "mca_base_env_list"? > > > > -Dave > > > > ___ > > devel mailing list > > de...@open-mpi.org > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15176.php > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15177.php >
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
Don't look at me - I didn't care to begin with as I never use -x and won't be using this param! :-) My point was only that having both is clunky and leads to potential conflict. No good solution, so On Jul 16, 2014, at 9:57 AM, Dave Goodell (dgoodell)wrote: > On Jul 16, 2014, at 11:31 AM, Ralph Castain wrote: > >> Nobody was "against" retaining it. The issue is that "-x" isn't an MCA >> parameter, nor does it get translated to one under the covers. So the >> problem was one of how to insert it into the typical MCA param precedence >> chain. > > I understand the combination of the two features is clunky and could lead to > odd corner cases, but the "-x" argument is a feature I actually use on a > fairly regular basis, but I am unlikely to use mca_base_env_list unless given > no other choice. It's just a worse, clunkier interface unless one really > needs to set that MCA parameter via environment variable. > > So can we just strike the deprecation warning that is currently issued when > "-x" is passed in the absence of "mca_base_env_list"? > > -Dave > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15176.php
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
On Jul 16, 2014, at 11:31 AM, Ralph Castainwrote: > Nobody was "against" retaining it. The issue is that "-x" isn't an MCA > parameter, nor does it get translated to one under the covers. So the problem > was one of how to insert it into the typical MCA param precedence chain. I understand the combination of the two features is clunky and could lead to odd corner cases, but the "-x" argument is a feature I actually use on a fairly regular basis, but I am unlikely to use mca_base_env_list unless given no other choice. It's just a worse, clunkier interface unless one really needs to set that MCA parameter via environment variable. So can we just strike the deprecation warning that is currently issued when "-x" is passed in the absence of "mca_base_env_list"? -Dave
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
What about providing garbage collection for both POSIX and MPI threads? This problem hints at several underlying layers of "programming faults". -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Wednesday, July 16, 2014 8:59 AM To: Open MPI Developers Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal I discussed this over IM with Nathan to try and get a better understanding of the options. Basically, we have two approaches available to us: 1. my solution resolves the segv problem and eliminates leaks so long as the user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. This method will still leak memory if the user doesn't use MPI after calling the MPI_T functions, but does mean that all memory used by MPI will be released upon MPI_Finalize. So if the user program continues beyond MPI, they won't be carrying the MPI memory footprint with them. This continues our current behavior. 2. the destructor method, which release the MPI memory footprint upon final program termination instead of at MPI_Finalize. This also solves the segv and leak problems, and ensures that someone calling only the MPI_T init/finalize functions will be valgrind-clean, but means that a user program that runs beyond MPI will carry the MPI memory footprint with them. This is a change in our current behavior. I'm not sure which approach is best, but I think this captures the heart of the decision. On Jul 16, 2014, at 7:36 AM, Nathan Hjelmwrote: > On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote: >> A number of issues have been raised as part of this discussion. Here >> is what I have seen so far: >> >> - contructor/destructor order not garaunteed: From an opal perspective >> this should not be a problem. Most components are unloaded by >> opal_finalize () not opal_finalize_util (). So opal components >> opal should already be finalized by the time the destructor is called >> (or we can finalize them in the destructor if necessary). >> >> - portability: All the compilers most of us care about: gcc, intel, >> clang. The exceptions appear to be xlc and pgi. For these compilers >> we can fall back on Ralph's solution and just leak if >> MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an >> implementation that does that (needs some adjustment). > > Correction. xlc does support the destructor function attribute. The > odd one out is PGI. > > -Nathan > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15168.php ___ devel mailing list de...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15170.php - No virus found in this message. Checked by AVG - www.avg.com Version: 2014.0.4716 / Virus Database: 3986/7863 - Release Date: 07/16/14
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
Nobody was "against" retaining it. The issue is that "-x" isn't an MCA parameter, nor does it get translated to one under the covers. So the problem was one of how to insert it into the typical MCA param precedence chain. If we don't try to do that, but instead simply error out if both -x and the MCA param are present, then that resolves the problem. However, and this is a big "but", remember that there is no way to "unset" an MCA param. So if someone puts the mca_base_env_list param into the default MCA param file, then there is no way a user can put -x on the cmd line because the conflict will be detected - and they can't turn "off" the MCA param to resolve it. The inconsistent behavior may prove highly annoying in the user community, but I don't know how widely used this feature will be. On Jul 16, 2014, at 9:22 AM, Dave Goodell (dgoodell)wrote: > On Jul 15, 2014, at 2:03 PM, Mike Dubman wrote: > >> these are two separate issues: >> >> 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way >> opal_base_envlist does the same as "-x" and can be used in the very same >> fashion as -x >> >> 2. When list of vars is passed with help of opal_base_envlist, the escaping >> is possible but escaped char should differ from delimiter char. > > That would be my preference (use something like "\" as the escape char). > Though we could always go with a scheme where a doubled delimiter means > "literal delimiter", sort of like "$$" in a Makefile. > >> I think -x should stay as shotrt-form alias for -mca opal_base_envlist >> var=val and -mca opal_base_envlist var. >> on dev meeting it was decided to deprecate it as some point. > > Can we revisit this decision? Mike and I both seem to be in favor of > retaining "-x", at least for non-conflicting uses. Would someone who is > against retaining "-x" please speak up in favor of that position? > > Also, Mike, I just looked again at the code and I don't think it is robustly > checking for conflict cases. It's possible to do this and you won't get an > error with the current code, right? > > 8< > $ mpirun -mca mca_base_env_list foo=bar -x foo=baz ... > 8< > > See this code, which only looks at the environment when looking for > "mca_base_env_list": > >> Modified: trunk/orte/tools/orterun/orterun.c >> == >> --- trunk/orte/tools/orterun/orterun.c Tue Jul 8 20:10:04 2014 >> (r32162) >> +++ trunk/orte/tools/orterun/orterun.c 2014-07-08 20:38:25 EDT (Tue, >> 08 Jul 2014) (r32163) >> @@ -1722,6 +1722,13 @@ >> >> /* Did the user request to export any environment variables on the cmd >> line? */ >> if (opal_cmd_line_is_taken(_line, "x")) { >> +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list"); >> +if (NULL != env_set_flag) { >> +orte_show_help("help-orterun.txt", "orterun:conflict-env-set", >> false); >> +return ORTE_ERR_FATAL; >> +} else { >> +orte_show_help("help-orterun.txt", >> "orterun:deprecated-env-set", false); >> +} >> j = opal_cmd_line_get_ninsts(_line, "x"); >> for (i = 0; i < j; ++i) { >> param = opal_cmd_line_get_param(_line, "x", i, 0); > > > -Dave > > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15173.php
Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun
On Jul 15, 2014, at 2:03 PM, Mike Dubmanwrote: > these are two separate issues: > > 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way > opal_base_envlist does the same as "-x" and can be used in the very same > fashion as -x > > 2. When list of vars is passed with help of opal_base_envlist, the escaping > is possible but escaped char should differ from delimiter char. That would be my preference (use something like "\" as the escape char). Though we could always go with a scheme where a doubled delimiter means "literal delimiter", sort of like "$$" in a Makefile. > I think -x should stay as shotrt-form alias for -mca opal_base_envlist > var=val and -mca opal_base_envlist var. > on dev meeting it was decided to deprecate it as some point. Can we revisit this decision? Mike and I both seem to be in favor of retaining "-x", at least for non-conflicting uses. Would someone who is against retaining "-x" please speak up in favor of that position? Also, Mike, I just looked again at the code and I don't think it is robustly checking for conflict cases. It's possible to do this and you won't get an error with the current code, right? 8< $ mpirun -mca mca_base_env_list foo=bar -x foo=baz ... 8< See this code, which only looks at the environment when looking for "mca_base_env_list": > Modified: trunk/orte/tools/orterun/orterun.c > == > --- trunk/orte/tools/orterun/orterun.cTue Jul 8 20:10:04 2014 > (r32162) > +++ trunk/orte/tools/orterun/orterun.c2014-07-08 20:38:25 EDT (Tue, > 08 Jul 2014) (r32163) > @@ -1722,6 +1722,13 @@ > > /* Did the user request to export any environment variables on the cmd > line? */ > if (opal_cmd_line_is_taken(_line, "x")) { > +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list"); > +if (NULL != env_set_flag) { > +orte_show_help("help-orterun.txt", "orterun:conflict-env-set", > false); > +return ORTE_ERR_FATAL; > +} else { > +orte_show_help("help-orterun.txt", "orterun:deprecated-env-set", > false); > +} > j = opal_cmd_line_get_ninsts(_line, "x"); > for (i = 0; i < j; ++i) { > param = opal_cmd_line_get_param(_line, "x", i, 0); -Dave
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
HI Folks, I vote for solution #1. Doesn't change current behavior. Doesn't open the door to becoming dependent on availability of ctor/dtor feature in future toolchains. Howard -Original Message- From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm Sent: Wednesday, July 16, 2014 9:08 AM To: Open MPI Developers Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote: > I discussed this over IM with Nathan to try and get a better understanding of > the options. Basically, we have two approaches available to us: > > 1. my solution resolves the segv problem and eliminates leaks so long as the > user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. > This method will still leak memory if the user doesn't use MPI after calling > the MPI_T functions, but does mean that all memory used by MPI will be > released upon MPI_Finalize. So if the user program continues beyond MPI, they > won't be carrying the MPI memory footprint with them. This continues our > current behavior. > > 2. the destructor method, which release the MPI memory footprint upon final > program termination instead of at MPI_Finalize. This also solves the segv and > leak problems, and ensures that someone calling only the MPI_T init/finalize > functions will be valgrind-clean, but means that a user program that runs > beyond MPI will carry the MPI memory footprint with them. This is a change in > our current behavior. Correct. Though the only thing we will carry around until termination is the memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not sure how much memory this is. -Nathan
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote: > I discussed this over IM with Nathan to try and get a better understanding of > the options. Basically, we have two approaches available to us: > > 1. my solution resolves the segv problem and eliminates leaks so long as the > user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. > This method will still leak memory if the user doesn't use MPI after calling > the MPI_T functions, but does mean that all memory used by MPI will be > released upon MPI_Finalize. So if the user program continues beyond MPI, they > won't be carrying the MPI memory footprint with them. This continues our > current behavior. > > 2. the destructor method, which release the MPI memory footprint upon final > program termination instead of at MPI_Finalize. This also solves the segv and > leak problems, and ensures that someone calling only the MPI_T init/finalize > functions will be valgrind-clean, but means that a user program that runs > beyond MPI will carry the MPI memory footprint with them. This is a change in > our current behavior. Correct. Though the only thing we will carry around until termination is the memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not sure how much memory this is. -Nathan pgpnPkl7xqqrj.pgp Description: PGP signature
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
I discussed this over IM with Nathan to try and get a better understanding of the options. Basically, we have two approaches available to us: 1. my solution resolves the segv problem and eliminates leaks so long as the user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. This method will still leak memory if the user doesn't use MPI after calling the MPI_T functions, but does mean that all memory used by MPI will be released upon MPI_Finalize. So if the user program continues beyond MPI, they won't be carrying the MPI memory footprint with them. This continues our current behavior. 2. the destructor method, which release the MPI memory footprint upon final program termination instead of at MPI_Finalize. This also solves the segv and leak problems, and ensures that someone calling only the MPI_T init/finalize functions will be valgrind-clean, but means that a user program that runs beyond MPI will carry the MPI memory footprint with them. This is a change in our current behavior. I'm not sure which approach is best, but I think this captures the heart of the decision. On Jul 16, 2014, at 7:36 AM, Nathan Hjelmwrote: > On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote: >> A number of issues have been raised as part of this discussion. Here is >> what I have seen so far: >> >> - contructor/destructor order not garaunteed: From an opal perspective >> this should not be a problem. Most components are unloaded by >> opal_finalize () not opal_finalize_util (). So opal components >> opal should already be finalized by the time the destructor is called >> (or we can finalize them in the destructor if necessary). >> >> - portability: All the compilers most of us care about: gcc, intel, >> clang. The exceptions appear to be xlc and pgi. For these compilers >> we can fall back on Ralph's solution and just leak if >> MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an >> implementation that does that (needs some adjustment). > > Correction. xlc does support the destructor function attribute. The odd > one out is PGI. > > -Nathan > ___ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/07/15168.php
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote: > A number of issues have been raised as part of this discussion. Here is > what I have seen so far: > > - contructor/destructor order not garaunteed: From an opal perspective >this should not be a problem. Most components are unloaded by >opal_finalize () not opal_finalize_util (). So opal components >opal should already be finalized by the time the destructor is called >(or we can finalize them in the destructor if necessary). > > - portability: All the compilers most of us care about: gcc, intel, >clang. The exceptions appear to be xlc and pgi. For these compilers >we can fall back on Ralph's solution and just leak if >MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an >implementation that does that (needs some adjustment). Correction. xlc does support the destructor function attribute. The odd one out is PGI. -Nathan pgpGxJ67igVdH.pgp Description: PGP signature
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
Forgot to attach the patch. -Nathan diff --git a/opal/runtime/opal_finalize.c b/opal/runtime/opal_finalize.c index 318eba7..22b2e58 100644 --- a/opal/runtime/opal_finalize.c +++ b/opal/runtime/opal_finalize.c @@ -1,3 +1,4 @@ +/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */ /* * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana * University Research and Technology @@ -10,7 +11,7 @@ * Copyright (c) 2004-2005 The Regents of the University of California. * All rights reserved. * Copyright (c) 2008-2012 Cisco Systems, Inc. All rights reserved. - * Copyright (c) 2010-2013 Los Alamos National Security, LLC. + * Copyright (c) 2010-2014 Los Alamos National Security, LLC. * All rights reserved. * Copyright (c) 2013-2014 Intel, Inc. All rights reserved * $COPYRIGHT$ @@ -57,6 +58,7 @@ extern int opal_initialized; extern int opal_util_initialized; +extern bool opal_init_util_called; int opal_finalize_util(void) @@ -65,9 +67,21 @@ opal_finalize_util(void) if( opal_util_initialized < 0 ) { return OPAL_ERROR; } -return OPAL_SUCCESS; } +/* do not do anything more if the destuctor attribute is available */ +#if OPAL_HAVE_ATTRIBUTE_DESTRUCTOR +return OPAL_SUCCESS; +} + +static void __attribute__((destructor)) opal_cleanup_resources (void) { +if (!opal_init_util_called) { +/* nothing to clean up */ +return; +} +#endif +opal_init_util_called = false; + /* close interfaces code. */ if (opal_if_base_framework.framework_refcnt > 1) { /* opal if may have been opened many times -- FIXME */ @@ -89,6 +103,11 @@ opal_finalize_util(void) (void) mca_base_framework_close(_installdirs_base_framework); +#if OPAL_HAVE_ATTRIBUTE_DESTRUCTOR +/* there are issues with tearing down everything in opal_finalize_util. doing + * so will cause opal_init_util to segmentation fault if called after finalize. + * this cleanup is safe in a destructor function. */ + /* finalize the memory allocator */ opal_malloc_finalize(); @@ -108,8 +127,9 @@ opal_finalize_util(void) /* finalize the class/object system */ opal_class_finalize(); - +#else return OPAL_SUCCESS; +#endif } diff --git a/opal/runtime/opal_init.c b/opal/runtime/opal_init.c index 6567a9f..a48517b 100644 --- a/opal/runtime/opal_init.c +++ b/opal/runtime/opal_init.c @@ -71,6 +71,8 @@ const char opal_version_string[] = OPAL_IDENT_STRING; int opal_initialized = 0; int opal_util_initialized = 0; +bool opal_init_util_called = false; + /* We have to put a guess in here in case hwloc is not available. If hwloc is available, this value will be overwritten when the hwloc data is loaded. */ @@ -247,13 +249,17 @@ opal_init_util(int* pargc, char*** pargv) int ret; char *error = NULL; -if( ++opal_util_initialized != 1 ) { +++opal_util_initialized; + +if (opal_init_util_called) { if( opal_util_initialized < 1 ) { return OPAL_ERROR; } return OPAL_SUCCESS; } +opal_init_util_called = true; + /* initialize the memory allocator */ opal_malloc_init(); pgpQAhUei3oae.pgp Description: PGP signature
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
A number of issues have been raised as part of this discussion. Here is what I have seen so far: - contructor/destructor order not garaunteed: From an opal perspective this should not be a problem. Most components are unloaded by opal_finalize () not opal_finalize_util (). So opal components opal should already be finalized by the time the destructor is called (or we can finalize them in the destructor if necessary). - portability: All the compilers most of us care about: gcc, intel, clang. The exceptions appear to be xlc and pgi. For these compilers we can fall back on Ralph's solution and just leak if MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an implementation that does that (needs some adjustment). -Nathan pgpcsLSFVFI1U.pgp Description: PGP signature
Re: [OMPI devel] Onesided failures
Sounds like a good plan. Thanks for looking into this Gilles! Regards, Rolf From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles GOUAILLARDET Sent: Wednesday, July 16, 2014 9:53 AM To: Open MPI Developers Subject: Re: [OMPI devel] Onesided failures Rolf, From the man page of MPI_Win_allocate_shared It is the user's responsibility to ensure that the communicator comm represents a group of processes that can create a shared memory segment that can be accessed by all processes in the group And from the mtt logs, you are running 4 tasks on 2 nodes. Unless I am missing something obvious, I will update the test tomorrow and add a comm split to ensure MPI_Win_allocate_shared is called from single node communicator and skip the test if this impossible Cheers, Gilles Rolf vandeVaart> wrote: On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three tests fail. http://mtt.open-mpi.org/index.php?do_redir=2205 Ibm/onesided/win_allocate_shared Ibm/onesided/win_allocated_shared_mpifh Ibm/onesided/win_allocated_shared_usempi Is there a ticket that covers these failures? Thanks, Rolf This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
Re: [OMPI devel] Onesided failures
Rolf, From the man page of MPI_Win_allocate_shared It is the user's responsibility to ensure that the communicator comm represents a group of processes that can create a shared memory segment that can be accessed by all processes in the group And from the mtt logs, you are running 4 tasks on 2 nodes. Unless I am missing something obvious, I will update the test tomorrow and add a comm split to ensure MPI_Win_allocate_shared is called from single node communicator and skip the test if this impossible Cheers, Gilles Rolf vandeVaartwrote: > > >On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three >tests fail. > >http://mtt.open-mpi.org/index.php?do_redir=2205 > > > >Ibm/onesided/win_allocate_shared > >Ibm/onesided/win_allocated_shared_mpifh > >Ibm/onesided/win_allocated_shared_usempi > > > >Is there a ticket that covers these failures? > > > >Thanks, > >Rolf > >This email message is for the sole use of the intended recipient(s) and may >contain confidential information. Any unauthorized review, use, disclosure or >distribution is prohibited. If you are not the intended recipient, please >contact the sender by reply email and destroy all copies of the original >message. >
[OMPI devel] Onesided failures
On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three tests fail. http://mtt.open-mpi.org/index.php?do_redir=2205 Ibm/onesided/win_allocate_shared Ibm/onesided/win_allocated_shared_mpifh Ibm/onesided/win_allocated_shared_usempi Is there a ticket that covers these failures? Thanks, Rolf --- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. ---
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
Ralph and all, my understanding is that opal_finalize_util agressively tries to free memory that would be still allocated otherwise. an other way of saying "make valgrind happy" is "fully automake memory leak detection" (Joost pointed to the -fsanitize=leak feature of gcc 4.9 in http://www.open-mpi.org/community/lists/devel/2014/05/14672.php) the following simple program : #include int main(int argc, char* argv[]) { int ret, provided; ret = MPI_T_init_thread(MPI_THREAD_SINGLE, ); ret = MPI_T_finalize(); return 0; } leaks a *lot* of objects (and might remove some environment variables as well) which have been half destroyed by opal_finalize_util, for example : - classes are still marked as initialized *but* the cls_contruct_array has been free'd - the oob framework was not unallocated, it is still marked as MCA_BASE_FRAMEWORK_FLAG_REGISTERED but some mca variables were freed, and that will cause problems when MPI_Init try to (re)start the tcp component now my 0.02$ : ideally, MPI_Finalize nor MPI_T_finalize would leak any memory and the framework would be re-initializable. this could be a goal and George gave some good explanations on why it is hard to achieve. from my pragmatic point of view, and for this test case only, i am very happy with a simple working solution, even if it means that MPI_T_finalize leaks way too much memory in order to work around the non re-initializable framework. Cheers, Gilles On 2014/07/16 12:49, Ralph Castain wrote: > I've attached a solution that blocks the segfault without requiring any > gyrations. Can someone explain why this isn't adequate? > > Alternate solution was to simply decrement opal_util_initialized in > MPI_T_finalize rather than calling finalize itself. Either way resolves the > problem in a very simple manner. >
Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal
I've attached a solution that blocks the segfault without requiring any gyrations. Can someone explain why this isn't adequate?Alternate solution was to simply decrement opal_util_initialized in MPI_T_finalize rather than calling finalize itself. Either way resolves the problem in a very simple manner. fix.diff Description: Binary data mpit.c Description: Binary data On Jul 15, 2014, at 6:10 PM, Ralph Castainwrote:I'm unsure where Intel's compilers sit on that list.When you say it works except for reinit, are you saying that the only issue here is that MPI_T_Finalize is calling opal_finalize_util solely because of the valgrind cleanup? And if it didn't do that, we would leak but would otherwise be just fine?Just checking my understanding. Looking at the code, that would certainly appear to be true due to the reference counter in there, which would prevent us from eventually cleaning up because the counter wouldn't reach zero. However, couldn't we resolve that by (a) having MPI_T_Init set a global flag indicating it was called, and then (b) in opal_finalize, check the flag and add another call to opal_finalize_util if the flag is set?Seems like all we really need to do is ensure that the init/finalize calls match, and that is far easier to ensure than doing the rest of this stuff.On Jul 15, 2014, at 5:48 PM, George Bosilca wrote:Enforcing the portability of this sounds like a huge [almost impossible] mess, without a clean portable solution (more about this below). However, few things should be considered:- Except for reinit, Open MPI works without it! If we provide such a capability it will be more a convenience capability to keep valgrind happy, than a necessity - in case the constructor/destructor functionality is available we explicitly control the ordering in which the shared libraries are opened/closed as we control the dl_open/dl_close for most of the shared libraries. George.PS: Other cases about shared libraries constructor/destructor.The easy ones.Mac OS X: https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html Solaris: http://docs.oracle.com/cd/E18659_01/html/821-1383/bkamq.htmlAnd the others PGI: http://www.pgroup.com/userforum/viewtopic.php?t=697=4efce7bfb4e914e42f48f219fc7e6a7e XLC: beg for forgiveness (there is a -binitifini function but it must be specified at link time) On Tue, Jul 15, 2014 at 8:06 PM, Paul Hargrove wrote: The priority appears to have been added in gcc 4.3.You'll note it is not described in https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html I also don't think the presence of the priority argument fixes anything...An OpenMPI code author cannot change the "priority" of a ctor or dtor in a precompiled third-party library (libpmi comes to mind). Nor can one know what value the third part chose (in order to be higher or lower than theirs). You cannot even be assured the third-party didn't set priority to INT_MIN or INT_MAX (or whatever). That text also says nothing about dl_open() and dl_close() which must be considered in Open MPI.Before assuming constructor/destructor attributes are going to save the world, wash your dog, and pick up the dry cleaning, one should probably verify some minimal level of support on non-gnu tool-chains including vendor compilers (PGI, XLC, etc) and system linkers (Darwin and Solaris). -PaulOn Tue, Jul 15, 2014 at 4:52 PM, Joshua Ladd wrote: According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html "constructor destructor constructor (priority) destructor (priority) The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute causes the function to be called automatically after main () completes or exit () is called. Functions with these attributes are useful for initializing data that is used implicitly during the execution of the program. You may provide an optional integer priority to control the order in which constructor and destructor functions are run. A constructor with a smaller priority number runs before a constructor with a larger priority number; the opposite relationship holds for destructors. So, if you have a constructor that allocates a resource and a destructor that deallocates the same resource, both functions typically have the same priority. The priorities for constructor and destructor functions are the same as those specified for namespace-scope C++ objects (see C++ Attributes). These attributes are not currently implemented for Objective-C."On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove wrote: On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r wrote: I don't think there's anything wrong with using ctor/dtors in shared libraries, but one does need to make