Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Joshua Ladd
*An enhancement to permit some form of delimiter escaping would probably
still be nice, but is low priority.*
[Josh] Not a problem, Dave. We will do this.


On Wed, Jul 16, 2014 at 4:32 PM, Dave Goodell (dgoodell)  wrote:

> On Jul 16, 2014, at 3:08 PM, Joshua Ladd  wrote:
>
> > Ralph warned me that no matter what decision we made, someone would
> probably violently object. So, with that in mind, let me put my diplomat
> hat on...
>
> FWIW, I don't think my objections here have been "violent".
>
> > Dave, I'm sorry you view this as a "crapification" of your mpirun user
> interface. Your lament is duly noted and we are happy to work with you to
> come to (yet another) happy compromise. It's unfortunate that the issue
> wasn't raised during the dev meeting when these decisions were made (you
> were sitting right next to me while I discussed this out loud with Jeff and
> Ralph.) At the time, it seemed all interested parties had expressed their
> concern and voiced their opinion on the matter, and the implementation you
> see in the trunk today, including the decision to deprecate the "-x"
> option, was the generally agreed upon consensus.
>
> I think I misunderstood that the "-x" option was going away altogether, so
> sorry for the after-the-fact complaints.  Probably poor listening on my
> part, but I don't think that changes my opinion on the UI impact.
>
> I'm aware that I'm in the minority here.  I don't think that I care enough
> to continue arguing over this, so let's just let it be as-is.  An
> enhancement to permit some form of delimiter escaping would probably still
> be nice, but is low priority.
>
> -Dave
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15185.php
>


Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 3:08 PM, Joshua Ladd  wrote:

> Ralph warned me that no matter what decision we made, someone would probably 
> violently object. So, with that in mind, let me put my diplomat hat on...

FWIW, I don't think my objections here have been "violent".

> Dave, I'm sorry you view this as a "crapification" of your mpirun user 
> interface. Your lament is duly noted and we are happy to work with you to 
> come to (yet another) happy compromise. It's unfortunate that the issue 
> wasn't raised during the dev meeting when these decisions were made (you were 
> sitting right next to me while I discussed this out loud with Jeff and 
> Ralph.) At the time, it seemed all interested parties had expressed their 
> concern and voiced their opinion on the matter, and the implementation you 
> see in the trunk today, including the decision to deprecate the "-x" option, 
> was the generally agreed upon consensus. 

I think I misunderstood that the "-x" option was going away altogether, so 
sorry for the after-the-fact complaints.  Probably poor listening on my part, 
but I don't think that changes my opinion on the UI impact.

I'm aware that I'm in the minority here.  I don't think that I care enough to 
continue arguing over this, so let's just let it be as-is.  An enhancement to 
permit some form of delimiter escaping would probably still be nice, but is low 
priority.

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Joshua Ladd
Ralph warned me that no matter what decision we made, someone would
probably violently object. So, with that in mind, let me put my diplomat
hat on...

Dave, I'm sorry you view this as a "crapification" of your mpirun user
interface. Your lament is duly noted and we are happy to work with you to
come to (yet another) happy compromise. It's unfortunate that the issue
wasn't raised during the dev meeting when these decisions were made (you
were sitting right next to me while I discussed this out loud with Jeff and
Ralph.) At the time, it seemed all interested parties had expressed their
concern and voiced their opinion on the matter, and the implementation you
see in the trunk today, including the decision to deprecate the "-x"
option, was the generally agreed upon consensus.

Josh


On Wed, Jul 16, 2014 at 2:43 PM, Ralph Castain  wrote:

>
> On Jul 16, 2014, at 11:34 AM, Dave Goodell (dgoodell) 
> wrote:
>
> > On Jul 16, 2014, at 12:15 PM, Mike Dubman 
> wrote:
> >
> >> we have a strong use-case for list of env variables passed as mca
> params.(it was presented and discussed in the past).
> >
> > I'm not disputing your use case for "mca_base_env_list".  I'm only
> lamenting the crapification of our mpirun user interface.  We had an
> earlier change along these lines based on similar reasoning: the
> easy-to-use-and-remember "--pernode N" was replaced with "--map-by
> ppr:N:node", which I cannot ever seem to remember.
>
> Same reason, I'm afraid - no way to do the mapping
>
> >
> >> we can rename opal_base_envlist as "-mca x var=val" for consistency.
> >> also, "-x" param now is just an alias for "-mca opal_base_envlist
> var=val" - so, we can keep it (w/o deprecation warning) as it re-uses same
> infra.
> >
> > Mike, will you make this change?
> >
> > Thanks,
> > -Dave
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15181.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15182.php
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Paul Hargrove
On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelm  wrote:

> Correction. xlc does support the destructor function attribute. The odd
> one out is PGI.
>

Are the Solaris Studio compilers still of interest to the Open MPI
community?
If so, I've confirmed support using the following 5-line test on a
Solaris-10/SPARC platform.

#include 
int X = 0;
__attribute__((__constructor__)) void hello(void) { X = 42; }
__attribute__((__destructor__)) void goodbye(void) { printf("X = %d\n", X);
}
int main(void) { return 0; }



-Paul




-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Ralph Castain

On Jul 16, 2014, at 11:34 AM, Dave Goodell (dgoodell)  
wrote:

> On Jul 16, 2014, at 12:15 PM, Mike Dubman  wrote:
> 
>> we have a strong use-case for list of env variables passed as mca params.(it 
>> was presented and discussed in the past).
> 
> I'm not disputing your use case for "mca_base_env_list".  I'm only lamenting 
> the crapification of our mpirun user interface.  We had an earlier change 
> along these lines based on similar reasoning: the easy-to-use-and-remember 
> "--pernode N" was replaced with "--map-by ppr:N:node", which I cannot ever 
> seem to remember.

Same reason, I'm afraid - no way to do the mapping

> 
>> we can rename opal_base_envlist as "-mca x var=val" for consistency.
>> also, "-x" param now is just an alias for "-mca opal_base_envlist var=val" - 
>> so, we can keep it (w/o deprecation warning) as it re-uses same infra.
> 
> Mike, will you make this change?
> 
> Thanks,
> -Dave
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15181.php



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 12:15 PM, Mike Dubman  wrote:

> we have a strong use-case for list of env variables passed as mca params.(it 
> was presented and discussed in the past).

I'm not disputing your use case for "mca_base_env_list".  I'm only lamenting 
the crapification of our mpirun user interface.  We had an earlier change along 
these lines based on similar reasoning: the easy-to-use-and-remember "--pernode 
N" was replaced with "--map-by ppr:N:node", which I cannot ever seem to 
remember.

> we can rename opal_base_envlist as "-mca x var=val" for consistency.
> also, "-x" param now is just an alias for "-mca opal_base_envlist var=val" - 
> so, we can keep it (w/o deprecation warning) as it re-uses same infra.

Mike, will you make this change?

Thanks,
-Dave

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 12:27 PM, Joshua Ladd  wrote:

> Dave,
> 
> Your example will error out. If someone tries to set envars with both 
> mechanism, the job fails. The decision to do so was also made at the Dev 
> meeting and is so that we don't have to do this kind of checking. 

Hmm, indeed it does.  I had mis-installed my OMPI build when I first tested 
this.  I also didn't expect that setting "-mca mca_base_env_list foo=quux" on 
the command line also would cause the environment to be updated for the code I 
previously quoted.

Apologies for the false bug alarm.

-Dave



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Joshua Ladd
Dave,

Your example will error out. If someone tries to set envars with both
mechanism, the job fails. The decision to do so was also made at the Dev
meeting and is so that we don't have to do this kind of checking.

Josh


On Wed, Jul 16, 2014 at 12:22 PM, Dave Goodell (dgoodell) <
dgood...@cisco.com> wrote:

> On Jul 15, 2014, at 2:03 PM, Mike Dubman  wrote:
>
> > these are two separate issues:
> >
> > 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same
> way
> > opal_base_envlist does the same as "-x" and can be used in the very same
> fashion as -x
> >
> > 2. When list of vars is passed with help of opal_base_envlist, the
> escaping is possible but escaped char should differ from delimiter char.
>
> That would be my preference (use something like "\" as the escape char).
>  Though we could always go with a scheme where a doubled delimiter means
> "literal delimiter", sort of like "$$" in a Makefile.
>
> > I think -x should stay as shotrt-form alias for -mca opal_base_envlist
> var=val and -mca opal_base_envlist var.
> > on dev meeting it was decided to deprecate it as some point.
>
> Can we revisit this decision?  Mike and I both seem to be in favor of
> retaining "-x", at least for non-conflicting uses.  Would someone who is
> against retaining "-x" please speak up in favor of that position?
>
> Also, Mike, I just looked again at the code and I don't think it is
> robustly checking for conflict cases.  It's possible to do this and you
> won't get an error with the current code, right?
>
> 8<
> $ mpirun -mca mca_base_env_list foo=bar -x foo=baz ...
> 8<
>
> See this code, which only looks at the environment when looking for
> "mca_base_env_list":
>
> > Modified: trunk/orte/tools/orterun/orterun.c
> >
> ==
> > --- trunk/orte/tools/orterun/orterun.cTue Jul  8 20:10:04 2014
>  (r32162)
> > +++ trunk/orte/tools/orterun/orterun.c2014-07-08 20:38:25 EDT
> (Tue, 08 Jul 2014)  (r32163)
> > @@ -1722,6 +1722,13 @@
> >
> >  /* Did the user request to export any environment variables on the
> cmd line? */
> >  if (opal_cmd_line_is_taken(_line, "x")) {
> > +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list");
> > +if (NULL != env_set_flag) {
> > +orte_show_help("help-orterun.txt",
> "orterun:conflict-env-set", false);
> > +return ORTE_ERR_FATAL;
> > +} else {
> > +orte_show_help("help-orterun.txt",
> "orterun:deprecated-env-set", false);
> > +}
> >  j = opal_cmd_line_get_ninsts(_line, "x");
> >  for (i = 0; i < j; ++i) {
> >  param = opal_cmd_line_get_param(_line, "x", i, 0);
>
>
> -Dave
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15173.php
>


Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Mike Dubman
we have a strong use-case for list of env variables passed as mca params.(it
was presented and discussed in the past).

we can rename opal_base_envlist as "-mca x var=val" for consistency.
also, "-x" param now is just an alias for "-mca opal_base_envlist var=val"
- so, we can keep it (w/o deprecation warning) as it re-uses same infra.



On Wed, Jul 16, 2014 at 8:10 PM, Ralph Castain  wrote:

> Don't look at me - I didn't care to begin with as I never use -x and won't
> be using this param! :-)
>
> My point was only that having both is clunky and leads to potential
> conflict. No good solution, so 
>
>
> On Jul 16, 2014, at 9:57 AM, Dave Goodell (dgoodell) 
> wrote:
>
> > On Jul 16, 2014, at 11:31 AM, Ralph Castain  wrote:
> >
> >> Nobody was "against" retaining it. The issue is that "-x" isn't an MCA
> parameter, nor does it get translated to one under the covers. So the
> problem was one of how to insert it into the typical MCA param precedence
> chain.
> >
> > I understand the combination of the two features is clunky and could
> lead to odd corner cases, but the "-x" argument is a feature I actually use
> on a fairly regular basis, but I am unlikely to use mca_base_env_list
> unless given no other choice.  It's just a worse, clunkier interface unless
> one really needs to set that MCA parameter via environment variable.
> >
> > So can we just strike the deprecation warning that is currently issued
> when "-x" is passed in the absence of "mca_base_env_list"?
> >
> > -Dave
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15176.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15177.php
>


Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Ralph Castain
Don't look at me - I didn't care to begin with as I never use -x and won't be 
using this param! :-)

My point was only that having both is clunky and leads to potential conflict. 
No good solution, so 


On Jul 16, 2014, at 9:57 AM, Dave Goodell (dgoodell)  wrote:

> On Jul 16, 2014, at 11:31 AM, Ralph Castain  wrote:
> 
>> Nobody was "against" retaining it. The issue is that "-x" isn't an MCA 
>> parameter, nor does it get translated to one under the covers. So the 
>> problem was one of how to insert it into the typical MCA param precedence 
>> chain.
> 
> I understand the combination of the two features is clunky and could lead to 
> odd corner cases, but the "-x" argument is a feature I actually use on a 
> fairly regular basis, but I am unlikely to use mca_base_env_list unless given 
> no other choice.  It's just a worse, clunkier interface unless one really 
> needs to set that MCA parameter via environment variable.
> 
> So can we just strike the deprecation warning that is currently issued when 
> "-x" is passed in the absence of "mca_base_env_list"?
> 
> -Dave
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15176.php



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 16, 2014, at 11:31 AM, Ralph Castain  wrote:

> Nobody was "against" retaining it. The issue is that "-x" isn't an MCA 
> parameter, nor does it get translated to one under the covers. So the problem 
> was one of how to insert it into the typical MCA param precedence chain.

I understand the combination of the two features is clunky and could lead to 
odd corner cases, but the "-x" argument is a feature I actually use on a fairly 
regular basis, but I am unlikely to use mca_base_env_list unless given no other 
choice.  It's just a worse, clunkier interface unless one really needs to set 
that MCA parameter via environment variable.

So can we just strike the deprecation warning that is currently issued when 
"-x" is passed in the absence of "mca_base_env_list"?

-Dave



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Kenneth A. Lloyd
What about providing garbage collection for both POSIX and MPI threads? This
problem hints at several underlying layers of "programming faults".

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Wednesday, July 16, 2014 8:59 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function
to opal

I discussed this over IM with Nathan to try and get a better understanding
of the options. Basically, we have two approaches available to us:

1. my solution resolves the segv problem and eliminates leaks so long as the
user calls MPI_Init/Finalize after calling the MPI_T init/finalize
functions. This method will still leak memory if the user doesn't use MPI
after calling the MPI_T functions, but does mean that all memory used by MPI
will be released upon MPI_Finalize. So if the user program continues beyond
MPI, they won't be carrying the MPI memory footprint with them. This
continues our current behavior.

2. the destructor method, which release the MPI memory footprint upon final
program termination instead of at MPI_Finalize. This also solves the segv
and leak problems, and ensures that someone calling only the MPI_T
init/finalize functions will be valgrind-clean, but means that a user
program that runs beyond MPI will carry the MPI memory footprint with them.
This is a change in our current behavior.

I'm not sure which approach is best, but I think this captures the heart of
the decision.


On Jul 16, 2014, at 7:36 AM, Nathan Hjelm  wrote:

> On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
>> A number of issues have been raised as part of this discussion. Here 
>> is what I have seen so far:
>> 
>> - contructor/destructor order not garaunteed: From an opal perspective
>>   this should not be a problem. Most components are unloaded by
>>   opal_finalize () not opal_finalize_util (). So opal components
>>   opal should already be finalized by the time the destructor is called
>>   (or we can finalize them in the destructor if necessary).
>> 
>> - portability: All the compilers most of us care about: gcc, intel,
>>   clang. The exceptions appear to be xlc and pgi. For these compilers
>>   we can fall back on Ralph's solution and just leak if
>>   MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an
>>   implementation that does that (needs some adjustment).
> 
> Correction. xlc does support the destructor function attribute. The 
> odd one out is PGI.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15168.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15170.php


-
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2014.0.4716 / Virus Database: 3986/7863 - Release Date: 07/16/14



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Ralph Castain
Nobody was "against" retaining it. The issue is that "-x" isn't an MCA 
parameter, nor does it get translated to one under the covers. So the problem 
was one of how to insert it into the typical MCA param precedence chain.

If we don't try to do that, but instead simply error out if both -x and the MCA 
param are present, then that resolves the problem. However, and this is a big 
"but", remember that there is no way to "unset" an MCA param. So if someone 
puts the mca_base_env_list param into the default MCA param file, then there is 
no way a user can put -x on the cmd line because the conflict will be detected 
- and they can't turn "off" the MCA param to resolve it.

The inconsistent behavior may prove highly annoying in the user community, but 
I don't know how widely used this feature will be.


On Jul 16, 2014, at 9:22 AM, Dave Goodell (dgoodell)  wrote:

> On Jul 15, 2014, at 2:03 PM, Mike Dubman  wrote:
> 
>> these are two separate issues:
>> 
>> 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way
>> opal_base_envlist does the same as "-x" and can be used in the very same 
>> fashion as -x
>> 
>> 2. When list of vars is passed with help of opal_base_envlist, the escaping 
>> is possible but escaped char should differ from delimiter char.
> 
> That would be my preference (use something like "\" as the escape char).  
> Though we could always go with a scheme where a doubled delimiter means 
> "literal delimiter", sort of like "$$" in a Makefile.
> 
>> I think -x should stay as shotrt-form alias for -mca opal_base_envlist 
>> var=val and -mca opal_base_envlist var.
>> on dev meeting it was decided to deprecate it as some point.
> 
> Can we revisit this decision?  Mike and I both seem to be in favor of 
> retaining "-x", at least for non-conflicting uses.  Would someone who is 
> against retaining "-x" please speak up in favor of that position?
> 
> Also, Mike, I just looked again at the code and I don't think it is robustly 
> checking for conflict cases.  It's possible to do this and you won't get an 
> error with the current code, right?
> 
> 8<
> $ mpirun -mca mca_base_env_list foo=bar -x foo=baz ...
> 8<
> 
> See this code, which only looks at the environment when looking for 
> "mca_base_env_list":
> 
>> Modified: trunk/orte/tools/orterun/orterun.c
>> ==
>> --- trunk/orte/tools/orterun/orterun.c   Tue Jul  8 20:10:04 2014
>> (r32162)
>> +++ trunk/orte/tools/orterun/orterun.c   2014-07-08 20:38:25 EDT (Tue, 
>> 08 Jul 2014)  (r32163)
>> @@ -1722,6 +1722,13 @@
>> 
>> /* Did the user request to export any environment variables on the cmd 
>> line? */
>> if (opal_cmd_line_is_taken(_line, "x")) {
>> +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list");
>> +if (NULL != env_set_flag) {
>> +orte_show_help("help-orterun.txt", "orterun:conflict-env-set", 
>> false);
>> +return ORTE_ERR_FATAL;
>> +} else {
>> +orte_show_help("help-orterun.txt", 
>> "orterun:deprecated-env-set", false);
>> +}
>> j = opal_cmd_line_get_ninsts(_line, "x");
>> for (i = 0; i < j; ++i) {
>> param = opal_cmd_line_get_param(_line, "x", i, 0);
> 
> 
> -Dave
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15173.php



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-16 Thread Dave Goodell (dgoodell)
On Jul 15, 2014, at 2:03 PM, Mike Dubman  wrote:

> these are two separate issues:
> 
> 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way
> opal_base_envlist does the same as "-x" and can be used in the very same 
> fashion as -x
> 
> 2. When list of vars is passed with help of opal_base_envlist, the escaping 
> is possible but escaped char should differ from delimiter char.

That would be my preference (use something like "\" as the escape char).  
Though we could always go with a scheme where a doubled delimiter means 
"literal delimiter", sort of like "$$" in a Makefile.

> I think -x should stay as shotrt-form alias for -mca opal_base_envlist 
> var=val and -mca opal_base_envlist var.
> on dev meeting it was decided to deprecate it as some point.

Can we revisit this decision?  Mike and I both seem to be in favor of retaining 
"-x", at least for non-conflicting uses.  Would someone who is against 
retaining "-x" please speak up in favor of that position?

Also, Mike, I just looked again at the code and I don't think it is robustly 
checking for conflict cases.  It's possible to do this and you won't get an 
error with the current code, right?

8<
$ mpirun -mca mca_base_env_list foo=bar -x foo=baz ...
8<

See this code, which only looks at the environment when looking for 
"mca_base_env_list":

> Modified: trunk/orte/tools/orterun/orterun.c
> ==
> --- trunk/orte/tools/orterun/orterun.cTue Jul  8 20:10:04 2014
> (r32162)
> +++ trunk/orte/tools/orterun/orterun.c2014-07-08 20:38:25 EDT (Tue, 
> 08 Jul 2014)  (r32163)
> @@ -1722,6 +1722,13 @@
>  
>  /* Did the user request to export any environment variables on the cmd 
> line? */
>  if (opal_cmd_line_is_taken(_line, "x")) {
> +char* env_set_flag = getenv("OMPI_MCA_mca_base_env_list");
> +if (NULL != env_set_flag) {
> +orte_show_help("help-orterun.txt", "orterun:conflict-env-set", 
> false);
> +return ORTE_ERR_FATAL;
> +} else {
> +orte_show_help("help-orterun.txt", "orterun:deprecated-env-set", 
> false);
> +}
>  j = opal_cmd_line_get_ninsts(_line, "x");
>  for (i = 0; i < j; ++i) {
>  param = opal_cmd_line_get_param(_line, "x", i, 0);


-Dave



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Pritchard, Howard r
HI Folks,

I vote for solution #1.  Doesn't change current behavior.  Doesn't open the 
door to becoming dependent on availability of
ctor/dtor feature in future toolchains.

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Wednesday, July 16, 2014 9:08 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to 
opal

On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
> I discussed this over IM with Nathan to try and get a better understanding of 
> the options. Basically, we have two approaches available to us:
> 
> 1. my solution resolves the segv problem and eliminates leaks so long as the 
> user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. 
> This method will still leak memory if the user doesn't use MPI after calling 
> the MPI_T functions, but does mean that all memory used by MPI will be 
> released upon MPI_Finalize. So if the user program continues beyond MPI, they 
> won't be carrying the MPI memory footprint with them. This continues our 
> current behavior.
> 
> 2. the destructor method, which release the MPI memory footprint upon final 
> program termination instead of at MPI_Finalize. This also solves the segv and 
> leak problems, and ensures that someone calling only the MPI_T init/finalize 
> functions will be valgrind-clean, but means that a user program that runs 
> beyond MPI will carry the MPI memory footprint with them. This is a change in 
> our current behavior.

Correct. Though the only thing we will carry around until termination is the 
memory associated with opal/mca/if, opal/mca/event, opal_net, opal_malloc, 
opal_show_help, opal_output, opal_dss, opal_datatype, and opal_class. Not sure 
how much memory this is.

-Nathan


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
On Wed, Jul 16, 2014 at 07:59:14AM -0700, Ralph Castain wrote:
> I discussed this over IM with Nathan to try and get a better understanding of 
> the options. Basically, we have two approaches available to us:
> 
> 1. my solution resolves the segv problem and eliminates leaks so long as the 
> user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. 
> This method will still leak memory if the user doesn't use MPI after calling 
> the MPI_T functions, but does mean that all memory used by MPI will be 
> released upon MPI_Finalize. So if the user program continues beyond MPI, they 
> won't be carrying the MPI memory footprint with them. This continues our 
> current behavior.
> 
> 2. the destructor method, which release the MPI memory footprint upon final 
> program termination instead of at MPI_Finalize. This also solves the segv and 
> leak problems, and ensures that someone calling only the MPI_T init/finalize 
> functions will be valgrind-clean, but means that a user program that runs 
> beyond MPI will carry the MPI memory footprint with them. This is a change in 
> our current behavior.

Correct. Though the only thing we will carry around until termination is
the memory associated with opal/mca/if, opal/mca/event, opal_net,
opal_malloc, opal_show_help, opal_output, opal_dss, opal_datatype, and
opal_class. Not sure how much memory this is.

-Nathan


pgpnPkl7xqqrj.pgp
Description: PGP signature


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Ralph Castain
I discussed this over IM with Nathan to try and get a better understanding of 
the options. Basically, we have two approaches available to us:

1. my solution resolves the segv problem and eliminates leaks so long as the 
user calls MPI_Init/Finalize after calling the MPI_T init/finalize functions. 
This method will still leak memory if the user doesn't use MPI after calling 
the MPI_T functions, but does mean that all memory used by MPI will be released 
upon MPI_Finalize. So if the user program continues beyond MPI, they won't be 
carrying the MPI memory footprint with them. This continues our current 
behavior.

2. the destructor method, which release the MPI memory footprint upon final 
program termination instead of at MPI_Finalize. This also solves the segv and 
leak problems, and ensures that someone calling only the MPI_T init/finalize 
functions will be valgrind-clean, but means that a user program that runs 
beyond MPI will carry the MPI memory footprint with them. This is a change in 
our current behavior.

I'm not sure which approach is best, but I think this captures the heart of the 
decision.


On Jul 16, 2014, at 7:36 AM, Nathan Hjelm  wrote:

> On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
>> A number of issues have been raised as part of this discussion. Here is
>> what I have seen so far:
>> 
>> - contructor/destructor order not garaunteed: From an opal perspective
>>   this should not be a problem. Most components are unloaded by
>>   opal_finalize () not opal_finalize_util (). So opal components
>>   opal should already be finalized by the time the destructor is called
>>   (or we can finalize them in the destructor if necessary).
>> 
>> - portability: All the compilers most of us care about: gcc, intel,
>>   clang. The exceptions appear to be xlc and pgi. For these compilers
>>   we can fall back on Ralph's solution and just leak if
>>   MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an
>>   implementation that does that (needs some adjustment).
> 
> Correction. xlc does support the destructor function attribute. The odd
> one out is PGI.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15168.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
On Wed, Jul 16, 2014 at 08:26:44AM -0600, Nathan Hjelm wrote:
> A number of issues have been raised as part of this discussion. Here is
> what I have seen so far:
> 
>  - contructor/destructor order not garaunteed: From an opal perspective
>this should not be a problem. Most components are unloaded by
>opal_finalize () not opal_finalize_util (). So opal components
>opal should already be finalized by the time the destructor is called
>(or we can finalize them in the destructor if necessary).
> 
>  - portability: All the compilers most of us care about: gcc, intel,
>clang. The exceptions appear to be xlc and pgi. For these compilers
>we can fall back on Ralph's solution and just leak if
>MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an
>implementation that does that (needs some adjustment).

Correction. xlc does support the destructor function attribute. The odd
one out is PGI.

-Nathan


pgpGxJ67igVdH.pgp
Description: PGP signature


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
Forgot to attach the patch.

-Nathan

diff --git a/opal/runtime/opal_finalize.c b/opal/runtime/opal_finalize.c
index 318eba7..22b2e58 100644
--- a/opal/runtime/opal_finalize.c
+++ b/opal/runtime/opal_finalize.c
@@ -1,3 +1,4 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
 /*
  * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
  * University Research and Technology
@@ -10,7 +11,7 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
  * Copyright (c) 2008-2012 Cisco Systems, Inc.  All rights reserved.
- * Copyright (c) 2010-2013 Los Alamos National Security, LLC.
+ * Copyright (c) 2010-2014 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved
  * $COPYRIGHT$
@@ -57,6 +58,7 @@
 
 extern int opal_initialized;
 extern int opal_util_initialized;
+extern bool opal_init_util_called;
 
 int
 opal_finalize_util(void)
@@ -65,9 +67,21 @@ opal_finalize_util(void)
 if( opal_util_initialized < 0 ) {
 return OPAL_ERROR;
 }
-return OPAL_SUCCESS;
 }
 
+/* do not do anything more if the destuctor attribute is available */
+#if OPAL_HAVE_ATTRIBUTE_DESTRUCTOR
+return OPAL_SUCCESS;
+}
+
+static void __attribute__((destructor)) opal_cleanup_resources (void) {
+if (!opal_init_util_called) {
+/* nothing to clean up */
+return;
+}
+#endif
+opal_init_util_called = false;
+
 /* close interfaces code. */
 if (opal_if_base_framework.framework_refcnt > 1) {
 /* opal if may have been opened many times -- FIXME */
@@ -89,6 +103,11 @@ opal_finalize_util(void)
 
 (void) mca_base_framework_close(_installdirs_base_framework);
 
+#if OPAL_HAVE_ATTRIBUTE_DESTRUCTOR
+/* there are issues with tearing down everything in opal_finalize_util. 
doing
+ * so will cause opal_init_util to segmentation fault if called after 
finalize.
+ * this cleanup is safe in a destructor function. */
+
 /* finalize the memory allocator */
 opal_malloc_finalize();
 
@@ -108,8 +127,9 @@ opal_finalize_util(void)
 
 /* finalize the class/object system */
 opal_class_finalize();
-
+#else
 return OPAL_SUCCESS;
+#endif
 }
 
 
diff --git a/opal/runtime/opal_init.c b/opal/runtime/opal_init.c
index 6567a9f..a48517b 100644
--- a/opal/runtime/opal_init.c
+++ b/opal/runtime/opal_init.c
@@ -71,6 +71,8 @@ const char opal_version_string[] = OPAL_IDENT_STRING;
 
 int opal_initialized = 0;
 int opal_util_initialized = 0;
+bool opal_init_util_called = false;
+
 /* We have to put a guess in here in case hwloc is not available.  If
hwloc is available, this value will be overwritten when the
hwloc data is loaded. */
@@ -247,13 +249,17 @@ opal_init_util(int* pargc, char*** pargv)
 int ret;
 char *error = NULL;
 
-if( ++opal_util_initialized != 1 ) {
+++opal_util_initialized;
+
+if (opal_init_util_called) {
 if( opal_util_initialized < 1 ) {
 return OPAL_ERROR;
 }
 return OPAL_SUCCESS;
 }
 
+opal_init_util_called = true;
+
 /* initialize the memory allocator */
 opal_malloc_init();
 


pgpQAhUei3oae.pgp
Description: PGP signature


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Nathan Hjelm
A number of issues have been raised as part of this discussion. Here is
what I have seen so far:

 - contructor/destructor order not garaunteed: From an opal perspective
   this should not be a problem. Most components are unloaded by
   opal_finalize () not opal_finalize_util (). So opal components
   opal should already be finalized by the time the destructor is called
   (or we can finalize them in the destructor if necessary).

 - portability: All the compilers most of us care about: gcc, intel,
   clang. The exceptions appear to be xlc and pgi. For these compilers
   we can fall back on Ralph's solution and just leak if
   MPI_Finalize () is not called after MPI_T_Finalize (). Attached is an
   implementation that does that (needs some adjustment).


-Nathan





pgpcsLSFVFI1U.pgp
Description: PGP signature


Re: [OMPI devel] Onesided failures

2014-07-16 Thread Rolf vandeVaart
Sounds like a good plan.  Thanks for looking into this Gilles!
Regards,
Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles GOUAILLARDET
Sent: Wednesday, July 16, 2014 9:53 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Onesided failures

Rolf,

From the man page of MPI_Win_allocate_shared

It is the user's responsibility to ensure that the communicator comm represents 
a group of processes that can create a shared memory segment that can be 
accessed by all processes in the group

And from the mtt logs, you are running 4 tasks on 2 nodes.

Unless I am missing something obvious, I will update the test tomorrow and add 
a comm split to ensure MPI_Win_allocate_shared is called from single node 
communicator and skip the test if this impossible

Cheers,

Gilles

Rolf vandeVaart > wrote:
On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three tests 
fail.
http://mtt.open-mpi.org/index.php?do_redir=2205

Ibm/onesided/win_allocate_shared
Ibm/onesided/win_allocated_shared_mpifh
Ibm/onesided/win_allocated_shared_usempi

Is there a ticket that covers these failures?

Thanks,
Rolf

This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.



Re: [OMPI devel] Onesided failures

2014-07-16 Thread Gilles GOUAILLARDET
Rolf,

From the man page of MPI_Win_allocate_shared

It is the user's responsibility to ensure that the communicator comm represents 
a group of processes that can create a shared memory segment that can be 
accessed by all processes in the group

And from the mtt logs, you are running 4 tasks on 2 nodes.

Unless I am missing something obvious, I will update the test tomorrow and add 
a comm split to ensure MPI_Win_allocate_shared is called from single node 
communicator and skip the test if this impossible

Cheers,

Gilles

Rolf vandeVaart  wrote:
>
>
>On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three 
>tests fail.
>
>http://mtt.open-mpi.org/index.php?do_redir=2205
>
> 
>
>Ibm/onesided/win_allocate_shared
>
>Ibm/onesided/win_allocated_shared_mpifh
>
>Ibm/onesided/win_allocated_shared_usempi
>
> 
>
>Is there a ticket that covers these failures?
>
> 
>
>Thanks,
>
>Rolf
>
>This email message is for the sole use of the intended recipient(s) and may 
>contain confidential information.  Any unauthorized review, use, disclosure or 
>distribution is prohibited.  If you are not the intended recipient, please 
>contact the sender by reply email and destroy all copies of the original 
>message. 
>


[OMPI devel] Onesided failures

2014-07-16 Thread Rolf vandeVaart
On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three tests 
fail.
http://mtt.open-mpi.org/index.php?do_redir=2205

Ibm/onesided/win_allocate_shared
Ibm/onesided/win_allocated_shared_mpifh
Ibm/onesided/win_allocated_shared_usempi

Is there a ticket that covers these failures?

Thanks,
Rolf

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Gilles Gouaillardet
Ralph and all,

my understanding is that

opal_finalize_util

agressively tries to free memory that would be still allocated otherwise.

an other way of saying "make valgrind happy" is "fully automake memory
leak detection"
(Joost pointed to the -fsanitize=leak feature of gcc 4.9 in
http://www.open-mpi.org/community/lists/devel/2014/05/14672.php)

the following simple program :

#include 

int main(int argc, char* argv[])
{
  int ret, provided;
  ret = MPI_T_init_thread(MPI_THREAD_SINGLE, );
  ret = MPI_T_finalize();
  return 0;
}

leaks a *lot* of objects (and might remove some environment variables as
well) which have been half destroyed by opal_finalize_util, for example :
- classes are still marked as initialized *but* the cls_contruct_array
has been free'd
- the oob framework was not unallocated, it is still marked as
MCA_BASE_FRAMEWORK_FLAG_REGISTERED
  but some mca variables were freed, and that will cause problems when
MPI_Init try to (re)start the tcp component

now my 0.02$ :

ideally, MPI_Finalize nor MPI_T_finalize would leak any memory and the
framework would be re-initializable.
this could be a goal and George gave some good explanations on why it is
hard to achieve.
from my pragmatic point of view, and for this test case only, i am very
happy with a simple working solution,
even if it means that MPI_T_finalize leaks way too much memory in order
to work around the non re-initializable framework.

Cheers,

Gilles

On 2014/07/16 12:49, Ralph Castain wrote:
> I've attached a solution that blocks the segfault without requiring any 
> gyrations. Can someone explain why this isn't adequate?
>
> Alternate solution was to simply decrement opal_util_initialized in 
> MPI_T_finalize rather than calling finalize itself. Either way resolves the 
> problem in a very simple manner.
>



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Ralph Castain
I've attached a solution that blocks the segfault without requiring any gyrations. Can someone explain why this isn't adequate?Alternate solution was to simply decrement opal_util_initialized in MPI_T_finalize rather than calling finalize itself. Either way resolves the problem in a very simple manner.

fix.diff
Description: Binary data


mpit.c
Description: Binary data
On Jul 15, 2014, at 6:10 PM, Ralph Castain  wrote:I'm unsure where Intel's compilers sit on that list.When you say it works except for reinit, are you saying that the only issue here is that MPI_T_Finalize is calling opal_finalize_util solely because of the valgrind cleanup? And if it didn't do that, we would leak but would otherwise be just fine?Just checking my understanding. Looking at the code, that would certainly appear to be true due to the reference counter in there, which would prevent us from eventually cleaning up because the counter wouldn't reach zero. However, couldn't we resolve that by (a) having MPI_T_Init set a global flag indicating it was called, and then (b) in opal_finalize, check the flag and add another call to opal_finalize_util if the flag is set?Seems like all we really need to do is ensure that the init/finalize calls match, and that is far easier to ensure than doing the rest of this stuff.On Jul 15, 2014, at 5:48 PM, George Bosilca  wrote:Enforcing the portability of this sounds like a huge [almost impossible] mess, without a clean portable solution (more about this below). However, few things should be considered:- Except for reinit, Open MPI works without it! If we provide such a capability it will be more a convenience capability to keep valgrind happy, than a necessity
- in case the constructor/destructor functionality is available we explicitly control the ordering in which the shared libraries are opened/closed as we control the dl_open/dl_close for most of the shared libraries.
  George.PS: Other cases about shared libraries constructor/destructor.The easy ones.Mac OS X: https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/DynamicLibraries/100-Articles/DynamicLibraryDesignGuidelines.html
Solaris: http://docs.oracle.com/cd/E18659_01/html/821-1383/bkamq.htmlAnd the others
PGI: http://www.pgroup.com/userforum/viewtopic.php?t=697=4efce7bfb4e914e42f48f219fc7e6a7e
XLC: beg for forgiveness (there is a -binitifini function but it must be specified at link time)
On Tue, Jul 15, 2014 at 8:06 PM, Paul Hargrove  wrote:
The priority appears to have been added in gcc 4.3.You'll note it is not described in https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html

I also don't think the presence of the priority argument fixes anything...An OpenMPI code author cannot change the "priority" of a ctor or dtor in a precompiled third-party library (libpmi comes to mind).  Nor can one know what value the third part chose (in order to be higher or lower than theirs).  You cannot even be assured the third-party didn't set priority to INT_MIN or INT_MAX (or whatever).

That text also says nothing about dl_open() and dl_close() which must be considered in Open MPI.Before assuming constructor/destructor attributes are going to save the world, wash your dog, and pick up the dry cleaning, one should probably verify some minimal level of support on non-gnu tool-chains including vendor compilers (PGI, XLC, etc) and system linkers (Darwin and Solaris).

-PaulOn Tue, Jul 15, 2014 at 4:52 PM, Joshua Ladd  wrote:

According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

"constructor 
 destructor  constructor (priority) destructor (priority)


The constructor attribute causes the function to be called
automatically before execution enters main ().  Similarly, the
destructor attribute causes the function to be called
automatically after main () completes or exit () is
called.  Functions with these attributes are useful for
initializing data that is used implicitly during the execution of
the program.

 You may provide an optional integer priority to control the order in
which constructor and destructor functions are run.  A constructor
with a smaller priority number runs before a constructor with a larger
priority number; the opposite relationship holds for destructors.  So,
if you have a constructor that allocates a resource and a destructor
that deallocates the same resource, both functions typically have the
same priority.  The priorities for constructor and destructor
functions are the same as those specified for namespace-scope C++
objects (see C++ Attributes).

 These attributes are not currently implemented for Objective-C."On Tue, Jul 15, 2014 at 5:20 PM, Paul Hargrove  wrote:


On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r  wrote:



I don't think there's anything wrong with using ctor/dtors in shared libraries,




but one does need to make