Re: [OMPI devel] about r32685

2014-09-09 Thread Brice Goglin
Gilles,

The strange configure check comes from this commit

https://github.com/open-mpi/hwloc/commit/6a9299ce9d1cb1c13b3b346fe6fdfed2df75c672
Are you sure your patch won't break something else?
I'll ask Pavan what he thinks about your patch.
I agree that it's crazy we don't find strncasecmp on some Linux boxes
but this detection code is already a mess, so I'd rather not change it again.

Brice



On 09/09/2014 04:56, Gilles Gouaillardet wrote:
> Ralph,
>
> ok, let me clarify my point :
>
> tolower() is invoked in :
> opal/mca/hwloc/hwloc191/hwloc/src/misc.c
> and ctype.h is already #include'd in this file
>
> tolower() is also invoked in :
> opal/mca/hwloc/hwloc191/hwloc/include/private/misc.h
> *only* if HWLOC_HAVE_DECL_STRNCASECMP is not #define'd :
>
> static __hwloc_inline int hwloc_strncasecmp(const char *s1, const char *s2, size_t n)
> {
> #ifdef HWLOC_HAVE_DECL_STRNCASECMP
>   return strncasecmp(s1, s2, n);
> #else
>   while (n) {
>     char c1 = tolower(*s1), c2 = tolower(*s2);
>     if (!c1 || !c2 || c1 != c2)
>       return c1-c2;
>     n--; s1++; s2++;
>   }
>   return 0;
> #endif
> }
>
> My point was that on your CentOS box, HWLOC_HAVE_DECL_STRNCASECMP
> *should* have been #define'd by configure,
> even if you are using the Intel or clang 3.2 compilers.
>
> Cheers,
>
> Gilles
>
> On 2014/09/09 11:47, Ralph Castain wrote:
>> I'll have to let Brice comment on the config change. All I can say is that 
>> "tolower" on my CentOS box is defined in , and that has to be 
>> included in the misc.h header.
>>
>>
>> On Sep 8, 2014, at 5:49 PM, Gilles Gouaillardet wrote:
>>
>>> Ralph and Brice,
>>>
>>> I noticed Ralph committed r32685 to fix a problem with the Intel
>>> compilers.
>>> A very similar issue occurs with clang 3.2 (gcc and clang 3.4 are OK
>>> for me).
>>>
>>> IMHO, the root cause is in the hwloc configure:
>>> in this case, configure fails to detect that strncasecmp is declared in
>>> the C header files.
>>>
>>> To detect this, configure compiles the conftest.1.c program, and a
>>> compilation failure means that
>>> strncasecmp is supported, since it is already declared in some C header file.
>>>
>>> gcc and clang 3.4 both fail to compile this program :
>>>
>>> $ gcc -c /tmp/conftest.1.c ; echo $?
>>> /tmp/conftest.1.c:592: warning: data definition has no type or storage class
>>> /tmp/conftest.1.c:592: error: conflicting types for ‘strncasecmp’
>>> 1
>>>
>>> $ clang --version
>>> clang version 3.4 (tags/RELEASE_34/final)
>>> Target: x86_64-redhat-linux-gnu
>>> Thread model: posix
>>> $ clang -c /tmp/conftest.1.c ; echo $?
>>> /tmp/conftest.1.c:592:8: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
>>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>>> ^~~
>>> /tmp/conftest.1.c:592:8: error: conflicting types for 'strncasecmp'
>>> /usr/include/string.h:540:12: note: previous declaration is here
>>> extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
>>> ^
>>> /tmp/conftest.1.c:596:19: error: too many arguments to function call, expected 3, have 10
>>> strncasecmp(1,2,3,4,5,6,7,8,9,10);
>>> ~~~ ^~
>>> 1 warning and 2 errors generated.
>>> 1
>>>
>>>
>>> but clang 3.2 and icc only issue warnings, and no error:
>>>
>>> $ clang --version
>>> clang version 3.2 (tags/RELEASE_32/final)
>>> Target: x86_64-unknown-linux-gnu
>>> Thread model: posix
>>> $ clang -c /tmp/conftest.1.c ; echo $?
>>> /tmp/conftest.1.c:592:8: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
>>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>>> ^~~
>>> /tmp/conftest.1.c:592:8: warning: incompatible redeclaration of library function 'strncasecmp'
>>> /usr/include/string.h:540:12: note: 'strncasecmp' is a builtin with type 'int (const char *, const char *, size_t)'
>>> extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
>>> ^
>>> 2 warnings generated.
>>> 0
>>>
>>> $ icc -c conftest.1.c ; echo $?
>>> conftest.1.c(592): warning #77: this declaration has no storage class or type specifier
>>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>>> ^
>>>
>>> conftest.1.c(592): warning #147: declaration is incompatible with "int strncasecmp(const char *, const char *, size_t={unsigned long})" (declared at line 540 of "/usr/include/string.h")
>>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>>> ^
>>>
>>> 0
>>>
>>>
>>> The attached hwloc_config.patch makes the test program slightly
>>> different (conftest.2.c), and that version does fail to compile with all
>>> of these compilers.
>>>
>>>
>>> That being said, r32685 should probably not be reverted: in case
>>> strncasecmp is not provided by the system (I do not even know if such
>>> an OS exists), ctype.h must be #include'd in order to get the prototype
>>> of the tolower() function.
>>>
>>>
>>> Could you please review hwloc_config.patch and comment?
>>>
>>> Cheers,
>>>
>>> Gilles

Re: [OMPI devel] about r32685

2014-09-09 Thread Gilles Gouaillardet
Brice,

The goal of the test is to check whether a function declaration is
already present
in one of the headers.
To achieve this, the test redeclares the function with an
absurd prototype:

strncasecmp(int,long,int,long,int,long,int,long,int,long);

/* by the way, no type is specified for the return value either */

The test *assumes* the compiler will fail with an error if the function
is redeclared with a different prototype.
This assumption is just wrong (well, at least with the clang 3.2 and Intel
compilers).

My patch redeclares the function as a variable instead of as a function
with a different prototype,
and this leads to a compilation failure with the GNU, (all) clang and Intel
compilers (these are all I could test so far).

As long as _HWLOC_CHECK_DECL is invoked on a function and not on a
global variable,
I do not think my patch can break anything.

Another option could be to use the preprocessor and grep:
$CC -E conftest.1.c | egrep -q '[[:space:]]+strncasecmp[[:space:]]+'
but I do not believe this is the way to go...

Cheers,

Gilles

On 2014/09/09 14:30, Brice Goglin wrote:
> Gilles,
>
> The strange configure check comes from this commit
>
> https://github.com/open-mpi/hwloc/commit/6a9299ce9d1cb1c13b3b346fe6fdfed2df75c672
> Are you sure your patch won't break something else?
> I'll ask Pavan what he thinks about your patch.
> I agree that it's crazy we don't find strncasecmp on some Linux boxes
> but this detection code is already a mess, so I'd rather not change it again.
>
> Brice

[OMPI devel] race condition in grpcomm/rcd

2014-09-09 Thread Gilles Gouaillardet
Folks,

Since r32672 (trunk), grpcomm/rcd is the default module.
The attached spawn.c test program is a trimmed version of the
spawn_with_env_vars.c test case
from the IBM test suite.

When invoked on two nodes:
- the program hangs with -np 2
- the program can crash with np > 2
The error message is:
[node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1] AND TAG -33 - ABORTING

Here is my full command line (from node0):

mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca coll ^ml ./spawn

A simple workaround is to add the following extra parameter to the
mpirun command line:
--mca grpcomm_rcd_priority 0

My understanding is that the race condition occurs when all the
processes call MPI_Finalize().
Internally, the pmix module has mpirun/orted issue two ALLGATHERs
involving mpirun and the orteds
(one for job 1, aka the parent, and one for job 2, aka the spawned tasks).
The error message is very explicit: this is not (currently) supported.

I wrote the attached rml.patch, which is really a workaround and not a fix:
with it, each job invokes its ALLGATHER with a different tag
/* that works for a limited number of jobs only */

I did not commit this patch since it is not a fix. Could someone
(Ralph?) please review the issue and comment?


Cheers,

Gilles

/*
 * $HEADER$
 *
 * Program to test MPI_Comm_spawn with environment variables.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "mpi.h"

static void do_parent(char *cmd, int rank, int count)
{
    int *errcode, err;
    int i;
    MPI_Comm child_inter;
    MPI_Comm intra;
    FILE *fp;
    int found;
    int size;

    /* First, see if cmd exists on all ranks */

    fp = fopen(cmd, "r");
    if (NULL == fp) {
        found = 0;
    } else {
        fclose(fp);
        found = 1;
    }
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Allreduce(&found, &count, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (count != size) {
        if (rank == 0) {
            MPI_Abort(MPI_COMM_WORLD, 77);
        }
        return;
    }

    /* Now try the spawn if it's found anywhere */

    errcode = malloc(sizeof(int) * count);
    if (NULL == errcode) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* memset() takes a size in bytes, not a number of ints */
    memset(errcode, -1, sizeof(int) * count);
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, count, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &child_inter, errcode);

    /* Clean up */
    MPI_Barrier(child_inter);

    MPI_Comm_disconnect(&child_inter);
    free(errcode);
}


static void do_target(MPI_Comm parent)
{
    MPI_Barrier(parent);
    MPI_Comm_disconnect(&parent);
}


int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Comm parent;

    /* Ok, we're good.  Proceed with the test. */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Check to see if we *were* spawned -- because this is a test, we
       can only assume the existence of this one executable.  Hence, we
       both mpirun it and spawn it. */

    parent = MPI_COMM_NULL;
    MPI_Comm_get_parent(&parent);
    if (parent != MPI_COMM_NULL) {
        do_target(parent);
    } else {
        do_parent(argv[0], rank, size);
    }

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 < rank) sleep(3);

    MPI_Finalize();

    /* All done */

    return 0;
}
Index: orte/mca/grpcomm/brks/grpcomm_brks.c
===
--- orte/mca/grpcomm/brks/grpcomm_brks.c(revision 32688)
+++ orte/mca/grpcomm/brks/grpcomm_brks.c(working copy)
@@ -6,6 +6,8 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All
  * rights reserved.
  * Copyright (c) 2014  Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -111,6 +113,7 @@
 static int brks_allgather_send_dist(orte_grpcomm_coll_t *coll, orte_vpid_t distance) {
 orte_process_name_t peer_send, peer_recv;
 opal_buffer_t *send_buf;
+orte_rml_tag_t tag;
 int rc;

 peer_send.jobid = ORTE_PROC_MY_NAME->jobid;
@@ -174,8 +177,14 @@
  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
  ORTE_NAME_PRINT(&peer_send)));

+if (1 != coll->sig->sz || ORTE_VPID_WILDCARD != coll->sig->signature[0].vpid) {
+tag = ORTE_RML_TAG_ALLGATHER;
+} else {
+tag = ORTE_RML_TAG_JOB_ALLGATHER + ORTE_LOCAL_JOBID(coll->sig->signature[0].jobid) % (ORTE_RML_TAG_MAX-ORTE_RML_TAG_JOB_ALLGATHER);
+}
+}
+
 if (0 > (rc = orte_rml.send_buffer_nb(&peer_send, send_buf,
-  -ORTE_RML_TAG_ALLGATHER,
+  -tag,
   orte_rml_send_callback, NULL))) {
 ORTE_ERROR_LOG(rc);
 

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-09 Thread Ralph Castain
Yeah, that's not the correct fix. The right way to fix it is for all three 
components to have their own RML tag, and for each of them to establish a 
persistent receive. They then can use the signature to tell which collective 
the incoming message belongs to.

I'll fix it, but I'm afraid it won't be until tomorrow, as today is shot.


On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet  
wrote:

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15780.php



[OMPI devel] RFC: Multiple duplicate MCA param generates error

2014-09-09 Thread Ralph Castain
WHAT: Generate an error if the user provides duplicate MCA params on the 
cmd line

WHY:   User confusion due to unexpected behavior

WHEN: Tues, Sept 16 as this is a gating issue for 1.8.3 release


In the beginning, OMPI would look at a cmd line for MCA params - if a param was 
listed more than once, we would take the LAST value given and ignore all the 
rest. At some point, this behavior was changed in 
opal/mca/base/mca_base_cmd_line.c such that we concatenated the values into a 
comma-delimited list. Unfortunately, the backend parser doesn't know how to 
deal with such a list when converting the param to values such as INT or BOOL.

In r32686, I reverted the behavior back to the original one of taking the LAST 
value. However, this can lead to unexpected behavior. For example, consider the 
case where the user provides a cmd line containing the following:

... -mca btl ^sm -mca btl openib

In this case, the result will be a setting of "btl=openib", and only the openib 
BTL will be selected. If the user thought that all BTLs other than sm were to 
be considered, this could be a surprise. Likewise, note that if the order is 
reversed, the result would be "btl=^sm" - a completely different behavior.

On the telecon, we couldn't think of any use-case where we would want the last 
value or concatenating behaviors. Instead, there was consensus that we should 
generate an error as the user is asking us to do conflicting things.

However, we acknowledged that we might not understand all the use-cases, so we 
are issuing this as an RFC in case someone knows of a reason for the other 
behaviors.



[OMPI devel] Stop updating the Trac wiki (Github conversion)

2014-09-09 Thread Jeff Squyres (jsquyres)
As part of our migration to Github, there are three main things to convert:

1. SVN -> Git
2. Trac wiki -> Github markdown
3. Trac tickets -> Github issues

I have a good automated solution for #1; it can be run whenever we're ready 
to switch over.

I used an 80% automated solution for #2 (i.e., a bunch of regexps), and then 
did a bunch of manual fixups for what the regexps didn't catch.  You can see 
the result here (temporarily):

   https://github.com/jsquyres/ompi-test/wiki

*** PLEASE DO NOT UPDATE THE TRAC WIKI ANY MORE!  The trac wiki page conversion 
is basically done, and I don't intend to do it again.

Meaning: if you make any updates to the Trac wiki, they won't be brought over 
to the new github wiki.

Lastly, I'm just starting to see about converting Trac tickets to Github 
issues. I'll be working on that this week, and hope to have updated information 
for everyone for next Tuesday's call.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Multiple duplicate MCA param generates error

2014-09-09 Thread Mike Dubman
Maybe we should have another MCA parameter to specify the desired policy
(LAST, CONCAT, FIRST) and let the user select it?

Basically, it would mimic the "setenv(var,val,overwrite)" behavior, which is
easy to explain and good to have.


On Tue, Sep 9, 2014 at 7:31 PM, Ralph Castain  wrote:

> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15782.php
>



-- 

Kind Regards,

M.


[OMPI devel] How to fix git messes

2014-09-09 Thread Jeff Squyres (jsquyres)
This is a fantastically useful flow chart:

http://justinhileman.info/article/git-pretty/

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] RFC: Multiple duplicate MCA param generates error

2014-09-09 Thread Ralph Castain
I can't see how a user would know how to use such a thing - which mca params 
can absorb a concatenated value? Why would you take the last vs the first 
instead of just providing a value only once?


On Sep 9, 2014, at 10:42 AM, Mike Dubman  wrote:

> maybe we should have another MCA parameter to specify desired policy? 
> LAST,CONCAT,FIRST and let user select it?
> 
> basically, it is to mimic "setenv(var,val,overwrite)" behavior which is easy 
> to explain and good to have.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15784.php



Re: [OMPI devel] RFC: Multiple duplicate MCA param generates error

2014-09-09 Thread Jeff Squyres (jsquyres)
On Sep 9, 2014, at 1:42 PM, Mike Dubman  wrote:

> maybe we should have another MCA parameter to specify desired policy? 
> LAST,CONCAT,FIRST and let user select it?
> 
> basically, it is to mimic "setenv(var,val,overwrite)" behavior which is easy 
> to explain and good to have.

Is there a use case for this behavior?

-x behavior is different than MCA param setting behavior.

-- 
Jeff Squyres
jsquy...@cisco.com