Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain



On 11/19/07 8:19 AM, "Jeff Squyres"  wrote:

> On Nov 19, 2007, at 10:15 AM, Ralph H Castain wrote:
> 
>> Hmmm...okay, I get the message: if it is going to be fixed, then I
>> have to
>> fix it on my end. ;-) Which means it ain't happening anytime soon,
>> so Tim P
>> is the one left in the lurch.
>> 
>> Sorry Tim! Perhaps you can come up with a compromise that re-enables
>> what
>> you want to do.
> 
> Yep -- sorry Tim.  :-(
> 
> Is this only a problem for the rsh/ssh launcher?  I *think* that
> should be the only launcher affected -- all others use raw argv string
> arrays, right?  If so, we already have a local/non-local flag in the
> rsh launcher, right?  I agree it kinda sucks, but it may be necessary
> to construct argv differently between local/non-local (which we
> already do, right?).
> 

Nope, not any more - you can't do a local rsh since mpirun is acting as the
local daemon. And we only have one way of constructing argv arrays for
orteds as they are always non-local.

I believe the problem is occurring for all environments due to the
reconstruction of the argv array to include the cmd line mca params. Tim
will have to clarify for his application - I know I encountered the problem
on SLURM, TM, and rsh for my application, but I fixed that special case once
I discovered what was going on.


> 
> 
>> On 11/19/07 8:11 AM, "Jeff Squyres"  wrote:
>> 
>>> On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:
>>> 
 Unfortunately, it -is- a significant problem when passing the params
 on to
 the orteds, as Tim has eloquently pointed out in his original
 posting. I
 guess I don't see why it would be a significant problem to just have
 opal_cmd_line_parse do the "right thing" - if a string param is
 quoted, then
 just remove the quotes.
>>> 
>>> How would opal_cmd_line_parse know when to remove quotes and when not
>>> remove quotes?  I.e., what if an MCA param (or some other value, such
>>> as a user application argv) required quotes?  This is not uncommon
>>> for
>>> the wrapper compilers, for example (passing -D's to the C
>>> preprocessor
>>> that take string arguments).
>>> 
>>> AFAIK, opal_cmd_line_parse() does exactly what I intended it to.  :-)
>>> It performs no shell-like quoting because it expects to be given a
>>> list of "final" string tokens.
>>> 
 Problem solved, and I wouldn't have to add a bunch of code to figure
 out
 when to assemble argv arrays in different ways.
>>> 
>>> Er... I don't understand: I joined this thread because I thought
>>> there
>>> was still something to fix...?
>>> 
>>> I would be against changing opal_cmd_line_parse() in the manner that
>>> you have described because it will break other things (e.g., the
>>> wrapper compilers).
>>> 
>>> 
>>> 
 
 
 On 11/19/07 7:52 AM, "Jeff Squyres"  wrote:
 
> I guess I don't see why this is an opal_cmd_line_parse() problem.
> 
> If you invoke executables through system(), then a shell is used
> and
> quoting is necessary to preserve the individual string tokens
> (i.e.,
> "my beloved string" would be passed to the application as one argv
> token, without the quotes).
> 
> But if you're building up an array of argv and calling some form of
> exec(), then no shell is involved and quoting should not be
> necessary.  Specifically:
> 
>opal_append_argv(&argc, &argv, "my beloved string");
> 
> will be passed as one string token to the application.
> 
> opal_cmd_line_parse() is passed an array of argv, meaning that the
> command line have already been split into individual string
> tokens.  I
> guess the question is whether these come in directly from the
> arguments to main() or whether we are getting a single string and
> breaking it up into tokens.  If the latter is true, then we need to
> re-
> evaluate our break-into-tokens algorithm.  I have a dim
> recollection
> that these come in from the arguments to main(), though.
> 
> I guess I can see where this would get complicated for rsh/ssh
> invocations, because *both* models are used. (i.e., you exec
> locally
> but it turns into a system-like invocation on the remote side).  In
> this case, I think you'll need to quote extended strings (e.g.,
> those
> containing spaces) for the non-local invocations not not quote it
> for
> local invocations.
> 
> 
> 
> 
> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:
> 
>> My, you are joining late! ;-)
>> 
>> The problem is with MCA params that take string arguments. If we
>> pass the following:
>> 
>> -mca foo "my beloved string"
>> 
>> on the command line of an orted, we get a value for foo that
>> includes the quote marks. I verified this rather painfully when
>> attempting to pass a command line mca param for a uri. I
>> eventually
>> had to add specific code

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Jeff Squyres

On Nov 19, 2007, at 10:15 AM, Ralph H Castain wrote:

Hmmm...okay, I get the message: if it is going to be fixed, then I  
have to
fix it on my end. ;-) Which means it ain't happening anytime soon,  
so Tim P

is the one left in the lurch.

Sorry Tim! Perhaps you can come up with a compromise that re-enables  
what

you want to do.


Yep -- sorry Tim.  :-(

Is this only a problem for the rsh/ssh launcher?  I *think* that  
should be the only launcher affected -- all others use raw argv string  
arrays, right?  If so, we already have a local/non-local flag in the  
rsh launcher, right?  I agree it kinda sucks, but it may be necessary  
to construct argv differently between local/non-local (which we  
already do, right?).





On 11/19/07 8:11 AM, "Jeff Squyres"  wrote:


On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:


Unfortunately, it -is- a significant problem when passing the params
on to
the orteds, as Tim has eloquently pointed out in his original
posting. I
guess I don't see why it would be a significant problem to just have
opal_cmd_line_parse do the "right thing" - if a string param is
quoted, then
just remove the quotes.


How would opal_cmd_line_parse know when to remove quotes and when not
remove quotes?  I.e., what if an MCA param (or some other value, such
as a user application argv) required quotes?  This is not uncommon  
for
the wrapper compilers, for example (passing -D's to the C  
preprocessor

that take string arguments).

AFAIK, opal_cmd_line_parse() does exactly what I intended it to.  :-)
It performs no shell-like quoting because it expects to be given a
list of "final" string tokens.


Problem solved, and I wouldn't have to add a bunch of code to figure
out
when to assemble argv arrays in different ways.


Er... I don't understand: I joined this thread because I thought  
there

was still something to fix...?

I would be against changing opal_cmd_line_parse() in the manner that
you have described because it will break other things (e.g., the
wrapper compilers).






On 11/19/07 7:52 AM, "Jeff Squyres"  wrote:


I guess I don't see why this is an opal_cmd_line_parse() problem.

If you invoke executables through system(), then a shell is used  
and
quoting is necessary to preserve the individual string tokens  
(i.e.,

"my beloved string" would be passed to the application as one argv
token, without the quotes).

But if you're building up an array of argv and calling some form of
exec(), then no shell is involved and quoting should not be
necessary.  Specifically:

   opal_append_argv(&argc, &argv, "my beloved string");

will be passed as one string token to the application.

opal_cmd_line_parse() is passed an array of argv, meaning that the
command line have already been split into individual string
tokens.  I
guess the question is whether these come in directly from the
arguments to main() or whether we are getting a single string and
breaking it up into tokens.  If the latter is true, then we need to
re-
evaluate our break-into-tokens algorithm.  I have a dim  
recollection

that these come in from the arguments to main(), though.

I guess I can see where this would get complicated for rsh/ssh
invocations, because *both* models are used. (i.e., you exec  
locally

but it turns into a system-like invocation on the remote side).  In
this case, I think you'll need to quote extended strings (e.g.,  
those
containing spaces) for the non-local invocations not not quote it  
for

local invocations.




On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:


My, you are joining late! ;-)

The problem is with MCA params that take string arguments. If we
pass the following:

-mca foo "my beloved string"

on the command line of an orted, we get a value for foo that
includes the quote marks. I verified this rather painfully when
attempting to pass a command line mca param for a uri. I  
eventually

had to add specific code to clean the paramater up.

This appears to be somehow related to the precise method used to
register the param. For example, the following deprecated method
works fine:

On the setup end:

  opal_argv_append(argc, argv, "--gprreplica");
  if (NULL != orte_process_info.gpr_replica_uri) {
  contact_info = strdup(orte_process_info.gpr_replica_uri);
  } else {
  contact_info = orte_rml.get_contact_info();
  }
  asprintf(¶m, "\"%s\"", contact_info);
  opal_argv_append(argc, argv, param);

And on the receiving end:
  id = mca_base_param_register_string("gpr", "replica", "uri",
NULL, orte_process_info.gpr_replica_uri);
  mca_base_param_lookup_string(id,
&(orte_process_info.gpr_replica_uri));
  mca_base_param_set_internal(id, true);


However, the following does NOT work cleanly:

On the setup end:
 rml_uri = orte_rml.get_contact_info();
  asprintf(¶m, "\"%s\"", rml_uri);
  opal_argv_append(argc, argv, "--hnp-uri");
  opal_argv_append(argc, argv, param);

On the receiving end:
  mca_base_param_reg_string_name("orte", "hnp_uri",
 "HNP co

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
Hmmm...okay, I get the message: if it is going to be fixed, then I have to
fix it on my end. ;-) Which means it ain't happening anytime soon, so Tim P
is the one left in the lurch.

Sorry Tim! Perhaps you can come up with a compromise that re-enables what
you want to do.

Ralph



On 11/19/07 8:11 AM, "Jeff Squyres"  wrote:

> On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:
> 
>> Unfortunately, it -is- a significant problem when passing the params
>> on to
>> the orteds, as Tim has eloquently pointed out in his original
>> posting. I
>> guess I don't see why it would be a significant problem to just have
>> opal_cmd_line_parse do the "right thing" - if a string param is
>> quoted, then
>> just remove the quotes.
> 
> How would opal_cmd_line_parse know when to remove quotes and when not
> remove quotes?  I.e., what if an MCA param (or some other value, such
> as a user application argv) required quotes?  This is not uncommon for
> the wrapper compilers, for example (passing -D's to the C preprocessor
> that take string arguments).
> 
> AFAIK, opal_cmd_line_parse() does exactly what I intended it to.  :-)
> It performs no shell-like quoting because it expects to be given a
> list of "final" string tokens.
> 
>> Problem solved, and I wouldn't have to add a bunch of code to figure
>> out
>> when to assemble argv arrays in different ways.
> 
> Er... I don't understand: I joined this thread because I thought there
> was still something to fix...?
> 
> I would be against changing opal_cmd_line_parse() in the manner that
> you have described because it will break other things (e.g., the
> wrapper compilers).
> 
> 
> 
>> 
>> 
>> On 11/19/07 7:52 AM, "Jeff Squyres"  wrote:
>> 
>>> I guess I don't see why this is an opal_cmd_line_parse() problem.
>>> 
>>> If you invoke executables through system(), then a shell is used and
>>> quoting is necessary to preserve the individual string tokens (i.e.,
>>> "my beloved string" would be passed to the application as one argv
>>> token, without the quotes).
>>> 
>>> But if you're building up an array of argv and calling some form of
>>> exec(), then no shell is involved and quoting should not be
>>> necessary.  Specifically:
>>> 
>>> opal_append_argv(&argc, &argv, "my beloved string");
>>> 
>>> will be passed as one string token to the application.
>>> 
>>> opal_cmd_line_parse() is passed an array of argv, meaning that the
>>> command line have already been split into individual string
>>> tokens.  I
>>> guess the question is whether these come in directly from the
>>> arguments to main() or whether we are getting a single string and
>>> breaking it up into tokens.  If the latter is true, then we need to
>>> re-
>>> evaluate our break-into-tokens algorithm.  I have a dim recollection
>>> that these come in from the arguments to main(), though.
>>> 
>>> I guess I can see where this would get complicated for rsh/ssh
>>> invocations, because *both* models are used. (i.e., you exec locally
>>> but it turns into a system-like invocation on the remote side).  In
>>> this case, I think you'll need to quote extended strings (e.g., those
>>> containing spaces) for the non-local invocations not not quote it for
>>> local invocations.
>>> 
>>> 
>>> 
>>> 
>>> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:
>>> 
 My, you are joining late! ;-)
 
 The problem is with MCA params that take string arguments. If we
 pass the following:
 
 -mca foo "my beloved string"
 
 on the command line of an orted, we get a value for foo that
 includes the quote marks. I verified this rather painfully when
 attempting to pass a command line mca param for a uri. I eventually
 had to add specific code to clean the paramater up.
 
 This appears to be somehow related to the precise method used to
 register the param. For example, the following deprecated method
 works fine:
 
 On the setup end:
 
opal_argv_append(argc, argv, "--gprreplica");
if (NULL != orte_process_info.gpr_replica_uri) {
contact_info = strdup(orte_process_info.gpr_replica_uri);
} else {
contact_info = orte_rml.get_contact_info();
}
asprintf(¶m, "\"%s\"", contact_info);
opal_argv_append(argc, argv, param);
 
 And on the receiving end:
id = mca_base_param_register_string("gpr", "replica", "uri",
 NULL, orte_process_info.gpr_replica_uri);
mca_base_param_lookup_string(id,
 &(orte_process_info.gpr_replica_uri));
mca_base_param_set_internal(id, true);
 
 
 However, the following does NOT work cleanly:
 
 On the setup end:
   rml_uri = orte_rml.get_contact_info();
asprintf(¶m, "\"%s\"", rml_uri);
opal_argv_append(argc, argv, "--hnp-uri");
opal_argv_append(argc, argv, param);
 
 On the receiving end:
mca_base_param_reg_string_name("orte", "hnp_uri",
   "HNP cont

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Jeff Squyres

On Nov 19, 2007, at 10:01 AM, Ralph H Castain wrote:

Unfortunately, it -is- a significant problem when passing the params  
on to
the orteds, as Tim has eloquently pointed out in his original  
posting. I

guess I don't see why it would be a significant problem to just have
opal_cmd_line_parse do the "right thing" - if a string param is  
quoted, then

just remove the quotes.


How would opal_cmd_line_parse know when to remove quotes and when not  
remove quotes?  I.e., what if an MCA param (or some other value, such  
as a user application argv) required quotes?  This is not uncommon for  
the wrapper compilers, for example (passing -D's to the C preprocessor  
that take string arguments).


AFAIK, opal_cmd_line_parse() does exactly what I intended it to.  :-)   
It performs no shell-like quoting because it expects to be given a  
list of "final" string tokens.


Problem solved, and I wouldn't have to add a bunch of code to figure  
out

when to assemble argv arrays in different ways.


Er... I don't understand: I joined this thread because I thought there  
was still something to fix...?


I would be against changing opal_cmd_line_parse() in the manner that  
you have described because it will break other things (e.g., the  
wrapper compilers).







On 11/19/07 7:52 AM, "Jeff Squyres"  wrote:


I guess I don't see why this is an opal_cmd_line_parse() problem.

If you invoke executables through system(), then a shell is used and
quoting is necessary to preserve the individual string tokens (i.e.,
"my beloved string" would be passed to the application as one argv
token, without the quotes).

But if you're building up an array of argv and calling some form of
exec(), then no shell is involved and quoting should not be
necessary.  Specifically:

opal_append_argv(&argc, &argv, "my beloved string");

will be passed as one string token to the application.

opal_cmd_line_parse() is passed an array of argv, meaning that the
command line have already been split into individual string  
tokens.  I

guess the question is whether these come in directly from the
arguments to main() or whether we are getting a single string and
breaking it up into tokens.  If the latter is true, then we need to  
re-

evaluate our break-into-tokens algorithm.  I have a dim recollection
that these come in from the arguments to main(), though.

I guess I can see where this would get complicated for rsh/ssh
invocations, because *both* models are used. (i.e., you exec locally
but it turns into a system-like invocation on the remote side).  In
this case, I think you'll need to quote extended strings (e.g., those
containing spaces) for the non-local invocations not not quote it for
local invocations.




On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:


My, you are joining late! ;-)

The problem is with MCA params that take string arguments. If we
pass the following:

-mca foo "my beloved string"

on the command line of an orted, we get a value for foo that
includes the quote marks. I verified this rather painfully when
attempting to pass a command line mca param for a uri. I eventually
had to add specific code to clean the paramater up.

This appears to be somehow related to the precise method used to
register the param. For example, the following deprecated method
works fine:

On the setup end:

   opal_argv_append(argc, argv, "--gprreplica");
   if (NULL != orte_process_info.gpr_replica_uri) {
   contact_info = strdup(orte_process_info.gpr_replica_uri);
   } else {
   contact_info = orte_rml.get_contact_info();
   }
   asprintf(¶m, "\"%s\"", contact_info);
   opal_argv_append(argc, argv, param);

And on the receiving end:
   id = mca_base_param_register_string("gpr", "replica", "uri",
NULL, orte_process_info.gpr_replica_uri);
   mca_base_param_lookup_string(id,
&(orte_process_info.gpr_replica_uri));
   mca_base_param_set_internal(id, true);


However, the following does NOT work cleanly:

On the setup end:
  rml_uri = orte_rml.get_contact_info();
   asprintf(¶m, "\"%s\"", rml_uri);
   opal_argv_append(argc, argv, "--hnp-uri");
   opal_argv_append(argc, argv, param);

On the receiving end:
   mca_base_param_reg_string_name("orte", "hnp_uri",
  "HNP contact info",
  true, false, NULL,  &uri);

Thereby necessitating the addition of the following code to clean it
up:
   if (NULL != uri) {
   /* the uri value passed to us will have quote marks around
it to protect
   * the value if passed on the command line. We must remove
those
   * to have a correct uri string
   */
  if ('"' == uri[0]) {
   /* if the first char is a quote, then so will the last
one be */
  ptr = &uri[1];
   len = strlen(ptr) - 1;
   } else {
   ptr = &uri[0];
   len = strlen(uri);
   }

   /* we have to copy the string by hand as strndup is a GNU
extension
* and may not be generally available
*/
  or

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
Unfortunately, it -is- a significant problem when passing the params on to
the orteds, as Tim has eloquently pointed out in his original posting. I
guess I don't see why it would be a significant problem to just have
opal_cmd_line_parse do the "right thing" - if a string param is quoted, then
just remove the quotes.

Problem solved, and I wouldn't have to add a bunch of code to figure out
when to assemble argv arrays in different ways.


On 11/19/07 7:52 AM, "Jeff Squyres"  wrote:

> I guess I don't see why this is an opal_cmd_line_parse() problem.
> 
> If you invoke executables through system(), then a shell is used and
> quoting is necessary to preserve the individual string tokens (i.e.,
> "my beloved string" would be passed to the application as one argv
> token, without the quotes).
> 
> But if you're building up an array of argv and calling some form of
> exec(), then no shell is involved and quoting should not be
> necessary.  Specifically:
> 
>  opal_append_argv(&argc, &argv, "my beloved string");
> 
> will be passed as one string token to the application.
> 
> opal_cmd_line_parse() is passed an array of argv, meaning that the
> command line have already been split into individual string tokens.  I
> guess the question is whether these come in directly from the
> arguments to main() or whether we are getting a single string and
> breaking it up into tokens.  If the latter is true, then we need to re-
> evaluate our break-into-tokens algorithm.  I have a dim recollection
> that these come in from the arguments to main(), though.
> 
> I guess I can see where this would get complicated for rsh/ssh
> invocations, because *both* models are used. (i.e., you exec locally
> but it turns into a system-like invocation on the remote side).  In
> this case, I think you'll need to quote extended strings (e.g., those
> containing spaces) for the non-local invocations not not quote it for
> local invocations.
> 
> 
> 
> 
> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:
> 
>> My, you are joining late! ;-)
>> 
>> The problem is with MCA params that take string arguments. If we
>> pass the following:
>> 
>> -mca foo "my beloved string"
>> 
>> on the command line of an orted, we get a value for foo that
>> includes the quote marks. I verified this rather painfully when
>> attempting to pass a command line mca param for a uri. I eventually
>> had to add specific code to clean the paramater up.
>> 
>> This appears to be somehow related to the precise method used to
>> register the param. For example, the following deprecated method
>> works fine:
>> 
>> On the setup end:
>> 
>> opal_argv_append(argc, argv, "--gprreplica");
>> if (NULL != orte_process_info.gpr_replica_uri) {
>> contact_info = strdup(orte_process_info.gpr_replica_uri);
>> } else {
>> contact_info = orte_rml.get_contact_info();
>> }
>> asprintf(¶m, "\"%s\"", contact_info);
>> opal_argv_append(argc, argv, param);
>> 
>> And on the receiving end:
>> id = mca_base_param_register_string("gpr", "replica", "uri",
>> NULL, orte_process_info.gpr_replica_uri);
>> mca_base_param_lookup_string(id,
>> &(orte_process_info.gpr_replica_uri));
>> mca_base_param_set_internal(id, true);
>> 
>> 
>> However, the following does NOT work cleanly:
>> 
>> On the setup end:
>>rml_uri = orte_rml.get_contact_info();
>> asprintf(¶m, "\"%s\"", rml_uri);
>> opal_argv_append(argc, argv, "--hnp-uri");
>> opal_argv_append(argc, argv, param);
>> 
>> On the receiving end:
>> mca_base_param_reg_string_name("orte", "hnp_uri",
>>"HNP contact info",
>>true, false, NULL,  &uri);
>> 
>> Thereby necessitating the addition of the following code to clean it
>> up:
>> if (NULL != uri) {
>> /* the uri value passed to us will have quote marks around
>> it to protect
>> * the value if passed on the command line. We must remove
>> those
>> * to have a correct uri string
>> */
>>if ('"' == uri[0]) {
>> /* if the first char is a quote, then so will the last
>> one be */
>>ptr = &uri[1];
>> len = strlen(ptr) - 1;
>> } else {
>> ptr = &uri[0];
>> len = strlen(uri);
>> }
>> 
>> /* we have to copy the string by hand as strndup is a GNU
>> extension
>>  * and may not be generally available
>>  */
>>orte_process_info.my_hnp_uri = (char*)malloc(len+1);
>> for (i=0; i < len; i++) {
>> orte_process_info.my_hnp_uri[i] = ptr[i];
>> }
>> orte_process_info.my_hnp_uri[len] = '\0';  /* NULL terminate
>> */
>>free(uri);
>> 
>> }
>> 
>> It was my understanding that you wanted us to move away from the
>> deprecated interface ­ hence my comment that we cannot just quote
>> all the strings as we would have to add this code all over the
>> place, or (better) fix opal_cmd_line_par

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Jeff Squyres

I guess I don't see why this is an opal_cmd_line_parse() problem.

If you invoke executables through system(), then a shell is used and  
quoting is necessary to preserve the individual string tokens (i.e.,  
"my beloved string" would be passed to the application as one argv  
token, without the quotes).


But if you're building up an array of argv and calling some form of  
exec(), then no shell is involved and quoting should not be  
necessary.  Specifically:


opal_append_argv(&argc, &argv, "my beloved string");

will be passed as one string token to the application.

opal_cmd_line_parse() is passed an array of argv, meaning that the  
command line have already been split into individual string tokens.  I  
guess the question is whether these come in directly from the  
arguments to main() or whether we are getting a single string and  
breaking it up into tokens.  If the latter is true, then we need to re- 
evaluate our break-into-tokens algorithm.  I have a dim recollection  
that these come in from the arguments to main(), though.


I guess I can see where this would get complicated for rsh/ssh  
invocations, because *both* models are used. (i.e., you exec locally  
but it turns into a system-like invocation on the remote side).  In  
this case, I think you'll need to quote extended strings (e.g., those  
containing spaces) for the non-local invocations not not quote it for  
local invocations.





On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:


My, you are joining late! ;-)

The problem is with MCA params that take string arguments. If we  
pass the following:


-mca foo "my beloved string"

on the command line of an orted, we get a value for foo that  
includes the quote marks. I verified this rather painfully when  
attempting to pass a command line mca param for a uri. I eventually  
had to add specific code to clean the paramater up.


This appears to be somehow related to the precise method used to  
register the param. For example, the following deprecated method  
works fine:


On the setup end:

opal_argv_append(argc, argv, "--gprreplica");
if (NULL != orte_process_info.gpr_replica_uri) {
contact_info = strdup(orte_process_info.gpr_replica_uri);
} else {
contact_info = orte_rml.get_contact_info();
}
asprintf(¶m, "\"%s\"", contact_info);
opal_argv_append(argc, argv, param);

And on the receiving end:
id = mca_base_param_register_string("gpr", "replica", "uri",  
NULL, orte_process_info.gpr_replica_uri);
mca_base_param_lookup_string(id,  
&(orte_process_info.gpr_replica_uri));

mca_base_param_set_internal(id, true);


However, the following does NOT work cleanly:

On the setup end:
   rml_uri = orte_rml.get_contact_info();
asprintf(¶m, "\"%s\"", rml_uri);
opal_argv_append(argc, argv, "--hnp-uri");
opal_argv_append(argc, argv, param);

On the receiving end:
mca_base_param_reg_string_name("orte", "hnp_uri",
   "HNP contact info",
   true, false, NULL,  &uri);

Thereby necessitating the addition of the following code to clean it  
up:

if (NULL != uri) {
/* the uri value passed to us will have quote marks around  
it to protect
* the value if passed on the command line. We must remove  
those

* to have a correct uri string
*/
   if ('"' == uri[0]) {
/* if the first char is a quote, then so will the last  
one be */

   ptr = &uri[1];
len = strlen(ptr) - 1;
} else {
ptr = &uri[0];
len = strlen(uri);
}

/* we have to copy the string by hand as strndup is a GNU  
extension

 * and may not be generally available
 */
   orte_process_info.my_hnp_uri = (char*)malloc(len+1);
for (i=0; i < len; i++) {
orte_process_info.my_hnp_uri[i] = ptr[i];
}
orte_process_info.my_hnp_uri[len] = '\0';  /* NULL terminate  
*/

   free(uri);

}

It was my understanding that you wanted us to move away from the  
deprecated interface – hence my comment that we cannot just quote  
all the strings as we would have to add this code all over the  
place, or (better) fix opal_cmd_line_parse.


Hope that helps
Ralph


On 11/19/07 7:01 AM, "Jeff Squyres"  wrote:

> Sorry -- I'm just joining this conversation late: what's the problem
> with opal_cmd_line_parse?
>
> It should obey all quoting from shells, etc.  I.e., it shouldn't  
care

> about tokens with special characters (to include spaces) because the
> shell divides all of that stuff up -- it just gets a char*[] that it
> treats as discrete tokens.
>
> Is it doing something wrong?
>
>
>
> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:
>
>> I'm not sure it is really necessary - the problem is solely within
>> opal_cmd_line_parse and (if someone can parse that code ;-)) is
>> truly simple
>> to fix. The overly long cmd line issue is due to a bug that 

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
My, you are joining late! ;-)

The problem is with MCA params that take string arguments. If we pass the
following:

-mca foo "my beloved string"

on the command line of an orted, we get a value for foo that includes the
quote marks. I verified this rather painfully when attempting to pass a
command line mca param for a uri. I eventually had to add specific code to
clean the paramater up.

This appears to be somehow related to the precise method used to register
the param. For example, the following deprecated method works fine:

On the setup end:

opal_argv_append(argc, argv, "--gprreplica");
if (NULL != orte_process_info.gpr_replica_uri) {
contact_info = strdup(orte_process_info.gpr_replica_uri);
} else {
contact_info = orte_rml.get_contact_info();
}
asprintf(¶m, "\"%s\"", contact_info);
opal_argv_append(argc, argv, param);

And on the receiving end:
id = mca_base_param_register_string("gpr", "replica", "uri", NULL,
orte_process_info.gpr_replica_uri);
mca_base_param_lookup_string(id, &(orte_process_info.gpr_replica_uri));
mca_base_param_set_internal(id, true);

 
However, the following does NOT work cleanly:

On the setup end:
rml_uri = orte_rml.get_contact_info();
asprintf(¶m, "\"%s\"", rml_uri);
opal_argv_append(argc, argv, "--hnp-uri");
opal_argv_append(argc, argv, param);

On the receiving end:
mca_base_param_reg_string_name("orte", "hnp_uri",
   "HNP contact info",
   true, false, NULL,  &uri);

Thereby necessitating the addition of the following code to clean it up:
if (NULL != uri) {
/* the uri value passed to us will have quote marks around it to
protect
* the value if passed on the command line. We must remove those
* to have a correct uri string
*/
if ('"' == uri[0]) {
/* if the first char is a quote, then so will the last one be */
ptr = &uri[1];
len = strlen(ptr) - 1;
} else {
ptr = &uri[0];
len = strlen(uri);
}

/* we have to copy the string by hand as strndup is a GNU extension
 * and may not be generally available
 */
orte_process_info.my_hnp_uri = (char*)malloc(len+1);
for (i=0; i < len; i++) {
orte_process_info.my_hnp_uri[i] = ptr[i];
}
orte_process_info.my_hnp_uri[len] = '\0';  /* NULL terminate */
free(uri);

}

It was my understanding that you wanted us to move away from the deprecated
interface ­ hence my comment that we cannot just quote all the strings as we
would have to add this code all over the place, or (better) fix
opal_cmd_line_parse.

Hope that helps
Ralph


On 11/19/07 7:01 AM, "Jeff Squyres"  wrote:

> Sorry -- I'm just joining this conversation late: what's the problem
> with opal_cmd_line_parse?
> 
> It should obey all quoting from shells, etc.  I.e., it shouldn't care
> about tokens with special characters (to include spaces) because the
> shell divides all of that stuff up -- it just gets a char*[] that it
> treats as discrete tokens.
> 
> Is it doing something wrong?
> 
> 
> 
> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:
> 
>> I'm not sure it is really necessary - the problem is solely within
>> opal_cmd_line_parse and (if someone can parse that code ;-)) is
>> truly simple
>> to fix. The overly long cmd line issue is due to a bug that Josh was
>> going
>> to look at (may already have done so while I was out of touch).
>> 
>> Ralph
>> 
>> 
>> 
>> On 11/9/07 5:10 AM, "Jeff Squyres"  wrote:
>> 
>>> Should there be another option for passing MCA parameters between
>>> processes, such as via stdin (or any file descriptor)?  I.e., during
>>> the command line parsing to check for command line MCA params,
>>> perhaps
>>> a new argument could be introduced: -mcauri , where  could
>>> be a few different forms:
>>> 
>>> - file://stdin: (note the 2 //, not 3, so "stdin" would never
>>> conflict
>>> with a real file named /stdin)  Read the parameters in off stdin.
>>> 
>>> - rml://...rml contact info...: read in the MCA params via the RML
>>> (although I assume that reading via the RML would be *wy* to late
>>> during the MCA setup process -- I mentioned this option for
>>> completeness, even though I don't think it'll work)
>>> 
>>> - ip://ipaddress:port: open a socket back and read the MCA params in
>>> over a socket.  This could have some scalability issues...?  But who
>>> knows; it could be tied into the hierarchical startup such that we
>>> wouldn't have to have an all-to-one connection scheme.  Certainly it
>>> would cause scalability problems when paired with today's all-to-one
>>> RML connection scheme for the OOB.
>>> 
>>> I'm not sure that the rml: and ip: schemes are worthwhile.  Maybe a
>>> file://stdin kind of approach could work?  Or perhaps some other kind
>>> of URI/IPC...?  (I really haven't thought through the 

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Jeff Squyres
Sorry -- I'm just joining this conversation late: what's the problem  
with opal_cmd_line_parse?


It should obey all quoting from shells, etc.  I.e., it shouldn't care  
about tokens with special characters (to include spaces) because the  
shell divides all of that stuff up -- it just gets a char*[] that it  
treats as discrete tokens.


Is it doing something wrong?



On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:


I'm not sure it is really necessary - the problem is solely within
opal_cmd_line_parse and (if someone can parse that code ;-)) is  
truly simple
to fix. The overly long cmd line issue is due to a bug that Josh was  
going

to look at (may already have done so while I was out of touch).

Ralph



On 11/9/07 5:10 AM, "Jeff Squyres"  wrote:


Should there be another option for passing MCA parameters between
processes, such as via stdin (or any file descriptor)?  I.e., during
the command line parsing to check for command line MCA params,  
perhaps

a new argument could be introduced: -mcauri , where  could
be a few different forms:

- file://stdin: (note the 2 //, not 3, so "stdin" would never  
conflict

with a real file named /stdin)  Read the parameters in off stdin.

- rml://...rml contact info...: read in the MCA params via the RML
(although I assume that reading via the RML would be *wy* to late
during the MCA setup process -- I mentioned this option for
completeness, even though I don't think it'll work)

- ip://ipaddress:port: open a socket back and read the MCA params in
over a socket.  This could have some scalability issues...?  But who
knows; it could be tied into the hierarchical startup such that we
wouldn't have to have an all-to-one connection scheme.  Certainly it
would cause scalability problems when paired with today's all-to-one
RML connection scheme for the OOB.

I'm not sure that the rml: and ip: schemes are worthwhile.  Maybe a
file://stdin kind of approach could work?  Or perhaps some other kind
of URI/IPC...?  (I really haven't thought through the issues -- this
is off the top of my head)



On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:


Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of  
the

problem. As for what a string param can contain, I have no opinion.
I only
note that it must handle special characters such as ';', '/', etc.
that are
typically found in uri's. I cannot think of any reason it should
have a
quote in it.

Ralph



On 11/8/07 12:25 PM, "Tim Prins"  wrote:


The alias option you presented does not work. I think we do some
weird
things to find the absolute path for ssh, instead of just issuing  
the

command.

I would spend some time fixing this, but I don't want to do it
wrong. We
could quote all the param values, and change the parser to remove  
the

quotes, but this is assuming that a mca param does not contain
quotes.

So I guess there are 2 questions that need to be answered before a
fix
is made:

1. What exactly can a string mca param contain? Can it have  
quotes or

spaces or?

2. Which mca parameters should be forwarded? Should it be just the
ones
from the command line? From the environment? From config files?

Tim

Ralph Castain wrote:

What changed is that we never passed mca params to the orted
before - they
always went to the app, but it's the orted that has the issue.
There is a
bug ticket thread on this subject - I forget the number  
immediately.


Basically, the problem was that we cannot generally pass the local
environment to the orteds when we launch them. However, people
needed
various mca params to get to the orteds to control their behavior.
The only
way to resolve that problem was to pass the params via the command
line,
which is what was done.

Except for a very few cases, all of our mca params are single
values that do
not include spaces, so this is not a problem that is causing
widespread
issues. As I said, I already had to deal with one special case
that didn't
involve spaces, but did have special characters that required
quoting, which
identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my note,
though,
the general fix will take a larger effort. If someone is willing
to do so,
that is fine with me - I was only offering solutions that would
fill the
interim time as I haven't heard anyone step up to say they would
fix it
anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to put
the quotes
around things if you will fix the mca cmd line parser to cleanly
remove them
on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins"  wrote:


I'm curious what changed to make this a problem. How were we
passing mca
param
from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in
general, allow
string mca params to have spaces (as far as I understand it). So
a more
general approach is needed.

T

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-19 Thread Ralph H Castain
I'm not sure it is really necessary - the problem is solely within
opal_cmd_line_parse and (if someone can parse that code ;-)) is truly simple
to fix. The overly long cmd line issue is due to a bug that Josh was going
to look at (may already have done so while I was out of touch).

Ralph



On 11/9/07 5:10 AM, "Jeff Squyres"  wrote:

> Should there be another option for passing MCA parameters between
> processes, such as via stdin (or any file descriptor)?  I.e., during
> the command line parsing to check for command line MCA params, perhaps
> a new argument could be introduced: -mcauri , where  could
> be a few different forms:
> 
> - file://stdin: (note the 2 //, not 3, so "stdin" would never conflict
> with a real file named /stdin)  Read the parameters in off stdin.
> 
> - rml://...rml contact info...: read in the MCA params via the RML
> (although I assume that reading via the RML would be *wy* to late
> during the MCA setup process -- I mentioned this option for
> completeness, even though I don't think it'll work)
> 
> - ip://ipaddress:port: open a socket back and read the MCA params in
> over a socket.  This could have some scalability issues...?  But who
> knows; it could be tied into the hierarchical startup such that we
> wouldn't have to have an all-to-one connection scheme.  Certainly it
> would cause scalability problems when paired with today's all-to-one
> RML connection scheme for the OOB.
> 
> I'm not sure that the rml: and ip: schemes are worthwhile.  Maybe a
> file://stdin kind of approach could work?  Or perhaps some other kind
> of URI/IPC...?  (I really haven't thought through the issues -- this
> is off the top of my head)
> 
> 
> 
> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:
> 
>> Might I suggest:
>> 
>> https://svn.open-mpi.org/trac/ompi/ticket/1073
>> 
>> It deals with some of these issues and explains the boundaries of the
>> problem. As for what a string param can contain, I have no opinion.
>> I only
>> note that it must handle special characters such as ';', '/', etc.
>> that are
>> typically found in uri's. I cannot think of any reason it should
>> have a
>> quote in it.
>> 
>> Ralph
>> 
>> 
>> 
>> On 11/8/07 12:25 PM, "Tim Prins"  wrote:
>> 
>>> The alias option you presented does not work. I think we do some
>>> weird
>>> things to find the absolute path for ssh, instead of just issuing the
>>> command.
>>> 
>>> I would spend some time fixing this, but I don't want to do it
>>> wrong. We
>>> could quote all the param values, and change the parser to remove the
>>> quotes, but this is assuming that a mca param does not contain
>>> quotes.
>>> 
>>> So I guess there are 2 questions that need to be answered before a
>>> fix
>>> is made:
>>> 
>>> 1. What exactly can a string mca param contain? Can it have quotes or
>>> spaces or?
>>> 
>>> 2. Which mca parameters should be forwarded? Should it be just the
>>> ones
>>> from the command line? From the environment? From config files?
>>> 
>>> Tim
>>> 
>>> Ralph Castain wrote:
 What changed is that we never passed mca params to the orted
 before - they
 always went to the app, but it's the orted that has the issue.
 There is a
 bug ticket thread on this subject - I forget the number immediately.
 
 Basically, the problem was that we cannot generally pass the local
 environment to the orteds when we launch them. However, people
 needed
 various mca params to get to the orteds to control their behavior.
 The only
 way to resolve that problem was to pass the params via the command
 line,
 which is what was done.
 
 Except for a very few cases, all of our mca params are single
 values that do
 not include spaces, so this is not a problem that is causing
 widespread
 issues. As I said, I already had to deal with one special case
 that didn't
 involve spaces, but did have special characters that required
 quoting, which
 identified the larger problem of dealing with quoted strings.
 
 I have no objection to a more general fix. Like I said in my note,
 though,
 the general fix will take a larger effort. If someone is willing
 to do so,
 that is fine with me - I was only offering solutions that would
 fill the
 interim time as I haven't heard anyone step up to say they would
 fix it
 anytime soon.
 
 Please feel free to jump in and volunteer! ;-) I'm willing to put
 the quotes
 around things if you will fix the mca cmd line parser to cleanly
 remove them
 on the other end.
 
 Ralph
 
 
 
 On 11/7/07 5:50 PM, "Tim Prins"  wrote:
 
> I'm curious what changed to make this a problem. How were we
> passing mca
> param
> from the base to the app before, and why did it change?
> 
> I think that options 1 & 2 below are no good, since we, in
> general, allow
> string mca params to have spaces (as far as I understand it).

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-09 Thread Jeff Squyres
Should there be another option for passing MCA parameters between  
processes, such as via stdin (or any file descriptor)?  I.e., during  
the command line parsing to check for command line MCA params, perhaps  
a new argument could be introduced: -mcauri , where  could  
be a few different forms:


- file://stdin: (note the 2 //, not 3, so "stdin" would never conflict  
with a real file named /stdin)  Read the parameters in off stdin.


- rml://...rml contact info...: read in the MCA params via the RML  
(although I assume that reading via the RML would be *wy* to late  
during the MCA setup process -- I mentioned this option for  
completeness, even though I don't think it'll work)


- ip://ipaddress:port: open a socket back and read the MCA params in  
over a socket.  This could have some scalability issues...?  But who  
knows; it could be tied into the hierarchical startup such that we  
wouldn't have to have an all-to-one connection scheme.  Certainly it  
would cause scalability problems when paired with today's all-to-one  
RML connection scheme for the OOB.


I'm not sure that the rml: and ip: schemes are worthwhile.  Maybe a  
file://stdin kind of approach could work?  Or perhaps some other kind  
of URI/IPC...?  (I really haven't thought through the issues -- this  
is off the top of my head)




On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:


Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of the
problem. As for what a string param can contain, I have no opinion.  
I only
note that it must handle special characters such as ';', '/', etc.  
that are
typically found in uri's. I cannot think of any reason it should  
have a

quote in it.

Ralph



On 11/8/07 12:25 PM, "Tim Prins"  wrote:

The alias option you presented does not work. I think we do some  
weird

things to find the absolute path for ssh, instead of just issuing the
command.

I would spend some time fixing this, but I don't want to do it  
wrong. We

could quote all the param values, and change the parser to remove the
quotes, but this is assuming that a mca param does not contain  
quotes.


So I guess there are 2 questions that need to be answered before a  
fix

is made:

1. What exactly can a string mca param contain? Can it have quotes or
spaces or?

2. Which mca parameters should be forwarded? Should it be just the  
ones

from the command line? From the environment? From config files?

Tim

Ralph Castain wrote:
What changed is that we never passed mca params to the orted  
before - they
always went to the app, but it's the orted that has the issue.  
There is a

bug ticket thread on this subject - I forget the number immediately.

Basically, the problem was that we cannot generally pass the local
environment to the orteds when we launch them. However, people  
needed
various mca params to get to the orteds to control their behavior.  
The only
way to resolve that problem was to pass the params via the command  
line,

which is what was done.

Except for a very few cases, all of our mca params are single  
values that do
not include spaces, so this is not a problem that is causing  
widespread
issues. As I said, I already had to deal with one special case  
that didn't
involve spaces, but did have special characters that required  
quoting, which

identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my note,  
though,
the general fix will take a larger effort. If someone is willing  
to do so,
that is fine with me - I was only offering solutions that would  
fill the
interim time as I haven't heard anyone step up to say they would  
fix it

anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to put  
the quotes
around things if you will fix the mca cmd line parser to cleanly  
remove them

on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins"  wrote:

I'm curious what changed to make this a problem. How were we  
passing mca

param
from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in  
general, allow
string mca params to have spaces (as far as I understand it). So  
a more

general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:

Sorry for delay - wasn't ignoring the issue.

There are several fixes to this problem - ranging in order from  
least to

most work:

1. just alias "ssh" to be "ssh -Y" and run without setting the  
mca param.
It won't affect anything on the backend because the daemon/procs  
don't use

ssh.

2. include "pls_rsh_agent" in the array of mca params not to be  
passed to

the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
orte_pls_base_orted_append_basic_args function. This would fix  
the specific
problem cited here, but I admit that listing every such param by  
name would

get tedious.

3. we could ea

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-08 Thread Ralph H Castain
Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of the
problem. As for what a string param can contain, I have no opinion. I only
note that it must handle special characters such as ';', '/', etc. that are
typically found in uri's. I cannot think of any reason it should have a
quote in it.

Ralph



On 11/8/07 12:25 PM, "Tim Prins"  wrote:

> The alias option you presented does not work. I think we do some weird
> things to find the absolute path for ssh, instead of just issuing the
> command.
> 
> I would spend some time fixing this, but I don't want to do it wrong. We
> could quote all the param values, and change the parser to remove the
> quotes, but this is assuming that a mca param does not contain quotes.
> 
> So I guess there are 2 questions that need to be answered before a fix
> is made:
> 
> 1. What exactly can a string mca param contain? Can it have quotes or
> spaces or?
> 
> 2. Which mca parameters should be forwarded? Should it be just the ones
> from the command line? From the environment? From config files?
> 
> Tim
> 
> Ralph Castain wrote:
>> What changed is that we never passed mca params to the orted before - they
>> always went to the app, but it's the orted that has the issue. There is a
>> bug ticket thread on this subject - I forget the number immediately.
>> 
>> Basically, the problem was that we cannot generally pass the local
>> environment to the orteds when we launch them. However, people needed
>> various mca params to get to the orteds to control their behavior. The only
>> way to resolve that problem was to pass the params via the command line,
>> which is what was done.
>> 
>> Except for a very few cases, all of our mca params are single values that do
>> not include spaces, so this is not a problem that is causing widespread
>> issues. As I said, I already had to deal with one special case that didn't
>> involve spaces, but did have special characters that required quoting, which
>> identified the larger problem of dealing with quoted strings.
>> 
>> I have no objection to a more general fix. Like I said in my note, though,
>> the general fix will take a larger effort. If someone is willing to do so,
>> that is fine with me - I was only offering solutions that would fill the
>> interim time as I haven't heard anyone step up to say they would fix it
>> anytime soon.
>> 
>> Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes
>> around things if you will fix the mca cmd line parser to cleanly remove them
>> on the other end.
>> 
>> Ralph
>> 
>> 
>> 
>> On 11/7/07 5:50 PM, "Tim Prins"  wrote:
>> 
>>> I'm curious what changed to make this a problem. How were we passing mca
>>> param
>>> from the base to the app before, and why did it change?
>>> 
>>> I think that options 1 & 2 below are no good, since we, in general, allow
>>> string mca params to have spaces (as far as I understand it). So a more
>>> general approach is needed.
>>> 
>>> Tim
>>> 
>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
 Sorry for delay - wasn't ignoring the issue.
 
 There are several fixes to this problem - ranging in order from least to
 most work:
 
 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
 It won't affect anything on the backend because the daemon/procs don't use
 ssh.
 
 2. include "pls_rsh_agent" in the array of mca params not to be passed to
 the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
 orte_pls_base_orted_append_basic_args function. This would fix the specific
 problem cited here, but I admit that listing every such param by name would
 get tedious.
 
 3. we could easily detect that a "problem" character was in the mca param
 value when we add it to the orted's argv, and then put "" around it. The
 problem, however, is that the mca param parser on the far end doesn't
 remove those "" from the resulting string. At least, I spent over a day
 fighting with a problem only to discover that was happening. Could be an
 error in the way I was doing things, or could be a real characteristic of
 the parser. Anyway, we would have to ensure that the parser removes any
 surrounding "" before passing along the param value or this won't work.
 
 Ralph
 
 On 11/5/07 12:10 PM, "Tim Prins"  wrote:
> Hi,
> 
> Commit 16364 broke things when using multiword mca param values. For
> instance:
> 
> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
> "ssh -Y" xterm
> 
> Will crash and burn, because the value "ssh -Y" is being stored into the
> argv orted_cmd_line in orterun.c:1506. This is then added to the launch
> command for the orted:
> 
> /usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
> export PATH ;
> LD_LIBRARY_PATH=/san

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-08 Thread Tim Prins
The alias option you presented does not work. I think we do some weird 
things to find the absolute path for ssh, instead of just issuing the 
command.


I would spend some time fixing this, but I don't want to do it wrong. We 
could quote all the param values, and change the parser to remove the 
quotes, but this is assuming that a mca param does not contain quotes.


So I guess there are 2 questions that need to be answered before a fix 
is made:


1. What exactly can a string mca param contain? Can it have quotes or 
spaces or?


2. Which mca parameters should be forwarded? Should it be just the ones 
from the command line? From the environment? From config files?


Tim

Ralph Castain wrote:

What changed is that we never passed mca params to the orted before - they
always went to the app, but it's the orted that has the issue. There is a
bug ticket thread on this subject - I forget the number immediately.

Basically, the problem was that we cannot generally pass the local
environment to the orteds when we launch them. However, people needed
various mca params to get to the orteds to control their behavior. The only
way to resolve that problem was to pass the params via the command line,
which is what was done.

Except for a very few cases, all of our mca params are single values that do
not include spaces, so this is not a problem that is causing widespread
issues. As I said, I already had to deal with one special case that didn't
involve spaces, but did have special characters that required quoting, which
identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my note, though,
the general fix will take a larger effort. If someone is willing to do so,
that is fine with me - I was only offering solutions that would fill the
interim time as I haven't heard anyone step up to say they would fix it
anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes
around things if you will fix the mca cmd line parser to cleanly remove them
on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins"  wrote:


I'm curious what changed to make this a problem. How were we passing mca param
from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in general, allow
string mca params to have spaces (as far as I understand it). So a more
general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:

Sorry for delay - wasn't ignoring the issue.

There are several fixes to this problem - ranging in order from least to
most work:

1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
It won't affect anything on the backend because the daemon/procs don't use
ssh.

2. include "pls_rsh_agent" in the array of mca params not to be passed to
the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
orte_pls_base_orted_append_basic_args function. This would fix the specific
problem cited here, but I admit that listing every such param by name would
get tedious.

3. we could easily detect that a "problem" character was in the mca param
value when we add it to the orted's argv, and then put "" around it. The
problem, however, is that the mca param parser on the far end doesn't
remove those "" from the resulting string. At least, I spent over a day
fighting with a problem only to discover that was happening. Could be an
error in the way I was doing things, or could be a real characteristic of
the parser. Anyway, we would have to ensure that the parser removes any
surrounding "" before passing along the param value or this won't work.

Ralph

On 11/5/07 12:10 PM, "Tim Prins"  wrote:

Hi,

Commit 16364 broke things when using multiword mca param values. For
instance:

mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
"ssh -Y" xterm

Will crash and burn, because the value "ssh -Y" is being stored into the
argv orted_cmd_line in orterun.c:1506. This is then added to the launch
command for the orted:

/usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
export PATH ;
LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
--debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
--nsreplica
"0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0
:4090 8"
--gprreplica
"0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0
:4090 8"
-mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
mca_base_param_file_path
/u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/
examp les
-mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples

Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
the quotes have been lost, as we die a horrible death.

So we nee

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Ralph Castain
What changed is that we never passed mca params to the orted before - they
always went to the app, but it's the orted that has the issue. There is a
bug ticket thread on this subject - I forget the number immediately.

Basically, the problem was that we cannot generally pass the local
environment to the orteds when we launch them. However, people needed
various mca params to get to the orteds to control their behavior. The only
way to resolve that problem was to pass the params via the command line,
which is what was done.

Except for a very few cases, all of our mca params are single values that do
not include spaces, so this is not a problem that is causing widespread
issues. As I said, I already had to deal with one special case that didn't
involve spaces, but did have special characters that required quoting, which
identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my note, though,
the general fix will take a larger effort. If someone is willing to do so,
that is fine with me - I was only offering solutions that would fill the
interim time as I haven't heard anyone step up to say they would fix it
anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes
around things if you will fix the mca cmd line parser to cleanly remove them
on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins"  wrote:

> I'm curious what changed to make this a problem. How were we passing mca param
> from the base to the app before, and why did it change?
> 
> I think that options 1 & 2 below are no good, since we, in general, allow
> string mca params to have spaces (as far as I understand it). So a more
> general approach is needed.
> 
> Tim
> 
> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
>> Sorry for delay - wasn't ignoring the issue.
>> 
>> There are several fixes to this problem - ranging in order from least to
>> most work:
>> 
>> 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
>> It won't affect anything on the backend because the daemon/procs don't use
>> ssh.
>> 
>> 2. include "pls_rsh_agent" in the array of mca params not to be passed to
>> the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
>> orte_pls_base_orted_append_basic_args function. This would fix the specific
>> problem cited here, but I admit that listing every such param by name would
>> get tedious.
>> 
>> 3. we could easily detect that a "problem" character was in the mca param
>> value when we add it to the orted's argv, and then put "" around it. The
>> problem, however, is that the mca param parser on the far end doesn't
>> remove those "" from the resulting string. At least, I spent over a day
>> fighting with a problem only to discover that was happening. Could be an
>> error in the way I was doing things, or could be a real characteristic of
>> the parser. Anyway, we would have to ensure that the parser removes any
>> surrounding "" before passing along the param value or this won't work.
>> 
>> Ralph
>> 
>> On 11/5/07 12:10 PM, "Tim Prins"  wrote:
>>> Hi,
>>> 
>>> Commit 16364 broke things when using multiword mca param values. For
>>> instance:
>>> 
>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
>>> "ssh -Y" xterm
>>> 
>>> Will crash and burn, because the value "ssh -Y" is being stored into the
>>> argv orted_cmd_line in orterun.c:1506. This is then added to the launch
>>> command for the orted:
>>> 
>>> /usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
>>> export PATH ;
>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
>>> odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
>>> --nsreplica
>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0
>>> :4090 8"
>>> --gprreplica
>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0
>>> :4090 8"
>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
>>> mca_base_param_file_path
>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/
>>> examp les
>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
>>> 
>>> Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
>>> the quotes have been lost, as we die a horrible death.
>>> 
>>> So we need to add the quotes back in somehow, or pass these options
>>> differently. I'm not sure what the best way to fix this.
>>> 
>>> Thanks,
>>> 
>>> Tim
> 
> 




Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Tim Prins
I'm curious what changed to make this a problem. How were we passing mca param 
from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in general, allow 
string mca params to have spaces (as far as I understand it). So a more 
general approach is needed. 

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
> Sorry for delay - wasn't ignoring the issue.
>
> There are several fixes to this problem - ranging in order from least to
> most work:
>
> 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
> It won't affect anything on the backend because the daemon/procs don't use
> ssh.
>
> 2. include "pls_rsh_agent" in the array of mca params not to be passed to
> the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
> orte_pls_base_orted_append_basic_args function. This would fix the specific
> problem cited here, but I admit that listing every such param by name would
> get tedious.
>
> 3. we could easily detect that a "problem" character was in the mca param
> value when we add it to the orted's argv, and then put "" around it. The
> problem, however, is that the mca param parser on the far end doesn't
> remove those "" from the resulting string. At least, I spent over a day
> fighting with a problem only to discover that was happening. Could be an
> error in the way I was doing things, or could be a real characteristic of
> the parser. Anyway, we would have to ensure that the parser removes any
> surrounding "" before passing along the param value or this won't work.
>
> Ralph
>
> On 11/5/07 12:10 PM, "Tim Prins"  wrote:
> > Hi,
> >
> > Commit 16364 broke things when using multiword mca param values. For
> > instance:
> >
> > mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
> > "ssh -Y" xterm
> >
> > Will crash and burn, because the value "ssh -Y" is being stored into the
> > argv orted_cmd_line in orterun.c:1506. This is then added to the launch
> > command for the orted:
> >
> > /usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
> > export PATH ;
> > LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
> > export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
> > --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
> > odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
> > --nsreplica
> > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0
> >:4090 8"
> > --gprreplica
> > "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0
> >:4090 8"
> > -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
> > mca_base_param_file_path
> > /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/
> >examp les
> > -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
> >
> > Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
> > the quotes have been lost, as we die a horrible death.
> >
> > So we need to add the quotes back in somehow, or pass these options
> > differently. I'm not sure what the best way to fix this.
> >
> > Thanks,
> >
> > Tim




Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Ralph H Castain
Sorry for delay - wasn't ignoring the issue.

There are several fixes to this problem - ranging in order from least to
most work:

1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. It
won't affect anything on the backend because the daemon/procs don't use ssh.

2. include "pls_rsh_agent" in the array of mca params not to be passed to
the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
orte_pls_base_orted_append_basic_args function. This would fix the specific
problem cited here, but I admit that listing every such param by name would
get tedious.

3. we could easily detect that a "problem" character was in the mca param
value when we add it to the orted's argv, and then put "" around it. The
problem, however, is that the mca param parser on the far end doesn't remove
those "" from the resulting string. At least, I spent over a day fighting
with a problem only to discover that was happening. Could be an error in the
way I was doing things, or could be a real characteristic of the parser.
Anyway, we would have to ensure that the parser removes any surrounding ""
before passing along the param value or this won't work.

Ralph



On 11/5/07 12:10 PM, "Tim Prins"  wrote:

> Hi,
> 
> Commit 16364 broke things when using multiword mca param values. For
> instance:
> 
> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
> "ssh -Y" xterm
> 
> Will crash and burn, because the value "ssh -Y" is being stored into the
> argv orted_cmd_line in orterun.c:1506. This is then added to the launch
> command for the orted:
> 
> /usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
> export PATH ; 
> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
> odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
> --nsreplica 
> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:4090
> 8" 
> --gprreplica 
> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:4090
> 8" 
> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
> mca_base_param_file_path
> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examp
> les 
> -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
> 
> Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
> the quotes have been lost, as we die a horrible death.
> 
> So we need to add the quotes back in somehow, or pass these options
> differently. I'm not sure what the best way to fix this.
> 
> Thanks,
> 
> Tim




[OMPI devel] Multiworld MCA parameter values broken

2007-11-05 Thread Tim Prins

Hi,

Commit 16364 broke things when using multiword mca param values. For 
instance:


mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent 
"ssh -Y" xterm


Will crash and burn, because the value "ssh -Y" is being stored into the 
argv orted_cmd_line in orterun.c:1506. This is then added to the launch 
command for the orted:


/usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ; 
export PATH ; 
LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ; 
export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug 
--debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename 
odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872 
--nsreplica 
"0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908" 
--gprreplica 
"0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908" 
-mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca 
mca_base_param_file_path 
/u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples 
-mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples


Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So 
the quotes have been lost, as we die a horrible death.


So we need to add the quotes back in somehow, or pass these options 
differently. I'm not sure what the best way to fix this.


Thanks,

Tim