Unfortunately, it -is- a significant problem when passing the params on to
the orteds, as Tim has eloquently pointed out in his original posting. I
guess I don't see why it would be a significant problem to just have
opal_cmd_line_parse do the "right thing" - if a string param is quoted, then
just remove the quotes.

Problem solved, and I wouldn't have to add a bunch of code to figure out
when to assemble argv arrays in different ways.


On 11/19/07 7:52 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> I guess I don't see why this is an opal_cmd_line_parse() problem.
> 
> If you invoke executables through system(), then a shell is used and
> quoting is necessary to preserve the individual string tokens (i.e.,
> "my beloved string" would be passed to the application as one argv
> token, without the quotes).
> 
> But if you're building up an array of argv and calling some form of
> exec(), then no shell is involved and quoting should not be
> necessary.  Specifically:
> 
>      opal_append_argv(&argc, &argv, "my beloved string");
> 
> will be passed as one string token to the application.
> 
> opal_cmd_line_parse() is passed an array of argv, meaning that the
> command line have already been split into individual string tokens.  I
> guess the question is whether these come in directly from the
> arguments to main() or whether we are getting a single string and
> breaking it up into tokens.  If the latter is true, then we need to re-
> evaluate our break-into-tokens algorithm.  I have a dim recollection
> that these come in from the arguments to main(), though.
> 
> I guess I can see where this would get complicated for rsh/ssh
> invocations, because *both* models are used. (i.e., you exec locally
> but it turns into a system-like invocation on the remote side).  In
> this case, I think you'll need to quote extended strings (e.g., those
> containing spaces) for the non-local invocations not not quote it for
> local invocations.
> 
> 
> 
> 
> On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote:
> 
>> My, you are joining late! ;-)
>> 
>> The problem is with MCA params that take string arguments. If we
>> pass the following:
>> 
>> -mca foo "my beloved string"
>> 
>> on the command line of an orted, we get a value for foo that
>> includes the quote marks. I verified this rather painfully when
>> attempting to pass a command line mca param for a uri. I eventually
>> had to add specific code to clean the paramater up.
>> 
>> This appears to be somehow related to the precise method used to
>> register the param. For example, the following deprecated method
>> works fine:
>> 
>> On the setup end:
>> 
>>     opal_argv_append(argc, argv, "--gprreplica");
>>     if (NULL != orte_process_info.gpr_replica_uri) {
>>         contact_info = strdup(orte_process_info.gpr_replica_uri);
>>     } else {
>>         contact_info = orte_rml.get_contact_info();
>>     }
>>     asprintf(&param, "\"%s\"", contact_info);
>>     opal_argv_append(argc, argv, param);
>> 
>> And on the receiving end:
>>     id = mca_base_param_register_string("gpr", "replica", "uri",
>> NULL, orte_process_info.gpr_replica_uri);
>>     mca_base_param_lookup_string(id,
>> &(orte_process_info.gpr_replica_uri));
>>     mca_base_param_set_internal(id, true);
>> 
>> 
>> However, the following does NOT work cleanly:
>> 
>> On the setup end:
>>    rml_uri = orte_rml.get_contact_info();
>>     asprintf(&param, "\"%s\"", rml_uri);
>>     opal_argv_append(argc, argv, "--hnp-uri");
>>     opal_argv_append(argc, argv, param);
>> 
>> On the receiving end:
>>     mca_base_param_reg_string_name("orte", "hnp_uri",
>>                                    "HNP contact info",
>>                                    true, false, NULL,  &uri);
>> 
>> Thereby necessitating the addition of the following code to clean it
>> up:
>>     if (NULL != uri) {
>>         /* the uri value passed to us will have quote marks around
>> it to protect
>>         * the value if passed on the command line. We must remove
>> those
>>         * to have a correct uri string
>>         */
>>        if ('"' == uri[0]) {
>>             /* if the first char is a quote, then so will the last
>> one be */
>>            ptr = &uri[1];
>>             len = strlen(ptr) - 1;
>>         } else {
>>             ptr = &uri[0];
>>             len = strlen(uri);
>>         }
>> 
>>         /* we have to copy the string by hand as strndup is a GNU
>> extension
>>          * and may not be generally available
>>          */
>>        orte_process_info.my_hnp_uri = (char*)malloc(len+1);
>>         for (i=0; i < len; i++) {
>>             orte_process_info.my_hnp_uri[i] = ptr[i];
>>         }
>>         orte_process_info.my_hnp_uri[len] = '\0';  /* NULL terminate
>> */
>>        free(uri);
>> 
>>     }
>> 
>> It was my understanding that you wanted us to move away from the
>> deprecated interface ­ hence my comment that we cannot just quote
>> all the strings as we would have to add this code all over the
>> place, or (better) fix opal_cmd_line_parse.
>> 
>> Hope that helps
>> Ralph
>> 
>> 
>> On 11/19/07 7:01 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>> 
>>> Sorry -- I'm just joining this conversation late: what's the problem
>>> with opal_cmd_line_parse?
>>> 
>>> It should obey all quoting from shells, etc.  I.e., it shouldn't
>> care
>>> about tokens with special characters (to include spaces) because the
>>> shell divides all of that stuff up -- it just gets a char*[] that it
>>> treats as discrete tokens.
>>> 
>>> Is it doing something wrong?
>>> 
>>> 
>>> 
>>> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote:
>>> 
>>>> I'm not sure it is really necessary - the problem is solely within
>>>> opal_cmd_line_parse and (if someone can parse that code ;-)) is
>>>> truly simple
>>>> to fix. The overly long cmd line issue is due to a bug that Josh
>> was
>>>> going
>>>> to look at (may already have done so while I was out of touch).
>>>> 
>>>> Ralph
>>>> 
>>>> 
>>>> 
>>>> On 11/9/07 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>>> 
>>>>> Should there be another option for passing MCA parameters between
>>>>> processes, such as via stdin (or any file descriptor)?  I.e.,
>> during
>>>>> the command line parsing to check for command line MCA params,
>>>>> perhaps
>>>>> a new argument could be introduced: -mcauri <uri>, where <uri>
>> could
>>>>> be a few different forms:
>>>>> 
>>>>> - file://stdin: (note the 2 //, not 3, so "stdin" would never
>>>>> conflict
>>>>> with a real file named /stdin)  Read the parameters in off stdin.
>>>>> 
>>>>> - rml://...rml contact info...: read in the MCA params via the RML
>>>>> (although I assume that reading via the RML would be *waaaay* to
>> late
>>>>> during the MCA setup process -- I mentioned this option for
>>>>> completeness, even though I don't think it'll work)
>>>>> 
>>>>> - ip://ipaddress:port: open a socket back and read the MCA
>> params in
>>>>> over a socket.  This could have some scalability issues...?  But
>> who
>>>>> knows; it could be tied into the hierarchical startup such that we
>>>>> wouldn't have to have an all-to-one connection scheme.
>> Certainly it
>>>>> would cause scalability problems when paired with today's all-to-
>> one
>>>>> RML connection scheme for the OOB.
>>>>> 
>>>>> I'm not sure that the rml: and ip: schemes are worthwhile.
>> Maybe a
>>>>> file://stdin kind of approach could work?  Or perhaps some other
>> kind
>>>>> of URI/IPC...?  (I really haven't thought through the issues --
>> this
>>>>> is off the top of my head)
>>>>> 
>>>>> 
>>>>> 
>>>>> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:
>>>>> 
>>>>>> Might I suggest:
>>>>>> 
>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/1073
>>>>>> 
>>>>>> It deals with some of these issues and explains the boundaries of
>>>>>> the
>>>>>> problem. As for what a string param can contain, I have no
>> opinion.
>>>>>> I only
>>>>>> note that it must handle special characters such as ';', '/',
>> etc.
>>>>>> that are
>>>>>> typically found in uri's. I cannot think of any reason it should
>>>>>> have a
>>>>>> quote in it.
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>>>>> 
>>>>>>> The alias option you presented does not work. I think we do some
>>>>>>> weird
>>>>>>> things to find the absolute path for ssh, instead of just
>> issuing
>>>>>>> the
>>>>>>> command.
>>>>>>> 
>>>>>>> I would spend some time fixing this, but I don't want to do it
>>>>>>> wrong. We
>>>>>>> could quote all the param values, and change the parser to
>> remove
>>>>>>> the
>>>>>>> quotes, but this is assuming that a mca param does not contain
>>>>>>> quotes.
>>>>>>> 
>>>>>>> So I guess there are 2 questions that need to be answered
>> before a
>>>>>>> fix
>>>>>>> is made:
>>>>>>> 
>>>>>>> 1. What exactly can a string mca param contain? Can it have
>>>>>>> quotes or
>>>>>>> spaces or?
>>>>>>> 
>>>>>>> 2. Which mca parameters should be forwarded? Should it be just
>> the
>>>>>>> ones
>>>>>>> from the command line? From the environment? From config files?
>>>>>>> 
>>>>>>> Tim
>>>>>>> 
>>>>>>> Ralph Castain wrote:
>>>>>>>> What changed is that we never passed mca params to the orted
>>>>>>>> before - they
>>>>>>>> always went to the app, but it's the orted that has the issue.
>>>>>>>> There is a
>>>>>>>> bug ticket thread on this subject - I forget the number
>>>>>>>> immediately.
>>>>>>>> 
>>>>>>>> Basically, the problem was that we cannot generally pass the
>> local
>>>>>>>> environment to the orteds when we launch them. However, people
>>>>>>>> needed
>>>>>>>> various mca params to get to the orteds to control their
>> behavior.
>>>>>>>> The only
>>>>>>>> way to resolve that problem was to pass the params via the
>> command
>>>>>>>> line,
>>>>>>>> which is what was done.
>>>>>>>> 
>>>>>>>> Except for a very few cases, all of our mca params are single
>>>>>>>> values that do
>>>>>>>> not include spaces, so this is not a problem that is causing
>>>>>>>> widespread
>>>>>>>> issues. As I said, I already had to deal with one special case
>>>>>>>> that didn't
>>>>>>>> involve spaces, but did have special characters that required
>>>>>>>> quoting, which
>>>>>>>> identified the larger problem of dealing with quoted strings.
>>>>>>>> 
>>>>>>>> I have no objection to a more general fix. Like I said in my
>> note,
>>>>>>>> though,
>>>>>>>> the general fix will take a larger effort. If someone is
>> willing
>>>>>>>> to do so,
>>>>>>>> that is fine with me - I was only offering solutions that would
>>>>>>>> fill the
>>>>>>>> interim time as I haven't heard anyone step up to say they
>> would
>>>>>>>> fix it
>>>>>>>> anytime soon.
>>>>>>>> 
>>>>>>>> Please feel free to jump in and volunteer! ;-) I'm willing to
>> put
>>>>>>>> the quotes
>>>>>>>> around things if you will fix the mca cmd line parser to
>> cleanly
>>>>>>>> remove them
>>>>>>>> on the other end.
>>>>>>>> 
>>>>>>>> Ralph
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>>>>>>> 
>>>>>>>>> I'm curious what changed to make this a problem. How were we
>>>>>>>>> passing mca
>>>>>>>>> param
>>>>>>>>> from the base to the app before, and why did it change?
>>>>>>>>> 
>>>>>>>>> I think that options 1 & 2 below are no good, since we, in
>>>>>>>>> general, allow
>>>>>>>>> string mca params to have spaces (as far as I understand
>> it). So
>>>>>>>>> a more
>>>>>>>>> general approach is needed.
>>>>>>>>> 
>>>>>>>>> Tim
>>>>>>>>> 
>>>>>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain
>> wrote:
>>>>>>>>>> Sorry for delay - wasn't ignoring the issue.
>>>>>>>>>> 
>>>>>>>>>> There are several fixes to this problem - ranging in order
>> from
>>>>>>>>>> least to
>>>>>>>>>> most work:
>>>>>>>>>> 
>>>>>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting
>> the
>>>>>>>>>> mca param.
>>>>>>>>>> It won't affect anything on the backend because the daemon/
>> procs
>>>>>>>>>> don't use
>>>>>>>>>> ssh.
>>>>>>>>>> 
>>>>>>>>>> 2. include "pls_rsh_agent" in the array of mca params not
>> to be
>>>>>>>>>> passed to
>>>>>>>>>> the orted in orte/mca/pls/base/
>> pls_base_general_support_fns.c,
>>>>>>>>>> the
>>>>>>>>>> orte_pls_base_orted_append_basic_args function. This would
>> fix
>>>>>>>>>> the specific
>>>>>>>>>> problem cited here, but I admit that listing every such
>> param by
>>>>>>>>>> name would
>>>>>>>>>> get tedious.
>>>>>>>>>> 
>>>>>>>>>> 3. we could easily detect that a "problem" character was in
>> the
>>>>>>>>>> mca param
>>>>>>>>>> value when we add it to the orted's argv, and then put ""
>> around
>>>>>>>>>> it. The
>>>>>>>>>> problem, however, is that the mca param parser on the far end
>>>>>>>>>> doesn't
>>>>>>>>>> remove those "" from the resulting string. At least, I spent
>>>>>>>>>> over a day
>>>>>>>>>> fighting with a problem only to discover that was happening.
>>>>>>>>>> Could be an
>>>>>>>>>> error in the way I was doing things, or could be a real
>>>>>>>>>> characteristic of
>>>>>>>>>> the parser. Anyway, we would have to ensure that the parser
>>>>>>>>>> removes any
>>>>>>>>>> surrounding "" before passing along the param value or this
>>>>>>>>>> won't work.
>>>>>>>>>> 
>>>>>>>>>> Ralph
>>>>>>>>>> 
>>>>>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu>
>> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> Commit 16364 broke things when using multiword mca param
>>>>>>>>>>> values. For
>>>>>>>>>>> instance:
>>>>>>>>>>> 
>>>>>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca
>>>>>>>>>>> pls_rsh_agent
>>>>>>>>>>> "ssh -Y" xterm
>>>>>>>>>>> 
>>>>>>>>>>> Will crash and burn, because the value "ssh -Y" is being
>> stored
>>>>>>>>>>> into the
>>>>>>>>>>> argv orted_cmd_line in orterun.c:1506. This is then added to
>>>>>>>>>>> the launch
>>>>>>>>>>> command for the orted:
>>>>>>>>>>> 
>>>>>>>>>>> /usr/bin/ssh -Y odin004  PATH=/san/homedirs/tprins/usr/rsl/
>> bin:
>>>>>>>>>>> $PATH ;
>>>>>>>>>>> export PATH ;
>>>>>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:
>>>>>>>>>>> $LD_LIBRARY_PATH ;
>>>>>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/
>> orted
>>>>>>>>>>> --debug
>>>>>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --
>>>>>>>>>>> nodename
>>>>>>>>>>> odin004 --universe tpr...@odin.cs.indiana.edu:default-
>>>>>>>>>>> universe-27872
>>>>>>>>>>> --nsreplica
>>>>>>>>>>> "0.0;tcp://
>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
>> 21a0
>>>>>>>>>>> :4090 8"
>>>>>>>>>>> --gprreplica
>>>>>>>>>>> "0.0;tcp://
>>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:
>> 21a0
>>>>>>>>>>> :4090 8"
>>>>>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca
>>>>>>>>>>> mca_base_param_file_path
>>>>>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/
>> homedirs/
>>>>>>>>>>> tprins/rsl/
>>>>>>>>>>> examp les
>>>>>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/
>> rsl/
>>>>>>>>>>> examples
>>>>>>>>>>> 
>>>>>>>>>>> Notice that in this command we now have "-mca
>> pls_rsh_agent ssh
>>>>>>>>>>> -Y". So
>>>>>>>>>>> the quotes have been lost, as we die a horrible death.
>>>>>>>>>>> 
>>>>>>>>>>> So we need to add the quotes back in somehow, or pass these
>>>>>>>>>>> options
>>>>>>>>>>> differently. I'm not sure what the best way to fix this.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Tim
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> 
>>> 
> 



Reply via email to