Unfortunately, it -is- a significant problem when passing the params on to the orteds, as Tim has eloquently pointed out in his original posting. I guess I don't see why it would be a significant problem to just have opal_cmd_line_parse do the "right thing" - if a string param is quoted, then just remove the quotes.
Problem solved, and I wouldn't have to add a bunch of code to figure out when to assemble argv arrays in different ways. On 11/19/07 7:52 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: > I guess I don't see why this is an opal_cmd_line_parse() problem. > > If you invoke executables through system(), then a shell is used and > quoting is necessary to preserve the individual string tokens (i.e., > "my beloved string" would be passed to the application as one argv > token, without the quotes). > > But if you're building up an array of argv and calling some form of > exec(), then no shell is involved and quoting should not be > necessary. Specifically: > > opal_append_argv(&argc, &argv, "my beloved string"); > > will be passed as one string token to the application. > > opal_cmd_line_parse() is passed an array of argv, meaning that the > command line have already been split into individual string tokens. I > guess the question is whether these come in directly from the > arguments to main() or whether we are getting a single string and > breaking it up into tokens. If the latter is true, then we need to re- > evaluate our break-into-tokens algorithm. I have a dim recollection > that these come in from the arguments to main(), though. > > I guess I can see where this would get complicated for rsh/ssh > invocations, because *both* models are used. (i.e., you exec locally > but it turns into a system-like invocation on the remote side). In > this case, I think you'll need to quote extended strings (e.g., those > containing spaces) for the non-local invocations not not quote it for > local invocations. > > > > > On Nov 19, 2007, at 9:19 AM, Ralph H Castain wrote: > >> My, you are joining late! ;-) >> >> The problem is with MCA params that take string arguments. If we >> pass the following: >> >> -mca foo "my beloved string" >> >> on the command line of an orted, we get a value for foo that >> includes the quote marks. I verified this rather painfully when >> attempting to pass a command line mca param for a uri. I eventually >> had to add specific code to clean the paramater up. >> >> This appears to be somehow related to the precise method used to >> register the param. For example, the following deprecated method >> works fine: >> >> On the setup end: >> >> opal_argv_append(argc, argv, "--gprreplica"); >> if (NULL != orte_process_info.gpr_replica_uri) { >> contact_info = strdup(orte_process_info.gpr_replica_uri); >> } else { >> contact_info = orte_rml.get_contact_info(); >> } >> asprintf(¶m, "\"%s\"", contact_info); >> opal_argv_append(argc, argv, param); >> >> And on the receiving end: >> id = mca_base_param_register_string("gpr", "replica", "uri", >> NULL, orte_process_info.gpr_replica_uri); >> mca_base_param_lookup_string(id, >> &(orte_process_info.gpr_replica_uri)); >> mca_base_param_set_internal(id, true); >> >> >> However, the following does NOT work cleanly: >> >> On the setup end: >> rml_uri = orte_rml.get_contact_info(); >> asprintf(¶m, "\"%s\"", rml_uri); >> opal_argv_append(argc, argv, "--hnp-uri"); >> opal_argv_append(argc, argv, param); >> >> On the receiving end: >> mca_base_param_reg_string_name("orte", "hnp_uri", >> "HNP contact info", >> true, false, NULL, &uri); >> >> Thereby necessitating the addition of the following code to clean it >> up: >> if (NULL != uri) { >> /* the uri value passed to us will have quote marks around >> it to protect >> * the value if passed on the command line. We must remove >> those >> * to have a correct uri string >> */ >> if ('"' == uri[0]) { >> /* if the first char is a quote, then so will the last >> one be */ >> ptr = &uri[1]; >> len = strlen(ptr) - 1; >> } else { >> ptr = &uri[0]; >> len = strlen(uri); >> } >> >> /* we have to copy the string by hand as strndup is a GNU >> extension >> * and may not be generally available >> */ >> orte_process_info.my_hnp_uri = (char*)malloc(len+1); >> for (i=0; i < len; i++) { >> orte_process_info.my_hnp_uri[i] = ptr[i]; >> } >> orte_process_info.my_hnp_uri[len] = '\0'; /* NULL terminate >> */ >> free(uri); >> >> } >> >> It was my understanding that you wanted us to move away from the >> deprecated interface hence my comment that we cannot just quote >> all the strings as we would have to add this code all over the >> place, or (better) fix opal_cmd_line_parse. >> >> Hope that helps >> Ralph >> >> >> On 11/19/07 7:01 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: >> >>> Sorry -- I'm just joining this conversation late: what's the problem >>> with opal_cmd_line_parse? >>> >>> It should obey all quoting from shells, etc. I.e., it shouldn't >> care >>> about tokens with special characters (to include spaces) because the >>> shell divides all of that stuff up -- it just gets a char*[] that it >>> treats as discrete tokens. >>> >>> Is it doing something wrong? >>> >>> >>> >>> On Nov 19, 2007, at 8:39 AM, Ralph H Castain wrote: >>> >>>> I'm not sure it is really necessary - the problem is solely within >>>> opal_cmd_line_parse and (if someone can parse that code ;-)) is >>>> truly simple >>>> to fix. The overly long cmd line issue is due to a bug that Josh >> was >>>> going >>>> to look at (may already have done so while I was out of touch). >>>> >>>> Ralph >>>> >>>> >>>> >>>> On 11/9/07 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote: >>>> >>>>> Should there be another option for passing MCA parameters between >>>>> processes, such as via stdin (or any file descriptor)? I.e., >> during >>>>> the command line parsing to check for command line MCA params, >>>>> perhaps >>>>> a new argument could be introduced: -mcauri <uri>, where <uri> >> could >>>>> be a few different forms: >>>>> >>>>> - file://stdin: (note the 2 //, not 3, so "stdin" would never >>>>> conflict >>>>> with a real file named /stdin) Read the parameters in off stdin. >>>>> >>>>> - rml://...rml contact info...: read in the MCA params via the RML >>>>> (although I assume that reading via the RML would be *waaaay* to >> late >>>>> during the MCA setup process -- I mentioned this option for >>>>> completeness, even though I don't think it'll work) >>>>> >>>>> - ip://ipaddress:port: open a socket back and read the MCA >> params in >>>>> over a socket. This could have some scalability issues...? But >> who >>>>> knows; it could be tied into the hierarchical startup such that we >>>>> wouldn't have to have an all-to-one connection scheme. >> Certainly it >>>>> would cause scalability problems when paired with today's all-to- >> one >>>>> RML connection scheme for the OOB. >>>>> >>>>> I'm not sure that the rml: and ip: schemes are worthwhile. >> Maybe a >>>>> file://stdin kind of approach could work? Or perhaps some other >> kind >>>>> of URI/IPC...? (I really haven't thought through the issues -- >> this >>>>> is off the top of my head) >>>>> >>>>> >>>>> >>>>> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote: >>>>> >>>>>> Might I suggest: >>>>>> >>>>>> https://svn.open-mpi.org/trac/ompi/ticket/1073 >>>>>> >>>>>> It deals with some of these issues and explains the boundaries of >>>>>> the >>>>>> problem. As for what a string param can contain, I have no >> opinion. >>>>>> I only >>>>>> note that it must handle special characters such as ';', '/', >> etc. >>>>>> that are >>>>>> typically found in uri's. I cannot think of any reason it should >>>>>> have a >>>>>> quote in it. >>>>>> >>>>>> Ralph >>>>>> >>>>>> >>>>>> >>>>>> On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: >>>>>> >>>>>>> The alias option you presented does not work. I think we do some >>>>>>> weird >>>>>>> things to find the absolute path for ssh, instead of just >> issuing >>>>>>> the >>>>>>> command. >>>>>>> >>>>>>> I would spend some time fixing this, but I don't want to do it >>>>>>> wrong. We >>>>>>> could quote all the param values, and change the parser to >> remove >>>>>>> the >>>>>>> quotes, but this is assuming that a mca param does not contain >>>>>>> quotes. >>>>>>> >>>>>>> So I guess there are 2 questions that need to be answered >> before a >>>>>>> fix >>>>>>> is made: >>>>>>> >>>>>>> 1. What exactly can a string mca param contain? Can it have >>>>>>> quotes or >>>>>>> spaces or? >>>>>>> >>>>>>> 2. Which mca parameters should be forwarded? Should it be just >> the >>>>>>> ones >>>>>>> from the command line? From the environment? From config files? >>>>>>> >>>>>>> Tim >>>>>>> >>>>>>> Ralph Castain wrote: >>>>>>>> What changed is that we never passed mca params to the orted >>>>>>>> before - they >>>>>>>> always went to the app, but it's the orted that has the issue. >>>>>>>> There is a >>>>>>>> bug ticket thread on this subject - I forget the number >>>>>>>> immediately. >>>>>>>> >>>>>>>> Basically, the problem was that we cannot generally pass the >> local >>>>>>>> environment to the orteds when we launch them. However, people >>>>>>>> needed >>>>>>>> various mca params to get to the orteds to control their >> behavior. >>>>>>>> The only >>>>>>>> way to resolve that problem was to pass the params via the >> command >>>>>>>> line, >>>>>>>> which is what was done. >>>>>>>> >>>>>>>> Except for a very few cases, all of our mca params are single >>>>>>>> values that do >>>>>>>> not include spaces, so this is not a problem that is causing >>>>>>>> widespread >>>>>>>> issues. As I said, I already had to deal with one special case >>>>>>>> that didn't >>>>>>>> involve spaces, but did have special characters that required >>>>>>>> quoting, which >>>>>>>> identified the larger problem of dealing with quoted strings. >>>>>>>> >>>>>>>> I have no objection to a more general fix. Like I said in my >> note, >>>>>>>> though, >>>>>>>> the general fix will take a larger effort. If someone is >> willing >>>>>>>> to do so, >>>>>>>> that is fine with me - I was only offering solutions that would >>>>>>>> fill the >>>>>>>> interim time as I haven't heard anyone step up to say they >> would >>>>>>>> fix it >>>>>>>> anytime soon. >>>>>>>> >>>>>>>> Please feel free to jump in and volunteer! ;-) I'm willing to >> put >>>>>>>> the quotes >>>>>>>> around things if you will fix the mca cmd line parser to >> cleanly >>>>>>>> remove them >>>>>>>> on the other end. >>>>>>>> >>>>>>>> Ralph >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote: >>>>>>>> >>>>>>>>> I'm curious what changed to make this a problem. How were we >>>>>>>>> passing mca >>>>>>>>> param >>>>>>>>> from the base to the app before, and why did it change? >>>>>>>>> >>>>>>>>> I think that options 1 & 2 below are no good, since we, in >>>>>>>>> general, allow >>>>>>>>> string mca params to have spaces (as far as I understand >> it). So >>>>>>>>> a more >>>>>>>>> general approach is needed. >>>>>>>>> >>>>>>>>> Tim >>>>>>>>> >>>>>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain >> wrote: >>>>>>>>>> Sorry for delay - wasn't ignoring the issue. >>>>>>>>>> >>>>>>>>>> There are several fixes to this problem - ranging in order >> from >>>>>>>>>> least to >>>>>>>>>> most work: >>>>>>>>>> >>>>>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting >> the >>>>>>>>>> mca param. >>>>>>>>>> It won't affect anything on the backend because the daemon/ >> procs >>>>>>>>>> don't use >>>>>>>>>> ssh. >>>>>>>>>> >>>>>>>>>> 2. include "pls_rsh_agent" in the array of mca params not >> to be >>>>>>>>>> passed to >>>>>>>>>> the orted in orte/mca/pls/base/ >> pls_base_general_support_fns.c, >>>>>>>>>> the >>>>>>>>>> orte_pls_base_orted_append_basic_args function. This would >> fix >>>>>>>>>> the specific >>>>>>>>>> problem cited here, but I admit that listing every such >> param by >>>>>>>>>> name would >>>>>>>>>> get tedious. >>>>>>>>>> >>>>>>>>>> 3. we could easily detect that a "problem" character was in >> the >>>>>>>>>> mca param >>>>>>>>>> value when we add it to the orted's argv, and then put "" >> around >>>>>>>>>> it. The >>>>>>>>>> problem, however, is that the mca param parser on the far end >>>>>>>>>> doesn't >>>>>>>>>> remove those "" from the resulting string. At least, I spent >>>>>>>>>> over a day >>>>>>>>>> fighting with a problem only to discover that was happening. >>>>>>>>>> Could be an >>>>>>>>>> error in the way I was doing things, or could be a real >>>>>>>>>> characteristic of >>>>>>>>>> the parser. Anyway, we would have to ensure that the parser >>>>>>>>>> removes any >>>>>>>>>> surrounding "" before passing along the param value or this >>>>>>>>>> won't work. >>>>>>>>>> >>>>>>>>>> Ralph >>>>>>>>>> >>>>>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu> >> wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Commit 16364 broke things when using multiword mca param >>>>>>>>>>> values. For >>>>>>>>>>> instance: >>>>>>>>>>> >>>>>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca >>>>>>>>>>> pls_rsh_agent >>>>>>>>>>> "ssh -Y" xterm >>>>>>>>>>> >>>>>>>>>>> Will crash and burn, because the value "ssh -Y" is being >> stored >>>>>>>>>>> into the >>>>>>>>>>> argv orted_cmd_line in orterun.c:1506. This is then added to >>>>>>>>>>> the launch >>>>>>>>>>> command for the orted: >>>>>>>>>>> >>>>>>>>>>> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/ >> bin: >>>>>>>>>>> $PATH ; >>>>>>>>>>> export PATH ; >>>>>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib: >>>>>>>>>>> $LD_LIBRARY_PATH ; >>>>>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/ >> orted >>>>>>>>>>> --debug >>>>>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 -- >>>>>>>>>>> nodename >>>>>>>>>>> odin004 --universe tpr...@odin.cs.indiana.edu:default- >>>>>>>>>>> universe-27872 >>>>>>>>>>> --nsreplica >>>>>>>>>>> "0.0;tcp:// >>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d: >> 21a0 >>>>>>>>>>> :4090 8" >>>>>>>>>>> --gprreplica >>>>>>>>>>> "0.0;tcp:// >>>>>>>>>>> 129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d: >> 21a0 >>>>>>>>>>> :4090 8" >>>>>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca >>>>>>>>>>> mca_base_param_file_path >>>>>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/ >> homedirs/ >>>>>>>>>>> tprins/rsl/ >>>>>>>>>>> examp les >>>>>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/ >> rsl/ >>>>>>>>>>> examples >>>>>>>>>>> >>>>>>>>>>> Notice that in this command we now have "-mca >> pls_rsh_agent ssh >>>>>>>>>>> -Y". So >>>>>>>>>>> the quotes have been lost, as we die a horrible death. >>>>>>>>>>> >>>>>>>>>>> So we need to add the quotes back in somehow, or pass these >>>>>>>>>>> options >>>>>>>>>>> differently. I'm not sure what the best way to fix this. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Tim >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> devel mailing list >>>>>> de...@open-mpi.org >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>> >>> >