I'm not sure it is really necessary. The problem lies solely within opal_cmd_line_parse and (if someone can parse that code ;-)) is truly simple to fix. The overly long cmd line issue is due to a bug that Josh was going to look at (he may already have done so while I was out of touch).
Ralph

On 11/9/07 5:10 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Should there be another option for passing MCA parameters between
> processes, such as via stdin (or any file descriptor)? I.e., during
> the command line parsing to check for command line MCA params, perhaps
> a new argument could be introduced: -mcauri <uri>, where <uri> could
> be a few different forms:
>
> - file://stdin: (note the 2 //, not 3, so "stdin" would never conflict
> with a real file named /stdin) Read the parameters in off stdin.
>
> - rml://...rml contact info...: read in the MCA params via the RML
> (although I assume that reading via the RML would be *waaaay* too late
> during the MCA setup process -- I mentioned this option for
> completeness, even though I don't think it'll work)
>
> - ip://ipaddress:port: open a socket back and read the MCA params in
> over a socket. This could have some scalability issues...? But who
> knows; it could be tied into the hierarchical startup such that we
> wouldn't have to have an all-to-one connection scheme. Certainly it
> would cause scalability problems when paired with today's all-to-one
> RML connection scheme for the OOB.
>
> I'm not sure that the rml: and ip: schemes are worthwhile. Maybe a
> file://stdin kind of approach could work? Or perhaps some other kind
> of URI/IPC...? (I really haven't thought through the issues -- this
> is off the top of my head)
>
>
> On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:
>
>> Might I suggest:
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/1073
>>
>> It deals with some of these issues and explains the boundaries of the
>> problem. As for what a string param can contain, I have no opinion. I only
>> note that it must handle special characters such as ';', '/', etc. that are
>> typically found in URIs. I cannot think of any reason it should have a
>> quote in it.
>>
>> Ralph
>>
>>
>> On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>
>>> The alias option you presented does not work. I think we do some weird
>>> things to find the absolute path for ssh, instead of just issuing the
>>> command.
>>>
>>> I would spend some time fixing this, but I don't want to do it wrong. We
>>> could quote all the param values, and change the parser to remove the
>>> quotes, but this is assuming that a mca param does not contain quotes.
>>>
>>> So I guess there are 2 questions that need to be answered before a fix
>>> is made:
>>>
>>> 1. What exactly can a string mca param contain? Can it have quotes or
>>> spaces or?
>>>
>>> 2. Which mca parameters should be forwarded? Should it be just the ones
>>> from the command line? From the environment? From config files?
>>>
>>> Tim
>>>
>>> Ralph Castain wrote:
>>>> What changed is that we never passed mca params to the orted before - they
>>>> always went to the app, but it's the orted that has the issue. There is a
>>>> bug ticket thread on this subject - I forget the number immediately.
>>>>
>>>> Basically, the problem was that we cannot generally pass the local
>>>> environment to the orteds when we launch them. However, people needed
>>>> various mca params to get to the orteds to control their behavior. The only
>>>> way to resolve that problem was to pass the params via the command line,
>>>> which is what was done.
>>>>
>>>> Except for a very few cases, all of our mca params are single values that do
>>>> not include spaces, so this is not a problem that is causing widespread
>>>> issues. As I said, I already had to deal with one special case that didn't
>>>> involve spaces, but did have special characters that required quoting, which
>>>> identified the larger problem of dealing with quoted strings.
>>>>
>>>> I have no objection to a more general fix.
>>>> Like I said in my note, though, the general fix will take a larger effort.
>>>> If someone is willing to do so, that is fine with me - I was only offering
>>>> solutions that would fill the interim time as I haven't heard anyone step
>>>> up to say they would fix it anytime soon.
>>>>
>>>> Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes
>>>> around things if you will fix the mca cmd line parser to cleanly remove them
>>>> on the other end.
>>>>
>>>> Ralph
>>>>
>>>>
>>>> On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>>>
>>>>> I'm curious what changed to make this a problem. How were we passing mca
>>>>> param from the base to the app before, and why did it change?
>>>>>
>>>>> I think that options 1 & 2 below are no good, since we, in general, allow
>>>>> string mca params to have spaces (as far as I understand it). So a more
>>>>> general approach is needed.
>>>>>
>>>>> Tim
>>>>>
>>>>> On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
>>>>>> Sorry for delay - wasn't ignoring the issue.
>>>>>>
>>>>>> There are several fixes to this problem - ranging in order from least to
>>>>>> most work:
>>>>>>
>>>>>> 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param.
>>>>>> It won't affect anything on the backend because the daemon/procs don't use
>>>>>> ssh.
>>>>>>
>>>>>> 2. include "pls_rsh_agent" in the array of mca params not to be passed to
>>>>>> the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
>>>>>> orte_pls_base_orted_append_basic_args function. This would fix the specific
>>>>>> problem cited here, but I admit that listing every such param by name would
>>>>>> get tedious.
>>>>>>
>>>>>> 3. we could easily detect that a "problem" character was in the mca param
>>>>>> value when we add it to the orted's argv, and then put "" around it.
>>>>>> The problem, however, is that the mca param parser on the far end doesn't
>>>>>> remove those "" from the resulting string. At least, I spent over a day
>>>>>> fighting with a problem only to discover that was happening. Could be an
>>>>>> error in the way I was doing things, or could be a real characteristic of
>>>>>> the parser. Anyway, we would have to ensure that the parser removes any
>>>>>> surrounding "" before passing along the param value or this won't work.
>>>>>>
>>>>>> Ralph
>>>>>>
>>>>>> On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Commit 16364 broke things when using multiword mca param values. For
>>>>>>> instance:
>>>>>>>
>>>>>>> mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent "ssh -Y" xterm
>>>>>>>
>>>>>>> Will crash and burn, because the value "ssh -Y" is being stored into the
>>>>>>> argv orted_cmd_line in orterun.c:1506.
>>>>>>> This is then added to the launch command for the orted:
>>>>>>>
>>>>>>> /usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ;
>>>>>>> export PATH ;
>>>>>>> LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ;
>>>>>>> export LD_LIBRARY_PATH ; /san/homedirs/tprins/usr/rsl/bin/orted --debug
>>>>>>> --debug-daemons --name 0.1 --num_procs 2 --vpid_start 0 --nodename
>>>>>>> odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
>>>>>>> --nsreplica
>>>>>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
>>>>>>> --gprreplica
>>>>>>> "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
>>>>>>> -mca orte_debug 1 -mca pls_rsh_agent ssh -Y -mca mca_base_param_file_path
>>>>>>> /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
>>>>>>> -mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples
>>>>>>>
>>>>>>> Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
>>>>>>> the quotes have been lost, as we die a horrible death.
>>>>>>>
>>>>>>> So we need to add the quotes back in somehow, or pass these options
>>>>>>> differently. I'm not sure what the best way to fix this is.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Tim
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
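
The quote-then-strip idea discussed in the thread (put "" around a value with a "problem" character when it goes into the orted's argv, and have the parser on the far end remove the surrounding "") can be sketched in C roughly as follows. This is only an illustration, not the Open MPI code: needs_quoting, quote_param_value, and strip_surrounding_quotes are hypothetical names, the set of problem characters is a guess, and the sketch assumes that a param value never contains a double quote itself, which the thread leaves as an open question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): returns 1 if the value is
 * empty or contains a character that a remote shell would split on or
 * interpret. The character set here is an assumption. */
int needs_quoting(const char *value)
{
    return value[0] == '\0' || strpbrk(value, " \t;&|<>()$*?") != NULL;
}

/* Hypothetical sender-side helper: returns a newly allocated copy of value,
 * wrapped in double quotes when needs_quoting() says so. Assumes the value
 * contains no double quote. The caller frees the result; NULL on failure. */
char *quote_param_value(const char *value)
{
    size_t len = strlen(value);
    char *out;

    if (!needs_quoting(value)) {
        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, value, len + 1);   /* plain copy, including the NUL */
        }
        return out;
    }

    out = malloc(len + 3);                 /* two quotes plus the NUL */
    if (out == NULL) {
        return NULL;
    }
    out[0] = '"';
    memcpy(out + 1, value, len);
    out[len + 1] = '"';
    out[len + 2] = '\0';
    return out;
}

/* Hypothetical receiver-side helper: removes one pair of surrounding double
 * quotes in place, if present, and returns the same pointer. */
char *strip_surrounding_quotes(char *value)
{
    size_t len = strlen(value);

    if (len >= 2 && value[0] == '"' && value[len - 1] == '"') {
        memmove(value, value + 1, len - 2); /* shift contents left by one */
        value[len - 2] = '\0';
    }
    return value;
}
```

Under those assumptions, a value such as ssh -Y would be placed in the orted's argv with quotes around it and reduced back to ssh -Y on the far end, while a value without problem characters passes through unchanged. Whether stripping belongs in the command line parser itself is the question the thread is debating.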