Should there be another option for passing MCA parameters between processes, such as via stdin (or any file descriptor)? I.e., while the command line is being parsed for MCA params, perhaps a new argument could be introduced: -mcauri <uri>, where <uri> could take a few different forms:

- file://stdin: (note the 2 //, not 3, so "stdin" would never conflict with a real file named /stdin) Read the parameters in off stdin.

- rml://...rml contact info...: read in the MCA params via the RML (although I assume that reading via the RML would be *waaaay* too late during the MCA setup process -- I mention this option for completeness, even though I don't think it'll work)

- ip://ipaddress:port: open a socket back and read the MCA params in over a socket. This could have some scalability issues...? But who knows; it could be tied into the hierarchical startup such that we wouldn't have to have an all-to-one connection scheme. Certainly it would cause scalability problems when paired with today's all-to-one RML connection scheme for the OOB.

I'm not sure that the rml: and ip: schemes are worthwhile. Maybe a file://stdin kind of approach could work? Or perhaps some other kind of URI/IPC...? (I really haven't thought through the issues -- this is off the top of my head)
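To make the proposed forms concrete, here is a minimal sketch of how a -mcauri value might be classified and how params could then be read off a stream. It is in Python purely for illustration (the real parser is C); the function names, the returned tuples, and the "name = value" line format on the stream are all assumptions, not existing Open MPI behavior.

```python
import io
from urllib.parse import urlsplit


def parse_mca_uri(uri):
    """Classify a hypothetical -mcauri value as (kind, target)."""
    if uri.startswith("rml://"):
        # RML contact info is treated as an opaque string here.
        return ("rml", uri[len("rml://"):])
    parts = urlsplit(uri)
    if parts.scheme == "file" and parts.netloc == "stdin":
        # Two slashes: "stdin" is the authority part, not a path, so it
        # cannot collide with a real file named /stdin (file:///stdin).
        return ("stdin", None)
    if parts.scheme == "file":
        return ("file", parts.path)
    if parts.scheme == "ip":
        return ("ip", (parts.hostname, parts.port))
    raise ValueError("unsupported -mcauri value: %r" % uri)


def read_mca_params(stream):
    """Read assumed 'name = value' lines; values keep spaces and quotes."""
    params = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


# Usage: a StringIO stands in for the daemon's stdin.
fake_stdin = io.StringIO("pls_rsh_agent = ssh -Y\norte_debug = 1\n")
print(parse_mca_uri("file://stdin"), read_mca_params(fake_stdin))
```

Because each value occupies the rest of its line, nothing has to be quoted on a command line, which is the attraction of the stdin form.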



On Nov 8, 2007, at 2:36 PM, Ralph H Castain wrote:

Might I suggest:

https://svn.open-mpi.org/trac/ompi/ticket/1073

It deals with some of these issues and explains the boundaries of the
problem. As for what a string param can contain, I have no opinion. I only
note that it must handle special characters such as ';', '/', etc. that are
typically found in URIs. I cannot think of any reason it should have a
quote in it.

Ralph



On 11/8/07 12:25 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

The alias option you presented does not work. I think we do some weird
things to find the absolute path for ssh, instead of just issuing the
command.

I would spend some time fixing this, but I don't want to do it wrong. We
could quote all the param values, and change the parser to remove the
quotes, but this assumes that an mca param does not contain quotes.

So I guess there are 2 questions that need to be answered before a fix
is made:

1. What exactly can a string mca param contain? Can it have quotes or
spaces or?

2. Which mca parameters should be forwarded? Should it be just the ones
from the command line? From the environment? From config files?

Tim

Ralph Castain wrote:
What changed is that we never passed mca params to the orted before - they always went to the app, but it's the orted that has the issue. There is a
bug ticket thread on this subject - I forget the number offhand.

Basically, the problem was that we cannot generally pass the local
environment to the orteds when we launch them. However, people needed various mca params to get to the orteds to control their behavior. The only way to resolve that problem was to pass the params via the command line,
which is what was done.
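A rough sketch of that forwarding step, in Python for illustration only (the actual code is C): each param becomes a "-mca name value" triple appended to the daemon's argv, with an optional set of names that are not forwarded. The function name and the exclusion set are hypothetical.

```python
# Hypothetical set of params that only matter on the launching side.
NOT_FORWARDED = {"pls_rsh_agent"}


def append_mca_args(argv, params, excluded=NOT_FORWARDED):
    """Append '-mca name value' to argv, one list element per token."""
    for name, value in params.items():
        if name in excluded:
            continue  # never reaches the orted command line
        argv.extend(["-mca", name, value])
    return argv


orted_argv = append_mca_args(
    ["orted", "--debug"],
    {"orte_debug": "1", "pls_rsh_agent": "ssh -Y"},
)
print(orted_argv)
```

As long as argv stays a list, a multiword value is still a single element; the trouble only starts when the list is flattened into one command string for the remote shell.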

Except for a very few cases, all of our mca params are single values that do not include spaces, so this is not a problem that is causing widespread issues. As I said, I already had to deal with one special case that didn't involve spaces, but did have special characters that required quoting, which
identified the larger problem of dealing with quoted strings.

I have no objection to a more general fix. Like I said in my note, though, the general fix will take a larger effort. If someone is willing to do so, that is fine with me - I was only offering solutions that would fill the interim time as I haven't heard anyone step up to say they would fix it
anytime soon.

Please feel free to jump in and volunteer! ;-) I'm willing to put the quotes around things if you will fix the mca cmd line parser to cleanly remove them
on the other end.

Ralph



On 11/7/07 5:50 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

I'm curious what changed to make this a problem. How were we passing mca
params from the base to the app before, and why did it change?

I think that options 1 & 2 below are no good, since we, in general, allow string mca params to have spaces (as far as I understand it). So a more
general approach is needed.

Tim

On Wednesday 07 November 2007 10:40:45 am Ralph H Castain wrote:
Sorry for delay - wasn't ignoring the issue.

There are several fixes to this problem - ranging in order from least to
most work:

1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. It won't affect anything on the backend because the daemon/procs don't use
ssh.

2. include "pls_rsh_agent" in the array of mca params not to be passed to
the orted in orte/mca/pls/base/pls_base_general_support_fns.c, the
orte_pls_base_orted_append_basic_args function. This would fix the specific problem cited here, but I admit that listing every such param by name would
get tedious.

3. we could easily detect that a "problem" character was in the mca param value when we add it to the orted's argv, and then put "" around it. The problem, however, is that the mca param parser on the far end doesn't remove those "" from the resulting string. At least, I spent over a day fighting with a problem only to discover that was happening. Could be an error in the way I was doing things, or could be a real characteristic of the parser. Anyway, we would have to ensure that the parser removes any surrounding "" before passing along the param value or this won't work.
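Option 3 can be sketched as a pair of helpers, one for each end. This is Python for illustration only, not the actual argv or parser code; the set of "problem" characters and both function names are assumptions.

```python
# Assumed set of characters that a remote shell would treat specially.
PROBLEM_CHARS = set(" \t;&|<>()$`\\*?")


def quote_if_needed(value):
    """Sender side: wrap the value in "" if it has a problem character."""
    if any(ch in PROBLEM_CHARS for ch in value):
        return '"' + value + '"'
    return value


def strip_surrounding_quotes(value):
    """Receiver side: remove one pair of surrounding "" if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


sent = quote_if_needed("ssh -Y")
print(sent, "->", strip_surrounding_quotes(sent))
```

Note the limitation: a value that legitimately begins and ends with a double quote would also be stripped, so this only works if param values never carry their own surrounding quotes.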

Ralph

On 11/5/07 12:10 PM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:
Hi,

Commit 16364 broke things when using multiword mca param values. For
instance:

mpirun --debug-daemons -mca orte_debug 1 -mca pls rsh -mca pls_rsh_agent
"ssh -Y" xterm

Will crash and burn, because the value "ssh -Y" is being stored into the argv orted_cmd_line in orterun.c:1506. This is then added to the launch
command for the orted:

/usr/bin/ssh -Y odin004 PATH=/san/homedirs/tprins/usr/rsl/bin:$PATH ; export PATH ;
LD_LIBRARY_PATH=/san/homedirs/tprins/usr/rsl/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
/san/homedirs/tprins/usr/rsl/bin/orted --debug --debug-daemons --name 0.1 --num_procs 2
--vpid_start 0 --nodename odin004 --universe tpr...@odin.cs.indiana.edu:default-universe-27872
--nsreplica "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
--gprreplica "0.0;tcp://129.79.240.100:40907;tcp6://2001:18e8:2:240:2e0:81ff:fe2d:21a0:40908"
-mca orte_debug 1 -mca pls_rsh_agent ssh -Y
-mca mca_base_param_file_path /u/tprins/usr/rsl/share/openmpi/amca-param-sets:/san/homedirs/tprins/rsl/examples
-mca mca_base_param_file_path_force /san/homedirs/tprins/rsl/examples

Notice that in this command we now have "-mca pls_rsh_agent ssh -Y". So
the quotes have been lost, and we die a horrible death.

So we need to add the quotes back in somehow, or pass these options
differently. I'm not sure what the best way to fix this is.
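The failure can be reproduced in a few lines. The sketch below is Python for illustration only, with shlex.split standing in for the remote shell's word splitting; it is not how orterun builds the command. It contrasts a plain space-join of the argv with one that shell-quotes each element first.

```python
import shlex

# The argv as stored after local parsing: "ssh -Y" is one element.
argv = ["-mca", "pls_rsh_agent", "ssh -Y"]


def naive_join(args):
    """Flatten argv with spaces; element boundaries are lost."""
    return " ".join(args)


def quoted_join(args):
    """Flatten argv, shell-quoting each element that needs it."""
    return " ".join(shlex.quote(a) for a in args)


# shlex.split approximates what the remote shell hands to the orted.
print(shlex.split(naive_join(argv)))   # "ssh" and "-Y" arrive separately
print(shlex.split(quoted_join(argv)))  # "ssh -Y" arrives as one value
```

Shell-level quoting of this kind is removed by the remote shell itself, so the param parser on the far end would not need to strip anything; it also copes with values that contain quote characters.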

Thanks,

Tim





_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
