One note: a difference from pre-WS GRAM is that WS-GRAM won't delegate
a job credential unless you ask it to. The goal is to avoid extra work
for jobs that don't need a delegated proxy. You need to specify -J when
you submit to have a credential delegated, which in turn sets the
X509_USER_PROXY environment variable.
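For reference, here is a minimal sketch of such a submit. It reuses the
globusrun-ws invocation quoted further down in this thread and adds -J.
The host name, factory type, and job file name are placeholders (my
assumptions, not values from your site), and the script only prints the
command so it can be reviewed before it is run by hand.

```shell
# Placeholder values (assumptions); substitute your own site's settings.
GATEKEEPER="tg-grid1.uc.teragrid.org"
FACTORY_TYPE="PBS"
JOB_XML="my.xml"

# -J asks globusrun-ws to delegate a job credential; the remaining flags
# are the ones used in the globusrun-ws example quoted later in the thread.
SUBMIT_CMD="globusrun-ws -submit -J -q -s \
-F https://${GATEKEEPER}/wsrf/services/ManagedJobFactoryService \
-Ft ${FACTORY_TYPE} -f ${JOB_XML}"

# Print the command for review; paste it into a shell to actually submit.
echo "${SUBMIT_CMD}"
```

If the job's executable is /usr/bin/env, X509_USER_PROXY should then
show up in the job's stdout.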
I haven't seen MPIg in action, so I don't know what's supposed to set
the MPI* environment variables under discussion.
Charles
On Nov 9, 2007, at 6:23 AM, Karonis Nicholas wrote:
Yes, I've seen this before. It's not an MPIg issue; it's a
Globus installation/configuration issue.
Propagating env vars to the running job, whether they are explicitly
specified in RSL or XML Globus job description files or are "permanent"
env vars found in .* files (e.g., .soft, .cshrc), has always been a
sticky wicket, because doing so often (always?) requires modifying the
Globus Job Manager scripts to pass env vars/vals to the running app.
My best advice for moving forward in troubleshooting is to take MPIg
out of the picture for now and run Globus jobs with only a small
"/bin/env" program, both WS and pre-WS, specifying env vars in
the RSL/XML and checking stdout.
Here's an example RSL you can use for pre-WS:
&(count=1)
(host_count="1:ia64-compute")
(environment=(FOO bar))
(executable=/usr/bin/env)
and you can run that with:
globusrun -o -r "<gatekeeper/JM>" -f my.rsl
and here's an example XML you can use for WS:
<job>
<executable>/usr/bin/env</executable>
<environment> <name>FOO</name> <value>bar</value> </environment>
<count>1</count>
<maxTime>1</maxTime>
</job>
which you can run with:
globusrun-ws -q -s -submit \
-F https://<gatekeeper>/wsrf/services/ManagedJobFactoryService \
-Ft <FactoryType> \
-f my.xml
where <gatekeeper> is something like tg-grid1.uc.teragrid.org
and <FactoryType> is something like PBS.
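Once both runs have produced output, checking stdout can be scripted.
The following is only a sketch: it assumes the /usr/bin/env output of
each run has been saved to a file, and the function names and file names
(has_var, missing_vars, pre.env, ws.env) are hypothetical helpers, not
part of Globus.

```shell
# has_var NAME VALUE FILE
# Succeeds if FILE (saved /usr/bin/env output) contains the exact line
# NAME=VALUE.
has_var() {
    grep -Fqx -- "$1=$2" "$3"
}

# missing_vars PRE_FILE WS_FILE
# Prints the names of variables that are set in PRE_FILE but absent
# from WS_FILE, one per line.
missing_vars() {
    pre_names=$(mktemp) || return 1
    ws_names=$(mktemp) || { rm -f "$pre_names"; return 1; }
    # Keep only NAME from NAME=value lines; lines that do not look like
    # NAME=value are ignored.
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" \
        | LC_ALL=C sort -u > "$pre_names"
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$2" \
        | LC_ALL=C sort -u > "$ws_names"
    # comm -23 prints the lines found only in the first sorted file.
    LC_ALL=C comm -23 "$pre_names" "$ws_names"
    rm -f "$pre_names" "$ws_names"
}

# Example usage (file names are placeholders):
#   has_var FOO bar ws.env && echo "FOO was propagated"
#   missing_vars pre.env ws.env
```

The first function answers "did FOO=bar make it through?", and the
second lists whatever the pre-WS job saw that the WS job did not.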
There is a so-called "7-step test script" that was developed
a long time back (pre-WS) to test a Globus installation; it
checks things like "propagating env vars". There's a WS version
of the "7-step test script" under development. I've created
a tarball that has both scripts and various supporting
files (with READMEs) and attached it to the end of this
message. The pre-WS script should work, and most of the WS
script should work too.
Good luck,
Nick
<7step.tar>
On Nov 8, 2007, at 11:14 AM, Michael Lambert wrote:
Some users of mine are reporting errors when attempting to run MPIg
jobs via WS Gram. Pre-WS Gram jobs run fine. A comparison of the
environment variables available to pre vs. post WS Gram revealed that
the WS jobs were missing a slew of information such as X509_USER_PROXY
(which causes the jobs to fail immediately), all of the MPI* vars and
some others. Has anyone here experienced similar issues?
BTW, we are running GT 4.0.4 on AIX 5.3.
--
Michael Lambert
System Administrator
Louisiana Optical Network Initiative /
High Performance Computing @ LSU
http://www.loni.org/
http://www.hpc.lsu.edu/