As a note, one difference from pre-WS GRAM is that WS GRAM won't delegate a job credential unless you ask it to. The goal is to avoid the delegation work for jobs that don't need the delegated proxy. You need to pass -J to globusrun-ws at submit time to have a credential delegated; that in turn sets the X509_USER_PROXY environment variable for the job.
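To confirm on the execution side whether a credential actually reached the job, a job wrapper can check X509_USER_PROXY before starting the real program. This is a minimal POSIX sh sketch under that assumption: check_proxy is a hypothetical helper, not part of Globus, and it only checks that the proxy file exists and is readable, not that the credential inside it is valid.

```shell
# check_proxy (hypothetical helper): report whether a delegated proxy
# credential is visible to this job. Returns 0 if X509_USER_PROXY is set
# and names a readable file, 1 otherwise. Diagnostics go to stderr.
check_proxy() {
    if [ -z "${X509_USER_PROXY:-}" ]; then
        echo "X509_USER_PROXY is not set (was the job submitted with -J?)" >&2
        return 1
    fi
    if [ ! -r "$X509_USER_PROXY" ]; then
        echo "X509_USER_PROXY=$X509_USER_PROXY, but that file is not readable" >&2
        return 1
    fi
    # Success: echo the path so it shows up in the job's stdout.
    echo "X509_USER_PROXY=$X509_USER_PROXY"
}
```

A wrapper script used as the job's executable could define this function, call `check_proxy || exit 1`, and only then start the MPIg program, so a missing credential fails early with a readable message instead of an obscure MPIg error.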

I haven't seen MPIg in action, so I don't know what is supposed to set the MPI* environment variables under discussion.


Charles

On Nov 9, 2007, at 6:23 AM, Karonis Nicholas wrote:

Yes, I've seen this before. It's not an MPIg issue; it's a
Globus installation/configuration issue.

Propagating environment variables to the running job, whether they
are specified explicitly in RSL or XML Globus job descriptions or are
"permanent" variables set in dot files (e.g., .soft, .cshrc), has
always been a sticky wicket, because it often (always?) requires
modifying the Globus job manager scripts to pass the variable names
and values through to the running application.

My best advice for troubleshooting is to take MPIg out of the
picture for now: run trivial Globus jobs that just execute env
(/usr/bin/env in the examples below) through both WS and pre-WS
GRAM, with environment variables set in the RSL/XML, and check
their stdout for those variables.

Here's an example RSL you can use for pre-WS:
&(count=1)
(host_count="1:ia64-compute")
(environment=(FOO bar))
(executable=/usr/bin/env)

and you can run that with:
globusrun -o -r "<gatekeeper/JM>" -f my.rsl

and here's an example XML you can use for WS:
<job>
    <executable>/usr/bin/env</executable>
    <environment> <name>FOO</name> <value>bar</value> </environment>
    <count>1</count>
    <maxTime>1</maxTime>
</job>

which you can run (after saving it as my.xml) with:
globusrun-ws -q -s -submit \
        -F https://<gatekeeper>/wsrf/services/ManagedJobFactoryService \
        -Ft <FactoryType> \
        -f my.xml

where <gatekeeper> is something like tg-grid1.uc.teragrid.org
and <FactoryType> is something like PBS.
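Checking stdout by eye gets tedious when comparing several runs, so the check can be scripted. This is a minimal POSIX sh sketch, assuming each job's env output has been saved to a file; expect_env is a hypothetical helper name, not a Globus tool.

```shell
# expect_env (hypothetical helper): check captured /usr/bin/env output.
# Usage: expect_env OUTPUT_FILE NAME=VALUE [NAME=VALUE ...]
# Prints one status line per pair; returns 1 if any pair is absent.
expect_env() {
    out_file=$1
    shift
    rc=0
    for pair in "$@"; do
        # -F: fixed string (no regex), -x: match the whole line, -q: quiet
        if grep -Fqx -- "$pair" "$out_file"; then
            echo "ok: $pair"
        else
            echo "missing: $pair"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `globusrun -o -r "<gatekeeper/JM>" -f my.rsl > prews.out` followed by `expect_env prews.out FOO=bar` checks the pre-WS path; save the WS job's output the same way and run the same check on it. Matching whole lines means FOO=barbaz or FOOBAR=bar will not be mistaken for FOO=bar.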

There is a so-called "7-step test script", developed a long time
back (pre-WS), that tests a Globus installation, including things
like environment variable propagation. A WS version of the 7-step
test script is under development. I've put both scripts, along with
various supporting files (with READMEs), in a tarball attached to
the end of this message. The pre-WS script should work, and most of
the WS one should work too.

Good luck,
Nick

<7step.tar>

On Nov 8, 2007, at 11:14 AM, Michael Lambert wrote:

Some users of mine are reporting errors when attempting to run MPIg jobs via WS GRAM; pre-WS GRAM jobs run fine. Comparing the environment variables available to pre-WS and WS GRAM jobs revealed that the WS jobs were missing a slew of variables, such as X509_USER_PROXY (whose absence makes the jobs fail
immediately), all of the MPI* variables, and some others. Has anyone here
experienced similar issues?
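A pre-WS vs. WS environment comparison like this can be automated. This is a minimal POSIX sh/awk sketch, assuming each job's env output was saved to a file; vars_missing_from_ws is a hypothetical helper name. It lists variable names present in the pre-WS dump but absent from the WS dump.

```shell
# vars_missing_from_ws (hypothetical helper)
# Usage: vars_missing_from_ws PREWS_ENV_FILE WS_ENV_FILE
# Prints, sorted, each variable name found in the pre-WS env dump
# but not in the WS env dump.
vars_missing_from_ws() {
    # The WS file is passed first so all of its names are recorded
    # before the pre-WS file is scanned.
    awk -F= '
        # Skip lines that are not NAME=value (e.g. continuation lines).
        index($0, "=") == 0 || $1 !~ /^[A-Za-z_][A-Za-z0-9_]*$/ { next }
        # First file (WS dump): remember every name it contains.
        FILENAME == ARGV[1] { ws[$1] = 1; next }
        # Second file (pre-WS dump): print each name the WS dump lacks, once.
        !($1 in ws) && !seen[$1]++ { print $1 }
    ' "$2" "$1" | sort
}
```

For example, `vars_missing_from_ws prews.out ws.out` would list names such as X509_USER_PROXY or MPI* variables if they reached only the pre-WS job. Comparing on FILENAME rather than the common NR==FNR idiom keeps the result correct even when the WS dump is empty; a multi-line value whose continuation line happens to look like NAME=value would still be miscounted.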

BTW, we are running GT 4.0.4 on AIX 5.3.

--
Michael Lambert
System Administrator
Louisiana Optical Network Initiative /
High Performance Computing @ LSU
http://www.loni.org/
http://www.hpc.lsu.edu/

