Sorry, but while I'm trying to figure out what's going on - can you
run the monitor without the -F/-Ft? They should be redundant given
the information in the EPR, and I'd like to verify that it works in
their absence.
What machine is the client on? Does it make any difference if you do
the job submission from a different host?
Last bit of info: Can you run the batch/monitor jobs with -debug,
then run a failed "-submit -c" (with no -J/-S/-s) with -debug and send
the results? It looks like the monitor part of the code must be
getting different information when the code runs straight through than
when it comes in two pieces, but looking at globusrun_ws.c I can't see
how.
Thanks,
Charles
On Jun 26, 2008, at 9:58 AM, Steven Timm wrote:
I made the change using VDT's vdt-local-setup.sh,
which I know doesn't get modified, and now the EPR shows the right
IP in it, and the example you gave works.
But my initial example still doesn't.
bash-3.00$ globusrun-ws -submit -batch -o foo.epr -F fnpcosg1.fnal.gov:9443 -FtCondor -c /usr/bin/id
Submitting job...Done.
Job ID: uuid:decb6502-438f-11dd-9611-001422086c92
Termination time: 06/27/2008 14:55 GMT
bash-3.00$ more foo.epr
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
  <ns00:Address>https://131.225.166.2:9443/wsrf/services/ManagedExecutableJobService</ns00:Address>
  <ns00:ReferenceProperties>
    <ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">df3b1b40-438f-11dd-88db-cf7a593808fb</ResourceID>
  </ns00:ReferenceProperties>
  <wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns00:EndpointReferenceType>
bash-3.00$ globusrun-ws -monitor -j foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor
Current job state: Done
Requesting original job description...Done.
Destroying job...Done.
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:fa78cea2-438f-11dd-a905-001422086c92
Termination time: 06/27/2008 14:55 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:355cc8b6-4390-11dd-a249-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:3a4ee764-4390-11dd-bb28-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
Any idea what else we might have to fix?
Steve Timm
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
[EMAIL PROTECTED] http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant
Group Leader.
On Thu, 26 Jun 2008, Charles Bacon wrote:
On Jun 26, 2008, at 9:09 AM, Steven Timm wrote:
On Thu, 26 Jun 2008, Charles Bacon wrote:
As an experiment, can you tell me what happens if you run the job
in two parts:
First, try -submit -batch -o foo.epr
Check what hostname/IP shows up in the EPR as the endpoint of the
service.
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
  <ns00:Address>https://131.225.167.18:9443/wsrf/services/ManagedExecutableJobService</ns00:Address>
  <ns00:ReferenceProperties>
    <ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">da7e0c90-4388-11dd-96e1-d1739b31397d</ResourceID>
  </ns00:ReferenceProperties>
  <wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns00:EndpointReferenceType>
That's the wrong IP; it should be the other one.
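For checking which host or IP the container put into the EPR, something
like the following may help. It is only a sketch: the function name
epr_host is made up for illustration, and it assumes the EPR holds a
single Address element with an http or https URL, as in the files shown
in this thread.

```shell
#!/bin/sh
# epr_host FILE: print the host part of the <Address> URL in an EPR file.
# (epr_host is a hypothetical helper name, not part of the Globus Toolkit.)
epr_host() {
    # Join the lines first, since the EPR may have been wrapped by a
    # terminal or mail client, then capture the text between "://" and
    # the next ":", "/" or "<".
    tr -d '\n' < "$1" |
        sed -n 's|.*Address>https\{0,1\}://\([^:/<]*\).*|\1|p'
}

# Usage:
#   epr_host foo.epr
```

If the value printed is not the address the service should be reached
on, the monitor step will contact the wrong endpoint.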
Okay. So, that's going to be the difference between globus-job-run
and globusrun-ws. The globusrun-ws client is getting back an
address from the container that it will use to get further
updates. The (submit/batch) part of the job is using the address
you hand-supplied on the command line, so it's working. The
(monitor) part of the client is failing because the service is
returning a bad address.
The fix is to get the container to bind to the right address, which
you can do with GLOBUS_HOSTNAME.
As far as I can tell, GLOBUS_HOSTNAME is not set in the environment
of the container. What's the best way to set it in a VDT
environment?
I did set GLOBUS_HOSTNAME before I installed the VDT, to fnpcosg1.
I am now running the container in full-out debug mode so if there
are any logs you need to see, let me know.
It's starting globus-start-container out of /etc/init.d/globus-ws.
It looks like it sources both setup.sh and vdt/etc/globus-options.sh.
globus-options.sh looks like it is intended to set up
the JVM options used by the container. If I were going to set
GLOBUS_HOSTNAME, based on what I've seen I'd put it in the init.d
script, or the globus-options.sh file. I'm not sure if those two
are vulnerable to being overwritten during a pacman update or by a
vdt-control on/off.
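A minimal sketch of what such an addition could look like, assuming it
goes into a file the init script sources before it calls
globus-start-container. The value shown is the host name from this
thread; which file is the right home for it is exactly the open question
above.

```shell
# Assumption: these lines live in a file that /etc/init.d/globus-ws
# sources before starting the container.
# The value is the host name used in this thread; per the thread, an IP
# address can be given instead.
GLOBUS_HOSTNAME=fnpcosg1.fnal.gov
export GLOBUS_HOSTNAME
```

Exporting the variable matters because the container runs as a child
process of the script.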
The other place you can fix it that's not VDT-specific is under
$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd. The
options are described at
http://www.globus.org/toolkit/docs/4.0/common/javawscore/admin-index.html#id2531913.
Basically, adding a <parameter name="logicalHost"
value="the.right.ip.address"/> element to the globalConfiguration section is
equivalent to setting your GLOBUS_HOSTNAME to that IP address.
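To confirm whether a logicalHost parameter is already present in the
file, a quick check along these lines might be enough. This is a sketch
only: show_logical_host is a made-up helper name, and it assumes the
whole parameter element sits on one line and that grep supports -o.

```shell
#!/bin/sh
# show_logical_host FILE: print any logicalHost <parameter> element found
# in FILE; the exit status is non-zero when none is present.
# (show_logical_host is a hypothetical helper, not a Globus tool.)
show_logical_host() {
    grep -o '<parameter[^>]*name="logicalHost"[^>]*>' "$1"
}

# Usage (path as given in this thread):
#   show_logical_host "$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd"
```

The container presumably needs a restart before a changed value shows up
in newly created EPRs.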
Charles