I made the change using VDT's vdt-local-setup.sh, which I know doesn't
get modified, and now the EPR shows the right IP in it, and the example
you gave works. But my initial example still doesn't.
bash-3.00$ globusrun-ws -submit -batch -o foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor -c /usr/bin/id
Submitting job...Done.
Job ID: uuid:decb6502-438f-11dd-9611-001422086c92
Termination time: 06/27/2008 14:55 GMT
bash-3.00$ more foo.epr
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<ns00:Address>https://131.225.166.2:9443/wsrf/services/ManagedExecutableJobService</ns00:Address>
<ns00:ReferenceProperties>
<ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">df3b1b40-438f-11dd-88db-cf7a593808fb</ResourceID>
</ns00:ReferenceProperties>
<wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns00:EndpointReferenceType>
bash-3.00$ globusrun-ws -monitor -j foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor
Current job state: Done
Requesting original job description...Done.
Destroying job...Done.
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:fa78cea2-438f-11dd-a905-001422086c92
Termination time: 06/27/2008 14:55 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as
/DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to
authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:355cc8b6-4390-11dd-a249-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as
/DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to
authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:3a4ee764-4390-11dd-bb28-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as
/DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov.Expected the peer to
authenticate as /CN=host/fnpc3x1.fnal.gov
Any idea what else we might have to fix?
Steve Timm
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
[EMAIL PROTECTED] http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
On Thu, 26 Jun 2008, Charles Bacon wrote:
On Jun 26, 2008, at 9:09 AM, Steven Timm wrote:
On Thu, 26 Jun 2008, Charles Bacon wrote:
As an experiment, can you tell me what happens if you run the job in two
parts:
First, try -submit -batch -o foo.epr
Check what hostname/IP shows up in the EPR as the endpoint of the service.
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<ns00:Address>https://131.225.167.18:9443/wsrf/services/ManagedExecutableJobService</ns00:Address>
<ns00:ReferenceProperties>
<ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">da7e0c90-4388-11dd-96e1-d1739b31397d</ResourceID>
</ns00:ReferenceProperties>
<wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns00:EndpointReferenceType>
That's the wrong IP; it should be the other one.
Okay. So, that's going to be the difference between globus-job-run and
globusrun-ws. The globusrun-ws client is getting back an address from the
container that it will use to get further updates. The (submit/batch) part
of the job is using the address you hand-supplied on the command line, so it's
working. The (monitor) part of the client is failing because the service is
returning a bad address.
The fix is to get the container to bind to the right address, which you can
do with GLOBUS_HOSTNAME.
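
The EPR check described above can be scripted. The following is a minimal sketch, not part of the Globus tools: epr_host is a hypothetical helper, and it assumes the saved EPR contains a single Address element of the form https://HOST:PORT/... as in the EPR quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the host part of the <Address> element
# found in a saved EPR file (assumes an https://HOST:PORT/... address).
epr_host() {
    # Join the file into one line first so a wrapped or pretty-printed
    # EPR is handled the same way as a single-line one.
    tr -d '\n' < "$1" | sed -n 's|.*Address>https://\([^:/]*\)[:/].*|\1|p'
}

# Example use after a batch submit (commands as used in this thread):
#   globusrun-ws -submit -batch -o foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor -c /usr/bin/id
#   epr_host foo.epr
```

If the printed host is not the address the container is supposed to be reachable on, the monitor step will be sent to the wrong place even though the submit step succeeded.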
As far as I can tell, GLOBUS_HOSTNAME is not set in the environment
of the container. What's the best way to set it in a VDT environment?
I did set GLOBUS_HOSTNAME before I installed the VDT, to fnpcosg1.
I am now running the container in full-out debug mode so if there
are any logs you need to see, let me know.
It's starting globus-start-container out of /etc/init.d/globus-ws. It looks
like it sources both setup.sh and vdt/etc/globus-options.sh.
globus-options.sh looks like it is intended to set up the JVM options used by
the container. If I were going to set GLOBUS_HOSTNAME, based on what I've
seen I'd put it in the init.d script, or the globus-options.sh file. I'm not
sure if those two are vulnerable to being overwritten during a pacman update
or by a vdt-control on/off.
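
As a rough illustration of the environment-variable approach, the lines below could go in whichever of those files is sourced before globus-start-container runs. This is a sketch under assumptions: the hostname is the one used in this thread, and whether a given file survives updates has not been verified here.

```shell
# Hypothetical addition to a script sourced before the container starts
# (for example vdt/etc/globus-options.sh or the init.d script).
# The value is the gatekeeper name used in this thread; substitute your own.
GLOBUS_HOSTNAME=fnpcosg1.fnal.gov
export GLOBUS_HOSTNAME
```

The container has to be restarted for the new value to take effect, since the variable is only read from the environment of the process that starts it.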
The other place you can fix it that's not VDT-specific is under
$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd. The options are
described at
http://www.globus.org/toolkit/docs/4.0/common/javawscore/admin-index.html#id2531913.
Basically, adding a <parameter name="logicalHost"
value="the.right.ip.address"/> element to the globalConfiguration section is
equivalent to setting your GLOBUS_HOSTNAME to that IP address.
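
To confirm whether a logicalHost parameter is already present before editing the file, a small check like the following may help. It is only a sketch: show_logical_host is a hypothetical helper name, and it does a plain text match rather than parsing the WSDD.

```shell
#!/bin/sh
# Hypothetical helper: print any logicalHost parameter lines found in a
# WSDD file. The exit status is non-zero when no such line is present.
show_logical_host() {
    grep 'name="logicalHost"' "$1"
}

# Example use against the file named above:
#   show_logical_host "$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd"
```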
Charles