I made the change using VDT's vdt-local-setup.sh, which I know doesn't
get modified, and now the EPR shows the right IP in it, and the example
you gave works. But my initial example still doesn't.
bash-3.00$ globusrun-ws -submit -batch -o foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor -c /usr/bin/id
Submitting job...Done.
Job ID: uuid:decb6502-438f-11dd-9611-001422086c92
Termination time: 06/27/2008 14:55 GMT
bash-3.00$ more foo.epr
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<ns00:Address>https://131.225.166.2:9443/wsrf/services/ManagedExecutableJobService</ns00:Address>
<ns00:ReferenceProperties>
<ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">df3b1b40-438f-11dd-88db-cf7a593808fb</ResourceID>
</ns00:ReferenceProperties>
<wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns00:EndpointReferenceType>
bash-3.00$ globusrun-ws -monitor -j foo.epr -F fnpcosg1.fnal.gov:9443 -Ft Condor
Current job state: Done
Requesting original job description...Done.
Destroying job...Done.
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:fa78cea2-438f-11dd-a905-001422086c92
Termination time: 06/27/2008 14:55 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov. Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -J -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:355cc8b6-4390-11dd-a249-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov. Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
bash-3.00$ globusrun-ws -submit -F fnpcosg1.fnal.gov:9443 -Ft Condor -s -c /usr/bin/id
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:3a4ee764-4390-11dd-bb28-001422086c92
Termination time: 06/27/2008 14:57 GMT
globusrun-ws:
globus_service_engine.c:globus_l_service_engine_session_started_callback:2744:
Session failed to start
globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1335:
The peer authenticated as /DC=org/DC=doegrids/OU=Services/CN=fnpcosg1.fnal.gov. Expected the peer to authenticate as /CN=host/fnpc3x1.fnal.gov
Any idea what else we might have to fix?
Steve Timm
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
[EMAIL PROTECTED] http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant
Group Leader.
On Thu, 26 Jun 2008, Charles Bacon wrote:
On Jun 26, 2008, at 9:09 AM, Steven Timm wrote:
On Thu, 26 Jun 2008, Charles Bacon wrote:
As an experiment, can you tell me what happens if you run the
job in two parts:
First, try -submit -batch -o foo.epr
Check what hostname/IP shows up in the EPR as the endpoint of
the service.
<ns00:EndpointReferenceType xmlns:ns00="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<ns00:Address>https://131.225.167.18:9443/wsrf/services/ManagedExecutableJobService</ns00:Address>
<ns00:ReferenceProperties>
<ResourceID xmlns="http://www.globus.org/namespaces/2004/10/gram/job">da7e0c90-4388-11dd-96e1-d1739b31397d</ResourceID>
</ns00:ReferenceProperties>
<wsa:ReferenceParameters xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"/>
</ns00:EndpointReferenceType>
That's the wrong IP; it should be the other one.
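For checking which hostname/IP ends up in the EPR without reading the raw XML, something like the following might help. It is only a sketch: epr_address and epr_host are hypothetical helper names, and it assumes the EPR holds a single Address element with an http or https URL, as in the foo.epr output above.

```shell
# Print the service address stored in a globusrun-ws EPR file.
# Assumption: the file contains one <...:Address>URL</...:Address> element.
epr_address() {
    # Join any wrapped lines first, then pull out the URL between the tags.
    tr -d '\n' < "$1" | sed -n 's/.*Address>\(https*:\/\/[^<]*\)<.*/\1/p'
}

# Print only the host (or IP) part of that address.
# Assumption: host is a name or IPv4 address, not a bracketed IPv6 literal.
epr_host() {
    epr_address "$1" | sed 's|^[a-z]*://||; s|[:/].*$||'
}
```

Running `epr_host foo.epr` after a `-submit -batch -o foo.epr` should then print just the address the container handed back, which can be compared with the one expected for fnpcosg1.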
Okay. So, that's going to be the difference between globus-job-run
and globusrun-ws. The globusrun-ws client is getting back an
address from the container that it will use to get further
updates. The (submit/batch) part of the job is using the address
you hand-supplied on the command line, so it's working. The
(monitor) part of the client is failing because the service is
returning a bad address.
The fix is to get the container to bind to the right address,
which you can do with GLOBUS_HOSTNAME.
As far as I can tell, GLOBUS_HOSTNAME is not set in the environment
of the container. What's the best way to set it in a VDT
environment?
I did set GLOBUS_HOSTNAME before I installed the VDT, to fnpcosg1.
I am now running the container in full-out debug mode so if there
are any logs you need to see, let me know.
It's starting globus-start-container out of /etc/init.d/globus-ws.
It looks like it sources both setup.sh and vdt/etc/globus-options.sh.
globus-options.sh looks like it is intended to set up the JVM options
used by the container. If I were going to set GLOBUS_HOSTNAME, based
on what I've seen I'd put it in the init.d script, or the
globus-options.sh file. I'm not sure if those two are vulnerable to
being overwritten during a pacman update or by a vdt-control on/off.
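As a rough sketch of what that could look like in whichever file gets sourced before globus-start-container runs: the value below is an assumption taken from the hostname used on the command lines in this thread, and should be whatever name or IP the container is meant to advertise.

```shell
# Hypothetical fragment for a file sourced by the container start-up script.
# Assumption: fnpcosg1.fnal.gov is the name the container should advertise.
GLOBUS_HOSTNAME=fnpcosg1.fnal.gov
export GLOBUS_HOSTNAME
```

The variable has to be exported, not just assigned, so that the JVM started by the script actually sees it in its environment.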
The other place you can fix it that's not VDT-specific is under
$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd. The
options are described at
http://www.globus.org/toolkit/docs/4.0/common/javawscore/admin-index.html#id2531913
Basically, adding <parameter name="logicalHost"
value="the.right.ip.address"/> to the globalConfiguration section
is equivalent to setting your GLOBUS_HOSTNAME to that IP address.
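To see whether a logicalHost value is already present in that file, a quick check along these lines may be enough. wsdd_logical_host is a hypothetical helper, and it assumes the parameter element sits on a single line with name written before value.

```shell
# Print the value of the logicalHost parameter in a WSDD file, if any.
# Assumption: <parameter name="logicalHost" value="..."/> is on one line.
wsdd_logical_host() {
    sed -n 's/.*<parameter[^>]*name="logicalHost"[^>]*value="\([^"]*\)".*/\1/p' "$1"
}
```

For example, `wsdd_logical_host "$GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd"` prints nothing when the parameter has not been set.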
Charles