Victor,

You can't upgrade to WS-GRAM 4.2 and remain compatible with the
rest of the TeraGrid: 4.0 and 4.2 clients and servers don't
interoperate.

JP
On Aug 31, 2009, at 9:51 AM, Martin Feller wrote:

Hi,

The problem you describe, which is summarized in the bug you mention,
is an architectural problem in WS-GRAM 4.0.
We fixed it in the 4.2 branch, but the fix required an interface change,
which is why we can't backport it to the 4.0 branch.
If you can upgrade to the 4.2 series, I'd recommend that.

With 4.0.x, the only workaround at present is:
1. Stop the container.
2. Delete the problematic job from the persistence directory (by default
  ~/.globus of the user who runs the container).
  In your case, remove the file
~containeruser/.globus/<hostname>-<port>/ManagedExecutableJobResourceStateType/1748b3d0-8c4b-11de-8543-b8f655c16264.xml
3. Restart the container.
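
Step 2 above can be sketched in Python. This is only an illustration under assumptions: the directory layout is taken from the path quoted in step 2, the function names `find_persisted_job` and `quarantine_job` are hypothetical and not part of the Globus Toolkit, and the sketch moves the file to a backup directory instead of deleting it so the change can be undone. It must only be run while the container is stopped.

```python
import shutil
from pathlib import Path

# Directory name taken from the path quoted in step 2.
STATE_DIR = "ManagedExecutableJobResourceStateType"


def find_persisted_job(persistence_root, job_id):
    """Return every persisted state file for job_id under persistence_root."""
    root = Path(persistence_root)
    # Assumed layout: <root>/<hostname>-<port>/<STATE_DIR>/<job_id>.xml
    return sorted(root.glob(f"*/{STATE_DIR}/{job_id}.xml"))


def quarantine_job(persistence_root, job_id, backup_dir):
    """Move the job's state files into backup_dir and return the new paths.

    Run this only after the container has been stopped (step 1).
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in find_persisted_job(persistence_root, job_id):
        # Prefix with the <hostname>-<port> directory name so that files
        # from different containers cannot overwrite each other.
        target = backup / f"{path.parent.parent.name}-{path.name}"
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

For the job in this thread, the call would look like `quarantine_job(Path.home() / ".globus", "1748b3d0-8c4b-11de-8543-b8f655c16264", "/tmp/gram-backup")`, run as the container user; the backup location is an arbitrary choice.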

-Martin

Hazlewood, Victor Gene wrote:
Hey GTers,

Running WSRF v 4.0.8-r2 on a Cray XT5. We have a user job that appears to
have gone into an unresolvable state, and the log file is filling up
with messages about not being able to resolve the FailureFileCleanUp
state. Does anyone have suggestions for how to get rid of this? I have
looked at the documentation (nothing I found covers this) and at
bugzilla (http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is
close, but it says the problem will be fixed in a future release and gives
no instructions for resolving it now). I'm running out of ideas.



The recurring messages are



2009-08-29 12:40:02,267 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,getInternalState:1666] getting resource datum internalState

2009-08-29 12:40:02,267 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:285] Waiting to be Done or Failed. Current state:
FailureFileCleanUp



Any help on how to resolve this would be appreciated (besides the "it is
fixed in the next release" type of resolution).



Below are the complete job entries for the job.



-Victor





Victor Hazlewood, CISSP

Senior HPC Systems Analyst

National Institute for Computational Science

University of Tennessee

http://www.nics.tennessee.edu/ <http://www.nics.utk.edu/>





Complete log file entry:



2009-08-28 20:13:32,174 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:142] Entering initialize()

2009-08-28 20:13:32,175 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:147] at super.initialize()

2009-08-28 20:13:32,180 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:153] at initSecurity()

2009-08-28 20:13:32,180 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:316] Entering initSecurity()

2009-08-28 20:13:32,182 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:338] resource credential subject:

2009-08-28 20:13:32,183 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:346] setting resource securty grid map...

2009-08-28 20:13:32,183 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:356] Leaving initSecurity()

2009-08-28 20:13:32,186 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initVariableMap:704]
GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch

2009-08-28 20:13:32,370 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1290] resolving variables in attribute
environment

2009-08-28 20:13:32,370 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_USER_HOME}

2009-08-28 20:13:32,370 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0

2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference

2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}

2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value
/nics/c/home/turuncu

2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1392] Final string is
/nics/c/home/turuncu

2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1290] resolving variables in attribute
environment

2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_USER_NAME}

2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0

2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference

2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_NAME in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}

2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_NAME to value
turuncu

2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1392] Final string is turuncu

2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1290] resolving variables in attribute
environment

2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_SCRATCH_DIR}

2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0

2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference

2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_SCRATCH_DIR in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}

2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_SCRATCH_DIR to
value ${GLOBUS_USER_HOME}/.globus/scratch

2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_USER_HOME}/.globus/scratch

2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0

2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference

2009-08-28 20:13:32,376 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}

2009-08-28 20:13:32,376 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value
/nics/c/home/turuncu

2009-08-28 20:13:32,376 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1392] Final string is
/nics/c/home/turuncu/.globus/scratch

2009-08-28 20:13:32,377 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initExtraPerlAttributes:588] Adding extra attributes to the
Perl job attribute map

2009-08-28 20:13:32,377 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initExtraPerlAttributes:615] checking for condorness of PBS

2009-08-28 20:13:32,421 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:171] Perl Job Description: $description = {

   jobdir => [
'/nics/c/home/turuncu/.globus/1748b3d0-8c4b-11de-8543-b8f655c16264' ],

   environment => [ [ 'GLOBUS_LOCATION',
'/usr/local/globus-wsrf-4.0.8-r2' ], [ 'X509_CERT_DIR',
'/etc/grid-security/certificates' ], [ 'X509_USER_PROXY', '' ], [
'X509_USER_CERT', '' ], [ 'X509_USER_KEY', '' ], [ 'HOME',
'/nics/c/home/turuncu' ], [ 'LOGNAME', 'turuncu' ], [
'SCRATCH_DIRECTORY', '/nics/c/home/turuncu/.globus/scratch' ], [
'JAVA_HOME', '/opt/java/jdk1.6.0_05/jre' ], [ 'GLOBUS_GRAM_JOB_HANDLE',
'https://grid.nics.utk.edu:4321/wsrf/services/ManagedExecutableJobService?1748b3d0-8c4b-11de-8543-b8f655c16264' ],  ],

2009-08-28 20:13:32,421 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:178] Leaving initialize()

2009-08-28 20:13:32,429 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,getInternalState:1666] getting resource datum internalState

2009-08-28 20:13:32,429 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:275] Remove called with external state Done and
internal state FailureFileCleanUp

2009-08-28 20:13:32,429 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:285] Waiting to be Done or Failed. Current state:
FailureFileCleanUp

2009-08-28 20:13:34,432 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,getInternalState:1666] getting resource datum internalState

2009-08-28 20:13:34,432 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:285] Waiting to be Done or Failed. Current state:
FailureFileCleanUp



(last two messages repeated 29536 times)
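
To find out which job IDs are looping like this before cleaning up the persistence directory, the container log can be summarized with a short script. The following is a minimal sketch, assuming the log entries have the shape shown above (job ID after `ManagedExecutableJobResource.`, then a bracketed thread/method tag, then the "Waiting to be Done or Failed" text). The function name `count_stuck_jobs` is hypothetical.

```python
import re
from collections import Counter

# Matches one "Waiting to be Done or Failed" entry after whitespace has been
# collapsed; the pattern is modeled on the log excerpt in this message.
WAITING = re.compile(
    r"ManagedExecutableJobResource\.(?P<job>[0-9a-f-]{36})\s+"
    r"\[[^\]]*\]\s+Waiting to be Done or Failed\. Current state:\s+(?P<state>\w+)"
)


def count_stuck_jobs(log_text):
    """Count waiting messages per (job id, internal state) pair."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter((m["job"], m["state"]) for m in WAITING.finditer(flat))
```

Calling `count_stuck_jobs(open("container.log").read()).most_common()` (the file name is an assumption) lists the job IDs with the most repeats first, together with the internal state each one is stuck in.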







