Hey GTers,
I'm running WSRF v4.0.8-r2 on a Cray XT5. A user job looks like it has
gone into an unresolvable state, and the log file is filling up with
messages about not being able to resolve the FailureFileCleanUp state.
Does anyone have suggestions for how to get rid of this? I have looked
at the documentation (nothing I found covers this) and at Bugzilla
(http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is close, but
it only says the problem will be fixed in a future release and gives no
instructions for resolving it now). I'm running out of ideas.
The recurring messages are:
2009-08-29 12:40:02,267 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,getInternalState:1666] getting resource datum internalState
2009-08-29 12:40:02,267 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:285] Waiting to be Done or Failed. Current state:
FailureFileCleanUp
Any help on how to resolve this would be appreciated (besides the
"it is fixed in the next release" type of resolution).
The complete log entries for the job are below.
-Victor
Victor Hazlewood, CISSP
Senior HPC Systems Analyst
National Institute for Computational Science
University of Tennessee
http://www.nics.tennessee.edu/ <http://www.nics.utk.edu/>
Complete log file entry:
2009-08-28 20:13:32,174 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:142] Entering initialize()
2009-08-28 20:13:32,175 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:147] at super.initialize()
2009-08-28 20:13:32,180 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:153] at initSecurity()
2009-08-28 20:13:32,180 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:316] Entering initSecurity()
2009-08-28 20:13:32,182 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:338] resource credential subject:
2009-08-28 20:13:32,183 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:346] setting resource securty grid map...
2009-08-28 20:13:32,183 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initSecurity:356] Leaving initSecurity()
2009-08-28 20:13:32,186 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initVariableMap:704]
GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch
2009-08-28 20:13:32,370 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1290] resolving variables in
attribute
environment
2009-08-28 20:13:32,370 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_USER_HOME}
2009-08-28 20:13:32,370 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference
2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME
in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to
value
/nics/c/home/turuncu
2009-08-28 20:13:32,371 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1392] Final string is
/nics/c/home/turuncu
2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1290] resolving variables in
attribute
environment
2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_USER_NAME}
2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,372 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference
2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_NAME
in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_NAME to
value
turuncu
2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1392] Final string is turuncu
2009-08-28 20:13:32,373 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1290] resolving variables in
attribute
environment
2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_SCRATCH_DIR}
2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference
2009-08-28 20:13:32,374 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up
GLOBUS_SCRATCH_DIR in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_SCRATCH_DIR to
value ${GLOBUS_USER_HOME}/.globus/scratch
2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1295] looking at string
${GLOBUS_USER_HOME}/.globus/scratch
2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,375 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1302] found '{'---looks like a
reference
2009-08-28 20:13:32,376 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME
in
{GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch,
GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2,
GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264,
GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,376 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to
value
/nics/c/home/turuncu
2009-08-28 20:13:32,376 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,resolveVariableInString:1392] Final string is
/nics/c/home/turuncu/.globus/scratch
2009-08-28 20:13:32,377 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initExtraPerlAttributes:588] Adding extra attributes to the
Perl job attribute map
2009-08-28 20:13:32,377 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initExtraPerlAttributes:615] checking for condorness of PBS
2009-08-28 20:13:32,421 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:171] Perl Job Description: $description = {
jobdir => [ '/nics/c/home/turuncu/.globus/1748b3d0-8c4b-11de-8543-b8f655c16264' ],
environment => [
[ 'GLOBUS_LOCATION', '/usr/local/globus-wsrf-4.0.8-r2' ],
[ 'X509_CERT_DIR', '/etc/grid-security/certificates' ],
[ 'X509_USER_PROXY', '' ],
[ 'X509_USER_CERT', '' ],
[ 'X509_USER_KEY', '' ],
[ 'HOME', '/nics/c/home/turuncu' ],
[ 'LOGNAME', 'turuncu' ],
[ 'SCRATCH_DIRECTORY', '/nics/c/home/turuncu/.globus/scratch' ],
[ 'JAVA_HOME', '/opt/java/jdk1.6.0_05/jre' ],
[ 'GLOBUS_GRAM_JOB_HANDLE', 'https://grid.nics.utk.edu:4321/wsrf/services/ManagedExecutableJobService?1748b3d0-8c4b-11de-8543-b8f655c16264' ], ],
2009-08-28 20:13:32,421 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,initialize:178] Leaving initialize()
2009-08-28 20:13:32,429 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,getInternalState:1666] getting resource datum internalState
2009-08-28 20:13:32,429 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:275] Remove called with external state Done and
internal state FailureFileCleanUp
2009-08-28 20:13:32,429 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:285] Waiting to be Done or Failed. Current state:
FailureFileCleanUp
2009-08-28 20:13:34,432 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,getInternalState:1666] getting resource datum internalState
2009-08-28 20:13:34,432 DEBUG
ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264
[Thread-7,remove:285] Waiting to be Done or Failed. Current state:
FailureFileCleanUp
(last two messages repeated 29536 times)
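
In case it helps anyone triaging a similar log, here is a minimal Python sketch for counting how often each job resource is reported as waiting in a given internal state. It is not part of Globus; the function name count_stuck_polls and the regular expression are my own assumptions, based only on the message format shown above. It collapses whitespace first because the entries may be wrapped across several lines.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt above (an assumption, not a Globus API):
# resource ID, then the remove() call site, then the wait message and state.
WAIT_RE = re.compile(
    r"ManagedExecutableJobResource\.([0-9a-f-]{36})\s+"
    r"\[[^\]]*remove:\d+\]\s+"
    r"Waiting to be Done or Failed\. Current state:\s+(\w+)"
)


def count_stuck_polls(log_text):
    """Return a Counter keyed by (job ID, internal state).

    Each count is the number of 'Waiting to be Done or Failed' messages
    found for that job resource in that state.
    """
    # Log entries may be wrapped over several lines, so collapse all
    # whitespace runs into single spaces before matching.
    normalized = " ".join(log_text.split())
    return Counter(WAIT_RE.findall(normalized))
```

Passing the text of the container log to count_stuck_polls gives one entry per (job ID, state) pair, which makes it easy to see which resources are looping and how often.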