Hey GTers,
We are running WSRF v4.0.8-r2 on a Cray XT5. A user job appears to have gone into an unresolvable state, and the log file is filling up with messages about not being able to resolve the FailureFileCleanUp state. Does anyone have suggestions for how to get rid of this?

I have looked at the documentation (nothing I found covers this) and at Bugzilla (http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is close, but it says only that the problem will be fixed in a future release and gives no instructions for resolving it now). I'm running out of ideas.

The recurring messages are:

2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp

Any help on how to resolve this would be appreciated (other than an "it is fixed in the next release" type of resolution). The complete log entries for the job are below.
-Victor

Victor Hazlewood, CISSP
Senior HPC Systems Analyst
National Institute for Computational Science
University of Tennessee
http://www.nics.tennessee.edu/ <http://www.nics.utk.edu/>

Complete log file entry:

2009-08-28 20:13:32,174 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:142] Entering initialize()
2009-08-28 20:13:32,175 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:147] at super.initialize()
2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:153] at initSecurity()
2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:316] Entering initSecurity()
2009-08-28 20:13:32,182 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:338] resource credential subject:
2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:346] setting resource securty grid map...
2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:356] Leaving initSecurity()
2009-08-28 20:13:32,186 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initVariableMap:704] GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch
2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME}
2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu
2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu
2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_NAME}
2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_NAME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_NAME to value turuncu
2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is turuncu
2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_SCRATCH_DIR}
2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_SCRATCH_DIR in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_SCRATCH_DIR to value ${GLOBUS_USER_HOME}/.globus/scratch
2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME}/.globus/scratch
2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu
2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu/.globus/scratch
2009-08-28 20:13:32,377 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initExtraPerlAttributes:588] Adding extra attributes to the Perl job attribute map
2009-08-28 20:13:32,377 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initExtraPerlAttributes:615] checking for condorness of PBS
2009-08-28 20:13:32,421 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:171] Perl Job Description: $description = {
    jobdir => [ '/nics/c/home/turuncu/.globus/1748b3d0-8c4b-11de-8543-b8f655c16264' ],
    environment => [
        [ 'GLOBUS_LOCATION', '/usr/local/globus-wsrf-4.0.8-r2' ],
        [ 'X509_CERT_DIR', '/etc/grid-security/certificates' ],
        [ 'X509_USER_PROXY', '' ],
        [ 'X509_USER_CERT', '' ],
        [ 'X509_USER_KEY', '' ],
        [ 'HOME', '/nics/c/home/turuncu' ],
        [ 'LOGNAME', 'turuncu' ],
        [ 'SCRATCH_DIRECTORY', '/nics/c/home/turuncu/.globus/scratch' ],
        [ 'JAVA_HOME', '/opt/java/jdk1.6.0_05/jre' ],
        [ 'GLOBUS_GRAM_JOB_HANDLE', 'https://grid.nics.utk.edu:4321/wsrf/services/ManagedExecutableJobService?1748b3d0-8c4b-11de-8543-b8f655c16264' ],
    ],
2009-08-28 20:13:32,421 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:178] Leaving initialize()
2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:275] Remove called with external state Done and internal state FailureFileCleanUp
2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
2009-08-28 20:13:34,432 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
2009-08-28 20:13:34,432 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
(last two messages repeated 29536 times)
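
For anyone who wants to see which job resources are flooding a log like this, here is a minimal Python sketch that collapses identical entries and reports how often each one occurs, with the first and last timestamps. It is not part of the toolkit. The names `ENTRY`, `SAMPLE` and `summarize` are made up for illustration, and the regular expression assumes each entry sits on one line in the layout quoted above (timestamp, level, resource name, [thread,method:line], message).

```python
import re
from collections import Counter

# Assumed layout, taken from the entries quoted in this message:
#   <date> <time>,<ms> <LEVEL> <Class>.<job id> [<thread>,<method>:<line>] <message>
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +"
    r"(?P<resource>\S+) "
    r"\[(?P<thread>[^,\]]+),(?P<method>\w+):(?P<line>\d+)\] "
    r"(?P<msg>.*)$"
)


def summarize(lines):
    """Group entries by (resource, method, message).

    Returns a list of (count, first_timestamp, last_timestamp, key) tuples,
    most frequent first. Lines that do not match ENTRY are skipped.
    """
    counts = Counter()
    first, last = {}, {}
    for raw in lines:
        match = ENTRY.match(raw.rstrip("\n"))
        if match is None:
            continue  # e.g. continuation lines of a multi-line message
        key = (match["resource"], match["method"], match["msg"])
        counts[key] += 1
        first.setdefault(key, match["ts"])  # keep the earliest sighting
        last[key] = match["ts"]             # overwrite with the latest
    return [(n, first[key], last[key], key) for key, n in counts.most_common()]


# A few entries copied from the log above, used here as sample input.
_RES = "ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264"
SAMPLE = [
    f"2009-08-28 20:13:32,429 DEBUG {_RES} [Thread-7,getInternalState:1666] "
    "getting resource datum internalState",
    f"2009-08-28 20:13:32,429 DEBUG {_RES} [Thread-7,remove:275] "
    "Remove called with external state Done and internal state FailureFileCleanUp",
    f"2009-08-28 20:13:32,429 DEBUG {_RES} [Thread-7,remove:285] "
    "Waiting to be Done or Failed. Current state: FailureFileCleanUp",
    f"2009-08-28 20:13:34,432 DEBUG {_RES} [Thread-7,getInternalState:1666] "
    "getting resource datum internalState",
    f"2009-08-28 20:13:34,432 DEBUG {_RES} [Thread-7,remove:285] "
    "Waiting to be Done or Failed. Current state: FailureFileCleanUp",
]

for count, first_ts, last_ts, (resource, method, msg) in summarize(SAMPLE):
    print(f"{count:6d}  {first_ts} .. {last_ts}  {resource} {method}: {msg}")
```

To run it against a real log, pass an open file object to `summarize` in place of `SAMPLE`. It only reads and counts entries, so it does nothing to the stuck job itself.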