Please try the following:
1. While the job is hanging:
Submit a job in batch mode (globusrun-ws -submit -b -o job.epr ...)
and query the job status instead of listening for notifications
(globusrun-ws -status -j job.epr).
Does the job status change after a while? (I don't expect it to, but just to
make sure.)
2. Shut down the container, enable debug logging in GRAM4
(uncomment # log4j.category.org.globus.exec.service=DEBUG in
$GLOBUS_LOCATION/container-log4j.properties), clean up the persistence
directory, move the problematic persisted job file back into the persistence
directory, start the container, and submit a job.
Then please send the container log file.
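The two steps above could be scripted roughly as follows. This is only a sketch: the helper names poll_job_status and enable_gram_debug, the GLOBUSRUN_WS override variable, and the poll counts are made up for illustration. The only globusrun-ws invocation it uses is the -status -j form mentioned above, and the sed pattern assumes the log4j line appears in the properties file exactly as quoted above.

```shell
#!/bin/sh
# Sketch only: helper names, variables, and defaults are illustrative.

# Command used to query GRAM; overridable so the loop can be dry-run.
GLOBUSRUN_WS="${GLOBUSRUN_WS:-globusrun-ws}"

# Step 1: poll the status of a batch-submitted job instead of waiting for
# notifications. Arguments: EPR file, number of polls, seconds between polls.
poll_job_status() {
    epr="$1"
    polls="${2:-5}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$polls" ]; do
        printf 'poll %s: ' "$i"
        "$GLOBUSRUN_WS" -status -j "$epr"
        i=$((i + 1))
        # Sleep only between polls, not after the last one.
        if [ "$i" -le "$polls" ]; then
            sleep "$delay"
        fi
    done
    return 0
}

# Step 2: uncomment the GRAM4 debug category in a log4j properties file.
# A backup copy of the original is kept next to it with a .bak suffix.
enable_gram_debug() {
    props="$1"
    sed -i.bak \
        's/^#[[:space:]]*\(log4j\.category\.org\.globus\.exec\.service=DEBUG\)/\1/' \
        "$props"
}

# Example use (run by hand, container stopped for the second call):
#   poll_job_status job.epr 6 30
#   enable_gram_debug "$GLOBUS_LOCATION/container-log4j.properties"
```

If the reported state is still Unsubmitted after several polls, that would suggest the hang is not just a notification delivery issue.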
Thanks, Martin
Yuriy wrote:
Hi,
I am having a very strange problem with Globus GRAM.
Submitting a job with globusrun-ws hangs in the "Unsubmitted" job
state. I tried submitting the job from two different machines, with the
same result.
globusrun-ws -submit -J -S -F ng2.auckland.ac.nz:8443 -Ft Fork -o test.epr -c \
/bin/echo "hello"
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:6eeadb2c-6ffa-11dd-a2f7-00163e000005
Termination time: 08/23/2008 03:28 GMT
Current job state: Unsubmitted
A sample Java program (attached) and the CoG client
(cog-job-submit) work normally.
Restarting Globus does not help unless I remove the persisted
directory. The persisted directory is on a local partition. I found that a
single file in ManagedExecutableJobResourceStateType causes the problem (XML
attached). When I remove this file and restart Globus, globusrun-ws
works normally. When I copy this file back into
persisted/ManagedExecutableJobResourceState and restart Globus, it
breaks again. My Globus installation breaks every 3-7 days, so there must be
other job resources that cause this problem.
The Globus version is 4.0.7 from VDT 1.10.
What is going on here?
Regards,
Yuriy