On Oct 20, 2008, at 7:40 PM, John Sanabria wrote:
Hi,
I'm developing a platform for executing jobs using traditional
Globus commands such as 'globus-job-run' and 'globus-job-submit'.
When a user chooses asynchronous execution, the platform
periodically queries the remote resource for the job status
using the 'globus-job-status' command.
I'm executing tasks lasting more than 5 days.
I have noticed that roughly a day or less after I start the
execution, the 'globus-job-status' command returns 'DONE' even
though the job hasn't finished.
Is this normal behavior? I read the paper 'The GridWay Framework for
Adaptive Scheduling and Execution on Grids' and I found this:
"The job manager is probed periodically at each polling. If the job
manager does not respond, the GRAM gatekeeper is probed. If the
gatekeeper responds, a new job manager is started to resume watching
over the job. If the gatekeeper fails to respond..."
According to that, I think this behavior is not abnormal, but I
don't know how to query the GRAM gatekeeper or what message to send
it to request that it start a new job manager to watch a job.
I would appreciate any comments, advice, or pointers to documentation
on this topic.
Cheers,
I wonder if the proxy you have delegated to the GRAM is expiring after
a day? Are you creating a proxy with a long enough lifetime to last
for the whole job?
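
One way to check this from your platform, as a rough sketch only: the
Python below assumes 'grid-proxy-info -timeleft' prints the remaining
proxy lifetime in seconds and that 'grid-proxy-init -valid' takes an
H:M value. The function names and the 12-hour safety margin are made
up for illustration; they are not part of Globus.

```python
import subprocess


def proxy_seconds_left():
    """Ask grid-proxy-info how many seconds the current proxy has left.

    Assumes grid-proxy-info is on PATH and prints a bare integer.
    """
    result = subprocess.run(
        ["grid-proxy-info", "-timeleft"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def proxy_covers_job(seconds_left, job_days, margin_hours=12):
    """Return True if the proxy outlives the job plus a safety margin.

    job_days and margin_hours are illustrative parameters.
    """
    needed = job_days * 24 * 3600 + margin_hours * 3600
    return seconds_left >= needed


def valid_argument(job_days, margin_hours=12):
    """Build an H:M string suitable for 'grid-proxy-init -valid'."""
    hours = job_days * 24 + margin_hours
    return f"{hours}:00"


if __name__ == "__main__":
    job_days = 5  # expected run time of the job, from the original question
    left = proxy_seconds_left()
    if not proxy_covers_job(left, job_days):
        print("Proxy too short; consider: grid-proxy-init -valid",
              valid_argument(job_days))
```

If I remember correctly, grid-proxy-init creates a 12-hour proxy unless
told otherwise, which would fit a job that looks 'DONE' within a day.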
Joe