Hi,
I'm developing a platform for executing jobs using traditional Globus
commands such as 'globus-job-run' and 'globus-job-submit'. When a user
chooses asynchronous execution, the platform periodically queries the
remote resource for the job status using the 'globus-job-status'
command.
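
A minimal sketch of this kind of polling loop, in Python, may make the setup clearer. It is not the platform's actual code: the function names, the 300-second interval, and the choice of 'DONE' and 'FAILED' as stopping states are assumptions made for illustration. The only external call is 'globus-job-status' with the job contact string as its single argument.

```python
import subprocess
import time

# States after which polling stops. Treating 'DONE' and 'FAILED' as the
# terminal outputs of globus-job-status is an assumption of this sketch.
TERMINAL_STATES = {"DONE", "FAILED"}


def query_status(job_contact):
    """Run 'globus-job-status <job contact>' and return its trimmed output."""
    result = subprocess.run(
        ["globus-job-status", job_contact],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def poll_job(job_contact, interval_s=300, query=query_status, sleep=time.sleep):
    """Poll until a terminal state is reported.

    Returns a list of (timestamp, state) pairs, one per query, so the
    moment at which a state such as 'DONE' first appeared can be checked
    against the real run time of the job.
    """
    history = []
    while True:
        state = query(job_contact)
        history.append((time.time(), state))
        if state in TERMINAL_STATES:
            return history
        sleep(interval_s)
```

The 'query' and 'sleep' parameters are hypothetical hooks that allow the loop to be exercised without a Globus installation. Keeping the timestamped history makes it easy to report exactly when 'DONE' was first returned.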
I'm executing tasks lasting more than 5 days.
I have noticed that, approximately one day or less after I start the
execution, the 'globus-job-status' command returns 'DONE' even though
the job hasn't finished.
Is this normal behavior? I read the paper 'The GridWay Framework for
Adaptive Scheduling and Execution on Grids' and I found this:
"The job manager is probed periodically at each polling. If the job
manager does not respond, the GRAM gatekeeper is probed. If the
gatekeeper responds, a new job manager is started to resume watching
over the job. If the gatekeeper fails to respond..."
According to that, I think this behavior is not abnormal, but I don't
know how to query the GRAM gatekeeper or what message to send to ask
it to start a new job manager to watch a job.
I would appreciate any comments, advice, and pointers to documentation
on this topic.
Cheers,