By default, the files are stored in directories under the home dir of the account that starts the container.
e.g. ~/.globus/persisted/...

So there is no issue with the WS GRAM service being able to access the persisted job state files. The only issue here is the file system (e.g. NFS), and the default can be changed to ensure that it is writing to local disk to avoid any file system issues.

-Stu

On Jun 20, 2008, at 4:22 AM, [EMAIL PROTECTED] wrote:

On Jun 19, 2008, at 6:40 PM, [EMAIL PROTECTED] wrote:
Hi,
Thanks for your reply.

I think there are some places where each job's state information is
stored. But where and how can that information be stored or retrieved?


Yes.  At various times the job state information is written to and
read from flat files.  We're looking into alternatively using a
database to store the information.  Possibly coming sometime in the
4.2 series.


So, where is that information stored in Globus Toolkit 4.0.7? And how
does Globus manage that information to handle the failures that I
mentioned before?

Thanks

Tonny


Hi Tonny,

GRAM is fault tolerant, meaning that when/if the container or service
host crashes, the job details are not lost.  When the GRAM4 service is
restarted, then the processing/monitoring of the job resumes. GRAM2
requires user/client intervention to restart the processing of the job.

If the job included file stage-in directives and those had not
completed at the time of the crash, then GRAM will resume processing
the job from that job state and continue until the job has been fully
processed.

If the job had already been submitted to the local resource manager,
then GRAM will resume monitoring the job in the LRM and continue
processing the job to completion. GRAM persists the LRM job id. If
the crash included the LRM and the LRM is also fault tolerant and
resumes processing of the job, then the job will be completely
processed without requiring any client intervention.
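
To make the restart behaviour above concrete, here is a minimal sketch of flat-file persistence and recovery. It is only an illustration, not GRAM's actual code or file format: the JSON layout, the field names (phase, lrm_job_id) and all function names are assumptions made up for this example.

```python
import json
import os
import tempfile


def save_job_state(state_dir, job_id, state):
    """Atomically write one job's state to a flat file (hypothetical layout)."""
    os.makedirs(state_dir, exist_ok=True)
    final_path = os.path.join(state_dir, job_id + ".json")
    # Write to a temporary file first so a crash never leaves a half-written
    # state file under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, final_path)


def load_job_states(state_dir):
    """Read back every persisted job, as a restarted service might."""
    jobs = {}
    if not os.path.isdir(state_dir):
        return jobs
    for name in sorted(os.listdir(state_dir)):
        if name.endswith(".json"):
            with open(os.path.join(state_dir, name)) as handle:
                jobs[name[:-len(".json")]] = json.load(handle)
    return jobs


def resume_action(state):
    """Pick the recovery step for one job after a restart."""
    if state.get("lrm_job_id") is None:
        # Not yet handed to the local resource manager: redo the saved phase.
        return "resume " + state["phase"]
    # Already submitted: only monitoring needs to pick up again.
    return "monitor LRM job " + state["lrm_job_id"]
```

The point of the sketch is that the persisted LRM job id is what lets a restarted service go straight back to monitoring instead of resubmitting.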

A persistent connection between the GRAM client and service is not
maintained, so network failures between the client and service can be
overcome.

In GRAM4 (WS GRAM), an EPR is included in the reply to
createManagedJob. This allows the client to contact the service when
desired to get the current job status, cancel the job, subscribe for
notifications, ...

If the createManagedJob call is received by the GRAM service, but the
reply (containing the EPR) is not received by the client (possibly due
to network failure), then GRAM4 provides the means to subsequently get
the EPR in order to control the previously submitted job.
Details about that are here:
http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-submissionid
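
The general idea behind recovering from a lost reply can be sketched as below. This is a toy model, not the WS GRAM client API: it assumes the client picks a unique submission ID up front and that the service returns the same handle when it sees that ID again. ToyJobService, create_job and submit_with_retry are hypothetical names used only for illustration; the linked page describes the real mechanism.

```python
import uuid


class ToyJobService:
    """In-memory stand-in for a job service; not the WS GRAM API."""

    def __init__(self):
        self._jobs = {}

    def create_job(self, submission_id, description):
        # A repeated submission ID returns the existing handle instead of
        # creating a second job.
        if submission_id not in self._jobs:
            self._jobs[submission_id] = {
                "handle": "job-handle-" + submission_id,
                "description": description,
            }
        return self._jobs[submission_id]["handle"]

    def job_count(self):
        return len(self._jobs)


def submit_with_retry(service, description, drop_first_reply=False):
    """Submit once, and resend with the same ID if the reply was lost."""
    submission_id = str(uuid.uuid4())  # chosen by the client before sending
    handle = service.create_job(submission_id, description)
    if drop_first_reply:
        handle = None  # simulate the reply being lost on the network
    if handle is None:
        # Same ID, so the service hands back the handle of the original job.
        handle = service.create_job(submission_id, description)
    return submission_id, handle
```

Because the ID is generated before the first request is sent, the client still holds it even if the reply never arrives, and resending cannot start the job twice.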

Lemme know if you have any more questions on this.

Regards,
-Stu

On Jun 19, 2008, at 10:00 AM, [EMAIL PROTECTED] wrote:


Hi,

I don't quite understand how GT4 manages a submitted job when failures
happen, for example a lost connection with the client caused by a
temporary network failure, or lost contact with the LRM caused by
Globus being restarted during job execution.

Does anybody know about this?

Regards

Tonny









