> On Jun 19, 2008, at Jun 19, 6:40 PM, [EMAIL PROTECTED] wrote:
>> Hi,
>> Thanks for your reply.
>>
>> I think there are some places where the state information for each job
>> is stored. But where and how is that information stored and retrieved?
>
> Yes. At various times the job state information is written to and
> read from flat files. We're looking into alternatively using a
> database to store the information. Possibly coming sometime in the
> 4.2 series.
>

So, where is that information stored in Globus Toolkit 4.0.7? And how does
Globus manage that information to prevent the failures that I
mentioned before?

Thanks,
Tonny

>>
>>> Hi Tonny,
>>>
>>> GRAM is fault tolerant, meaning that when/if the container or service
>>> host crashes, the job details are not lost. When the GRAM4 service is
>>> restarted, then the processing/monitoring of the job resumes. GRAM2
>>> requires user/client intervention to restart the processing of the
>>> job.
>>>
>>> If the job included file stage-in directives and those had not
>>> completed at the time of the crash, then GRAM will resume processing
>>> the job from that job state and continue until the job has been fully
>>> processed.
>>>
>>> If the job had already been submitted to the local resource manager,
>>> then GRAM will resume monitoring the job in the LRM and continue
>>> processing the job to completion. GRAM persists the LRM job id. If
>>> the crash included the LRM and the LRM is also fault tolerant and
>>> resumes processing of the job, then the job will be completely
>>> processed without requiring any client intervention.
>>>
>>> A persistent connection between the GRAM client and service is not
>>> maintained, so network failures between the client and service can be
>>> overcome.
>>>
>>> In GRAM4 (WS GRAM), an EPR is included in the reply to
>>> createManagedJob. This allows the client to contact the service when
>>> desired to get the current job status, cancel the job, subscribe for
>>> notifications, ...
>>>
>>> If the createManagedJob call is received by the GRAM service, but the
>>> reply (containing the EPR) is not received by the client (possibly due
>>> to network failure), then GRAM4 provides the means to subsequently get
>>> the EPR in order to control the previously submitted job.
>>> Details about that are here:
>>> http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-submissionid
>>>
>>> Lemme know if you have any more questions on this.
>>>
>>> Regards,
>>> -Stu
>>>
>>> On Jun 19, 2008, at Jun 19, 10:00 AM, [EMAIL PROTECTED] wrote:
>>>
>>>> Hi,
>>>>
>>>> I don't quite understand how GT4 manages a submitted job when
>>>> some failure happens, for example a lost connection with the client
>>>> caused by a temporary network failure, or lost contact with the LRM
>>>> caused by Globus being restarted during job execution.
>>>>
>>>> Does anybody know about this?
>>>>
>>>> Regards,
>>>>
>>>> Tonny
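
To make the ideas in this thread concrete, here is a minimal Python sketch of the two mechanisms Stu describes: job state persisted to a flat file so it survives a service restart, and a client-chosen submission ID that lets the client get the job handle back if the first reply was lost. This is not GRAM code and not GRAM's on-disk format. The class JobStore, its methods, the JSON layout, and the state names are all assumptions made up for illustration.

```python
import json
import os
import tempfile
import uuid

# Illustrative terminal states (an assumption, not taken from GRAM source).
TERMINAL = {"Done", "Failed"}


class JobStore:
    """Hypothetical flat-file job store; NOT the real GRAM implementation."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        # On (re)start, reload whatever was persisted before the crash.
        if os.path.exists(path):
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        # Write to a temp file and rename it into place, so a crash in the
        # middle of a write never leaves a half-written state file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def submit(self, submission_id, executable):
        # Idempotent: repeating a submission ID returns the existing handle
        # instead of creating a second job (stand-in for re-fetching the EPR).
        if submission_id not in self.jobs:
            self.jobs[submission_id] = {
                "handle": uuid.uuid4().hex,
                "executable": executable,
                "state": "StageIn",
                "lrm_job_id": None,
            }
            self._save()
        return self.jobs[submission_id]["handle"]

    def update(self, submission_id, **fields):
        # Persist every state change, including the LRM job id.
        self.jobs[submission_id].update(fields)
        self._save()

    def unfinished(self):
        # Jobs a restarted service would pick up and keep processing.
        return {
            sid: job
            for sid, job in self.jobs.items()
            if job["state"] not in TERMINAL
        }
```

In this sketch, a restarted service would construct a new JobStore on the same path and call unfinished() to find jobs to resume, using the stored lrm_job_id to go back to monitoring them in the LRM. A client whose reply was lost would simply call submit() again with the same submission ID.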
