It would be advantageous for the task (or job) state to capture the information that actually comes from the machine (completed, cancelled, failed, etc.), and for the experiment state to be set to canceled by Airavata. That is, some part of Airavata should record machine-specific state information about the job for logging/auditing purposes.

* Airavata issues a "cancel" command to a job in the "launched" or "executing" state.

* Airavata confirms that the job has left the queue or is no longer executing. The check itself could be machine-specific, but the main question is "has the job left the queue?" or "is the job no longer in the executing state?" I don't think it is "if this is trestles, and since we issued a qdel command, is the job marked as completed; or if this is stampede, is the job now marked as failed?"

* If the job cancel works, Airavata marks the experiment as canceled.

* If the cancel fails for some reason, don't change the Experiment state, but throw an error.
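
The steps above could be sketched roughly as follows. This is only an illustration in Python, not Airavata code: `Experiment`, `CancelError`, `FakeResource`, `cancel_experiment`, and the status strings in `ACTIVE_JOB_STATUSES` are all hypothetical names chosen for the example, and the resource object is assumed to expose `cancel(job_id)` and `status(job_id)`.

```python
from dataclasses import dataclass
from enum import Enum


class ExperimentState(Enum):
    LAUNCHED = "LAUNCHED"
    EXECUTING = "EXECUTING"
    CANCELED = "CANCELED"


class CancelError(Exception):
    """Raised when a cancel request cannot be issued or confirmed."""


# Assumption: raw statuses that mean the job is still queued or running.
ACTIVE_JOB_STATUSES = {"QUEUED", "RUNNING"}


@dataclass
class Experiment:
    state: ExperimentState
    # Raw status as reported by the machine, kept for logging/auditing.
    job_status: str = "UNKNOWN"


class FakeResource:
    """Stand-in for a computing resource (hypothetical interface)."""

    def __init__(self, status_after_cancel):
        self._status = "RUNNING"
        self._status_after_cancel = status_after_cancel

    def cancel(self, job_id):
        # A real resource would run its own cancel command here.
        self._status = self._status_after_cancel

    def status(self, job_id):
        return self._status


def cancel_experiment(exp, resource, job_id):
    # Step 1: only jobs in "launched" or "executing" state are cancelled.
    if exp.state not in (ExperimentState.LAUNCHED, ExperimentState.EXECUTING):
        raise CancelError(f"cannot cancel from state {exp.state.name}")
    resource.cancel(job_id)

    # Step 2: record whatever the machine reports, then ask the generic
    # question: is the job still queued or executing?
    exp.job_status = resource.status(job_id)
    if exp.job_status in ACTIVE_JOB_STATUSES:
        # Step 4: cancel failed, so leave the experiment state untouched.
        raise CancelError(f"job {job_id} is still {exp.job_status}")

    # Step 3: cancel worked, whatever label the machine used for it.
    exp.state = ExperimentState.CANCELED
```

With this shape, a machine that reports a cancelled job as "COMPLETED" and one that reports it as "FAILED" both lead to an experiment state of CANCELED, while `job_status` keeps the machine's own label.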


Marlon

On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:
Hi All,

I have a few concerns about experiment cancellation. When we want to cancel
an experiment we have to run a particular command on the computing
resource. Different computing resources report the job status of cancelled
jobs in different ways. Ex: trestles shows cancelled jobs as completed,
some other machines show them as cancelled, and some might show them as
failed.

I think we should replicate this information in the JobDetails object as
the Job status and make sure the Experiment and Task statuses are set to
cancelled. The other approach is that when we cancel we explicitly mark
all the states in the experiment model (experiment, task, and job states)
as cancelled and manually handle the state we get from the computing
resource.

My concern is: should we really hide the information shown by the
computing resource from the Job status we are storing in the registry? Or
should we leave it as it is and handle the other statuses so that they
represent cancelled experiments? If we mark everything as cancelled there
will be an inconsistency in the JobStatus.

WDYT ?

Lahiru

