Thanks for the responses. I will take these points into consideration
during the cancel implementation.

Lahiru


On Wed, Aug 13, 2014 at 7:33 PM, Eroma Abeysinghe <
[email protected]> wrote:

> My questions and thoughts on Experiment cancellation
> 1. What are we going to do with the output or partial output of the job at
> the time of cancelling?
>     Are we going to discard it or make it available for the experiment? Are
> we safekeeping all the job information and messages on CANCELLED jobs, or
> do we discard them as well?
>
> 2. Are we going to allow editing for CANCELLED or CANCELLING experiments?
> IMO we should not, because editing is needed only if the experiment is
> going to be re-launched.
>
> 3. With the existing experiment and job states we need to decide which are
> going to be CANCELLED.
> Out of the Airavata Experiment states, cancellation should be allowed for
> the states:
> CREATED
> VALIDATED
> SCHEDULED
> LAUNCHED
> EXECUTING
> Cancellation should be communicated to resources if the job state is one of:
> SUBMITTED
> SETUP
> QUEUED
> ACTIVE
> HELD
>
> There is a SUSPENDED state in both experiment and job, but is this a
> currently active state?
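
A minimal sketch of the two state checks listed above, in Python for brevity. The state names come from the lists in point 3; the set and function names are hypothetical and not part of the Airavata API. SUSPENDED is left out because the question of whether it is in use is still open.

```python
# Hypothetical helper names; only the state names come from this thread.
CANCELLABLE_EXPERIMENT_STATES = frozenset(
    {"CREATED", "VALIDATED", "SCHEDULED", "LAUNCHED", "EXECUTING"}
)

# Job states for which the cancel has to be sent to the computing resource.
JOB_STATES_NEEDING_RESOURCE_CANCEL = frozenset(
    {"SUBMITTED", "SETUP", "QUEUED", "ACTIVE", "HELD"}
)


def can_cancel_experiment(experiment_state: str) -> bool:
    """Return True if an experiment in this state may be cancelled."""
    return experiment_state in CANCELLABLE_EXPERIMENT_STATES


def must_notify_resource(job_state: str) -> bool:
    """Return True if the cancel must be communicated to the resource."""
    return job_state in JOB_STATES_NEEDING_RESOURCE_CANCEL
```

Keeping the two sets separate means an experiment that has not submitted a job yet (for example CREATED) can be cancelled without contacting any resource.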
>
> 4. Cloning will be available for CANCELLED and CANCELLING experiments.
>
> 5. In the Experiment Summary we should display any errors that took place
> in the cancelling process.
>
> On Wed, Aug 13, 2014 at 9:01 AM, Marlon Pierce <[email protected]> wrote:
>
>> There is an advantage for task (or job) state to capture the information
>> that really comes from the machine (completed, cancelled, failed, etc), and
>> for experiment state to be set to canceled by Airavata.  That is, there
>> should be parts of Airavata that capture machine-specific state information
>> about the job for logging/auditing purposes.
>>
>> * Airavata issues "cancel" command to job in "launched" or "executing"
>> state.
>>
>> * Airavata confirms that the job has left the queue or is no longer
>> executing. This could be machine-specific, but the main question is "has
>> the job left the queue?" or "is the job no longer in executing state?"  I
>> don't think it is "if this is trestles, and since we issued a qdel command,
>> is the job marked as completed; or if this is stampede, is the job now
>> marked as failed?"
>>
>> * If the job cancel works, then Airavata marks this as canceled.
>>
>> * If cancel fails for some reason, don't change the Experiment state but
>> throw an error.
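
The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: `Experiment`, `CancelFailedError`, and the two callables are hypothetical names, and the resource-specific work (for example running qdel and polling the queue) is hidden behind those callables.

```python
from dataclasses import dataclass
from typing import Callable


class CancelFailedError(Exception):
    """Raised when the cancel could not be issued or confirmed."""


@dataclass
class Experiment:
    experiment_state: str  # state owned by Airavata, not by the machine


def cancel_experiment(
    exp: Experiment,
    issue_cancel: Callable[[], None],
    job_has_left_queue: Callable[[], bool],
) -> None:
    # Step 1: only jobs in "launched" or "executing" state get a cancel command.
    if exp.experiment_state not in ("LAUNCHED", "EXECUTING"):
        raise CancelFailedError(f"cannot cancel from {exp.experiment_state}")
    issue_cancel()

    # Step 2: the machine-independent question is whether the job is gone,
    # not which final status a particular machine reports for it.
    if not job_has_left_queue():
        # Step 4: on failure the experiment state is left untouched.
        raise CancelFailedError("job is still queued or executing")

    # Step 3: the cancel worked, so Airavata marks the experiment as cancelled.
    exp.experiment_state = "CANCELLED"
```

Because the state is written only on the last line, any failure before it leaves the experiment state unchanged, which matches the last bullet.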
>>
>>
>> Marlon
>>
>>
>> On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:
>>
>>> Hi All,
>>>
>>> I have a few concerns about experiment cancellation. When we want to cancel
>>> an experiment we have to run a particular command on the computing
>>> resource. Different resources show the job status of cancelled jobs in
>>> different ways. Ex: trestles shows cancelled jobs as completed, some
>>> other machines show them as cancelled, and some might show them as failed.
>>>
>>> I think we should replicate this information in the JobDetails object as
>>> the Job status and make sure the Experiment and Task statuses are marked
>>> as cancelled. The other approach is, when we cancel, to explicitly mark
>>> all the states in the experiment model (experiment, task, and job states)
>>> as cancelled and manually handle the state we get from the computing
>>> resource.
>>>
>>> My concern: should we really hide the information shown by the computing
>>> resource from the Job status we are storing in the registry? Or should we
>>> leave it as it is and use the other statuses to represent cancelled
>>> experiments? If we mark everything as cancelled there will be
>>> inconsistency in the JobStatus.
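
To make the first approach concrete, here is a small sketch in which the job record keeps whatever the resource reported while the experiment and task statuses are set to cancelled. The class and field names are hypothetical; only the name JobDetails is borrowed from the discussion, and its real fields are not shown here.

```python
from dataclasses import dataclass


@dataclass
class JobDetails:
    job_id: str
    # Raw status as reported by the machine, e.g. "COMPLETED" or "FAILED".
    resource_reported_status: str


@dataclass
class ExperimentRecord:
    experiment_status: str
    task_status: str
    job: JobDetails


def record_cancellation(record: ExperimentRecord, resource_reported_status: str) -> None:
    """Keep the machine's own job status; mark experiment and task cancelled."""
    record.job.resource_reported_status = resource_reported_status
    record.task_status = "CANCELLED"
    record.experiment_status = "CANCELLED"
```

With this layout the registry still shows what the resource said about the job, so nothing is hidden, and the experiment and task statuses stay consistent across machines.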
>>>
>>> WDYT ?
>>>
>>> Lahiru
>>>
>>>
>>
>
>
> --
> Thank You,
> Best Regards,
> Eroma
>



-- 
System Analyst Programmer
PTI Lab
Indiana University
