Hi Marlon,

I should be able to wrap up later today or early tomorrow.

Regards,
Lahiru

On Mon, Aug 18, 2014 at 7:01 PM, Marlon Pierce <[email protected]> wrote:

> How goes the implementation?
>
> Marlon
>
> On 8/13/14, 11:09 PM, Lahiru Gunathilake wrote:
>
>> Thank you very much for all the inputs! We will take these into
>> consideration.
>>
>> Regards,
>> Lahiru
>>
>> On Wed, Aug 13, 2014 at 10:31 PM, Miller, Mark <[email protected]> wrote:
>>
>>> If I understand this correctly, I want to offer some input from our
>>> experience with CIPRES.
>>>
>>> Currently, if a CIPRES user wishes to cancel a job, they must delete the
>>> entire job, and therefore the input and other files they used become
>>> unavailable.
>>>
>>> This is not an ideal solution.
>>>
>>> There is value to the user in being able to see partially completed
>>> results, or even the input files they used.
>>>
>>> So I would vote for making partial output of the job available as an
>>> option.
>>>
>>> Any additional information you can provide about status would be useful,
>>> especially for folks who are debugging failures.
>>>
>>> Just my 2c.
>>>
>>> Mark
>>>
>>> *From:* Eroma Abeysinghe [mailto:[email protected]]
>>> *Sent:* Wednesday, August 13, 2014 7:04 AM
>>> *To:* [email protected]
>>> *Subject:* Re: Experiment Cancellation
>>>
>>> My questions and thoughts on experiment cancellation:
>>>
>>> 1. What are we going to do with the output or partial output of the job
>>> at the time of cancelling? Are we going to discard it or make it
>>> available for the experiment? Are we safekeeping all the job information
>>> and messages on CANCELLED jobs, or discarding them as well?
>>>
>>> 2. Are we going to allow editing for CANCELLED or CANCELLING experiments?
>>> IMO we should not, because editing is only required if the experiment is
>>> going to be re-launched.
>>>
>>> 3. With the existing experiment and job states, we need to decide which
>>> are going to be CANCELLED.
>>> Of the Airavata experiment states, cancellation should be allowed for:
>>> CREATED
>>> VALIDATED
>>> SCHEDULED
>>> LAUNCHED
>>> EXECUTING
>>> Cancellation should be communicated to resources if the job state is:
>>> SUBMITTED
>>> SETUP
>>> QUEUED
>>> ACTIVE
>>> HELD
>>>
>>> There is a SUSPENDED state for both experiments and jobs, but is it
>>> currently an active state?
>>>
>>> 4. Cloning will be available for CANCELLED and CANCELLING experiments.
>>>
>>> 5. In the Experiment Summary we should display any errors that took
>>> place during the cancellation process.
>>>
>>> Thank You,
>>> Best Regards,
>>> Eroma
>>>
>>> On Wed, Aug 13, 2014 at 9:01 AM, Marlon Pierce <[email protected]> wrote:
>>>
>>> There is an advantage in having the task (or job) state capture the
>>> information that really comes from the machine (completed, cancelled,
>>> failed, etc.) and having the experiment state set to canceled by
>>> Airavata. That is, there should be parts of Airavata that capture
>>> machine-specific state information about the job for logging/auditing
>>> purposes.
>>>
>>> * Airavata issues a "cancel" command to a job in the "launched" or
>>> "executing" state.
>>>
>>> * Airavata confirms that the job has left the queue or is no longer
>>> executing. This could be machine-specific, but the main question is "has
>>> the job left the queue?" or "is the job no longer in executing state?" I
>>> don't think it is "if this is trestles, and since we issued a qdel
>>> command, is the job marked as completed; or if this is stampede, is the
>>> job now marked as failed?"
>>>
>>> * If the job cancel works, Airavata marks it as canceled.
>>>
>>> * If the cancel fails for some reason, don't change the Experiment state
>>> but throw an error.
>>>
>>> Marlon
>>>
>>> On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:
>>>
>>> Hi All,
>>>
>>> I have a few concerns about experiment cancellation. When we want to
>>> cancel an experiment, we have to run a particular command on the
>>> computing resource. Different resources show the status of cancelled
>>> jobs in different ways. For example, trestles shows cancelled jobs as
>>> completed, some other machines show them as cancelled, and some might
>>> show them as failed.
>>>
>>> I think we should replicate this information in the JobDetails object as
>>> the job status, and make sure the Experiment and Task statuses are set
>>> to cancelled. The other approach is, when we cancel, to explicitly set
>>> all the states in the experiment model (experiment, task, and job
>>> states) to cancelled and manually handle the state we get from the
>>> computing resource.
>>>
>>> My concern: should we really hide the information shown by the computing
>>> resource from the job status we store in the registry? Or should we
>>> leave it as it is and use the other statuses to represent cancelled
>>> experiments? If we mark everything as cancelled, there will be an
>>> inconsistency in the JobStatus.
>>>
>>> WDYT?
>>>
>>> Lahiru

--
System Analyst Programmer
PTI Lab
Indiana University
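For reference, the cancellation flow discussed in this thread can be sketched in Python. This is a minimal sketch under stated assumptions, not Airavata's API: `ExperimentState`, `JobDetails`, `Experiment`, `cancel_experiment`, `CancellationError` and `FakeResource` are hypothetical names, and the resource object is assumed to expose `cancel()`, `has_left_queue()` and `status()`. It combines Eroma's state lists, Marlon's confirm-then-mark steps, and Lahiru's first option (keep the machine-reported job status in JobDetails, mark the experiment CANCELLED).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ExperimentState(Enum):
    # Hypothetical, illustrative subset of the experiment states named in this thread.
    CREATED = "CREATED"
    VALIDATED = "VALIDATED"
    SCHEDULED = "SCHEDULED"
    LAUNCHED = "LAUNCHED"
    EXECUTING = "EXECUTING"
    CANCELLED = "CANCELLED"
    COMPLETED = "COMPLETED"


# Eroma's point 3: experiment states from which cancellation is allowed.
CANCELLABLE_EXPERIMENT_STATES = {
    ExperimentState.CREATED,
    ExperimentState.VALIDATED,
    ExperimentState.SCHEDULED,
    ExperimentState.LAUNCHED,
    ExperimentState.EXECUTING,
}

# Eroma's point 3: job states for which the cancel must be sent to the resource.
REMOTE_CANCEL_JOB_STATES = {"SUBMITTED", "SETUP", "QUEUED", "ACTIVE", "HELD"}


class CancellationError(Exception):
    """Raised when an experiment cannot be, or was not, cancelled."""


@dataclass
class JobDetails:
    job_id: str
    status: str  # job status kept as the resource reports it, not normalized


@dataclass
class Experiment:
    state: ExperimentState
    jobs: List[JobDetails] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)  # for the Experiment Summary


def cancel_experiment(exp: Experiment, resource) -> None:
    """Hypothetical flow; `resource` must provide cancel(), has_left_queue(), status()."""
    if exp.state not in CANCELLABLE_EXPERIMENT_STATES:
        raise CancellationError(f"cannot cancel experiment in state {exp.state.value}")
    for job in exp.jobs:
        if job.status not in REMOTE_CANCEL_JOB_STATES:
            continue  # nothing queued or running on the resource for this job
        resource.cancel(job.job_id)  # e.g. a scheduler command such as qdel
        # Marlon's check: only ask whether the job left the queue / stopped running.
        if not resource.has_left_queue(job.job_id):
            msg = f"job {job.job_id} is still queued or running after cancel"
            exp.errors.append(msg)
            raise CancellationError(msg)  # experiment state is left unchanged
        # Keep the machine's own verdict (COMPLETED, CANCELLED, FAILED, ...).
        job.status = resource.status(job.job_id)
    exp.state = ExperimentState.CANCELLED


class FakeResource:
    """Stand-in resource for illustration; a real one would talk to the scheduler."""

    def __init__(self, reported_status: str, leaves_queue: bool = True):
        self.reported_status = reported_status
        self.leaves_queue = leaves_queue
        self.cancelled: List[str] = []

    def cancel(self, job_id: str) -> None:
        self.cancelled.append(job_id)

    def has_left_queue(self, job_id: str) -> bool:
        return self.leaves_queue

    def status(self, job_id: str) -> str:
        return self.reported_status


# A trestles-like machine that reports a cancelled job as COMPLETED.
exp = Experiment(ExperimentState.EXECUTING, [JobDetails("job-1", "QUEUED")])
cancel_experiment(exp, FakeResource("COMPLETED"))
print(exp.state.value, exp.jobs[0].status)  # CANCELLED COMPLETED
```

One consequence of this design: with several jobs, a failure on a later job leaves earlier jobs already cancelled while the experiment keeps its previous state, which is one place an intermediate CANCELLING state could help.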
