[ https://issues.apache.org/jira/browse/AIRAVATA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lahiru Jayathilake updated AIRAVATA-3893:
-----------------------------------------
Summary: Support automated HPC Job Re-Submission across Clusters after
HPC-Side failures (was: Support for Automatic Resubmission of Failed Jobs
After Successful Submission)
> Support automated HPC Job Re-Submission across Clusters after HPC-Side
> failures
> -------------------------------------------------------------------------------
>
> Key: AIRAVATA-3893
> URL: https://issues.apache.org/jira/browse/AIRAVATA-3893
> Project: Airavata
> Issue Type: Improvement
> Components: Airavata System
> Reporter: Lahiru Jayathilake
> Priority: Major
>
> Currently, the Airavata Metascheduler cannot automatically resubmit a job
> to another cluster when the job is submitted successfully but then fails
> during execution (e.g., due to resource allocation issues).
> This feature request proposes that the Metascheduler handle such failures
> by automatically resubmitting the failed job to the other configured
> clusters, so that experiments complete more reliably.
> This enhancement will improve the system’s robustness in handling transient
> failures or resource constraints across multiple clusters.
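
The requested behaviour could be sketched roughly as below. This is only an
illustration under stated assumptions, not Airavata code: the names
JobResult, run_with_failover, submit_and_wait and fake_submit are
hypothetical, and the real Metascheduler interfaces are not shown in this
issue. The sketch tries each configured cluster in order and stops at the
first run that completes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class JobResult:
    """Outcome of one execution attempt (hypothetical type, for illustration)."""
    cluster: str
    succeeded: bool
    reason: str = ""


def run_with_failover(
    job_id: str,
    clusters: List[str],
    submit_and_wait: Callable[[str, str], JobResult],
    max_attempts: Optional[int] = None,
) -> List[JobResult]:
    """Try clusters in order until one run succeeds or attempts run out.

    submit_and_wait is an assumed callback that submits the job to a
    cluster and blocks until the job either completes or fails there.
    """
    limit = len(clusters) if max_attempts is None else min(max_attempts, len(clusters))
    attempts: List[JobResult] = []
    for cluster in clusters[:limit]:
        result = submit_and_wait(job_id, cluster)
        attempts.append(result)
        if result.succeeded:
            break  # no need to resubmit anywhere else
    return attempts


def fake_submit(job_id: str, cluster: str) -> JobResult:
    """Stand-in for a real submission: only 'cluster-b' completes the job."""
    if cluster == "cluster-b":
        return JobResult(cluster, True)
    return JobResult(cluster, False, "resource allocation failed")
```

Returning the full list of attempts, rather than only the final result, keeps
the failure reason from each cluster available for reporting on the
experiment.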
--
This message was sent by Atlassian Jira
(v8.20.10#820010)