[ https://issues.apache.org/jira/browse/AIRFLOW-14 ]
Chris Riccomini updated AIRFLOW-14:
-----------------------------------
    Component/s: scheduler

> DagRun Refactor (Scheduler 2.0)
> -------------------------------
>
>                 Key: AIRFLOW-14
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-14
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Jeremiah Lowin
>            Assignee: Jeremiah Lowin
>              Labels: backfill, dagrun, scheduler
>
> For full proposal, please see the Wiki:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62694286
> Borrowing from that page:
>
> *Description of New Workflow*
>
> DagRuns represent the state of a DAG at a certain point in time (perhaps
> they should be called DagInstances?). To run a DAG – or to manage the
> execution of a DAG – a DagRun must first be created. This can be done
> manually (simply by creating a DagRun object) or automatically, using
> methods like dag.schedule_dag(). Therefore, both scheduling new runs OR
> introducing ad-hoc runs can be done by any process at any time, simply by
> creating the appropriate object.
>
> Just creating a DagRun is not enough to actually run the DAG (just as
> creating a TaskInstance is not the same as actually running a task). We
> need a Job for that. The DagRunJob is fairly simple in structure. It
> maintains a set of DagRuns that it is tasked with executing, and loops
> over that set until all the DagRuns either succeed or fail. New DagRuns
> can be passed to the job explicitly via DagRunJob.submit_dagruns() or by
> defining its DagRunJob.collect_dagruns() method, which is called during
> each loop. When the DagRunJob is executing a specific DagRun, it locks it.
> Other DagRunJobs will not try to execute locked DagRuns. This way, many
> DagRunJobs can run simultaneously in either a local or distributed
> setting, and can even be pointed at the same DagRuns, without worrying
> about collisions or interference.
>
> The basic DagRunJob loop works like this:
> - refresh dags
> - collect new dagruns
> - process dagruns (including updating dagrun states for success/failure)
> - call executor/own heartbeat
>
> By tweaking the DagRunJob, we can easily recreate the behavior of the
> current SchedulerJob and BackfillJob. The Scheduler simply runs forever
> and picks up ALL active DagRuns in collect_dagruns(); Backfill generates
> DagRuns corresponding to the requested start/end dates and submits them
> to itself prior to initiating its loop.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
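As a rough illustration of the loop described above, the sketch below is a
minimal, self-contained Python rendering of the proposal, not the actual
Airflow code: DagRunJob, submit_dagruns() and collect_dagruns() are names
taken from the description, while the DagRun fields, locking mechanics,
state values and executor interface are simplified assumptions made purely
for this example.

# Illustrative sketch of the proposed DagRunJob loop; not actual Airflow code.
# DagRunJob, submit_dagruns() and collect_dagruns() are named in the proposal;
# the DagRun fields, lock handling, state values and executor interface are
# simplified assumptions for this example.
import time

RUNNING, SUCCESS, FAILED = "running", "success", "failed"


class DagRun:
    """Placeholder: the state of one DAG at a certain point in time."""

    def __init__(self, dag_id, execution_date):
        self.dag_id = dag_id
        self.execution_date = execution_date
        self.state = RUNNING
        self.locked_by = None  # id of the job currently executing this run


class DagRunJob:
    """Loops over a set of DagRuns until every one succeeds or fails."""

    def __init__(self, job_id, executor):
        self.job_id = job_id
        self.executor = executor
        self.dagruns = set()  # DagRuns this job is tasked with executing

    def submit_dagruns(self, runs):
        """Explicit submission, e.g. Backfill seeding its own date range."""
        self.dagruns.update(runs)

    def collect_dagruns(self):
        """Per-loop hook; a Scheduler would pick up ALL active DagRuns here.
        The base job collects nothing."""
        return []

    def refresh_dags(self):
        """Placeholder for re-parsing DAG files."""

    def process_dagrun(self, run):
        """Schedule the run's task instances and update its state; marked
        successful immediately here to keep the sketch self-contained."""
        run.state = SUCCESS

    def run(self):
        while any(r.state == RUNNING for r in self.dagruns):
            self.refresh_dags()                          # 1. refresh dags
            self.submit_dagruns(self.collect_dagruns())  # 2. collect new dagruns
            # 3. process dagruns, skipping any locked by another job
            for run in list(self.dagruns):
                if run.state != RUNNING or run.locked_by not in (None, self.job_id):
                    continue
                run.locked_by = self.job_id  # lock so other jobs skip this run
                self.process_dagrun(run)
                run.locked_by = None
            self.executor.heartbeat()                    # 4. executor/own heartbeat
            time.sleep(1)


class NoopExecutor:
    """Stand-in for a real executor; only needs heartbeat() for this sketch."""

    def heartbeat(self):
        pass


if __name__ == "__main__":
    job = DagRunJob(job_id=1, executor=NoopExecutor())
    job.submit_dagruns({DagRun("example_dag", "2016-01-01")})
    job.run()  # returns once every submitted DagRun has succeeded or failed

Under this sketch, a Scheduler-style job would override collect_dagruns() to
return every active, unlocked DagRun and loop forever, while a Backfill-style
job would call submit_dagruns() once with the runs for its requested date
range and exit when they all succeed or fail, as in the stub above.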