[ 
https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164512#comment-16164512
 ] 

ASF subversion and git services commented on AIRFLOW-160:
---------------------------------------------------------

Commit 028b3b88ff4f191c78bf1d9c41bf43a792f640ff in incubator-airflow's branch 
refs/heads/master from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=028b3b8 ]

[AIRFLOW-1606][Airflow-1606][AIRFLOW-1605][AIRFLOW-160] DAG.sync_to_db is now a 
normal method

Previously it was a static method that took a DAG as its first argument,
which meant it wasn't truly a static method.

To avoid reordering the parameters, I have given them sensible defaults,
taken from the one and only use in the rest of the code base.
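
The change described above can be sketched roughly as follows. This is a
minimal illustration, not the code from the commit: the class `Dag`, the
name `sync_to_db_static`, the parameter names `owner` and `sync_time`, and
the default values are all assumptions made for the example.

```python
from datetime import datetime, timezone


class Dag:
    """Hypothetical stand-in for a DAG class; not Airflow's actual code."""

    def __init__(self, dag_id, owner="airflow"):
        self.dag_id = dag_id
        self.owner = owner

    # Before: declared static, yet it needs a DAG passed in explicitly.
    @staticmethod
    def sync_to_db_static(dag, owner, sync_time):
        return (dag.dag_id, owner, sync_time)

    # After: a normal method. The remaining parameters keep their order
    # and gain defaults (assumed here), so existing positional calls work.
    def sync_to_db(self, owner=None, sync_time=None):
        if owner is None:
            owner = self.owner
        if sync_time is None:
            sync_time = datetime.now(timezone.utc)
        return (self.dag_id, owner, sync_time)
```

With this shape, `Dag.sync_to_db_static(dag, owner, when)` becomes
`dag.sync_to_db(owner, when)`, and a caller that is happy with the defaults
can write `dag.sync_to_db()`.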

Also remove the documented "sync_to_db" parameter on DagBag, which no longer
exists -- the doc string referred to a parameter that was removed in
[AIRFLOW-160].

Closes #2605 from ashb/AIRFLOW-1606-db-sync_to_db-not-static


> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same 
> process as the scheduler itself. We've seen issues in production where bad 
> DAG files cause the scheduler to fail. A simple example is if the user script 
> calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an 
> unusual case where modules loaded by the user DAG affect operation of the 
> scheduler. For better uptime, the scheduler should be resistant to these 
> problematic user DAGs.
> The proposed solution is to parse and schedule user DAGs through child 
> processes. This way, the main scheduler process is more isolated from bad 
> DAGs. There's a side benefit as well - since parsing is distributed among 
> multiple processes, it's possible to parse the DAG files more frequently, 
> reducing the latency between when a DAG is modified and when the changes are 
> picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks 
> are sent to the executor. This means that the frequency of task scheduling is 
> limited by the slowest DAG to schedule. The changes needed for scheduling 
> DAGs through child processes will also make it easy to decouple this process 
> and allow tasks to be scheduled and sent to the executor in a more 
> independent fashion. This way, overall scheduling won't be held back by a 
> slow DAG.
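
The isolation argued for in the quoted description can be illustrated with a
small sketch. It is not how Airflow implements the feature: the function name
`parse_in_child`, the use of `subprocess`, and the timeout value are
assumptions chosen only to show that a `sys.exit(1)` in a user script ends
the child process and not the parent.

```python
import subprocess
import sys


def parse_in_child(dag_source, timeout=30):
    """Execute DAG-file source in a separate Python process.

    Returns the child's exit code (0 means the file ran without error),
    or None if the child had to be killed after `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", dag_source],
            timeout=timeout,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a slow or
        # hung DAG file cannot block the parent indefinitely.
        return None
    return result.returncode


if __name__ == "__main__":
    # The "bad" file exits with status 1; the parent simply records it.
    print(parse_in_child("import sys; sys.exit(1)"))
    print(parse_in_child("x = 1 + 1"))
```

Because each file runs in its own process, several files could also be
parsed concurrently, which is the side benefit the description mentions.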



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
