[ 
https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300747#comment-15300747
 ] 

Bolke de Bruin commented on AIRFLOW-160:
----------------------------------------

+1 on the idea, -1 on more polling. I think inotify would be more suitable, 
or an API call that refreshes the dagbag when triggered externally. An API 
call is also nicer because it can update every process that needs to load the 
dagbag.

> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same 
> process as the scheduler itself. We've seen issues in production where bad 
> DAG files cause the scheduler to fail. A simple example: if the user script 
> calls `sys.exit(1)`, the scheduler exits as well. We've also seen an 
> unusual case where modules loaded by the user DAG affect operation of the 
> scheduler. For better uptime, the scheduler should be resistant to these 
> problematic user DAGs.
>
> The proposed solution is to parse and schedule user DAGs through child 
> processes. This way, the main scheduler process is more isolated from bad 
> DAGs. There's a side benefit as well - since parsing is distributed among 
> multiple processes, it's possible to parse the DAG files more frequently, 
> reducing the latency between when a DAG is modified and when the changes are 
> picked up.
>
> Another issue right now is that all DAGs must be scheduled before any tasks 
> are sent to the executor. This means that the frequency of task scheduling is 
> limited by the slowest DAG to schedule. The changes needed for scheduling 
> DAGs through child processes will also make it easy to decouple this process 
> and allow tasks to be scheduled and sent to the executor in a more 
> independent fashion. This way, overall scheduling won't be held back by a 
> slow DAG.
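
For illustration, a minimal sketch of the isolation idea in the quoted 
description: the DAG file is executed in a separate interpreter, so a 
`sys.exit(1)` in the user script ends only the child. This is not Airflow's 
implementation; `parse_in_child` and the 30-second timeout are hypothetical 
choices made for the example.

```python
import subprocess
import sys

# One-liner run by the child interpreter: execute the file given as argv[1].
_CHILD_CODE = "import runpy, sys; runpy.run_path(sys.argv[1])"


def parse_in_child(path, timeout=30):
    """Execute a DAG file in a child process and return the child's exit code.

    Hypothetical helper: a non-zero code means the file failed or exited;
    the calling (scheduler) process keeps running either way.
    Raises subprocess.TimeoutExpired if the file takes longer than `timeout`.
    """
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_CODE, path],
        capture_output=True,  # keep the child's output off the parent's streams
        timeout=timeout,
    )
    return result.returncode
```

A caller would treat a non-zero return code as "skip this file and log it" 
rather than letting the failure reach the scheduler loop.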



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)