[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300747#comment-15300747 ]
Bolke de Bruin commented on AIRFLOW-160:
----------------------------------------

+1 on the idea, -1 on more polling. I think inotify is more suitable, or an API call to refresh the dagbag if triggered externally. An API call is also nicer because it can update all processes that require a load of the dagbag.

> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same
> process as the scheduler itself. We've seen issues in production where bad
> DAG files cause the scheduler to fail. A simple example is if the user script
> calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an
> unusual case where modules loaded by the user DAG affect operation of the
> scheduler. For better uptime, the scheduler should be resistant to these
> problematic user DAGs.
> The proposed solution is to parse and schedule user DAGs through child
> processes. This way, the main scheduler process is more isolated from bad
> DAGs. There's a side benefit as well - since parsing is distributed among
> multiple processes, it's possible to parse the DAG files more frequently,
> reducing the latency between when a DAG is modified and when the changes are
> picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks
> are sent to the executor. This means that the frequency of task scheduling is
> limited by the slowest DAG to schedule. The changes needed for scheduling
> DAGs through child processes will also make it easy to decouple this process
> and allow tasks to be scheduled and sent to the executor in a more
> independent fashion. This way, overall scheduling won't be held back by a
> slow DAG.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
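
For illustration only, the isolation described in the issue can be sketched in a few lines of Python. This is not Airflow's implementation: `parse_in_child` and the temporary `bad_dag.py` file are hypothetical names invented for the example, and the sketch assumes a DAG file can simply be executed by a separate interpreter. The point is that a `sys.exit(1)` in the user file ends only the child, and the parent sees an exit code.

```python
import os
import subprocess
import sys
import tempfile


def parse_in_child(path, timeout=30):
    """Run a user file in a separate interpreter and return its exit code.

    Hypothetical helper for illustration, not part of Airflow.
    Returns None if the child did not finish within `timeout` seconds.
    """
    try:
        result = subprocess.run([sys.executable, path], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing leaks.
        return None
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bad = os.path.join(tmp, "bad_dag.py")  # hypothetical bad DAG file
        with open(bad, "w") as f:
            f.write("import sys\nsys.exit(1)\n")
        # Only the child exits; the parent carries on and can log the failure.
        print("child exit code:", parse_in_child(bad))
```

The timeout also covers a user file that hangs. Note that this sketch only reports success or failure; a real scheduler would still need some way to return the parsed DAGs from the child to the parent.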