[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300992#comment-15300992 ]
Chris Riccomini commented on AIRFLOW-160:
-----------------------------------------

{quote}
We've also seen an unusual case where modules loaded by the user DAG affect operation of the scheduler
{quote}

We're also very concerned about security, and having DAGs evaluated in-process in the scheduler is pretty dangerous, since it allows DAGs to take over the scheduler. Definite +1 to making DAG parsing a subprocess. As a separate ticket, we will also probably want to make the subprocesses run as a DAG-specific user (e.g. owner). This will prevent DAGs from messing with the Airflow files on the file system, killing Airflow processes, etc.

{quote}
I think inotify is more suitable or an API call to refresh the dagbag if triggered externally. API call is also nicer because it can update all processes that require a load of the dagbag.
{quote}

+1 to this comment as well. Our ops folks were actually asking today if there's an API to trigger a DAG refresh. They are going to push DAGs to a folder via a deploy script, and would like to tell Airflow to refresh accordingly. Polling other than during this operation is pointless. inotify would also work (and is probably a better solution than the API, even).

> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same
> process as the scheduler itself. We've seen issues in production where bad
> DAG files cause scheduler to fail. A simple example is if the user script
> calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an
> unusual case where modules loaded by the user DAG affect operation of the
> scheduler. For better uptime, the scheduler should be resistant to these
> problematic user DAGs.
> The proposed solution is to parse and schedule user DAGs through child
> processes. This way, the main scheduler process is more isolated from bad
> DAGs. There's a side benefit as well - since parsing is distributed among
> multiple processes, it's possible to parse the DAG files more frequently,
> reducing the latency between when a DAG is modified and when the changes are
> picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks
> are sent to the executor. This means that the frequency of task scheduling is
> limited by the slowest DAG to schedule. The changes needed for scheduling
> DAGs through child processes will also make it easy to decouple this process
> and allow tasks to be scheduled and sent to the executor in a more
> independent fashion. This way, overall scheduling won't be held back by a
> slow DAG.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
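
For illustration only, here is a minimal sketch of the isolation idea described in the issue: evaluating a user DAG file in a child process so that a `sys.exit(1)` or a hang in that file cannot take down the parent. This is not Airflow's implementation. The helper name `parse_dag_file`, the `_RUNNER` snippet, and the timeout value are all hypothetical, and the sketch only checks that the file evaluates. It does not collect DAG objects or schedule anything.

```python
import subprocess
import sys

# Hypothetical runner snippet (an assumption, not Airflow code): it executes
# the given file in a brand-new interpreter, so modules the file imports and
# any sys.exit() call stay inside the child process.
_RUNNER = "import runpy, sys; runpy.run_path(sys.argv[1], run_name='dag_file')"


def parse_dag_file(path, timeout=30.0):
    """Evaluate one DAG file in a child process.

    Returns (ok, detail). The calling process keeps running whether the
    file exits, raises, or hangs past the timeout.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _RUNNER, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return False, "timed out after %.1fs" % timeout
    if proc.returncode != 0:
        return False, "exit code %d: %s" % (proc.returncode, proc.stderr.strip())
    return True, "parsed"
```

A scheduler loop could call `parse_dag_file(path)` for each file and skip the ones that return `ok == False`. Because each call is independent, the calls could also be spread across a pool of workers, which is the decoupling the description mentions. Running the child as a DAG-specific user, as suggested in the comment, would need extra platform-specific setup that this sketch leaves out.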