[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164512#comment-16164512 ]
ASF subversion and git services commented on AIRFLOW-160:
---------------------------------------------------------

Commit 028b3b88ff4f191c78bf1d9c41bf43a792f640ff in incubator-airflow's branch refs/heads/master from [~ashb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=028b3b8 ]

[AIRFLOW-1606][Airflow-1606][AIRFLOW-1605][AIRFLOW-160] DAG.sync_to_db is now a normal method

Previously it was a static method that took as its first argument a DAG,
which really meant it wasn't truly a static method.

To avoid reversing the parameter order I have given sensible defaults from
the one and only use in the rest of the code base.

Also remove documented "sync_to_db" parameter on DagBag that no longer
exists -- this doc string refers to a parameter that was removed in
[AIRFLOW-160].

Closes #2605 from ashb/AIRFLOW-1606-db-sync_to_db-not-static


> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same
> process as the scheduler itself. We've seen issues in production where bad
> DAG files cause the scheduler to fail. A simple example is if the user script
> calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an
> unusual case where modules loaded by the user DAG affect operation of the
> scheduler. For better uptime, the scheduler should be resistant to these
> problematic user DAGs.
> The proposed solution is to parse and schedule user DAGs through child
> processes. This way, the main scheduler process is more isolated from bad
> DAGs. There's a side benefit as well - since parsing is distributed among
> multiple processes, it's possible to parse the DAG files more frequently,
> reducing the latency between when a DAG is modified and when the changes are
> picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks
> are sent to the executor. This means that the frequency of task scheduling is
> limited by the slowest DAG to schedule. The changes needed for scheduling
> DAGs through child processes will also make it easy to decouple this process
> and allow tasks to be scheduled and sent to the executor in a more
> independent fashion. This way, overall scheduling won't be held back by a
> slow DAG.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
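
For readers of this thread, the isolation described in the quoted issue can be illustrated with a minimal sketch. It is not Airflow's implementation: the helper names `parse_in_child` and `demo` are hypothetical, and it uses the standard library's `subprocess` module as a stand-in for whatever child-process mechanism the scheduler actually adopts. It only shows that a file calling `sys.exit(1)` ends its own child interpreter, not the parent.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def parse_in_child(dag_file: str, timeout: float = 30.0) -> bool:
    """Execute a DAG file in a separate interpreter.

    Hypothetical helper (not part of Airflow). Returns True if the child
    exited with status 0, False if it failed or exceeded the timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, dag_file],
            capture_output=True,  # keep the child's output away from the parent
            timeout=timeout,      # a slow file cannot block the parent forever
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def demo() -> dict:
    """Run one well-behaved file and one that calls sys.exit(1)."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good_dag.py"
        bad = Path(tmp) / "bad_dag.py"
        good.write_text("x = 1\n")
        bad.write_text("import sys\nsys.exit(1)\n")
        # The parent survives both runs and records which files failed.
        return {p.name: parse_in_child(str(p)) for p in (good, bad)}


if __name__ == "__main__":
    print(demo())
```

Because each file runs in its own interpreter, modules it imports cannot affect the parent process either, which matches the second failure mode mentioned in the issue.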