[ https://issues.apache.org/jira/browse/AIRFLOW-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828798#comment-15828798 ]
Ludovic Claude commented on AIRFLOW-764:
----------------------------------------

A related issue is that `max_active_runs` is not honoured for externally triggered DAGs.

This issue affects me because I use Airflow to scan a folder containing many sub-folders to process, and a DAG run is triggered for each sub-folder. I would like to limit how many DAG runs are processed in parallel so that each individual run completes sooner. Even with pools, Airflow starts processing all DAG runs at once, and the scheduler then works through the same task across all DAG runs instead of trying to finish individual DAG runs first.

If you look at the Tree view, the green blocks are laid out horizontally; I would like to see them laid out vertically.

Here is a comment from Maxime Beauchemin:
http://markmail.org/message/dm4heorbyatgcvyk

Without looking at the latest code to confirm what I'm about to write, `max_active_runs` really only prevents the scheduler from creating new active DAG runs. For `max_active_runs` to apply to externally triggered runs, we'd need to introduce handling of a new status of `scheduled` for DAG runs. The scheduler would have to handle the new simple task of flipping the status from this `scheduled` to `running` when `actual_active_dag_runs < max_active_runs`. We'd probably want the CLI command and the UI DAG run creation process to default DAG run status to this new `scheduled` state. I think it should be a fairly simple feature to add.
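To make the proposal above concrete, here is a minimal sketch of the promotion step in plain Python. It does not use Airflow's real models or scheduler code: the `DagRun` class, the `promote_scheduled_runs` function, and the assumption that runs are listed in trigger order are all hypothetical and only illustrate the `actual_active_dag_runs < max_active_runs` check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DagRun:
    """Hypothetical stand-in for a DAG run record (not Airflow's model)."""
    run_id: str
    # Assumption: externally triggered runs would start in the proposed
    # `scheduled` state instead of going straight to `running`.
    state: str = "scheduled"


def promote_scheduled_runs(runs: List[DagRun], max_active_runs: int) -> List[DagRun]:
    """Flip runs from `scheduled` to `running` while the active count is
    below `max_active_runs`. Returns the runs that were promoted."""
    active = sum(1 for run in runs if run.state == "running")
    promoted = []
    for run in runs:  # assumed to be in trigger order
        if active >= max_active_runs:
            break
        if run.state == "scheduled":
            run.state = "running"
            active += 1
            promoted.append(run)
    return promoted


if __name__ == "__main__":
    runs = [DagRun("r1"), DagRun("r2"), DagRun("r3")]
    promote_scheduled_runs(runs, max_active_runs=1)
    print([(run.run_id, run.state) for run in runs])
```

With `max_active_runs=1`, only the first run is promoted on each pass; the next one is promoted only after the active run leaves the `running` state, which is the "complete individual DAG runs first" behaviour asked for above.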
> max_active_runs_per_dag not respected for DAGs triggered manually within a few seconds of one another
> -----------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-764
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-764
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: core, executor
>    Affects Versions: Airflow 1.7.1.3
>         Environment: debian linux, mysql with localexecutor
>            Reporter: Jeffrey Enns
>         Attachments: test_dag.py, test_dag_screen.png, test_job.sh, trigger_two.sh
>
>
> Given the following configuration:
> ```
> [core]
> executor = LocalExecutor
> max_active_runs_per_dag = 1
> parallelism = 20
> dag_concurrency = 1
> ```
> Even with `max_active_runs_per_dag=1`, it is possible to cause two (or more) DAG runs to run in parallel by triggering the runs manually within a few seconds/milliseconds of one another. Task Instances from the distinct DAG runs will show as active in the “Task Instances” web view at the same time.
> I only looked at the scheduler code briefly, but it looked as if a race condition would be possible for manually triggered DAGs that could lead to this behaviour.
> I’ve attached a test DAG and two shell scripts I used to reliably reproduce this behaviour. Put `test_dag.py` and `test_job.sh` in the DAGs folder, and then run `trigger_two.sh` to reproduce the bug.
> Also attached is a screenshot showing DAG runs (for the dag ‘race_dag’) running in parallel after following the steps described immediately above (note the execution date, start date, and end date for each TI).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)