Ok, sorry, these concurrency settings are confusing.

Let me clarify.

`max_active_tasks_per_dag` is a core Airflow setting, and it provides the
default for DAG.max_active_tasks.
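
To make that fallback concrete, here is a rough sketch.  It is not Airflow's
actual code: the helper name is made up for illustration, and the value 16 is
assumed to be the configured core default.

```python
# Hypothetical illustration of the fallback; not Airflow's implementation.
ASSUMED_CORE_DEFAULT = 16  # assumed value of [core] max_active_tasks_per_dag


def effective_max_active_tasks(dag_level=None, core_default=ASSUMED_CORE_DEFAULT):
    """Return the DAG's own limit if one was given, else the core default."""
    return core_default if dag_level is None else dag_level
```

So a DAG that sets nothing ends up with the core value, and a DAG that passes
its own max_active_tasks overrides it.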

I think DAG.max_active_tasks is a reasonable config to have, but the problem
in my view is the scope.  I feel it should be applied at the dag *run* scope
and not across all dag runs.  Allowing many concurrent dag runs while capping
the number of concurrent tasks across all of them gets into confusing and
footgunish territory: you might have many, many dag runs running, but all
limping along.

So let me change my proposal.  I propose that DAG.max_active_tasks be
applied at the dag *run* scope, rather than limiting concurrency across all
dag runs.

I think in practice this is essentially what it already is, because I would
expect that the vast majority of dag runs are the only dag run running for
a given dag at a given time.  It's only when you have many dag runs of the
same dag running that this parameter ends up meaning something different.
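
A toy model of the two scopes, just to show where they diverge.  This is a
sketch under assumptions, not the scheduler's logic: the function name is
hypothetical, and handing out slots to runs in list order is a simplification.

```python
def admitted_tasks(runnable_per_run, limit, scope):
    """Return how many tasks each active dag run may start.

    runnable_per_run: runnable task count for each active run of one dag.
    scope: "dag" shares the limit across all runs; "dag_run" applies it per run.
    """
    if scope == "dag_run":
        return [min(n, limit) for n in runnable_per_run]
    if scope == "dag":
        remaining = limit
        admitted = []
        for n in runnable_per_run:
            take = min(n, remaining)  # earlier runs grab slots first (assumed)
            admitted.append(take)
            remaining -= take
        return admitted
    raise ValueError(f"unknown scope: {scope!r}")


# One active run: both scopes admit the same number of tasks.
print(admitted_tasks([10], 16, "dag"), admitted_tasks([10], 16, "dag_run"))
# Four active runs: the dag-wide limit starves the later runs.
print(admitted_tasks([10, 10, 10, 10], 16, "dag"))
print(admitted_tasks([10, 10, 10, 10], 16, "dag_run"))
```

With a single run the two interpretations agree.  With four runs of ten
runnable tasks each and a limit of 16, the dag-wide scope admits 16 tasks in
total while the per-run scope lets every run proceed with all ten.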

So, I propose that DAG.max_active_tasks be evaluated per dag run.  And
we can change the name accordingly if folks are on board.

Now whether a mapped task is a task or not, I leave that for another day :)

On Fri, Oct 4, 2024 at 10:28 AM Daniel Standish <
daniel.stand...@astronomer.io> wrote:

> The setting max_active_tasks_per_dag seems mostly useless to me, and
> footgunish.
>
> Why?
>
> Because you already have a setting for max active dag runs.  If you don't
> want to run more tasks, don't create the extra dag runs.
>
> We also already have a mechanism (param on base operator) for limiting
> individual tasks across all dag runs where that may be needed.  But just a
> general "I don't want more than 16 tasks running across all dag runs of all
> types and for all tasks" seems imprecise and not useful.
>
> I actually think it makes sense to remove this param entirely.  But at
> least we should remove the default.
>
> WDYT
>
