Good idea. It would also be good to expose this information in the UI, so
that "operations users" can see it and maybe even act on it, plus an
API/CLI to check it. I think that in the future, with Airflow 3, where
we will have task isolation, having `0` for all the DAGs will be a
prerequisite for switching to "task isolation" mode, and this could
actually be verified by a migration tool.

On Fri, Jun 14, 2024 at 10:59 AM Eugen Kosteev <[email protected]> wrote:

> Hi.
>
> I would like to discuss the proposal of adding a new column to the "DAG
> File Processing Stats" table in the DAG processor logs.
>
> Currently, the DAG processor logs contain the following data
> (screenshot below), which includes # of DAGs, runtime, etc. per DAG file.
> [image: image.png]
>
> It seems that it would be beneficial to also have data there about the
> number of queries made to the Airflow database while parsing each
> file.
> This may be convenient when debugging issues related to high
> load on the Airflow database. A typical scenario is DAG files that run
> a lot of database queries in top-level code, which are
> executed every time these DAG files are parsed.
> One common example is excessive use of "Variable.get" in top-level
> statements in DAG files.
>
> Having information about the "number of queries to Airflow database" per
> DAG file may help a lot when debugging issues related to high load on
> the database or to slow parsing of DAG files.
>
> One caveat is that, because of caching enabled for Variables or for
> other reasons (e.g. dynamic DAGs), the number of queries may differ widely
> between parses of the same DAG file.
> But at least we can have it as "Last Run Number of Queries". That would
> already give some idea, and an engineer can also review the logs
> to see past values.
>
> What are your thoughts?
>
> --
> Eugene
>
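
To make the "number of queries per DAG file" idea concrete, here is a
minimal sketch of counting statements issued while one file is parsed.
It is not how the DAG processor works today: it uses the standard-library
sqlite3 module as a stand-in for the Airflow database, and `count_queries`,
`make_demo_db`, `get_variable`, `parse_dag_file` and `queries_for_parse` are
hypothetical names made up for illustration. In Airflow itself the counter
would presumably hook into SQLAlchemy instead (for example its
`before_cursor_execute` engine event), but that is an assumption.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Count SQL statements executed on `conn` inside the with-block."""
    stats = {"queries": 0}

    def on_statement(_sql):
        # Called by sqlite3 once for every statement it actually executes.
        stats["queries"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield stats
    finally:
        conn.set_trace_callback(None)  # stop counting


def make_demo_db():
    # Hypothetical stand-in for the Airflow metadata database.
    # isolation_level=None avoids implicit BEGIN statements in the count.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE variable (key TEXT PRIMARY KEY, val TEXT)")
    conn.execute("INSERT INTO variable VALUES ('env', 'prod')")
    return conn


def get_variable(conn, key):
    # Hypothetical stand-in for a top-level Variable.get call.
    row = conn.execute(
        "SELECT val FROM variable WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def parse_dag_file(conn):
    # Simulates a DAG file whose top-level code reads a Variable three times.
    return [get_variable(conn, "env") for _ in range(3)]


def queries_for_parse(conn):
    # The value that would feed a "Last Run Number of Queries" column.
    with count_queries(conn) as stats:
        parse_dag_file(conn)
    return stats["queries"]
```

Calling `queries_for_parse(make_demo_db())` counts only the statements
issued during that one simulated parse, so each file would get its own
figure and a file with no top-level database access would report `0`.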
