Hi.

I would like to discuss a proposal to add a new column to the "DAG File
Processing Stats" table in the DAG processor logs.

Currently, the DAG processor logs contain the following data (screenshot
below), which includes the number of DAGs, the runtime, etc. per DAG file.
[image: image.png]

It would be beneficial to also include there the number of queries made to
the Airflow database while parsing each file.
This would be convenient when debugging issues related to high load on the
Airflow database. A typical scenario is a DAG file (or several) that runs
many database queries in top-level code, so those queries are executed
every time the file is parsed.
One common example is excessive use of "Variable.get" in top-level
statements of DAG files.
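
To make the idea a bit more concrete, here is a rough sketch of how a
per-file count could be collected. It is not how the DAG processor works
today: Airflow talks to its metadata DB through SQLAlchemy, where the natural
hook would probably be a "before_cursor_execute" event listener, but to keep
the sketch self-contained it uses the stdlib sqlite3 trace callback instead.
"variable_get", "parse_dag_file" and the file names are hypothetical stand-ins
(Variable.get is only simulated by a single uncached SELECT).

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn: sqlite3.Connection):
    """Count the SQL statements sqlite3 executes on `conn` inside the block."""
    counter = {"queries": 0}

    def on_statement(_sql: str) -> None:
        counter["queries"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)  # stop counting when parsing finishes


def make_variable_get(conn: sqlite3.Connection):
    """Hypothetical stand-in for Variable.get: one uncached SELECT per call."""

    def variable_get(name: str) -> str:
        row = conn.execute(
            "SELECT val FROM variable WHERE name = ?", (name,)
        ).fetchone()
        return row[0]

    return variable_get


def parse_dag_file(conn, path, source, stats):
    """Run the file's top-level code and record how many queries it made."""
    with count_queries(conn) as counter:
        exec(source, {"variable_get": make_variable_get(conn)})
    stats[path] = counter["queries"]  # "last run" value, overwritten per parse


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (name TEXT PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO variable VALUES (?, ?)", [("env", "prod"), ("owner", "data-team")]
)
conn.commit()

# Hypothetical DAG files; the second one queries the DB in top-level code.
dag_files = {
    "dags/clean.py": "schedule = '@daily'\n",
    "dags/heavy.py": (
        "configs = [variable_get('env') for _ in range(10)]\n"
        "owner = variable_get('owner')\n"
    ),
}

last_run_queries = {}
for path, source in dag_files.items():
    parse_dag_file(conn, path, source, last_run_queries)

for path, n in last_run_queries.items():
    print(f"{path}: {n} queries")
```

Here "dags/clean.py" ends up with 0 queries and "dags/heavy.py" with 11,
which is exactly the kind of outlier the new column would make visible.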

Having the "number of queries to the Airflow database" per DAG file would
help a lot when debugging high load on the database or slow parsing of DAG
files.

One caveat is that the number of queries may differ considerably between
parses of the same DAG file, e.g. because caching is enabled for Variables or
for other reasons (such as dynamically generated DAGs).
Still, we could show it as "Last Run Number of Queries": that alone would
already give some idea, and an engineer could also review older logs to see
how the number changed over time.
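
The caveat can be illustrated with a small sketch. It uses the stdlib
sqlite3 trace callback as a stand-in for counting queries against the
metadata DB, and "CachedVariables" is a hypothetical process-local cache, not
Airflow's actual Variables cache. The same file makes 2 queries on the first
parse and none on the second, so the column would show only the most recent
value, while the printed lines (standing in for DAG processor log entries)
keep the earlier counts.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn: sqlite3.Connection):
    """Count the SQL statements sqlite3 executes on `conn` inside the block."""
    counter = {"queries": 0}

    def on_statement(_sql: str) -> None:
        counter["queries"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)


class CachedVariables:
    """Hypothetical Variable lookup with a process-local cache."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.cache = {}

    def get(self, name: str) -> str:
        if name not in self.cache:  # only a cache miss reaches the database
            row = self.conn.execute(
                "SELECT val FROM variable WHERE name = ?", (name,)
            ).fetchone()
            self.cache[name] = row[0]
        return self.cache[name]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (name TEXT PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO variable VALUES (?, ?)", [("env", "prod"), ("owner", "data-team")]
)
conn.commit()

variables = CachedVariables(conn)  # the cache survives between parses
source = (
    "configs = [variable_get('env') for _ in range(10)]\n"
    "owner = variable_get('owner')\n"
)

last_run_queries = {}  # what the proposed column would show
history = []  # what an engineer could recover from older log lines
for run in (1, 2):
    with count_queries(conn) as counter:
        exec(source, {"variable_get": variables.get})
    last_run_queries["dags/heavy.py"] = counter["queries"]
    history.append(counter["queries"])
    print(f"run {run}: dags/heavy.py last run queries = {counter['queries']}")
```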

What are your thoughts?

-- 
Eugene
