We have a `DAG Code` model in the database, which from memory (I haven’t checked
just now) holds the entire source of the file, even if there are multiple DAGs
defined in one file, so we could add the columns to that row.
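A minimal sketch of that idea: keep the per-file parse stats on the single row each file already has. The schema below is a simplified, hypothetical stand-in (not the real `dag_code` table definition), and the `last_parse_*` column names and the `record_parse_stats` helper are illustrative assumptions, not existing Airflow code.

```python
import sqlite3

# Hypothetical, simplified per-file table: one row per DAG file, however many
# DAGs the file defines. The two last_parse_* columns are the proposed additions.
SCHEMA = """
CREATE TABLE dag_code (
    fileloc TEXT PRIMARY KEY,
    source_code TEXT NOT NULL,
    last_parse_query_count INTEGER,
    last_parse_duration_s REAL
)
"""


def record_parse_stats(conn, fileloc, query_count, duration_s):
    """Overwrite the last-parse stats on the file's row; return rows updated."""
    cur = conn.execute(
        "UPDATE dag_code SET last_parse_query_count = ?, last_parse_duration_s = ? "
        "WHERE fileloc = ?",
        (query_count, duration_s, fileloc),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO dag_code (fileloc, source_code) VALUES (?, ?)",
    ("/dags/etl.py", "# two DAGs defined in one file"),
)
conn.commit()

record_parse_stats(conn, "/dags/etl.py", 12, 0.35)
print(conn.execute(
    "SELECT last_parse_query_count FROM dag_code WHERE fileloc = ?",
    ("/dags/etl.py",),
).fetchone()[0])  # prints 12
```

Because only the latest value is kept, storage stays at one value per file rather than a growing log, i.e. the "last count per DAG file" granularity discussed in the thread.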

-ash

> On 14 Jun 2024, at 11:37, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Yep. Per DAG file is what I actually meant :)
> 
> On Fri, Jun 14, 2024 at 12:26 PM Eugen Kosteev <eu...@kosteev.com> wrote:
> 
>> The thing is that it is "last count per DAG file".
>> I do not think we can actually calculate this per DAG. Well, we can split the
>> total number of queries by the number of DAGs in the file, but this may be
>> confusing.
>> 
>> On Fri, Jun 14, 2024 at 12:24 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> 
>>>> the cardinality of those logs is too high.
>>> 
>>> I was thinking about only showing "last count per DAG" - then cardinality
>>> would be "good enough" I think. It could also be exposed via metrics now
>>> that I think of it - no real need to see it in UI or API.
>>> 
>>> On Fri, Jun 14, 2024 at 12:14 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>> 
>>>> Yeah, it is valuable to show it in logs. For showing it in the webserver or
>>>> storing it in the DB, the cardinality of those logs is too high.
>>>> 
>>>> On Fri, 14 Jun 2024 at 11:09, Eugen Kosteev <eu...@kosteev.com> wrote:
>>>> 
>>>>> Yeah, I also think it is a good idea to expose it in the Airflow UI.
>>>>> 
>>>>> Although, atm we do not have an entity such as a DAG file (and this metric is
>>>>> per DAG file) in the Airflow database, so we would need to design it a little
>>>>> bit.
>>>>> And attaching it to the DAG model would not be correct.
>>>>> 
>>>>> But I totally agree, it would be good to have it in the Airflow UI as well for
>>>>> "operations users" to have access to this information.
>>>>> 
>>>>> On Fri, Jun 14, 2024 at 11:22 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>> 
>>>>>> Good idea. It would also be good if we could have the information
>>>>>> exposed in the UI, so that "operations users" can see it and maybe even
>>>>>> act on it, plus an API/CLI to check it. I think in the future Airflow 3,
>>>>>> where we will have task isolation, having `0` queries for all the DAGs
>>>>>> will be a prerequisite for switching to "task isolation" mode, and this
>>>>>> could actually be verified in a migration tool.
>>>>>> 
>>>>>> On Fri, Jun 14, 2024 at 10:59 AM Eugen Kosteev <eu...@kosteev.com> wrote:
>>>>>> 
>>>>>>> Hi.
>>>>>>> 
>>>>>>> I would like to discuss a proposal to add a new column to the "DAG
>>>>>>> File Processing Stats" in the DAG processor logs.
>>>>>>> 
>>>>>>> Currently, the DAG processor logs contain the following data
>>>>>>> (screenshot below), which includes the # of DAGs, runtime, etc. per DAG file.
>>>>>>> [image: image.png]
>>>>>>> 
>>>>>>> It seems it would be beneficial to also include there the number of
>>>>>>> queries made to the Airflow database while parsing each file.
>>>>>>> It may be convenient when debugging issues related to high load on the
>>>>>>> Airflow database. A typical scenario is DAG file(s) that run many
>>>>>>> database queries in top-level code, which are executed every time these
>>>>>>> DAG files are parsed.
>>>>>>> One common example is excessive use of "Variable.get" as top-level
>>>>>>> statements in DAG files.
>>>>>>> 
>>>>>>> Having the "number of queries to the Airflow database" per DAG
>>>>>>> file may help a lot when debugging issues related to high load on the
>>>>>>> database or to long parsing times of DAG files.
>>>>>>> 
>>>>>>> One caveat is that, e.g. due to caching enabled for Variables or for
>>>>>>> other reasons (dynamic DAGs), the number of queries may differ greatly
>>>>>>> between parses of the same DAG file,
>>>>>>> but we can at least show it as "Last Run Number of Queries". That would
>>>>>>> already give some idea, and an engineer can also review historical logs
>>>>>>> to see past values.
>>>>>>> 
>>>>>>> What are your thoughts?
>>>>>>> 
>>>>>>> --
>>>>>>> Eugene
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Eugene
>>>>> 
>>>> 
>>> 
>> 
>> --
>> Eugene
>> 
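A minimal, standard-library sketch of the per-file "number of queries" proposal quoted above, including the caching caveat. It assumes a plain SQLite connection in place of Airflow's metadata DB session, counts executed statements with `sqlite3.Connection.set_trace_callback`, and "parses" a file by executing its top-level code. `VariableStore`, `count_queries`, `count_parse`, and the in-process cache are hypothetical illustrations, not Airflow APIs; Airflow's real Variable caching differs in its details.

```python
import sqlite3
from contextlib import contextmanager


class VariableStore:
    """Hypothetical stand-in for `Variable.get`: each lookup is one DB query,
    unless the optional in-process cache already holds the key."""

    def __init__(self, conn, use_cache=False):
        self.conn = conn
        self.use_cache = use_cache
        self._cache = {}

    def get(self, name):
        if self.use_cache and name in self._cache:
            return self._cache[name]
        row = self.conn.execute(
            "SELECT val FROM variable WHERE name = ?", (name,)
        ).fetchone()
        value = row[0] if row else None
        if self.use_cache:
            self._cache[name] = value
        return value


@contextmanager
def count_queries(conn):
    """Count every SQL statement SQLite executes on `conn` inside the block."""
    counter = {"n": 0}

    def on_statement(_sql):
        counter["n"] += 1

    conn.set_trace_callback(on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)  # stop tracing once the parse is done


def count_parse(source, variables):
    """'Parse' a DAG file by running its top-level code; return the query count."""
    with count_queries(variables.conn) as counter:
        exec(source, {"Variable": variables})
    return counter["n"]


# Top-level Variable.get inside a loop: it runs on every parse of the file.
DAG_FILE = """
for i in range(5):
    bucket = Variable.get("bucket")
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (name TEXT PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO variable VALUES ('bucket', 's3://example')")
conn.commit()

uncached = VariableStore(conn)
cached = VariableStore(conn, use_cache=True)
counts = {
    "no cache": [count_parse(DAG_FILE, uncached) for _ in range(2)],
    "cache": [count_parse(DAG_FILE, cached) for _ in range(2)],
}
print(counts)  # {'no cache': [5, 5], 'cache': [1, 0]}
```

With the cache, the same file reports 1 query on one parse and 0 on the next, which is why a single "Last Run Number of Queries" is only a rough signal. Inside Airflow the counting hook would more likely be a SQLAlchemy engine event such as `before_cursor_execute` than a raw SQLite trace callback.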

