AutomationDev85 opened a new pull request, #46510:
URL: https://github.com/apache/airflow/pull/46510
# Overview
Hi Airflow community,
I'm currently trying to get Airflow metrics exported via OTEL -> Prometheus
into Grafana dashboards. I want to take advantage of OTEL labels to build
better dashboards.
During the implementation I found an issue. I will use the metric
airflow_dag_processing_last_duration to explain the details.
The Airflow code emits this metric in two different ways: the statsd way, with
the file name embedded in the metric name, and the OTEL way, with labels:
Stats.timing(f"dag_processing.last_duration.{file_name}", stat.last_duration)
Stats.timing("dag_processing.last_duration", stat.last_duration,
tags={"file_name": file_name})
The following Prometheus example contains the metric export for two DAGs (dag1,
dag2). The labeled (OTEL) metric is exported for only one DAG, while the metric
with the file name in the metric name is exported for both DAGs.
# HELP airflow_dag_processing_last_duration
# TYPE airflow_dag_processing_last_duration gauge
airflow_dag_processing_last_duration{file_name="dag1",job="Airflow"} 0.293856
# HELP airflow_dag_processing_last_duration_dag1
# TYPE airflow_dag_processing_last_duration_dag1 gauge
airflow_dag_processing_last_duration_dag1{job="Airflow"} 0.293856
# HELP airflow_dag_processing_last_duration_dag2
# TYPE airflow_dag_processing_last_duration_dag2 gauge
airflow_dag_processing_last_duration_dag2{job="Airflow"} 0.343803
I would expect that the metric is also exported like this:
airflow_dag_processing_last_duration{file_name="dag2",job="Airflow"} 0.343803
So I spent some time debugging and found that this issue only affects the
gauge export. If the metric is a counter, the label export works fine.
The issue is that the gauge is created as an ObservableGauge, so OTEL uses a
callback to collect the metric. For this, the OTEL Python library creates
instruments to handle the metric. The downside is that if a second observable
instrument with the same metric name is created, OTEL only creates one
instrument, because it checks only the metric name.
As a result, only the callback for the first registered metric is executed,
and all other metrics with the same name but different labels are ignored.
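To make the failure mode described above easier to follow, here is a minimal sketch in plain Python. It does not use the real OpenTelemetry classes: ToyMeter and its methods are hypothetical stand-ins that only model the assumption that observable instruments are deduplicated by metric name.

```python
from typing import Callable, Dict, List, Tuple

# One observation: (labels, value)
Observation = Tuple[Dict[str, str], float]


class ToyMeter:
    """Hypothetical stand-in for a meter that keys observable gauges by name only."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[[], Observation]] = {}

    def create_observable_gauge(self, name: str, callback: Callable[[], Observation]) -> None:
        # Assumption being modeled: a second registration under the same
        # name is ignored, so only the first callback survives.
        self._callbacks.setdefault(name, callback)

    def collect(self) -> List[Tuple[str, Dict[str, str], float]]:
        # Run each surviving callback once, as a metrics reader would.
        return [(name, *callback()) for name, callback in self._callbacks.items()]


meter = ToyMeter()
meter.create_observable_gauge(
    "dag_processing.last_duration", lambda: ({"file_name": "dag1"}, 0.293856)
)
meter.create_observable_gauge(
    "dag_processing.last_duration", lambda: ({"file_name": "dag2"}, 0.343803)
)

exported = meter.collect()
for name, labels, value in exported:
    print(name, labels, value)
```

In this toy model only the dag1 observation is collected, which matches the Prometheus output shown earlier.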
My idea is to switch to a synchronous gauge export of the metric, as is
already used for the counter export.
I'm not sure why the ObservableGauge was used, but I did not find a way to
fix the issue without switching to the sync gauge export.
I'm also not sure about any downsides of using a sync gauge, such as runtime
cost, but the counter metric export already uses the sync approach. Does anyone
have more know-how in this area and can give feedback on this?
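The idea of a synchronous gauge can be sketched in plain Python as follows. This is not the code from this PR and not the OpenTelemetry API: ToySyncGauge is a hypothetical class that only illustrates storing the last value per combination of metric name and labels, so that values with different labels no longer overwrite or hide each other.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# A series is identified by the metric name plus its full label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class ToySyncGauge:
    """Hypothetical sync gauge: the value is pushed at call time, no callback."""

    def __init__(self) -> None:
        self._values: Dict[SeriesKey, float] = {}

    def set(self, name: str, value: float, tags: Optional[Dict[str, str]] = None) -> None:
        key = (name, frozenset((tags or {}).items()))
        # Last write wins, but only within the same name + label set.
        self._values[key] = value

    def export(self) -> List[str]:
        # Render Prometheus-like lines, sorted for stable output.
        lines = []
        for (name, tags), value in self._values.items():
            labels = ",".join(f'{k}="{v}"' for k, v in sorted(tags))
            lines.append(f"{name}{{{labels}}} {value}")
        return sorted(lines)


gauge = ToySyncGauge()
gauge.set("airflow_dag_processing_last_duration", 0.293856, {"file_name": "dag1"})
gauge.set("airflow_dag_processing_last_duration", 0.343803, {"file_name": "dag2"})

lines = gauge.export()
for line in lines:
    print(line)
```

With this model both dag1 and dag2 appear under the same metric name, which is the export I would expect from the labeled metric.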
# Details of changes:
* Use a normal sync Gauge instead of an ObservableGauge.
* Moved logic to handle gauge into InternalGauge class.
Looking forward to fixing this issue and to your feedback!