Copilot commented on code in PR #67572:
URL: https://github.com/apache/airflow/pull/67572#discussion_r3306618928


##########
airflow-core/src/airflow/dag_processing/manager.py:
##########
@@ -1566,6 +1571,16 @@ def emit_metrics(*, parse_time: float, dag_file_stats: 
Sequence[DagFileStat]):
     stats.gauge("dag_processing.total_parse_time", parse_time)
     stats.gauge("dagbag_size", sum(stat.num_dags for stat in dag_file_stats))
     stats.gauge("dag_processing.import_errors", sum(stat.import_errors for 
stat in dag_file_stats))
+    # COUNT(*) on the serialized_dag table adds one DB round-trip per parse 
loop.
+    # This can be expensive on large deployments (query plan is DB-dependent 
and
+    # may involve a full table scan).  The call is isolated so that a transient
+    # DB error never kills the parse loop; throttling this metric in the future
+    # is a straightforward follow-up if the round-trip becomes a bottleneck.
+    try:
+        with create_session() as session:
+            stats.gauge("serialized_dag.count", 
SerializedDagModel.get_count(session=session))
+    except SQLAlchemyError:
+        log.exception("Failed to emit serialized_dag.count metric")

Review Comment:
   This adds a DB round-trip (and potentially a full table scan) once per parse 
loop, which can materially increase scheduler/manager load at scale. Consider 
throttling (e.g., emit once per N loops or per time interval), gating behind a 
config/feature flag, or reusing an existing session/connection from the parse 
loop to avoid extra connection churn.



##########
airflow-core/src/airflow/dag_processing/manager.py:
##########
@@ -1566,6 +1571,16 @@ def emit_metrics(*, parse_time: float, dag_file_stats: 
Sequence[DagFileStat]):
     stats.gauge("dag_processing.total_parse_time", parse_time)
     stats.gauge("dagbag_size", sum(stat.num_dags for stat in dag_file_stats))
     stats.gauge("dag_processing.import_errors", sum(stat.import_errors for 
stat in dag_file_stats))
+    # COUNT(*) on the serialized_dag table adds one DB round-trip per parse 
loop.
+    # This can be expensive on large deployments (query plan is DB-dependent 
and
+    # may involve a full table scan).  The call is isolated so that a transient
+    # DB error never kills the parse loop; throttling this metric in the future
+    # is a straightforward follow-up if the round-trip becomes a bottleneck.
+    try:
+        with create_session() as session:
+            stats.gauge("serialized_dag.count", 
SerializedDagModel.get_count(session=session))
+    except SQLAlchemyError:

Review Comment:
   When a DB error occurs, the gauge is not emitted at all, which can leave 
dashboards showing a stale last value and potentially mask ongoing failure. 
Consider also emitting an explicit failure signal (e.g., a counter like 
`serialized_dag.count_error` or a sentinel gauge value) so monitoring can alert 
reliably on missing/failed samples.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to