o-nikolas commented on code in PR #55088:
URL: https://github.com/apache/airflow/pull/55088#discussion_r2330883810


##########
airflow-core/src/airflow/models/deadline.py:
##########
@@ -355,6 +355,52 @@ def _evaluate_with(self, *, session: Session, **kwargs: Any) -> datetime:
 
             return _fetch_from_db(DagRun.queued_at, session=session, **kwargs)
 
+    class AverageRuntimeDeadline(BaseDeadlineReference):
+        """A deadline that calculates the average runtime from past DAG runs."""
+
+        required_kwargs = {"dag_id"}
+
+        @provide_session
+        def _evaluate_with(self, *, session: Session, **kwargs: Any) -> datetime:
+            from airflow.models import DagRun
+
+            dag_id = kwargs["dag_id"]
+            limit = kwargs.get("limit", 10)  # Default to 10 runs if not specified
+
+            # Query for completed DAG runs with both start and end dates
+            # Order by logical_date descending to get most recent runs first
+            query = (
+                select(func.extract("epoch", DagRun.end_date - DagRun.start_date))
+                .filter(DagRun.dag_id == dag_id, DagRun.start_date.isnot(None), DagRun.end_date.isnot(None))
+                .order_by(DagRun.logical_date.desc())
+            )
+
+            # Apply limit (defaults to 10)
+            query = query.limit(limit)
+            logger.info(

Review Comment:
   Is this going to log on each DAG parse? We might want to switch this to debug.
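
   A minimal sketch of the suggested change, using a hypothetical `log_limit` helper to illustrate: emit the message at `DEBUG` so it is suppressed at the default `INFO` level and does not appear on every parse.

   ```python
   import logging

   logger = logging.getLogger(__name__)

   def log_limit(limit: int, dag_id: str) -> str:
       # Suggested change: logger.debug instead of logger.info, so this
       # message is not emitted on every DAG parse at default log levels.
       msg = "Limiting average runtime calculation to latest %d runs for dag_id: %s"
       logger.debug(msg, limit, dag_id)
       return msg % (limit, dag_id)
   ```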



##########
airflow-core/src/airflow/models/deadline.py:
##########
@@ -355,6 +355,52 @@ def _evaluate_with(self, *, session: Session, **kwargs: Any) -> datetime:
 
             return _fetch_from_db(DagRun.queued_at, session=session, **kwargs)
 
+    class AverageRuntimeDeadline(BaseDeadlineReference):
+        """A deadline that calculates the average runtime from past DAG runs."""
+
+        required_kwargs = {"dag_id"}
+
+        @provide_session
+        def _evaluate_with(self, *, session: Session, **kwargs: Any) -> datetime:
+            from airflow.models import DagRun
+
+            dag_id = kwargs["dag_id"]
+            limit = kwargs.get("limit", 10)  # Default to 10 runs if not specified
+
+            # Query for completed DAG runs with both start and end dates
+            # Order by logical_date descending to get most recent runs first
+            query = (
+                select(func.extract("epoch", DagRun.end_date - DagRun.start_date))
+                .filter(DagRun.dag_id == dag_id, DagRun.start_date.isnot(None), DagRun.end_date.isnot(None))
+                .order_by(DagRun.logical_date.desc())
+            )
+
+            # Apply limit (defaults to 10)
+            query = query.limit(limit)
+            logger.info(
+                "Limiting average runtime calculation to latest %d runs for dag_id: %s", limit, dag_id
+            )
+
+            # Get all durations and calculate average
+            durations = session.execute(query).scalars().all()
+
+            if not durations:
+                logger.warning(
+                    "In the AverageRuntimeDeadline, no completed DAG runs found for dag_id: %s, defaulting to 0 seconds",
+                    dag_id,
+                )
+                avg_seconds = 0

Review Comment:
   Doesn't this mean the deadline will fire immediately? I think in this case we should return whatever value is needed to guarantee the deadline does not fire, so a DAG doesn't auto-fail before it has enough completed runs to calculate a good average.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
