yuseok89 commented on code in PR #67242:
URL: https://github.com/apache/airflow/pull/67242#discussion_r3313074533
##########
airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dags.py:
##########
@@ -308,3 +311,45 @@ def get_latest_run_info(dag_id: str, session: SessionDep)
-> DAGRunLightResponse
latest_run_info = session.execute(latest_run_info_select).one_or_none()
return DAGRunLightResponse(**latest_run_info._mapping) if latest_run_info
else None
+
+
+@dags_router.get(
+ "/run_state_counts",
+ dependencies=[
+ Depends(requires_access_dag(method="GET")),
+ Depends(requires_access_dag(method="GET",
access_entity=DagAccessEntity.RUN)),
+ ],
+ operation_id="get_dag_run_state_counts_ui",
+)
+def get_dag_run_state_counts(
+ session: SessionDep,
+ readable_dags_filter: ReadableDagsFilterDep,
+ dag_ids: Annotated[list[str], Query(min_length=1)],
+ run_after_gte: datetime | None = None,
+) -> DAGsRunStateCountsCollectionResponse:
+ """Return per-Dag DagRun state counts (zero-filled) for the Dag list
page."""
+ permitted_dag_ids = readable_dags_filter.value or set()
+ requested_dag_ids = [dag_id for dag_id in dict.fromkeys(dag_ids) if dag_id
in permitted_dag_ids]
+ counts_by_dag: dict[str, dict[DagRunState, int]] = {
+ dag_id: {state: 0 for state in DagRunState} for dag_id in
requested_dag_ids
+ }
+
+ if requested_dag_ids:
+ count_query = (
+ select(DagRun.dag_id, DagRun.state, func.count().label("cnt"))
+ .where(DagRun.dag_id.in_(requested_dag_ids))
+ .group_by(DagRun.dag_id, DagRun.state)
+ )
+ if run_after_gte is not None:
+ count_query = count_query.where(DagRun.run_after >= run_after_gte)
+ for row in session.execute(count_query):
+ if row.state is None:
+ continue
+ counts_by_dag[row.dag_id][DagRunState(row.state)] = row.cnt
+
Review Comment:
This was a point I hadn't thought of. Thanks for catching it.
It's taking me a bit of time to test this part directly. Getting test
results with actual numbers will take even longer, but there are a few things
I'd like to discuss before then, and I wanted to check with you before heading
further in the wrong direction.
You mentioned 10M, but I think 5M was already enough for a solid test.
<img width="796" height="119" alt="image"
src="https://github.com/user-attachments/assets/64687dc0-76d3-45d0-a4ec-f47f0819b8ae"
/>
That said, a few questions came up while testing. You pointed to the Home
page (which historical_metrics backs) as already capping these counts at 1K+,
but **is that cap actually effective?**
With 5M DagRuns inserted, loading the Home page itself is often pending for
30s+, which really wasn't much different from loading the Dags page with the
unbounded SQL (the initial approach).
The historical_metrics you mentioned looks like it makes 4 calls, one per
state. Similarly, I tried running this Dags state count as 4 queries, one per
state (each UNION-ing up to 50 dag_ids per page), capped at 1K+, and checked
the result.
This too showed no noticeable difference in execution time.
If my testing direction is wrong, I'd appreciate you letting me know.
I'll try to attach the numeric test results within this week.
My question here is: even when there's no meaningful difference like this,
is giving users the limited `1K+` information the right direction? Even if
there were a small difference, I wonder whether there's a better option than
making most users see limited information because of a design built around a
tiny minority of heavy users. What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]