1fanwang opened a new issue, #66800:
URL: https://github.com/apache/airflow/issues/66800

   ### Description
   
   `scheduler_job_runner.py` emits gauges for pool slot states 
(`pool.open_slots`, `pool.queued_slots`, `pool.running_slots`, 
`pool.starving_tasks`). On most backends, gauges are last-write-wins: a spike 
in pool pressure that falls between two scrapes is overwritten by later 
emissions, so each scrape reports a single value and the distribution between 
scrapes is lost.
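
To illustrate the loss, here is a minimal sketch that assumes a backend keeping only the last gauge write per scrape interval. The sample values are made up for illustration and are not taken from a real scheduler.

```python
import statistics

# Hypothetical pool.running_slots values emitted on successive scheduler
# loops within a single scrape interval (made-up numbers).
samples = [4, 5, 32, 31, 6, 5]

# Last-write-wins gauge: the scraper only ever sees the final value,
# so the spike to 32 is invisible.
gauge_value = samples[-1]

# A histogram-style aggregate over the same samples keeps the spread.
summary = {
    "min": min(samples),
    "max": max(samples),
    "median": statistics.median(samples),
}

print(gauge_value, summary)
```

In this sketch the gauge reports 5 while the same interval peaked at 32, which is the kind of spread the issue is about.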
   
   ### Use case / motivation
   
   Operators sizing pools want the p50/p95/p99 of pool utilization, not 
just point-in-time gauge samples. Today the gauges alone give no way to see 
the spread.
   
   ### Proposal
   
   Alongside each existing pool slot gauge emission, also emit a histogram with 
the same value. This amounts to four `Stats.histogram(...)` additions in 
`scheduler_job_runner.py`, at the same call sites as the existing gauges. 
Nothing is removed: the gauges stay for backwards-compatible scrapers.
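
A rough sketch of the dual emission, under stated assumptions: `RecordingStats` is a hypothetical in-memory stand-in rather than Airflow's `Stats` class, `emit_pool_metrics` and the `slot_stats` dict are illustrative names, and both the per-pool name suffix and reusing the gauge's metric name for the histogram are assumptions, not something the proposal specifies.

```python
from collections import defaultdict


class RecordingStats:
    """Hypothetical in-memory metrics client; NOT Airflow's Stats class."""

    def __init__(self):
        self.gauges = {}
        self.histograms = defaultdict(list)

    def gauge(self, name, value):
        # Last write wins: earlier values for this name are overwritten.
        self.gauges[name] = value

    def histogram(self, name, value):
        # Every sample is retained so percentiles can be computed later.
        self.histograms[name].append(value)


# The four pool metrics named in the issue.
POOL_METRICS = ("open_slots", "queued_slots", "running_slots", "starving_tasks")


def emit_pool_metrics(stats, pool_name, slot_stats):
    """Emit each pool metric as a gauge and, alongside it, as a histogram."""
    for key in POOL_METRICS:
        # Assumption: metric names carry a per-pool suffix.
        name = f"pool.{key}.{pool_name}"
        value = slot_stats[key]
        stats.gauge(name, value)      # existing behaviour, kept as-is
        stats.histogram(name, value)  # proposed addition, same value


stats = RecordingStats()
# Two scheduler loops: a spike followed by a quiet iteration.
emit_pool_metrics(stats, "default_pool", {
    "open_slots": 2, "queued_slots": 10, "running_slots": 30, "starving_tasks": 3,
})
emit_pool_metrics(stats, "default_pool", {
    "open_slots": 28, "queued_slots": 0, "running_slots": 4, "starving_tasks": 0,
})
```

After the two calls, the gauge for `running_slots` holds only the second value, while the histogram holds both samples.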
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's Code of Conduct
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
