neel-astro opened a new issue, #50507:
URL: https://github.com/apache/airflow/issues/50507

   ### Apache Airflow version
   
   3.0.0
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   When the max wait time in the task supervisor's subprocess monitoring loop 
drops to 0 (i.e. the last task process heartbeat happened a long time ago), 
CPU usage shoots up to 100%. This might also be a side effect of #50500, which 
causes the supervisor to run indefinitely after the task process has finished 
(as a result, `HEARTBEAT_TIMEOUT - last_heartbeat_ago * 0.75` would be < 0 and 
thus the wait time gets set to 0).
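
A minimal sketch of how the wait time collapses to 0, based on the expression quoted above. The function name, the cap at `min_heartbeat_interval`, and the sample values are assumptions for illustration, not a copy of the supervisor code:

```python
def max_wait_time(heartbeat_timeout, last_heartbeat_ago, min_heartbeat_interval):
    """Hypothetical helper mirroring the expression described in this issue."""
    # Time left before 75% of the heartbeat timeout has elapsed.
    remaining = heartbeat_timeout - last_heartbeat_ago * 0.75
    # Assumed cap at the heartbeat interval, then clamped so it is never negative.
    return max(0, min(remaining, min_heartbeat_interval))


# Recent heartbeat: the wait is capped by the interval.
fresh = max_wait_time(heartbeat_timeout=300, last_heartbeat_ago=1, min_heartbeat_interval=5)

# Stale heartbeat: the subtraction goes negative and the clamp yields 0.
stale = max_wait_time(heartbeat_timeout=300, last_heartbeat_ago=500, min_heartbeat_interval=5)
```

Once the clamp returns 0, every later iteration also gets 0 for as long as the heartbeat stays stale.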
   
   When `selector.select` is called with a timeout of 0, it runs in 
non-blocking mode: it reports the file objects that are currently ready and 
returns immediately even if nothing is ready. Because `selector.select` is 
called in a tight while loop in monitor_subprocess, the CPU usage spikes to 
100%. Reference: 
https://docs.python.org/3/library/selectors.html#selectors.BaseSelector.select 
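
The busy-polling behaviour can be observed outside Airflow with a small standalone sketch. `count_polls` is a hypothetical helper written for this illustration; it only uses the standard `selectors`, `socket`, and `time` modules and polls an idle socket pair:

```python
import selectors
import socket
import time


def count_polls(timeout: float, duration: float) -> int:
    """Count how often select() returns within `duration` seconds on an idle socket."""
    sel = selectors.DefaultSelector()
    reader, writer = socket.socketpair()
    sel.register(reader, selectors.EVENT_READ)
    polls = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # Nothing is ever written to `writer`, so no event becomes ready.
            sel.select(timeout=timeout)
            polls += 1
    finally:
        sel.unregister(reader)
        sel.close()
        reader.close()
        writer.close()
    return polls


# With timeout=0 the loop never blocks and spins for the whole window.
busy_polls = count_polls(timeout=0, duration=0.2)
```

On a typical machine `busy_polls` is in the thousands or more for a 0.2 second window, which is the pattern that pins a CPU core.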
   
   Code reference: 
https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L886-L895
   
   ### What you think should happen instead?
   
   CPU usage should not spike to 100% in that edge case.
   
   ### How to reproduce
   
   Set `task_instance_heartbeat_timeout` to half of `min_heartbeat_interval`, 
so that the max wait time ends up being 0. Observe the CPU usage during task 
execution.
   
   ### Operating System
   
   Debian GNU/Linux 12
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   Setting the lower bound of the max_wait_time to `0.1` instead of `0` 
(https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/supervisor.py#L889)
 seems to resolve the underlying issue.
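
A rough sketch of that workaround, assuming the wait time is computed as a clamped difference as described earlier in this issue. The helper name, the `floor` parameter, and the sample values are hypothetical:

```python
MIN_WAIT = 0.1  # lower bound proposed in this issue, in seconds


def max_wait_time_with_floor(
    heartbeat_timeout, last_heartbeat_ago, min_heartbeat_interval, floor=MIN_WAIT
):
    """Hypothetical variant of the wait-time calculation with a non-zero floor."""
    remaining = heartbeat_timeout - last_heartbeat_ago * 0.75
    # Clamp to `floor` instead of 0 so select() always blocks for a short time.
    return max(floor, min(remaining, min_heartbeat_interval))


# Stale heartbeat: the wait is now 0.1 seconds rather than 0.
stale_wait = max_wait_time_with_floor(300, 500, 5)
```

With a floor of 0.1 seconds, an idle loop wakes up roughly ten times per second at most instead of spinning continuously.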
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


