dlesco opened a new issue, #35806:
URL: https://github.com/apache/airflow/issues/35806

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   The triggerer emits "Triggerer's async thread was blocked for x.xx seconds" 
messages even on a cluster where none of the DAGs have been updated to use 
deferred mode for operators, so there is no load on the triggerer.
   
   I have a detailed analysis of why.
   
   In the "Anything Else" section, I am including a series of logs showing the 
problem, under differing conditions. But they all share the same condition that 
there is no load on the triggerer.
   
   The issue is the triggerer's `block_watchdog` async function, which 
hardcodes a timeout of 0.2 seconds for consecutive runs of the async function.  
This especially does not play well if you set in kubernetes 
`spec.template.spec.containers[].resources.limits.cpu`.  For Google Composer, 
all pods on GKE Autopilot must have a cpu limit set, but it's an issue for any 
k8s deployment that sets a low cpu limit.
   
   The k8s `resources.limits.cpu` feature makes use of the Linux kernel's 
cgroups cpu throttling feature.  There is a cpu `period`, typically 100ms, and 
if all the threads in a container consume more than their allotted cpu time, 
the container is not scheduled onto a cpu for the rest of the period.  So, if 
you have a cpu limit of 0.5, and over the course of 25ms, 50ms of cpu time is 
used by the threads in the container, then the container will not run for the 
next 75ms.  This introduces latency into the threads.
   
   An article about it:
   https://medium.com/@ramandumcs/cpu-throttling-unbundled-eae883e7e494
   
   This is not the only problem.  If multiple threads share one cpu, then the 
context-switching to the other threads also introduces latency into the 
threads.  For the triggerer container, we have these threads of execution:
   
   1. The thread running the async loop.
   2. The main thread, running the `_run_trigger_loop` function.
   3. The kubernetes configured livenessProbe for the container.
   
   A hardcoded 200ms is a "real-time" constraint.  In order to satisfy such a 
real-time constraint, you would actually need 3 cpu-threads, so that the other 
concurrent threads do not introduce latencies, and you'd need no cpu throttling 
enabled.
   
   But we're doing batch jobs, not real-time stock trading or something.
   
   
   
   ### What you think should happen instead
   
   Since Airflow is a batch job processing system, we should not have expensive 
real-time constraints that require you to dedicate multiple CPUs to the 
triggerer.  Instead:
   
   1. Make the block_watchdog timeout configurable in airflow.cfg
   2. Have a default on the order of seconds, not hundreds of milliseconds.
   3. In the Helm chart, add comments about how the triggerer cpu limit affects 
triggerer block_watchdog latency.
   
   
   ### How to reproduce
   
   Have the helm chart values set a triggerer limit of 0.5 or 1.0 cpu, and let 
it run for a day. You should see similar log messages as I'm including.
   
   
   ### Operating System
   
   NAME="Ubuntu" VERSION="20.04.6 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian 
PRETTY_NAME="Ubuntu 20.04.6 LTS" VERSION_ID="20.04" 
HOME_URL="https://www.ubuntu.com/"; SUPPORT_URL="https://help.ubuntu.com/"; 
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"; 
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy";
 VERSION_CODENAME=focal UBUNTU_CODENAME=focal
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   composer-2-4-6-airflow-2-6-3
   
   ### Anything else
   
   #### 0.5 CPU limit
   ```json
   {
     "timestamp": "2023-11-10T00:43:09.627748420Z",
     "textPayload": "Triggerer's async thread was blocked for 1.33 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T03:23:03.509407823Z",
     "textPayload": "Triggerer's async thread was blocked for 1.11 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T05:19:03.527455640Z",
     "textPayload": "Triggerer's async thread was blocked for 0.81 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T07:15:46.782236976Z",
     "textPayload": "Triggerer's async thread was blocked for 0.45 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T09:12:34.504077450Z",
     "textPayload": "Triggerer's async thread was blocked for 0.42 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T11:09:25.525505471Z",
     "textPayload": "Triggerer's async thread was blocked for 0.46 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T13:06:16.307688459Z",
     "textPayload": "Triggerer's async thread was blocked for 0.96 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T15:28:10.720029804Z",
     "textPayload": "Triggerer's async thread was blocked for 1.29 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T17:24:15.113605243Z",
     "textPayload": "Triggerer's async thread was blocked for 0.81 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   ```
   
   #### 1.0 cpu limit
   ```json
   {
     "timestamp": "2023-11-10T19:50:40.023219655Z",
     "textPayload": "Triggerer's async thread was blocked for 0.57 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T21:42:45.592743140Z",
     "textPayload": "Triggerer's async thread was blocked for 0.22 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-10T23:35:31.606070073Z",
     "textPayload": "Triggerer's async thread was blocked for 0.21 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T01:28:24.456517186Z",
     "textPayload": "Triggerer's async thread was blocked for 0.44 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T03:21:01.439348532Z",
     "textPayload": "Triggerer's async thread was blocked for 0.30 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T05:13:32.570751960Z",
     "textPayload": "Triggerer's async thread was blocked for 0.21 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T07:05:51.271996064Z",
     "textPayload": "Triggerer's async thread was blocked for 0.24 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T10:50:37.726952059Z",
     "textPayload": "Triggerer's async thread was blocked for 0.21 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T12:43:03.944972600Z",
     "textPayload": "Triggerer's async thread was blocked for 0.24 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T14:35:29.019331992Z",
     "textPayload": "Triggerer's async thread was blocked for 0.34 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T16:28:20.008302527Z",
     "textPayload": "Triggerer's async thread was blocked for 0.62 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T22:06:00.143154881Z",
     "textPayload": "Triggerer's async thread was blocked for 0.23 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-11T23:58:26.327818028Z",
     "textPayload": "Triggerer's async thread was blocked for 0.41 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T03:43:12.151969113Z",
     "textPayload": "Triggerer's async thread was blocked for 0.59 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T05:35:38.605106362Z",
     "textPayload": "Triggerer's async thread was blocked for 0.24 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T07:28:09.902736229Z",
     "textPayload": "Triggerer's async thread was blocked for 0.23 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T09:20:40.018579247Z",
     "textPayload": "Triggerer's async thread was blocked for 0.22 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T11:13:12.791422256Z",
     "textPayload": "Triggerer's async thread was blocked for 0.78 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T13:05:43.599749227Z",
     "textPayload": "Triggerer's async thread was blocked for 0.29 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T16:50:34.846483498Z",
     "textPayload": "Triggerer's async thread was blocked for 0.24 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-12T18:43:07.569643303Z",
     "textPayload": "Triggerer's async thread was blocked for 0.24 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   ```
   
   #### 1.0 cpu, livenessProbe removed from the Deployment
   ```json
   {
     "timestamp": "2023-11-12T22:16:11.391819276Z",
     "textPayload": "Triggerer's async thread was blocked for 0.37 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T00:07:53.229342739Z",
     "textPayload": "Triggerer's async thread was blocked for 0.24 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T03:53:02.842909571Z",
     "textPayload": "Triggerer's async thread was blocked for 0.26 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T05:45:36.218492337Z",
     "textPayload": "Triggerer's async thread was blocked for 0.33 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T07:38:00.747163337Z",
     "textPayload": "Triggerer's async thread was blocked for 0.20 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T09:30:33.190422037Z",
     "textPayload": "Triggerer's async thread was blocked for 0.25 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T11:23:15.590603839Z",
     "textPayload": "Triggerer's async thread was blocked for 0.27 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T15:08:49.891520932Z",
     "textPayload": "Triggerer's async thread was blocked for 0.25 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   {
     "timestamp": "2023-11-13T18:54:06.744651454Z",
     "textPayload": "Triggerer's async thread was blocked for 0.21 seconds, 
likely due to the highly utilized environment. In case of Warnings informing 
you about slow callback consider visiting 
https://cloud.google.com/composer/docs/composer-2/troubleshooting-triggerer.";
   }
   ```
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to