tardunge commented on issue #51213:
URL: https://github.com/apache/airflow/issues/51213#issuecomment-2942741734

   @jroachgolf84 I haven't found a fix yet, but I think I'm close to the root 
cause. The coroutine might not be progressing because of a potential deadlock.
   I see the issue more often, and more consistently, after a message from SQS 
has been consumed.
   When a message gets consumed, the trigger yields a `TriggerEvent` and then 
breaks out of the `run` method at 
[L184](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/triggers/sqs.py#L184)
   
   After this, the trigger reaches a completed state, and the main job 
responsible for running the trigger coroutines marks it for removal and adds 
a new instance.
   The new instance gets spawned, and this is where the stalling happens, at 
[L187](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/triggers/sqs.py#L187)
   The get-connection call eventually talks to the supervisor at 
[triggerer_job_runner.py 
L394](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L394).
   
   Meanwhile, there is a busy loop responsible for spawning triggers and 
maintaining their lifecycle at 
[L749](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L749)
 and this loop uses a method called `sync_state_to_supervisor` at 
[L936](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L936).
   
   I strongly suspect that the `GetConnection` request and the main event loop 
are contending for this `LOCK` at 
[L965](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/triggerer_job_runner.py#L965)
 and that this eventually results in a deadlock.
   It would be great to have a deterministic, simulated test case for things 
like this.
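
A minimal sketch of what such a deterministic simulation could look like, assuming the stall comes from one coroutine holding a shared `asyncio.Lock` while it waits for something that only a second coroutine, which needs the same lock, can produce. This is not Airflow's code: `simulate_contention`, `sync_state`, `get_connection`, and the `reply` event are hypothetical stand-ins, and the real cause may turn out to be different.

```python
import asyncio


async def simulate_contention(hold_lock_while_waiting: bool, timeout: float = 0.1) -> str:
    """Return "stalled" if the two coroutines deadlock, otherwise "ok"."""
    lock = asyncio.Lock()    # hypothetical stand-in for the shared LOCK
    reply = asyncio.Event()  # set once the simulated connection reply exists

    async def get_connection() -> None:
        # Stand-in for a trigger requesting a connection: it needs the
        # lock before it can produce the reply.
        async with lock:
            reply.set()

    async def sync_state() -> None:
        # Stand-in for the main loop syncing state to the supervisor.
        if hold_lock_while_waiting:
            # Problem pattern: wait for the reply while holding the lock.
            async with lock:
                await reply.wait()
        else:
            # Safe pattern: wait first, take the lock only afterwards.
            await reply.wait()
            async with lock:
                pass

    # Tasks start in creation order, so sync_state always runs first and
    # the outcome does not depend on timing.
    tasks = [
        asyncio.create_task(sync_state()),
        asyncio.create_task(get_connection()),
    ]
    try:
        await asyncio.wait_for(asyncio.gather(*tasks), timeout)
        return "ok"
    except asyncio.TimeoutError:
        # wait_for cancels the gathered tasks when the timeout expires.
        return "stalled"


if __name__ == "__main__":
    print(asyncio.run(simulate_contention(hold_lock_while_waiting=True)))
    print(asyncio.run(simulate_contention(hold_lock_while_waiting=False)))
```

The timeout turns a silent hang into an assertable result, which is what makes this kind of check usable as a regression test.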


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
