trlopes1974 commented on issue #39717:
URL: https://github.com/apache/airflow/issues/39717#issuecomment-2217421431

   Hmm.
   I'm not sure I agree with framing this as a "configuration-specific" 
problem.
   
   It is clear now that this happens with several configurations: Kubernetes, 
Celery / Redis (we have RabbitMQ).
   Some have clearly stated that changing task_adoption_timeout 
(increasing it to 2h or so) has fixed their issue, and this gives me migraines, 
as it makes no sense (in my mind) how a timeout value can interact with the 
scheduling/executing of tasks. In my last provided logs you can see that after 
10 minutes the task is marked as failed, but there is no evidence that it ever 
left the queued state... could it be some logic failure in the scheduler/worker? 
(I see no concurrency or exhaustion issues on our setup.)
   
   I see 2 different problems in this issue:
   1 - the task is never executed (it is queued but the scheduler does not 
launch it); this is the case where you have an external_task_id but you 
find no reference to it in the worker (celery/flower);
   2 - the task is executed: the worker "tries" or launches it, but something in 
the execution (either in the fork or in the new process) messes up the return 
value of os.waitpid(). The curious part here is that for Airflow the task was 
executed successfully even though we see the failure in celery/flower.
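   
   To make the second case concrete, here is a minimal, generic sketch of how a 
parent process reads a forked child's result from os.waitpid(). It is not 
Airflow or Celery code, and the helper names (describe_wait_status, 
run_in_fork) are made up for illustration. The point is only that the status 
returned by os.waitpid() is an encoded value that has to be decoded before it 
says "success" or "failure", so a mix-up at this step could plausibly make two 
components disagree about the same task.

```python
import os


def describe_wait_status(status: int) -> str:
    """Decode the raw status integer returned by os.waitpid() (Unix only)."""
    if os.WIFEXITED(status):
        # The child ended normally; extract its exit code.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal (for example SIGKILL).
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognised wait status {status}"


def run_in_fork(child_exit_code: int) -> str:
    """Fork a child that exits immediately, then wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child process: exit right away, skipping normal interpreter cleanup.
        os._exit(child_exit_code)
    # Parent process: block until that specific child has finished.
    _, status = os.waitpid(pid, 0)
    return describe_wait_status(status)


if __name__ == "__main__":
    print(run_in_fork(0))
    print(run_in_fork(1))
```

   Note that the raw status is not the exit code itself, so comparing it 
directly against 0 or 1 without decoding is an easy mistake to make.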
   
   
   Yes, it seems that this is one of those bugs that keeps hiding in several 
places, and it will be hard to find.
   The good news is that (in our case) it keeps happening from time to time, 
randomly on different tasks. 
   One curious thing is that, in our case, it is affecting only a few DAGs and 
not others...
   
   
   
   

