[ 
https://issues.apache.org/jira/browse/AIRFLOW-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Davidheiser updated AIRFLOW-2827:
---------------------------------------
    Issue Type: Bug  (was: Wish)

> Tasks that fail with spurious Celery issues are not retried
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-2827
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: James Davidheiser
>            Priority: Major
>
> We have a DAG with ~500 tasks, running on Airflow in Kubernetes with 
> RabbitMQ, using a setup derived heavily from 
> [https://github.com/mumoshu/kube-airflow].  Occasionally, we hit 
> spurious Celery execution failures (possibly related to #2011), resulting in 
> the worker throwing errors that look like this:
>  
> ```
> [2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6] raised unexpected: AirflowException('Celery command failed',)
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, in trace_task
>     R = retval = fun(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, in __protected_call__
>     return self.run(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 55, in execute_command
>     raise AirflowException('Celery command failed')
> AirflowException: Celery command failed
> ```
>  
> When these tasks fail, they send a "task failed" email that has very little 
> information about the state of the task failure.  The logs for the task run 
> are empty, because the task never actually did anything and the error message 
> was generated by the worker.  Also, the task does not retry, so if something 
> goes wrong with Celery, the task simply fails outright instead of trying 
> again.
>  
> This may be the same issue reported in #1844, but I am not sure because there 
> is not much detail there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
