[ https://issues.apache.org/jira/browse/AIRFLOW-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Davidheiser updated AIRFLOW-2827:
---------------------------------------
    Issue Type: Bug  (was: Wish)

> Tasks that fail with spurious Celery issues are not retried
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-2827
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: James Davidheiser
>            Priority: Major
>
> We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with
> RabbitMQ using a setup derived pretty heavily from
> [https://github.com/mumoshu/kube-airflow]. Occasionally, we will hit some
> spurious Celery execution failures (possibly related to #2011), resulting in
> the Worker throwing errors that look like this:
>
> ```
> [2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6] raised unexpected: AirflowException('Celery command failed',)
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, in trace_task
>     R = retval = fun(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, in __protected_call__
>     return self.run(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 55, in execute_command
>     raise AirflowException('Celery command failed')
> AirflowException: Celery command failed
> ```
>
> When these tasks fail, they send a "task failed" email that has very little
> information about the state of the task failure. The logs for the task run
> are empty, because the task never actually did anything and the error message
> was generated by the worker. Also, the task does not retry, so if something
> goes wrong with Celery, the task simply fails outright instead of trying
> again.
>
> This may be the same issue reported in #1844, but I am not sure because there
> is not much detail there.
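
To make the reported failure mode concrete, here is a minimal sketch of how a worker-side wrapper could produce the error in the traceback above. It is not the Airflow source: it assumes the wrapper runs the task command as a subprocess and turns any non-zero exit into a generic exception, which the traceback is consistent with but does not show. The `AirflowException` class below is a local stand-in so the example runs without Airflow installed, and the failing command is made up for illustration.

```python
import subprocess
import sys


class AirflowException(Exception):
    """Local stand-in for Airflow's exception class (assumption for this sketch)."""


def execute_command(command):
    """Run a task command and collapse any failure into one generic error.

    Assumed shape of the worker wrapper: the exit status of the child
    process is the only signal, so the exception raised here carries no
    detail about what actually went wrong.
    """
    try:
        subprocess.check_call(command)
    except subprocess.CalledProcessError:
        raise AirflowException('Celery command failed')


# Hypothetical failing task command: a child process that exits with status 1.
failing_command = [sys.executable, '-c', 'import sys; sys.exit(1)']

try:
    execute_command(failing_command)
except AirflowException as exc:
    message = str(exc)
    print(message)
```

Under these assumptions the only information that reaches the caller is the fixed message string, which matches the report that the failure email and task logs contain almost no detail.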