minnieshi edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-821258217


   **My situation**
   - Kubernetes
   - Airflow 1.10.14
   - Celery executor
   - only 1 DAG is 'on'; the other 20 DAGs are 'off'
   - the DAG itself is correct, as it works in another environment
   - pool (`default_pool`): 32 slots, 0 used slots, 1 queued slot
   - tasks in the DAG can be run manually (by clearing them), but the next task does not run automatically
   - in one case, after restarting the scheduler manually (the restart configuration is set to never; `schedulerNumRuns` is set to -1), it ran 3 out of 4 tasks, and the last one just got **stuck in the 'queued' state**
   - after that, I uploaded the DAG with a different name and a different ID; the 1st task of the DAG got **stuck in the 'scheduled' state** after clearing it
   - when checking the scheduler log, it shows an error like this:
   ```
   [2021-04-16 13:06:36,392] {celery_executor.py:282} ERROR - Error fetching Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 3497') Celery Task ID: ('XXXXXXXX_YYY_test', 'Task_blahblahblah', datetime.datetime(2021, 4, 15, 3, 0, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1)
   ```
   
   - I reported this at https://github.com/helm/charts/issues/19399, but found that issue is already closed.
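   One thing I plan to try next (an assumption on my part, not something I have confirmed against the 1.10.14 docs): the scheduler's per-task Celery state fetch appears to be bounded by the `[celery] operation_timeout` setting, which defaults to 1 second, so raising it in `airflow.cfg` might avoid these `AirflowTaskTimeout` errors when the result backend is slow to answer:

   ```ini
   # airflow.cfg -- assumption: the [celery] operation_timeout option exists in
   # this 1.10.x release and bounds each state fetch; 10.0 is an arbitrary example.
   [celery]
   operation_timeout = 10.0
   ```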
   
   **I tried experimenting with the ...., which did not help as I expected.**
   - Uploaded the DAG with a new name/ID, enabled it, and cleared it (otherwise the 1st task just got stuck in the 'queued' state); the 1st task then got stuck in the 'scheduled' state.
   - checked the scheduler log:
   ```
   [2021-04-16 15:58:51,991] {celery_executor.py:282} ERROR - Error fetching Celery task state, ignoring it:AirflowTaskTimeout('Timeout, PID: 1851')
   Celery Task ID: ('XXXXXX_min_test_3', 'Load_XXXX_to_YYYY', datetime.datetime(2021, 4, 15, 3, 0, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1)
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 117, in fetch_celery_task_state
       res = (celery_task[0], celery_task[1].state)
   ```
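From the traceback, the error comes from a hard timeout wrapped around each Celery state fetch in the scheduler. A minimal sketch of that mechanism (my own simplified reconstruction, not Airflow's actual code; the `timeout` and `OperationTimeout` names are illustrative):

```python
import signal
import time

class OperationTimeout(Exception):
    """Stand-in for airflow.exceptions.AirflowTaskTimeout."""

class timeout:
    """Simplified SIGALRM-based guard, like the one the scheduler
    wraps around each Celery state fetch (names are illustrative)."""

    def __init__(self, seconds=1):
        self.seconds = seconds

    def _handle(self, signum, frame):
        raise OperationTimeout(f"Timeout after {self.seconds}s")

    def __enter__(self):
        signal.signal(signal.SIGALRM, self._handle)
        signal.alarm(self.seconds)

    def __exit__(self, *exc):
        signal.alarm(0)  # cancel the pending alarm

def fetch_celery_task_state():
    time.sleep(2)          # simulates a result backend that answers too slowly
    return "SUCCESS"

try:
    with timeout(seconds=1):
        state = fetch_celery_task_state()
except OperationTimeout:
    state = None           # the scheduler logs the error and ignores the state

print(state)  # None -- so the task instance never leaves 'queued'/'scheduled'
```

This would explain why the tasks stay stuck: the state fetch is abandoned every scheduler loop, so the task's real Celery state is never observed.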
   
   
   P.S. The environment is a new setup, with the tables listed below migrated from the old environment. While debugging this stuck situation, the 'dag', 'task_*', and 'celery_*' tables had been truncated.
   ```
   - celery_taskmeta
   - dag
   - dag_run
   - log
   - task_fail
   - task_instance
   - task_reschedule
   - connections
   ```
   
   The DAG log itself is empty, since the task was never executed.
   The worker shows no errors; I will attach its log anyway.
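For anyone else hitting this: the stuck state can also be confirmed straight from the metadata DB. A sketch using an in-memory SQLite stand-in for the real `task_instance` table (my environment uses a real RDBMS, and the actual table has many more columns; this only shows the shape of the query):

```python
import sqlite3

# In-memory stand-in for the Airflow metadata DB; the real `task_instance`
# table has many more columns -- these are just the ones relevant here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("XXXXXX_min_test_3", "Load_XXXX_to_YYYY", "scheduled"),  # the stuck task
        ("XXXXXX_min_test_3", "some_other_task", "success"),
    ],
)

# A query like this lists tasks that are stuck before execution:
stuck = conn.execute(
    "SELECT dag_id, task_id, state FROM task_instance "
    "WHERE state IN ('queued', 'scheduled')"
).fetchall()
print(stuck)  # [('XXXXXX_min_test_3', 'Load_XXXX_to_YYYY', 'scheduled')]
```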
   
[log-workers.txt](https://github.com/apache/airflow/files/6327151/log-workers.txt)

