[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089818#comment-16089818
 ] 

Rick Otten edited comment on AIRFLOW-401 at 7/17/17 1:28 PM:
-------------------------------------------------------------

According to the logs, ever since we set it up, it restarts itself regularly.  
Before we had a lot of tasks, it was more often than since we added a lot of 
tasks.  This is what the logs look like during a routine restart (which cleans 
up all of the dead children):


{code:none}
Jul 15 12:54:38 myserver airflow[4467]: [2017-07-15 12:54:38,253] 
{jobs.py:1313} INFO - Exited execute loop
Jul 15 12:54:43 myserver systemd[1]: airflow-scheduler.service: Service 
hold-off time over, scheduling restart.
Jul 15 12:54:43 myserver systemd[1]: Stopped Airflow scheduler daemon.
Jul 15 12:54:43 myserver systemd[1]: Started Airflow scheduler daemon.
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,364] 
{__init__.py:57} INFO - Using executor LocalExecutor
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,450] 
{driver.py:120} INFO - Generating grammar tables from 
/usr/lib/python3.5/lib2to3/Grammar.txt
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,476] 
{driver.py:120} INFO - Generating grammar tables from 
/usr/lib/python3.5/lib2to3/PatternGrammar.txt
Jul 15 12:54:44 myserver airflow[20289]:   ____________       _____________
Jul 15 12:54:44 myserver airflow[20289]:  ____    |__( )_________  __/__  
/________      __
Jul 15 12:54:44 myserver airflow[20289]: ____  /| |_  /__  ___/_  /_ __  /_  __ 
\_ | /| / /
Jul 15 12:54:44 myserver airflow[20289]: ___  ___ |  / _  /   _  __/ _  / / /_/ 
/_ |/ |/ /
Jul 15 12:54:44 myserver airflow[20289]:  _/_/  |_/_/  /_/    /_/    /_/  
\____/____/|__/
Jul 15 12:54:44 myserver airflow[20289]:  
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,777] 
{jobs.py:1264} INFO - Starting the scheduler
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1280} INFO - Processing files using up to 4 processes at a time
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1282} INFO - Running execute loop for -1 seconds
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1284} INFO - Processing each file at most 100 times
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1286} INFO - Process each file at most once every 120 seconds
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1288} INFO - Checking for new files in /home/airflow/airflow/dags 
every 300 seconds
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1291} INFO - Searching for files in /home/airflow/airflow/dags
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,779] 
{jobs.py:1294} INFO - There are 20 files in /home/airflow/airflow/dags
Jul 15 12:54:44 myserver airflow[20289]: 
/home/airflow/airflow/lib/python3.5/site-packages/airflow/www/app.py:23: 
FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to 
"CSRFProtect" and will be removed in 1.0.
Jul 15 12:54:44 myserver airflow[20289]:   csrf = CsrfProtect()
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,994] 
{jobs.py:1356} INFO - Resetting state for orphaned tasks
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,008] 
{jobs.py:1366} INFO - Resetting backup_clarivoypg 2017-07-14 06:00:00
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,018] 
{jobs.py:1366} INFO - Resetting daily_web_response_processor 2017-07-14 09:00:00
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,023] 
{jobs.py:1366} INFO - Resetting ping_pgbouncers 2017-07-15 09:00:00
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,027] 
{jobs.py:1407} INFO - Heartbeating the process manager
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,032] 
{dag_processing.py:627} INFO - Started a process (PID: 20462) to generate tasks 
for /home/airflow/airflow/dags/my_daily.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_daily.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,036] 
{dag_processing.py:627} INFO - Started a process (PID: 20463) to generate tasks 
for /home/airflow/airflow/dags/my_hourly.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_hourly.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,042] 
{dag_processing.py:627} INFO - Started a process (PID: 20464) to generate tasks 
for /home/airflow/airflow/dags/my_odd_job.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_odd_job.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,047] 
{dag_processing.py:627} INFO - Started a process (PID: 20465) to generate tasks 
for /home/airflow/airflow/dags/my_other_odd_job.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_other_odd_job.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,049] 
{jobs.py:1443} INFO - Heartbeating the executor
{code}



was (Author: rotten):
According to the logs, ever since we set it up, it restarts itself regularly.  
Before we had a lot of tasks, it was more often than since we added a lot of 
tasks.  This is what the logs look like during a routine restart (which cleans 
up all of the dead children):


{code:plain text}
Jul 15 12:54:38 myserver airflow[4467]: [2017-07-15 12:54:38,253] 
{jobs.py:1313} INFO - Exited execute loop
Jul 15 12:54:43 myserver systemd[1]: airflow-scheduler.service: Service 
hold-off time over, scheduling restart.
Jul 15 12:54:43 myserver systemd[1]: Stopped Airflow scheduler daemon.
Jul 15 12:54:43 myserver systemd[1]: Started Airflow scheduler daemon.
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,364] 
{__init__.py:57} INFO - Using executor LocalExecutor
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,450] 
{driver.py:120} INFO - Generating grammar tables from 
/usr/lib/python3.5/lib2to3/Grammar.txt
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,476] 
{driver.py:120} INFO - Generating grammar tables from 
/usr/lib/python3.5/lib2to3/PatternGrammar.txt
Jul 15 12:54:44 myserver airflow[20289]:   ____________       _____________
Jul 15 12:54:44 myserver airflow[20289]:  ____    |__( )_________  __/__  
/________      __
Jul 15 12:54:44 myserver airflow[20289]: ____  /| |_  /__  ___/_  /_ __  /_  __ 
\_ | /| / /
Jul 15 12:54:44 myserver airflow[20289]: ___  ___ |  / _  /   _  __/ _  / / /_/ 
/_ |/ |/ /
Jul 15 12:54:44 myserver airflow[20289]:  _/_/  |_/_/  /_/    /_/    /_/  
\____/____/|__/
Jul 15 12:54:44 myserver airflow[20289]:  
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,777] 
{jobs.py:1264} INFO - Starting the scheduler
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1280} INFO - Processing files using up to 4 processes at a time
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1282} INFO - Running execute loop for -1 seconds
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1284} INFO - Processing each file at most 100 times
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1286} INFO - Process each file at most once every 120 seconds
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1288} INFO - Checking for new files in /home/airflow/airflow/dags 
every 300 seconds
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,778] 
{jobs.py:1291} INFO - Searching for files in /home/airflow/airflow/dags
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,779] 
{jobs.py:1294} INFO - There are 20 files in /home/airflow/airflow/dags
Jul 15 12:54:44 myserver airflow[20289]: 
/home/airflow/airflow/lib/python3.5/site-packages/airflow/www/app.py:23: 
FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to 
"CSRFProtect" and will be removed in 1.0.
Jul 15 12:54:44 myserver airflow[20289]:   csrf = CsrfProtect()
Jul 15 12:54:44 myserver airflow[20289]: [2017-07-15 12:54:44,994] 
{jobs.py:1356} INFO - Resetting state for orphaned tasks
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,008] 
{jobs.py:1366} INFO - Resetting backup_clarivoypg 2017-07-14 06:00:00
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,018] 
{jobs.py:1366} INFO - Resetting daily_web_response_processor 2017-07-14 09:00:00
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,023] 
{jobs.py:1366} INFO - Resetting ping_pgbouncers 2017-07-15 09:00:00
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,027] 
{jobs.py:1407} INFO - Heartbeating the process manager
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,032] 
{dag_processing.py:627} INFO - Started a process (PID: 20462) to generate tasks 
for /home/airflow/airflow/dags/my_daily.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_daily.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,036] 
{dag_processing.py:627} INFO - Started a process (PID: 20463) to generate tasks 
for /home/airflow/airflow/dags/my_hourly.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_hourly.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,042] 
{dag_processing.py:627} INFO - Started a process (PID: 20464) to generate tasks 
for /home/airflow/airflow/dags/my_odd_job.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_odd_job.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,047] 
{dag_processing.py:627} INFO - Started a process (PID: 20465) to generate tasks 
for /home/airflow/airflow/dags/my_other_odd_job.py - logging into 
/home/airflow/airflow/logs/scheduler/2017-07-15/my_other_odd_job.py.log
Jul 15 12:54:45 myserver airflow[20289]: [2017-07-15 12:54:45,049] 
{jobs.py:1443} INFO - Heartbeating the executor
{code}


> scheduler gets stuck without a trace
> ------------------------------------
>
>                 Key: AIRFLOW-401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-401
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Nadeem Ahmed Nazeer
>            Assignee: Bolke de Bruin
>            Priority: Minor
>              Labels: celery, kombu
>         Attachments: Dag_code.txt, schduler_cpu100%.png, 
> scheduler_stuck_7hours.png, scheduler_stuck.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of scheduler service is at 100%. No jobs get submitted and everything 
> comes to a halt. Looks it goes into some kind of infinite loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both Celery and Local executors but same issue occurs. I am using 
> the -n 3 parameter while starting scheduler. 
> Scheduler configs,
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to