[jira] [Comment Edited] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930761#comment-16930761
 ] 

Daniel Imberman edited comment on AIRFLOW-5447 at 9/16/19 6:32 PM:
---

OK, so I've broken down the currently running threads in the hope that this helps us out
  

Thread 1: attempting to put a new task in the task_queue 
{code:java}
Thread 0x7f0c13c7d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in 
    args.func(args)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", 
line 74, in wrapper
    return f(*args, **kwargs)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", 
line 1013, in scheduler
    job.run()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py",
 line 213, in run
    self._execute()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py",
 line 1350, in _execute
    self._execute_helper()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py",
 line 1439, in _execute_helper
    self.executor.heartbeat()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py",
 line 132, in heartbeat
    self.trigger_tasks(open_slots)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py",
 line 156, in trigger_tasks
    executor_config=simple_ti.executor_config)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py",
 line 767, in execute_async
    self.task_queue.put((key, command, kube_executor_config))
  File "", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in 
_callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in 
recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in 
_recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in 
_recv
    chunk = read(handle, remaining)
  File "", line 1, in 
  File "", line 5, in 
{code}
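The managers.py/connection.py frames above mean the task_queue here is not a plain multiprocessing.Queue but a proxy to a queue living in a separate SyncManager process, so every put() is a request/response round trip to that manager. A minimal sketch of that pattern (not Airflow code; the tuple payload is only made up to mirror execute_async):
{code:python}
# Minimal sketch of a SyncManager-backed queue, the pattern the trace above shows.
# put() on the proxy sends a request to the manager process and then blocks in
# conn.recv() (managers.py:_callmethod) until the manager's serve_client thread
# answers; if the manager process is wedged, the caller sits here forever.
import multiprocessing

if __name__ == "__main__":
    manager = multiprocessing.Manager()   # starts the separate manager server process
    task_queue = manager.Queue()          # AutoProxy; the real queue lives in the manager

    # Same call shape as execute_async(); the payload is purely illustrative.
    task_queue.put(("key", ["airflow", "run", "..."], None))
    print(task_queue.get())
{code}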
 

Thread 2: re-reading plugin files
{code:java}
Thread 0x7f0c01c31700
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
Thread 0x7f0bff430700
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 201, in handle_request
    result = func(c, *args, **kwds)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 422, in 
accept_connection
    self.serve_client(c)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 234, in 
serve_client
    request = recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 251, in 
recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "", line 202, in _lock_unlock_module
  File "", line 98, in acquire
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 178, in 
accepter
    c = self.listener.accept()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 453, in 
accept
    c = self._listener.accept()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 598, in 
accept
    s, self._last_accepted = self._socket.accept()
  File "/usr/local/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
  File "/usr/local/airflow/.local/bin/airflow", line 21, in 
    from airflow import configuration
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 677, in _load_unlocked
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/__init__.py", 
line 94, in <module>
    operators._integrate_plugins()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/operators/__init__.py",
 line 104, in _integrate_plugins
    from airflow.plugins_manager import operators_modules
  File "", line 983, in _find_and_load
  File "", line 967, in _find_and_load_unlocked
  File "", line 677, in _load_unlocked
{code}
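The serve_client frames in this dump are the other side of that round trip: the manager process is unpickling a request (_ForkingPickler.loads) and ends up waiting on a module import lock, while other frames in the same dump re-import airflow and run operators._integrate_plugins(). Unpickling can trigger imports because loading an instance pulls in its class's defining module; a toy illustration (not Airflow code):
{code:python}
# Toy illustration: pickle.loads imports the class's module if this process has
# not loaded it yet, taking the per-module import lock on the way, which is the
# same _lock_unlock_module/acquire machinery visible in the serve_client frames.
import pickle
import sys

payload = pickle.dumps(__import__("decimal").Decimal("1.5"))
sys.modules.pop("decimal", None)   # pretend this process never imported decimal
obj = pickle.loads(payload)        # triggers `import decimal` under the import lock
print(type(obj).__module__, obj)   # decimal 1.5
{code}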
 

Thread 3: manager server thread
{code:java}
Thread 0x7f0c13c7d700
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File 

[jira] [Comment Edited] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-16 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930644#comment-16930644
 ] 

Chris Wegrzyn edited comment on AIRFLOW-5447 at 9/16/19 4:34 PM:
-

After a bit of wrestling with pyrasite and probably dumb luck, I managed to get 
what appears to be a telling stack trace:

 
{code:java}
Thread 0x7fb39d56d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in 
args.func(args)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", 
line 74, in wrapper
return f(*args, **kwargs)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", 
line 1013, in scheduler
job.run()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py",
 line 213, in run
self._execute()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py",
 line 1350, in _execute
self._execute_helper()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py",
 line 1439, in _execute_helper
self.executor.heartbeat()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py",
 line 132, in heartbeat
self.trigger_tasks(open_slots)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py",
 line 156, in trigger_tasks
executor_config=simple_ti.executor_config)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py",
 line 767, in execute_async
self.task_queue.put((key, command, kube_executor_config))
  File "", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in 
_callmethod
kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in 
recv
buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in 
_recv_bytes
buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in 
_recv
chunk = read(handle, remaining)
  File "", line 1, in 
  File "", line 5, in {code}
It does seem like something is going wrong with the communication involved in 
the put to the task_queue.

 

UPDATE: just in case it's helpful, here's the result of `pyrasite  
dump_stacks.py` for the scheduler and all of its subprocesses: 
{code:java}
Thread 0x7f0c13c7d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in 
args.func(args)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", 
line 74, in wrapper
return f(*args, **kwargs)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", 
line 1013, in scheduler
job.run()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py",
 line 213, in run
self._execute()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py",
 line 1350, in _execute
self._execute_helper()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py",
 line 1439, in _execute_helper
self.executor.heartbeat()
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py",
 line 132, in heartbeat
self.trigger_tasks(open_slots)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py",
 line 156, in trigger_tasks
executor_config=simple_ti.executor_config)
  File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py",
 line 767, in execute_async
self.task_queue.put((key, command, kube_executor_config))
  File "", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in 
_callmethod
kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in 
recv
buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in 
_recv_bytes
buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in 
_recv
chunk = read(handle, remaining)
  File "", line 1, in 
  File "", line 5, in 
Thread 0x7f0c01c31700
  File "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
Thread 0x7f0bff430700
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 201, in handle_request
result = func(c, *args, **kwds)
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 422, in 

[jira] [Comment Edited] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-15 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16930224#comment-16930224
 ] 

Daniel Imberman edited comment on AIRFLOW-5447 at 9/16/19 4:21 AM:
---

[~Yuval.Itzchakov] [~cwegrzyn] Thank you both for getting this info to us. I 
THINK this might have to do with a bug in the k8s Python client, which requires 
"create" and "get" privileges for "pods/exec":

[https://stackoverflow.com/questions/53827345/airflow-k8s-operator-xcom-handshake-status-403-forbidden]
 [https://github.com/kubernetes-client/python/issues/690]

The reason I believe this is that the lack of running/updating pods points to a 
failure of the KubernetesJobWatcher. When we finally started seeing similar 
problems, we were seeing these failures from the JobWatcher: 
[https://user-images.githubusercontent.com/1036482/64914385-2f0eca80-d71e-11e9-8f8b-44a1c8620b92.png].
 I'm going to look into this further tomorrow and get back ASAP.
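For reference, a quick way to test that hypothesis from inside the scheduler pod is to ask the API server whether the current service account has those verbs on pods/exec, via the kubernetes Python client's SelfSubjectAccessReview (the "airflow" namespace below is just a placeholder):
{code:python}
# Hedged check of the RBAC hypothesis above: does the current service account
# have "create"/"get" on pods/exec? The namespace is a placeholder.
from kubernetes import client, config

config.load_incluster_config()  # use config.load_kube_config() outside the cluster
authz = client.AuthorizationV1Api()

for verb in ("create", "get"):
    review = client.V1SelfSubjectAccessReview(
        spec=client.V1SelfSubjectAccessReviewSpec(
            resource_attributes=client.V1ResourceAttributes(
                namespace="airflow", verb=verb, resource="pods", subresource="exec",
            )
        )
    )
    result = authz.create_self_subject_access_review(review)
    print(f"pods/exec {verb}: allowed={result.status.allowed}")
{code}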


was (Author: dimberman):
[~Yuval.Itzchakov] [~cwegrzyn]Thank you guys for getting this info to us. I 
THINK this might have to do with a bug in the k8s kubernetes client which 
requires "create" and "get" privileges for "pods/exec"

[https://stackoverflow.com/questions/53827345/airflow-k8s-operator-xcom-handshake-status-403-forbidden]
[https://github.com/kubernetes-client/python/issues/690]

The reason I believe this is that this lack of running/updating of pods point 
to a failure or the KubernetesJobWatcher. When we finally started seeing 
similar problems we were seeing these failures from the JobWatcher 
[https://user-images.githubusercontent.com/1036482/64914385-2f0eca80-d71e-11e9-8f8b-44a1c8620b92.png].
 I'm going to look into this further tomorrow and get back ASAP.

> KubernetesExecutor hangs on task queueing
> -
>
> Key: AIRFLOW-5447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor-kubernetes
>Affects Versions: 1.10.4, 1.10.5
> Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
>Reporter: Henry Cohen
>Assignee: Daniel Imberman
>Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the 
> KubernetesExecutor, with the webserver and scheduler running in the 
> kubernetes cluster, tasks are scheduled, but when added to the task queue, 
> the executor process hangs indefinitely. Based on log messages, it appears to 
> be stuck at this line 
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-13 Thread Chris Wegrzyn (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929517#comment-16929517
 ] 

Chris Wegrzyn edited comment on AIRFLOW-5447 at 9/13/19 8:41 PM:
-

I just upgraded a deployment from 1.10.2 to 1.10.5 (I also had the same issue on 
1.10.4; I haven't yet tried 1.10.3) and am experiencing the same problem.

 

I've tracked it down to the same line. Judging by the logs, we do get this log 
message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/contrib/executors/kubernetes_executor.py#L762]

 

But we never get this log message:

[https://github.com/apache/airflow/blob/1.10.5/airflow/executors/base_executor.py#L135]
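Those two lines bracket the hang: the first is logged from execute_async() shortly before the task_queue.put() the stack traces show, and the second is logged from BaseExecutor.heartbeat() only after trigger_tasks() returns, so seeing the first without the second means the scheduler never comes back from the put. A stripped-down sketch of that ordering (paraphrased, not the actual Airflow source):
{code:python}
# Stripped-down sketch (paraphrased, not the actual Airflow source) of the
# heartbeat ordering: if execute_async()'s log line shows up but "Calling the
# ... sync method" never does, the scheduler is stuck inside trigger_tasks().
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s - %(message)s")
log = logging.getLogger("sketch")

class SketchExecutor:
    parallelism = 32
    running, queued_tasks = {}, {}

    def heartbeat(self):
        open_slots = self.parallelism - len(self.running)
        log.debug("%s running task instances", len(self.running))
        log.debug("%s in queue", len(self.queued_tasks))
        log.debug("%s open slots", open_slots)
        self.trigger_tasks(open_slots)                       # execute_async() -> task_queue.put() hangs in here
        log.debug("Calling the %s sync method", type(self))  # the line that never appears
        self.sync()

    def trigger_tasks(self, open_slots):
        log.info("Adding task to the executor queue")        # stands in for the last message that does appear

    def sync(self):
        pass

SketchExecutor().heartbeat()
{code}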

 

Here's a slightly sanitized log: 
{code:java}
[2019-09-13 19:51:24,074] {{scheduler_job.py:1438 DEBUG - Heartbeating the 
executor}}
 [2019-09-13 19:51:24,074] {{base_executor.py:124 DEBUG - 0 running task 
instances}}
 [2019-09-13 19:51:24,074] {{base_executor.py:125 DEBUG - 0 in queue}}
 [2019-09-13 19:51:24,074] {{base_executor.py:126 DEBUG - 32 open slots}}
 [2019-09-13 19:51:24,075] {{base_executor.py:135 DEBUG - Calling the  sync 
method}}
 [2019-09-13 19:51:24,083] {{scheduler_job.py:1459 DEBUG - Ran scheduling loop 
in 0.01 seconds}}
 [2019-09-13 19:51:24,083] {{scheduler_job.py:1462 DEBUG - Sleeping for 1.00 
seconds}}
 [2019-09-13 19:51:24,087] {{settings.py:54 INFO - Configured default timezone 
}}
 [2019-09-13 19:51:24,093] {{settings.py:327 DEBUG - Failed to import 
airflow_local_settings.}}
 Traceback (most recent call last):
 {{ File 
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/settings.py", 
line 315, in import_local_settings}}
 {{ import airflow_local_settings}}
 ModuleNotFoundError: No module named 'airflow_local_settings'
 [2019-09-13 19:51:24,094] {{logging_config.py:59 DEBUG - Unable to load custom 
logging, using default config instead}}
 [2019-09-13 19:51:24,109] {{settings.py:170 DEBUG - Setting up DB connection 
pool (PID 49)}}
 [2019-09-13 19:51:24,110] {{settings.py:213 INFO - settings.configure_orm(): 
Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=49}}
 [2019-09-13 19:51:24,295] {{settings.py:238 DEBUG - Disposing DB connection 
pool (PID 55)}}
 [2019-09-13 19:51:24,380] {{settings.py:238 DEBUG - Disposing DB connection 
pool (PID 59)}}
 [2019-09-13 19:51:25,084] {{scheduler_job.py:1474 DEBUG - Sleeping for 0.99 
seconds to prevent excessive logging}}
 [2019-09-13 19:51:25,117] {{scheduler_job.py:257 DEBUG - Waiting for 
}}
 [2019-09-13 19:51:25,118] {{scheduler_job.py:257 DEBUG - Waiting for 
}}
 [2019-09-13 19:51:25,226] {{settings.py:238 DEBUG - Disposing DB connection 
pool (PID 69)}}
 [2019-09-13 19:51:25,278] {{settings.py:238 DEBUG - Disposing DB connection 
pool (PID 73)}}
 [2019-09-13 19:51:26,076] {{scheduler_job.py:1390 DEBUG - Starting Loop...}}
 [2019-09-13 19:51:26,076] {{scheduler_job.py:1401 DEBUG - Harvesting DAG 
parsing results}}
 [2019-09-13 19:51:26,076] {{dag_processing.py:637 DEBUG - Received message of 
type DagParsingStat}}
 [2019-09-13 19:51:26,077] {{dag_processing.py:637 DEBUG - Received message of 
type SimpleDag}}
 [2019-09-13 19:51:26,077] {{dag_processing.py:637 DEBUG - Received message of 
type DagParsingStat}}
 [2019-09-13 19:51:26,078] {{dag_processing.py:637 DEBUG - Received message of 
type DagParsingStat}}
 [2019-09-13 19:51:26,078] {{scheduler_job.py:1403 DEBUG - Harvested 1 
SimpleDAGs}}
 [2019-09-13 19:51:26,109] {{scheduler_job.py:921 INFO - 1 tasks up for 
execution:}}
 {{ }}
 [2019-09-13 19:51:26,122] {{scheduler_job.py:953 INFO - Figuring out tasks to 
run in Pool(name=default_pool) with 128 open slots and 1 task instances ready 
to be queued}}
 [2019-09-13 19:51:26,123] {{scheduler_job.py:981 INFO - DAG parse_log has 0/16 
running and queued tasks}}
 [2019-09-13 19:51:26,132] {{scheduler_job.py:257 DEBUG - Waiting for 
}}
 [2019-09-13 19:51:26,133] {{scheduler_job.py:257 DEBUG - Waiting for 
}}
 [2019-09-13 19:51:26,142] {{scheduler_job.py:1031 INFO - Setting the following 
tasks to queued state:}}
 {{ }}
 [2019-09-13 19:51:26,157] {{scheduler_job.py:1107 INFO - Setting the following 
1 tasks to queued state:}}
 {{ }}
 [2019-09-13 19:51:26,157] {{scheduler_job.py:1143 INFO - Sending ('parse_log', 
'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0, 
tzinfo=), 1) to executor with priority 
2 and queue default}}
 [2019-09-13 19:51:26,158] {{base_executor.py:59 INFO - Adding to queue: 
['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13', 
'2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', 
'/usr/local/airflow/dags/xyz.py']}}
 [2019-09-13 19:51:26,158] {{scheduler_job.py:1438 DEBUG - Heartbeating the 
executor}}
 [2019-09-13 19:51:26,159] {{base_executor.py:124 DEBUG - 0 running task 
instances}}
 [2019-09-13 19:51:26,159] {{base_executor.py:125 DEBUG - 1 in queue}}
 [2019-09-13 

[jira] [Comment Edited] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928788#comment-16928788
 ] 

Henry Cohen edited comment on AIRFLOW-5447 at 9/12/19 6:00 PM:
---

This is a sample of what I see when running with the example DAGs: they queue, 
but when the first one tries to start, it just sits, and eventually the 
processes die and the scheduler hangs.
{noformat}
[2019-09-12 17:56:03,034] kubernetes_executor.py:698 INFO - TaskInstance: 
 found in 
queued state but was not launched, rescheduling
 [2019-09-12 17:56:03,043] scheduler_job.py:1376 INFO - Resetting orphaned 
tasks for active dag runs
 [2019-09-12 17:56:03,085] base_job.py:308 INFO - Reset the following 30 
TaskInstances:
 [2019-09-12 17:56:03,092] dag_processing.py:545 INFO - Launched 
DagFileProcessorManager with pid: 35
 [2019-09-12 17:56:03,093] scheduler_job.py:1390 DEBUG - Starting Loop...
 [2019-09-12 17:56:03,093] scheduler_job.py:1401 DEBUG - Harvesting DAG parsing 
results
 [2019-09-12 17:56:03,093] scheduler_job.py:1403 DEBUG - Harvested 0 SimpleDAGs
 [2019-09-12 17:56:03,093] scheduler_job.py:1438 DEBUG - Heartbeating the 
executor
 [2019-09-12 17:56:03,093] base_executor.py:124 DEBUG - 0 running task instances
 [2019-09-12 17:56:03,094] base_executor.py:125 DEBUG - 0 in queue
 [2019-09-12 17:56:03,094] base_executor.py:126 DEBUG - 96 open slots
 [2019-09-12 17:56:03,094] base_executor.py:135 DEBUG - Calling the  sync method
 [2019-09-12 17:56:03,100] scheduler_job.py:1459 DEBUG - Ran scheduling loop in 
0.01 seconds
 [2019-09-12 17:56:03,101] scheduler_job.py:1462 DEBUG - Sleeping for 1.00 
seconds
 [2019-09-12 17:56:03,107] settings.py:54 INFO - Configured default timezone 

 [2019-09-12 17:56:03,109] settings.py:327 DEBUG - Failed to import 
airflow_local_settings.
 Traceback (most recent call last):
 File "/usr/local/lib/python3.7/site-packages/airflow/settings.py", line 315, 
in import_local_settings
 import airflow_local_settings
 ModuleNotFoundError: No module named 'airflow_local_settings'
 [2019-09-12 17:56:03,111] logging_config.py:47 INFO - Successfully imported 
user-defined logging config from log_config.LOGGING_CONFIG
 [2019-09-12 17:56:03,120] settings.py:170 DEBUG - Setting up DB connection 
pool (PID 35)
 [2019-09-12 17:56:03,121] settings.py:213 INFO - settings.configure_orm(): 
Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=35
 [2019-09-12 17:56:03,289] settings.py:238 DEBUG - Disposing DB connection pool 
(PID 45)
 [2019-09-12 17:56:03,356] settings.py:238 DEBUG - Disposing DB connection pool 
(PID 41)
 [2019-09-12 17:56:04,101] scheduler_job.py:1474 DEBUG - Sleeping for 0.99 
seconds to prevent excessive logging
 [2019-09-12 17:56:04,126] scheduler_job.py:257 DEBUG - Waiting for 

 [2019-09-12 17:56:04,127] scheduler_job.py:257 DEBUG - Waiting for 

 [2019-09-12 17:56:04,162] settings.py:238 DEBUG - Disposing DB connection pool 
(PID 55)
 [2019-09-12 17:56:04,223] settings.py:238 DEBUG - Disposing DB connection pool 
(PID 58)
 [2019-09-12 17:56:05,095] scheduler_job.py:1390 DEBUG - Starting Loop...
 [2019-09-12 17:56:05,095] scheduler_job.py:1401 DEBUG - Harvesting DAG parsing 
results
 [2019-09-12 17:56:05,097] dag_processing.py:637 DEBUG - Received message of 
type DagParsingStat
 [2019-09-12 17:56:05,098] dag_processing.py:637 DEBUG - Received message of 
type SimpleDag
 [2019-09-12 17:56:05,098] dag_processing.py:637 DEBUG - Received message of 
type SimpleDag
 [2019-09-12 17:56:05,099] dag_processing.py:637 DEBUG - Received message of 
type SimpleDag
 [2019-09-12 17:56:05,099] dag_processing.py:637 DEBUG - Received message of 
type SimpleDag
 [2019-09-12 17:56:05,100] dag_processing.py:637 DEBUG - Received message of 
type DagParsingStat
 [2019-09-12 17:56:05,101] dag_processing.py:637 DEBUG - Received message of 
type DagParsingStat
 [2019-09-12 17:56:05,101] scheduler_job.py:1403 DEBUG - Harvested 4 SimpleDAGs
 [2019-09-12 17:56:05,128] scheduler_job.py:921 INFO - 5 tasks up for execution:
 [2019-09-12 17:56:05,138] scheduler_job.py:953 INFO - Figuring out tasks to 
run in Pool(name=default_pool) with 128 open slots and 5 task instances ready 
to be queued
 [2019-09-12 17:56:05,139] scheduler_job.py:981 INFO - DAG 
example_subdag_operator has 0/48 running and queued tasks
 [2019-09-12 17:56:05,139] scheduler_job.py:981 INFO - DAG 
latest_only_with_trigger has 0/48 running and queued tasks
 [2019-09-12 17:56:05,139] scheduler_job.py:981 INFO - DAG 
latest_only_with_trigger has 1/48 running and queued tasks
 [2019-09-12 17:56:05,139] scheduler_job.py:981 INFO - DAG 
latest_only_with_trigger has 2/48 running and queued tasks
 [2019-09-12 17:56:05,139] scheduler_job.py:981 INFO - DAG 
latest_only_with_trigger has 3/48 running and queued tasks
 [2019-09-12 17:56:05,139] scheduler_job.py:257 

[jira] [Comment Edited] (AIRFLOW-5447) KubernetesExecutor hangs on task queueing

2019-09-12 Thread Henry Cohen (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928788#comment-16928788
 ] 

Henry Cohen edited comment on AIRFLOW-5447 at 9/12/19 5:59 PM:
---

[2019-09-12 17:56:03,034] {{kubernetes_executor.py:698}} INFO - TaskInstance: 
 found in 
queued state but was not launched, rescheduling
 [2019-09-12 17:56:03,043] {{scheduler_job.py:1376}} INFO - Resetting orphaned 
tasks for active dag runs
 [2019-09-12 17:56:03,085] {{base_job.py:308}} INFO - Reset the following 30 
TaskInstances:
 [2019-09-12 17:56:03,092] {{dag_processing.py:545}} INFO - Launched 
DagFileProcessorManager with pid: 35
 [2019-09-12 17:56:03,093] {{scheduler_job.py:1390}} DEBUG - Starting Loop...
 [2019-09-12 17:56:03,093] {{scheduler_job.py:1401}} DEBUG - Harvesting DAG 
parsing results
 [2019-09-12 17:56:03,093] {{scheduler_job.py:1403}} DEBUG - Harvested 0 
SimpleDAGs
 [2019-09-12 17:56:03,093] {{scheduler_job.py:1438}} DEBUG - Heartbeating the 
executor
 [2019-09-12 17:56:03,093] {{base_executor.py:124}} DEBUG - 0 running task 
instances
 [2019-09-12 17:56:03,094] {{base_executor.py:125}} DEBUG - 0 in queue
 [2019-09-12 17:56:03,094] {{base_executor.py:126}} DEBUG - 96 open slots
 [2019-09-12 17:56:03,094] {{base_executor.py:135}} DEBUG - Calling the  sync method
 [2019-09-12 17:56:03,100] {{scheduler_job.py:1459}} DEBUG - Ran scheduling 
loop in 0.01 seconds
 [2019-09-12 17:56:03,101] {{scheduler_job.py:1462}} DEBUG - Sleeping for 1.00 
seconds
 [2019-09-12 17:56:03,107] {{settings.py:54}} INFO - Configured default 
timezone 
 [2019-09-12 17:56:03,109] {{settings.py:327}} DEBUG - Failed to import 
airflow_local_settings.
 Traceback (most recent call last):
 File "/usr/local/lib/python3.7/site-packages/airflow/settings.py", line 315, 
in import_local_settings
 import airflow_local_settings
 ModuleNotFoundError: No module named 'airflow_local_settings'
 [2019-09-12 17:56:03,111] {{logging_config.py:47}} INFO - Successfully 
imported user-defined logging config from log_config.LOGGING_CONFIG
 [2019-09-12 17:56:03,120] {{settings.py:170}} DEBUG - Setting up DB connection 
pool (PID 35)
 [2019-09-12 17:56:03,121] {{settings.py:213}} INFO - settings.configure_orm(): 
Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=35
 [2019-09-12 17:56:03,289] {{settings.py:238}} DEBUG - Disposing DB connection 
pool (PID 45)
 [2019-09-12 17:56:03,356] {{settings.py:238}} DEBUG - Disposing DB connection 
pool (PID 41)
 [2019-09-12 17:56:04,101] {{scheduler_job.py:1474}} DEBUG - Sleeping for 0.99 
seconds to prevent excessive logging
 [2019-09-12 17:56:04,126] {{scheduler_job.py:257}} DEBUG - Waiting for 

 [2019-09-12 17:56:04,127] {{scheduler_job.py:257}} DEBUG - Waiting for 

 [2019-09-12 17:56:04,162] {{settings.py:238}} DEBUG - Disposing DB connection 
pool (PID 55)
 [2019-09-12 17:56:04,223] {{settings.py:238}} DEBUG - Disposing DB connection 
pool (PID 58)
 [2019-09-12 17:56:05,095] {{scheduler_job.py:1390}} DEBUG - Starting Loop...
 [2019-09-12 17:56:05,095] {{scheduler_job.py:1401}} DEBUG - Harvesting DAG 
parsing results
 [2019-09-12 17:56:05,097] {{dag_processing.py:637}} DEBUG - Received message 
of type DagParsingStat
 [2019-09-12 17:56:05,098] {{dag_processing.py:637}} DEBUG - Received message 
of type SimpleDag
 [2019-09-12 17:56:05,098] {{dag_processing.py:637}} DEBUG - Received message 
of type SimpleDag
 [2019-09-12 17:56:05,099] {{dag_processing.py:637}} DEBUG - Received message 
of type SimpleDag
 [2019-09-12 17:56:05,099] {{dag_processing.py:637}} DEBUG - Received message 
of type SimpleDag
 [2019-09-12 17:56:05,100] {{dag_processing.py:637}} DEBUG - Received message 
of type DagParsingStat
 [2019-09-12 17:56:05,101] {{dag_processing.py:637}} DEBUG - Received message 
of type DagParsingStat
 [2019-09-12 17:56:05,101] {{scheduler_job.py:1403}} DEBUG - Harvested 4 
SimpleDAGs
 [2019-09-12 17:56:05,128] {{scheduler_job.py:921}} INFO - 5 tasks up for 
execution:
 [2019-09-12 17:56:05,138] {{scheduler_job.py:953}} INFO - Figuring out tasks 
to run in Pool(name=default_pool) with 128 open slots and 5 task instances 
ready to be queued
 [2019-09-12 17:56:05,139] {{scheduler_job.py:981}} INFO - DAG 
example_subdag_operator has 0/48 running and queued tasks
 [2019-09-12 17:56:05,139] {{scheduler_job.py:981}} INFO - DAG 
latest_only_with_trigger has 0/48 running and queued tasks
 [2019-09-12 17:56:05,139] {{scheduler_job.py:981}} INFO - DAG 
latest_only_with_trigger has 1/48 running and queued tasks
 [2019-09-12 17:56:05,139] {{scheduler_job.py:981}} INFO - DAG 
latest_only_with_trigger has 2/48 running and queued tasks
 [2019-09-12 17:56:05,139] {{scheduler_job.py:981}} INFO - DAG 
latest_only_with_trigger has 3/48 running and queued tasks
 [2019-09-12 17:56:05,139] {{scheduler_job.py:257}} DEBUG - Waiting for