[jira] [Commented] (AIRFLOW-2244) Exception importing Azure (wasb) log handler
[ https://issues.apache.org/jira/browse/AIRFLOW-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410636#comment-16410636 ] John Arnold commented on AIRFLOW-2244: -- In airflow/models.py:

{code}
if 'mysql' in settings.SQL_ALCHEMY_CONN:
    LongText = LONGTEXT
else:
    LongText = Text
{code}

This is not guarded at all: settings.SQL_ALCHEMY_CONN can be None, and the membership test against None raises a TypeError.

> Exception importing Azure (wasb) log handler > > > Key: AIRFLOW-2244 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2244 > Project: Apache Airflow > Issue Type: Bug > Components: core >Reporter: John Arnold >Priority: Major > > Attempting to configure remote logging in azure storage, I'm getting import > exceptions trying to load the wasb handler: > Unable to load the config, contains a configuration error. > Traceback (most recent call last): > File "/usr/lib/python3.6/logging/config.py", line 382, in resolve > found = getattr(found, frag) > AttributeError: module 'airflow.utils.log' has no attribute > 'wasb_task_handler' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/lib/python3.6/logging/config.py", line 558, in configure > handler = self.configure_handler(handlers[name]) > File "/usr/lib/python3.6/logging/config.py", line 708, in configure_handler > klass = self.resolve(cname) > File "/usr/lib/python3.6/logging/config.py", line 384, in resolve > self.importer(used) > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/utils/log/wasb_task_handler.py", > line 18, in > from airflow.contrib.hooks.wasb_hook import WasbHook > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/contrib/hooks/wasb_hook.py", > line 16, in > from airflow.hooks.base_hook import BaseHook > File >
"/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/hooks/base_hook.py", > line 24, in > from airflow.models import Connection > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/models.py", > line 119, in > if 'mysql' in settings.SQL_ALCHEMY_CONN: > TypeError: argument of type 'NoneType' is not iterable > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/var/lib/airflow/venv/bin/airflow", line 4, in > > __import__('pkg_resources').run_script('apache-airflow==1.10.0.dev0+incubating', > 'airflow') > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", > line 658, in run_script > self.require(requires)[0].run_script(script_name, ns) > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", > line 1438, in run_script > exec(code, namespace, namespace) > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/EGG-INFO/scripts/airflow", > line 16, in > from airflow import configuration > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/__init__.py", > line 31, in > from airflow import settings > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/settings.py", > line 198, in > configure_logging() > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/logging_config.py", > line 76, in configure_logging > raise e > File > "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/logging_config.py", > line 71, in configure_logging > dictConfig(logging_config) > File "/usr/lib/python3.6/logging/config.py", line 795, in dictConfig > 
dictConfigClass(config).configure() > File "/usr/lib/python3.6/logging/config.py", line 566, in configure > '%r: %s' % (name, e)) > ValueError: Unable to configure handler 'processor': argument of type > 'NoneType' is not iterable > (venv) johnar@netdev1-westus2:~/github/incubator-airflow$ airflow scheduler > Unable to load the config, contains a configuration error. > Traceback (most recent call last): > File "/usr/lib/python3.6/logging/config.py", line 382, in resolve > found = getattr(found, frag) > AttributeError: module 'airflow.utils.log' has no attribute > 'wasb_task_handler' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/lib/python3.6/logging/config.py", line 558, in configure > handler = self.configure_handler(handlers[name]) > File
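The root cause John Arnold identifies — `'mysql' in settings.SQL_ALCHEMY_CONN` with a None connection string — is easy to make None-safe. A minimal sketch, assuming a hypothetical helper (`pick_long_text` is illustrative; the strings stand in for `sqlalchemy.Text` and `sqlalchemy.dialects.mysql.LONGTEXT`, which the real code assigns):

```python
# Sketch of a guarded version of the check in airflow/models.py.
# Hypothetical helper: the real fix would live inline in models.py, and
# the string returns stand in for the SQLAlchemy column types.
def pick_long_text(sql_alchemy_conn):
    # 'mysql' in None raises TypeError, so reject a falsy value first.
    if sql_alchemy_conn and 'mysql' in sql_alchemy_conn:
        return 'LONGTEXT'
    return 'Text'

print(pick_long_text(None))              # Text (falls back instead of raising)
print(pick_long_text('mysql://u@h/db'))  # LONGTEXT
```

With the guard in place, an unset `SQL_ALCHEMY_CONN` degrades to the generic `Text` type instead of aborting logging configuration at import time.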
[jira] [Issue Comment Deleted] (AIRFLOW-1775) Remote file handler for logging
[ https://issues.apache.org/jira/browse/AIRFLOW-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Arnold updated AIRFLOW-1775: - Comment: was deleted (was: in airflow/models.py: if 'mysql' in settings.SQL_ALCHEMY_CONN: LongText = LONGTEXT else: LongText = Text This is not guarded at all, you can't iterate settings.SQL_ALCHEMY_CONN if it's empty. You get a TypeError.) > Remote file handler for logging > --- > > Key: AIRFLOW-1775 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1775 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Reporter: Niels Zeilemaker >Priority: Major > > We're using a mounted Azure file share to store our log files. Currently, > Airflow is writing it's logs to that fileshare. However, this is making the > solution quite expensive, as you pay per action on Azure file shares. > If we would have a remote_file_task_logging handler, we could mimic the > s3_task_logging, and copy the file to the fileshare in the close method. > Reducing the cost, while still providing persistent storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1775) Remote file handler for logging
[ https://issues.apache.org/jira/browse/AIRFLOW-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410486#comment-16410486 ] John Arnold commented on AIRFLOW-1775: -- In airflow/models.py:

{code}
if 'mysql' in settings.SQL_ALCHEMY_CONN:
    LongText = LONGTEXT
else:
    LongText = Text
{code}

This is not guarded at all: settings.SQL_ALCHEMY_CONN can be None, and the membership test against None raises a TypeError.

> Remote file handler for logging > --- > > Key: AIRFLOW-1775 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1775 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Reporter: Niels Zeilemaker >Priority: Major > > We're using a mounted Azure file share to store our log files. Currently, > Airflow is writing its logs to that fileshare. However, this is making the > solution quite expensive, as you pay per action on Azure file shares. > If we had a remote_file_task_logging handler, we could mimic the > s3_task_logging, and copy the file to the fileshare in the close method, > reducing the cost while still providing persistent storage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
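The copy-in-close idea Niels proposes can be sketched as a plain `logging.FileHandler` subclass. This is an illustrative sketch, not Airflow's actual handler API: `RemoteFileTaskHandler` and `remote_base` are invented names, and a temp directory stands in for the mounted fileshare.

```python
import logging
import os
import shutil
import tempfile

class RemoteFileTaskHandler(logging.FileHandler):
    """Write records to a local file; copy it to the share once, on close."""
    def __init__(self, local_path, remote_base):
        super().__init__(local_path)
        self.local_path = local_path
        self.remote_base = remote_base

    def close(self):
        super().close()  # flush and close the local log first
        # One copy per task run instead of one billed I/O action per record.
        os.makedirs(self.remote_base, exist_ok=True)
        shutil.copy(self.local_path,
                    os.path.join(self.remote_base,
                                 os.path.basename(self.local_path)))

# Tiny demo against a temp dir standing in for the mounted fileshare.
workdir = tempfile.mkdtemp()
handler = RemoteFileTaskHandler(os.path.join(workdir, "task.log"),
                                os.path.join(workdir, "share"))
logger = logging.getLogger("demo_task")
logger.addHandler(handler)
logger.warning("task finished")
handler.close()
copied = os.path.exists(os.path.join(workdir, "share", "task.log"))
print(copied)  # True
```

Records hit the cheap local disk on every emit; the paid fileshare sees exactly one write per task, in `close()`, matching how the S3 task handler uploads at the end of a run.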
[jira] [Created] (AIRFLOW-2244) Exception importing Azure (wasb) log handler
John Arnold created AIRFLOW-2244: Summary: Exception importing Azure (wasb) log handler Key: AIRFLOW-2244 URL: https://issues.apache.org/jira/browse/AIRFLOW-2244 Project: Apache Airflow Issue Type: Bug Components: core Reporter: John Arnold Attempting to configure remote logging in azure storage, I'm getting import exceptions trying to load the wasb handler: Unable to load the config, contains a configuration error. Traceback (most recent call last): File "/usr/lib/python3.6/logging/config.py", line 382, in resolve found = getattr(found, frag) AttributeError: module 'airflow.utils.log' has no attribute 'wasb_task_handler' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/logging/config.py", line 558, in configure handler = self.configure_handler(handlers[name]) File "/usr/lib/python3.6/logging/config.py", line 708, in configure_handler klass = self.resolve(cname) File "/usr/lib/python3.6/logging/config.py", line 384, in resolve self.importer(used) File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/utils/log/wasb_task_handler.py", line 18, in from airflow.contrib.hooks.wasb_hook import WasbHook File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/contrib/hooks/wasb_hook.py", line 16, in from airflow.hooks.base_hook import BaseHook File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/hooks/base_hook.py", line 24, in from airflow.models import Connection File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/models.py", line 119, in if 'mysql' in settings.SQL_ALCHEMY_CONN: TypeError: argument of type 'NoneType' is not iterable During handling of the above exception, another exception occurred: Traceback (most recent call last): File 
"/var/lib/airflow/venv/bin/airflow", line 4, in __import__('pkg_resources').run_script('apache-airflow==1.10.0.dev0+incubating', 'airflow') File "/var/lib/airflow/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script self.require(requires)[0].run_script(script_name, ns) File "/var/lib/airflow/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1438, in run_script exec(code, namespace, namespace) File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/EGG-INFO/scripts/airflow", line 16, in from airflow import configuration File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/__init__.py", line 31, in from airflow import settings File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/settings.py", line 198, in configure_logging() File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/logging_config.py", line 76, in configure_logging raise e File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/logging_config.py", line 71, in configure_logging dictConfig(logging_config) File "/usr/lib/python3.6/logging/config.py", line 795, in dictConfig dictConfigClass(config).configure() File "/usr/lib/python3.6/logging/config.py", line 566, in configure '%r: %s' % (name, e)) ValueError: Unable to configure handler 'processor': argument of type 'NoneType' is not iterable (venv) johnar@netdev1-westus2:~/github/incubator-airflow$ airflow scheduler Unable to load the config, contains a configuration error. 
Traceback (most recent call last): File "/usr/lib/python3.6/logging/config.py", line 382, in resolve found = getattr(found, frag) AttributeError: module 'airflow.utils.log' has no attribute 'wasb_task_handler' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3.6/logging/config.py", line 558, in configure handler = self.configure_handler(handlers[name]) File "/usr/lib/python3.6/logging/config.py", line 708, in configure_handler klass = self.resolve(cname) File "/usr/lib/python3.6/logging/config.py", line 384, in resolve self.importer(used) File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/utils/log/wasb_task_handler.py", line 18, in from airflow.contrib.hooks.wasb_hook import WasbHook File "/var/lib/airflow/venv/lib/python3.6/site-packages/apache_airflow-1.10.0.dev0+incubating-py3.6.egg/airflow/contrib/hooks/wasb_hook.py", line 16, in from airflow.hooks.base_hook import BaseHook File
[jira] [Resolved] (AIRFLOW-2235) Fix wrong docstrings in Transfer operators for MySQL
[ https://issues.apache.org/jira/browse/AIRFLOW-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao resolved AIRFLOW-2235. -- Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3147 [https://github.com/apache/incubator-airflow/pull/3147] > Fix wrong docstrings in Transfer operators for MySQL > > > Key: AIRFLOW-2235 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2235 > Project: Apache Airflow > Issue Type: Bug > Components: docs, Documentation >Reporter: Kengo Seki >Assignee: Tao Feng >Priority: Minor > Fix For: 1.10.0 > > > Docstrings in HiveToMySqlTransfer and PrestoToMySqlTransfer says: > {code} > :param sql: SQL query to execute against the MySQL database > {code} > but actually these queries are executed against Hive and Presto respectively. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[2/2] incubator-airflow git commit: Merge pull request #3147 from feng-tao/airflow-2235
Merge pull request #3147 from feng-tao/airflow-2235 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/bd010048 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/bd010048 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/bd010048 Branch: refs/heads/master Commit: bd010048b3351a88152d0e5661c8ee99d31412fb Parents: 7e762d4 21152ef Author: Joy GaoAuthored: Thu Mar 22 14:41:59 2018 -0700 Committer: Joy Gao Committed: Thu Mar 22 14:41:59 2018 -0700 -- airflow/operators/hive_to_mysql.py | 2 +- airflow/operators/presto_to_mysql.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) --
[1/2] incubator-airflow git commit: [airflow-2235] Fix wrong docstrings in two operators
Repository: incubator-airflow Updated Branches: refs/heads/master 7e762d42d -> bd010048b [airflow-2235] Fix wrong docstrings in two operators Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/21152efb Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/21152efb Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/21152efb Branch: refs/heads/master Commit: 21152efb7d84a249e749f4bc821ebb1051e65324 Parents: 8754cb1 Author: Tao fengAuthored: Wed Mar 21 21:54:32 2018 -0700 Committer: Tao feng Committed: Wed Mar 21 21:54:32 2018 -0700 -- airflow/operators/hive_to_mysql.py | 2 +- airflow/operators/presto_to_mysql.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/21152efb/airflow/operators/hive_to_mysql.py -- diff --git a/airflow/operators/hive_to_mysql.py b/airflow/operators/hive_to_mysql.py index d2d9d0c..6962d31 100644 --- a/airflow/operators/hive_to_mysql.py +++ b/airflow/operators/hive_to_mysql.py @@ -25,7 +25,7 @@ class HiveToMySqlTransfer(BaseOperator): into memory before being pushed to MySQL, so this operator should be used for smallish amount of data. -:param sql: SQL query to execute against the MySQL database +:param sql: SQL query to execute against Hive server :type sql: str :param mysql_table: target MySQL table, use dot notation to target a specific database http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/21152efb/airflow/operators/presto_to_mysql.py -- diff --git a/airflow/operators/presto_to_mysql.py b/airflow/operators/presto_to_mysql.py index d0c323a..116391d 100644 --- a/airflow/operators/presto_to_mysql.py +++ b/airflow/operators/presto_to_mysql.py @@ -23,7 +23,7 @@ class PrestoToMySqlTransfer(BaseOperator): into memory before being pushed to MySQL, so this operator should be used for smallish amount of data. 
-:param sql: SQL query to execute against the MySQL database +:param sql: SQL query to execute against Presto :type sql: str :param mysql_table: target MySQL table, use dot notation to target a specific database
[jira] [Created] (AIRFLOW-2243) airflow seems to load modules multiple times
sulphide created AIRFLOW-2243: - Summary: airflow seems to load modules multiple times Key: AIRFLOW-2243 URL: https://issues.apache.org/jira/browse/AIRFLOW-2243 Project: Apache Airflow Issue Type: Bug Reporter: sulphide

Airflow uses the builtin imp.load_source to load modules, but the documentation (for Python 2) says: https://docs.python.org/2/library/imp.html#imp.load_source

{noformat}
imp.load_source(name, pathname[, file])
Load and initialize a module implemented as a Python source file and return its module object. If the module was already initialized, it will be initialized *again*. The name argument is used to create or access a module object. The pathname argument points to the source file. The file argument is the source file, open for reading as text, from the beginning. It must currently be a real file object, not a user-defined class emulating a file. Note that if a properly matching byte-compiled file (with suffix .pyc or .pyo) exists, it will be used instead of parsing the given source file.
{noformat}

This means that Airflow behaves differently from a typical Python program: a module may be imported multiple times, which could have unexpected effects for code relying on the usual Python import semantics. https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L300 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
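The re-initialization the docs warn about is easy to observe. A small sketch, using an importlib-based equivalent of the legacy imp.load_source (imp is deprecated and removed in Python 3.12); the module and file names here are invented for the demo:

```python
import importlib.util
import os
import tempfile

def load_source(name, path):
    """Rough importlib equivalent of the legacy imp.load_source: it
    re-executes the module body on every call instead of returning the
    module cached in sys.modules, as a normal `import` would."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs top-level code each time
    return module

with tempfile.TemporaryDirectory() as d:
    mod_path = os.path.join(d, "side_effect_mod.py")
    marker = os.path.join(d, "marker.txt")
    # A module whose top-level code appends one byte each time it runs.
    with open(mod_path, "w") as f:
        f.write("with open(%r, 'a') as m:\n    m.write('x')\n" % marker)
    load_source("side_effect_mod", mod_path)
    load_source("side_effect_mod", mod_path)  # initialized *again*
    with open(marker) as m:
        runs = len(m.read())
print(runs)  # 2: the module body executed twice
```

A regular `import side_effect_mod` executed twice would run the body once; any DAG file relying on run-once module state can therefore misbehave under Airflow's loader.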
[jira] [Resolved] (AIRFLOW-1235) Odd behaviour when all gunicorn workers die
[ https://issues.apache.org/jira/browse/AIRFLOW-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1235. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #2330 [https://github.com/apache/incubator-airflow/pull/2330] > Odd behaviour when all gunicorn workers die > --- > > Key: AIRFLOW-1235 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1235 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.8.0 >Reporter: Erik Forsberg >Assignee: Kengo Seki >Priority: Major > Fix For: 2.0.0 > > > The webserver has sometimes stopped responding to port 443, and today I found > the issue - I had a misconfigured resolv.conf that made it unable to talk to > my postgresql. This was the root cause, but the way airflow webserver behaved > was a bit odd. > It seems that when all gunicorn workers failed to start, the gunicorn master > shut down. However, the main process (the one that starts gunicorn master) > did not shut down, so there was no way of detecting the failed status of > webserver from e.g. systemd or init script. 
> Full traceback leading to stale webserver process: > {noformat} > May 21 09:51:57 airmaster01 airflow[26451]: [2017-05-21 09:51:57 +] > [23794] [ERROR] Exception in worker process: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1122, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: return self._pool.get(wait, > self._timeout) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/queue.py", > line 145, in get > May 21 09:51:57 airmaster01 airflow[26451]: raise Empty > May 21 09:51:57 airmaster01 airflow[26451]: sqlalchemy.util.queue.Empty > May 21 09:51:57 airmaster01 airflow[26451]: During handling of the above > exception, another exception occurred: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/engine/base.py", > line 2147, in _wrap_pool_connect > May 21 09:51:57 airmaster01 airflow[26451]: return fn() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 387, in connect > May 21 09:51:57 airmaster01 airflow[26451]: return > _ConnectionFairy._checkout(self) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 766, in _checkout > May 21 09:51:57 airmaster01 airflow[26451]: fairy = > _ConnectionRecord.checkout(pool) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 516, in checkout > May 21 09:51:57 airmaster01 airflow[26451]: rec = pool._do_get() > May 21 09:51:57 airmaster01 airflow[26451]: File > 
"/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1138, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: self._dec_overflow() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/langhelpers.py", > line 66, in __exit__ > May 21 09:51:57 airmaster01 airflow[26451]: compat.reraise(exc_type, > exc_value, exc_tb) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/compat.py", > line 187, in reraise > May 21 09:51:57 airmaster01 airflow[26451]: raise value > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1135, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: return self._create_connection() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 333, in _create_connection > May 21 09:51:57 airmaster01 airflow[26451]: return _ConnectionRecord(self) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 461, in __init__ > May 21 09:51:57 airmaster01 airflow[26451]: > self.__connect(first_connect_check=True) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 651, in __connect > May 21 09:51:57 airmaster01 airflow[26451]: connection = > pool._invoke_creator(self) > May 21 09:51:57 airmaster01 airflow[26451]: File >
[jira] [Commented] (AIRFLOW-1235) Odd behaviour when all gunicorn workers die
[ https://issues.apache.org/jira/browse/AIRFLOW-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410113#comment-16410113 ] ASF subversion and git services commented on AIRFLOW-1235: -- Commit 7e762d42df50d84e4740e15c24594c50aaab53a2 in incubator-airflow's branch refs/heads/master from [~sekikn] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=7e762d4 ] [AIRFLOW-1235] Fix webserver's odd behaviour In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR add timeout functionality to shutdown all related processes in such cases. Dear Airflow maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Airflow JIRA] (https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1235 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR add timeout functionality to shutdown all related processes in such cases. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tests.core:CliTests.test_cli_webserver_shutdown_wh en_gunicorn_master_is_killed ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. 
Body explains "what" and "why", not "how" Closes #2330 from sekikn/AIRFLOW-1235 > Odd behaviour when all gunicorn workers die > --- > > Key: AIRFLOW-1235 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1235 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.8.0 >Reporter: Erik Forsberg >Assignee: Kengo Seki >Priority: Major > Fix For: 2.0.0 > > > The webserver has sometimes stopped responding to port 443, and today I found > the issue - I had a misconfigured resolv.conf that made it unable to talk to > my postgresql. This was the root cause, but the way airflow webserver behaved > was a bit odd. > It seems that when all gunicorn workers failed to start, the gunicorn master > shut down. However, the main process (the one that starts gunicorn master) > did not shut down, so there was no way of detecting the failed status of > webserver from e.g. systemd or init script. > Full traceback leading to stale webserver process: > {noformat} > May 21 09:51:57 airmaster01 airflow[26451]: [2017-05-21 09:51:57 +] > [23794] [ERROR] Exception in worker process: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1122, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: return self._pool.get(wait, > self._timeout) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/queue.py", > line 145, in get > May 21 09:51:57 airmaster01 airflow[26451]: raise Empty > May 21 09:51:57 airmaster01 airflow[26451]: sqlalchemy.util.queue.Empty > May 21 09:51:57 airmaster01 airflow[26451]: During handling of the above > exception, another exception occurred: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > 
"/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/engine/base.py", > line 2147, in _wrap_pool_connect > May 21 09:51:57 airmaster01 airflow[26451]: return fn() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 387, in connect > May 21 09:51:57 airmaster01 airflow[26451]: return > _ConnectionFairy._checkout(self) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 766, in _checkout > May 21 09:51:57 airmaster01 airflow[26451]: fairy = > _ConnectionRecord.checkout(pool) > May 21 09:51:57 airmaster01 airflow[26451]: File >
[jira] [Commented] (AIRFLOW-1235) Odd behaviour when all gunicorn workers die
[ https://issues.apache.org/jira/browse/AIRFLOW-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410114#comment-16410114 ] ASF subversion and git services commented on AIRFLOW-1235: -- Commit 7e762d42df50d84e4740e15c24594c50aaab53a2 in incubator-airflow's branch refs/heads/master from [~sekikn] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=7e762d4 ] [AIRFLOW-1235] Fix webserver's odd behaviour In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR add timeout functionality to shutdown all related processes in such cases. Dear Airflow maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Airflow JIRA] (https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1235 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR add timeout functionality to shutdown all related processes in such cases. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tests.core:CliTests.test_cli_webserver_shutdown_wh en_gunicorn_master_is_killed ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. 
Body explains "what" and "why", not "how" Closes #2330 from sekikn/AIRFLOW-1235 > Odd behaviour when all gunicorn workers die > --- > > Key: AIRFLOW-1235 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1235 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.8.0 >Reporter: Erik Forsberg >Assignee: Kengo Seki >Priority: Major > Fix For: 2.0.0 > > > The webserver has sometimes stopped responding to port 443, and today I found > the issue - I had a misconfigured resolv.conf that made it unable to talk to > my postgresql. This was the root cause, but the way airflow webserver behaved > was a bit odd. > It seems that when all gunicorn workers failed to start, the gunicorn master > shut down. However, the main process (the one that starts gunicorn master) > did not shut down, so there was no way of detecting the failed status of > webserver from e.g. systemd or init script. > Full traceback leading to stale webserver process: > {noformat} > May 21 09:51:57 airmaster01 airflow[26451]: [2017-05-21 09:51:57 +] > [23794] [ERROR] Exception in worker process: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1122, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: return self._pool.get(wait, > self._timeout) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/queue.py", > line 145, in get > May 21 09:51:57 airmaster01 airflow[26451]: raise Empty > May 21 09:51:57 airmaster01 airflow[26451]: sqlalchemy.util.queue.Empty > May 21 09:51:57 airmaster01 airflow[26451]: During handling of the above > exception, another exception occurred: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > 
"/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/engine/base.py", > line 2147, in _wrap_pool_connect > May 21 09:51:57 airmaster01 airflow[26451]: return fn() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 387, in connect > May 21 09:51:57 airmaster01 airflow[26451]: return > _ConnectionFairy._checkout(self) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 766, in _checkout > May 21 09:51:57 airmaster01 airflow[26451]: fairy = > _ConnectionRecord.checkout(pool) > May 21 09:51:57 airmaster01 airflow[26451]: File >
[jira] [Commented] (AIRFLOW-1235) Odd behaviour when all gunicorn workers die
[ https://issues.apache.org/jira/browse/AIRFLOW-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410112#comment-16410112 ] ASF subversion and git services commented on AIRFLOW-1235: -- Commit 7e762d42df50d84e4740e15c24594c50aaab53a2 in incubator-airflow's branch refs/heads/master from [~sekikn] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=7e762d4 ] [AIRFLOW-1235] Fix webserver's odd behaviour In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR add timeout functionality to shutdown all related processes in such cases. Dear Airflow maintainers, Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below! ### JIRA - [x] My PR addresses the following [Airflow JIRA] (https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-1235 ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR add timeout functionality to shutdown all related processes in such cases. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tests.core:CliTests.test_cli_webserver_shutdown_wh en_gunicorn_master_is_killed ### Commits - [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git- commit/)": 1. Subject is separated from body by a blank line 2. Subject is limited to 50 characters 3. Subject does not end with a period 4. Subject uses the imperative mood ("add", not "adding") 5. Body wraps at 72 characters 6. 
Body explains "what" and "why", not "how"

Closes #2330 from sekikn/AIRFLOW-1235

> Odd behaviour when all gunicorn workers die
> -------------------------------------------
>
> Key: AIRFLOW-1235
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1235
> Project: Apache Airflow
> Issue Type: Bug
> Components: webserver
> Affects Versions: 1.8.0
> Reporter: Erik Forsberg
> Assignee: Kengo Seki
> Priority: Major
> Fix For: 2.0.0
>
> The webserver has sometimes stopped responding to port 443, and today I found the issue - I had a misconfigured resolv.conf that made it unable to talk to my postgresql. This was the root cause, but the way airflow webserver behaved was a bit odd.
> It seems that when all gunicorn workers failed to start, the gunicorn master shut down. However, the main process (the one that starts the gunicorn master) did not shut down, so there was no way of detecting the failed status of the webserver from e.g. systemd or an init script.
> Full traceback leading to stale webserver process:
> {noformat}
> May 21 09:51:57 airmaster01 airflow[26451]: [2017-05-21 09:51:57 +] [23794] [ERROR] Exception in worker process:
> May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last):
> May 21 09:51:57 airmaster01 airflow[26451]:   File "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", line 1122, in _do_get
> May 21 09:51:57 airmaster01 airflow[26451]:     return self._pool.get(wait, self._timeout)
> May 21 09:51:57 airmaster01 airflow[26451]:   File "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/queue.py", line 145, in get
> May 21 09:51:57 airmaster01 airflow[26451]:     raise Empty
> May 21 09:51:57 airmaster01 airflow[26451]: sqlalchemy.util.queue.Empty
> May 21 09:51:57 airmaster01 airflow[26451]: During handling of the above exception, another exception occurred:
> May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last):
> May 21 09:51:57 airmaster01 airflow[26451]:   File "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
> May 21 09:51:57 airmaster01 airflow[26451]:     return fn()
> May 21 09:51:57 airmaster01 airflow[26451]:   File "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", line 387, in connect
> May 21 09:51:57 airmaster01 airflow[26451]:     return _ConnectionFairy._checkout(self)
> May 21 09:51:57 airmaster01 airflow[26451]:   File "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", line 766, in _checkout
> May 21 09:51:57 airmaster01 airflow[26451]:     fairy = _ConnectionRecord.checkout(pool)
> May 21 09:51:57 airmaster01 airflow[26451]:   File
incubator-airflow git commit: [AIRFLOW-1235] Fix webserver's odd behaviour
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 8c42d03c4 -> 7e762d42d

[AIRFLOW-1235] Fix webserver's odd behaviour

In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR adds timeout functionality to shut down all related processes in such cases.

Dear Airflow maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
  - https://issues.apache.org/jira/browse/AIRFLOW-1235

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes: In some cases, the gunicorn master shuts down but the webserver monitor process doesn't. This PR adds timeout functionality to shut down all related processes in such cases.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: tests.core:CliTests.test_cli_webserver_shutdown_when_gunicorn_master_is_killed

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6.
Body explains "what" and "why", not "how"

Closes #2330 from sekikn/AIRFLOW-1235

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/7e762d42
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/7e762d42
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/7e762d42
Branch: refs/heads/master
Commit: 7e762d42df50d84e4740e15c24594c50aaab53a2
Parents: 8c42d03
Author: Kengo Seki
Authored: Thu Mar 22 11:50:27 2018 -0700
Committer: Arthur Wiedmer
Committed: Thu Mar 22 11:50:27 2018 -0700
--
 airflow/bin/cli.py                           | 129 +-
 airflow/config_templates/default_airflow.cfg |   3 +
 airflow/exceptions.py                        |   4 +
 tests/core.py                                |  10 ++
 4 files changed, 91 insertions(+), 55 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/7e762d42/airflow/bin/cli.py
--
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index 449d8ca..1801cc7 100755
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -46,7 +46,7 @@ import airflow
 from airflow import api
 from airflow import jobs, settings
 from airflow import configuration as conf
-from airflow.exceptions import AirflowException
+from airflow.exceptions import AirflowException, AirflowWebServerTimeout
 from airflow.executors import GetDefaultExecutor
 from airflow.models import (DagModel, DagBag, TaskInstance,
                             DagPickle, DagRun, Variable, DagStat,
@@ -592,7 +592,12 @@ def get_num_ready_workers_running(gunicorn_master_proc):
     return len(ready_workers)


-def restart_workers(gunicorn_master_proc, num_workers_expected):
+def get_num_workers_running(gunicorn_master_proc):
+    workers = psutil.Process(gunicorn_master_proc.pid).children()
+    return len(workers)
+
+
+def restart_workers(gunicorn_master_proc, num_workers_expected, master_timeout):
     """
     Runs forever, monitoring the child processes of
     @gunicorn_master_proc and restarting workers occasionally.
@@ -618,17 +623,18 @@ def restart_workers(gunicorn_master_proc, num_workers_expected):
     gracefully and that the oldest worker is terminated.
     """

-    def wait_until_true(fn):
+    def wait_until_true(fn, timeout=0):
         """
         Sleeps until fn is true
         """
+        t = time.time()
         while not fn():
+            if 0 < timeout and timeout <= time.time() - t:
+                raise AirflowWebServerTimeout(
+                    "No response from gunicorn master within {0} seconds"
+                    .format(timeout))
             time.sleep(0.1)

-    def get_num_workers_running(gunicorn_master_proc):
-        workers = psutil.Process(gunicorn_master_proc.pid).children()
-        return len(workers)
-
     def start_refresh(gunicorn_master_proc):
         batch_size = conf.getint('webserver', 'worker_refresh_batch_size')
         log.debug('%s doing a refresh of %s workers', state,
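The `wait_until_true` helper from the patch can be sketched standalone as follows. `WebServerTimeout` stands in for Airflow's `AirflowWebServerTimeout`, and the `poll_interval` parameter is an illustrative addition (the patch hard-codes 0.1 seconds):

```python
import time


class WebServerTimeout(Exception):
    """Raised when the condition isn't met within the timeout."""


def wait_until_true(fn, timeout=0, poll_interval=0.1):
    """Block until fn() returns truthy.

    If timeout > 0 and fn() is still falsy after `timeout` seconds,
    raise WebServerTimeout instead of looping forever -- this is the
    guard the patch adds so the monitor process can exit when the
    gunicorn master has died.
    """
    start = time.time()
    while not fn():
        if 0 < timeout <= time.time() - start:
            raise WebServerTimeout(
                "No response from gunicorn master within {0} seconds"
                .format(timeout))
        time.sleep(poll_interval)
```

With `timeout=0` (the default) the helper keeps the old behaviour and waits indefinitely, so existing call sites are unaffected.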
[jira] [Closed] (AIRFLOW-2241) Set pool slot to 0 if the user specified pool doesn't exist
[ https://issues.apache.org/jira/browse/AIRFLOW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Feng closed AIRFLOW-2241. - Resolution: Duplicate > Set pool slot to 0 if the user specified pool doesn't exist > --- > > Key: AIRFLOW-2241 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2241 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > > Currently if the pool for the user specified task_instance doesn't exist in > meta-database, it will throw an exception during > _find_executable_task_instances. We should skip that task instead of throwing > an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2241) Set pool slot to 0 if the user specified pool doesn't exist
[ https://issues.apache.org/jira/browse/AIRFLOW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409924#comment-16409924 ] Tao Feng commented on AIRFLOW-2241: --- The issue is fixed in AIRFLOW-1157. Close the issue for now. > Set pool slot to 0 if the user specified pool doesn't exist > --- > > Key: AIRFLOW-2241 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2241 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > > Currently if the pool for the user specified task_instance doesn't exist in > meta-database, it will throw an exception during > _find_executable_task_instances. We should skip that task instead of throwing > an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
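The improvement the quoted ticket asks for (skip a task whose pool doesn't exist instead of raising from `_find_executable_task_instances`) can be sketched roughly like this; the `pools` dict and `Task` tuple are illustrative stand-ins, not Airflow's actual scheduler structures:

```python
from collections import namedtuple

# Hypothetical minimal task record: just an id and a pool name.
Task = namedtuple("Task", ["task_id", "pool"])


def executable_tasks(tasks, pools):
    """Yield tasks whose pool exists and has free slots.

    A task referencing an unknown pool is treated as having 0 open
    slots and is skipped, rather than raising an exception -- the
    behaviour this ticket proposes.
    """
    for task in tasks:
        open_slots = pools.get(task.pool, 0)  # missing pool -> 0 slots
        if open_slots > 0:
            yield task
```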
[jira] [Resolved] (AIRFLOW-1067) Should not use airf...@airflow.com in examples
[ https://issues.apache.org/jira/browse/AIRFLOW-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved AIRFLOW-1067. Resolution: Fixed > Should not use airf...@airflow.com in examples > -- > > Key: AIRFLOW-1067 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1067 > Project: Apache Airflow > Issue Type: Bug >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Minor > > airflow.com is owned by a company named Airflow (selling fans, etc). We > should use airf...@example.com in all examples. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2231) DAG with a relativedelta schedule_interval fails
[ https://issues.apache.org/jira/browse/AIRFLOW-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409452#comment-16409452 ]

Kyle Brooks commented on AIRFLOW-2231:
--------------------------------------

Thanks Joy, I was able to get relativedelta to work by making some slight changes to the models.py code. I do think there's a use case for using relativedelta instead of cron syntax, because it is more powerful. I will submit a PR if and when I get approval to share the modifications.

Regarding not being able to reproduce the bug: I would guess that the version of dateutil.relativedelta.relativedelta that you have implements __hash__, so the DAG class __init__ does not fail like mine. I have seen the same issue, where the DAG will never be scheduled, for classes that are hashable, due to issues in the section of code you identified in your link above.

> DAG with a relativedelta schedule_interval fails
> ------------------------------------------------
>
> Key: AIRFLOW-2231
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2231
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG
> Reporter: Kyle Brooks
> Priority: Major
> Attachments: test_reldel.py
>
> The documentation for the DAG class says using dateutil.relativedelta.relativedelta as a schedule_interval is supported, but it fails:
>
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 285, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/imp.py", line 172, in load_source
>     module = _load(spec)
>   File "", line 675, in _load
>   File "", line 655, in _load_unlocked
>   File "", line 678, in exec_module
>   File "", line 205, in _call_with_frames_removed
>   File "/Users/k398995/airflow/dags/test_reldel.py", line 33, in
>     dagrun_timeout=timedelta(minutes=60))
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 2914, in __init__
>     if schedule_interval in cron_presets:
> TypeError: unhashable type: 'relativedelta'
>
> It looks like the __init__ function for class DAG assumes the schedule_interval is hashable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
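The failing check `schedule_interval in cron_presets` can be guarded so that unhashable interval types never reach the dict lookup. A minimal sketch -- the `cron_presets` dict here is an illustrative subset, not Airflow's full table:

```python
from datetime import timedelta

# Illustrative subset of cron preset names -> cron expressions.
cron_presets = {"@hourly": "0 * * * *", "@daily": "0 0 * * *"}


def resolve_schedule_interval(schedule_interval):
    """Map a preset name to its cron string; pass other types through.

    Only a str can be a preset key, so check the type before the `in`
    test. This avoids TypeError for unhashable values such as
    dateutil's relativedelta, which may not implement __hash__.
    """
    if isinstance(schedule_interval, str) and schedule_interval in cron_presets:
        return cron_presets[schedule_interval]
    return schedule_interval
```

Because `isinstance` short-circuits the `and`, an unhashable object is returned unchanged instead of raising.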
[jira] [Created] (AIRFLOW-2242) Add Gousto to companies list
Dejan created AIRFLOW-2242: -- Summary: Add Gousto to companies list Key: AIRFLOW-2242 URL: https://issues.apache.org/jira/browse/AIRFLOW-2242 Project: Apache Airflow Issue Type: Task Reporter: Dejan Assignee: Dejan -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2241) Set pool slot to 0 if the user specified pool doesn't exist
[ https://issues.apache.org/jira/browse/AIRFLOW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Feng updated AIRFLOW-2241: -- Summary: Set pool slot to 0 if the user specified pool doesn't exist (was: Set pool slot to 0 if user specifies a pool doesn't exist) > Set pool slot to 0 if the user specified pool doesn't exist > --- > > Key: AIRFLOW-2241 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2241 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > > Currently if the pool for the user specified task_instance doesn't exist in > meta-database, it will throw an exception during > _find_executable_task_instances. We should skip that task instead of throwing > an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1460) Tasks get stuck in the "removed" state
[ https://issues.apache.org/jira/browse/AIRFLOW-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxime Beauchemin resolved AIRFLOW-1460. Resolution: Fixed Assignee: Maxime Beauchemin (was: George Leslie-Waksman) > Tasks get stuck in the "removed" state > -- > > Key: AIRFLOW-1460 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1460 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Assignee: Maxime Beauchemin >Priority: Major > Fix For: 1.9.1 > > > The current handling of task instances that get assigned the state "removed" > is that they get ignored. > If the underlying task later gets re-added, the existing task instances will > prevent the task from running in the future. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1460) Tasks get stuck in the "removed" state
[ https://issues.apache.org/jira/browse/AIRFLOW-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409138#comment-16409138 ]

ASF subversion and git services commented on AIRFLOW-1460:
----------------------------------------------------------

Commit 8c42d03c4e35a0046e46f0e2e6db588702ee7e8b in incubator-airflow's branch refs/heads/master from [~tronbabylove]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=8c42d03 ]

[AIRFLOW-1460] Allow restoration of REMOVED TI's

When a task instance exists in the database but its corresponding task no longer exists in the DAG, the scheduler marks the task instance as REMOVED. Once removed, task instances stayed removed forever, even if the task were to be added back to the DAG.

This change allows for the restoration of REMOVED task instances. If a task instance is in state REMOVED but the corresponding task is present in the DAG, restore the task instance by setting its state to NONE.

A new unit test simulates the removal and restoration of a task from a DAG and verifies that the task instance is restored:

`./run_unit_tests.sh tests.models:DagRunTest`

JIRA: https://issues.apache.org/jira/browse/AIRFLOW-1460

Closes #3137 from astahlman/airflow-1460-restore-tis

> Tasks get stuck in the "removed" state
> --------------------------------------
>
> Key: AIRFLOW-1460
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1460
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: George Leslie-Waksman
> Assignee: George Leslie-Waksman
> Priority: Major
> Fix For: 1.9.1
>
> The current handling of task instances that get assigned the state "removed" is that they get ignored.
> If the underlying task later gets re-added, the existing task instances will prevent the task from running in the future.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-1460] Allow restoration of REMOVED TI's
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 8754cb1c6 -> 8c42d03c4

[AIRFLOW-1460] Allow restoration of REMOVED TI's

When a task instance exists in the database but its corresponding task no longer exists in the DAG, the scheduler marks the task instance as REMOVED. Once removed, task instances stayed removed forever, even if the task were to be added back to the DAG.

This change allows for the restoration of REMOVED task instances. If a task instance is in state REMOVED but the corresponding task is present in the DAG, restore the task instance by setting its state to NONE.

A new unit test simulates the removal and restoration of a task from a DAG and verifies that the task instance is restored:

`./run_unit_tests.sh tests.models:DagRunTest`

JIRA: https://issues.apache.org/jira/browse/AIRFLOW-1460

Closes #3137 from astahlman/airflow-1460-restore-tis

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8c42d03c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8c42d03c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8c42d03c
Branch: refs/heads/master
Commit: 8c42d03c4e35a0046e46f0e2e6db588702ee7e8b
Parents: 8754cb1
Author: Andrew Stahlman
Authored: Wed Mar 21 23:54:05 2018 -0700
Committer: Maxime Beauchemin
Committed: Wed Mar 21 23:54:05 2018 -0700
--
 airflow/models.py | 21 ++---
 tests/models.py   | 24 
 2 files changed, 42 insertions(+), 3 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8c42d03c/airflow/models.py
--
diff --git a/airflow/models.py b/airflow/models.py
index c1b608a..aa10ad5 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -4905,16 +4905,31 @@ class DagRun(Base, LoggingMixin):
         dag = self.get_dag()
         tis = self.get_task_instances(session=session)

-        # check for removed tasks
+        # check for removed or restored tasks
         task_ids = []
         for ti in tis:
             task_ids.append(ti.task_id)
+            task = None
             try:
-                dag.get_task(ti.task_id)
+                task = dag.get_task(ti.task_id)
             except AirflowException:
-                if self.state is not State.RUNNING and not dag.partial:
+                if ti.state == State.REMOVED:
+                    pass  # ti has already been removed, just ignore it
+                elif self.state is not State.RUNNING and not dag.partial:
+                    self.log.warning("Failed to get task '{}' for dag '{}'. "
+                                     "Marking it as removed.".format(ti, dag))
+                    Stats.incr(
+                        "task_removed_from_dag.{}".format(dag.dag_id), 1, 1)
                     ti.state = State.REMOVED

+            is_task_in_dag = task is not None
+            should_restore_task = is_task_in_dag and ti.state == State.REMOVED
+            if should_restore_task:
+                self.log.info("Restoring task '{}' which was previously "
+                              "removed from DAG '{}'".format(ti, dag))
+                Stats.incr("task_restored_to_dag.{}".format(dag.dag_id), 1, 1)
+                ti.state = State.NONE
+
         # check for missing tasks
         for task in six.itervalues(dag.task_dict):
             if task.adhoc:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8c42d03c/tests/models.py
--
diff --git a/tests/models.py b/tests/models.py
index 5d8184c..98913af 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -892,6 +892,30 @@ class DagRunTest(unittest.TestCase):
         self.assertTrue(dagrun.is_backfill)
         self.assertFalse(dagrun2.is_backfill)

+    def test_removed_task_instances_can_be_restored(self):
+        def with_all_tasks_removed(dag):
+            return DAG(dag_id=dag.dag_id, start_date=dag.start_date)
+
+        dag = DAG('test_task_restoration', start_date=DEFAULT_DATE)
+        dag.add_task(DummyOperator(task_id='flaky_task', owner='test'))
+
+        dagrun = self.create_dag_run(dag)
+        flaky_ti = dagrun.get_task_instances()[0]
+        self.assertEquals('flaky_task', flaky_ti.task_id)
+        self.assertEquals(State.NONE, flaky_ti.state)
+
+        dagrun.dag = with_all_tasks_removed(dag)
+
+        dagrun.verify_integrity()
+        flaky_ti.refresh_from_db()
+        self.assertEquals(State.REMOVED, flaky_ti.state)
+
+        dagrun.dag.add_task(DummyOperator(task_id='flaky_task', owner='test'))
+
+        dagrun.verify_integrity()
+        flaky_ti.refresh_from_db()
+
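The restore logic in this patch reduces to a small state-transition decision. A simplified standalone sketch, with plain strings standing in for Airflow's `State` constants and ignoring the dagrun-state and `dag.partial` conditions the real code also checks:

```python
# Stand-ins for airflow.utils.state.State constants.
REMOVED = "removed"
NONE = None


def reconcile_ti_state(ti_state, task_in_dag):
    """Return the task instance's new state after verify_integrity.

    - task gone from the DAG         -> mark REMOVED (or keep it REMOVED)
    - task present but ti is REMOVED -> restore to NONE so it can be scheduled
    - otherwise                      -> leave the state unchanged
    """
    if not task_in_dag:
        return REMOVED
    if ti_state == REMOVED:
        return NONE  # previously removed task came back: restore it
    return ti_state
```

This mirrors the `should_restore_task` predicate in the diff: restoration only happens when the task is found in the DAG *and* the instance is currently REMOVED.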
[jira] [Updated] (AIRFLOW-1460) Tasks get stuck in the "removed" state
[ https://issues.apache.org/jira/browse/AIRFLOW-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxime Beauchemin updated AIRFLOW-1460: --- Fix Version/s: 1.9.1 > Tasks get stuck in the "removed" state > -- > > Key: AIRFLOW-1460 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1460 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Assignee: George Leslie-Waksman >Priority: Major > Fix For: 1.9.1 > > > The current handling of task instances that get assigned the state "removed" > is that they get ignored. > If the underlying task later gets re-added, the existing task instances will > prevent the task from running in the future. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1104) Concurrency check in scheduler should count queued tasks as well as running
[ https://issues.apache.org/jira/browse/AIRFLOW-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Feng reassigned AIRFLOW-1104: - Assignee: Tao Feng > Concurrency check in scheduler should count queued tasks as well as running > --- > > Key: AIRFLOW-1104 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1104 > Project: Apache Airflow > Issue Type: Bug > Environment: see https://github.com/apache/incubator-airflow/pull/2221 > "Tasks with the QUEUED state should also be counted below, but for now we > cannot count them. This is because there is no guarantee that queued tasks in > failed dagruns will or will not eventually run and queued tasks that will > never run will consume slots and can stall a DAG. Once we can guarantee that > all queued tasks in failed dagruns will never run (e.g. make sure that all > running/newly queued TIs have running dagruns), then we can include QUEUED > tasks here, with the constraint that they are in running dagruns." >Reporter: Alex Guziel >Assignee: Tao Feng >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
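The fix this ticket asks for (count QUEUED as well as RUNNING task instances against the DAG's concurrency limit) can be sketched like this; the list-of-states representation is illustrative, not the scheduler's real data structures:

```python
# Stand-ins for airflow.utils.state.State constants.
RUNNING, QUEUED = "running", "queued"


def open_concurrency_slots(ti_states, concurrency_limit):
    """Slots left for a DAG once running *and* queued TIs are counted.

    Counting only RUNNING lets queued tasks oversubscribe the limit,
    which is the bug this ticket describes; the caveat quoted in the
    environment field (queued TIs in failed dagruns) still applies.
    """
    occupied = sum(1 for state in ti_states if state in (RUNNING, QUEUED))
    return max(0, concurrency_limit - occupied)
```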