[jira] [Closed] (AIRFLOW-1798) Include celery ssl configs in default template
[ https://issues.apache.org/jira/browse/AIRFLOW-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang closed AIRFLOW-1798. - Resolution: Done (resolved by AIRFLOW-1840) > Include celery ssl configs in default template > -- > > Key: AIRFLOW-1798 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1798 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work started] (AIRFLOW-1798) Include celery ssl configs in default template
[ https://issues.apache.org/jira/browse/AIRFLOW-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1798 started by Daniel Huang. - > Include celery ssl configs in default template > -- > > Key: AIRFLOW-1798 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1798 > Project: Apache Airflow > Issue Type: Improvement > Components: configuration >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1798) Include celery ssl configs in default template
Daniel Huang created AIRFLOW-1798: - Summary: Include celery ssl configs in default template Key: AIRFLOW-1798 URL: https://issues.apache.org/jira/browse/AIRFLOW-1798 Project: Apache Airflow Issue Type: Improvement Components: configuration Reporter: Daniel Huang Assignee: Daniel Huang Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)
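[Editor's note] For context, the Celery SSL options that ended up in the default airflow.cfg template (via AIRFLOW-1840) look roughly like the following. This is reconstructed from the 1.9-era default template rather than quoted from this issue, so the exact key names and defaults should be verified against your Airflow version:

```
[celery]
# Secure communication with the Celery broker (disabled by default).
ssl_active = False
ssl_key =
ssl_cert =
ssl_cacert =
```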
[jira] [Commented] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal
[ https://issues.apache.org/jira/browse/AIRFLOW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244805#comment-16244805 ] Daniel Huang commented on AIRFLOW-1559: --- [~jeffreybian] I added engine dispose calls in a few spots, including the ones you mentioned. Please take a look! https://github.com/apache/incubator-airflow/pull/2767 > MySQL warnings about aborted connections, missing engine disposal > - > > Key: AIRFLOW-1559 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1559 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > We're seeing a flood of warnings about aborted connections in our MySQL logs. > {code} > Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an > error reading communication packets) > {code} > It appears this is because we're not performing [engine > disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal]. > The most common source of this warning is from the scheduler, when it kicks > off new processes to process the DAG files. Calling dispose in > https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 > greatly reduced these messages. However, the worker is still causing some of > these, I assume from when we spin up processes to run tasks. We do call > dispose in > https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396, > but I think it's a bit early. Not sure if there's a place we can put this > cleanup to ensure it's done everywhere. > Quick script to reproduce this warning message: > {code} > from airflow import settings > from airflow.models import Connection > session = settings.Session() > session.query(Connection).count() > session.close() > # not calling settings.engine.dispose() > {code} > Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
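[Editor's note] The fix discussed here follows SQLAlchemy's documented engine-disposal pattern for forked processes. A minimal sketch of that pattern, using an in-memory SQLite engine rather than Airflow's actual scheduler/worker code: a child process that inherits an engine across fork() should call Engine.dispose() before its first query, so it opens fresh connections instead of reusing sockets shared with the parent (the reuse is what produces MySQL's "Aborted connection" warnings).

```python
import os

from sqlalchemy import create_engine, text

# Illustrative sketch, not Airflow's code. The engine (and its connection
# pool) is created in the parent and inherited by the child on fork().
engine = create_engine("sqlite://")


def use_engine_after_fork(eng):
    # Discard pooled connections inherited from the parent process so this
    # process opens fresh ones instead of reusing shared sockets.
    eng.dispose()
    with eng.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar()


if __name__ == "__main__":
    pid = os.fork()
    if pid == 0:               # child: dispose before first use
        use_engine_after_fork(engine)
        os._exit(0)
    os.waitpid(pid, 0)         # parent: its own pool is untouched
```

The same idea applies at process exit: disposing the engine closes pooled connections cleanly instead of letting the OS tear them down, which is what the server logs as an aborted connection.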
[jira] [Assigned] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal
[ https://issues.apache.org/jira/browse/AIRFLOW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang reassigned AIRFLOW-1559: - Assignee: Daniel Huang > MySQL warnings about aborted connections, missing engine disposal > - > > Key: AIRFLOW-1559 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1559 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal
[ https://issues.apache.org/jira/browse/AIRFLOW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1559 started by Daniel Huang. - > MySQL warnings about aborted connections, missing engine disposal > - > > Key: AIRFLOW-1559 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1559 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (AIRFLOW-1794) No Exception.message in Python 3
[ https://issues.apache.org/jira/browse/AIRFLOW-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1794 started by Daniel Huang. - > No Exception.message in Python 3 > > > Key: AIRFLOW-1794 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1794 > Project: Apache Airflow > Issue Type: Bug >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > [~ashb] ran into this > {code} > Traceback (most recent call last): > File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1988, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1641, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1544, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in > reraise > raise value > File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1639, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1625, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 69, > in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line > 368, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python3.5/dist-packages/flask_login.py", line 758, in > decorated_view > return func(*args, **kwargs) > File "/usr/local/lib/python3.5/dist-packages/airflow/www/utils.py", line > 262, in wrapper > return f(*args, **kwargs) > File "/usr/local/lib/python3.5/dist-packages/airflow/www/views.py", line > 715, in log > .format(task_log_reader, e.message)] > AttributeError: 'AttributeError' object has no attribute 'message' > {code} -- This message was sent by 
Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1794) No Exception.message in Python 3
Daniel Huang created AIRFLOW-1794: - Summary: No Exception.message in Python 3 Key: AIRFLOW-1794 URL: https://issues.apache.org/jira/browse/AIRFLOW-1794 Project: Apache Airflow Issue Type: Bug Reporter: Daniel Huang Assignee: Daniel Huang Priority: Minor [~ashb] ran into this {code} Traceback (most recent call last): File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1988, in wsgi_app response = self.full_dispatch_request() File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1641, in full_dispatch_request rv = self.handle_user_exception(e) File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1544, in handle_user_exception reraise(exc_type, exc_value, tb) File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in reraise raise value File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1639, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1625, in dispatch_request return self.view_functions[rule.endpoint](**req.view_args) File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 69, in inner return self._run_view(f, *args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 368, in _run_view return fn(self, *args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/flask_login.py", line 758, in decorated_view return func(*args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/airflow/www/utils.py", line 262, in wrapper return f(*args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/airflow/www/views.py", line 715, in log .format(task_log_reader, e.message)] AttributeError: 'AttributeError' object has no attribute 'message' {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
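[Editor's note] A minimal sketch of the Python 3 behaviour behind this traceback (not the Airflow fix itself): BaseException no longer carries a .message attribute in Python 3, so code written as e.message raises a second AttributeError inside the exception handler; str(e) is the portable spelling.

```python
# Portable way to get an exception's message, unlike exc.message which
# only existed on Python 2 exceptions.
def describe(exc):
    return str(exc)


try:
    raise AttributeError("'AttributeError' demo")
except AttributeError as exc:
    # On Python 3, accessing exc.message here would itself raise
    # AttributeError -- exactly the failure shown in the traceback above.
    assert not hasattr(exc, "message")
    message = describe(exc)

print(message)
```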
[jira] [Commented] (AIRFLOW-1463) Scheduler does not reschedule tasks in QUEUED state
[ https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173844#comment-16173844 ] Daniel Huang commented on AIRFLOW-1463: --- Hitting this as well due to deployments and one of my DAGs intermittently hitting the {{DAGBAG_IMPORT_TIMEOUT}}. Just restarting the scheduler does not get my tasks out of the QUEUED state. I've had to delete the task instance so that the scheduler re-queues the task with an increased timeout. > Scheduler does not reschedule tasks in QUEUED state > --- > > Key: AIRFLOW-1463 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1463 > Project: Apache Airflow > Issue Type: Improvement > Components: cli > Environment: Ubuntu 14.04 > Airflow 1.8.0 > SQS backed task queue, AWS RDS backed meta storage > DAG folder is synced by script on code push: archive is downloaded from s3, > unpacked, moved, install script is run. airflow executable is replaced with > symlink pointing to the latest version of code, no airflow processes are > restarted. >Reporter: Stanislav Pak >Priority: Minor > Original Estimate: 24h > Remaining Estimate: 24h > > Our pipelines-related code is deployed almost simultaneously on all airflow > boxes: the scheduler+webserver box and the worker boxes. A common python package is > deployed on those boxes on every other code push (3-5 deployments per hour). > Due to installation specifics, a DAG that imports a module from that package > might fail. If the DAG import fails when a worker runs a task, the task is still > removed from the queue but its state is not changed, so in this case the > task stays in the QUEUED state forever. > Besides the described case, there is a scenario where it happens because of DAG > update lag in the scheduler. A task can be scheduled with the old DAG and a worker can > run the task with a new DAG that fails to import. > There might be other scenarios where this happens. 
> Proposal: > Catch errors when importing DAG on task run and clear task instance state if > import fails. This should fix transient issues of this kind. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (AIRFLOW-1627) SubDagOperator initialization should only query pools when necessary
[ https://issues.apache.org/jira/browse/AIRFLOW-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1627 started by Daniel Huang. - > SubDagOperator initialization should only query pools when necessary > > > Key: AIRFLOW-1627 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1627 > Project: Apache Airflow > Issue Type: Improvement > Components: operators, subdag >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > If a SubDagOperator is assigned to a pool, it queries db for pool info to > ensure there is no pool conflict with one of its tasks when only 1 slot > remains. However, we should check that there's a possible conflict (a task in > the subdag is in the same pool as the subdag) before actually querying for > pools. > I have a DAG with hundreds of subdags and I found that the pool conflict > check was taking up a fair chunk of time when processing the DAG file. > Relevant code: > https://github.com/apache/incubator-airflow/blob/a81c153cc48e4c99a9e0a5047990b84c5d07e3cb/airflow/operators/subdag_operator.py#L60-L81 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1627) SubDagOperator initialization should only query pools when necessary
Daniel Huang created AIRFLOW-1627: - Summary: SubDagOperator initialization should only query pools when necessary Key: AIRFLOW-1627 URL: https://issues.apache.org/jira/browse/AIRFLOW-1627 Project: Apache Airflow Issue Type: Improvement Components: operators, subdag Reporter: Daniel Huang Assignee: Daniel Huang Priority: Minor If a SubDagOperator is assigned to a pool, it queries db for pool info to ensure there is no pool conflict with one of its tasks when only 1 slot remains. However, we should check that there's a possible conflict (a task in the subdag is in the same pool as the subdag) before actually querying for pools. I have a DAG with hundreds of subdags and I found that the pool conflict check was taking up a fair chunk of time when processing the DAG file. Relevant code: https://github.com/apache/incubator-airflow/blob/a81c153cc48e4c99a9e0a5047990b84c5d07e3cb/airflow/operators/subdag_operator.py#L60-L81 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
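[Editor's note] The proposed short-circuit can be sketched as follows. The names here are illustrative, not Airflow's actual API: the point is that the Pool query is only worthwhile when a task inside the subdag shares the SubDagOperator's own pool, so a cheap in-memory check can gate the database round-trip.

```python
# Hypothetical sketch of the proposed check (not Airflow's code).
def pool_query_needed(subdag_pool, task_pools):
    """Return True only when a task inside the subdag shares the
    SubDagOperator's pool, i.e. a slot conflict is actually possible."""
    if not subdag_pool:
        return False
    return subdag_pool in task_pools


# A subdag whose tasks use other pools (or none) would previously still
# trigger a Pool query on every DAG-file parse; now it is skipped:
assert pool_query_needed("etl_pool", {"other_pool", None}) is False
assert pool_query_needed("etl_pool", {"etl_pool", "other_pool"}) is True
assert pool_query_needed(None, {"etl_pool"}) is False
```

For a DAG with hundreds of subdags, avoiding those per-operator queries is what recovers the file-processing time mentioned above.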
[jira] [Created] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal
Daniel Huang created AIRFLOW-1559: - Summary: MySQL warnings about aborted connections, missing engine disposal Key: AIRFLOW-1559 URL: https://issues.apache.org/jira/browse/AIRFLOW-1559 Project: Apache Airflow Issue Type: Bug Components: db Reporter: Daniel Huang Priority: Minor We're seeing a flood of warnings about aborted connections in our MySQL logs. {code} Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an error reading communication packets) {code} It appears this is because we're not performing [engine disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal]. The most common source of this warning is from the scheduler, when it kicks off new processes to process the DAG files. Calling dispose in https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 greatly reduced these messages. However, the worker is still causing some of these, I assume from when we spin up processes to run tasks. We do call dispose in https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396, but I think it's a bit early. Not sure if there's a place we can put this cleanup to ensure it's done everywhere. Quick script to reproduce this warning message: {code} from airflow import settings from airflow.models import Connection session = settings.Session() session.query(Connection).count() session.close() # not calling settings.engine.dispose() {code} Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1538) DAG prematurely marked as failed before downstream tasks are marked as upstream_failed
Daniel Huang created AIRFLOW-1538: - Summary: DAG prematurely marked as failed before downstream tasks are marked as upstream_failed Key: AIRFLOW-1538 URL: https://issues.apache.org/jira/browse/AIRFLOW-1538 Project: Apache Airflow Issue Type: Bug Components: scheduler Reporter: Daniel Huang Priority: Minor When a task fails, I'm seeing that only sometimes do its downstream tasks get marked as upstream_failed. Most of the time the downstream tasks are left with null state. While the DAG run is still correctly marked as failed, I have some tasks with trigger rule all_failed/one_failed further down that don't get executed because the failure isn't propagated down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AIRFLOW-1537) SubDagOperator does not retry properly
Daniel Huang created AIRFLOW-1537: - Summary: SubDagOperator does not retry properly Key: AIRFLOW-1537 URL: https://issues.apache.org/jira/browse/AIRFLOW-1537 Project: Apache Airflow Issue Type: Bug Components: backfill, scheduler, subdag Reporter: Daniel Huang Since AIRFLOW-1124, we only backfill tasks with no state. While that makes sense for a backfill that we use in combination with clear, it doesn't quite work in the context of a sub-DAG that is set to retry. I'd expect failed tasks to be rerun at least, but doing that can have other side effects in other cases. [~artwr] mentioned this in https://github.com/apache/incubator-airflow/pull/2247/files#r112189191. Any thoughts on how to handle this, [~bolke]? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-1400) catchup=False caused exception
[ https://issues.apache.org/jira/browse/AIRFLOW-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082309#comment-16082309 ] Daniel Huang commented on AIRFLOW-1400: --- What's your DAG's {{schedule_interval}}? > catchup=False caused exception > -- > > Key: AIRFLOW-1400 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1400 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 >Reporter: Xi Wang > > When I set up the task with catchup=False, it threw an error as follows > (logs/scheduler/my_dag_name): > [2017-07-10 15:13:12,534] {jobs.py:354} DagFileProcessor373 ERROR - Got an > exception! Propagating... > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 346, in helper > pickle_dags) > File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in > wrapper > result = func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1581, in > process_file > self._process_dags(dagbag, dags, ti_keys_to_schedule) > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1171, in > _process_dags > dag_run = self.create_dag_run(dag) > File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in > wrapper > result = func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 776, in > create_dag_run > if next_start <= now: > TypeError: can't compare datetime.datetime to NoneType > It seems next_start was not defined properly > (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L777). > Any help is appreciated. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
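[Editor's note] A minimal reproduction of the failure mode in the traceback above, outside Airflow: next_start is left as None instead of being assigned a datetime, and comparing None against a datetime raises TypeError.

```python
from datetime import datetime


def compare(next_start, now):
    """Mimic the `if next_start <= now:` guard from the traceback."""
    try:
        return next_start <= now
    except TypeError:
        # This is the branch the scheduler hits when next_start is None.
        return "can't compare datetime to NoneType"


print(compare(None, datetime(2017, 7, 10, 15, 13)))
print(compare(datetime(2017, 7, 9), datetime(2017, 7, 10)))
```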
[jira] [Commented] (AIRFLOW-1355) Unable to launch jobs : DAGs not being executed.
[ https://issues.apache.org/jira/browse/AIRFLOW-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067650#comment-16067650 ] Daniel Huang commented on AIRFLOW-1355: --- If the DAG is in Running status, what state are the tasks in? Queued or Running? > Unable to launch jobs : DAGs not being executed. > > > Key: AIRFLOW-1355 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1355 > Project: Apache Airflow > Issue Type: Bug > Components: DAG >Affects Versions: Airflow 2.0 > Environment: Mac OS and Ubuntu >Reporter: Pavan KN > > Steps to reproduce: > 1. Create a new installation > 2. Launch Airflow > 3. Enable a DAG and trigger it manually > The DAG/job won't get executed. It will stay in Running status, but no execution > starts and it continues to stay at the same status. > The same issue occurs with the Sequential, Local and Celery executors. > Happening in the 2.0 version. Tried on multiple Mac machines and on Ubuntu. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-1328) Daily DAG execute the past day
[ https://issues.apache.org/jira/browse/AIRFLOW-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056292#comment-16056292 ] Daniel Huang commented on AIRFLOW-1328: --- From the [docs|https://airflow.incubator.apache.org/scheduler.html]: {quote}Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended. Let’s Repeat That The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period. {quote} This means in your case, the DAG run with execution date 06-19T15:00:00 is not expected to run until 06-20T07:00:00 (the next scheduled interval). > Daily DAG execute the past day > -- > > Key: AIRFLOW-1328 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1328 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: Airflow 1.8 > Environment: debian jessie >Reporter: Pierre-Antoine Tible > > Hello, > I'm running Airflow 1.8 under debian jessie. I installed it via pip. > I am using the LocalScheduler with MySQL. 
> I made a simple DAG with a BashOperator for a daily task (run twice a day): > {code} > default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': datetime.now() - timedelta(days=1, seconds=6), > 'email': ['XXX'], > 'email_on_failure': True, > 'email_on_retry': False, > 'retries': 1, > 'retry_delay': timedelta(minutes=5), > 'execution_timeout': None, > #'catchup': False, > #'backfill': False, > # 'queue': 'bash_queue', > # 'pool': 'backfill', > # 'priority_weight': 10, > # 'end_date': datetime(2016, 1, 1), > } > dag = DAG('campaign-reminder', default_args=default_args, > schedule_interval="0,0 7,15 * * *", concurrency=1, max_active_runs=1) > dag.catchup = False > t1 = BashOperator( > task_id='campaign-reminder', > bash_command=' ', > dag=dag) > {code} > I did it today and it works, but the execution date was "06-19T15:00:00"; we are > the 20th, so it's one day behind the schedule. > My first thought was a mistake with the start_date, so I put a datetime() and > it did the same ... > I don't understand why. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
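[Editor's note] The scheduling rule quoted in the comment above can be worked through with plain datetime arithmetic: a run is stamped with the START of its schedule interval but only triggered once that interval ENDS. With the reporter's cron ("0,0 7,15 * * *", i.e. ticks at 07:00 and 15:00), the interval that starts at 06-19T15:00 ends at the next tick, 06-20T07:00.

```python
from datetime import datetime, timedelta

# The run's stamp marks the start of the interval it covers.
execution_date = datetime(2017, 6, 19, 15, 0)

# Gap from the 15:00 tick to the next tick (07:00 the following day).
interval_to_next_tick = timedelta(hours=16)

# The scheduler fires the run only once the covered period has ended.
trigger_time = execution_date + interval_to_next_tick
print(trigger_time)
```

So the "one day behind" appearance is the documented behaviour, not a bug: on the 20th, the newest completed run is the one stamped the 19th.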
[jira] [Created] (AIRFLOW-1296) DAGs using operators involving cascading skipped tasks fail prematurely
Daniel Huang created AIRFLOW-1296: - Summary: DAGs using operators involving cascading skipped tasks fail prematurely Key: AIRFLOW-1296 URL: https://issues.apache.org/jira/browse/AIRFLOW-1296 Project: Apache Airflow Issue Type: Bug Components: scheduler Reporter: Daniel Huang So this is basically the same issue as AIRFLOW-872 and AIRFLOW-719. A workaround had fixed this (https://github.com/apache/incubator-airflow/pull/2125), but it was later reverted (https://github.com/apache/incubator-airflow/pull/2195). I totally agree with the reason for reverting, but I still think this is an issue. The issue is related to any operator that involves cascading skipped tasks, like ShortCircuitOperator or LatestOnlyOperator. These operators mark only their *direct* downstream tasks as SKIPPED; cascading the SKIPPED state to tasks further downstream is left up to the scheduler (see the latest-only operator docs about this expected behavior: https://airflow.incubator.apache.org/concepts.html#latest-run-only). However, the scheduler instead marks the DAG run as FAILED prematurely, before the DAG has a chance to skip all downstream tasks. This example DAG should reproduce the issue: https://gist.github.com/dhuang/61d38fb001c3a917edf4817bb0c915f9. Expected result: DAG succeeds with tasks - latest_only (success) -> dummy1 (skipped) -> dummy2 (skipped) -> dummy3 (skipped) Actual result: DAG fails with tasks - latest_only (success) -> dummy1 (skipped) -> dummy2 (none) -> dummy3 (none) I believe the results I'm seeing are because of this deadlock prevention logic: https://github.com/apache/incubator-airflow/blob/1.8.1/airflow/models.py#L4182. While the actual result shown above _could_ mean a deadlock, in this case it shouldn't be treated as one. Since this {{update_state}} logic is reached first in each scheduler run, dummy2/dummy3 don't get a chance to cascade the SKIPPED state. Commenting out that block gives me the results I expect. 
[~bolke] I know you spent a while trying to reproduce my issue and weren't able to, but I'm still hitting this on a fresh environment, default configs, sqlite/mysql dbs, local/sequential/celery executors, and 1.8.1/master. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
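[Editor's note] The cascade the report expects can be sketched in isolation. This is an illustrative model, not the scheduler's actual code: once latest_only skips dummy1, every transitive downstream task with no state should end up SKIPPED rather than being read as a deadlock.

```python
# Toy model of cascading the SKIPPED state through a task graph.
def cascade_skip(states, downstream):
    """states: task_id -> state (or None); downstream: task_id -> children."""
    changed = True
    while changed:
        changed = False
        for task, children in downstream.items():
            if states.get(task) == "skipped":
                for child in children:
                    if states.get(child) is None:
                        states[child] = "skipped"
                        changed = True
    return states


# The example DAG from the gist: latest_only -> dummy1 -> dummy2 -> dummy3.
states = {"latest_only": "success", "dummy1": "skipped",
          "dummy2": None, "dummy3": None}
downstream = {"latest_only": ["dummy1"], "dummy1": ["dummy2"],
              "dummy2": ["dummy3"], "dummy3": []}
print(cascade_skip(states, downstream))
```

The bug report's "actual result" corresponds to the DAG run being marked FAILED before this cascade has run, leaving dummy2/dummy3 with no state.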
[jira] [Commented] (AIRFLOW-1147) airflow scheduler not working
[ https://issues.apache.org/jira/browse/AIRFLOW-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985069#comment-15985069 ] Daniel Huang commented on AIRFLOW-1147: --- Can you provide some log output of the scheduler? Also, just checking, did you toggle the DAG as "on" in the UI? > airflow scheduler not working > - > > Key: AIRFLOW-1147 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1147 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.8 > Environment: CentOS running on 128 GB ram >Reporter: Mubin Khalid >Priority: Critical > Labels: documentation, newbie > Original Estimate: 24h > Remaining Estimate: 24h > > I've created some `DAG`s and tried to put them on the scheduler. I want to run > all the tasks in the DAG every 24 hours. > I tried to do something like this. > {code} > DEFAULT_ARGS= { > 'owner' : 'mubin', > 'depends_on_past' : False, > 'start_date' : datetime(2017, 4, 24, 14, 30), > 'retries' : 5, > 'retry_delay' : timedetla(1), > } > SCHEDULE_INTERVAL = timedelta(minutes=1440) > # SCHEDULE_INTERVAL= timedelta(hours=24) > # SCHEDULE_INTERVAL= timedelta(days=1) > dag = DAG('StandardizeDataDag', > default_args = DEFAULT_ARGS, > schedule_interval = SCHEDULE_INTERVAL > ) > {code} > I tried different intervals, but none of them work. However, if I reset the > db {code} airflow resetdb -y {code} and then run {code} airflow > initdb {code}, it works once. After that, the scheduler isn't able to > run it again. > PS. {code} airflow scheduler {code} is executed as {code} root {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
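[Editor's note] Two details worth separating out of the quoted snippet. First, the three schedule_interval spellings tried there are exactly equal, so switching between them cannot change scheduler behaviour. Second, the quoted retry_delay uses `timedetla` (a typo for `timedelta`), which would raise a NameError when the DAG file is parsed; an import error like that alone can keep a DAG from ever being scheduled.

```python
from datetime import timedelta

# The three interval spellings in the report are interchangeable:
assert timedelta(minutes=1440) == timedelta(hours=24) == timedelta(days=1)

# Whereas the misspelling in the quoted retry_delay fails at parse time:
try:
    timedetla(1)  # noqa: F821 -- deliberately undefined, as in the report
except NameError as err:
    print(err)
```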
[jira] [Work started] (AIRFLOW-1076) Support getting variable by string in templates
[ https://issues.apache.org/jira/browse/AIRFLOW-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1076 started by Daniel Huang. - > Support getting variable by string in templates > --- > > Key: AIRFLOW-1076 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1076 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > Currently, one can fetch variables in templates with {{ var.value.foo }}. But > that doesn't work if the variable key has a character you can't use as an > attribute, like ":" or "-". > Should provide alternative method of {{ var.value.get('foo:bar') }}. Can then > also supply a default value if the variable is not found. This also allows > you to fetch the variable specified in another jinja variable (probably not > common use case). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-1076) Support getting variable by string in templates
[ https://issues.apache.org/jira/browse/AIRFLOW-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang updated AIRFLOW-1076: -- Description: Currently, one can fetch variables in templates with {{ var.value.foo }}. But that doesn't work if the variable key has a character you can't use as an attribute, like ":" or "-". Should provide alternative method of {{ var.value.get('foo:bar') }}. Can then also supply a default value if the variable is not found. This also allows you to fetch the variable specified in another jinja variable (probably not common use case). was: Currently, one can fetch variables in templates with `{{ var.value.foo}}`. But that doesn't work if the variable key has a character you can't use as an attribute, like ":" or "-". Should provide alternative method of `{{ var.value.get('foo:bar') }}`. Can then also supply a default value if the variable is not found. > Support getting variable by string in templates > --- > > Key: AIRFLOW-1076 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1076 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > Currently, one can fetch variables in templates with {{ var.value.foo }}. But > that doesn't work if the variable key has a character you can't use as an > attribute, like ":" or "-". > Should provide alternative method of {{ var.value.get('foo:bar') }}. Can then > also supply a default value if the variable is not found. This also allows > you to fetch the variable specified in another jinja variable (probably not > common use case). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-1076) Support getting variable by string in templates
Daniel Huang created AIRFLOW-1076: - Summary: Support getting variable by string in templates Key: AIRFLOW-1076 URL: https://issues.apache.org/jira/browse/AIRFLOW-1076 Project: Apache Airflow Issue Type: Improvement Reporter: Daniel Huang Assignee: Daniel Huang Priority: Minor Currently, one can fetch variables in templates with `{{ var.value.foo}}`. But that doesn't work if the variable key has a character you can't use as an attribute, like ":" or "-". Should provide alternative method of `{{ var.value.get('foo:bar') }}`. Can then also supply a default value if the variable is not found. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
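[Editor's note] The limitation this issue describes can be sketched in plain Python. The class below is illustrative, not Airflow's actual "var.value" object: an attribute-style lookup can only express keys that are valid identifiers, while a get() method accepts any string and can also carry a default.

```python
# Hypothetical accessor mirroring the template behaviour discussed above.
class VarValueAccessor:
    def __init__(self, values):
        self._values = values

    def __getattr__(self, key):
        # Backs {{ var.value.foo }} -- only identifier-shaped keys work;
        # a key like "foo:bar" can never be spelled as an attribute.
        return self._values[key]

    def get(self, key, default=None):
        # Backs the proposed {{ var.value.get('foo:bar') }} -- any key,
        # with an optional default when the variable is missing.
        return self._values.get(key, default)


var_value = VarValueAccessor({"foo": "1", "foo:bar": "2"})
print(var_value.foo)                      # attribute access for "foo"
print(var_value.get("foo:bar"))           # reaches the ":"-containing key
print(var_value.get("missing", "dflt"))   # default for an absent variable
```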
[jira] [Work started] (AIRFLOW-1075) Cleanup security docs
[ https://issues.apache.org/jira/browse/AIRFLOW-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-1075 started by Daniel Huang. - > Cleanup security docs > - > > Key: AIRFLOW-1075 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1075 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Trivial > > Noticed a few minor things to fix, like "Impersonation" being under "SSL" > section. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-1075) Cleanup security docs
Daniel Huang created AIRFLOW-1075: - Summary: Cleanup security docs Key: AIRFLOW-1075 URL: https://issues.apache.org/jira/browse/AIRFLOW-1075 Project: Apache Airflow Issue Type: Improvement Components: docs Reporter: Daniel Huang Assignee: Daniel Huang Priority: Trivial Noticed a few minor things to fix, like "Impersonation" being under "SSL" section. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (AIRFLOW-982) Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped
[ https://issues.apache.org/jira/browse/AIRFLOW-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-982 started by Daniel Huang. > Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped > -- > > Key: AIRFLOW-982 > URL: https://issues.apache.org/jira/browse/AIRFLOW-982 > Project: Apache Airflow > Issue Type: Improvement > Components: docs, examples >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-983) Make trigger rules more explicit regarding success vs skipped
Daniel Huang created AIRFLOW-983: Summary: Make trigger rules more explicit regarding success vs skipped Key: AIRFLOW-983 URL: https://issues.apache.org/jira/browse/AIRFLOW-983 Project: Apache Airflow Issue Type: Improvement Components: dependencies Reporter: Daniel Huang Since AIRFLOW-719, the trigger rules all_success/one_success include both success and skipped states. We should probably make ALL_SUCCESS strictly success (again) and add a separate ALL_SUCCESS_OR_SKIPPED/ALL_FAILED_OR_SKIPPED. ALL_SUCCESS_OR_SKIPPED may be a more appropriate default trigger rule as well. Otherwise, we need to document in the LatestOnly/ShortCircuit/Branch operators which trigger rule is appropriate to use there. Some previous discussion in https://github.com/apache/incubator-airflow/pull/1961 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
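The distinction the issue proposes can be shown with a toy evaluator. The rule and state names mirror Airflow's, but this is an illustrative sketch, not Airflow's dependency checker, and `all_success_or_skipped` is the proposed rule, not an existing one.

```python
def evaluate_trigger_rule(rule, upstream_states):
    """Toy evaluation of a trigger rule over upstream task states."""
    if rule == 'all_success':
        # Strict reading: every upstream task finished in success.
        return all(s == 'success' for s in upstream_states)
    if rule == 'all_success_or_skipped':
        # Proposed rule: skipped parents do not block the task.
        return all(s in ('success', 'skipped') for s in upstream_states)
    raise ValueError('unknown rule: %s' % rule)


states = ['success', 'skipped']
print(evaluate_trigger_rule('all_success', states))             # strict rule blocks the task
print(evaluate_trigger_rule('all_success_or_skipped', states))  # proposed rule lets it run
```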
[jira] [Created] (AIRFLOW-982) Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped
Daniel Huang created AIRFLOW-982: Summary: Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped Key: AIRFLOW-982 URL: https://issues.apache.org/jira/browse/AIRFLOW-982 Project: Apache Airflow Issue Type: Improvement Components: docs, examples Reporter: Daniel Huang Assignee: Daniel Huang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-956) Setup readthedocs for versioned docs
Daniel Huang created AIRFLOW-956: Summary: Setup readthedocs for versioned docs Key: AIRFLOW-956 URL: https://issues.apache.org/jira/browse/AIRFLOW-956 Project: Apache Airflow Issue Type: Improvement Components: docs Reporter: Daniel Huang Assignee: Daniel Huang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess
[ https://issues.apache.org/jira/browse/AIRFLOW-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-946 started by Daniel Huang. > Virtualenv not explicitly used by webserver/worker subprocess > - > > Key: AIRFLOW-946 > URL: https://issues.apache.org/jira/browse/AIRFLOW-946 > Project: Apache Airflow > Issue Type: Bug > Components: cli >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > I have airflow installed in a virtualenv. I'd expect calling > {{/path/to/venv/bin/airflow webserver}} or {{/path/to/venv/bin/airflow > worker}} *without activating my virtualenv* to work. However, they both fail > to run properly because they spawn a process that is called without > specifying the virtualenv. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess
Daniel Huang created AIRFLOW-946: Summary: Virtualenv not explicitly used by webserver/worker subprocess Key: AIRFLOW-946 URL: https://issues.apache.org/jira/browse/AIRFLOW-946 Project: Apache Airflow Issue Type: Bug Components: cli Reporter: Daniel Huang Assignee: Daniel Huang Priority: Minor I have airflow installed in a virtualenv. I'd expect calling {{/path/to/venv/bin/airflow webserver}} or {{/path/to/venv/bin/airflow worker}} *without activating my virtualenv* to work. However, they both fail to run properly because they spawn a process that is called without specifying the virtualenv. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
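One direction for fixing the behavior described above is for the parent process to spawn children via the interpreter it is itself running under (`sys.executable`) rather than whatever `python` resolves to on PATH, so the virtualenv's interpreter is used even when the env was never activated. This is a sketch of that idea, not Airflow's actual spawning code.

```python
import subprocess
import sys

# Spawn a child through the parent's own interpreter. Inside a
# virtualenv, sys.executable points at the venv's python, so the child
# inherits the venv without any activation step.
child_cmd = [sys.executable, '-c', 'import sys; print(sys.executable)']
child_interpreter = subprocess.check_output(child_cmd).decode().strip()
print(child_interpreter)  # same interpreter as the parent
```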
[jira] [Commented] (AIRFLOW-55) Add HDFS Log Support
[ https://issues.apache.org/jira/browse/AIRFLOW-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891506#comment-15891506 ] Daniel Huang commented on AIRFLOW-55: - Took another stab at getting this through using the existing WebHDFSHook, https://github.com/apache/incubator-airflow/pull/2119. > Add HDFS Log Support > > > Key: AIRFLOW-55 > URL: https://issues.apache.org/jira/browse/AIRFLOW-55 > Project: Apache Airflow > Issue Type: New Feature > Components: hooks >Affects Versions: Airflow 1.7.0 >Reporter: Wu Xiang >Assignee: Daniel Huang > Labels: features > Fix For: Airflow 1.8 > > > To support save task logs on HDFS. > PR: > https://github.com/apache/incubator-airflow/pull/1409 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (AIRFLOW-55) Add HDFS Log Support
[ https://issues.apache.org/jira/browse/AIRFLOW-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang reassigned AIRFLOW-55: --- Assignee: Daniel Huang > Add HDFS Log Support > > > Key: AIRFLOW-55 > URL: https://issues.apache.org/jira/browse/AIRFLOW-55 > Project: Apache Airflow > Issue Type: New Feature > Components: hooks >Affects Versions: Airflow 1.7.0 >Reporter: Wu Xiang >Assignee: Daniel Huang > Labels: features > Fix For: Airflow 1.8 > > > To support save task logs on HDFS. > PR: > https://github.com/apache/incubator-airflow/pull/1409 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-872) Tasks are not skipping correctly
[ https://issues.apache.org/jira/browse/AIRFLOW-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883889#comment-15883889 ] Daniel Huang commented on AIRFLOW-872: -- Dug a little deeper. Your example DAG and the DAG where I ran into this issue both use tasks with the default trigger rule of ALL_SUCCESS. SKIPPED is not considered a successful status, so the [code is failing the DAG|https://github.com/apache/incubator-airflow/blob/2ce7556a590f81b44a7222830e648c6912eb6986/airflow/ti_deps/deps/trigger_rule_dep.py#L166] when it sees t3's parent was skipped, which makes sense. I don't know if this behavior change from 1.7.1.3->1.8 was an intentional bug fix or unintended side effect. I think we just need a NONE_FAILED trigger rule option that accepts success/skipped. I'm happy to make the change if others think it sounds reasonable. Open to a better name as well. Also, I do wonder if it would make more sense as the default (ALL_SUCCESS would be the stricter option). Otherwise, we may want to call out this behavior in the LatestOnly/ShortCircuit operators. 
> Tasks are not skipping correctly > > > Key: AIRFLOW-872 > URL: https://issues.apache.org/jira/browse/AIRFLOW-872 > Project: Apache Airflow > Issue Type: Bug > Components: cli, DagRun, webserver >Affects Versions: Airflow 1.8 > Environment: docker-compose using docker-compose-LocalExecutor.yml > on this fork/branch > https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping >Reporter: Frank Massi > Attachments: Screen Shot 2017-02-13 at 1.09.35 PM.png, Screen Shot > 2017-02-13 at 1.11.09 PM.png, Screen Shot 2017-02-13 at 1.11.15 PM.png, > Screen Shot 2017-02-13 at 1.13.13 PM.png > > > When using the BranchPythonOperator or ShortCircuitOperator to make Dags > idempotent, if running the Dag after a successful DagRun and clearing the > initial task, the second task skips and then the Dag enters a failed state, > as opposed to skipping all remaining tasks and marking the Dag successful. > Steps to reproduce are outlined here: > https://github.com/fdm1/docker-airflow/blob/v1-8-not-skipping/dags/test_clear_bug.py#L4-L9 > In 1.7.1.3, the tasks do skip correctly, so this appears to be a new bug. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
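The NONE_FAILED rule floated in the comment above can be sketched as follows. The state names mirror Airflow's, but this is an illustration of the proposal, not Airflow's trigger-rule implementation.

```python
def none_failed(upstream_states):
    """Proposed NONE_FAILED rule: a task runs as long as no upstream
    task failed, so both success and skipped parents are acceptable."""
    return not any(s in ('failed', 'upstream_failed') for s in upstream_states)


# A LatestOnly/ShortCircuit parent that skipped some children would no
# longer fail the downstream task under this rule:
print(none_failed(['success', 'skipped']))   # True
print(none_failed(['success', 'failed']))    # False
```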
[jira] [Comment Edited] (AIRFLOW-872) Tasks are not skipping correctly
[ https://issues.apache.org/jira/browse/AIRFLOW-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876942#comment-15876942 ] Daniel Huang edited comment on AIRFLOW-872 at 2/22/17 5:19 PM: --- I hit this as well with the LatestOnlyOperator (very similar to ShortCircuitOperator, it also sets its direct downstream tasks to SKIPPED) when there are more than one task downstream from the latest only task. I believe this [block|https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L4034] is being hit and setting the dag to failed prematurely. When I commented it out as a temporary workaround, the remaining tasks get set to skipped correctly and the dag ends in success. was (Author: dxhuang): I hit this as well with the LatestOnlyOperator (very similar to ShortCircuitOperator, it also sets its direct downstream tasks to SKIPPED) when there are more than one task downstream from the latest only task. I believe this [block](https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L4034) is being hit and setting the dag to failed prematurely. When I commented it out as a temporary workaround, the remaining tasks get set to skipped correctly and the dag ends in success. 
> Tasks are not skipping correctly > > > Key: AIRFLOW-872 > URL: https://issues.apache.org/jira/browse/AIRFLOW-872 > Project: Apache Airflow > Issue Type: Bug > Components: cli, DagRun, webserver >Affects Versions: Airflow 1.8 > Environment: docker-compose using docker-compose-LocalExecutor.yml > on this fork/branch > https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping >Reporter: Frank Massi > Attachments: Screen Shot 2017-02-13 at 1.09.35 PM.png, Screen Shot > 2017-02-13 at 1.11.09 PM.png, Screen Shot 2017-02-13 at 1.11.15 PM.png, > Screen Shot 2017-02-13 at 1.13.13 PM.png > > > When using the BranchPythonOperator or ShortCircuitOperator to make Dags > idempotent, if running the Dag after a successful DagRun and clearing the > initial task, the second task skips and then the Dag enters a failed state, > as opposed to skipping all remaining tasks and marking the Dag successful. > Steps to reproduce are outlined here: > https://github.com/fdm1/docker-airflow/blob/v1-8-not-skipping/dags/test_clear_bug.py#L4-L9 > In 1.7.1.3, the tasks do skip correctly, so this appears to be a new bug. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-872) Tasks are not skipping correctly
[ https://issues.apache.org/jira/browse/AIRFLOW-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876942#comment-15876942 ] Daniel Huang commented on AIRFLOW-872: -- I hit this as well with the LatestOnlyOperator (very similar to ShortCircuitOperator, it also sets its direct downstream tasks to SKIPPED) when there are more than one task downstream from the latest only task. I believe this [block](https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L4034) is being hit and setting the dag to failed prematurely. When I commented it out as a temporary workaround, the remaining tasks get set to skipped correctly and the dag ends in success. > Tasks are not skipping correctly > > > Key: AIRFLOW-872 > URL: https://issues.apache.org/jira/browse/AIRFLOW-872 > Project: Apache Airflow > Issue Type: Bug > Components: cli, DagRun, webserver >Affects Versions: Airflow 1.8 > Environment: docker-compose using docker-compose-LocalExecutor.yml > on this fork/branch > https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping >Reporter: Frank Massi > Attachments: Screen Shot 2017-02-13 at 1.09.35 PM.png, Screen Shot > 2017-02-13 at 1.11.09 PM.png, Screen Shot 2017-02-13 at 1.11.15 PM.png, > Screen Shot 2017-02-13 at 1.13.13 PM.png > > > When using the BranchPythonOperator or ShortCircuitOperator to make Dags > idempotent, if running the Dag after a successful DagRun and clearing the > initial task, the second task skips and then the Dag enters a failed state, > as opposed to skipping all remaining tasks and marking the Dag successful. > Steps to reproduce are outlined here: > https://github.com/fdm1/docker-airflow/blob/v1-8-not-skipping/dags/test_clear_bug.py#L4-L9 > In 1.7.1.3, the tasks do skip correctly, so this appears to be a new bug. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (AIRFLOW-882) Code example in docs has unnecessary DAG>>Operator assignment
[ https://issues.apache.org/jira/browse/AIRFLOW-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-882 started by Daniel Huang. > Code example in docs has unnecessary DAG>>Operator assignment > - > > Key: AIRFLOW-882 > URL: https://issues.apache.org/jira/browse/AIRFLOW-882 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Trivial > > The docs currently say: > {code} > We can put this all together to build a simple pipeline: > with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag: > ( > dag > >> DummyOperator(task_id='dummy_1') > >> BashOperator( > task_id='bash_1', > bash_command='echo "HELLO!"') > >> PythonOperator( > task_id='python_1', > python_callable=lambda: print("GOODBYE!")) > ) > {code} > But the {{dag >> ...}} is unnecessary because the operators are already > initialized with the proper DAG > (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/models.py#L1699). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pick up default args
Daniel Huang created AIRFLOW-883: Summary: Assigning operator to DAG via bitwise composition does not pick up default args Key: AIRFLOW-883 URL: https://issues.apache.org/jira/browse/AIRFLOW-883 Project: Apache Airflow Issue Type: Bug Components: models Reporter: Daniel Huang Priority: Minor This is only the case when the operator does not specify {{dag=dag}} and is not initialized within a DAG's context manager (due to https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50) Example: {code}
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 2, 1)
}

dag = DAG('my_dag', default_args=default_args)
dummy = DummyOperator(task_id='dummy')
dag >> dummy
{code} This will raise a {{Task is missing the start_date parameter}}. I _think_ this should probably be allowed because I assume the purpose of supporting {{dag >> op}} was to allow delayed assignment of an operator to a DAG. I believe to fix this, on assignment, we would need to go back and go through dag.default_args to see if any of those attrs weren't explicitly set on task...not the cleanest. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
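The fix sketched in the issue — backfilling `default_args` when a task is attached late via `dag >> task` — can be modeled with toy classes. The class and attribute names mirror Airflow's, but this is not Airflow's code.

```python
class DAG:
    """Toy DAG that backfills default_args on late task assignment."""

    def __init__(self, dag_id, default_args=None):
        self.dag_id = dag_id
        self.default_args = default_args or {}
        self.tasks = []

    def __rshift__(self, task):
        # Fill in anything the operator did not set at init time,
        # e.g. start_date, then attach the task to this DAG.
        for attr, value in self.default_args.items():
            if getattr(task, attr, None) is None:
                setattr(task, attr, value)
        task.dag = self
        self.tasks.append(task)
        return task


class DummyOperator:
    def __init__(self, task_id, owner=None, start_date=None):
        self.task_id = task_id
        self.owner = owner
        self.start_date = start_date
        self.dag = None


dag = DAG('my_dag', default_args={'owner': 'airflow',
                                  'start_date': '2017-02-01'})
dummy = DummyOperator(task_id='dummy')
dag >> dummy  # no "missing start_date" error: defaults are backfilled
print(dummy.owner, dummy.start_date)
```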
[jira] [Created] (AIRFLOW-882) Code example in docs has unnecessary DAG>>Operator assignment
Daniel Huang created AIRFLOW-882: Summary: Code example in docs has unnecessary DAG>>Operator assignment Key: AIRFLOW-882 URL: https://issues.apache.org/jira/browse/AIRFLOW-882 Project: Apache Airflow Issue Type: Improvement Components: docs Reporter: Daniel Huang Assignee: Daniel Huang Priority: Trivial The docs currently say: {code}
We can put this all together to build a simple pipeline:

with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
    (
        dag
        >> DummyOperator(task_id='dummy_1')
        >> BashOperator(
            task_id='bash_1',
            bash_command='echo "HELLO!"')
        >> PythonOperator(
            task_id='python_1',
            python_callable=lambda: print("GOODBYE!"))
    )
{code} But the {{dag >> ...}} is unnecessary because the operators are already initialized with the proper DAG (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/models.py#L1699). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Work started] (AIRFLOW-881) Create SubDagOperator within DAG context manager without passing dag param
[ https://issues.apache.org/jira/browse/AIRFLOW-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-881 started by Daniel Huang. > Create SubDagOperator within DAG context manager without passing dag param > -- > > Key: AIRFLOW-881 > URL: https://issues.apache.org/jira/browse/AIRFLOW-881 > Project: Apache Airflow > Issue Type: Improvement > Components: operators, subdag >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Trivial > > Currently, the following raises a {{Please pass in the `dag` param}} > exception: > {code} > with DAG('my_dag', default_args=default_args) as dag: > op = SubDagOperator(task_id='my_subdag', subdag=subdag_factory(...)) > {code} > But the SubDagOperator should be aware if it's in the context manager of a > dag without having to specify {{dag=dag}} when initializing the operator. > Similar to how the {{@apply_defaults}} decorator does it > (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-881) Create SubDagOperator within DAG context manager without passing dag param
Daniel Huang created AIRFLOW-881: Summary: Create SubDagOperator within DAG context manager without passing dag param Key: AIRFLOW-881 URL: https://issues.apache.org/jira/browse/AIRFLOW-881 Project: Apache Airflow Issue Type: Improvement Components: operators, subdag Reporter: Daniel Huang Assignee: Daniel Huang Priority: Trivial Currently, the following raises a {{Please pass in the `dag` param}} exception: {code}
with DAG('my_dag', default_args=default_args) as dag:
    op = SubDagOperator(task_id='my_subdag', subdag=subdag_factory(...))
{code} But the SubDagOperator should be aware if it's in the context manager of a dag without having to specify {{dag=dag}} when initializing the operator. Similar to how the {{@apply_defaults}} decorator does it (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-859) airflow trigger_dag not working
[ https://issues.apache.org/jira/browse/AIRFLOW-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861865#comment-15861865 ] Daniel Huang commented on AIRFLOW-859: -- Are you running the scheduler? {code} airflow scheduler {code} https://airflow.incubator.apache.org/scheduler.html > airflow trigger_dag not working > -- > > Key: AIRFLOW-859 > URL: https://issues.apache.org/jira/browse/AIRFLOW-859 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1.3 > Environment: Using Airflow 1.7.1.3 on Amazon Web Services (date +%Z # > timezone name > displays UTC). > Airflow using Mariadb also on AWS (maria db system_time_zone set to UTC) >Reporter: Stefano Tiani-Tanzi > > Hi, > I have a simple DAG ( code at end of this message ). > The DAG loads OK using CLI "airflow list_dags". > In Airflow webpage the DAG is set to "On". > I try and trigger the dag using CLI "airflow trigger_dag trigger_dag_v1" and > get feedback as follows: > [2017-02-10 18:20:12,969] {__init__.py:36} INFO - Using executor LocalExecutor > [2017-02-10 18:20:13,374] {cli.py:142} INFO - Created @ 2017-02-10 18:20:13.342199: manual__2017-02-10T18:20:13.342199, externally > triggered: True> > However, the dag does not start/run. > I can see in the web interface ( using Browse|Dag Runs ) that the run is at > "Failed" status but there are no logs associated with the run. Also in the > Tree view no runs appear. > Airflow server and db are both set to UTC and the DAG is switched on in the > web UI. > Have I missed something or am I doing something incorrectly? > Any help would be much appreciated. 
> Many thanks > Stefano > import json > from airflow import DAG > from airflow.models import Variable > from airflow.operators import DummyOperator > from datetime import datetime, timedelta > default_args = { > 'owner': 'stef', > 'start_date': datetime(2017, 01, 01), > 'retries': 1, > 'retry_delay': timedelta(minutes=3), > 'depends_on_past': False, > 'wait_for_downstream': True, > 'provide_context': True > } > dag = DAG('trigger_dag_v1', default_args=default_args, schedule_interval=None) > task_dummy = DummyOperator( > task_id='dummy', > dag=dag) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-847) Xcoms are not passed into SubDAG
[ https://issues.apache.org/jira/browse/AIRFLOW-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856326#comment-15856326 ] Daniel Huang commented on AIRFLOW-847: -- I ran into this last week. {{xcom_pull()}} by default pulls xcom values from the current DAG (https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1668). So if you're pulling from the SubDAG and want xcom values from the parent DAG, you need to specify it in the {{dag_id}} param. For example, from a template: {code} {{ task_instance.xcom_pull(task_ids="do_thing", dag_id=dag.parent_dag.dag_id) }} {code} > Xcoms are not passed into SubDAG > > > Key: AIRFLOW-847 > URL: https://issues.apache.org/jira/browse/AIRFLOW-847 > Project: Apache Airflow > Issue Type: Bug > Components: subdag, xcom >Reporter: Robin B >Priority: Blocker > > It's not possible to do a xcom_pull within a subdag > None of the following seems to be working: > * As templated var in SubDagoperator > * As var in SubDagoperator > * From within Subdag-factory -- This message was sent by Atlassian JIRA (v6.3.15#6346)
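Why the explicit `dag_id` matters can be shown with a toy XCom store keyed by (dag_id, task_id). Airflow keys XComs by more fields (execution date, key) and the function below is not its API; this keeps only what the comment above is about.

```python
# Values pushed by the parent DAG live under the parent's dag_id.
xcom_store = {
    ('parent_dag', 'do_thing'): 'value-from-parent',
}


def xcom_pull(current_dag_id, task_ids, dag_id=None):
    # Like Airflow's xcom_pull, default to the *current* DAG when no
    # dag_id is given -- inside a subdag, that misses the parent's value.
    return xcom_store.get((dag_id or current_dag_id, task_ids))


print(xcom_pull('parent_dag.my_subdag', 'do_thing'))                       # None: looked in the subdag
print(xcom_pull('parent_dag.my_subdag', 'do_thing', dag_id='parent_dag'))  # found in the parent
```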
[jira] [Work started] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection
[ https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-770 started by Daniel Huang. > HDFS hooks should support alternative ways of getting connection > > > Key: AIRFLOW-770 > URL: https://issues.apache.org/jira/browse/AIRFLOW-770 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > The HDFS hook currently uses {{get_connections()}} instead of > {{get_connection()}} to grab the connection info. I believe this is so if > multiple connections are specified, instead of choosing them at random, it > appropriately passes them all via snakebite's HAClient. > As far as I can tell, this means connection info can't be set outside of the > UI, since environment variables are not looked at (which had me confused for > a bit). I think ideally we'd want to be able to do so for the three different > snakebite clients. Here are some possible suggestions for allowing this: > * AutoConfigClient: add attribute like {{HDFSHook(..., > autoconfig=True).get_conn()}} > * Client: specify single URI in environment variable > * HAClient: specify multiple URIs in environment variable, separated by > commas? Not very adhering to standard and if we did this, we'd probably want > to support this across all hooks. > WebHDFS hook has a similar issue with pulling from env. > references: > https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56 > https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection
[ https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang reassigned AIRFLOW-770: Assignee: Daniel Huang > HDFS hooks should support alternative ways of getting connection > > > Key: AIRFLOW-770 > URL: https://issues.apache.org/jira/browse/AIRFLOW-770 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Minor > > The HDFS hook currently uses {{get_connections()}} instead of > {{get_connection()}} to grab the connection info. I believe this is so if > multiple connections are specified, instead of choosing them at random, it > appropriately passes them all via snakebite's HAClient. > As far as I can tell, this means connection info can't be set outside of the > UI, since environment variables are not looked at (which had me confused for > a bit). I think ideally we'd want to be able to do so for the three different > snakebite clients. Here are some possible suggestions for allowing this: > * AutoConfigClient: add attribute like {{HDFSHook(..., > autoconfig=True).get_conn()}} > * Client: specify single URI in environment variable > * HAClient: specify multiple URIs in environment variable, separated by > commas? Not very adhering to standard and if we did this, we'd probably want > to support this across all hooks. > WebHDFS hook has a similar issue with pulling from env. > references: > https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56 > https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection
[ https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang updated AIRFLOW-770: - Description: The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to grab the connection info. I believe this is so if multiple connections are specified, instead of choosing them at random, it appropriately passes them all via snakebite's HAClient. As far as I can tell, this means connection info can't be set outside of the UI, since environment variables are not looked at (which had me confused for a bit). I think ideally we'd want to be able to do so for the three different snakebite clients. Here are some possible suggestions for allowing this: * AutoConfigClient: add attribute like {{HDFSHook(..., autoconfig=True).get_conn()}} * Client: specify single URI in environment variable * HAClient: specify multiple URIs in environment variable, separated by commas? Not very adhering to standard and if we did this, we'd probably want to support this across all hooks. WebHDFS hook has a similar issue with pulling from env. references: https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56 https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73 was: The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to grab the connection info. I believe this is so if multiple connections are specified, instead of choosing them at random, it appropriately passes them all via snakebite's HAClient. As far as I can tell, this means connection info can't be set outside of the UI, since environment variables are not looked at (which had me confused for a bit). I think ideally we'd want to be able to do so for the three different snakebite clients. 
Here are some possible suggestions for allowing this: * AutoConfigClient: add attribute like {{HDFSHook(..., autoconfig=True).get_conn()}} * Client: specify single URI in environment variable * HAClient: specify multiple URIs in environment variable, separated by commas? Not very adhering to standard and if we did this, we'd probably want to support this across all hooks. references: https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56 https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73 Summary: HDFS hooks should support alternative ways of getting connection (was: HDFS hook should support alternative ways of getting connection) > HDFS hooks should support alternative ways of getting connection > > > Key: AIRFLOW-770 > URL: https://issues.apache.org/jira/browse/AIRFLOW-770 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Daniel Huang >Priority: Minor > > The HDFS hook currently uses {{get_connections()}} instead of > {{get_connection()}} to grab the connection info. I believe this is so if > multiple connections are specified, instead of choosing them at random, it > appropriately passes them all via snakebite's HAClient. > As far as I can tell, this means connection info can't be set outside of the > UI, since environment variables are not looked at (which had me confused for > a bit). I think ideally we'd want to be able to do so for the three different > snakebite clients. Here are some possible suggestions for allowing this: > * AutoConfigClient: add attribute like {{HDFSHook(..., > autoconfig=True).get_conn()}} > * Client: specify single URI in environment variable > * HAClient: specify multiple URIs in environment variable, separated by > commas? Not very adhering to standard and if we did this, we'd probably want > to support this across all hooks. 
> WebHDFS hook has a similar issue with pulling from env. > references: > https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56 > https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
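The HAClient suggestion above — several namenode URIs supplied through one environment variable, comma-separated — could look like the sketch below. The variable name follows Airflow's `AIRFLOW_CONN_<CONN_ID>` convention, but the multi-URI parsing is the proposal under discussion, not existing hook behavior.

```python
import os

# Simulate the proposed configuration: one env var, multiple URIs.
os.environ['AIRFLOW_CONN_HDFS_DEFAULT'] = (
    'hdfs://namenode1:8020,hdfs://namenode2:8020'
)


def get_hdfs_uris(conn_id):
    """Split a comma-separated connection env var into a URI list,
    suitable for handing all namenodes to an HA-aware client."""
    raw = os.environ.get('AIRFLOW_CONN_%s' % conn_id.upper(), '')
    return [part.strip() for part in raw.split(',') if part.strip()]


print(get_hdfs_uris('hdfs_default'))
```

As the issue notes, comma-separation is not part of the URI standard, so adopting it would be a convention that ideally applies uniformly across hooks.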
[jira] [Commented] (AIRFLOW-768) Clicking "Code" when zoomed into a subdag causes an exception
[ https://issues.apache.org/jira/browse/AIRFLOW-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847833#comment-15847833 ] Daniel Huang commented on AIRFLOW-768: -- Looks like a duplicate of AIRFLOW-365, which I just put up a PR for. > Clicking "Code" when zoomed into a subdag causes an exception > - > > Key: AIRFLOW-768 > URL: https://issues.apache.org/jira/browse/AIRFLOW-768 > Project: Apache Airflow > Issue Type: Bug > Components: subdag, webserver >Affects Versions: Airflow 1.7.1.3 > Environment: puckel/docker-airflow:1.7.1.3-5 >Reporter: Dirk Gorissen > > Zoom into a subdag in the graph view. Then clicking on Code gives: > --- > Node: c3a8ec77a1e6 > --- > Traceback (most recent call last): > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, > in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line > 367, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in > decorated_view > return func(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line > 655, in code > m = importlib.import_module(dag.module_name) > AttributeError: 'DAG' object has no 
attribute 'module_name' -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (AIRFLOW-365) Code view in subdag trigger exception
[ https://issues.apache.org/jira/browse/AIRFLOW-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Huang reassigned AIRFLOW-365: Assignee: Daniel Huang > Code view in subdag trigger exception > - > > Key: AIRFLOW-365 > URL: https://issues.apache.org/jira/browse/AIRFLOW-365 > Project: Apache Airflow > Issue Type: Bug > Components: ui >Affects Versions: Airflow 1.7.1.3 > Environment: Centos 7 >Reporter: marco pioltelli >Assignee: Daniel Huang >Priority: Minor > > Hi, > first of all, this product is wonderful. > I have created my DAG with a subdag. > It works very well, but the following issue happens on the UI: > - going into the DAG and selecting (zooming into) a subdag > - clicking on the Code section to view the code, the following exception is > triggered > --- > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app > response = self.full_dispatch_request() > File "/usr/lib/python2.7/site-packages/flask/app.py", line 1477, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/lib/python2.7/site-packages/flask/app.py", line 1381, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/lib/python2.7/site-packages/flask/app.py", line 1475, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/lib/python2.7/site-packages/flask/app.py", line 1461, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/lib/python2.7/site-packages/flask_admin/base.py", line 68, in > inner > return self._run_view(f, *args, **kwargs) > File "/usr/lib/python2.7/site-packages/flask_admin/base.py", line 367, in > _run_view > return fn(self, *args, **kwargs) > File "/usr/lib/python2.7/site-packages/flask_login.py", line 755, in > decorated_view > return func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www/views.py", line 655, in > code > m = importlib.import_module(dag.module_name) > 
AttributeError: 'DAG' object has no attribute 'module_name' > It seems it is searching for a DAG instead of a subdag. > This also happens with the example subdag operator. > Thanks > Marco
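The failing endpoint calls {{importlib.import_module(dag.module_name)}}, but a zoomed-in subdag carries no module metadata of its own. One possible shape of a fix is to walk up to the root DAG before resolving the module. The sketch below is hypothetical: the `is_subdag`/`parent_dag` attribute names are assumptions modeled on Airflow 1.7.x-era internals, not a confirmed API, and the stand-in class exists only so the walk can be demonstrated without a real Airflow install.

```python
def resolve_code_dag(dag):
    """Walk from a (possibly nested) subdag up to its root DAG.

    Assumption: subdags expose ``is_subdag`` and ``parent_dag`` (as in
    Airflow 1.7.x-era internals); only the root DAG carries the module
    metadata the /code view needs.
    """
    while getattr(dag, "is_subdag", False) and getattr(dag, "parent_dag", None) is not None:
        dag = dag.parent_dag
    return dag


class FakeDag:
    """Stand-in for a DAG object, used only to illustrate the walk."""

    def __init__(self, is_subdag=False, parent_dag=None):
        self.is_subdag = is_subdag
        self.parent_dag = parent_dag


root = FakeDag()
sub = FakeDag(is_subdag=True, parent_dag=root)
nested = FakeDag(is_subdag=True, parent_dag=sub)
```

With this in place, the /code view would always import the file that actually defines the root DAG, which is also where a subdag's code lives.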
[jira] [Work started] (AIRFLOW-806) UI should properly ignore DAG doc when it is None
[ https://issues.apache.org/jira/browse/AIRFLOW-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-806 started by Daniel Huang. > UI should properly ignore DAG doc when it is None > - > > Key: AIRFLOW-806 > URL: https://issues.apache.org/jira/browse/AIRFLOW-806 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Affects Versions: Airflow 1.7.1.3 >Reporter: Daniel Huang >Assignee: Daniel Huang >Priority: Trivial > > If {{dag.doc_md = __doc__}} is set but there is no docstring in the file, the > Graph view will error with: > {code} > AttributeError: 'NoneType' object has no attribute 'strip' > {code} > The UI should just assume we have no docs to show for the DAG. I suspect the user > in AIRFLOW-625 ran into this error.
[jira] [Created] (AIRFLOW-806) UI should properly ignore DAG doc when it is None
Daniel Huang created AIRFLOW-806: Summary: UI should properly ignore DAG doc when it is None Key: AIRFLOW-806 URL: https://issues.apache.org/jira/browse/AIRFLOW-806 Project: Apache Airflow Issue Type: Improvement Components: ui Affects Versions: Airflow 1.7.1.3 Reporter: Daniel Huang Assignee: Daniel Huang Priority: Trivial If {{dag.doc_md = __doc__}} is set but there is no docstring in the file, the Graph view will error with: {code} AttributeError: 'NoneType' object has no attribute 'strip' {code} The UI should just assume we have no docs to show for the DAG. I suspect the user in AIRFLOW-625 ran into this error.
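The fix this ticket describes amounts to treating a missing doc the same as no doc at all: guard before calling {{.strip()}}. A minimal sketch of that guard (`doc_to_html` is a hypothetical helper name, not the actual code in views.py):

```python
def doc_to_html(doc_md):
    """Return displayable doc text, or None when the DAG has no docs.

    Guards against ``dag.doc_md = __doc__`` in a file with no module
    docstring, which leaves doc_md set to None and previously caused:
    AttributeError: 'NoneType' object has no attribute 'strip'
    """
    if doc_md is None:
        return None  # nothing to render; the UI simply shows no doc panel
    return doc_md.strip()
```

The template rendering the Graph view would then skip the doc panel whenever this returns None instead of raising.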
[jira] [Commented] (AIRFLOW-625) doc_md in concepts document seems wrong
[ https://issues.apache.org/jira/browse/AIRFLOW-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838399#comment-15838399 ] Daniel Huang commented on AIRFLOW-625: -- I wasn't able to reproduce this. Did you by chance have {{dag.doc_md = __doc__}} set (like https://github.com/apache/incubator-airflow/blob/master/docs/concepts.rst does) when you first tried this? Maybe the DAG did not refresh on your latest example attempt? That would have triggered this error, since you don't have any docstrings set. Also, in that case, changing {{t1.doc_md}} to {{dag.doc_md}} would have fixed your issue because then {{dag.doc_md}} was no longer {{None}}. > doc_md in concepts document seems wrong > --- > > Key: AIRFLOW-625 > URL: https://issues.apache.org/jira/browse/AIRFLOW-625 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: CentOS 7.2 >Reporter: Flex Gao > > In > [https://github.com/apache/incubator-airflow/blob/master/docs/concepts.rst] > it says *doc_md* is an attribute of a task, but this gives an error on the > webserver *Graph* tab > Example DAG file: > {code} > #!/usr/bin/env python > # -*- coding: utf-8 -*- > from airflow import DAG > from airflow.operators.bash_operator import BashOperator > from datetime import datetime, timedelta > SCRIPTS_PATH = '/var/lib/airflow/scripts' > default_args = { > 'depends_on_past': False, > 'start_date': datetime(2016, 11, 9, 0, 55), > } > dag = DAG('elasticsearch', default_args=default_args, > schedule_interval=timedelta(days=1)) > t1 = BashOperator( > task_id='daily_index_delete', > bash_command='%s/es_clean.py' % SCRIPTS_PATH, > dag=dag) > t1.doc_md = """\ > Task Documentation > Clean ES Indices Every Day > """ > {code} > This will give a traceback: > *AttributeError: 'NoneType' object has no attribute 'strip'* > But if I changed *t1.doc_md* to *dag.doc_md*, everything is ok.
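The diagnosis above hinges on {{__doc__}} being {{None}} whenever a DAG file has no module-level docstring. A small self-contained demonstration of that Python behavior (it compiles throwaway module sources rather than real DAG files, so no Airflow install is assumed):

```python
import types


def module_doc(source):
    """Execute `source` as a fresh module and return its __doc__."""
    mod = types.ModuleType("example")
    exec(compile(source, "example.py", "exec"), mod.__dict__)
    return mod.__doc__


# A DAG file like the reporter's, with no docstring at the top:
no_docstring = "x = 1\n"
# The same file with a module docstring, as concepts.rst assumes:
with_docstring = '"""My DAG docs."""\nx = 1\n'
```

In the first case `dag.doc_md = __doc__` would assign `None`, which is exactly what tripped the `.strip()` call in the Graph view; in the second, `doc_md` gets real text and renders fine.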
[jira] [Created] (AIRFLOW-770) HDFS hook should support alternative ways of getting connection
Daniel Huang created AIRFLOW-770: Summary: HDFS hook should support alternative ways of getting connection Key: AIRFLOW-770 URL: https://issues.apache.org/jira/browse/AIRFLOW-770 Project: Apache Airflow Issue Type: Improvement Components: hooks Reporter: Daniel Huang Priority: Minor The HDFS hook currently uses {{get_connections()}} instead of {{get_connection()}} to grab the connection info. I believe this is so that, if multiple connections are specified, it appropriately passes them all via snakebite's HAClient instead of choosing one at random. As far as I can tell, this means connection info can't be set outside of the UI, since environment variables are not looked at (which had me confused for a bit). I think ideally we'd want to be able to do so for the three different snakebite clients. Here are some possible suggestions for allowing this: * AutoConfigClient: add an attribute like {{HDFSHook(..., autoconfig=True).get_conn()}} * Client: specify a single URI in an environment variable * HAClient: specify multiple URIs in an environment variable, separated by commas? This doesn't adhere to any standard, though, and if we did it, we'd probably want to support it across all hooks. References: https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56 https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73
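Airflow can already read a single connection as a URI from an {{AIRFLOW_CONN_<CONN_ID>}} environment variable; the HAClient suggestion above would extend that to several URIs. The sketch below shows what the proposed comma-separated parsing could look like. The comma convention itself is hypothetical (it is this ticket's proposal, not something Airflow implements), and `hdfs_namenodes_from_env` is an illustrative helper, not real hook code.

```python
import os
from urllib.parse import urlparse


def hdfs_namenodes_from_env(var="AIRFLOW_CONN_HDFS_DEFAULT"):
    """Parse one or more hdfs:// URIs from an environment variable.

    A single URI would map to snakebite's plain Client; a
    comma-separated list (the convention proposed in this ticket, not
    an existing Airflow feature) would map to HAClient.
    Returns a list of (host, port) tuples; 8020 is assumed as the
    default namenode port when the URI omits one.
    """
    raw = os.environ.get(var)
    if not raw:
        return []
    nodes = []
    for uri in raw.split(","):
        parsed = urlparse(uri.strip())
        nodes.append((parsed.hostname, parsed.port or 8020))
    return nodes


# Example: an HA pair supplied through the environment.
os.environ["AIRFLOW_CONN_HDFS_DEFAULT"] = (
    "hdfs://namenode1:8020,hdfs://namenode2:8020"
)
```

The hook could then hand a single-element result to Client and a multi-element result to HAClient, keeping the existing UI-configured path as a fallback.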