[jira] [Closed] (AIRFLOW-1798) Include celery ssl configs in default template

2018-01-19 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang closed AIRFLOW-1798.
-
Resolution: Done

resolved by AIRFLOW-1840
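
For reference, the change that resolved this (AIRFLOW-1840) put the Celery SSL
options into the default airflow.cfg template. A hedged sketch of what that
[celery] section looks like (option names as in the 1.9-era template; treat the
exact names and defaults as illustrative, not copied from the actual commit):

{code}
[celery]
# secure the broker connection with SSL (names/defaults are assumptions
# based on the 1.9-era default template)
celery_ssl_active = False
celery_ssl_key =
celery_ssl_cert =
celery_ssl_cacert =
{code}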

> Include celery ssl configs in default template
> --
>
> Key: AIRFLOW-1798
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1798
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-1798) Include celery ssl configs in default template

2017-11-09 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1798 started by Daniel Huang.
-
> Include celery ssl configs in default template
> --
>
> Key: AIRFLOW-1798
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1798
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1798) Include celery ssl configs in default template

2017-11-09 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1798:
-

 Summary: Include celery ssl configs in default template
 Key: AIRFLOW-1798
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1798
 Project: Apache Airflow
  Issue Type: Improvement
  Components: configuration
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal

2017-11-08 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244805#comment-16244805
 ] 

Daniel Huang commented on AIRFLOW-1559:
---

[~jeffreybian] I added engine dispose calls in a few spots, including the ones 
you mentioned. Please take a look! 
https://github.com/apache/incubator-airflow/pull/2767

> MySQL warnings about aborted connections, missing engine disposal
> -
>
> Key: AIRFLOW-1559
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1559
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> We're seeing a flood of warnings about aborted connections in our MySQL logs. 
> {code}
> Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an 
> error reading communication packets)
> {code}
> It appears this is because we're not performing [engine 
> disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal].
>  The most common source of this warning is from the scheduler, when it kicks 
> off new processes to process the DAG files. Calling dispose in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 
> greatly reduced these messages. However, the worker is still causing some of 
> these, I assume from when we spin up processes to run tasks. We do call 
> dispose in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396,
>  but I think it's a bit early. Not sure if there's a place we can put this 
> cleanup to ensure it's done everywhere.
> Quick script to reproduce this warning message:
> {code}
> from airflow import settings
> from airflow.models import Connection
> session = settings.Session()
> session.query(Connection).count()
> session.close()
> # not calling settings.engine.dispose()
> {code}
> Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal

2017-11-08 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang reassigned AIRFLOW-1559:
-

Assignee: Daniel Huang

> MySQL warnings about aborted connections, missing engine disposal
> -
>
> Key: AIRFLOW-1559
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1559
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> We're seeing a flood of warnings about aborted connections in our MySQL logs. 
> {code}
> Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an 
> error reading communication packets)
> {code}
> It appears this is because we're not performing [engine 
> disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal].
>  The most common source of this warning is from the scheduler, when it kicks 
> off new processes to process the DAG files. Calling dispose in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 
> greatly reduced these messages. However, the worker is still causing some of 
> these, I assume from when we spin up processes to run tasks. We do call 
> dispose in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396,
>  but I think it's a bit early. Not sure if there's a place we can put this 
> cleanup to ensure it's done everywhere.
> Quick script to reproduce this warning message:
> {code}
> from airflow import settings
> from airflow.models import Connection
> session = settings.Session()
> session.query(Connection).count()
> session.close()
> # not calling settings.engine.dispose()
> {code}
> Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal

2017-11-08 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1559 started by Daniel Huang.
-
> MySQL warnings about aborted connections, missing engine disposal
> -
>
> Key: AIRFLOW-1559
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1559
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> We're seeing a flood of warnings about aborted connections in our MySQL logs. 
> {code}
> Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an 
> error reading communication packets)
> {code}
> It appears this is because we're not performing [engine 
> disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal].
>  The most common source of this warning is from the scheduler, when it kicks 
> off new processes to process the DAG files. Calling dispose in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 
> greatly reduced these messages. However, the worker is still causing some of 
> these, I assume from when we spin up processes to run tasks. We do call 
> dispose in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396,
>  but I think it's a bit early. Not sure if there's a place we can put this 
> cleanup to ensure it's done everywhere.
> Quick script to reproduce this warning message:
> {code}
> from airflow import settings
> from airflow.models import Connection
> session = settings.Session()
> session.query(Connection).count()
> session.close()
> # not calling settings.engine.dispose()
> {code}
> Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (AIRFLOW-1794) No Exception.message in Python 3

2017-11-08 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1794 started by Daniel Huang.
-
> No Exception.message in Python 3
> 
>
> Key: AIRFLOW-1794
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1794
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> [~ashb] ran into this
> {code}
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1988, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1641, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1544, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in 
> reraise
> raise value
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1639, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1625, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 69, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 
> 368, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/flask_login.py", line 758, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/www/utils.py", line 
> 262, in wrapper
> return f(*args, **kwargs)
>   File "/usr/local/lib/python3.5/dist-packages/airflow/www/views.py", line 
> 715, in log
> .format(task_log_reader, e.message)]
> AttributeError: 'AttributeError' object has no attribute 'message'
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1794) No Exception.message in Python 3

2017-11-08 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1794:
-

 Summary: No Exception.message in Python 3
 Key: AIRFLOW-1794
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1794
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor


[~ashb] ran into this

{code}
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1988, in 
wsgi_app
response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1641, in 
full_dispatch_request
rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1544, in 
handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 33, in 
reraise
raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1639, in 
full_dispatch_request
rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1625, in 
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 69, 
in inner
return self._run_view(f, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_admin/base.py", line 368, 
in _run_view
return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/flask_login.py", line 758, in 
decorated_view
return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/airflow/www/utils.py", line 262, 
in wrapper
return f(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/airflow/www/views.py", line 715, 
in log
.format(task_log_reader, e.message)]
AttributeError: 'AttributeError' object has no attribute 'message'
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1463) Scheduler does not reschedule tasks in QUEUED state

2017-09-20 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173844#comment-16173844
 ] 

Daniel Huang commented on AIRFLOW-1463:
---

Hitting this as well due to deployments and one of my DAGs intermittently 
hitting the {{DAGBAG_IMPORT_TIMEOUT}}. Just restarting the scheduler does not 
get my tasks out of the QUEUED state. I've had to delete the task instance so 
that the scheduler re-queues the task with an increased timeout.
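
For reference, the timeout in question maps to a [core] setting in airflow.cfg;
a hedged sketch with an illustrative value:

{code}
[core]
# seconds a DAG file may take to import before the processor gives up
# (default 30 in this era)
dagbag_import_timeout = 120
{code}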

> Scheduler does not reschedule tasks in QUEUED state
> ---
>
> Key: AIRFLOW-1463
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1463
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
> Environment: Ubuntu 14.04
> Airflow 1.8.0
> SQS backed task queue, AWS RDS backed meta storage
> DAG folder is synced by script on code push: archive is downloaded from s3, 
> unpacked, moved, install script is run. airflow executable is replaced with 
> symlink pointing to the latest version of code, no airflow processes are 
> restarted.
>Reporter: Stanislav Pak
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Our pipeline-related code is deployed almost simultaneously on all airflow 
> boxes: the scheduler+webserver box and the worker boxes. A common python 
> package is deployed on those boxes on every other code push (3-5 deployments 
> per hour). Due to installation specifics, a DAG that imports a module from 
> that package might fail. If the DAG import fails when a worker runs a task, 
> the task is still removed from the queue but its state is not changed, so in 
> this case the task stays in QUEUED state forever.
> Besides the described case, there is a scenario where it happens because of 
> DAG update lag in the scheduler: a task can be scheduled with the old DAG and 
> a worker can run the task with the new DAG that fails to import.
> There might be other scenarios where it happens.
> Proposal:
> Catch errors when importing the DAG on task run and clear the task instance 
> state if the import fails. This should fix transient issues of this kind.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (AIRFLOW-1627) SubDagOperator initialization should only query pools when necessary

2017-09-20 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1627 started by Daniel Huang.
-
> SubDagOperator initialization should only query pools when necessary
> 
>
> Key: AIRFLOW-1627
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1627
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators, subdag
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> If a SubDagOperator is assigned to a pool, it queries the db for pool info to 
> ensure there is no pool conflict with one of its tasks when only 1 slot 
> remains. However, we should check that a conflict is even possible (a task in 
> the subdag is in the same pool as the subdag) before actually querying for 
> pools.
> I have a DAG with hundreds of subdags and I found that the pool conflict 
> check was taking up a fair chunk of time when processing the DAG file.
> Relevant code: 
> https://github.com/apache/incubator-airflow/blob/a81c153cc48e4c99a9e0a5047990b84c5d07e3cb/airflow/operators/subdag_operator.py#L60-L81



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1627) SubDagOperator initialization should only query pools when necessary

2017-09-20 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1627:
-

 Summary: SubDagOperator initialization should only query pools 
when necessary
 Key: AIRFLOW-1627
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1627
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators, subdag
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor


If a SubDagOperator is assigned to a pool, it queries the db for pool info to 
ensure there is no pool conflict with one of its tasks when only 1 slot 
remains. However, we should check that a conflict is even possible (a task in 
the subdag is in the same pool as the subdag) before actually querying for 
pools.

I have a DAG with hundreds of subdags and I found that the pool conflict check 
was taking up a fair chunk of time when processing the DAG file.

Relevant code: 
https://github.com/apache/incubator-airflow/blob/a81c153cc48e4c99a9e0a5047990b84c5d07e3cb/airflow/operators/subdag_operator.py#L60-L81
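
A hedged sketch of the proposed guard (names are illustrative, not the actual 
patch); the existing single-slot pool query would run only when this returns 
True:

{code}
# Only hit the database when a pool conflict is actually possible: the
# SubDagOperator has a pool AND some task inside the subdag uses that pool.
def pool_conflict_possible(subdag, pool):
    return bool(pool) and any(task.pool == pool for task in subdag.tasks)
{code}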



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal

2017-09-01 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1559:
-

 Summary: MySQL warnings about aborted connections, missing engine 
disposal
 Key: AIRFLOW-1559
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1559
 Project: Apache Airflow
  Issue Type: Bug
  Components: db
Reporter: Daniel Huang
Priority: Minor


We're seeing a flood of warnings about aborted connections in our MySQL logs. 

{code}
Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an 
error reading communication packets)
{code}

It appears this is because we're not performing [engine 
disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal].
 The most common source of this warning is from the scheduler, when it kicks 
off new processes to process the DAG files. Calling dispose in 
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 
greatly reduced these messages. However, the worker is still causing some of 
these, I assume from when we spin up processes to run tasks. We do call dispose 
in 
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396,
 but I think it's a bit early. Not sure if there's a place we can put this 
cleanup to ensure it's done everywhere.

Quick script to reproduce this warning message:

{code}
from airflow import settings
from airflow.models import Connection

session = settings.Session()
session.query(Connection).count()
session.close()
# not calling settings.engine.dispose()
{code}

Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. 
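
For reference, a minimal sketch of the cleanup the script above deliberately 
omits; disposing the engine closes pooled connections cleanly, so MySQL stops 
logging the aborted-connection warning for this process:

{code}
from airflow import settings
from airflow.models import Connection

session = settings.Session()
session.query(Connection).count()
session.close()
# dispose the engine before the process exits so pooled connections are
# closed instead of abandoned
settings.engine.dispose()
{code}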



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1538) DAG prematurely marked as failed before downstream tasks are marked as upstream_failed

2017-08-28 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1538:
-

 Summary: DAG prematurely marked as failed before downstream tasks 
are marked as upstream_failed
 Key: AIRFLOW-1538
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1538
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Reporter: Daniel Huang
Priority: Minor


When a task fails, I'm seeing that only sometimes do its downstream tasks get 
marked as upstream_failed. Most of the time the downstream tasks are left with 
null state. While the DAG run is still correctly marked as failed, I have some 
tasks with trigger rule all_failed/one_failed further down that don't get 
executed because the failure isn't propagated down.
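
A minimal hypothetical DAG illustrating the shape described above (task names 
are made up, not from the real DAG):

{code}
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Illustrates the shape in the report: if 'first' fails but 'middle' is left
# with null state instead of upstream_failed, the failure never propagates
# down to 'cleanup'.
with DAG('failure_propagation_example', start_date=datetime(2017, 8, 1)) as dag:
    first = DummyOperator(task_id='first')
    middle = DummyOperator(task_id='middle')
    cleanup = DummyOperator(task_id='cleanup', trigger_rule='one_failed')
    first >> middle >> cleanup
{code}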



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1537) SubDagOperator does not retry properly

2017-08-28 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1537:
-

 Summary: SubDagOperator does not retry properly
 Key: AIRFLOW-1537
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1537
 Project: Apache Airflow
  Issue Type: Bug
  Components: backfill, scheduler, subdag
Reporter: Daniel Huang


Since AIRFLOW-1124, we only backfill tasks with no state. While that makes 
sense for a backfill used in combination with clear, it doesn't quite work in 
the context of a sub DAG that is set to retry.

I'd expect failed tasks to be rerun at least, but there can be some other side 
effects to doing that in other cases. [~artwr] mentioned this in 
https://github.com/apache/incubator-airflow/pull/2247/files#r112189191. Any 
thoughts on how to handle this [~bolke]?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1400) catchup=False caused exception

2017-07-11 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082309#comment-16082309
 ] 

Daniel Huang commented on AIRFLOW-1400:
---

What's your DAG's {{schedule_interval}}?

> catchup=False caused exception
> --
>
> Key: AIRFLOW-1400
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1400
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
>Reporter: Xi Wang
>
> When I set up the task with catchup=False, it threw an error as follows 
> (logs/scheduler/my_dag_name):
> [2017-07-10 15:13:12,534] {jobs.py:354} DagFileProcessor373 ERROR - Got an 
> exception! Propagating...
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 346, in helper
> pickle_dags)
>   File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in 
> wrapper
> result = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1581, in 
> process_file
> self._process_dags(dagbag, dags, ti_keys_to_schedule)
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1171, in 
> _process_dags
> dag_run = self.create_dag_run(dag)
>   File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in 
> wrapper
> result = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 776, in 
> create_dag_run
> if next_start <= now:
> TypeError: can't compare datetime.datetime to NoneType
> It seems next_start was not defined properly, 
> (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L777)
> Any help is appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1355) Unable to launch jobs : DAGs not being executed.

2017-06-28 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067650#comment-16067650
 ] 

Daniel Huang commented on AIRFLOW-1355:
---

If the DAG is in Running status, what state are the tasks in? Queued or Running?

> Unable to launch jobs : DAGs not being executed.
> 
>
> Key: AIRFLOW-1355
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1355
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Affects Versions: Airflow 2.0
> Environment: Mac OS and Ubuntu
>Reporter: Pavan KN
>
> Steps to reproduce:
> 1. Create a new installation
> 2. Launch Airflow
> 3. Enable a DAG and trigger it manually
> The DAG/job won't get executed. It stays in Running status, but no execution 
> ever starts.
> The same issue occurs with the Sequential, Local, and Celery executors.
> Happening in the 2.0 version. Tried on multiple Mac machines and on Ubuntu.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AIRFLOW-1328) Daily DAG execute the past day

2017-06-20 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056292#comment-16056292
 ] 

Daniel Huang commented on AIRFLOW-1328:
---

From the [docs|https://airflow.incubator.apache.org/scheduler.html]:

{quote}Note that if you run a DAG on a schedule_interval of one day, the run 
stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, 
the job instance is started once the period it covers has ended.

Let’s Repeat That: The scheduler runs your job one schedule_interval AFTER the 
start date, at the END of the period.

{quote}

This means in your case, the DAG with execution date 06-19T15:00:00 is not 
expected to run until 06-20T07:00:00 (the next scheduled interval). 

> Daily DAG execute the past day
> --
>
> Key: AIRFLOW-1328
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1328
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: Airflow 1.8
> Environment: debian jessie
>Reporter: Pierre-Antoine Tible
>
> Hello,
> I'm running Airflow 1.8 under debian jessie. I installed it via pip.
> I am using the LocalScheduler with MySQL.
> I made a simple DAG with a BashOperator for a daily task (runs twice a day):
> {code}
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime.now() - timedelta(days=1, seconds=6),
>     'email': ['XXX'],
>     'email_on_failure': True,
>     'email_on_retry': False,
>     'retries': 1,
>     'retry_delay': timedelta(minutes=5),
>     'execution_timeout': None,
>     # 'catchup': False,
>     # 'backfill': False,
>     # 'queue': 'bash_queue',
>     # 'pool': 'backfill',
>     # 'priority_weight': 10,
>     # 'end_date': datetime(2016, 1, 1),
> }
> dag = DAG('campaign-reminder', default_args=default_args,
>           schedule_interval="0,0 7,15 * * *", concurrency=1, max_active_runs=1)
> dag.catchup = False
> t1 = BashOperator(
>     task_id='campaign-reminder',
>     bash_command=' ',
>     dag=dag)
> {code}
> I did it today and it works, but the execution date was "06-19T15:00:00"; we 
> are the 20th, so it's one day behind the schedule.
> My first thought was a mistake with the start_date, so I put in a datetime() 
> and it did the same... I don't understand why.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AIRFLOW-1296) DAGs using operators involving cascading skipped tasks fail prematurely

2017-06-08 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1296:
-

 Summary: DAGs using operators involving cascading skipped tasks 
fail prematurely
 Key: AIRFLOW-1296
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1296
 Project: Apache Airflow
  Issue Type: Bug
  Components: scheduler
Reporter: Daniel Huang


So this is basically the same issue as AIRFLOW-872 and AIRFLOW-719. A 
workaround had fixed this 
(https://github.com/apache/incubator-airflow/pull/2125), but was later reverted 
(https://github.com/apache/incubator-airflow/pull/2195). I totally agree with 
the reason for reverting, but I still think this is an issue. 

The issue is related to any operator that involves cascading skipped tasks, 
like ShortCircuitOperator or LatestOnlyOperator. These operators mark only 
their *direct* downstream tasks as SKIPPED; cascading the SKIPPED state to 
further downstream tasks is left up to the scheduler (see the latest-only 
operator docs about this expected behavior: 
https://airflow.incubator.apache.org/concepts.html#latest-run-only). However, 
the scheduler instead marks the DAG run as FAILED prematurely, before the DAG 
has a chance to skip all downstream tasks.

This example DAG should reproduce the issue: 
https://gist.github.com/dhuang/61d38fb001c3a917edf4817bb0c915f9. 
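
For reference, a hedged reconstruction of that example based on the task names 
below (the gist itself is authoritative):

{code}
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.latest_only_operator import LatestOnlyOperator

# latest_only skips dummy1 on non-latest runs; the scheduler should then
# cascade SKIPPED down to dummy2 and dummy3
with DAG('latest_only_chain', start_date=datetime(2017, 1, 1),
         schedule_interval='@daily') as dag:
    latest_only = LatestOnlyOperator(task_id='latest_only')
    dummy1 = DummyOperator(task_id='dummy1')
    dummy2 = DummyOperator(task_id='dummy2')
    dummy3 = DummyOperator(task_id='dummy3')
    latest_only >> dummy1 >> dummy2 >> dummy3
{code}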

Expected result: DAG succeeds with tasks - latest_only (success) -> dummy1 
(skipped) -> dummy2 (skipped) -> dummy3 (skipped)
Actual result: DAG fails with tasks - latest_only (success) -> dummy1 (skipped) 
-> dummy2 (none) -> dummy3 (none)

I believe the results I'm seeing are because of this deadlock prevention logic, 
https://github.com/apache/incubator-airflow/blob/1.8.1/airflow/models.py#L4182. 
While the actual result shown above _could_ indicate a deadlock, in this case 
it shouldn't be treated as one. Since this {{update_state}} logic is reached first in each 
scheduler run, dummy2/dummy3 don't get a chance to cascade the SKIPPED state. 
Commenting out that block gives me the results I expect.

[~bolke] I know you spent a while trying to reproduce my issue and weren't able 
to, but I'm still hitting this on a fresh environment with default configs, 
sqlite/mysql dbs, local/sequential/celery executors, and 1.8.1/master.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-1147) airflow scheduler not working

2017-04-26 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985069#comment-15985069
 ] 

Daniel Huang commented on AIRFLOW-1147:
---

Can you provide some log output of the scheduler? Also just checking, did you 
toggle the DAG as "on" in the UI?

> airflow scheduler not working
> -
>
> Key: AIRFLOW-1147
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1147
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.8
> Environment: CentOS running on 128 GB ram
>Reporter: Mubin Khalid
>Priority: Critical
>  Labels: documentation, newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I've created some DAGs and tried to put them on the scheduler. I want to run 
> all the tasks in the DAG every 24 hours exactly.
> I tried to do something like this.
> {code}
> DEFAULT_ARGS= {
> 'owner'   : 'mubin',
> 'depends_on_past' : False,
> 'start_date'  : datetime(2017, 4, 24, 14, 30),
> 'retries' : 5,
> 'retry_delay' : timedelta(1),
> }
> SCHEDULE_INTERVAL  = timedelta(minutes=1440)
> # SCHEDULE_INTERVAL= timedelta(hours=24)
> # SCHEDULE_INTERVAL= timedelta(days=1)
> dag = DAG('StandardizeDataDag',
> default_args   = DEFAULT_ARGS,
> schedule_interval  = SCHEDULE_INTERVAL
> )
>  {code}   
> I tried different intervals, but none of them worked. However, if I reset the 
> db with {code} airflow resetdb -y {code} and then run {code} airflow initdb 
> {code}, it works once; after that, the scheduler isn't able to run it again.
> PS. {code} airflow scheduler {code} is executed as {code} root {code}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-1076) Support getting variable by string in templates

2017-04-05 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1076 started by Daniel Huang.
-
> Support getting variable by string in templates
> ---
>
> Key: AIRFLOW-1076
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1076
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> Currently, one can fetch variables in templates with {{ var.value.foo }}. But 
> that doesn't work if the variable key has a character you can't use as an 
> attribute, like ":" or "-". 
> Should provide an alternative method of {{ var.value.get('foo:bar') }}. Can 
> then also supply a default value if the variable is not found. This also 
> allows you to fetch the variable specified in another jinja variable 
> (probably not a common use case).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-1076) Support getting variable by string in templates

2017-04-05 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang updated AIRFLOW-1076:
--
Description: 
Currently, one can fetch variables in templates with {{ var.value.foo }}. But 
that doesn't work if the variable key has a character you can't use as an 
attribute, like ":" or "-". 

Should provide an alternative method of {{ var.value.get('foo:bar') }}. Can then 
also supply a default value if the variable is not found. This also allows you 
to fetch the variable specified in another jinja variable (probably not a 
common use case).

  was:
Currently, one can fetch variables in templates with `{{ var.value.foo}}`. But 
that doesn't work if the variable key has a character you can't use as an 
attribute, like ":" or "-". 

Should provide alternative method of `{{ var.value.get('foo:bar') }}`. Can then 
also supply a default value if the variable is not found.


> Support getting variable by string in templates
> ---
>
> Key: AIRFLOW-1076
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1076
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> Currently, one can fetch variables in templates with {{ var.value.foo }}. But 
> that doesn't work if the variable key has a character you can't use as an 
> attribute, like ":" or "-". 
> Should provide an alternative method of {{ var.value.get('foo:bar') }}. Can 
> then also supply a default value if the variable is not found. This also 
> allows you to fetch the variable specified in another jinja variable 
> (probably not a common use case).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1076) Support getting variable by string in templates

2017-04-05 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1076:
-

 Summary: Support getting variable by string in templates
 Key: AIRFLOW-1076
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1076
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor


Currently, one can fetch variables in templates with `{{ var.value.foo}}`. But 
that doesn't work if the variable key has a character you can't use as an 
attribute, like ":" or "-". 

Should provide an alternative method of `{{ var.value.get('foo:bar') }}`. Can then 
also supply a default value if the variable is not found.
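
A hedged sketch of the proposed template usage (the default-value argument is 
part of the proposal, not existing behavior):

{code}
{{ var.value.get('foo:bar', 'fallback-value') }}
{code}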



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-1075) Cleanup security docs

2017-04-05 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-1075 started by Daniel Huang.
-
> Cleanup security docs
> -
>
> Key: AIRFLOW-1075
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1075
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
>
> Noticed a few minor things to fix, like "Impersonation" being under the 
> "SSL" section.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-1075) Cleanup security docs

2017-04-05 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1075:
-

 Summary: Cleanup security docs
 Key: AIRFLOW-1075
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1075
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Trivial


Noticed a few minor things to fix, like "Impersonation" being under the "SSL" 
section.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-982) Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped

2017-03-14 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-982 started by Daniel Huang.

> Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped
> --
>
> Key: AIRFLOW-982
> URL: https://issues.apache.org/jira/browse/AIRFLOW-982
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs, examples
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-983) Make trigger rules more explicit regarding success vs skipped

2017-03-14 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-983:


 Summary: Make trigger rules more explicit regarding success vs 
skipped
 Key: AIRFLOW-983
 URL: https://issues.apache.org/jira/browse/AIRFLOW-983
 Project: Apache Airflow
  Issue Type: Improvement
  Components: dependencies
Reporter: Daniel Huang


Since AIRFLOW-719, the trigger rules all_success/one_success include both 
success and skipped states.

We should probably make ALL_SUCCESS strictly success (again) and add a separate 
ALL_SUCCESS_OR_SKIPPED/ALL_FAILED_OR_SKIPPED. ALL_SUCCESS_OR_SKIPPED may be a 
more appropriate default trigger rule as well. Otherwise, we need to note in 
the LatestOnly/ShortCircuit/Branch operators which trigger rule is appropriate 
to use there.

Some previous discussion in 
https://github.com/apache/incubator-airflow/pull/1961
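
A hedged sketch of the proposed split (constant names mirror the proposal 
above and are not committed API; the existing values live in 
airflow/utils/trigger_rule.py):

{code}
class TriggerRule(object):
    ALL_SUCCESS = 'all_success'                        # strictly success again
    ALL_SUCCESS_OR_SKIPPED = 'all_success_or_skipped'  # proposed addition
    ALL_FAILED_OR_SKIPPED = 'all_failed_or_skipped'    # proposed addition
{code}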



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-982) Update examples/docs for ALL_SUCCESS/ONE_SUCCESS including skipped

2017-03-14 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-982:


 Summary: Update examples/docs for ALL_SUCCESS/ONE_SUCCESS 
including skipped
 Key: AIRFLOW-982
 URL: https://issues.apache.org/jira/browse/AIRFLOW-982
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs, examples
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-956) Setup readthedocs for versioned docs

2017-03-08 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-956:


 Summary: Setup readthedocs for versioned docs
 Key: AIRFLOW-956
 URL: https://issues.apache.org/jira/browse/AIRFLOW-956
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess

2017-03-07 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-946 started by Daniel Huang.

> Virtualenv not explicitly used by webserver/worker subprocess
> -
>
> Key: AIRFLOW-946
> URL: https://issues.apache.org/jira/browse/AIRFLOW-946
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> I have airflow installed in a virtualenv. I'd expect calling 
> {{/path/to/venv/bin/airflow webserver}} or {{/path/to/venv/bin/airflow 
> worker}} *without activating my virtualenv* to work. However, they both fail 
> to run properly because they spawn a process that is called without 
> specifying the virtualenv.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess

2017-03-06 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-946:


 Summary: Virtualenv not explicitly used by webserver/worker 
subprocess
 Key: AIRFLOW-946
 URL: https://issues.apache.org/jira/browse/AIRFLOW-946
 Project: Apache Airflow
  Issue Type: Bug
  Components: cli
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Minor


I have airflow installed in a virtualenv. I'd expect calling 
{{/path/to/venv/bin/airflow webserver}} or {{/path/to/venv/bin/airflow worker}} 
*without activating my virtualenv* to work. However, they both fail to run 
properly because they spawn a process that is called without specifying the 
virtualenv.
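
A hedged illustration of the general fix idea (not the actual patch): resolve 
spawned commands relative to the running interpreter so the virtualenv's 
install is used even when the venv was never activated:

{code}
import os
import subprocess
import sys

# e.g. the worker spawns `airflow serve_logs`; resolving the entry point
# next to sys.executable pins it to this virtualenv
airflow_bin = os.path.join(os.path.dirname(sys.executable), 'airflow')
subprocess.Popen([airflow_bin, 'serve_logs'])
{code}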



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-55) Add HDFS Log Support

2017-03-01 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891506#comment-15891506
 ] 

Daniel Huang commented on AIRFLOW-55:
-

Took another stab at getting this through using the existing WebHDFSHook, 
https://github.com/apache/incubator-airflow/pull/2119.

> Add HDFS Log Support
> 
>
> Key: AIRFLOW-55
> URL: https://issues.apache.org/jira/browse/AIRFLOW-55
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks
>Affects Versions: Airflow 1.7.0
>Reporter: Wu Xiang
>Assignee: Daniel Huang
>  Labels: features
> Fix For: Airflow 1.8
>
>
> To support saving task logs on HDFS.
> PR:
> https://github.com/apache/incubator-airflow/pull/1409



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-55) Add HDFS Log Support

2017-03-01 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang reassigned AIRFLOW-55:
---

Assignee: Daniel Huang

> Add HDFS Log Support
> 
>
> Key: AIRFLOW-55
> URL: https://issues.apache.org/jira/browse/AIRFLOW-55
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks
>Affects Versions: Airflow 1.7.0
>Reporter: Wu Xiang
>Assignee: Daniel Huang
>  Labels: features
> Fix For: Airflow 1.8
>
>
> To support saving task logs on HDFS.
> PR:
> https://github.com/apache/incubator-airflow/pull/1409



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-872) Tasks are not skipping correctly

2017-02-24 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883889#comment-15883889
 ] 

Daniel Huang commented on AIRFLOW-872:
--

Dug a little deeper. Your example DAG and the DAG where I ran into this issue 
both use tasks with the default trigger rule of ALL_SUCCESS. SKIPPED is not 
considered a successful status, so the [code is failing the 
DAG|https://github.com/apache/incubator-airflow/blob/2ce7556a590f81b44a7222830e648c6912eb6986/airflow/ti_deps/deps/trigger_rule_dep.py#L166]
 when it sees t3's parent was skipped, which makes sense. I don't know if this 
behavior change from 1.7.1.3->1.8 was an intentional bug fix or unintended side 
effect.

I think we just need a NONE_FAILED trigger rule option that accepts 
success/skipped. I'm happy to make the change if others think it sounds 
reasonable. Open to a better name as well.

Also, I do wonder if it would make more sense as the default (ALL_SUCCESS would 
be the stricter option). Otherwise, we may want to call out this behavior in 
the LatestOnly/ShortCircuit operators.

> Tasks are not skipping correctly
> 
>
> Key: AIRFLOW-872
> URL: https://issues.apache.org/jira/browse/AIRFLOW-872
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, DagRun, webserver
>Affects Versions: Airflow 1.8
> Environment: docker-compose using docker-compose-LocalExecutor.yml  
> on this fork/branch 
> https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping
>Reporter: Frank Massi
> Attachments: Screen Shot 2017-02-13 at 1.09.35 PM.png, Screen Shot 
> 2017-02-13 at 1.11.09 PM.png, Screen Shot 2017-02-13 at 1.11.15 PM.png, 
> Screen Shot 2017-02-13 at 1.13.13 PM.png
>
>
> When using the BranchPythonOperator or ShortCircuitOperator to make Dags 
> idempotent, if running the Dag after a successful DagRun and clearing the 
> initial task, the second task skips and then the Dag enters a failed state, 
> as opposed to skipping all remaining tasks and marking the Dag successful.
> Steps to reproduce are outlined here: 
> https://github.com/fdm1/docker-airflow/blob/v1-8-not-skipping/dags/test_clear_bug.py#L4-L9
> In 1.7.1.3, the tasks do skip correctly, so this appears to be a new bug.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (AIRFLOW-872) Tasks are not skipping correctly

2017-02-22 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876942#comment-15876942
 ] 

Daniel Huang edited comment on AIRFLOW-872 at 2/22/17 5:19 PM:
---

I hit this as well with the LatestOnlyOperator (very similar to 
ShortCircuitOperator, it also sets its direct downstream tasks to SKIPPED) when 
there is more than one task downstream of the latest-only task.

I believe this 
[block|https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L4034]
 is being hit and setting the dag to failed prematurely. When I commented it 
out as a temporary workaround, the remaining tasks get set to skipped correctly 
and the dag ends in success.


was (Author: dxhuang):
I hit this as well with the LatestOnlyOperator (very similar to 
ShortCircuitOperator, it also sets its direct downstream tasks to SKIPPED) when 
there is more than one task downstream of the latest-only task.

I believe this 
[block](https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L4034)
 is being hit and setting the dag to failed prematurely. When I commented it 
out as a temporary workaround, the remaining tasks get set to skipped correctly 
and the dag ends in success.

> Tasks are not skipping correctly
> 
>
> Key: AIRFLOW-872
> URL: https://issues.apache.org/jira/browse/AIRFLOW-872
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, DagRun, webserver
>Affects Versions: Airflow 1.8
> Environment: docker-compose using docker-compose-LocalExecutor.yml  
> on this fork/branch 
> https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping
>Reporter: Frank Massi
> Attachments: Screen Shot 2017-02-13 at 1.09.35 PM.png, Screen Shot 
> 2017-02-13 at 1.11.09 PM.png, Screen Shot 2017-02-13 at 1.11.15 PM.png, 
> Screen Shot 2017-02-13 at 1.13.13 PM.png
>
>
> When using the BranchPythonOperator or ShortCircuitOperator to make Dags 
> idempotent, if running the Dag after a successful DagRun and clearing the 
> initial task, the second task skips and then the Dag enters a failed state, 
> as opposed to skipping all remaining tasks and marking the Dag successful.
> Steps to reproduce are outlined here: 
> https://github.com/fdm1/docker-airflow/blob/v1-8-not-skipping/dags/test_clear_bug.py#L4-L9
> In 1.7.1.3, the tasks do skip correctly, so this appears to be a new bug.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-872) Tasks are not skipping correctly

2017-02-21 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876942#comment-15876942
 ] 

Daniel Huang commented on AIRFLOW-872:
--

I hit this as well with the LatestOnlyOperator (very similar to 
ShortCircuitOperator, it also sets its direct downstream tasks to SKIPPED) when 
there is more than one task downstream of the latest-only task.

I believe this 
[block](https://github.com/apache/incubator-airflow/blob/50702d06187035c99e51ea936c756c00332c4a4a/airflow/models.py#L4034)
 is being hit and setting the dag to failed prematurely. When I commented it 
out as a temporary workaround, the remaining tasks get set to skipped correctly 
and the dag ends in success.

> Tasks are not skipping correctly
> 
>
> Key: AIRFLOW-872
> URL: https://issues.apache.org/jira/browse/AIRFLOW-872
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, DagRun, webserver
>Affects Versions: Airflow 1.8
> Environment: docker-compose using docker-compose-LocalExecutor.yml  
> on this fork/branch 
> https://github.com/fdm1/docker-airflow/tree/v1-8-not-skipping
>Reporter: Frank Massi
> Attachments: Screen Shot 2017-02-13 at 1.09.35 PM.png, Screen Shot 
> 2017-02-13 at 1.11.09 PM.png, Screen Shot 2017-02-13 at 1.11.15 PM.png, 
> Screen Shot 2017-02-13 at 1.13.13 PM.png
>
>
> When using the BranchPythonOperator or ShortCircuitOperator to make Dags 
> idempotent, if running the Dag after a successful DagRun and clearing the 
> initial task, the second task skips and then the Dag enters a failed state, 
> as opposed to skipping all remaining tasks and marking the Dag successful.
> Steps to reproduce are outlined here: 
> https://github.com/fdm1/docker-airflow/blob/v1-8-not-skipping/dags/test_clear_bug.py#L4-L9
> In 1.7.1.3, the tasks do skip correctly, so this appears to be a new bug.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-882) Code example in docs has unnecessary DAG>>Operator assignment

2017-02-17 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-882 started by Daniel Huang.

> Code example in docs has unnecessary DAG>>Operator assignment
> -
>
> Key: AIRFLOW-882
> URL: https://issues.apache.org/jira/browse/AIRFLOW-882
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: docs
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
>
> The docs currently say:
> {code}
> We can put this all together to build a simple pipeline:
> with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
> (
> dag
> >> DummyOperator(task_id='dummy_1')
> >> BashOperator(
> task_id='bash_1',
> bash_command='echo "HELLO!"')
> >> PythonOperator(
> task_id='python_1',
> python_callable=lambda: print("GOODBYE!"))
> )
> {code}
> But the {{dag >> ...}} is unnecessary because the operators are already 
> initialized with the proper DAG 
> (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/models.py#L1699).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pick up default args

2017-02-16 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-883:


 Summary: Assigning operator to DAG via bitwise composition does 
not pick up default args
 Key: AIRFLOW-883
 URL: https://issues.apache.org/jira/browse/AIRFLOW-883
 Project: Apache Airflow
  Issue Type: Bug
  Components: models
Reporter: Daniel Huang
Priority: Minor


This is only the case when the operator does not specify {{dag=dag}} and is not 
initialized within a DAG's context manager (due to 
https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)

Example:

{code}
default_args = {
'owner': 'airflow', 
'start_date': datetime(2017, 2, 1)
}
dag = DAG('my_dag', default_args=default_args)
dummy = DummyOperator(task_id='dummy')

dag >> dummy
{code}

This will raise a {{Task is missing the start_date parameter}}. I _think_ this 
should probably be allowed because I assume the purpose of supporting {{dag >> 
op}} was to allow delayed assignment of an operator to a DAG. 

I believe that to fix this, on assignment we would need to go back through 
dag.default_args and check which of those attrs weren't explicitly set on the 
task... not the cleanest.
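
A hedged sketch of that idea (helper name is hypothetical, not Airflow API): 
on assignment via {{dag >> op}}, copy over any default_args the task did not 
set explicitly:

{code}
def backfill_default_args(dag, task):
    for attr, value in (dag.default_args or {}).items():
        if getattr(task, attr, None) is None:
            setattr(task, attr, value)
{code}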



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-882) Code example in docs has unnecessary DAG>>Operator assignment

2017-02-16 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-882:


 Summary: Code example in docs has unnecessary DAG>>Operator 
assignment
 Key: AIRFLOW-882
 URL: https://issues.apache.org/jira/browse/AIRFLOW-882
 Project: Apache Airflow
  Issue Type: Improvement
  Components: docs
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Trivial


The docs currently say:

{code}
We can put this all together to build a simple pipeline:

with DAG('my_dag', start_date=datetime(2016, 1, 1)) as dag:
(
dag
>> DummyOperator(task_id='dummy_1')
>> BashOperator(
task_id='bash_1',
bash_command='echo "HELLO!"')
>> PythonOperator(
task_id='python_1',
python_callable=lambda: print("GOODBYE!"))
)
{code}

But the {{dag >> ...}} is unnecessary because the operators are already 
initialized with the proper DAG 
(https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/models.py#L1699).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-881) Create SubDagOperator within DAG context manager without passing dag param

2017-02-16 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-881 started by Daniel Huang.

> Create SubDagOperator within DAG context manager without passing dag param
> --
>
> Key: AIRFLOW-881
> URL: https://issues.apache.org/jira/browse/AIRFLOW-881
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators, subdag
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
>
> Currently, the following raises a {{Please pass in the `dag` param}} 
> exception:
> {code}
> with DAG('my_dag', default_args=default_args) as dag:
> op = SubDagOperator(task_id='my_subdag', subdag=subdag_factory(...))
> {code}
> But the SubDagOperator should be aware if it's in the context manager of a 
> dag without having to specify {{dag=dag}} when initializing the operator. 
> Similar to how the {{@apply_defaults}} decorator does it 
> (https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AIRFLOW-881) Create SubDagOperator within DAG context manager without passing dag param

2017-02-16 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-881:


 Summary: Create SubDagOperator within DAG context manager without 
passing dag param
 Key: AIRFLOW-881
 URL: https://issues.apache.org/jira/browse/AIRFLOW-881
 Project: Apache Airflow
  Issue Type: Improvement
  Components: operators, subdag
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Trivial


Currently, the following raises a {{Please pass in the `dag` param}} exception:

{code}
with DAG('my_dag', default_args=default_args) as dag:
op = SubDagOperator(task_id='my_subdag', subdag=subdag_factory(...))
{code}

But the SubDagOperator should be aware if it's in the context manager of a dag 
without having to specify {{dag=dag}} when initializing the operator. Similar 
to how the {{@apply_defaults}} decorator does it 
(https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50).
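
A hedged sketch of the idea (assuming, per the 1.8-era code linked above, that 
{{airflow.models._CONTEXT_MANAGER_DAG}} is the module global holding the DAG 
currently open as a context manager):

{code}
import airflow.models

# fall back to the DAG opened by `with DAG(...)`, mirroring @apply_defaults
def resolve_dag(dag=None):
    return dag or airflow.models._CONTEXT_MANAGER_DAG
{code}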



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-859) airflow trigger_dag not working

2017-02-10 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861865#comment-15861865
 ] 

Daniel Huang commented on AIRFLOW-859:
--

Are you running the scheduler?

{code}
airflow scheduler
{code}

https://airflow.incubator.apache.org/scheduler.html

> airflow trigger_dag  not working
> --
>
> Key: AIRFLOW-859
> URL: https://issues.apache.org/jira/browse/AIRFLOW-859
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1.3
> Environment: Using Airflow 1.7.1.3 on Amazon Web Services (date +%Z # 
> timezone name
>  displays UTC).
> Airflow using Mariadb also on AWS (maria db system_time_zone set to UTC) 
>Reporter: Stefano Tiani-Tanzi
>
> Hi,
> I have a simple DAG ( code at end of this message ).
> The DAG loads OK using CLI "airflow list_dags".
> In Airflow webpage the DAG is set to "On".
> I try and trigger the dag using CLI "airflow trigger_dag trigger_dag_v1" and 
> get feedback as follows:
> [2017-02-10 18:20:12,969] {__init__.py:36} INFO - Using executor LocalExecutor
> [2017-02-10 18:20:13,374] {cli.py:142} INFO - Created <DagRun trigger_dag_v1 
> @ 2017-02-10 18:20:13.342199: manual__2017-02-10T18:20:13.342199, externally 
> triggered: True>
> However, the dag does not start/run.
> I can see in the web interface ( using Browse|Dag Runs ) that the run is at 
> "Failed" status but there are no logs associated with the run. Also in the 
> Tree view no runs appear.
> Airflow server and db are both set to UTC and the DAG is switched on in the 
> web UI.
> Have I missed something or am I doing something incorrectly?
> Any help would be much appreciated.
> Many thanks
> Stefano
> import json
> from airflow import DAG
> from airflow.models import Variable
> from airflow.operators import DummyOperator
> from datetime import datetime, timedelta
> default_args = {
> 'owner': 'stef',
> 'start_date': datetime(2017, 01, 01),
> 'retries': 1,
> 'retry_delay': timedelta(minutes=3),
> 'depends_on_past': False,
> 'wait_for_downstream': True,
> 'provide_context': True
> }
> dag = DAG('trigger_dag_v1', default_args=default_args, schedule_interval=None)
> task_dummy = DummyOperator(
> task_id='dummy',
> dag=dag)   



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-847) Xcoms are not passed into SubDAG

2017-02-07 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856326#comment-15856326
 ] 

Daniel Huang commented on AIRFLOW-847:
--

I ran into this last week. {{xcom_pull()}} by default pulls xcom values from 
the current DAG 
(https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1668).
So if you're pulling from within the SubDAG and want XCom values from the parent 
DAG, you need to pass the parent's DAG id in the {{dag_id}} param. For example, 
from a template:

{code}
{{ task_instance.xcom_pull(task_ids="do_thing", dag_id=dag.parent_dag.dag_id) }}
{code}
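
The same pull should also work from inside a task in the subdag, e.g. a 
PythonOperator callable with {{provide_context=True}} ({{do_thing}} and the 
callable name here are hypothetical):

{code}
def pull_from_parent(**context):
    ti = context['task_instance']
    # dag.parent_dag is set on a DAG that is running as a subdag
    return ti.xcom_pull(task_ids='do_thing',
                        dag_id=context['dag'].parent_dag.dag_id)
{code}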

> Xcoms are not passed into SubDAG
> 
>
> Key: AIRFLOW-847
> URL: https://issues.apache.org/jira/browse/AIRFLOW-847
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag, xcom
>Reporter: Robin B
>Priority: Blocker
>
> It's not possible to do an xcom_pull within a subdag.
> None of the following seems to work:
> * As a templated var in SubDagOperator
> * As a var in SubDagOperator
> * From within the subdag factory



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection

2017-02-06 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-770 started by Daniel Huang.

> HDFS hooks should support alternative ways of getting connection
> 
>
> Key: AIRFLOW-770
> URL: https://issues.apache.org/jira/browse/AIRFLOW-770
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> The HDFS hook currently uses {{get_connections()}} instead of 
> {{get_connection()}} to grab the connection info. I believe this is so that, if 
> multiple connections are specified, instead of choosing one at random it 
> appropriately passes them all via snakebite's HAClient.
> As far as I can tell, this means connection info can't be set outside of the 
> UI, since environment variables are not looked at (which had me confused for 
> a bit). I think ideally we'd want to be able to do so for the three different 
> snakebite clients. Here are some possible suggestions for allowing this:
> * AutoConfigClient: add attribute like {{HDFSHook(..., 
> autoconfig=True).get_conn()}}
> * Client: specify single URI in environment variable
> * HAClient: specify multiple URIs in an environment variable, separated by 
> commas? That doesn't adhere to any standard, and if we did this, we'd probably 
> want to support it across all hooks.
> WebHDFS hook has a similar issue with pulling from env.
> references:
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection

2017-02-06 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang reassigned AIRFLOW-770:


Assignee: Daniel Huang

> HDFS hooks should support alternative ways of getting connection
> 
>
> Key: AIRFLOW-770
> URL: https://issues.apache.org/jira/browse/AIRFLOW-770
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> The HDFS hook currently uses {{get_connections()}} instead of 
> {{get_connection()}} to grab the connection info. I believe this is so that, if 
> multiple connections are specified, instead of choosing one at random it 
> appropriately passes them all via snakebite's HAClient.
> As far as I can tell, this means connection info can't be set outside of the 
> UI, since environment variables are not looked at (which had me confused for 
> a bit). I think ideally we'd want to be able to do so for the three different 
> snakebite clients. Here are some possible suggestions for allowing this:
> * AutoConfigClient: add attribute like {{HDFSHook(..., 
> autoconfig=True).get_conn()}}
> * Client: specify single URI in environment variable
> * HAClient: specify multiple URIs in an environment variable, separated by 
> commas? That doesn't adhere to any standard, and if we did this, we'd probably 
> want to support it across all hooks.
> WebHDFS hook has a similar issue with pulling from env.
> references:
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-770) HDFS hooks should support alternative ways of getting connection

2017-02-06 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang updated AIRFLOW-770:
-
Description: 
The HDFS hook currently uses {{get_connections()}} instead of 
{{get_connection()}} to grab the connection info. I believe this is so that, if 
multiple connections are specified, instead of choosing one at random it 
appropriately passes them all via snakebite's HAClient.

As far as I can tell, this means connection info can't be set outside of the 
UI, since environment variables are not looked at (which had me confused for a 
bit). I think ideally we'd want to be able to do so for the three different 
snakebite clients. Here are some possible suggestions for allowing this:

* AutoConfigClient: add attribute like {{HDFSHook(..., 
autoconfig=True).get_conn()}}
* Client: specify single URI in environment variable
* HAClient: specify multiple URIs in an environment variable, separated by commas? 
That doesn't adhere to any standard, and if we did this, we'd probably want to 
support it across all hooks.

WebHDFS hook has a similar issue with pulling from env.

references:
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73

  was:
The HDFS hook currently uses {{get_connections()}} instead of 
{{get_connection()}} to grab the connection info. I believe this is so that, if 
multiple connections are specified, instead of choosing one at random it 
appropriately passes them all via snakebite's HAClient.

As far as I can tell, this means connection info can't be set outside of the 
UI, since environment variables are not looked at (which had me confused for a 
bit). I think ideally we'd want to be able to do so for the three different 
snakebite clients. Here are some possible suggestions for allowing this:

* AutoConfigClient: add attribute like {{HDFSHook(..., 
autoconfig=True).get_conn()}}
* Client: specify single URI in environment variable
* HAClient: specify multiple URIs in an environment variable, separated by commas? 
That doesn't adhere to any standard, and if we did this, we'd probably want to 
support it across all hooks.

references:
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73

Summary: HDFS hooks should support alternative ways of getting 
connection  (was: HDFS hook should support alternative ways of getting 
connection)

> HDFS hooks should support alternative ways of getting connection
> 
>
> Key: AIRFLOW-770
> URL: https://issues.apache.org/jira/browse/AIRFLOW-770
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Reporter: Daniel Huang
>Priority: Minor
>
> The HDFS hook currently uses {{get_connections()}} instead of 
> {{get_connection()}} to grab the connection info. I believe this is so that, if 
> multiple connections are specified, instead of choosing one at random it 
> appropriately passes them all via snakebite's HAClient.
> As far as I can tell, this means connection info can't be set outside of the 
> UI, since environment variables are not looked at (which had me confused for 
> a bit). I think ideally we'd want to be able to do so for the three different 
> snakebite clients. Here are some possible suggestions for allowing this:
> * AutoConfigClient: add attribute like {{HDFSHook(..., 
> autoconfig=True).get_conn()}}
> * Client: specify single URI in environment variable
> * HAClient: specify multiple URIs in an environment variable, separated by 
> commas? That doesn't adhere to any standard, and if we did this, we'd probably 
> want to support it across all hooks.
> WebHDFS hook has a similar issue with pulling from env.
> references:
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
> https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AIRFLOW-768) Clicking "Code" when zoomed into a subdag causes an exception

2017-01-31 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847833#comment-15847833
 ] 

Daniel Huang commented on AIRFLOW-768:
--

Looks like a duplicate of AIRFLOW-365, which I just put up a PR for.

> Clicking "Code" when zoomed into a subdag causes an exception
> -
>
> Key: AIRFLOW-768
> URL: https://issues.apache.org/jira/browse/AIRFLOW-768
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: subdag, webserver
>Affects Versions: Airflow 1.7.1.3
> Environment: puckel/docker-airflow:1.7.1.3-5
>Reporter: Dirk Gorissen
>
> Zoom into a subdag in the graph view, then click on Code; this gives:
> ---
> Node: c3a8ec77a1e6
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in 
> wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 68, 
> in inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_admin/base.py", line 
> 367, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/www/views.py", line 
> 655, in code
> m = importlib.import_module(dag.module_name)
> AttributeError: 'DAG' object has no attribute 'module_name'



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AIRFLOW-365) Code view in subdag trigger exception

2017-01-31 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Huang reassigned AIRFLOW-365:


Assignee: Daniel Huang

> Code view in subdag trigger exception
> -
>
> Key: AIRFLOW-365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-365
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: Airflow 1.7.1.3
> Environment: Centos 7
>Reporter: marco pioltelli
>Assignee: Daniel Huang
>Priority: Minor
>
> Hi,
> first of all, this product is wonderful.
> I have created my DAG with a subdag.
> It works very well, but the following issue happens in the UI:
> - going into the DAG and selecting (zooming into) a subdag
> - clicking into the Code section to view the code, the following exception is 
> triggered
> ---
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
> response = self.full_dispatch_request()
>   File "/usr/lib/python2.7/site-packages/flask/app.py", line 1477, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/usr/lib/python2.7/site-packages/flask/app.py", line 1381, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/usr/lib/python2.7/site-packages/flask/app.py", line 1475, in 
> full_dispatch_request
> rv = self.dispatch_request()
>   File "/usr/lib/python2.7/site-packages/flask/app.py", line 1461, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/usr/lib/python2.7/site-packages/flask_admin/base.py", line 68, in 
> inner
> return self._run_view(f, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/flask_admin/base.py", line 367, in 
> _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/flask_login.py", line 755, in 
> decorated_view
> return func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/airflow/www/views.py", line 655, in 
> code
> m = importlib.import_module(dag.module_name)
> AttributeError: 'DAG' object has no attribute 'module_name'
> It seems it is searching for a DAG instead of a subdag.
> This also happens with the example subdag operator.
> Thanks
> Marco



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (AIRFLOW-806) UI should properly ignore DAG doc when it is None

2017-01-25 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-806 started by Daniel Huang.

> UI should properly ignore DAG doc when it is None
> -
>
> Key: AIRFLOW-806
> URL: https://issues.apache.org/jira/browse/AIRFLOW-806
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: Airflow 1.7.1.3
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Trivial
>
> If {{dag.doc_md = __doc__}} is set but the file has no docstring, the 
> Graph view will error with:
> {code}
> AttributeError: 'NoneType' object has no attribute 'strip'
> {code}
> The UI should just assume we have no docs to show for the DAG. I suspect the 
> user in AIRFLOW-625 ran into this error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-806) UI should properly ignore DAG doc when it is None

2017-01-25 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-806:


 Summary: UI should properly ignore DAG doc when it is None
 Key: AIRFLOW-806
 URL: https://issues.apache.org/jira/browse/AIRFLOW-806
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ui
Affects Versions: Airflow 1.7.1.3
Reporter: Daniel Huang
Assignee: Daniel Huang
Priority: Trivial


If {{dag.doc_md = __doc__}} is set but the file has no docstring, the Graph 
view will error with:

{code}
AttributeError: 'NoneType' object has no attribute 'strip'
{code}

The UI should just assume we have no docs to show for the DAG. I suspect the 
user in AIRFLOW-625 ran into this error.
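
A minimal sketch of the failing pattern (a hypothetical DAG file with no module 
docstring):

{code}
# note: no module docstring in this file, so __doc__ is None
from datetime import datetime
from airflow import DAG

dag = DAG('docless_dag', start_date=datetime(2017, 1, 1))
dag.doc_md = __doc__  # None; the Graph view errors trying to .strip() it
{code}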



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-625) doc_md in concepts document seems wrong

2017-01-25 Thread Daniel Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838399#comment-15838399
 ] 

Daniel Huang commented on AIRFLOW-625:
--

I wasn't able to reproduce this. 

Did you by chance have {{dag.doc_md = __doc__}} set (like 
https://github.com/apache/incubator-airflow/blob/master/docs/concepts.rst does) 
when you first tried this? Maybe the DAG did not refresh for your latest example 
attempt? That would have triggered this error, since you don't have any 
docstrings set. Also, in that case, changing {{t1.doc_md}} to {{dag.doc_md}} 
would have fixed your issue because {{dag.doc_md}} was then no longer {{None}}.
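
For reference, a simple workaround in a DAG file is to only set the attribute 
when the module docstring actually exists (my suggestion, not part of any fix):

{code}
if __doc__:
    dag.doc_md = __doc__
{code}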

> doc_md in concepts document seems wrong
> ---
>
> Key: AIRFLOW-625
> URL: https://issues.apache.org/jira/browse/AIRFLOW-625
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: CentOS 7.2 
>Reporter: Flex Gao
>
> In 
> [https://github.com/apache/incubator-airflow/blob/master/docs/concepts.rst] 
> it says *doc_md* is an attribute of a task, but this gives an error on the 
> webserver's *Graph* tab.
> Example Dag file:
> {code}
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from datetime import datetime, timedelta
> SCRIPTS_PATH = '/var/lib/airflow/scripts'
> default_args = {
>     'depends_on_past': False,
>     'start_date': datetime(2016, 11, 9, 0, 55),
> }
> dag = DAG('elasticsearch', default_args=default_args, 
>           schedule_interval=timedelta(days=1))
> t1 = BashOperator(
>     task_id='daily_index_delete',
>     bash_command='%s/es_clean.py' % SCRIPTS_PATH,
>     dag=dag)
> t1.doc_md = """\
> #### Task Documentation
> Clean ES Indices Every Day
> """
> {code}
> This will give a traceback: 
> *AttributeError: 'NoneType' object has no attribute 'strip'*
> But if I change *t1.doc_md* to *dag.doc_md*, everything is OK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-770) HDFS hook should support alternative ways of getting connection

2017-01-19 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-770:


 Summary: HDFS hook should support alternative ways of getting 
connection
 Key: AIRFLOW-770
 URL: https://issues.apache.org/jira/browse/AIRFLOW-770
 Project: Apache Airflow
  Issue Type: Improvement
  Components: hooks
Reporter: Daniel Huang
Priority: Minor


The HDFS hook currently uses {{get_connections()}} instead of 
{{get_connection()}} to grab the connection info. I believe this is so that, if 
multiple connections are specified, instead of choosing one at random it 
appropriately passes them all via snakebite's HAClient.

As far as I can tell, this means connection info can't be set outside of the 
UI, since environment variables are not looked at (which had me confused for a 
bit). I think ideally we'd want to be able to do so for the three different 
snakebite clients. Here are some possible suggestions for allowing this:

* AutoConfigClient: add attribute like {{HDFSHook(..., 
autoconfig=True).get_conn()}}
* Client: specify single URI in environment variable
* HAClient: specify multiple URIs in an environment variable, separated by commas? 
That doesn't adhere to any standard, and if we did this, we'd probably want to 
support it across all hooks.

references:
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/base_hook.py#L43-L56
https://github.com/apache/incubator-airflow/blob/b56cb5cc97de074bb0e520f66b79e7eb2d913fb1/airflow/hooks/hdfs_hook.py#L45-L73
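
To make the HAClient suggestion concrete, a rough sketch of how a 
comma-separated URI list could be parsed into snakebite namenodes (the env var 
name and the parsing are hypothetical; {{HAClient}} and {{Namenode}} are real 
snakebite classes):

{code}
import os
from urlparse import urlparse  # Python 2, matching Airflow at the time

from snakebite.client import HAClient
from snakebite.namenode import Namenode

# Hypothetical convention: AIRFLOW_CONN_HDFS_DEFAULT='hdfs://nn1:8020,hdfs://nn2:8020'
uris = os.environ['AIRFLOW_CONN_HDFS_DEFAULT'].split(',')
namenodes = [Namenode(urlparse(u).hostname, urlparse(u).port) for u in uris]
client = HAClient(namenodes)
{code}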



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)