[jira] [Updated] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters

2016-05-25 Thread John Bodley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Bodley updated AIRFLOW-179:

Description: 
The DbApiHook.insert_rows(...) method tries to serialize all values to strings 
using the ASCII codec, which is problematic if a cell contains non-ASCII 
characters, e.g.:

>>> from airflow.hooks import DbApiHook
>>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", 
line 196, in _serialize_cell
return "'" + str(cell).replace("'", "''") + "'"
  File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 
102, in __new__
return super(newstr, cls).__new__(cls, value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal 
not in range(128)


Rather than manually serializing and escaping values to an ASCII string, one 
should serialize the value using the character set of the corresponding target 
database, leveraging the connection to convert the object into a SQL string 
literal.

Note that an exception should still be thrown if the target encoding is not 
compatible with the source encoding.

  was:
The DbApiHook.insert_rows(...) method tries to serialize all values to strings 
using the ASCII codec,  this is problematic if the cell contains non-ASCII 
characters, i.e.

>>> from airflow.hooks import DbApiHook
>>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", 
line 196, in _serialize_cell
return "'" + str(cell).replace("'", "''") + "'"
  File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 
102, in __new__
return super(newstr, cls).__new__(cls, value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal 
not in range(128)


Rather than manually trying to serialize values to an ASCII string one should 
try to serialize the value to string using the character set of the 
corresponding target database leveraging the connection to mutate an object to 
the SQL string literal.

Note an exception should still be thrown if the target encoding is not 
compatible with the source encoding.


> DbApiHook string serialization fails when string contains non-ASCII characters
> --
>
> Key: AIRFLOW-179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-179
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: John Bodley
>Assignee: John Bodley
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to 
> strings using the ASCII codec,  this is problematic if the cell contains 
> non-ASCII characters, i.e.
> >>> from airflow.hooks import DbApiHook
> >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", 
> line 196, in _serialize_cell
> return "'" + str(cell).replace("'", "''") + "'"
>   File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 
> 102, in __new__
> return super(newstr, cls).__new__(cls, value)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: 
> ordinal not in range(128)
> Rather than manually trying to serialize and escape values to an ASCII string 
> one should try to serialize the value to string using the character set of 
> the corresponding target database leveraging the connection to mutate the 
> object to the SQL string literal.
> Note an exception should still be thrown if the target encoding is not 
> compatible with the source encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters

2016-05-25 Thread John Bodley (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-179 started by John Bodley.
---
> DbApiHook string serialization fails when string contains non-ASCII characters
> --
>
> Key: AIRFLOW-179
> URL: https://issues.apache.org/jira/browse/AIRFLOW-179
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: John Bodley
>Assignee: John Bodley
>
> The DbApiHook.insert_rows(...) method tries to serialize all values to 
> strings using the ASCII codec,  this is problematic if the cell contains 
> non-ASCII characters, i.e.
> >>> from airflow.hooks import DbApiHook
> >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", 
> line 196, in _serialize_cell
> return "'" + str(cell).replace("'", "''") + "'"
>   File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 
> 102, in __new__
> return super(newstr, cls).__new__(cls, value)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: 
> ordinal not in range(128)
> Rather than manually trying to serialize values to an ASCII string one should 
> try to serialize the value to string using the character set of the 
> corresponding target database leveraging the connection to mutate an object 
> to the SQL string literal.
> Note an exception should still be thrown if the target encoding is not 
> compatible with the source encoding.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters

2016-05-25 Thread John Bodley (JIRA)
John Bodley created AIRFLOW-179:
---

 Summary: DbApiHook string serialization fails when string contains 
non-ASCII characters
 Key: AIRFLOW-179
 URL: https://issues.apache.org/jira/browse/AIRFLOW-179
 Project: Apache Airflow
  Issue Type: Bug
  Components: hooks
Reporter: John Bodley
Assignee: John Bodley


The DbApiHook.insert_rows(...) method tries to serialize all values to strings 
using the ASCII codec, which is problematic if a cell contains non-ASCII 
characters, e.g.:

>>> from airflow.hooks import DbApiHook
>>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng')
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", 
line 196, in _serialize_cell
return "'" + str(cell).replace("'", "''") + "'"
  File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 
102, in __new__
return super(newstr, cls).__new__(cls, value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal 
not in range(128)


Rather than manually serializing values to an ASCII string, one should 
serialize the value using the character set of the corresponding target 
database, leveraging the connection to convert the object into a SQL string 
literal.

Note that an exception should still be thrown if the target encoding is not 
compatible with the source encoding.
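
A minimal sketch of the proposed direction, assuming the Python 2.7 environment 
from the traceback; the serialize_cell name and the charset argument are 
illustrative assumptions, not the actual DbApiHook API:

{noformat}
# -*- coding: utf-8 -*-
# Sketch only: serialize a cell without forcing the ASCII codec, encoding
# unicode values with the target database's character set instead.
def serialize_cell(cell, charset='utf-8'):
    if cell is None:
        return 'NULL'
    if isinstance(cell, unicode):       # Python 2: encode explicitly
        # raises UnicodeEncodeError if the target charset cannot represent it
        text = cell.encode(charset)
    else:
        text = str(cell)
    return "'" + text.replace("'", "''") + "'"

print(serialize_cell(u'Nguyễn Tấn Dũng'))  # -> 'Nguyễn Tấn Dũng'
{noformat}

Encoding with the connection's character set keeps the behaviour described 
above: incompatible encodings still raise, while compatible ones round-trip the 
value into the SQL literal.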



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-178) Zip files in DAG folder do not get picked up by Airflow

2016-05-25 Thread Joy Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joy Gao updated AIRFLOW-178:

External issue URL: https://github.com/apache/incubator-airflow/pull/1545

> Zip files in DAG folder do not get picked up by Airflow
> -
>
> Key: AIRFLOW-178
> URL: https://issues.apache.org/jira/browse/AIRFLOW-178
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Joy Gao
>Assignee: Joy Gao
>Priority: Minor
>
> The collect_dags method in DagBag class currently skips any file that does 
> not end in '.py', thereby skipping potential zip files in the DAG folder.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301539#comment-15301539
 ] 

Bolke de Bruin edited comment on AIRFLOW-168 at 5/26/16 5:08 AM:
-

The double scheduling is indeed a bug on master, also with the updated 
scheduler from 124; I will need to fix it.


was (Author: bolke):
The double scheduling is indeed a bug, also with the updated scheduler from 
124, that I will need to fix.

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. Though it 
> works as expected on a prod env which is running v1.7 with celery workers and 
> mysql backend.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301539#comment-15301539
 ] 

Bolke de Bruin commented on AIRFLOW-168:


The double scheduling is indeed a bug, also with the updated scheduler from 
124, that I will need to fix.

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. Though it 
> works as expected on a prod env which is running v1.7 with celery workers and 
> mysql backend.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301533#comment-15301533
 ] 

Bolke de Bruin commented on AIRFLOW-171:


It should be mentioned in UPDATING.md, but I don't think it is.

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails was working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-178) Zip files in DAG folder do not get picked up by Airflow

2016-05-25 Thread Joy Gao (JIRA)
Joy Gao created AIRFLOW-178:
---

 Summary: Zip files in DAG folder do not get picked up by Airflow
 Key: AIRFLOW-178
 URL: https://issues.apache.org/jira/browse/AIRFLOW-178
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Joy Gao
Assignee: Joy Gao
Priority: Minor


The collect_dags method in DagBag class currently skips any file that does not 
end in '.py', thereby skipping potential zip files in the DAG folder.
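
A minimal sketch of the kind of check collect_dags could apply, using the 
standard-library zipfile module to accept archives that contain Python files; 
the helper name and folder path are illustrative, not the submitted patch:

{noformat}
import os
import zipfile

# Sketch only: treat plain .py files and zip archives containing at least
# one .py member as DAG candidates, instead of skipping every non-.py file.
def is_dag_candidate(filepath):
    if filepath.endswith('.py'):
        return True
    if zipfile.is_zipfile(filepath):
        with zipfile.ZipFile(filepath) as archive:
            return any(name.endswith('.py') for name in archive.namelist())
    return False

dags_folder = os.path.expanduser('~/airflow/dags')  # example path
if os.path.isdir(dags_folder):
    for name in os.listdir(dags_folder):
        path = os.path.join(dags_folder, name)
        if os.path.isfile(path) and is_dag_candidate(path):
            print('would parse %s' % path)
{noformat}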



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301507#comment-15301507
 ] 

Bolke de Bruin edited comment on AIRFLOW-168 at 5/26/16 4:44 AM:
-

Yes, sorry, I mentioned this on Gitter. On master, deadlock detection is broken 
due to the eager creation, and the scheduler does not check for existing 
task_instances before creating them, hence the constraint error. It will need 
the follow-up patch from AIRFLOW-128.

Black is indeed the color (basically undefined) for task instances that have 
been created but not yet picked up by the scheduler.


was (Author: bolke):
Yes sorry I mentioned this on gitter. With master deadlock detection is broken 
due to the eager creation. It will need the follow up patch from AIRFLOW-128.

Black is indeed the color (basically undefined)

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. Though it 
> works as expected on a prod env which is running v1.7 with celery workers and 
> mysql backend.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301507#comment-15301507
 ] 

Bolke de Bruin edited comment on AIRFLOW-168 at 5/26/16 4:42 AM:
-

Yes sorry I mentioned this on gitter. With master deadlock detection is broken 
due to the eager creation. It will need the follow up patch from AIRFLOW-128.

Black is indeed the color (basically undefined)


was (Author: bolke):
Yes sorry I mentioned this on gitter. With master deadlock detection is broken 
due to the eager creation. It will need the follow up patch from AIRFLOW-128.

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. Though it 
> works as expected on a prod env which is running v1.7 with celery workers and 
> mysql backend.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AIRFLOW-161) Redirection to external url

2016-05-25 Thread Sumit Maheshwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Maheshwari reassigned AIRFLOW-161:


Assignee: Sumit Maheshwari

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront), 
> where I can redirect someone to a external service url, using the information 
> stored in airflow. There could be many use cases like downloading a signed 
> file from s3, redirecting to hadoop job tracker, or a direct case on which I 
> am working which is linking airflow tasks to qubole commands. 
> I already have a working model and will open a PR soon. Please let me know if 
> there existing ways already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-177) Resume a failed dag

2016-05-25 Thread Sumit Maheshwari (JIRA)
Sumit Maheshwari created AIRFLOW-177:


 Summary: Resume a failed dag
 Key: AIRFLOW-177
 URL: https://issues.apache.org/jira/browse/AIRFLOW-177
 Project: Apache Airflow
  Issue Type: New Feature
  Components: core
Reporter: Sumit Maheshwari


Say I have a dag with 10 nodes and one of its dag runs failed at the 5th node. 
If I want to resume that dag run, I currently have to go and run the individual 
tasks one by one. Is there any way to just supply the dag_id and execution_date 
(or run_id) and have it automatically retry only the failed tasks?
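
A minimal sketch of the behaviour being asked for, resetting only the failed 
task instances of one run so the scheduler retries them; the model names and 
session handling are assumptions based on airflow master, not an existing 
command:

{noformat}
# Sketch only: reset the FAILED task instances of one dag run so the
# scheduler picks them up again; not an existing Airflow CLI command.
from airflow import settings
from airflow.models import TaskInstance

def retry_failed_tasks(dag_id, execution_date):
    session = settings.Session()
    try:
        failed = (session.query(TaskInstance)
                  .filter(TaskInstance.dag_id == dag_id,
                          TaskInstance.execution_date == execution_date,
                          TaskInstance.state == 'failed')
                  .all())
        for ti in failed:
            ti.state = None   # unscheduled again, eligible for another run
        session.commit()
        return len(failed)
    finally:
        session.close()
{noformat}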



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Sumit Maheshwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Maheshwari reassigned AIRFLOW-167:


Assignee: Sumit Maheshwari

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>Assignee: Sumit Maheshwari
>
> I was trying to get state for a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a rest call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output that 
> would be best. Currently we've to do html parsing to get the same. 
> Other (and easier) way is to add a cli support like we have for `task_state`.
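
Pending a dag_state-style command, a minimal sketch of reading a run's state 
straight from the metadata DB; the DagRun model and settings.Session usage are 
assumptions based on airflow master at the time, not a published API:

{noformat}
from airflow import settings
from airflow.models import DagRun

# Sketch only: look up the DagRun row for a dag_id/execution_date pair and
# return its state, as an alternative to parsing the /admin HTML.
def dag_run_state(dag_id, execution_date):
    session = settings.Session()
    try:
        run = (session.query(DagRun)
               .filter(DagRun.dag_id == dag_id,
                       DagRun.execution_date == execution_date)
               .first())
        return run.state if run else None
    finally:
        session.close()
{noformat}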



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301507#comment-15301507
 ] 

Bolke de Bruin commented on AIRFLOW-168:


Yes sorry I mentioned this on gitter. With master deadlock detection is broken 
due to the eager creation. It will need the follow up patch from AIRFLOW-128.

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. Though it 
> works as expected on a prod env which is running v1.7 with celery workers and 
> mysql backend.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Sumit Maheshwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Maheshwari updated AIRFLOW-168:
-
Description: 
I was looking at the example_xcom example and found that it got scheduled 
twice: once at the start_time and once at the current time. To be sure, I tried 
multiple times (by reloading the db) and it's the same.

I am on airflow master, using the sequential executor with sqlite3, though it 
works as expected on a prod env running v1.7 with celery workers and a mysql 
backend.



  was:
I was looking at example_xcom example and found that it got scheduled twice. 
Ones at the start_time and ones at the current time. To be correct I tried 
multiple times (by reloading db) and its same. 

I am on airflow master, using sequential executor with sqlite3. 




> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. Though it 
> works as expected on a prod env which is running v1.7 with celery workers and 
> mysql backend.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301461#comment-15301461
 ] 

Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:24 AM:
--

I noticed that the scheduler log shows (stacktrace at bottom):

{noformat}
[2016-05-25 20:22:37,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs
[2016-05-25 20:22:37,933] {jobs.py:732} INFO - Starting 0 scheduler jobs
[2016-05-25 20:22:37,933] {jobs.py:747} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-05-25 20:22:37,933] {jobs.py:750} INFO - Loop took: 0.011795 seconds
[2016-05-25 20:22:37,936] {models.py:308} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-05-25 20:22:37,937] {models.py:314} INFO - Failing jobs without heartbeat 
after 2016-05-25 20:20:22.937222
[2016-05-25 20:22:42,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs
[2016-05-25 20:22:42,934] {jobs.py:732} INFO - Starting 1 scheduler jobs
[2016-05-25 20:22:42,977] {models.py:2703} INFO - Checking state for 
[2016-05-25 20:22:42,983] {jobs.py:504} INFO - Getting list of tasks to skip 
for active runs.
[2016-05-25 20:22:42,986] {jobs.py:520} INFO - Checking dependencies on 3 tasks 
instances, minus 0 skippable ones
[2016-05-25 20:22:42,991] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:42,993] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:43,011] {jobs.py:747} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-05-25 20:22:43,012] {jobs.py:750} INFO - Loop took: 0.089461 seconds
[2016-05-25 20:22:43,018] {models.py:308} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-05-25 20:22:43,019] {models.py:314} INFO - Failing jobs without heartbeat 
after 2016-05-25 20:20:28.019143
[2016-05-25 20:22:43,028] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:43,453] {__init__.py:36} INFO - Using executor 
SequentialExecutor
Logging into: 
/Users/chrisr/airflow/logs/example_xcom/push/2016-05-25T20:22:42.953808
[2016-05-25 20:22:44,300] {__init__.py:36} INFO - Using executor 
SequentialExecutor
[2016-05-25 20:22:48,937] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local 
-sd DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:49,366] {__init__.py:36} INFO - Using executor 
SequentialExecutor
Logging into: 
/Users/chrisr/airflow/logs/example_xcom/push_by_returning/2016-05-25T20:22:42.953808
[2016-05-25 20:22:50,210] {__init__.py:36} INFO - Using executor 
SequentialExecutor
[2016-05-25 20:22:54,844] {jobs.py:580} INFO - Prioritizing 0 queued jobs
[2016-05-25 20:22:54,853] {jobs.py:732} INFO - Starting 1 scheduler jobs
[2016-05-25 20:22:54,903] {models.py:2703} INFO - Checking state for 
[2016-05-25 20:22:54,907] {models.py:2703} INFO - Checking state for 
[2016-05-25 20:22:54,911] {jobs.py:504} INFO - Getting list of tasks to skip 
for active runs.
[2016-05-25 20:22:54,913] {jobs.py:520} INFO - Checking dependencies on 6 tasks 
instances, minus 2 skippable ones
[2016-05-25 20:22:54,920] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push 2015-01-01T00:00:00 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:54,921] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:54,935] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom puller 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:54,954] {jobs.py:747} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-05-25 20:22:54,954] {jobs.py:750} INFO - Loop took: 0.113319 seconds
[2016-05-25 20:22:54,960] {models.py:308} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-05-25 20:22:54,960] {models.py:314} INFO - Failing jobs without heartbeat 
after 2016-05-25 20:20:39.960629
[2016-05-25 20:22:54,978] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:55,410] {__init__.py:36} INFO - Using executor 
SequentialExecutor
Logging into: 
/Users/chrisr/airflow/logs/example_xcom/push_by_returning/2015-01-01T00:00:00
[2016-05-25 20:22:56,239] {__init__.py:36} INFO - Using executor 
SequentialExecutor
[2016-05-25 20:23:00,873] {sequential_executor

[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301461#comment-15301461
 ] 

Chris Riccomini commented on AIRFLOW-168:
-

I noticed that the scheduler log shows:

{noformat}
[2016-05-25 20:22:37,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs
[2016-05-25 20:22:37,933] {jobs.py:732} INFO - Starting 0 scheduler jobs
[2016-05-25 20:22:37,933] {jobs.py:747} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-05-25 20:22:37,933] {jobs.py:750} INFO - Loop took: 0.011795 seconds
[2016-05-25 20:22:37,936] {models.py:308} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-05-25 20:22:37,937] {models.py:314} INFO - Failing jobs without heartbeat 
after 2016-05-25 20:20:22.937222
[2016-05-25 20:22:42,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs
[2016-05-25 20:22:42,934] {jobs.py:732} INFO - Starting 1 scheduler jobs
[2016-05-25 20:22:42,977] {models.py:2703} INFO - Checking state for 
[2016-05-25 20:22:42,983] {jobs.py:504} INFO - Getting list of tasks to skip 
for active runs.
[2016-05-25 20:22:42,986] {jobs.py:520} INFO - Checking dependencies on 3 tasks 
instances, minus 0 skippable ones
[2016-05-25 20:22:42,991] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:42,993] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:43,011] {jobs.py:747} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-05-25 20:22:43,012] {jobs.py:750} INFO - Loop took: 0.089461 seconds
[2016-05-25 20:22:43,018] {models.py:308} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-05-25 20:22:43,019] {models.py:314} INFO - Failing jobs without heartbeat 
after 2016-05-25 20:20:28.019143
[2016-05-25 20:22:43,028] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:43,453] {__init__.py:36} INFO - Using executor 
SequentialExecutor
Logging into: 
/Users/chrisr/airflow/logs/example_xcom/push/2016-05-25T20:22:42.953808
[2016-05-25 20:22:44,300] {__init__.py:36} INFO - Using executor 
SequentialExecutor
[2016-05-25 20:22:48,937] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local 
-sd DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:49,366] {__init__.py:36} INFO - Using executor 
SequentialExecutor
Logging into: 
/Users/chrisr/airflow/logs/example_xcom/push_by_returning/2016-05-25T20:22:42.953808
[2016-05-25 20:22:50,210] {__init__.py:36} INFO - Using executor 
SequentialExecutor
[2016-05-25 20:22:54,844] {jobs.py:580} INFO - Prioritizing 0 queued jobs
[2016-05-25 20:22:54,853] {jobs.py:732} INFO - Starting 1 scheduler jobs
[2016-05-25 20:22:54,903] {models.py:2703} INFO - Checking state for 
[2016-05-25 20:22:54,907] {models.py:2703} INFO - Checking state for 
[2016-05-25 20:22:54,911] {jobs.py:504} INFO - Getting list of tasks to skip 
for active runs.
[2016-05-25 20:22:54,913] {jobs.py:520} INFO - Checking dependencies on 6 tasks 
instances, minus 2 skippable ones
[2016-05-25 20:22:54,920] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push 2015-01-01T00:00:00 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:54,921] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:54,935] {base_executor.py:36} INFO - Adding to queue: airflow 
run example_xcom puller 2016-05-25T20:22:42.953808 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:54,954] {jobs.py:747} INFO - Done queuing tasks, calling the 
executor's heartbeat
[2016-05-25 20:22:54,954] {jobs.py:750} INFO - Loop took: 0.113319 seconds
[2016-05-25 20:22:54,960] {models.py:308} INFO - Finding 'running' jobs without 
a recent heartbeat
[2016-05-25 20:22:54,960] {models.py:314} INFO - Failing jobs without heartbeat 
after 2016-05-25 20:20:39.960629
[2016-05-25 20:22:54,978] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd 
DAGS_FOLDER/example_dags/example_xcom.py 
[2016-05-25 20:22:55,410] {__init__.py:36} INFO - Using executor 
SequentialExecutor
Logging into: 
/Users/chrisr/airflow/logs/example_xcom/push_by_returning/2015-01-01T00:00:00
[2016-05-25 20:22:56,239] {__init__.py:36} INFO - Using executor 
SequentialExecutor
[2016-05-25 20:23:00,873] {sequential_executor.py:26} INFO - Executing command: 
airflow run example_xcom push 2015-01

[jira] [Updated] (AIRFLOW-161) Redirection to external url

2016-05-25 Thread Sumit Maheshwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Maheshwari updated AIRFLOW-161:
-
External issue URL: 
https://github.com/apache/incubator-airflow/pull/1538/files

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront), 
> where I can redirect someone to a external service url, using the information 
> stored in airflow. There could be many use cases like downloading a signed 
> file from s3, redirecting to hadoop job tracker, or a direct case on which I 
> am working which is linking airflow tasks to qubole commands. 
> I already have a working model and will open a PR soon. Please let me know if 
> there existing ways already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301449#comment-15301449
 ] 

Chris Riccomini commented on AIRFLOW-168:
-

The comment in [this|https://github.com/apache/incubator-airflow/pull/1506] 
pull request from [~bolke] reads:

{quote}
This creates dagrun from a Dag. It also creates the TaskInstances from the 
tasks known at instantiation time. By having taskinstances created at dagrun 
instantiation time, deadlocks that were tested for will not take place anymore 
(@jlowin, correct? different test required?). *For now, the visual consequence 
of having these taskinstances already there is that they will be black in the 
tree view.*
{quote}

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-161) Redirection to external url

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301450#comment-15301450
 ] 

Sumit Maheshwari commented on AIRFLOW-161:
--

Sure, could you please CC the top contributors on the PR or here?

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront), 
> where I can redirect someone to a external service url, using the information 
> stored in airflow. There could be many use cases like downloading a signed 
> file from s3, redirecting to hadoop job tracker, or a direct case on which I 
> am working which is linking airflow tasks to qubole commands. 
> I already have a working model and will open a PR soon. Please let me know if 
> there existing ways already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301444#comment-15301444
 ] 

Sumit Maheshwari commented on AIRFLOW-168:
--

Actually I am getting the same: 2 schedules, 5 task_instances (instead of 6). 

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442
 ] 

Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:16 AM:
--

I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png||width=300!



was (Author: criccomini):
I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png|thumbnail!


> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442
 ] 

Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:16 AM:
--

I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png|width=300!



was (Author: criccomini):
I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png||width=300!


> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442
 ] 

Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:17 AM:
--

I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png|width=500!



was (Author: criccomini):
I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png|width=300!


> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442
 ] 

Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:15 AM:
--

I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png|thumbnail!



was (Author: criccomini):
I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png!


> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442
 ] 

Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:14 AM:
--

I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

!screenshot-1.png!



was (Author: criccomini):
I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442
 ] 

Chris Riccomini commented on AIRFLOW-168:
-

I was able to reproduce this. My results were even stranger. One of the tasks 
is showing up as black in the treeview. [~bolke], I'm wondering if this is 
related to the scheduler work you're doing?

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301443#comment-15301443
 ] 

Chris Riccomini commented on AIRFLOW-168:
-

Note: My machine is running on PST, not UTC.

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-168:

Attachment: screenshot-1.png

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, 
> screenshot-1.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301440#comment-15301440
 ] 

Sumit Maheshwari commented on AIRFLOW-168:
--

No, it's set to IST, will that be a concern? 

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-168:

Affects Version/s: Airflow 1.7.1.2

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: Airflow 1.7.1.2
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301436#comment-15301436
 ] 

Chris Riccomini commented on AIRFLOW-168:
-

Is the timezone on the machine set to UTC?

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at example_xcom example and found that it got scheduled twice. 
> Ones at the start_time and ones at the current time. To be correct I tried 
> multiple times (by reloading db) and its same. 
> I am on airflow master, using sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-161) Redirection to external url

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301433#comment-15301433
 ] 

Chris Riccomini commented on AIRFLOW-161:
-

Yeah, I wouldn't object if there were a generic way to redirect from the UI. My 
objection is more to hard-coding Qubole stuff in generic Airflow files 
(views.py, dag.html).

I think we'd also need to loop in a few more committers to make sure everyone 
agrees on the approach taken.

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront), 
> where I can redirect someone to a external service url, using the information 
> stored in airflow. There could be many use cases like downloading a signed 
> file from s3, redirecting to hadoop job tracker, or a direct case on which I 
> am working which is linking airflow tasks to qubole commands. 
> I already have a working model and will open a PR soon. Please let me know if 
> there existing ways already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-161) Redirection to external url

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301400#comment-15301400
 ] 

Sumit Maheshwari commented on AIRFLOW-161:
--

Fair enough, I can't challenge that decision, as Qubole is not as big as AWS or 
GCE :).

However, that link would only be visible for qubole_operator tasks, which 
implies the user is already using Qubole, so having the link would help them. I 
also think Airflow is going to need a /redirect (or similar) route in the near 
future.
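
A minimal sketch of what such a generic /redirect route could look like, 
assuming a Flask view with a whitelist of allowed targets; the route name and 
whitelist are illustrative assumptions, not part of any pull request:

{noformat}
from flask import Flask, abort, redirect, request

app = Flask(__name__)
ALLOWED_PREFIXES = ('https://api.qubole.com/',)  # example whitelist

# Sketch only: a generic redirect endpoint that refuses to act as an open
# redirect by checking the target URL against a whitelist of prefixes.
@app.route('/redirect')
def external_redirect():
    url = request.args.get('url', '')
    if not url.startswith(ALLOWED_PREFIXES):
        abort(400)
    return redirect(url)
{noformat}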

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront), 
> where I can redirect someone to a external service url, using the information 
> stored in airflow. There could be many use cases like downloading a signed 
> file from s3, redirecting to hadoop job tracker, or a direct case on which I 
> am working which is linking airflow tasks to qubole commands. 
> I already have a working model and will open a PR soon. Please let me know if 
> there existing ways already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301398#comment-15301398
 ] 

Chris Riccomini commented on AIRFLOW-167:
-

Commented, thanks!

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>
> I was trying to get state for a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a rest call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output that 
> would be best. Currently we've to do html parsing to get the same. 
> Other (and easier) way is to add a cli support like we have for `task_state`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-169) Hide expire dags in UI

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301346#comment-15301346
 ] 

Sumit Maheshwari commented on AIRFLOW-169:
--

I am referring to the landing page, i.e. /admin. Expired dags means dags that 
were supposed to run @once and have already run, or dags whose end_time is in 
the past.

Similarly, in the cli we could pass an option (say -e) to the list_dags 
command to ignore those expired dags. 

> Hide expire dags in UI
> --
>
> Key: AIRFLOW-169
> URL: https://issues.apache.org/jira/browse/AIRFLOW-169
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: ui
>Reporter: Sumit Maheshwari
>
> It would be great if we've option to hide expired schedules from UI. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301339#comment-15301339
 ] 

Sumit Maheshwari commented on AIRFLOW-168:
--

I was on the latest master. 

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at the example_xcom example and found that it got scheduled 
> twice: once at the start_time and once at the current time. To be sure, I 
> tried multiple times (by reloading the db) and it's the same. 
> I am on airflow master, using the sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-167:

External issue URL: https://github.com/apache/incubator-airflow/pull/1541

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>
> I was trying to get the state of a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a REST call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output, that 
> would be best. Currently we have to do HTML parsing to get the same. 
> Another (and easier) way is to add CLI support like we have for `task_state`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301338#comment-15301338
 ] 

Sumit Maheshwari commented on AIRFLOW-167:
--

Yup, already opened https://github.com/apache/incubator-airflow/pull/1541/files.

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>
> I was trying to get the state of a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a REST call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output, that 
> would be best. Currently we have to do HTML parsing to get the same. 
> Another (and easier) way is to add CLI support like we have for `task_state`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Hao Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301175#comment-15301175
 ] 

Hao Ye commented on AIRFLOW-171:


Thanks, I got it working.

We had an older version of the config that pointed to
email_backend = airflow.utils.send_email_smtp
instead of 
email_backend = airflow.utils.email.send_email_smtp.

On that note, is there an easy way to detect config changes when upgrading? We 
currently keep our config across upgrades and so may not pick up new changes.
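
For anyone hitting the same thing, the corrected value goes under the email 
section of airflow.cfg (section and key names as in the 1.7.1 default config; 
double-check against your own file):

{code}
[email]
email_backend = airflow.utils.email.send_email_smtp
{code}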

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails were working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[1/2] incubator-airflow git commit: AIRFLOW-45: Support Hidden Airflow Variables

2016-05-25 Thread criccomini
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 7332c40c2 -> 456dada69


AIRFLOW-45: Support Hidden Airflow Variables


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3e309415
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3e309415
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3e309415

Branch: refs/heads/master
Commit: 3e3094157eb516ad37c4691ddfcdda5c9444352e
Parents: 7332c40
Author: Matthew Chen 
Authored: Wed May 25 08:45:24 2016 -0700
Committer: Matthew Chen 
Committed: Wed May 25 08:45:24 2016 -0700

--
 airflow/configuration.py |   9 -
 airflow/www/views.py |  36 +++-
 docs/img/variable_hidden.png | Bin 0 -> 154299 bytes
 docs/ui.rst  |  13 +
 4 files changed, 56 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3e309415/airflow/configuration.py
--
diff --git a/airflow/configuration.py b/airflow/configuration.py
index 13bb344..582bc7c 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -156,7 +156,10 @@ defaults = {
 },
 'github_enterprise': {
 'api_rev': 'v3'
-}
+},
+'admin': {
+'hide_sensitive_variable_fields': True,
+},
 }
 
 DEFAULT_CONFIG = """\
@@ -386,6 +389,10 @@ authenticate = False
 # default_principal = admin
 # default_secret = admin
 
+[admin]
+# UI to hide sensitive variable fields when set to True
+hide_sensitive_variable_fields = True
+
 """
 
 TEST_CONFIG = """\

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3e309415/airflow/www/views.py
--
diff --git a/airflow/www/views.py b/airflow/www/views.py
index bcd390c..78f9677 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -82,6 +82,17 @@ current_user = airflow.login.current_user
 logout_user = airflow.login.logout_user
 
 FILTER_BY_OWNER = False
+
+DEFAULT_SENSITIVE_VARIABLE_FIELDS = (
+'password',
+'secret',
+'passwd',
+'authorization',
+'api_key',
+'apikey',
+'access_token',
+)
+
 if conf.getboolean('webserver', 'FILTER_BY_OWNER'):
 # filter_by_owner if authentication is enabled and filter_by_owner is true
 FILTER_BY_OWNER = not current_app.config['LOGIN_DISABLED']
@@ -265,6 +276,11 @@ def recurse_tasks(tasks, task_ids, dag_ids, 
task_id_to_dag):
 task_id_to_dag[tasks.task_id] = tasks.dag
 
 
+def should_hide_value_for_key(key_name):
+return any(s in key_name for s in DEFAULT_SENSITIVE_VARIABLE_FIELDS) \
+   and conf.getboolean('admin', 'hide_sensitive_variable_fields')
+
+
 class Airflow(BaseView):
 
 def is_visible(self):
@@ -2015,11 +2031,17 @@ admin.add_view(mv)
 class VariableView(wwwutils.LoginMixin, AirflowModelView):
 verbose_name = "Variable"
 verbose_name_plural = "Variables"
+
+def hidden_field_formatter(view, context, model, name):
+if should_hide_value_for_key(model.key):
+return Markup('*' * 8)
+return getattr(model, name)
+
 form_columns = (
 'key',
 'val',
 )
-column_list = ('key', 'is_encrypted',)
+column_list = ('key', 'val', 'is_encrypted',)
 column_filters = ('key', 'val')
 column_searchable_list = ('key', 'val')
 form_widget_args = {
@@ -2028,6 +2050,18 @@ class VariableView(wwwutils.LoginMixin, 
AirflowModelView):
 'rows': 20,
 }
 }
+column_sortable_list = (
+'key',
+'val',
+'is_encrypted',
+)
+column_formatters = {
+'val': hidden_field_formatter
+}
+
+def on_form_prefill(self, form, id):
+if should_hide_value_for_key(form.key.data):
+form.val.data = '*' * 8
 
 
 class JobModelView(ModelViewOnly):

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3e309415/docs/img/variable_hidden.png
--
diff --git a/docs/img/variable_hidden.png b/docs/img/variable_hidden.png
new file mode 100644
index 000..e081ca3
Binary files /dev/null and b/docs/img/variable_hidden.png differ

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3e309415/docs/ui.rst
--
diff --git a/docs/ui.rst b/docs/ui.rst
index 112804e..4b232fa 100644
--- a/docs/ui.rst
+++ b/docs/ui.rst
@@ -41,6 +41,19 @@ dependencies and their current status for a specific run.
 
 
 
+Variable View
+.
+The variable view allows you to list, create, edit or delete the key-value pair
+of a variable used during jobs. Value of 

[jira] [Closed] (AIRFLOW-45) Support hidden Airflow variables

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-45.
--
Resolution: Fixed

+1 Merged. Thanks! [~cheny258].

> Support hidden Airflow variables
> 
>
> Key: AIRFLOW-45
> URL: https://issues.apache.org/jira/browse/AIRFLOW-45
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: security
>Reporter: Chris Riccomini
>Assignee: Matthew Chen
>
> We have a use case where someone wants to set a variable for their DAG, but 
> they don't want it visible via the UI. I see that variables are encrypted in 
> the DB (if the crypto package is installed), but the variables are still 
> visible via the UI, which is a little annoying.
> Obviously, this is not 100% secure, since you can still create a DAG to read 
> the variable, but it will at least keep arbitrary users from logging 
> in/loading the UI and seeing the variable.
> I propose basically handling this the same way that DB hook passwords are 
> handled. Don't show them in the UI when the edit button is clicked, but allow 
> the variables to be editable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-45) Support hidden Airflow variables

2016-05-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301084#comment-15301084
 ] 

ASF subversion and git services commented on AIRFLOW-45:


Commit 3e3094157eb516ad37c4691ddfcdda5c9444352e in incubator-airflow's branch 
refs/heads/master from [~cheny258]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3e30941 ]

AIRFLOW-45: Support Hidden Airflow Variables


> Support hidden Airflow variables
> 
>
> Key: AIRFLOW-45
> URL: https://issues.apache.org/jira/browse/AIRFLOW-45
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: security
>Reporter: Chris Riccomini
>Assignee: Matthew Chen
>
> We have a use case where someone wants to set a variable for their DAG, but 
> they don't want it visible via the UI. I see that variables are encrypted in 
> the DB (if the crypto package is installed), but the variables are still 
> visible via the UI, which is a little annoying.
> Obviously, this is not 100% secure, since you can still create a DAG to read 
> the variable, but it will at least keep arbitrary users from logging 
> in/loading the UI and seeing the variable.
> I propose basically handling this the same way that DB hook passwords are 
> handled. Don't show them in the UI when the edit button is clicked, but allow 
> the variables to be editable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-176) PR tool crashes with non-integer JIRA ids

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301115#comment-15301115
 ] 

Chris Riccomini commented on AIRFLOW-176:
-

+1

> PR tool crashes with non-integer JIRA ids
> -
>
> Key: AIRFLOW-176
> URL: https://issues.apache.org/jira/browse/AIRFLOW-176
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Affects Versions: Airflow 1.7.1.2
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> The PR tool crashes if a non-integer id is passed. This includes the default 
> ID  (AIRFLOW-XXX) so it affects folks who don't type in a new ID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[2/2] incubator-airflow git commit: Merge pull request #1530 from mattuuh7/hidden-fields

2016-05-25 Thread criccomini
Merge pull request #1530 from mattuuh7/hidden-fields


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/456dada6
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/456dada6
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/456dada6

Branch: refs/heads/master
Commit: 456dada695174989e6785f08f58112e760b72d8b
Parents: 7332c40 3e30941
Author: Chris Riccomini 
Authored: Wed May 25 16:15:01 2016 -0700
Committer: Chris Riccomini 
Committed: Wed May 25 16:15:01 2016 -0700

--
 airflow/configuration.py |   9 -
 airflow/www/views.py |  36 +++-
 docs/img/variable_hidden.png | Bin 0 -> 154299 bytes
 docs/ui.rst  |  13 +
 4 files changed, 56 insertions(+), 2 deletions(-)
--




[jira] [Work started] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor

2016-05-25 Thread Andre (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-173 started by Andre.
-
> Create a FileSensor / NFSFileSensor sensor
> --
>
> Key: AIRFLOW-173
> URL: https://issues.apache.org/jira/browse/AIRFLOW-173
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Andre
>Assignee: Andre
>Priority: Minor
>
> While HDFS and WebHDFS suit most organisations using Hadoop, for some shops 
> running MapR-FS the Airflow implementation is simplified by using plain files 
> pointing to MapR's NFS gateways.
> A FileSensor and/or an NFSFileSensor would assist the adoption of Airflow 
> within the MapR customer base and, more importantly, help those who are using 
> POSIX-compliant distributed filesystems that can be mounted on Unix-derivative 
> systems (e.g. MapR-FS (via NFS), CephFS, GlusterFS, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor

2016-05-25 Thread Andre (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andre updated AIRFLOW-173:
--
Assignee: (was: Andre)

> Create a FileSensor / NFSFileSensor sensor
> --
>
> Key: AIRFLOW-173
> URL: https://issues.apache.org/jira/browse/AIRFLOW-173
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Andre
>Priority: Minor
>
> While HDFS and WebHDFS suit most organisations using Hadoop, for some shops 
> running MapR-FS the Airflow implementation is simplified by using plain files 
> pointing to MapR's NFS gateways.
> A FileSensor and/or an NFSFileSensor would assist the adoption of Airflow 
> within the MapR customer base and, more importantly, help those who are using 
> POSIX-compliant distributed filesystems that can be mounted on Unix-derivative 
> systems (e.g. MapR-FS (via NFS), CephFS, GlusterFS, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-167:

Component/s: cli

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>
> I was trying to get the state of a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a REST call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output, that 
> would be best. Currently we have to do HTML parsing to get the same. 
> Another (and easier) way is to add CLI support like we have for `task_state`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"

2016-05-25 Thread Andre (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301030#comment-15301030
 ] 

Andre commented on AIRFLOW-172:
---

Indeed, I noticed that. 

I may need to re-read the documentation and possibly suggest some changes to 
the tutorial.

> All example DAGs report "Only works with the CeleryExecutor, sorry"
> ---
>
> Key: AIRFLOW-172
> URL: https://issues.apache.org/jira/browse/AIRFLOW-172
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor
>Affects Versions: Airflow 1.7.1
>Reporter: Andre
>
> After installing airflow and trying to run some example DAGs I was faced with 
> {{Only works with the CeleryExecutor, sorry}}
> on every DAG I tried to run.
> {code}$ pip list
> airflow (1.7.1.2)
> alembic (0.8.6)
> Babel (1.3)
> bitarray (0.8.1)
> cffi (1.6.0)
> chartkick (0.4.2)
> croniter (0.3.12)
> cryptography (1.3.2)
> dill (0.2.5)
> docutils (0.12)
> Flask (0.10.1)
> Flask-Admin (1.4.0)
> Flask-Cache (0.13.1)
> Flask-Login (0.2.11)
> Flask-WTF (0.12)
> funcsigs (0.4)
> future (0.15.2)
> google-apputils (0.4.2)
> gunicorn (19.3.0)
> hive-thrift-py (0.0.1)
> idna (2.1)
> impyla (0.13.7)
> itsdangerous (0.24)
> Jinja2 (2.8)
> lockfile (0.12.2)
> Mako (1.0.4)
> Markdown (2.6.6)
> MarkupSafe (0.23)
> mysqlclient (1.3.7)
> numpy (1.11.0)
> pandas (0.18.1)
> pip (8.1.2)
> ply (3.8)
> protobuf (2.6.1)
> pyasn1 (0.1.9)
> pycparser (2.14)
> Pygments (2.1.3)
> PyHive (0.1.8)
> pykerberos (1.1.10)
> python-daemon (2.1.1)
> python-dateutil (2.5.3)
> python-editor (1.0)
> python-gflags (3.0.5)
> pytz (2016.4)
> requests (2.10.0)
> setproctitle (1.1.10)
> setuptools (21.2.1)
> six (1.10.0)
> snakebite (2.9.0)
> SQLAlchemy (1.0.13)
> thrift (0.9.3)
> thriftpy (0.3.8)
> unicodecsv (0.14.1)
> Werkzeug (0.11.10)
> WTForms (2.1)
> {code}
> {code}
> $ airflow webserver -p 8088
> [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor
>      _
>  |__( )_  __/__  /  __
>   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_//_//_/  \//|__/
> [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 
> with a timeout of 120...
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 
> (20191)
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync
> [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197
> [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198
> [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199
> [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200
> [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor

2016-05-25 Thread Andre (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301022#comment-15301022
 ] 

Andre commented on AIRFLOW-173:
---

inotify is more efficient, but it is not portable to some file systems...

For example, these are DFSs that can be mounted as normal filesystems but 
where, I suspect, the inotify approach wouldn't play nicely:

Ceph:
http://www.spinics.net/lists/ceph-users/msg23087.html

Gluster:
https://www.gluster.org/pipermail/gluster-users/2012-September/011276.html

As a consequence, when writing 
https://github.com/apache/incubator-airflow/pull/1543, I ended up using the more 
"unsophisticated" approach of polling for the file (very much like the WebHDFS 
and HDFS sensors do, due to the lack of inotify).
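
A bare-bones sketch of that polling approach (illustrative only, not the code 
from the PR above; it assumes the BaseSensorOperator base class, whose execute 
loop re-checks for the file every poke_interval):

{code}
# Hypothetical polling file sensor, in the spirit of the approach described above.
import os

from airflow.operators.sensors import BaseSensorOperator


class FileSensor(BaseSensorOperator):
    def __init__(self, filepath, *args, **kwargs):
        super(FileSensor, self).__init__(*args, **kwargs)
        self.filepath = filepath

    def poke(self, context):
        # Called repeatedly until it returns True or the sensor times out.
        return os.path.exists(self.filepath)
{code}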

> Create a FileSensor / NFSFileSensor sensor
> --
>
> Key: AIRFLOW-173
> URL: https://issues.apache.org/jira/browse/AIRFLOW-173
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Andre
>Priority: Minor
>
> While HDFS and WebHDFS suit most organisations using Hadoop, for some shops 
> running MapR-FS the Airflow implementation is simplified by using plain files 
> pointing to MapR's NFS gateways.
> A FileSensor and/or an NFSFileSensor would assist the adoption of Airflow 
> within the MapR customer base and, more importantly, help those who are using 
> POSIX-compliant distributed filesystems that can be mounted on Unix-derivative 
> systems (e.g. MapR-FS (via NFS), CephFS, GlusterFS, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300973#comment-15300973
 ] 

Chris Riccomini commented on AIRFLOW-167:
-

This sounds reasonable to me. Want to send a PR?

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>
> I was trying to get the state of a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a REST call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output, that 
> would be best. Currently we have to do HTML parsing to get the same. 
> Another (and easier) way is to add CLI support like we have for `task_state`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-160) Parse DAG files through child processes

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300992#comment-15300992
 ] 

Chris Riccomini commented on AIRFLOW-160:
-

{quote}
We've also seen an unusual case where modules loaded by the user DAG affect 
operation of the scheduler
{quote}

We're also very concerned about security, and having DAGs evaluated in-process 
in the scheduler is pretty dangerous, since it allows DAGs to take over the 
scheduler. Definite +1 to making DAG parsing a subprocess. As a separate 
ticket, we will also probably want to make the subprocesses run as a 
DAG-specific user (e.g. owner). This will prevent DAGs from messing with the 
Airflow files on the file system, killing Airflow processes, etc.

{quote}
 I think inotify is more suitable or an API call to refresh the dagbag if 
triggered externally. API call is also nicer because it can update all 
processes that require a load of the dagbag.
{quote}

+1 to this comment as well. Our ops folks were actually asking today if there's 
an API to trigger a DAG refresh. They are going to push DAGs to a folder via a 
deploy script, and would like to tell Airflow to refresh accordingly. Polling 
other than during this operation is pointless. inotify would also work (and is 
probably a better solution than the API, even).
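
To make the isolation point concrete, here is a rough sketch (illustrative 
only, not the proposed implementation) of parsing a single DAG file in a child 
process, so that a {{sys.exit(1)}} in user code only kills the child:

{code}
# Illustrative only: load one DAG file in a child process and report the DAG ids found.
import multiprocessing


def _parse(dag_file, queue):
    # Runs in the child process; a sys.exit(1) in the user's DAG file ends here.
    from airflow.models import DagBag
    queue.put(list(DagBag(dag_file).dags.keys()))


def parse_in_child(dag_file, timeout=30):
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=_parse, args=(dag_file, queue))
    child.start()
    child.join(timeout)
    if child.is_alive():
        child.terminate()  # hung or misbehaving DAG file
    return queue.get() if not queue.empty() else None
{code}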

> Parse DAG files through child processes
> ---
>
> Key: AIRFLOW-160
> URL: https://issues.apache.org/jira/browse/AIRFLOW-160
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Paul Yang
>Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same 
> process as the scheduler itself. We've seen issues in production where bad 
> DAG files cause scheduler to fail. A simple example is if the user script 
> calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an 
> unusual case where modules loaded by the user DAG affect operation of the 
> scheduler. For better uptime, the scheduler should be resistant to these 
> problematic user DAGs.
> The proposed solution is to parse and schedule user DAGs through child 
> processes. This way, the main scheduler process is more isolated from bad 
> DAGs. There's a side benefit as well - since parsing is distributed among 
> multiple processes, it's possible to parse the DAG files more frequently, 
> reducing the latency between when a DAG is modified and when the changes are 
> picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks 
> are sent to the executor. This means that the frequency of task scheduling is 
> limited by the slowest DAG to schedule. The changes needed for scheduling 
> DAGs through child processes will also make it easy to decouple this process 
> and allow tasks to be scheduled and sent to the executor in a more 
> independent fashion. This way, overall scheduling won't be held back by a 
> slow DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-101) Acces the tree view of the Web UI instead of the graph view when clicking on a dag

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300985#comment-15300985
 ] 

Chris Riccomini commented on AIRFLOW-101:
-

[~bolke], which PR was this part of?

> Acces the tree view of the Web UI instead of the graph view when clicking on 
> a dag
> --
>
> Key: AIRFLOW-101
> URL: https://issues.apache.org/jira/browse/AIRFLOW-101
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: Airflow 1.7.0
> Environment: All
>Reporter: Michal TOMA
>Priority: Minor
> Fix For: Airflow 1.7.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'd like to have a config parameter that would allow accessing the tree view 
> of the DAG tasks directly instead of the current graph view.
> In my environment failed tasks are very common and I need a quick view of 
> what failed and when in the past. As of now I must either click the DAG and 
> then click the tree view menu, or click the very small tree view icon.
> For me the DAG graph is not that important and I'd like to see the tree view 
> when clicking on the name of the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300977#comment-15300977
 ] 

Chris Riccomini commented on AIRFLOW-167:
-

Specifically, the CLI solution sounds reasonable. The REST solution is better, 
but we haven't set up a REST API for Airflow yet. In the meantime, want to 
send a PR for the CLI {{dag_state}} command?
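
For context, a rough sketch of what such a {{dag_state}} subcommand handler 
could look like (this is not the code from the PR; it assumes the existing 
DagRun model and a lookup by dag_id and execution_date):

{code}
# Hypothetical dag_state CLI handler: print the state of one DagRun.
from airflow import settings
from airflow.models import DagRun


def dag_state(args):
    # args.dag_id and args.execution_date would come from the argument parser.
    session = settings.Session()
    run = (session.query(DagRun)
           .filter(DagRun.dag_id == args.dag_id,
                   DagRun.execution_date == args.execution_date)
           .first())
    print(run.state if run else None)
    session.close()
{code}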

> Get dag state for a given execution date.
> -
>
> Key: AIRFLOW-167
> URL: https://issues.apache.org/jira/browse/AIRFLOW-167
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: cli
>Reporter: Sumit Maheshwari
>
> I was trying to get the state of a particular dag-run programmatically, but 
> couldn't find a way. 
> If we could have a REST call like 
> `/admin/dagrun?dag_id=&execution_date=` and get the output, that 
> would be best. Currently we have to do HTML parsing to get the same. 
> Another (and easier) way is to add CLI support like we have for `task_state`.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-176) PR tool crashes with non-integer JIRA ids

2016-05-25 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin updated AIRFLOW-176:
---
External issue URL: https://github.com/apache/incubator-airflow/pull/1544

> PR tool crashes with non-integer JIRA ids
> -
>
> Key: AIRFLOW-176
> URL: https://issues.apache.org/jira/browse/AIRFLOW-176
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Affects Versions: Airflow 1.7.1.2
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> The PR tool crashes if a non-integer id is passed. This includes the default 
> ID  (AIRFLOW-XXX) so it affects folks who don't type in a new ID.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-168:

Affects Version/s: Airflow 1.7.1.2

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at the example_xcom example and found that it got scheduled 
> twice: once at the start_time and once at the current time. To be sure, I 
> tried multiple times (by reloading the db) and it's the same. 
> I am on airflow master, using the sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300951#comment-15300951
 ] 

Chris Riccomini commented on AIRFLOW-168:
-

What version of Airflow are you running?

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at the example_xcom example and found that it got scheduled 
> twice: once at the start_time and once at the current time. To be sure, I 
> tried multiple times (by reloading the db) and it's the same. 
> I am on airflow master, using the sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-168:

Affects Version/s: (was: Airflow 1.7.1.2)

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at the example_xcom example and found that it got scheduled 
> twice: once at the start_time and once at the current time. To be sure, I 
> tried multiple times (by reloading the db) and it's the same. 
> I am on airflow master, using the sequential executor with sqlite3. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-169) Hide expire dags in UI

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-169:

Component/s: ui

> Hide expire dags in UI
> --
>
> Key: AIRFLOW-169
> URL: https://issues.apache.org/jira/browse/AIRFLOW-169
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: ui
>Reporter: Sumit Maheshwari
>
> It would be great if we had an option to hide expired schedules from the UI. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-169) Hide expire dags in UI

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300948#comment-15300948
 ] 

Chris Riccomini commented on AIRFLOW-169:
-

What do you mean by expired schedules? You mean DagRuns that have finished? 
Which page in the UI are you referring to?

> Hide expire dags in UI
> --
>
> Key: AIRFLOW-169
> URL: https://issues.apache.org/jira/browse/AIRFLOW-169
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: ui
>Reporter: Sumit Maheshwari
>
> It would be great if we had an option to hide expired schedules from the UI. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300921#comment-15300921
 ] 

Chris Riccomini commented on AIRFLOW-172:
-

Try turning on the scheduler: {{airflow scheduler}}

> All example DAGs report "Only works with the CeleryExecutor, sorry"
> ---
>
> Key: AIRFLOW-172
> URL: https://issues.apache.org/jira/browse/AIRFLOW-172
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor
>Affects Versions: Airflow 1.7.1
>Reporter: Andre
>
> After installing airflow and trying to run some example DAGs I was faced with 
> {{Only works with the CeleryExecutor, sorry}}
> on every DAG I tried to run.
> {code}$ pip list
> airflow (1.7.1.2)
> alembic (0.8.6)
> Babel (1.3)
> bitarray (0.8.1)
> cffi (1.6.0)
> chartkick (0.4.2)
> croniter (0.3.12)
> cryptography (1.3.2)
> dill (0.2.5)
> docutils (0.12)
> Flask (0.10.1)
> Flask-Admin (1.4.0)
> Flask-Cache (0.13.1)
> Flask-Login (0.2.11)
> Flask-WTF (0.12)
> funcsigs (0.4)
> future (0.15.2)
> google-apputils (0.4.2)
> gunicorn (19.3.0)
> hive-thrift-py (0.0.1)
> idna (2.1)
> impyla (0.13.7)
> itsdangerous (0.24)
> Jinja2 (2.8)
> lockfile (0.12.2)
> Mako (1.0.4)
> Markdown (2.6.6)
> MarkupSafe (0.23)
> mysqlclient (1.3.7)
> numpy (1.11.0)
> pandas (0.18.1)
> pip (8.1.2)
> ply (3.8)
> protobuf (2.6.1)
> pyasn1 (0.1.9)
> pycparser (2.14)
> Pygments (2.1.3)
> PyHive (0.1.8)
> pykerberos (1.1.10)
> python-daemon (2.1.1)
> python-dateutil (2.5.3)
> python-editor (1.0)
> python-gflags (3.0.5)
> pytz (2016.4)
> requests (2.10.0)
> setproctitle (1.1.10)
> setuptools (21.2.1)
> six (1.10.0)
> snakebite (2.9.0)
> SQLAlchemy (1.0.13)
> thrift (0.9.3)
> thriftpy (0.3.8)
> unicodecsv (0.14.1)
> Werkzeug (0.11.10)
> WTForms (2.1)
> {code}
> {code}
> $ airflow webserver -p 8088
> [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor
>      _
>  |__( )_  __/__  /  __
>   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_//_//_/  \//|__/
> [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 
> with a timeout of 120...
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 
> (20191)
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync
> [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197
> [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198
> [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199
> [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200
> [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-161) Redirection to external url

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300938#comment-15300938
 ] 

Chris Riccomini commented on AIRFLOW-161:
-

I don't think that we want to embed Qubole logic directly in Airflow. I'm a bit 
out of my element on the UI front, though. Perhaps there's a way to achieve 
this through plugins, or by simply putting the link in the logs?

> Redirection to external url
> ---
>
> Key: AIRFLOW-161
> URL: https://issues.apache.org/jira/browse/AIRFLOW-161
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Sumit Maheshwari
>
> Hi,
> I am not able to find a good way (apart from loading everything upfront) to 
> redirect someone to an external service URL using the information stored in 
> airflow. There could be many use cases, like downloading a signed file from 
> s3, redirecting to the hadoop job tracker, or the direct case I am working 
> on, which is linking airflow tasks to qubole commands. 
> I already have a working model and will open a PR soon. Please let me know 
> if there are existing ways already.
> Thanks,
> Sumit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-172.
---
Resolution: Not A Bug

Please re-open if you have further questions.

> All example DAGs report "Only works with the CeleryExecutor, sorry"
> ---
>
> Key: AIRFLOW-172
> URL: https://issues.apache.org/jira/browse/AIRFLOW-172
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor
>Affects Versions: Airflow 1.7.1
>Reporter: Andre
>
> After installing airflow and trying to run some example DAGs I was faced with 
> {{Only works with the CeleryExecutor, sorry}}
> on every DAG I tried to run.
> {code}$ pip list
> airflow (1.7.1.2)
> alembic (0.8.6)
> Babel (1.3)
> bitarray (0.8.1)
> cffi (1.6.0)
> chartkick (0.4.2)
> croniter (0.3.12)
> cryptography (1.3.2)
> dill (0.2.5)
> docutils (0.12)
> Flask (0.10.1)
> Flask-Admin (1.4.0)
> Flask-Cache (0.13.1)
> Flask-Login (0.2.11)
> Flask-WTF (0.12)
> funcsigs (0.4)
> future (0.15.2)
> google-apputils (0.4.2)
> gunicorn (19.3.0)
> hive-thrift-py (0.0.1)
> idna (2.1)
> impyla (0.13.7)
> itsdangerous (0.24)
> Jinja2 (2.8)
> lockfile (0.12.2)
> Mako (1.0.4)
> Markdown (2.6.6)
> MarkupSafe (0.23)
> mysqlclient (1.3.7)
> numpy (1.11.0)
> pandas (0.18.1)
> pip (8.1.2)
> ply (3.8)
> protobuf (2.6.1)
> pyasn1 (0.1.9)
> pycparser (2.14)
> Pygments (2.1.3)
> PyHive (0.1.8)
> pykerberos (1.1.10)
> python-daemon (2.1.1)
> python-dateutil (2.5.3)
> python-editor (1.0)
> python-gflags (3.0.5)
> pytz (2016.4)
> requests (2.10.0)
> setproctitle (1.1.10)
> setuptools (21.2.1)
> six (1.10.0)
> snakebite (2.9.0)
> SQLAlchemy (1.0.13)
> thrift (0.9.3)
> thriftpy (0.3.8)
> unicodecsv (0.14.1)
> Werkzeug (0.11.10)
> WTForms (2.1)
> {code}
> {code}
> $ airflow webserver -p 8088
> [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor
>      _
>  |__( )_  __/__  /  __
>   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_//_//_/  \//|__/
> [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 
> with a timeout of 120...
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 
> (20191)
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync
> [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197
> [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198
> [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199
> [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200
> [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-171.
---
Resolution: Information Provided

Please re-open if you have more questions.

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails were working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes

2016-05-25 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-175.

Resolution: Fixed

Merged in https://github.com/apache/incubator-airflow/pull/1534

> PR merge tool needs to reset environment after work_local finishes
> --
>
> Key: AIRFLOW-175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-175
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Affects Versions: Airflow 1.7.1.2
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> If you use the pr tool to work locally ({{airflow-pr work_local}}) and make 
> changes to the files, then an error is raised when you try to exit the PR 
> tool because git refuses to overwrite the changes. The tool needs to call 
> {{git reset --hard}} before exiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300911#comment-15300911
 ] 

Chris Riccomini commented on AIRFLOW-171:
-

This looks like something is wrong with your environment.

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails were working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-108) Add data retention policy to Airflow

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300909#comment-15300909
 ] 

Chris Riccomini commented on AIRFLOW-108:
-

I spoke with [~maxime.beauche...@apache.org] about this a bit. One trick that I 
didn't realize is that you can delete all of the task instances after their 
DagRun is marked as success/failed. Once the DagRun is marked as such, if the 
tasks are deleted, the scheduler won't try to re-run them because the DagRun is 
already showing as a terminal state.

This is a bit hacky, but does work. I still think a retention policy that 
allows us to delete TaskInstances *and* DagRuns would be useful, but due to the 
trick described above, I think this JIRA is probably lower priority than it was 
when I initially filed this ticket.
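
A sketch of the trick described above (assuming the standard DagRun/TaskInstance 
models and terminal states; this is not a supported command, so try it against a 
backup first):

{code}
# Illustrative cleanup: drop task instances whose DagRun already reached a terminal state.
from airflow import settings
from airflow.models import DagRun, TaskInstance
from airflow.utils.state import State

session = settings.Session()
finished = (session.query(DagRun)
            .filter(DagRun.state.in_([State.SUCCESS, State.FAILED]))
            .all())
for run in finished:
    (session.query(TaskInstance)
     .filter(TaskInstance.dag_id == run.dag_id,
             TaskInstance.execution_date == run.execution_date)
     .delete(synchronize_session=False))
session.commit()
session.close()
{code}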

> Add data retention policy to Airflow
> 
>
> Key: AIRFLOW-108
> URL: https://issues.apache.org/jira/browse/AIRFLOW-108
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: db, scheduler
>Reporter: Chris Riccomini
>
> Airflow's DB currently holds the entire history of all executions for all 
> time. This is problematic as the DB grows. The UI starts to get slower, and 
> the DB's disk usage grows. There is no bound to how large the DB will grow.
> It would be useful to add a feature in Airflow to do two things:
> # Delete old data from the DB
> # Mark some lower watermark, past which DAG executions are ignored
> For example, (2) would allow you to tell the scheduler "ignore all data prior 
> to a year ago". And (1) would allow Airflow to delete all data prior to 
> January 1, 2015.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300917#comment-15300917
 ] 

Chris Riccomini commented on AIRFLOW-171:
-

Or this config value is wrong, as [~bolke] said:

{code}
path, attr = configuration.get('email', 'EMAIL_BACKEND').rsplit('.', 1)
{code}

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails were working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300901#comment-15300901
 ] 

Chris Riccomini commented on AIRFLOW-171:
-

Just a note: I have confirmed that email is working for me on 1.7.1.2. We had a 
task with retry=2, retry_delay=5. It failed twice, and an email was sent.

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails were working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-160) Parse DAG files through child processes

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300747#comment-15300747
 ] 

Bolke de Bruin commented on AIRFLOW-160:


+1 on the idea, -1 on more polling. I think inotify is more suitable, or an API 
call to refresh the dagbag if triggered externally. An API call is also nicer 
because it can update all processes that require a load of the dagbag.

> Parse DAG files through child processes
> ---
>
> Key: AIRFLOW-160
> URL: https://issues.apache.org/jira/browse/AIRFLOW-160
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Paul Yang
>Assignee: Paul Yang
>
> Currently, the Airflow scheduler parses all user DAG files in the same 
> process as the scheduler itself. We've seen issues in production where bad 
> DAG files cause scheduler to fail. A simple example is if the user script 
> calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an 
> unusual case where modules loaded by the user DAG affect operation of the 
> scheduler. For better uptime, the scheduler should be resistant to these 
> problematic user DAGs.
> The proposed solution is to parse and schedule user DAGs through child 
> processes. This way, the main scheduler process is more isolated from bad 
> DAGs. There's a side benefit as well - since parsing is distributed among 
> multiple processes, it's possible to parse the DAG files more frequently, 
> reducing the latency between when a DAG is modified and when the changes are 
> picked up.
> Another issue right now is that all DAGs must be scheduled before any tasks 
> are sent to the executor. This means that the frequency of task scheduling is 
> limited by the slowest DAG to schedule. The changes needed for scheduling 
> DAGs through child processes will also make it easy to decouple this process 
> and allow tasks to be scheduled and sent to the executor in a more 
> independent fashion. This way, overall scheduling won't be held back by a 
> slow DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AIRFLOW-101) Acces the tree view of the Web UI instead of the graph view when clicking on a dag

2016-05-25 Thread Bolke de Bruin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin resolved AIRFLOW-101.

Resolution: Fixed

This has been done

> Acces the tree view of the Web UI instead of the graph view when clicking on 
> a dag
> --
>
> Key: AIRFLOW-101
> URL: https://issues.apache.org/jira/browse/AIRFLOW-101
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Affects Versions: Airflow 1.7.0
> Environment: All
>Reporter: Michal TOMA
>Priority: Minor
> Fix For: Airflow 1.7.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'd like to have a config parameter that would allow accessing the tree view 
> of the DAG tasks directly instead of the current graph view.
> In my environment, failed tasks are very common and I need a quick view of 
> what failed and when in the past. As of now I must either click the DAG and 
> then click the tree view menu, or click the very small tree view icon.
> For me the DAG graph is not that important, and I'd like to see the tree view 
> when clicking on the name of the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-166) Webserver times out using systemd script

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300739#comment-15300739
 ] 

Bolke de Bruin commented on AIRFLOW-166:


It is probably due to some locations not being writable by airflow. Check if 
passing --pid helps. Maybe --log, --stdout, and --stderr are needed as well.



> Webserver times out using systemd script
> 
>
> Key: AIRFLOW-166
> URL: https://issues.apache.org/jira/browse/AIRFLOW-166
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: CentOS 7
>Reporter: Yuri Bendana
>
> I just upgraded to 1.7.1.2 from 1.6 and I'm having a problem starting the 
> webserver using the systemd script.  This used to work fine.  The issue is 
> that it starts and then just hangs, no error is reported and it finally times 
> out after about a minute.  I tried starting it from the command line and it 
> works fine without timing out.  I also ran it in daemon mode with -D and 
> again it seems to be fine.  Any thoughts on how to debug this?
> Here's the log output:
> {code}
> May 23 16:27:50 ybendana-linux systemd: Starting Airflow webserver daemon...
> May 23 16:27:51 ybendana-linux airflow: [2016-05-23 16:27:51,444] 
> {__init__.py:36} INFO - Using executor LocalExecutor
> May 23 16:27:53 ybendana-linux airflow:    _
> May 23 16:27:53 ybendana-linux airflow: |__( )_  __/__  
> /  __
> May 23 16:27:53 ybendana-linux airflow:   /| |_  /__  ___/_  /_ __  /_  
> __ \_ | /| / /
> May 23 16:27:53 ybendana-linux airflow: ___  ___ |  / _  /   _  __/ _  / / 
> /_/ /_ |/ |/ /
> May 23 16:27:53 ybendana-linux airflow: _/_/  |_/_/  /_//_//_/  
> \//|__/
> May 23 16:27:53 ybendana-linux airflow: [2016-05-23 16:27:53,446] 
> {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [15960] 
> [INFO] Starting gunicorn 19.3.0
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [15960] 
> [INFO] Listening at: http://0.0.0.0:8080 (15960)
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [15960] 
> [INFO] Using worker: sync
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16067] 
> [INFO] Booting worker with pid: 16067
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16069] 
> [INFO] Booting worker with pid: 16069
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16070] 
> [INFO] Booting worker with pid: 16070
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16071] 
> [INFO] Booting worker with pid: 16071
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,876] 
> {__init__.py:36} INFO - Using executor LocalExecutor
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,950] 
> {__init__.py:36} INFO - Using executor LocalExecutor
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,972] 
> {__init__.py:36} INFO - Using executor LocalExecutor
> May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,997] 
> {__init__.py:36} INFO - Using executor LocalExecutor
> May 23 16:27:57 ybendana-linux airflow: [2016-05-23 16:27:57,885] 
> {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags
> May 23 16:27:57 ybendana-linux airflow: [2016-05-23 16:27:57,951] 
> {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags
> May 23 16:27:57 ybendana-linux airflow: [2016-05-23 16:27:57,983] 
> {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags
> May 23 16:27:58 ybendana-linux airflow: [2016-05-23 16:27:58,014] 
> {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags
> May 23 16:29:20 ybendana-linux systemd: airflow-webserver.service start 
> operation timed out. Terminating.
> May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16070] 
> [INFO] Worker exiting (pid: 16070)
> May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [15960] 
> [INFO] Handling signal: term
> May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16071] 
> [INFO] Worker exiting (pid: 16071)
> May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16069] 
> [INFO] Worker exiting (pid: 16069)
> May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16067] 
> [INFO] Worker exiting (pid: 16067)
> May 23 16:29:21 ybendana-linux airflow: [2016-05-23 16:29:21 +] [15960] 
> [INFO] Shutting down: Master
> May 23 16:29:21 ybendana-linux systemd: Failed to start Airflow webserver 
> daemon.
> May 23 16:29:21 ybendana-linux systemd: Unit airflow-webserver.service 
> entered failed state.
> May 23 16:29:21 ybendana-linux systemd: airflow-webserver.service failed.

[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300733#comment-15300733
 ] 

Bolke de Bruin commented on AIRFLOW-171:


Have you checked your config and properly configured the backend?
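
For reference, the failing line ({{backend = getattr(module, attr)}}) implies the 
configured backend string is split into a module path and an attribute and then 
resolved, roughly as in the sketch below (illustrative only, not a copy of 
airflow/utils/email.py). A stray or stale airflow package found earlier on the 
Python path would then resolve to a module that lacks {{send_email_smtp}}.

{code}
import importlib


def resolve_email_backend(dotted_path):
    # e.g. dotted_path = 'airflow.utils.email.send_email_smtp'
    module_path, attr = dotted_path.rsplit('.', 1)
    module = importlib.import_module(module_path)
    # If an older or shadowing 'airflow' package is found first on sys.path,
    # this getattr is where the AttributeError in the traceback is raised.
    return getattr(module, attr)


send_email = resolve_email_backend('airflow.utils.email.send_email_smtp')
{code}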

> Email does not work in 1.7.1.2
> --
>
> Key: AIRFLOW-171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-171
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: Airflow 1.7.1
> Environment: AWS Amazon Linux Image
>Reporter: Hao Ye
>
> Job failure emails were working in 1.7.0. They seem to have stopped working in 
> 1.7.1.
> Error is
> {quote}
> [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: 
> ['em...@email.com']
> [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no 
> attribute 'send_email_smtp'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 42, in send_email
> backend = getattr(module, attr)
> AttributeError: 'module' object has no attribute 'send_email_smtp'
> {quote}
> File exists and method exists. Seems to work fine when called in python 
> directly.
> Maybe it's loading the wrong email module.
> Tried to set PYTHONPATH to have 
> /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that 
> didn't seem to work either.
> Could this be related to the utils refactoring that happened between 1.7.0 
> and 1.7.1?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor

2016-05-25 Thread Bolke de Bruin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300728#comment-15300728
 ] 

Bolke de Bruin commented on AIRFLOW-173:


I like it, but wouldn't an inotify-based watch combined with a triggered dag_run 
be more efficient?

> Create a FileSensor / NFSFileSensor sensor
> --
>
> Key: AIRFLOW-173
> URL: https://issues.apache.org/jira/browse/AIRFLOW-173
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Andre
>Priority: Minor
>
> While HDFS and WebHDFS suit most organisations using Hadoop, for some shops 
> running MapR-FS an Airflow implementation is simplified by the use of plain 
> files pointing to MapR's NFS gateways.
> A FileSensor and/or an NFSFileSensor would assist the adoption of Airflow 
> within the MapR customer base but, more importantly, help those who are using 
> POSIX-compliant distributed filesystems that can be mounted on Unix-derivative 
> systems (e.g. MapR-FS via NFS, CephFS, GlusterFS, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AIRFLOW-157) Minor fixes for PR merge tool

2016-05-25 Thread Jeremiah Lowin (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremiah Lowin resolved AIRFLOW-157.

   Resolution: Fixed
Fix Version/s: (was: Airflow 1.8)

Merged in https://github.com/apache/incubator-airflow/pull/1534

> Minor fixes for PR merge tool
> -
>
> Key: AIRFLOW-157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
>
> 1. subscripting a {{filter}} object fails in Python3
> 2. JIRA issue inference looks for a 4 or 5 digit issue number... we're not 
> quite there yet!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-176) PR tool crashes with non-integer JIRA ids

2016-05-25 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-176:
--

 Summary: PR tool crashes with non-integer JIRA ids
 Key: AIRFLOW-176
 URL: https://issues.apache.org/jira/browse/AIRFLOW-176
 Project: Apache Airflow
  Issue Type: Bug
  Components: PR tool
Affects Versions: Airflow 1.7.1.2
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


The PR tool crashes if a non-integer id is passed. This includes the default ID 
(AIRFLOW-XXX), so it affects folks who don't type in a new ID.
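
One defensive way to handle this (illustrative only, not the actual patch) is to 
validate the id before converting it:

{code}
def parse_jira_id(raw, prefix='AIRFLOW-'):
    """Return the numeric JIRA id, or None for placeholders like AIRFLOW-XXX."""
    raw = raw.strip().upper()
    if raw.startswith(prefix):
        raw = raw[len(prefix):]
    try:
        return int(raw)
    except ValueError:
        return None


print(parse_jira_id('AIRFLOW-176'))  # -> 176
print(parse_jira_id('AIRFLOW-XXX'))  # -> None
{code}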



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-157) Minor fixes for PR merge tool

2016-05-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300240#comment-15300240
 ] 

ASF subversion and git services commented on AIRFLOW-157:
-

Commit 805944b74744b34e1510c2f5d080de98704705d0 in incubator-airflow's branch 
refs/heads/master from [~jlowin]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=805944b ]

[AIRFLOW-157] Make PR tool Py3-compat; add JIRA command

- Adds Python3 compatibility (filter objects can't be indexed)
- Adds JIRA command to close issues without merging a PR
- Adds general usability fixes and starts cleaning up code
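
For context, the Python 3 incompatibility mentioned above comes from {{filter()}} 
returning a lazy iterator rather than a list; a small illustration:

{code}
names = ['master', 'pr/1534', 'pr/1535']
branches = filter(lambda name: name.startswith('pr/'), names)

try:
    first = branches[0]           # works on Python 2 (list), fails on Python 3
except TypeError:                 # 'filter' object is not subscriptable
    first = list(branches)[0]     # portable fix: materialize the result first

print(first)                      # -> pr/1534
{code}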


> Minor fixes for PR merge tool
> -
>
> Key: AIRFLOW-157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> 1. subscripting a {{filter}} object fails in Python3
> 2. JIRA issue inference looks for a 4 or 5 digit issue number... we're not 
> quite there yet!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[1/3] incubator-airflow git commit: [AIRFLOW-157] Make PR tool Py3-compat; add JIRA command

2016-05-25 Thread jlowin
Repository: incubator-airflow
Updated Branches:
  refs/heads/master ac96fbf85 -> 7332c40c2


[AIRFLOW-157] Make PR tool Py3-compat; add JIRA command

- Adds Python3 compatibility (filter objects can't be indexed)
- Adds JIRA command to close issues without merging a PR
- Adds general usability fixes and starts cleaning up code


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/805944b7
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/805944b7
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/805944b7

Branch: refs/heads/master
Commit: 805944b74744b34e1510c2f5d080de98704705d0
Parents: 98f10d5
Author: jlowin 
Authored: Fri May 20 17:15:07 2016 -0400
Committer: jlowin 
Committed: Wed May 25 10:52:13 2016 -0400

--
 dev/README.md  |   5 +-
 dev/airflow-pr | 220 +++-
 2 files changed, 134 insertions(+), 91 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/805944b7/dev/README.md
--
diff --git a/dev/README.md b/dev/README.md
index 59ea024..a0c185e 100755
--- a/dev/README.md
+++ b/dev/README.md
@@ -8,7 +8,6 @@ It is very important that PRs reference a JIRA issue. The 
preferred way to do th
 
 __Please note:__ this tool will restore your current branch when it finishes, 
but you will lose any uncommitted changes. Make sure you commit any changes you 
wish to keep before proceeding.
 
-Also, do not run this tool from inside the `dev` folder if you are working 
with a PR that predates the `dev` directory. It will be unable to restore 
itself from a nonexistent location. Run it from the main airflow directory 
instead: `dev/airflow-pr`.
 
 ### Execution
 Simply execute the `airflow-pr` tool:
@@ -28,6 +27,7 @@ Options:
   --help  Show this message and exit.
 
 Commands:
+  close_jira  Close a JIRA issue (without merging a PR)
   merge   Merge a GitHub PR into Airflow master
   work_local  Clone a GitHub PR locally for testing (no push)
 ```
@@ -38,8 +38,7 @@ Execute `airflow-pr merge` to be interactively guided through 
the process of mer
 
 Execute `airflow-pr work_local` to only merge the PR locally. The tool will 
pause once the merge is complete, allowing the user to explore the PR, and then 
will delete the merge and restore the original development environment.
 
-Both commands can be followed by a PR number (`airflow-pr merge 42`); 
otherwise the tool will prompt for one.
-
+Execute `airflow-pr close_jira` to close a JIRA issue without needing to merge 
a PR. You will be prompted for an issue number and close comment.
 
 ### Configuration
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/805944b7/dev/airflow-pr
--
diff --git a/dev/airflow-pr b/dev/airflow-pr
index 918ad54..dab9540 100755
--- a/dev/airflow-pr
+++ b/dev/airflow-pr
@@ -35,6 +35,7 @@ import os
 import re
 import subprocess
 import sys
+import textwrap
 
 # Python 3 compatibility
 try:
@@ -95,41 +96,32 @@ def get_json(url):
 if (
 "X-RateLimit-Remaining" in e.headers and
 e.headers["X-RateLimit-Remaining"] == '0'):
-print(
+click.echo(
 "Exceeded the GitHub API rate limit; set the environment "
 "variable GITHUB_OAUTH_KEY in order to make authenticated "
 "GitHub requests.")
 else:
-print("Unable to fetch URL, exiting: %s" % url)
+click.echo("Unable to fetch URL, exiting: %s" % url)
 sys.exit(-1)
 
 
 def fail(msg):
-print(msg)
+click.echo(msg)
 clean_up()
 sys.exit(-1)
 
 
 def run_cmd(cmd):
 if isinstance(cmd, list):
-print(' {}'.format(' '.join(cmd)))
+click.echo('>> Running command: {}'.format(' '.join(cmd)))
 return subprocess.check_output(cmd).decode('utf-8')
 else:
-print(' {}'.format(cmd))
+click.echo('>> Running command: {}'.format(cmd))
 return subprocess.check_output(cmd.split(" ")).decode('utf-8')
 
-def get_yes_no(prompt):
-while True:
-result = raw_input("\n%s (y/n): " % prompt)
-if result.lower() not in ('y', 'n'):
-print('Invalid response.')
-else:
-break
-return result.lower() == 'y'
-
 
 def continue_maybe(prompt):
-if not get_yes_no(prompt):
+if not click.confirm(prompt):
 fail("Okay, exiting.")
 
 
@@ -137,13 +129,13 @@ def clean_up():
 if 'original_head' not in globals():
 return
 
-print("Restoring head pointer to %s" % original_head)
+click.echo("Restoring head pointer to %s" % original_head)
 run_cmd("git checkout %s" % o

[2/3] incubator-airflow git commit: [AIRFLOW-175] Run git-reset before checkout in PR tool

2016-05-25 Thread jlowin
[AIRFLOW-175] Run git-reset before checkout in PR tool

If the user made any changes, git checkout will fail because the
changes would be overwritten. Running git reset blows the changes away.


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/6d87679a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/6d87679a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/6d87679a

Branch: refs/heads/master
Commit: 6d87679a56b7fd6f918439db953ca6b959752721
Parents: 805944b
Author: jlowin 
Authored: Wed May 25 10:49:10 2016 -0400
Committer: jlowin 
Committed: Wed May 25 10:53:22 2016 -0400

--
 dev/airflow-pr | 3 +++
 1 file changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/6d87679a/dev/airflow-pr
--
diff --git a/dev/airflow-pr b/dev/airflow-pr
index dab9540..8dd8df7 100755
--- a/dev/airflow-pr
+++ b/dev/airflow-pr
@@ -129,6 +129,9 @@ def clean_up():
 if 'original_head' not in globals():
 return
 
+click.echo('Resetting git to remove any changes')
+run_cmd('git reset --hard')
+
 click.echo("Restoring head pointer to %s" % original_head)
 run_cmd("git checkout %s" % original_head)
 



[3/3] incubator-airflow git commit: Merge pull request #1534 from jlowin/pr-tool-2

2016-05-25 Thread jlowin
Merge pull request #1534 from jlowin/pr-tool-2


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/7332c40c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/7332c40c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/7332c40c

Branch: refs/heads/master
Commit: 7332c40c24f85ca3be20511af1c6b618b5adfe7f
Parents: ac96fbf 6d87679
Author: jlowin 
Authored: Wed May 25 11:40:53 2016 -0400
Committer: jlowin 
Committed: Wed May 25 11:40:53 2016 -0400

--
 dev/README.md  |   5 +-
 dev/airflow-pr | 223 +++-
 2 files changed, 137 insertions(+), 91 deletions(-)
--




[jira] [Commented] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes

2016-05-25 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300241#comment-15300241
 ] 

ASF subversion and git services commented on AIRFLOW-175:
-

Commit 6d87679a56b7fd6f918439db953ca6b959752721 in incubator-airflow's branch 
refs/heads/master from [~jlowin]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=6d87679 ]

[AIRFLOW-175] Run git-reset before checkout in PR tool

If the user made any changes, git checkout will fail because the
changes would be overwritten. Running git reset blows the changes away.


> PR merge tool needs to reset environment after work_local finishes
> --
>
> Key: AIRFLOW-175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-175
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Affects Versions: Airflow 1.7.1.2
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> If you use the pr tool to work locally ({{airflow-pr work_local}}) and make 
> changes to the files, then an error is raised when you try to exit the PR 
> tool because git refuses to overwrite the changes. The tool needs to call 
> {{git reset --hard}} before exiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes

2016-05-25 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300226#comment-15300226
 ] 

Chris Riccomini commented on AIRFLOW-175:
-

+1

> PR merge tool needs to reset environment after work_local finishes
> --
>
> Key: AIRFLOW-175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-175
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Affects Versions: Airflow 1.7.1.2
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> If you use the pr tool to work locally ({{airflow-pr work_local}}) and make 
> changes to the files, then an error is raised when you try to exit the PR 
> tool because git refuses to overwrite the changes. The tool needs to call 
> {{git reset --hard}} before exiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-175:

External issue URL: https://github.com/apache/incubator-airflow/pull/1534

> PR merge tool needs to reset environment after work_local finishes
> --
>
> Key: AIRFLOW-175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-175
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Affects Versions: Airflow 1.7.1.2
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>
> If you use the pr tool to work locally ({{airflow-pr work_local}}) and make 
> changes to the files, then an error is raised when you try to exit the PR 
> tool because git refuses to overwrite the changes. The tool needs to call 
> {{git reset --hard}} before exiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AIRFLOW-157) Minor fixes for PR merge tool

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-157:

Component/s: PR tool

> Minor fixes for PR merge tool
> -
>
> Key: AIRFLOW-157
> URL: https://issues.apache.org/jira/browse/AIRFLOW-157
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: PR tool
>Reporter: Jeremiah Lowin
>Assignee: Jeremiah Lowin
>Priority: Minor
> Fix For: Airflow 1.8
>
>
> 1. subscripting a {{filter}} object fails in Python3
> 2. JIRA issue inference looks for a 4 or 5 digit issue number... we're not 
> quite there yet!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes

2016-05-25 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-175:
--

 Summary: PR merge tool needs to reset environment after work_local 
finishes
 Key: AIRFLOW-175
 URL: https://issues.apache.org/jira/browse/AIRFLOW-175
 Project: Apache Airflow
  Issue Type: Bug
  Components: PR tool
Affects Versions: 1.7.1.2
Reporter: Jeremiah Lowin
Assignee: Jeremiah Lowin


If you use the pr tool to work locally ({{airflow-pr work_local}}) and make 
changes to the files, then an error is raised when you try to exit the PR tool 
because git refuses to overwrite the changes. The tool needs to call {{git 
reset --hard}} before exiting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (AIRFLOW-52) Release airflow 1.7.1

2016-05-25 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini closed AIRFLOW-52.
--
Resolution: Done

Closing.

> Release airflow 1.7.1
> -
>
> Key: AIRFLOW-52
> URL: https://issues.apache.org/jira/browse/AIRFLOW-52
> Project: Apache Airflow
>  Issue Type: Task
>  Components: release
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>  Labels: release
>
> Release the airflow 1.7.1 tag.
> Current status:
> There are three issues blocking this release caused by this commit:
> https://github.com/apache/incubator-airflow/commit/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6
> -1. DAGs with a lot of tasks take much longer to parse (~25x slowdown)-
> -2. The following kind of patterns fail:-
> {code}
> email.set_upstream(dag.roots)
> dag.add_task(email)
> {code}
> This is because set_upstream now calls add_task and a task can't be added 
> more than once.
> -3. Airflow losing queued tasks (see linked issue)-
> -4. Airflow putting dags in a stuck state (AIRFLOW-92)-
> I'm working with the owner of the commit to resolve these issues.
> The way to catch (1) in the future is an integration test that asserts a 
> given non-trivial DAG parses under X seconds



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"

2016-05-25 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300096#comment-15300096
 ] 

Jeremiah Lowin commented on AIRFLOW-172:


Are you trying to run tasks by hand in the Airflow UI? I think that's the only 
place where that error message exists. That's different than running a DAG, 
it's more for maintenance.

> All example DAGs report "Only works with the CeleryExecutor, sorry"
> ---
>
> Key: AIRFLOW-172
> URL: https://issues.apache.org/jira/browse/AIRFLOW-172
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executor
>Affects Versions: Airflow 1.7.1
>Reporter: Andre
>
> After installing airflow and trying to run some example DAGs I was faced with 
> {{Only works with the CeleryExecutor, sorry}}
> on every DAG I tried to run.
> {code}$ pip list
> airflow (1.7.1.2)
> alembic (0.8.6)
> Babel (1.3)
> bitarray (0.8.1)
> cffi (1.6.0)
> chartkick (0.4.2)
> croniter (0.3.12)
> cryptography (1.3.2)
> dill (0.2.5)
> docutils (0.12)
> Flask (0.10.1)
> Flask-Admin (1.4.0)
> Flask-Cache (0.13.1)
> Flask-Login (0.2.11)
> Flask-WTF (0.12)
> funcsigs (0.4)
> future (0.15.2)
> google-apputils (0.4.2)
> gunicorn (19.3.0)
> hive-thrift-py (0.0.1)
> idna (2.1)
> impyla (0.13.7)
> itsdangerous (0.24)
> Jinja2 (2.8)
> lockfile (0.12.2)
> Mako (1.0.4)
> Markdown (2.6.6)
> MarkupSafe (0.23)
> mysqlclient (1.3.7)
> numpy (1.11.0)
> pandas (0.18.1)
> pip (8.1.2)
> ply (3.8)
> protobuf (2.6.1)
> pyasn1 (0.1.9)
> pycparser (2.14)
> Pygments (2.1.3)
> PyHive (0.1.8)
> pykerberos (1.1.10)
> python-daemon (2.1.1)
> python-dateutil (2.5.3)
> python-editor (1.0)
> python-gflags (3.0.5)
> pytz (2016.4)
> requests (2.10.0)
> setproctitle (1.1.10)
> setuptools (21.2.1)
> six (1.10.0)
> snakebite (2.9.0)
> SQLAlchemy (1.0.13)
> thrift (0.9.3)
> thriftpy (0.3.8)
> unicodecsv (0.14.1)
> Werkzeug (0.11.10)
> WTForms (2.1)
> {code}
> {code}
> $ airflow webserver -p 8088
> [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor
>      _
>  |__( )_  __/__  /  __
>   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_//_//_/  \//|__/
> [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 
> with a timeout of 120...
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 
> (20191)
> [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync
> [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197
> [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198
> [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199
> [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200
> [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor
> [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from 
> /opt/airflow/production/dags
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-52) Release airflow 1.7.1

2016-05-25 Thread Jeremiah Lowin (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300086#comment-15300086
 ] 

Jeremiah Lowin commented on AIRFLOW-52:
---

Should this be closed?

> Release airflow 1.7.1
> -
>
> Key: AIRFLOW-52
> URL: https://issues.apache.org/jira/browse/AIRFLOW-52
> Project: Apache Airflow
>  Issue Type: Task
>  Components: release
>Reporter: Dan Davydov
>Assignee: Dan Davydov
>  Labels: release
>
> Release the airflow 1.7.1 tag.
> Current status:
> There are three issues blocking this release caused by this commit:
> https://github.com/apache/incubator-airflow/commit/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6
> -1. DAGs with a lot of tasks take much longer to parse (~25x slowdown)-
> -2. The following kind of patterns fail:-
> {code}
> email.set_upstream(dag.roots)
> dag.add_task(email)
> {code}
> This is because set_upstream now calls add_task and a task can't be added 
> more than once.
> -3. Airflow losing queued tasks (see linked issue)-
> -4. Airflow putting dags in a stuck state (AIRFLOW-92)-
> I'm working with the owner of the commit to resolve these issues.
> The way to catch (1) in the future is an integration test that asserts a 
> given non-trivial DAG parses under X seconds



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-174) Add --debug option to scheduler

2016-05-25 Thread Jeremiah Lowin (JIRA)
Jeremiah Lowin created AIRFLOW-174:
--

 Summary: Add --debug option to scheduler
 Key: AIRFLOW-174
 URL: https://issues.apache.org/jira/browse/AIRFLOW-174
 Project: Apache Airflow
  Issue Type: Improvement
  Components: scheduler
Affects Versions: Airflow 1.7.1
Reporter: Jeremiah Lowin
Assignee: Bolke de Bruin
Priority: Minor


{{airflow webserver}} has a {{--debug}} param which enables the use of 
interactive debuggers like {{ipdb}} (among other side effects). Unfortunately 
the {{airflow scheduler}} process does not respect debugger instructions, which 
makes tracing errors very difficult. It just prints the following error and 
resumes operation:
{code}
Traceback (most recent call last):
  File "/Users/jlowin/git/airflow/airflow/jobs.py", line 690, in _do_dags
self.process_dag(dag, tis_out)
  File "/Users/jlowin/git/airflow/airflow/jobs.py", line 521, in process_dag
run.update_state()
  File "/Users/jlowin/git/airflow/airflow/utils/db.py", line 53, in wrapper
result = func(*args, **kwargs)
  File "/Users/jlowin/git/airflow/airflow/models.py", line 3471, in update_state
all_deadlocked = (has_unfinished_tasks and no_dependencies_met)
  File "/Users/jlowin/git/airflow/airflow/models.py", line 3471, in update_state
all_deadlocked = (has_unfinished_tasks and no_dependencies_met)
  File "/Users/jlowin/anaconda3/lib/python3.5/bdb.py", line 48, in 
trace_dispatch
return self.dispatch_line(frame)
  File "/Users/jlowin/anaconda3/lib/python3.5/bdb.py", line 67, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
{code}

 [~bolke] I'm assigning this to you for now because I suspect it's related to 
the subprocess/daemonizing changes you made, though I'm not sure. If we can 
enable {{ipdb}}, it will make future scheduler work so much easier!
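
The traceback suggests the scheduler wraps per-DAG processing in a broad except 
clause so that one bad DAG cannot stop the loop, and that same clause swallows 
{{bdb.BdbQuit}} when you quit the debugger. A rough sketch of that shape (an 
assumption about jobs.py, not a quote of it):

{code}
import logging


def schedule_all(dags, process_dag):
    for dag in dags:
        try:
            process_dag(dag)
        except Exception:
            # bdb.BdbQuit subclasses Exception, so quitting ipdb is caught
            # here, logged, and the loop resumes instead of dropping to a shell.
            logging.exception("Failed to process DAG %s", dag)
{code}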



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice

2016-05-25 Thread Sumit Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299668#comment-15299668
 ] 

Sumit Maheshwari commented on AIRFLOW-168:
--

Also, I've heard that if we change the start_date of that DAG, the scheduler 
creates instances for all the dates between the earlier start_date and the 
changed one as well.

> schedule_interval @once scheduling dag atleast twice
> 
>
> Key: AIRFLOW-168
> URL: https://issues.apache.org/jira/browse/AIRFLOW-168
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sumit Maheshwari
> Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png
>
>
> I was looking at the example_xcom example and found that it got scheduled 
> twice: once at the start_time and once at the current time. To verify, I tried 
> multiple times (by reloading the db) and it's the same.
> I am on airflow master, using the sequential executor with sqlite3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor

2016-05-25 Thread Andre (JIRA)
Andre created AIRFLOW-173:
-

 Summary: Create a FileSensor / NFSFileSensor sensor
 Key: AIRFLOW-173
 URL: https://issues.apache.org/jira/browse/AIRFLOW-173
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Andre
Priority: Minor


While HDFS and WebHDFS suit most organisations using Hadoop, for some shops 
running MapR-FS an Airflow implementation is simplified by the use of plain files 
pointing to MapR's NFS gateways.

A FileSensor and/or an NFSFileSensor would assist the adoption of Airflow within 
the MapR customer base but, more importantly, help those who are using 
POSIX-compliant distributed filesystems that can be mounted on Unix-derivative 
systems (e.g. MapR-FS via NFS, CephFS, GlusterFS, etc.).
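
A minimal sketch of what such a sensor might look like, assuming Airflow ~1.7.1 
import paths (this is an illustration, not an existing Airflow operator):

{code}
import logging
import os

from airflow.operators.sensors import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class FileSensor(BaseSensorOperator):
    """Waits for a file to appear on a locally mounted path (e.g. an NFS mount)."""

    @apply_defaults
    def __init__(self, filepath, *args, **kwargs):
        super(FileSensor, self).__init__(*args, **kwargs)
        self.filepath = filepath

    def poke(self, context):
        logging.info('Poking for file %s', self.filepath)
        return os.path.exists(self.filepath)
{code}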



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)