[jira] [Commented] (AIRFLOW-3270) Apache airflow 1.10.0 integration with LDAP anonmyously

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673351#comment-16673351
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3270:


I think the same issue would still apply on 1.10.0, so it doesn't matter much, but the 
line numbers in the stack trace don't match up with 1.10.0. For instance

{code}
File 
"/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
 line 268, in login
LdapUser.try_login(username, password)
{code}

https://github.com/apache/incubator-airflow/blob/1.10.0/airflow/contrib/auth/backends/ldap_auth.py#L268
 -- that line is not in a login function.

So you might want to see if you have two versions installed (just so that you 
edit the right one).
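
For example, one quick way to check which copy of the backend is actually being imported 
(plain Python, nothing Airflow-specific):

{code:python}
# Prints the path of the module Python actually loads, so you know which
# ldap_auth.py file to edit.
import airflow.contrib.auth.backends.ldap_auth as ldap_auth

print(ldap_auth.__file__)
{code}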

> Apache airflow 1.10.0 integration with LDAP anonmyously
> ---
>
> Key: AIRFLOW-3270
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3270
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> Please advise what to include in airflow.cfg when integrating with LDAP 
> anonymously? We are using DS389 as the LDAP server. 
>  
> {noformat}
> [webserver] 
> authenticate = True 
> auth_backend = airflow.contrib.auth.backends.ldap_auth  
> {noformat}
>  
> And 
>  
> {noformat}
> [ldap] 
> uri = ldap://nsp-daf178e8.ad1.prd.us-phx.odc.im:389 
> user_filter = memberOf=cn=rvs-all-prd_usphx,ou=groups,dc=odc,dc=im
> user_name_attr = uid 
> group_member_attr =
> superuser_filter = memberOf=cn=rvd-sudo_all-prd_usphx,ou=groups,dc=odc,dc=im 
> data_profiler_filter = 
> bind_user = 
> bind_password = 
> basedn = ou=people,dc=odc,dc=im 
> cacert = /opt/orchestration/airflow/ldap_ca.crt 
> search_scope = LEVEL
> {noformat}
> I am hitting below exception:
> {noformat}
>   File "/usr/local/lib/python3.5/site-packages/ldap3/operation/search.py", 
> line 215, in parse_filter     
> raise LDAPInvalidFilterError('malformed filter') 
> ldap3.core.exceptions.LDAPInvalidFilterError: malformed filter
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3270) Apache airflow 1.10.0 integration with LDAP anonmyously

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673311#comment-16673311
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3270:


Well your stack trace lines are for 1.8, so this might not be true anymore, but 
could you try making this change in airflow/contrib/auth/backends/ldap_auth.py, 
in the {{get_ldap_connection}} function:

{code}
diff --git a/airflow/contrib/auth/backends/ldap_auth.py 
b/airflow/contrib/auth/backends/ldap_auth.py
index 13b49f90..42ad7026 100644
--- a/airflow/contrib/auth/backends/ldap_auth.py
+++ b/airflow/contrib/auth/backends/ldap_auth.py
@@ -51,6 +51,13 @@ class LdapException(Exception):
 
 
 def get_ldap_connection(dn=None, password=None):
+    # When coming from config we can't set None; the best we can do is set it
+    # to an empty string
+    if dn == "":
+        dn = None
+    if password == "":
+        password = None
+
     tls_configuration = None
     use_ssl = False
     try:
{code}
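
For context, a minimal ldap3 sketch of why the normalisation matters (illustrative only; 
the host is the reporter's, and exact behaviour depends on the ldap3 version): with user 
and password left as None, the client performs an anonymous simple bind.

{code:python}
from ldap3 import Server, Connection

server = Server("ldap://nsp-daf178e8.ad1.prd.us-phx.odc.im:389")

# No credentials at all -> ldap3 performs an anonymous simple bind.
conn = Connection(server)
conn.bind()

# An empty string read straight from airflow.cfg is not None, hence the
# empty-string-to-None normalisation in get_ldap_connection() above.
{code}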

> Apache airflow 1.10.0 integration with LDAP anonmyously
> ---
>
> Key: AIRFLOW-3270
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3270
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> Please advise what to include in airflow.cfg when integrating with LDAP 
> anonymously? We are using DS389 as the LDAP server. 
>  
> {noformat}
> [webserver] 
> authenticate = True 
> auth_backend = airflow.contrib.auth.backends.ldap_auth  
> {noformat}
>  
> And 
>  
> {noformat}
> [ldap] 
> uri = ldap://nsp-daf178e8.ad1.prd.us-phx.odc.im:389 
> user_filter = memberOf=cn=rvs-all-prd_usphx,ou=groups,dc=odc,dc=im
> user_name_attr = uid 
> group_member_attr =
> superuser_filter = memberOf=cn=rvd-sudo_all-prd_usphx,ou=groups,dc=odc,dc=im 
> data_profiler_filter = 
> bind_user = 
> bind_password = 
> basedn = ou=people,dc=odc,dc=im 
> cacert = /opt/orchestration/airflow/ldap_ca.crt 
> search_scope = LEVEL
> {noformat}
> I am hitting below exception:
> {noformat}
>   File "/usr/local/lib/python3.5/site-packages/ldap3/operation/search.py", 
> line 215, in parse_filter     
> raise LDAPInvalidFilterError('malformed filter') 
> ldap3.core.exceptions.LDAPInvalidFilterError: malformed filter
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AIRFLOW-3270) Apache airflow 1.10.0 integration with LDAP anonmyously

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673287#comment-16673287
 ] 

Ash Berlin-Taylor edited comment on AIRFLOW-3270 at 11/2/18 3:44 PM:
-

[~ashb]: the '\{{ = }}' at the end of the line is a copy/paste issue on this 
JIRA. Below is the correct config without that formatting, followed by the full 
exception stack.

 
{code}
[ldap]
uri = ldap://nsp-daf178e8.ad1.prd.us-phx.odc.im:389
user_filter = memberOf=cn=rvs-all-prd_usphx,ou=groups,dc=odc,dc=im
user_name_attr = uid
group_member_attr =
superuser_filter = memberOf=cn=rvd-sudo_all-prd_usphx,ou=groups,dc=odc,dc=im
data_profiler_filter =
bind_user =
bind_password =
basedn = ou=people,dc=odc,dc=im
cacert = /opt/orchestration/airflow/ldap_ca.crt
search_scope = LEVEL
{code}


{code}
[2018-10-30 04:01:04,520] ERROR in app: Exception on /admin/airflow/login [POST]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1988, in 
wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1641, in 
full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1544, in 
handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/site-packages/flask/_compat.py", line 33, in 
reraise
raise value
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1639, in 
full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1625, in 
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.5/site-packages/flask_admin/base.py", line 69, in 
inner
return self._run_view(f, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/flask_admin/base.py", line 368, in 
_run_view
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 650, 
in login
return airflow.login.login(self, request)
File 
"/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
 line 268, in login
LdapUser.try_login(username, password)
File 
"/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
 line 180, in try_login
search_scope=native(search_scope))
File "/usr/local/lib/python3.5/site-packages/ldap3/core/connection.py", line 
779, in search
[2018-10-30 04:01:04,520] [72] \{app.py:1587} ERROR - Exception on 
/admin/airflow/login [POST]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1988, in 
wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1641, in 
full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1544, in 
handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/site-packages/flask/_compat.py", line 33, in 
reraise
raise value
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1639, in 
full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1625, in 
dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/local/lib/python3.5/site-packages/flask_admin/base.py", line 69, in 
inner
return self._run_view(f, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/flask_admin/base.py", line 368, in 
_run_view
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/airflow/www/views.py", line 650, 
in login
return airflow.login.login(self, request)
File 
"/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
 line 268, in login
LdapUser.try_login(username, password)
File 
"/usr/local/lib/python3.5/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
 line 180, in try_login
search_scope=native(search_scope))
File "/usr/local/lib/python3.5/site-packages/ldap3/core/connection.py", line 
779, in search
check_names=self.check_names)
File "/usr/local/lib/python3.5/site-packages/ldap3/operation/search.py", line 
372, in search_operation
request['filter'] = compile_filter(parse_filter(search_filter, schema, 
auto_escape, auto_encode, validator, check_names).elements[0]) # parse the 
searchFilter string and compile it starting from the root node
File "/usr/local/lib/python3.5/site-packages/ldap3/operation/search.py", line 
215, in parse_filter
raise LDAPInvalidFilterError('malformed filter')
ldap3.core.exceptions.LDAPInvalidFilterError: malformed filter
{code}

[jira] [Commented] (AIRFLOW-3292) `delete_dag` endpoint and cli commands don't delete on exact dag_id matching

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673289#comment-16673289
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3292:


{{.}} in a dag id is reserved for subdags, so deleting {{schema}} also deletes its 
subdags. If {{schema.table1}} is _NOT_ a sub-dag then we should probably 
add better validation of the dag id, so that dots are only allowed in dag ids 
when they are used for subdags.
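
For illustration, a small sketch (a hypothetical helper, not the actual delete_dag code) 
of the matching behaviour described above:

{code:python}
def ids_removed_by_delete(dag_id, all_dag_ids):
    # "." separates a parent dag id from its subdag ids, so deleting the
    # parent is expected to also match "<dag_id>.<subdag>" entries.
    return [d for d in all_dag_ids
            if d == dag_id or d.startswith(dag_id + ".")]

print(ids_removed_by_delete(
    "schema",
    ["schema", "schema.table1", "schema.table2", "schema_replace"]))
# ['schema', 'schema.table1', 'schema.table2']
{code}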

Does that help?

> `delete_dag` endpoint and cli commands don't delete on exact dag_id matching
> 
>
> Key: AIRFLOW-3292
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3292
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api, cli
>Affects Versions: 1.10.0
>Reporter: Teresa Martyny
>Priority: Major
>
> If you have the following dag ids: `schema`, `schema.table1`, 
> `schema.table2`, `schema_replace`
> When you hit the delete_dag endpoint with the dag id: `schema`, it will 
> delete `schema`, `schema.table1`, and `schema.table2`, not just `schema`. 
> Underscores are fine so it doesn't delete `schema_replace`, but periods are 
> not.
> If this is expected behavior, clarifying that functionality in the docs would 
> be great, and then I can submit a feature request for the ability to use 
> regex for exact matching with this command and endpoint.
> Thanks!! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3291) Update S3KeySensor to not need s3:GetObject permissions

2018-11-02 Thread Ash Berlin-Taylor (JIRA)
Ash Berlin-Taylor created AIRFLOW-3291:
--

 Summary: Update S3KeySensor to not need s3:GetObject permissions
 Key: AIRFLOW-3291
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3291
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Ash Berlin-Taylor


The S3KeySensor/S3Hook as currently written requires {{s3:GetObject}} 
permissions on the bucket (as it does a HeadObject API call). It would be nice 
if it could use ListBucket instead: for our use case we don't really want 
Airflow reading the files, as EMR does that for us.

This would be doable by changing the implementation of check_key_exists and 
check_wildcard_exists(?) to use a list call only and not try to load the Key 
object.
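
A rough sketch of the list-based check (boto3, illustrative only; this is not the 
current S3Hook implementation):

{code:python}
import boto3

def key_exists_via_list(bucket, key):
    """Check for a key using s3:ListBucket only, avoiding HeadObject/GetObject."""
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))
{code}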



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3270) Apache airflow 1.10.0 integration with LDAP anonmyously

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673000#comment-16673000
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3270:


Can you include the full stack trace? The problem could be in your 
{{superuser_filter}} (the '{{ = }}' at the end of it looks suspect at first 
sight).

> Apache airflow 1.10.0 integration with LDAP anonmyously
> ---
>
> Key: AIRFLOW-3270
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3270
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.10.0
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Blocker
>
> Please advise what to include in airflow.cfg when integrating with LDAP 
> anonymously? We are using DS389 as the LDAP server. 
>  
> {noformat}
> [webserver] 
> authenticate = True 
> auth_backend = airflow.contrib.auth.backends.ldap_auth  
> {noformat}
>  
> And 
>  
> {noformat}
> [ldap] 
> uri = ldap://nsp-daf178e8.ad1.prd.us-phx.odc.im:389 
> user_filter = memberOf=cn=rvs-all-prd_usphx,ou=groups,dc=odc,dc=im
> user_name_attr = uid 
> group_member_attr =
> superuser_filter = memberOf=cn=rvd-sudo_all-prd_usphx,ou=groups,dc=odc,dc=im 
> data_profiler_filter = 
> bind_user = 
> bind_password = 
> basedn = ou=people,dc=odc,dc=im 
> cacert = /opt/orchestration/airflow/ldap_ca.crt 
> search_scope = LEVEL
> {noformat}
> I am hitting below exception:
> {noformat}
>   File "/usr/local/lib/python3.5/site-packages/ldap3/operation/search.py", 
> line 215, in parse_filter     
> raise LDAPInvalidFilterError('malformed filter') 
> ldap3.core.exceptions.LDAPInvalidFilterError: malformed filter
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3289) BashOperator mangles {{\}} escapes in commands

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3289:
---
Summary: BashOperator mangles {{\}} escapes in commands  (was: sed called 
from BashOperator not working as expected)

> BashOperator mangles {{\}} escapes in commands
> --
>
> Key: AIRFLOW-3289
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3289
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Nikolay Semyachkin
>Priority: Major
> Attachments: example.csv, issue_proof.py
>
>
> I want to call a sed command on a csv file to replace empty values (,,) with \N.
> I can do it with the following bash command:
> {code:java}
> cat example.csv | sed 's;,,;,\\N,;g' > example_processed.csv{code}
> But when I try to do the same with the airflow BashOperator, it substitutes ,, 
> with N (instead of \N).
>  
> I attached the code and csv file to reproduce.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3289) sed called from BashOperator not working as expected

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672996#comment-16672996
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3289:


I suspect a work-around for now is to specify \{{N}} - there may be some 
escaping bug in the BashOperator.

> sed called from BashOperator not working as expected
> 
>
> Key: AIRFLOW-3289
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3289
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Nikolay Semyachkin
>Priority: Major
> Attachments: example.csv, issue_proof.py
>
>
> I want to call a sed command on a csv file to replace empty values (,,) with \N.
> I can do it with the following bash command:
> {code:java}
> cat example.csv | sed 's;,,;,\\N,;g' > example_processed.csv{code}
> But when I try to do the same with the airflow BashOperator, it substitutes ,, 
> with N (instead of \N).
>  
> I attached the code and csv file to reproduce.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3287) Core tests DB clean up to be run in the right moment

2018-11-02 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3287.

Resolution: Fixed

Thanks, good spot!

> Core tests DB clean up to be run in the right moment
> 
>
> Key: AIRFLOW-3287
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3287
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Reporter: Jarosław Śmietanka
>Assignee: Jarosław Śmietanka
>Priority: Minor
>
> While running a single unit test of some Dataproc operator, I've spotted that 
> the database clean-up code in the `tests.core` module is triggered, which should not 
> take place since I run the unit test from a completely different place.  
> A proposed solution is to move this database clean-up code inside 
> CoreTest as a `tearDown`
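
A minimal sketch of the proposed shape (illustrative; {{reset_db}} is a stand-in for the 
existing clean-up code, not a real helper):

{code:python}
import unittest

def reset_db():
    """Stand-in for the existing database clean-up code."""
    pass

class CoreTest(unittest.TestCase):
    def tearDown(self):
        # Clean up only after tests in this class run, instead of at module
        # level where importing tests.core triggers it for unrelated tests.
        reset_db()
{code}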



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3164:
---
Priority: Blocker  (was: Major)

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead to 
> security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3164) verify certificate of LDAP server

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3164:
---
Fix Version/s: 1.10.1

> verify certificate of LDAP server
> -
>
> Key: AIRFLOW-3164
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3164
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
> Fix For: 1.10.1
>
>
> Currently we don't verify the certificate of the LDAP server; this can lead to 
> security incidents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2875) Env variables should have percent signs escaped before writing to tmp config

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-2875.
--
Resolution: Duplicate

> Env variables should have percent signs escaped before writing to tmp config
> 
>
> Key: AIRFLOW-2875
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2875
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
> Environment: Ubuntu
> Airflow 1.10rc2
>Reporter: William Horton
>Priority: Major
>
> I encountered this when I was using an environment variable for 
> `AIRFLOW__CELERY__BROKER_URL`. The airflow worker was able to run and 
> communicate with the SQS queue, but when it received a task and began to run 
> it, I encountered an error with this trace:
> {code:java}
> [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring Traceback (most recent call last):
> [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring File "/opt/airflow/venv/bin/airflow", line 32, in 
> [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring args.func(args)
> [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring File 
> "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", 
> line 74, in wrapper
> [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring return f(*args, **kwargs)
> [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring File 
> "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", 
> line 460, in run
> [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring conf.set(section, option, value)
> [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring File 
> "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py",
>  line 1239, in set
> [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring super(ConfigParser, self).set(section, option, value)
> [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring File 
> "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py",
>  line 914, in set
> [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring value)
> [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring File 
> "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py",
>  line 392, in before_set
> [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring "position %d" % (value, tmp_value.find('%')))
> [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask 
> mirroring ValueError: invalid interpolation syntax in 
> {code}
> The issue was that the broker url had a percent sign, and when the cli called 
> `conf.set(section, option, value)`, it was throwing because it interpreted 
> the percent as an interpolation.
> To avoid this issue, I would propose that the environment variables be 
> escaped when being written in `utils.configuration.tmp_configuration_copy`, 
> so that when `conf.set` is called in `bin/cli`, it doesn't throw on these 
> unescaped values.
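
For illustration, a minimal sketch of the proposed escaping (the helper name is made up; 
ConfigParser uses %% for a literal percent sign):

{code:python}
def escape_for_configparser(value):
    # Double any literal '%' so ConfigParser interpolation does not choke on it.
    return value.replace("%", "%%")

broker_url = "sqs://key:secret%2Fpart@"          # '%' comes from URL-encoding
print(escape_for_configparser(broker_url))       # sqs://key:secret%%2Fpart@
{code}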



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2423) syncing DAGs without scheduler/web-server restart

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2423.

   Resolution: Fixed
Fix Version/s: (was: 2.0.0)

> syncing DAGs without scheduler/web-server restart
> -
>
> Key: AIRFLOW-2423
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2423
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG
>Reporter: Aditi Verma
>Assignee: Aditi Verma
>Priority: Major
>  Labels: features
>
> Syncing DAGs from a common remote location is useful while running Airflow on 
> a distributed setup (like mesos/kubernetes), where the hosts/containers 
> running the scheduler and the web-server can change with time. Also, where 
> scheduler and web-server restart should be avoided for every DAG 
> update/addition



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1371) Customize log_filepath to allow files writen to NFS-shares

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1371.

Resolution: Fixed

Logging path is now customizeable

> Customize log_filepath to allow files writen to NFS-shares
> --
>
> Key: AIRFLOW-1371
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1371
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: models
>Affects Versions: 1.8.0
>Reporter: Alexander Bij
>Priority: Minor
>
> To be able to send logs to a NFS-mount the folder and files cannot contain  
> [reserved characters | 
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx]
>  like:
> * < (less than)
> * > (greater than)
> * : (colon)
> * " (double quote)
> * / (forward slash)
> * \ (backslash)
> * ? (question mark)
> * (asterisk)
> Airflow writes the logs with an ISO-datetime in the file name. This name contains one 
> of these characters (the colon).
> There are some solutions. What I would like is a configurable log_file_format 
> to create log-files.
> {code}
> # Fetched by views.py and written by models.py currently like this:
> @property
> def log_filepath(self):
>     iso = self.execution_date.isoformat()
>     log = os.path.expanduser(configuration.get('core', 'BASE_LOG_FOLDER'))
>     return "{log}/{self.dag_id}/{self.task_id}/{iso}.log".format(**locals())
> {code}
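
A minimal sketch of the configurable format being asked for (the option name and pattern 
are illustrative, not an existing Airflow setting):

{code:python}
import os

LOG_FILENAME_TEMPLATE = "{dag_id}/{task_id}/{ts}.log"   # hypothetical config value

def safe_log_filepath(base_log_folder, dag_id, task_id, execution_date):
    # Replace ':' from the ISO timestamp so the path is valid on NFS/Windows shares.
    ts = execution_date.isoformat().replace(":", "-")
    return os.path.join(
        base_log_folder,
        LOG_FILENAME_TEMPLATE.format(dag_id=dag_id, task_id=task_id, ts=ts))
{code}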



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2278) Add possibility to edit DAGs in webapp

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-2278.
--
Resolution: Won't Fix

I'm making a call on this one - it is out of scope for Airflow, and being able 
to edit DAG files won't work in many setups (Celery workers, Kubernetes, etc.).

> Add possibility to edit DAGs in webapp
> --
>
> Key: AIRFLOW-2278
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2278
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Mykola Mykhalov
>Assignee: Mykola Mykhalov
>Priority: Minor
>
> When you need to make some minor changes in your DAG, it would be nice to have 
> the possibility to edit it from the webapp.
> To protect a DAG from being edited by everyone, an 'editable_by' arg should be 
> set inside the DAG, where '*' means everyone and ['user1', 'user2'] means 
> specific users.
> Example where the DAG can be edited in the webapp by user1 and user2 only:
> args = {
>  'owner': 'airflow',
>  'editable_by': ['user1', 'user2'],
>  'start_date': airflow.utils.dates.days_ago(2)
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1905) Make start_doc_server script python3 compatible

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1905.

Resolution: Fixed

> Make start_doc_server script  python3 compatible
> 
>
> Key: AIRFLOW-1905
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1905
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docs
>Reporter: Sanjay Pillai
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1540) Airflow 1.8.1 - Add proxies to slack operator

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1540.
--
Resolution: Won't Fix

This should just be done by setting the `http_proxy` or `https_proxy` 
environment variables.
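
For illustration (assuming the Slack call goes through the requests library, which 
honours these variables; the proxy URL is made up):

{code:python}
import os

# Set in the worker environment before the Slack API call is made; requests
# picks these up automatically when opening the HTTP connection.
os.environ["http_proxy"] = "http://proxy.example.com:3128"
os.environ["https_proxy"] = "http://proxy.example.com:3128"
{code}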

> Airflow 1.8.1 - Add proxies to slack operator
> -
>
> Key: AIRFLOW-1540
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1540
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Reporter: user_airflow
>Assignee: user_airflow
>Priority: Critical
>
> We are trying to use slack operator from the cloud server that use proxies to 
> connect open internet including slack. Currently the connection to Slack APIs 
> fail everytime with connection timed out. Need to include proxies option in 
> Slack operator to resolve the issue.
> Pull request created: https://github.com/apache/incubator-airflow/pull/2548



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1171) Encoding error for non latin-1 Postgres database

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1171.
--
Resolution: Invalid

Now invalid due to the different way of doing things in AIRFLOW-1170

> Encoding error for non latin-1 Postgres database
> 
>
> Key: AIRFLOW-1171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1171
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db, hooks
>Affects Versions: 1.8.0
> Environment: macOS 10.12.5
> Python 2.7.12
> Postgres 9.6.1
> However, these are irrelevant to this issue.
>Reporter: Richard Lee
>Assignee: Richard Lee
>Priority: Major
>
> There's [a known issue|https://github.com/psycopg/psycopg2/issues/331] from 
> psycopg2: Airflow ignores the encoding settings from the db by default, 
> which results in an encoding error if there's any non latin-1 content in a 
> database cell.
> Reference stack trace:
> {code}
>   File "dags/recipe_hourly_pageviews.py", line 73, in 
> dag.cli()
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/models.py",
>  line 3339, in cli
> args.func(args, self)
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/bin/cli.py",
>  line 585, in test
> ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/utils/db.py",
>  line 53, in wrapper
> result = func(*args, **kwargs)
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/models.py",
>  line 1374, in run
> result = task_copy.execute(context=context)
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/operators/generic_transfer.py",
>  line 78, in execute
> destination_hook.insert_rows(table=self.destination_table, rows=results)
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py",
>  line 215, in insert_rows
> l.append(self._serialize_cell(cell, conn))
>   File 
> "/Users/dlackty/.pyenv/versions/2.7.12/lib/python2.7/site-packages/airflow/hooks/postgres_hook.py",
>  line 70,
>  in _serialize_cell
> return psycopg2.extensions.adapt(cell).getquoted().decode('utf-8')
> UnicodeEncodeError: 'latin-1' codec can't encode characters in position 6-10: 
> ordinal not in range(256)
> {code}
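
For reference, the kind of client-side workaround the linked psycopg2 issue discusses 
(illustrative only; not necessarily how the related fix was implemented):

{code:python}
import psycopg2

conn = psycopg2.connect("dbname=airflow")   # illustrative DSN
conn.set_client_encoding("UTF8")            # explicitly use UTF-8 instead of the default
{code}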



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1027) Task details cannot be shown when PythonOperator calls a partial function

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1027.

Resolution: Duplicate

> Task details cannot be shown when PythonOperator calls a partial function
> -
>
> Key: AIRFLOW-1027
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1027
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.7.1
>Reporter: Adrian Partl
>Assignee: Adrian Partl
>Priority: Minor
>
> Showing task details of a PythonOperator that uses a `functools.partial` as a 
> callable results in the following error:
> {noformat}
>   File "/usr/lib/python2.7/site-packages/airflow/www/views.py", line 909, in 
> task
> special_attrs_rendered[attr_name] = attr_renderer[attr_name](source)
>   File "/usr/lib/python2.7/site-packages/airflow/www/views.py", line 224, in 
> 
> inspect.getsource(x), lexers.PythonLexer),
>   File "/usr/lib64/python2.7/inspect.py", line 701, in getsource
> lines, lnum = getsourcelines(object)
>   File "/usr/lib64/python2.7/inspect.py", line 690, in getsourcelines
> lines, lnum = findsource(object)
>   File "/usr/lib64/python2.7/inspect.py", line 526, in findsource
> file = getfile(object)
>   File "/usr/lib64/python2.7/inspect.py", line 420, in getfile
> 'function, traceback, frame, or code object'.format(object))
> TypeError:  is not a module, class, 
> method, function, traceback, frame, or code object
> {noformat}
> A sample dag definition for this is:
> {noformat}
> def func_with_two_args(arg_1, arg_2):
> pass
> partial_func = functools.partial(func_with_two_args, arg_1=1)
> dag = DAG(dag_id='test_issue_1333_dag', default_args=default_args)
> dag_task1 = PythonOperator(
> task_id='test_dagrun_functool_partial',
> dag=dag,
> python_callable=partial_func)
> {noformat}
> I will provide a PR with a fix for this.
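
For illustration, a tiny sketch of the kind of fix (hypothetical, not the actual PR): 
unwrap the partial before asking inspect for the source.

{code:python}
import functools
import inspect

def get_source(x):
    # functools.partial objects have no source file; inspect the wrapped
    # callable instead.
    if isinstance(x, functools.partial):
        x = x.func
    return inspect.getsource(x)
{code}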



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1039) Airflow is raising IntegrityError when during parallel DAG trigger

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1039.
--
Resolution: Won't Fix

Think this is won't fix - now that run times have millisecond (or is it 
microsecond?) accuracy, the likelihood of a clash is smaller anyway.

Anyone feel free to re-open if you think I'm wrong and we should handle this.

> Airflow is raising IntegrityError when during parallel DAG trigger
> --
>
> Key: AIRFLOW-1039
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1039
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Affects Versions: 1.8.0
>Reporter: Matus Valo
>Priority: Minor
>
> When two concurrent processes are trying to trigger the same dag with the 
> same execution date at the same time, an IntegrityError is thrown by 
> SQLAlchemy:
> uwsgi[15887]: [2017-03-24 12:51:38,074] {app.py:1587} ERROR - Exception on / 
> [POST]
> uwsgi[15887]: Traceback (most recent call last):
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/flask/app.py", line 
> 1988, in wsgi_app
> uwsgi[15887]: response = self.full_dispatch_request()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/flask/app.py", line 
> 1641, in full_dispatch_request
> uwsgi[15887]: rv = self.handle_user_exception(e)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/flask/app.py", line 
> 1544, in handle_user_exception
> uwsgi[15887]: reraise(exc_type, exc_value, tb)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/flask/app.py", line 
> 1639, in full_dispatch_request
> uwsgi[15887]: rv = self.dispatch_request()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/flask/app.py", line 
> 1625, in dispatch_request
> uwsgi[15887]: return self.view_functions[rule.endpoint](**req.view_args)
> uwsgi[15887]: File "./ws.py", line 21, in hello
> uwsgi[15887]: trigger_dag('poc_dag2', run_id=str(uuid1()), 
> conf=json.dumps({'input_files': input_files}), execution_date=datetime.now())
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/airflow/api/common/experimental/trigger_dag.py",
>  line 56, in trigger_dag
> uwsgi[15887]: external_trigger=True
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/airflow/utils/db.py", 
> line 53, in wrapper
> uwsgi[15887]: result = func(*args, **kwargs)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/airflow/models.py", 
> line 3377, in create_dagrun
> uwsgi[15887]: session.commit()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 874, in commit
> uwsgi[15887]: self.transaction.commit()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 461, in commit
> uwsgi[15887]: self._prepare_impl()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 441, in _prepare_impl
> uwsgi[15887]: self.session.flush()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 2139, in flush
> uwsgi[15887]: self._flush(objects)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 2259, in _flush
> uwsgi[15887]: transaction.rollback(_capture_exception=True)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py",
>  line 60, in __exit__
> uwsgi[15887]: compat.reraise(exc_type, exc_value, exc_tb)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/session.py",
>  line 2223, in _flush
> uwsgi[15887]: flush_context.execute()
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py",
>  line 389, in execute
> uwsgi[15887]: rec.execute(self)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py",
>  line 548, in execute
> uwsgi[15887]: uow
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py",
>  line 181, in save_obj
> uwsgi[15887]: mapper, table, insert)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py",
>  line 835, in _emit_insert_statements
> uwsgi[15887]: execute(statement, params)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/python2.7/site-packages/sqlalchemy/engine/base.py",
>  line 945, in execute
> uwsgi[15887]: return meth(self, multiparams, params)
> uwsgi[15887]: File 
> "/home/matus/envs/airflow/lib/pytho

[jira] [Resolved] (AIRFLOW-1023) Upload file to S3 using S3 hook fails with "Connection reset by peer"

2018-11-01 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1023.

Resolution: Cannot Reproduce

I think this doesn't apply anymore with the change to boto3.

If I'm wrong please re-open this issue.

> Upload file to S3 using S3 hook fails with "Connection reset by peer"
> -
>
> Key: AIRFLOW-1023
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1023
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.7.1
>Reporter: Adrian Partl
>Assignee: Adrian Partl
>Priority: Major
>
> Using the S3 hook to upload local files to an S3 bucket fails with 
> {noformat}
>   File "/usr/lib/python2.7/site-packages/airflow/hooks/S3_hook.py", line 364, 
> in load_file
> replace=replace)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1362, in 
> set_contents_from_filename
> encrypt_key=encrypt_key)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1293, in 
> set_contents_from_file
> chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 750, in 
> send_file
> chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 951, in 
> _send_file_internal
> query_args=query_args
>   File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 668, in 
> make_request
> retry_handler=retry_handler
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1071, in 
> make_request
> retry_handler=retry_handler)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 1030, in 
> _mexe
> raise ex
> error: [Errno 104] Connection reset by peer
> {noformat}
> This is a known issue with boto and only affects uploads to S3 buckets 
> outside of the standard US location (in my case {{eu-west-1}}).
> The issue is reported on boto side as:
> https://github.com/boto/boto/issues/2207
> A workaround is mentioned by user {{anna-buttfield-sirca}} which basically 
> reconnects the boto S3 connection to the corresponding location.
> I will provide a PR implementing the workaround, since a resolution of the 
> issue on the boto side seems unlikely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3136) Scheduler Failing the Task retries run while processing Executor Events

2018-10-31 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3136.

   Resolution: Fixed
Fix Version/s: 1.10.1
   2.0.0

> Scheduler Failing the Task retries run while processing Executor Events
> ---
>
> Key: AIRFLOW-3136
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3136
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: raman
>Priority: Major
> Fix For: 2.0.0, 1.10.1
>
>
> Following behaviour is observed with Airflow 1.9 with LocalExecutor mode
>  
> Airflow scheduler processes the executor events in 
> "_process_executor_events(self, simple_dag_bag, session=None)" function of 
> jobs.py.
> The events are identified by key which is composed of dag id, task id, 
> execution date. So all retries of a task have the same key.
> If the task retry interval is very small, like 30 seconds, then the scheduler might 
> schedule the next retry run while the previous task run result is still in 
> the executor event queue.
> The current task run might be in the queued state while the scheduler is processing 
> the executor's previous events, which might make the scheduler fail the current run 
> because of the following code in jobs.py:
> {code}
> def _process_executor_events(self, simple_dag_bag, session=None):
>     """
>     Respond to executor events.
>     """
>     # TODO: this shares quite a lot of code with _manage_executor_state
>     TI = models.TaskInstance
>     for key, state in list(self.executor.get_event_buffer(simple_dag_bag.dag_ids)
>                            .items()):
>         dag_id, task_id, execution_date = key
>         self.log.info(
>             "Executor reports %s.%s execution_date=%s as %s",
>             dag_id, task_id, execution_date, state
>         )
>         if state == State.FAILED or state == State.SUCCESS:
>             qry = session.query(TI).filter(TI.dag_id == dag_id,
>                                            TI.task_id == task_id,
>                                            TI.execution_date == execution_date)
>             ti = qry.first()
>             if not ti:
>                 self.log.warning("TaskInstance %s went missing from the database", ti)
>                 continue
>             # TODO: should we fail RUNNING as well, as we do in Backfills?
>             if ti.state == State.QUEUED:
>                 msg = ("Executor reports task instance %s finished (%s) "
>                        "although the task says its %s. Was the task "
>                        "killed externally?".format(ti, state, ti.state))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (AIRFLOW-2780) Adds IMAP Hook to interact with a mail server

2018-10-31 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2780:
---
Comment: was deleted

(was: feluelle closed pull request #3661: [AIRFLOW-2780] Adds IMAP Hook to 
interact with a mail server
URL: https://github.com/apache/incubator-airflow/pull/3661
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/imap_hook.py 
b/airflow/contrib/hooks/imap_hook.py
new file mode 100644
index 00..b9cc44fdaa
--- /dev/null
+++ b/airflow/contrib/hooks/imap_hook.py
@@ -0,0 +1,262 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import email
+import imaplib
+import re
+
+from airflow import LoggingMixin
+from airflow.hooks.base_hook import BaseHook
+
+
+class ImapHook(BaseHook):
+"""
+This hook connects to a mail server by using the imap protocol.
+
+:param imap_conn_id: The connection id that contains the information
+ used to authenticate the client.
+ The default value is 'imap_default'.
+:type imap_conn_id: str
+"""
+
+def __init__(self, imap_conn_id='imap_default'):
+super(ImapHook, self).__init__(imap_conn_id)
+self.conn = self.get_connection(imap_conn_id)
+self.mail_client = imaplib.IMAP4_SSL(self.conn.host)
+
+def __enter__(self):
+self.mail_client.login(self.conn.login, self.conn.password)
+return self
+
+def __exit__(self, exc_type, exc_val, exc_tb):
+self.mail_client.logout()
+
+def has_mail_attachment(self, name, mail_folder='INBOX', 
check_regex=False):
+"""
+Checks the mail folder for mails containing attachments with the given 
name.
+
+:param name: The name of the attachment that will be searched for.
+:type name: str
+:param mail_folder: The mail folder where to look at.
+The default value is 'INBOX'.
+:type mail_folder: str
+:param check_regex: Checks the name for a regular expression.
+The default value is False.
+:type check_regex: bool
+:returns: True if there is an attachment with the given name and False 
if not.
+:rtype: bool
+"""
+mail_attachments = self._retrieve_mails_attachments_by_name(name, 
mail_folder,
+
check_regex,
+
latest_only=True)
+return len(mail_attachments) > 0
+
+def retrieve_mail_attachments(self, name, mail_folder='INBOX', 
check_regex=False,
+  latest_only=False):
+"""
+Retrieves mail's attachments in the mail folder by its name.
+
+:param name: The name of the attachment that will be downloaded.
+:type name: str
+:param mail_folder: The mail folder where to look at.
+The default value is 'INBOX'.
+:type mail_folder: str
+:param check_regex: Checks the name for a regular expression.
+The default value is False.
+:type check_regex: bool
+:param latest_only: If set to True it will only retrieve
+the first matched attachment.
+The default value is False.
+:type latest_only: bool
+:returns: a list of tuple each containing the attachment filename and 
its payload.
+:rtype: a list of tuple
+"""
+mail_attachments = self._retrieve_mails_attachments_by_name(name, 
mail_folder,
+
check_regex,
+
latest_only)
+return mail_attachments
+
+def download_mail_attachments(self, name, local_output_directory, 
mail_folder='INBOX',
+  check_regex=False, latest_only=False):
+"""
+Downloads mail's attachments in the mail folder by its name
+to the local directory.
+
+:param name: The nam

[jira] [Resolved] (AIRFLOW-2703) Scheduler crashes if Mysql Connectivity is lost

2018-10-31 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2703.

   Resolution: Fixed
Fix Version/s: 2.0.0

> Scheduler crashes if Mysql Connectivity is lost
> ---
>
> Key: AIRFLOW-2703
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2703
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0, 2.0.0
>Reporter: raman
>Assignee: Mishika Singh
>Priority: Major
> Fix For: 2.0.0
>
>
> Airflow scheduler crashes if connectivity to Mysql is lost.
> Below is the stack trace:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/src/venv/local/lib/python2.7/site-packages/airflow/jobs.py", line 371, in helper
>     pickle_dags)
>   File "/usr/src/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/src/venv/local/lib/python2.7/site-packages/airflow/jobs.py", line 1762, in process_file
>     dag.sync_to_db()
>   File "/usr/src/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/src/venv/local/lib/python2.7/site-packages/airflow/models.py", line 3816, in sync_to_db
>     session.commit()
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 943, in commit
>     self.transaction.commit()
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 471, in commit
>     t[1].commit()
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1643, in commit
>     self._do_commit()
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1674, in _do_commit
>     self.connection._commit_impl()
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 726, in _commit_impl
>     self._handle_dbapi_exception(e, None, None, None, None)
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
>     exc_info
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
>     reraise(type(exception), exception, tb=exc_tb, cause=cause)
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 724, in _commit_impl
>     self.engine.dialect.do_commit(self.connection)
>   File "/usr/src/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1784, in do_commit
>     dbapi_connection.commit()
> OperationalError: (_mysql_exceptions.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
> Process DagFileProcessor141318-Process:
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2665) No BASH will cause the dag to fail

2018-10-29 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2665.

Resolution: Fixed

> No BASH will cause the dag to fail
> --
>
> Key: AIRFLOW-2665
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2665
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Christian Barra
>Priority: Major
>  Labels: easy-fix
>
> If you are running airflow on a system where bash is not available, the dags 
> will fail, with no logs inside the UI (you have to scroll through the local 
> logs).
> How to replicate this? Just use an alpine-based image:
> ```
> [2018-06-22 21:05:20,659] \{jobs.py:1386} INFO - Processing DAG_1
>  [2018-06-22 21:05:20,667] \{local_executor.py:43} INFO - LocalWorker running 
> airflow run DAG_1 stackoverflow 2018-06-22T21:05:19.384402 --local -sd 
> /usr/local/airflow/dags/my_dag.py
>  /bin/sh: exec: line 1: bash: not found
>  [2018-06-22 21:05:20,671] \{local_executor.py:50} ERROR - Failed to execute 
> task Command 'exec bash -c 'airflow run DAG_1 my_task 
> 2018-06-22T21:05:19.384402 --local -sd /usr/local/airflow/dags/my_dag.py'' 
> returned non-zero exit status 127..
>  /bin/sh: exec: line 1: bash: not found
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3208) Apache airflow 1.8.0 integration with LDAP anonmyously

2018-10-29 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3208.

Resolution: Won't Fix

You have reported this against 1.8 which is two releases (and a few years?) old 
now.

If this is still an issue against 1.10 please re-open this.

> Apache airflow 1.8.0 integration with LDAP anonmyously
> --
>
> Key: AIRFLOW-3208
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3208
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: authentication
>Affects Versions: 1.8.0, 1.8.2
>Reporter: Hari Krishna ADDEPALLI LN
>Priority: Critical
>
> Hello.,
> We wanted to have airflow integrate with LDAP anonymously; the LDAP server is 
> based on either "openldap" or "389 Directory Server". Below is the detail 
> added in airflow.cfg: 
> {noformat}
> [webserver] 
> authenticate = True 
> auth_backend = airflow.contrib.auth.backends.ldap_auth  {noformat}
>   
> {noformat}
> [ldap] 
> uri = ldap://nsp-daf178e8.ad1.prd.us-phx.odc.im:389 
> user_filter =  
> user_name_attr = uid 
> group_member_attr = groupMembership=ou=groups,dc=odc,dc=im 
> superuser_filter = memberOf=cn=rvd-sudo_all-prd_usphx,ou=groups,dc=odc,dc=im 
> data_profiler_filter = 
> bind_user = ou=people,dc=odc,dc=im 
> bind_password = 
> basedn = ou=people,dc=odc,dc=im 
> cacert = /opt/orchestration/airflow/ldap_ca.crt 
> search_scope = SUBTREE{noformat}
> However, when trying to validate, it failed with the exception below; please 
> advise what to correct in the LDAP details given above. We only use 
> "basedn=ou=people,dc=odc,dc=im" with the provided LDAP host and were able to 
> access it anonymously when trying the jxplorer workbench. We are able to use 
> LDAP anonymously with kibana/elasticsearch/jenkins; however, for 
> airflow, please advise a solution.
>  
> {noformat}
> Traceback (most recent call last):
> File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1988, in 
> wsgi_app
> response = self.full_dispatch_request()
> File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1641, in 
> full_dispatch_request
> rv = self.handle_user_exception(e)
> File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1544, in 
> handle_user_exception
> reraise(exc_type, exc_value, tb)
> File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in 
> reraise
> raise value
> File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1639, in 
> full_dispatch_request
> rv = self.dispatch_request()
> File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1625, in 
> dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
> File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, 
> in inner
> return self._run_view(f, *args, **kwargs)
> File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, 
> in _run_view
> return fn(self, *args, **kwargs)
> File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 650, 
> in login
> return airflow.login.login(self, request)
> File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 268, in login
> LdapUser.try_login(username, password)
> File 
> "/usr/local/lib/python3.6/site-packages/airflow/contrib/auth/backends/ldap_auth.py",
>  line 180, in try_login
> search_scope=native(search_scope))
> File "/usr/local/lib/python3.6/site-packages/ldap3/core/connection.py", line 
> 779, in search
> check_names=self.check_names)
> File "/usr/local/lib/python3.6/site-packages/ldap3/operation/search.py", line 
> 372, in search_operation
> request['filter'] = compile_filter(parse_filter(search_filter, schema, 
> auto_escape, auto_encode, validator, check_names).elements[0]) # parse the 
> searchFilter string and compile it starting from the root node
> File "/usr/local/lib/python3.6/site-packages/ldap3/operation/search.py", line 
> 206, in parse_filter
> current_node.append(evaluate_match(search_filter[start_pos:end_pos], schema, 
> auto_escape, auto_encode, validator, check_names))
> File "/usr/local/lib/python3.6/site-packages/ldap3/operation/search.py", line 
> 89, in evaluate_match
> raise LDAPInvalidFilterError('invalid matching assertion')
> ldap3.core.exceptions.LDAPInvalidFilterError: invalid matching assertion
> {noformat}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-1970) Database cannot be initialized if an invalid fernet key is provided

2018-10-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-1970:
---
Fix Version/s: 1.10.1

> Database cannot be initialized if an invalid fernet key is provided
> ---
>
> Key: AIRFLOW-1970
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1970
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.9.0
> Environment: Python 2.7.12
> PostgreSQL 9.6.3
>Reporter: Michael Backes
>Assignee: Jarosław Śmietanka
>Priority: Major
> Fix For: 2.0.0, 1.10.1
>
>
> If I use an invalid fernet key in my config file, I'm not able to run 
> "airflow initdb" successfully.
> For example if I have the following in my config:
> {panel:title=airflow.cfg}
> ...
> \\# Secret key to save connection passwords in the db
> fernet_key = xxx
> ...
> {panel}
> I will get the following error when running "airflow initdb":
> {panel:title=log}
> [2018-01-05 16:43:00,525] {__init__.py:45} INFO - Using executor LocalExecutor
> DB: postgresql+psycopg2://airflow_user:secret_pw@some.address:5432/airflow
> [2018-01-05 16:43:00,624] {db.py:312} INFO - Creating tables
> INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
> INFO  [alembic.runtime.migration] Will assume transactional DDL.
> Traceback (most recent call last):
>   File "/usr/local/bin/airflow", line 27, in 
> args.func(args)
>   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 897, 
> in initdb
> db_utils.initdb()
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 
> 114, in initdb
> schema='airflow_ci'))
>   File "", line 4, in __init__
>   File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/orm/state.py", 
> line 414, in _initialize_instance
> manager.dispatch.init_failure(self, args, kwargs)
>   File 
> "/usr/local/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", 
> line 66, in __exit__
> compat.reraise(exc_type, exc_value, exc_tb)
>   File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/orm/state.py", 
> line 411, in _initialize_instance
> return manager.original_init(*mixed[1:], **kwargs)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 578, 
> in __init__
> self.extra = extra
>   File "", line 1, in __set__
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 639, 
> in set_extra
> fernet = get_fernet()
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 105, 
> in get_fernet
> return Fernet(configuration.get('core', 'FERNET_KEY').encode('utf-8'))
>   File "/usr/local/lib64/python2.7/site-packages/cryptography/fernet.py", 
> line 34, in __init__
> key = base64.urlsafe_b64decode(key)
>   File "/usr/lib64/python2.7/base64.py", line 119, in urlsafe_b64decode
> return b64decode(s.translate(_urlsafe_decode_translation))
>   File "/usr/lib64/python2.7/base64.py", line 78, in b64decode
> raise TypeError(msg)
> TypeError: Incorrect padding
> {panel}
> I also got an error when I try to add extras to a connection, if fernet_key 
> is empty in the config file. The error message was "Incorrect padding". Once 
> I provided a valid key generated with the instructions given 
> [here|http://airflow.readthedocs.io/en/latest/configuration.html?highlight=fernet#connections]
>  and restarted all of the airflow services it worked without any issues.
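For reference, a valid key can be generated with the {{cryptography}} package
(the same approach the linked docs describe):

{code}
from cryptography.fernet import Fernet

# Prints a url-safe base64-encoded 32-byte key suitable for fernet_key in airflow.cfg
print(Fernet.generate_key().decode())
{code}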



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-923) airflow webserver -D flag doesn't daemonize anymore

2018-10-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-923.
---
Resolution: Duplicate

> airflow webserver -D flag doesn't daemonize anymore
> ---
>
> Key: AIRFLOW-923
> URL: https://issues.apache.org/jira/browse/AIRFLOW-923
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: joyce chan
>Priority: Trivial
> Fix For: 1.8.0
>
> Attachments: Screen Shot 2018-02-12 at 10.32.23 AM.png, Screen Shot 
> 2018-02-12 at 10.32.33 AM.png
>
>
> Airflow 1.8 rc4
> airflow webserver -D flag doesn't daemonize anymore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-2674) BashOperator eats stdout and stderr logs

2018-10-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-2674:


Re-opening to resolve as duplicate rather than fixed.

> BashOperator eats stdout and stderr logs
> 
>
> Key: AIRFLOW-2674
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2674
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Tim Swast
>Priority: Minor
>
> I've noticed that when I use the BashOperator, I do not see output from the 
> bash processes in the task logs or even my machine's logs. This makes it 
> difficult-to-impossible to debug problems in a BashOperator task.
> See [related StackOverflow question "Airflow BashOperator log doesn't contain 
> full output"|https://stackoverflow.com/q/43400302/101923].
> Possibly related to https://issues.apache.org/jira/browse/AIRFLOW-1733



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2674) BashOperator eats stdout and stderr logs

2018-10-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2674.

   Resolution: Duplicate
Fix Version/s: (was: 1.10.1)

> BashOperator eats stdout and stderr logs
> 
>
> Key: AIRFLOW-2674
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2674
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Tim Swast
>Priority: Minor
>
> I've noticed that when I use the BashOperator, I do not see output from the 
> bash processes in the task logs or even my machine's logs. This makes it 
> difficult-to-impossible to debug problems in a BashOperator task.
> See [related StackOverflow question "Airflow BashOperator log doesn't contain 
> full output"|https://stackoverflow.com/q/43400302/101923].
> Possibly related to https://issues.apache.org/jira/browse/AIRFLOW-1733



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-839) docker_operator.py attempts to log status key without first checking existence

2018-10-26 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-839:
--
Fix Version/s: 1.10.1

> docker_operator.py attempts to log status key without first checking existence
> --
>
> Key: AIRFLOW-839
> URL: https://issues.apache.org/jira/browse/AIRFLOW-839
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.7.1
> Environment: arch linux
> python 2.7 and python 3
>Reporter: Mike Perry
>Assignee: Mike Perry
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> When pulling a docker image, docker_operator.py attempts to log the `status` 
> key each time it gets output. This is usually fine, but occasionally no 
> `status` key exists. We've seen this happen when we run out of inode space in 
> our cluster and the docker cli is unable to extract the image. This is 
> consistent with the docker HTTP api docs 
> (https://docs.docker.com/engine/api/v1.24/#create-an-image). If an error 
> occurs, there won't be a `status` key. There will be an `error` key. 
> This is a relatively minor bug, but it obscures the real issue and can 
> sometimes make it difficult to figure out what went wrong. 
> If you agree this should be fixed, I can submit a PR that first checks for 
> status before logging. 
> Here's the code that would need to be changed: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/operators/docker_operator.py#L156
> Stack Trace
> [2017-01-10 22:47:21,980] {models.py:1286} ERROR - 'status'
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1245, 
> in run
> result = task_copy.execute(context=context)
>   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/operators/docker_operator.py",
>  line 150, in execute
> logging.info("{}".format(output['status']))
> KeyError: 'status'
> [2017-01-10 22:47:21,982] {models.py:1298} INFO - Marking task as UP_FOR_RETRY
> [2017-01-10 22:47:22,006] {models.py:1327} ERROR - 'status'
> [2017-01-10 22:47:22,704] {jobs.py:159} DEBUG - [heart] Boom.
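A minimal sketch of the guard described in the report above (an illustrative
helper, not the committed Airflow fix):

{code}
import json
import logging

from airflow.exceptions import AirflowException


def log_pull_line(line):
    """Log one line of `docker pull` JSON output without assuming a 'status' key.

    On failure the Docker API sends an 'error' key instead of 'status', so
    blindly reading line['status'] raises KeyError and hides the real problem.
    """
    output = json.loads(line) if isinstance(line, (str, bytes)) else line
    if 'status' in output:
        logging.info(output['status'])
    elif 'error' in output:
        raise AirflowException(output['error'])
{code}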



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-571) allow gunicorn config to be passed to airflow webserver

2018-10-26 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664926#comment-16664926
 ] 

Ash Berlin-Taylor commented on AIRFLOW-571:
---

A generic approach to configuring gunicorn (either CLI args, or a section in 
the config) that we don't have to update to support every possible specific 
argument sounds like a win!
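One generic shape this could take - a sketch only; the [gunicorn] section and
helper below are hypothetical, not existing Airflow options:

{code}
def extra_gunicorn_args(section):
    """Turn a dict read from a hypothetical [gunicorn] config section into
    extra gunicorn command-line flags, e.g.
    {"forwarded_allow_ips": "*"} -> ["--forwarded-allow-ips", "*"]."""
    args = []
    for key, value in section.items():
        args.extend(["--" + key.replace("_", "-"), str(value)])
    return args
{code}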

> allow gunicorn config to be passed to airflow webserver
> ---
>
> Key: AIRFLOW-571
> URL: https://issues.apache.org/jira/browse/AIRFLOW-571
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Reporter: Dennis O'Brien
>Assignee: Iuliia Volkova
>Priority: Major
>
> I have run into an issue when running airflow webserver behind a load 
> balancer where redirects result in https requests forwarded to http.  I ran 
> into a similar issue with Caravel which also uses gunicorn.  
> https://github.com/airbnb/caravel/issues/978  From that issue:
> {quote}
> When gunicorn is run on a different machine from the load balancer (nginx or 
> ELB), it needs to be told explicitly to trust the X-Forwarded-* headers sent. 
> gunicorn takes an option --forwarded-allow-ips which can either be a comma 
> separated list of ip addresses, or "*" to trust all.
> {quote}
> I don't see a simple way to inject custom arguments to the gunicorn call in 
> `webserver()`.  Rather than making a special case to set 
> --forwarded-allow-ips, it would be nice if the caller of `airflow webserver` 
> could pass an additional gunicorn config file.
> The call to gunicorn is already including a -c and I'm not sure gunicorn will 
> take multiple configs, so maybe we have to parse the config and include each 
> name=value on the gunicorn command line.  Any suggestions on how best to 
> allow this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-3257) Pin flake8 version to avoid checks changing over time

2018-10-25 Thread Ash Berlin-Taylor (JIRA)
Ash Berlin-Taylor created AIRFLOW-3257:
--

 Summary: Pin flake8 version to avoid checks changing over time
 Key: AIRFLOW-3257
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3257
 Project: Apache Airflow
  Issue Type: Bug
  Components: ci
Reporter: Ash Berlin-Taylor


Flake8 3.6.0 was just released and it introduced some new checks that didn't 
exist before. As a result all of our CI pipelines are now failing.

To avoid this happening in the future we should pin the version of flake8 to 
3.5.0 (it is currently specified in both tox.ini and setup.py).
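A sketch of what the pin could look like (the exact file contents in the repo
may differ):

{code}
# tox.ini -- pin flake8 so new upstream checks don't suddenly break CI
[testenv:flake8]
deps = flake8==3.5.0

# setup.py -- mirror the same pin in the devel/test extras
'flake8==3.5.0',
{code}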



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1508) Skipped state not part of State.task_states

2018-10-25 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1508.

   Resolution: Fixed
Fix Version/s: 1.10.1

> Skipped state not part of State.task_states
> ---
>
> Key: AIRFLOW-1508
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1508
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Ace Haidrey
>Assignee: Ace Haidrey
>Priority: Major
> Fix For: 1.10.1
>
>
> In the airflow.state module, 
> [State.task_state|https://github.com/apache/incubator-airflow/blob/master/airflow/utils/state.py#L44]
>  doesn't include the {{SKIPPED}} state even though the {{TaskInstance}} 
> object has it. I was wondering if this was on purpose or a bug. I would think 
> it should be part of the task_state object since that makes sense and will 
> help some of my workflows to not have to add this in manually. 
> I'm not sure who the appropriate person to ask is so thinking I'll tag some 
> people and get some feedback (hopefully that's okay)..
> CC [~criccomini] [~bolke] [~allisonwang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-2524) Airflow integration with AWS Sagemaker

2018-10-25 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-2524:


> Airflow integration with AWS Sagemaker
> --
>
> Key: AIRFLOW-2524
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2524
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, contrib
>Reporter: Rajeev Srinivasan
>Assignee: Yang Yu
>Priority: Major
>  Labels: AWS
> Fix For: 1.10.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Would it be possible to orchestrate an end to end  AWS  Sagemaker job using 
> Airflow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3224) Flask Errors when Installing Airflow 1.10 in Kubernetes

2018-10-25 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3224.

Resolution: Duplicate

> Flask Errors when Installing Airflow 1.10 in Kubernetes
> ---
>
> Key: AIRFLOW-3224
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3224
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: celery, dependencies, kubernetes, scheduler, webapp
>Affects Versions: 1.10.0
>Reporter: Brooke
>Priority: Blocker
>
> I am currently working on deploying Apache Airflow 1.10.0 to a Kubernetes 
> cluster. I am running into some dependency issues with Flask.
> If I use the current version of flask-login (0.4.1), I receive this error:
> {{apache-airflow 1.10.0 has requirement flask-login==0.2.11, but you'll have 
> flask-login 0.4.1 which is incompatible.}}
> With this error, the UI won't render, and instead, I see a text bomb followed 
> by many flask-appbuilder/flask-login warnings.
> If I use the Airflow's requirement of flask-login (0.2.11), I receive this 
> error:

> {{flask-appbuilder 1.12.0 has requirement Flask-Login<0.5,>=0.3, but you'll 
> have flask-login 0.2.11 which is incompatible.}}
> With this error, the UI renders with Airflow 1.9 features and CeleryExecutor 
> won't work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2674) BashOperator eats stdout and stderr logs

2018-10-25 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663468#comment-16663468
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2674:


Was this when you were running {{airflow test}}, or when the task was run 
normally by the scheduler?

(Possible duplicate of AIRFLOW-3064)

> BashOperator eats stdout and stderr logs
> 
>
> Key: AIRFLOW-2674
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2674
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Reporter: Tim Swast
>Priority: Minor
> Fix For: 1.10.1
>
>
> I've noticed that when I use the BashOperator, I do not see output from the 
> bash processes in the task logs or even my machine's logs. This makes it 
> difficult-to-impossible to debug problems in a BashOperator task.
> See [related StackOverflow question "Airflow BashOperator log doesn't contain 
> full output"|https://stackoverflow.com/q/43400302/101923].
> Possibly related to https://issues.apache.org/jira/browse/AIRFLOW-1733



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3172) AttributeError: 'DagModel' object has no attribute 'execution_date'

2018-10-24 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3172:
---
Affects Version/s: 1.10.0
Fix Version/s: (was: 1.10.0)
   1.10.1

> AttributeError: 'DagModel' object has no attribute 'execution_date'
> ---
>
> Key: AIRFLOW-3172
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3172
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
> Environment: Docker Environment: python:3.6-stretch
>Reporter: Vinnson Lee
>Priority: Major
> Fix For: 1.10.1
>
>
> 2018-10-09 10:13:28,430] ERROR in app: Exception on /admin/dagmodel/ [GET]
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in 
> wsgi_app
>  response = self.full_dispatch_request()
>  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in 
> full_dispatch_request
>  rv = self.handle_user_exception(e)
>  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in 
> handle_user_exception
>  reraise(exc_type, exc_value, tb)
>  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in 
> reraise
>  raise value
>  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in 
> full_dispatch_request
>  rv = self.dispatch_request()
>  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in 
> dispatch_request
>  return self.view_functions[rule.endpoint](**req.view_args)
>  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, 
> in inner
>  return self._run_view(f, *args, **kwargs)
>  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, 
> in _run_view
>  return fn(self, *args, **kwargs)
>  File "/usr/local/lib/python3.6/site-packages/flask_admin/model/base.py", 
> line 1900, in index_view
>  return_url=self._get_list_url(view_args),
>  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 308, 
> in render
>  return render_template(template, **kwargs)
>  File "/usr/local/lib/python3.6/site-packages/flask/templating.py", line 134, 
> in render_template
>  context, ctx.app)
>  File "/usr/local/lib/python3.6/site-packages/flask/templating.py", line 116, 
> in _render
>  rv = template.render(context)
>  File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py", line 
> 989, in render
>  return self.environment.handle_exception(exc_info, True)
>  File "/usr/local/lib/python3.6/site-packages/jinja2/environment.py", line 
> 754, in handle_exception
>  reraise(exc_type, exc_value, tb)
>  File "/usr/local/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in 
> reraise
>  raise value.with_traceback(tb)
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/www/templates/airflow/list_dags.html",
>  line 22, in top-level template code
>  \{% import 'admin/actions.html' as actionlib with context %}
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/www/templates/admin/master.html",
>  line 18, in top-level template code
>  \{% extends 'admin/base.html' %}
>  File 
> "/usr/local/lib/python3.6/site-packages/flask_admin/templates/bootstrap3/admin/base.html",
>  line 30, in top-level template code
>  \{% block page_body %}
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/www/templates/admin/master.html",
>  line 107, in block "page_body"
>  \{% block body %}
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/www/templates/airflow/list_dags.html",
>  line 67, in block "body"
>  \{% block model_list_table %}
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/www/templates/airflow/list_dags.html",
>  line 115, in block "model_list_table"
>  \{% block list_row scoped %}
>  File 
> "/usr/local/lib/python3.6/site-packages/airflow/www/templates/airflow/list_dags.html",
>  line 143, in block "list_row"
>  \{{ get_value(row, c) }}
>  File "/usr/local/lib/python3.6/site-packages/flask_admin/model/base.py", 
> line 1742, in get_list_value
>  self.column_type_formatters,
>  File "/usr/local/lib/python3.6/site-packages/flask_admin/model/base.py", 
> line 1707, in _get_list_value
>  value = column_fmt(self, context, model, name)
>  File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 
> 124, in dag_link
>  execution_date=m.execution_date)
> AttributeError: 'DagModel' object has no attribute 'execution_date'
>  
>  
> It's fine with SQLite, but not with MySQL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3249) unify do_xcom_push for all operators

2018-10-24 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662217#comment-16662217
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3249:


Interesting point about the `xcom_push()` vs `self.xcom_push` incompatibility. 
I feel we must be missing something, otherwise someone would have complained 
about it not working by now... Hmm! Maybe it never worked for SSHOperator etc., 
in which case we don't have to worry about back-compat.

For Docker, Bash and HTTP we can have a back-compat check in the constructors 
that looks for `xcom_push_flag` being set, issues a DeprecationWarning and then 
passes that value down as `do_xcom_push`.
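A minimal sketch of that shim, using a hypothetical operator - not the actual
Airflow implementation:

{code}
import warnings

from airflow.models import BaseOperator


class MyOperator(BaseOperator):
    """Illustrative only: map the old xcom_push kwarg onto do_xcom_push."""

    def __init__(self, xcom_push=None, do_xcom_push=True, *args, **kwargs):
        super(MyOperator, self).__init__(*args, **kwargs)
        if xcom_push is not None:
            # Old-style flag still honoured, but warn users to move over.
            warnings.warn(
                "`xcom_push` is deprecated; use `do_xcom_push` instead",
                DeprecationWarning,
                stacklevel=2,
            )
            do_xcom_push = xcom_push
        self.do_xcom_push = do_xcom_push
{code}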

> unify do_xcom_push for all operators
> 
>
> Key: AIRFLOW-3249
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3249
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Ben Marengo
>Assignee: Ben Marengo
>Priority: Major
>
> following the implementation of AIRFLOW-3207 (global option to stop task 
> pushing result to xcom), i did a quick search around to find out which 
> operators have a custom implementation of this {{do_xcom_push}} flag:
> ||operator||instance var||__init__ arg||will change be backward comp?||
> |DatabricksRunNowOperator|do_xcom_push | do_xcom_push|(/)|
> |DatabricksSubmitRunOperator|do_xcom_push| do_xcom_push|(/)|
> |DatastoreExportOperator|xcom_push| xcom_push|(x)|
> |DatastoreImportOperator|xcom_push| xcom_push|(x)|
> |KubernetesPodOperator|xcom_push|xcom_push |(x)|
> |SSHOperator|xcom_push|xcom_push |(x)|
> |WinRMOperator|xcom_push| xcom_push|(x)|
> |BashOperator|xcom_push_flag|xcom_push|(x)|
> |DockerOperator|xcom_push_flag|xcom_push|(x)|
> |SimpleHttpOperator|xcom_push_flag|xcom_push|(x)|
> this custom implementation should be removed.
> i presume also that the operators with instance var = xcom_push conflict with 
> method BaseOperator.xcom_push() and probably aren't working properly anyway!?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3248) Conda support

2018-10-24 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661921#comment-16661921
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3248:


It is not clear what you are asking for in this ticket.

> Conda support
> -
>
> Key: AIRFLOW-3248
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3248
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: ravi satya durga prasad Yenugula
>Priority: Major
>
> Installation support of airflow on conda environment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3247) Unable to install airflow with Anaconda(py37) enviroment

2018-10-24 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3247.

Resolution: Duplicate

> Unable to install airflow with Anaconda(py37) enviroment
> 
>
> Key: AIRFLOW-3247
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3247
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: python 3.7
> Anaconda 3-5.3
>Reporter: ravi satya durga prasad Yenugula
>Priority: Major
>
> I am facing 'command "python setup.py egg_info" failed with error code 1' 
> error  with the latest Anaconda environment



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1159) Update DockerOperator to use 2.2.X docker-py API

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1159.

   Resolution: Duplicate
Fix Version/s: (was: 1.8.0)

> Update DockerOperator to use 2.2.X docker-py API
> 
>
> Key: AIRFLOW-1159
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1159
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 2.0.0
> Environment: 410ppm CO2 :(
>Reporter: Russell Jurney
>Assignee: Russell Jurney
>Priority: Major
>
> Currently DockerOperator uses docker-py version 1.6.0, which is pretty old 
> (Nov 2015). I would like to upgrade it to use the latest API (2.2.x).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-944) Docker operator does not work with Docker >= 1.19

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-944.
---
Resolution: Duplicate

> Docker operator does not work with Docker >= 1.19
> -
>
> Key: AIRFLOW-944
> URL: https://issues.apache.org/jira/browse/AIRFLOW-944
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.7.1.3, 1.8.0rc4
> Environment: Ubuntu 16.04
>Reporter: Ludovic Claude
>Priority: Major
>
> Docker operator does not work when mem_limit is set and Docker version 1.19 
> or more recent is used.
> Here are the logs, I have seen this issue with Airflow 1.7.1.3 and Airflow 
> 1.8.0 rc4.
> [2017-03-06 11:37:54,895] {base_task_runner.py:95} INFO - Subtask: 
> [2017-03-06 11:37:54,895] {docker_operator.py:132} INFO - Starting docker 
> container from image hbpmip/mipmap
> [2017-03-06 11:37:54,903] {base_task_runner.py:95} INFO - Subtask: 
> [2017-03-06 11:37:54,902] {models.py:1417} ERROR - mem_limit has been moved to
>  host_config in API version 1.19
> [2017-03-06 11:37:54,903] {base_task_runner.py:95} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python3.5/dist-packages/airflow/models.py", line 1369, in run
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: result 
> = task_copy.execute(context=context)
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask:   File 
> "/tmp/src/airflow-imaging-plugins/airflow_pipeline/operators/docker_pipeline_operator.py",
>  line 191, in execute
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: logs = 
> super(DockerPipelineOperator, self).execute(context)
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python3.5/dist-packages/airflow/operators/docker_operator.py",
>  line 172, in execute
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: 
> user=self.user
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python3.5/dist-packages/docker/api/container.py", line 133, 
> in create_container
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: 
> volume_driver, stop_signal, networking_config,
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python3.5/dist-packages/docker/api/container.py", line 138, 
> in create_container_config
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: return 
> utils.create_container_config(self._version, *args, **kwargs)
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python3.5/dist-packages/docker/utils/utils.py", line 1041, in 
> create_container_config
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: 
> 'mem_limit has been moved to host_config in API version 1.19'
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: 
> docker.errors.InvalidVersion: mem_limit has been moved to host_config in API 
> version 1.19
> [2017-03-06 11:37:54,904] {base_task_runner.py:95} INFO - Subtask: 
> [2017-03-06 11:37:54,903] {models.py:1433} INFO - Marking task as UP_FOR_RETRY
> [2017-03-06 11:37:54,912] {base_task_runner.py:95} INFO - Subtask: 
> [2017-03-06 11:37:54,912] {models.py:1462} ERROR - mem_limit has been moved 
> to host_config in API version 1.19
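For context, with Docker API >= 1.19 the memory limit has to be passed via a
host config rather than directly to create_container(). A minimal sketch using
the docker-py 2.x {{APIClient}} (not the Airflow fix itself):

{code}
import docker

client = docker.APIClient(base_url='unix://var/run/docker.sock')

# mem_limit now belongs in the host config, not in create_container() itself
host_config = client.create_host_config(mem_limit='512m')
container = client.create_container(image='hbpmip/mipmap',
                                    host_config=host_config)
{code}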



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2976) pipy docker dependency broken

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2976.

Resolution: Duplicate

> pipy docker dependency broken
> -
>
> Key: AIRFLOW-2976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2976
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies, docker
>Affects Versions: 2.0.0
>Reporter: Lin, Yi-Li
>Assignee: Gerardo Curiel
>Priority: Major
> Attachments: docker_dag.log, docker_dag.py
>
>
> I'm trying to install airflow with docker extras but airflow's dependency 
> will install recent docker-py (3.5.0) from pypi which is incompatible with 
> current DockerOperator.
> DockerOperator will complain that "create_container() got an unexpected 
> keyword argument 'cpu_shares'".
> It looks like that interface is changed from docker-py 3.0.0 and work with 
> docker-py 2.7.0.
> The log and dag file are in attachments.
> Note, installation command: "AIRFLOW_GPL_UNIDECODE=yes pip install 
> 'apache-airflow[docker,mysql]==1.10.0'"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2288) Source tarball should not extract to root

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660971#comment-16660971
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2288:


1.9:

{code}
$ http 
https://archive.apache.org/dist/incubator/airflow/1.9.0-incubating/apache-airflow-1.9.0+incubating-source.tar.gz
 | tar -tz
.codecov.yml
.coveragerc
.editorconfig
.github/
.github/PULL_REQUEST_TEMPLATE.md
{code}

1.10

{code}
$ http 
https://archive.apache.org/dist/incubator/airflow/1.10.0-incubating/apache-airflow-1.10.0+incubating-source.tar.gz
 | tar -tz
apache-airflow-1.10.0rc4+incubating/
apache-airflow-1.10.0rc4+incubating/.codecov.yml
apache-airflow-1.10.0rc4+incubating/.coveragerc
apache-airflow-1.10.0rc4+incubating/.editorconfig
apache-airflow-1.10.0rc4+incubating/.github/
apache-airflow-1.10.0rc4+incubating/.github/PULL_REQUEST_TEMPLATE.md
{code}

> Source tarball should not extract to root
> -
>
> Key: AIRFLOW-2288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2288
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.0
>
>
> {color:#454545}the src tarball extracting to the current{color}
> directory was surprising.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2288) Source tarball should not extract to root

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2288.

Resolution: Fixed

Yup, fixed in 1.10.0

> Source tarball should not extract to root
> -
>
> Key: AIRFLOW-2288
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2288
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Priority: Blocker
> Fix For: 1.10.0
>
>
> {color:#454545}the src tarball extracting to the current{color}
> directory was surprising.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2744) RBAC app doesn't integrate plugins (blueprints etc)

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2744:
---
Affects Version/s: (was: 2.0.0)
   1.10.0

> RBAC app doesn't integrate plugins (blueprints etc)
> ---
>
> Key: AIRFLOW-2744
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2744
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp, webserver
>Affects Versions: 1.10.0
>Reporter: David Dossett
>Priority: Major
> Fix For: 2.0.0, 1.10.1
>
>
> In the current 1.10.0rc tag, the new RBAC app doesn't integrate any plugins 
> created by a user extending Airflow. In the old www/app.py you had the 
> [integrate_plugins|https://github.com/apache/incubator-airflow/blob/f1083cbada337731ed0b7e27b09eee7a26c8189a/airflow/www/app.py#L126]
>  function. But currently the 
> [www_rbac/app.py|https://github.com/apache/incubator-airflow/blob/f1083cbada337731ed0b7e27b09eee7a26c8189a/airflow/www_rbac/app.py]
>  doesn't pull in any plugins from the plugin_manager. So nothing you do to 
> extend Airflow's webapp will work.
> I think adding the code for registering the blueprints and menu links is a 
> pretty simple fix. I'm not sure how the FAB system is handling the same 
> functionality as Flask-Admin views though.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2744) RBAC app doesn't integrate plugins (blueprints etc)

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2744.

   Resolution: Fixed
Fix Version/s: 1.10.1
   2.0.0

> RBAC app doesn't integrate plugins (blueprints etc)
> ---
>
> Key: AIRFLOW-2744
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2744
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp, webserver
>Affects Versions: 2.0.0
>Reporter: David Dossett
>Priority: Major
> Fix For: 2.0.0, 1.10.1
>
>
> In the current 1.10.0rc tag, the new RBAC app doesn't integrate any plugins 
> created by a user extending Airflow. In the old www/app.py you had the 
> [integrate_plugins|https://github.com/apache/incubator-airflow/blob/f1083cbada337731ed0b7e27b09eee7a26c8189a/airflow/www/app.py#L126]
>  function. But currently the 
> [www_rbac/app.py|https://github.com/apache/incubator-airflow/blob/f1083cbada337731ed0b7e27b09eee7a26c8189a/airflow/www_rbac/app.py]
>  doesn't pull in any plugins from the plugin_manager. So nothing you do to 
> extend Airflow's webapp will work.
> I think adding the code for registering the blueprints and menu links is a 
> pretty simple fix. I'm not sure how the FAB system is handling the same 
> functionality as Flask-Admin views though.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2976) pipy docker dependency broken

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660912#comment-16660912
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2976:


I think the change in AIRFLOW-3203 fixes this issue. Could you check and let me 
know?

> pipy docker dependency broken
> -
>
> Key: AIRFLOW-2976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2976
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies, docker
>Affects Versions: 2.0.0
>Reporter: Lin, Yi-Li
>Assignee: Gerardo Curiel
>Priority: Major
> Attachments: docker_dag.log, docker_dag.py
>
>
> I'm trying to install airflow with docker extras but airflow's dependency 
> will install recent docker-py (3.5.0) from pypi which is incompatible with 
> current DockerOperator.
> DockerOperator will complain that "create_container() got an unexpected 
> keyword argument 'cpu_shares'".
> It looks like that interface is changed from docker-py 3.0.0 and work with 
> docker-py 2.7.0.
> The log and dag file are in attachments.
> Note, installation command: "AIRFLOW_GPL_UNIDECODE=yes pip install 
> 'apache-airflow[docker,mysql]==1.10.0'"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2976) pipy docker dependency broken

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660909#comment-16660909
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2976:


I think this one has been fixed. Let me dig out the fix.

> pipy docker dependency broken
> -
>
> Key: AIRFLOW-2976
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2976
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: dependencies, docker
>Affects Versions: 2.0.0
>Reporter: Lin, Yi-Li
>Assignee: Gerardo Curiel
>Priority: Major
> Attachments: docker_dag.log, docker_dag.py
>
>
> I'm trying to install airflow with docker extras but airflow's dependency 
> will install recent docker-py (3.5.0) from pypi which is incompatible with 
> current DockerOperator.
> DockerOperator will complain that "create_container() got an unexpected 
> keyword argument 'cpu_shares'".
> It looks like that interface is changed from docker-py 3.0.0 and work with 
> docker-py 2.7.0.
> The log and dag file are in attachments.
> Note, installation command: "AIRFLOW_GPL_UNIDECODE=yes pip install 
> 'apache-airflow[docker,mysql]==1.10.0'"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-1163) Add support for x-forwarded-* headers to support access behind AWS ELB

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-1163:
---
Fix Version/s: 1.10.1
  Summary: Add support for x-forwarded-* headers  to support access 
behind AWS ELB  (was: Cannot Access Airflow Webserver Behind AWS ELB)

> Add support for x-forwarded-* headers  to support access behind AWS ELB
> ---
>
> Key: AIRFLOW-1163
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1163
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: Tim
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0, 1.10.1
>
>
> Cannot access airflow from behind a load balancer.
> If we go directly to the IP of the server it loads just fine. When trying to 
> use the load balancer cname and forward the request it does not load.
> We updated the base_url to be the LB url but it still does not work. The page 
> sits and spins forever. Eventually it loads some ui elements.
> Here is what I see on the network tab:
> https://puu.sh/vC4Zp/e34131.png
> Here is what our config looks like:
> {code}
> [webserver]
> # The base url of your website as airflow cannot guess what domain or
> # cname you are using. This is use in automated emails that
> # airflow sends to point links to the right web server
> base_url = 
> http://internal-st-airflow-lb-590109685.us-east-1.elb.amazonaws.com:80
> # The ip specified when starting the web server
> web_server_host = 0.0.0.0
> # The port on which to run the web server
> web_server_port = 8080
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3137) Make ProxyFix middleware optional

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3137:
---
Fix Version/s: 1.10.1

> Make ProxyFix middleware optional
> -
>
> Key: AIRFLOW-3137
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3137
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Trivial
> Fix For: 2.0.0, 1.10.1
>
>
> The werkzeug ProxyFix middleware should only be used when the app is run 
> behind a trusted proxy. We should enable ProxyFix via a configuration flag, 
> like in superset.
> From the werkzeug docs: "Do not use this middleware in non-proxy setups for 
> security reasons."
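A sketch of how the flag could be wired up, assuming the werkzeug < 0.15 import
path and a hypothetical {{enable_proxy_fix}} webserver option:

{code}
from flask import Flask
from werkzeug.contrib.fixers import ProxyFix  # moved in werkzeug >= 0.15

app = Flask(__name__)

# Hypothetical config flag: only trust X-Forwarded-* headers when the
# operator confirms the app actually runs behind a trusted proxy.
enable_proxy_fix = True
if enable_proxy_fix:
    app.wsgi_app = ProxyFix(app.wsgi_app)
{code}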



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3244) Introduce offset on the execution date for data assessment

2018-10-23 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660409#comment-16660409
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3244:


Not commenting on the whole feature request, but:

> I also use Celery executor, so its workers keep polling during those 2 days, 
> making them unavailable for other DAGs.

In the next release, sensors can be configured to check once and then release 
the executor slot, which would address that point.
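For illustration, the kind of configuration meant here - a sketch assuming the
{{mode="reschedule"}} sensor option lands as planned; the DAG and values are
made up:

{code}
from datetime import datetime, timedelta

from airflow import DAG
from airflow.sensors.time_delta_sensor import TimeDeltaSensor

dag = DAG(dag_id="weekly_report",
          start_date=datetime(2018, 1, 1),
          schedule_interval="0 0 * * 3")  # Wednesdays at midnight UTC

# In "reschedule" mode the sensor releases its executor slot between pokes
# instead of blocking a Celery worker for the whole two-day wait.
wait_for_ga_data = TimeDeltaSensor(task_id="wait_for_ga_data",
                                   delta=timedelta(days=2),
                                   mode="reschedule",
                                   poke_interval=60 * 60,  # re-check hourly
                                   dag=dag)
{code}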

> Introduce offset on the execution date for data assessment
> --
>
> Key: AIRFLOW-3244
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3244
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG
>Affects Versions: 1.10.0
>Reporter: Alberto Anceschi
>Priority: Minor
>  Labels: features, request
>
> Hi everyone,
>  
> I'm trying to port my current cronjobs into Airflow. Let's consider a real 
> case scenario: I've to send every week a report and through the pipeline data 
> from Google Analytics needs to be collected, so I need 2 days before running 
> the DAG (data assessment). Week starts on Monday and ends on Sunday, so I 
> need the DAG to run on Wednesday at Midnight UTC.
> In order to see on the Airflow dashboard start_date/exection_date that make 
> sense to me, for now I've used a TimeDeltaSensor that adds that 2 day offset 
> I need, but this is not its purpose. I also use Celery executor, so its 
> workers keep polling during those 2 days, making them unavailable for other 
> DAGs.
> I think that the assumption that at the end of the period scheduled data are 
> ready is not correct and at the same time it's much more intuitive seeing on 
> the dashboard Monday execution dates instead of Tuesday ones.
>  
> What do you think about this request? Any suggestion? Thank you,
>  
> Alberto
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2397) Support affinity policies for Kubernetes executor/operator

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2397:
---
Fix Version/s: (was: 1.10.0)
   1.10.1

> Support affinity policies for Kubernetes executor/operator
> --
>
> Key: AIRFLOW-2397
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2397
> Project: Apache Airflow
>  Issue Type: Sub-task
>Reporter: Sergio B
>Assignee: roc chan
>Priority: Major
> Fix For: 2.0.0, 1.10.1
>
>
> In order to have fine control over workload distribution, implementing the 
> ability to set affinity policies in Kubernetes would solve complex problems: 
> https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#affinity-v1-core



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2854) kubernetes_pod_operator add more configuration items

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2854:
---
Fix Version/s: 1.10.1

> kubernetes_pod_operator add more configuration items
> 
>
> Key: AIRFLOW-2854
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2854
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 2.0.0
>Reporter: pengchen
>Assignee: pengchen
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> kubernetes_pod_operator is missing kubernetes pods related configuration 
> items, as follows:
> 1. image_pull_secrets
> _Pull secrets_ are used to _pull_ private container _images_ from registries. 
> In this case, we need to configure the image_pull_secrets in pod spec file
> 2. service_account_name
> When kubernetes is running on rbac Authorization. If it is a job that needs 
> to operate on kubernetes resources, we need to configure service account.
> 3. is_delete_operator_pod
> This option can be given to the user to decide whether to delete the job pod 
> created by pod_operator, which is currently not processed.
> 4. hostnetwork



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2662) support affinity & nodeSelector policies for kubernetes executor/operator

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2662:
---
Fix Version/s: 1.10.1

> support affinity & nodeSelector policies for kubernetes executor/operator
> -
>
> Key: AIRFLOW-2662
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2662
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 2.0.0
>Reporter: pengchen
>Assignee: pengchen
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> In this issue(https://issues.apache.org/jira/browse/AIRFLOW-2397), only the 
> affinity function of the kubernetes operator pod is provided, and the 
> affinity function of the kubernetes executor pod is not supported. The full 
> affinity and nodeselector function are provided here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AIRFLOW-3239) Test discovery partial fails due to incorrect name of the test files

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reopened AIRFLOW-3239:


> Test discovery partial fails due to incorrect name of the test files
> 
>
> Key: AIRFLOW-3239
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3239
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Major
>
> In PR [https://github.com/apache/incubator-airflow/pull/4049,] I have fixed 
> the incorrect name of some test files (resulting in partial failure in test 
> discovery).
> There are some other scripts with this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3239) Test discovery partial fails due to incorrect name of the test files

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658880#comment-16658880
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3239:


A couple more files:

tests/api/common/experimental/mark_tasks.py
tests/api/common/experimental/trigger_dag_tests.py
tests/impersonation.py
tests/jobs.py
tests/models.py
tests/plugins_manager.py
tests/utils.py
tests/operators/bash_operator.py
tests/operators/operator.py

I think the tests in models.py are being loaded from tests/\_\_init\_\_.py so 
they are being run. But we should remove the need for imports in 
tests/\_\_init\_\_.py et al. and rename the rest of the files properly.

> Test discovery partial fails due to incorrect name of the test files
> 
>
> Key: AIRFLOW-3239
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3239
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Major
>
> In PR [https://github.com/apache/incubator-airflow/pull/4049,] I have fixed 
> the incorrect name of some test files (resulting in partial failure in test 
> discovery).
> There are some other scripts with this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3025) Allow to specify dns and dns-search parameters for DockerOperator

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3025:
---
Fix Version/s: 1.10.1

> Allow to specify dns and dns-search parameters for DockerOperator
> -
>
> Key: AIRFLOW-3025
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3025
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Konrad Gołuchowski
>Assignee: Konrad Gołuchowski
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> Docker allows to specify dns and dns-search options when starting a 
> container. It would be useful to enable DockerOperator to use these two 
> options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2574) initdb fails when mysql password contains percent sign

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2574.

Resolution: Fixed

> initdb fails when mysql password contains percent sign
> --
>
> Key: AIRFLOW-2574
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2574
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: db
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Zihao Zhang
>Priority: Minor
> Fix For: 1.10.1
>
>
> [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345]
>  uses 
> [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option]
>  which says "A raw percent sign not part of an interpolation symbol must 
> therefore be escaped"
> When there is a percent sign in database connection string, this will crash 
> due to bad interpolation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2421) HTTPHook and SimpleHTTPOperator do not verify certificates by default

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2421:
---
Fix Version/s: 1.10.1
  Component/s: security

> HTTPHook and SimpleHTTPOperator do not verify certificates by default
> -
>
> Key: AIRFLOW-2421
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2421
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks, security
>Affects Versions: 1.8.0
>Reporter: David Adrian
>Priority: Major
> Fix For: 1.10.1
>
>
> To verify HTTPS certificates when using anything built with an HTTP hook, you 
> have to explicitly pass the undocumented {{extra_options = \{"verify": True} 
> }}. The offending line is at 
> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/http_hook.py#L103.
> {code}
> response = session.send(
> 
> verify=extra_options.get("verify", False),
> 
> )
> {code}
> Not only is this the opposite default of what is expected, the necessary 
> requirements to verify certificates (e.g certifi), are already installed as 
> part of Airflow. I haven't dug through all of the code yet, but I'm concerned 
> that any other connections, operators or hooks built using HTTP hook don't 
> pass this option in.
> Instead, the HTTP hook should default to {{verify=True}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2421) HTTPHook and SimpleHTTPOperator do not verify certificates by default

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658789#comment-16658789
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2421:


I think we should change the default to {{verify=True}} - not verifying is the 
wrong default value.

Additionally, I think the "default" values for the extra options should come 
from the connection's extra field, with any settings from the per-call dict 
merged on top.
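A rough sketch of that merging behaviour (a hypothetical helper, not the
current hook code):

{code}
import json


def merged_extra_options(connection, per_call_options=None):
    """Start from the connection's Extra field, overlay per-call options,
    and default 'verify' to True when neither side sets it."""
    options = {"verify": True}
    if connection.extra:
        options.update(json.loads(connection.extra))
    options.update(per_call_options or {})
    return options
{code}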



> HTTPHook and SimpleHTTPOperator do not verify certificates by default
> -
>
> Key: AIRFLOW-2421
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2421
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks, security
>Affects Versions: 1.8.0
>Reporter: David Adrian
>Priority: Major
> Fix For: 1.10.1
>
>
> To verify HTTPS certificates when using anything built with an HTTP hook, you 
> have to explicitly pass the undocumented {{extra_options = \{"verify": True} 
> }}. The offending line is at 
> https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/http_hook.py#L103.
> {code}
> response = session.send(
> 
> verify=extra_options.get("verify", False),
> 
> )
> {code}
> Not only is this the opposite default of what is expected, the necessary 
> requirements to verify certificates (e.g certifi), are already installed as 
> part of Airflow. I haven't dug through all of the code yet, but I'm concerned 
> that any other connections, operators or hooks built using HTTP hook don't 
> pass this option in.
> Instead, the HTTP hook should default to {{verify=True}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-2618) Improve UI by add "Next Run" column

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-2618.
--
Resolution: Duplicate

> Improve UI by add "Next Run" column
> ---
>
> Key: AIRFLOW-2618
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2618
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: jack
>Priority: Minor
>
> Please add also a column in the UI for "Next Run". Ideally when passing mouse 
> over it we will also see the 5 next scheduled runs.
> This can be very helpful.
> If for some reason you think this is an "overhead" why not adding it and 
> allow a "personalized UI" feature where the user can set if this column will 
> appear or not. This can be a very good feature in allowing users 
> personalizing their own UI columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-63) Dangling Running Jobs

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658765#comment-16658765
 ] 

Ash Berlin-Taylor commented on AIRFLOW-63:
--

Possibly, though if the scheduler process is killed hard (OOM, segfault, etc.) 
there may still be cases where the job remains marked as running. So I'd say 
"not quite yet": this is still possibly an issue (at least it isn't fixed by my 
PR).

> Dangling Running Jobs
> -
>
> Key: AIRFLOW-63
> URL: https://issues.apache.org/jira/browse/AIRFLOW-63
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.7.0
> Environment: mac os X with local executor
>Reporter: Giacomo Tagliabe
>Priority: Minor
>
> It seems that if the scheduler is killed unexpectedly, the SchedulerJob 
> remains marked as running. Same thing applies to LocalTaskJob: if a job is 
> running when the scheduler dies, the job remains marked as running forever. 
> I'd expect `kill_zombies` to mark the job with an old heartbeat as not 
> running, but it seems it only marks the related task instances. This to me 
> seems like a bug, I also fail to see the piece of code that  is supposed to 
> do that, which leads me to think that this is not handled at all. I don't 
> think there is anything really critical about having stale jobs marked as 
> running, but they definitely is confusing to see



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3238) Dags, removed from the filesystem, are not deactivated on initdb

2018-10-22 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3238:
---
Fix Version/s: 1.10.1

> Dags, removed from the filesystem, are not deactivated on initdb
> 
>
> Key: AIRFLOW-3238
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3238
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DAG
>Reporter: Jason Shao
>Assignee: Jason Shao
>Priority: Major
> Fix For: 1.10.1
>
>
> Removed dags continue to show up in the airflow UI. This can only be 
> remedied, currently, by either deleting the dag or modifying the internal 
> meta db. Fix models.DAG.deactivate_unknown_dags so that removed dags are 
> automatically deactivated (hidden from the UI) on restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1788) EMR hook doesn't pass through SecurityConfiguration parameter

2018-10-19 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1788.

Resolution: Fixed

AIRFLOW-3197 allows this (and any other params in the future)
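
For anyone landing here: with that change, extra boto3 parameters can be passed 
straight through in the overrides dict. A rough sketch, assuming a version that 
includes the AIRFLOW-3197 change (the values are made up):

{code}
from airflow.contrib.operators.emr_create_job_flow_operator import (
    EmrCreateJobFlowOperator,
)

create_cluster = EmrCreateJobFlowOperator(
    task_id="create_emr_cluster",
    aws_conn_id="aws_default",
    emr_conn_id="emr_default",
    job_flow_overrides={
        "Name": "example-cluster",
        # Forwarded as-is to boto3's run_job_flow(), so parameters such as
        # SecurityConfiguration no longer depend on hard-coded names in the hook.
        "SecurityConfiguration": "example-security-config",
    },
)
{code}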

> EMR hook doesn't pass through SecurityConfiguration parameter
> -
>
> Key: AIRFLOW-1788
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1788
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Sam Kitajima-Kimbrel
>Assignee: Sam Kitajima-Kimbrel
>Priority: Major
>
> Expected: the `SecurityConfiguration` argument to boto3's run_job_flow() method 
> is passed through via the EMR hook in Airflow.
> Actual: Not passed through, because of hard-coded parameter names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3215) Creating EMR using python from airflow

2018-10-19 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656467#comment-16656467
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3215:


That is an error from a Cloudformation stack provided by AWS, not Airflow, so 
it isn't something we can help with.

> Creating EMR using python from airflow
> --
>
> Key: AIRFLOW-3215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3215
> Project: Apache Airflow
>  Issue Type: Task
>  Components: aws, DAG
>Affects Versions: 1.7.0
> Environment: Airflow with boto3 - connecting AWS -configure with 
> access and security 
>Reporter: Pandu
>Priority: Major
>
> I have problem with imports while creating EMR. 
> import boto3
> connection = boto3.client(
> 'emr'
> )
> cluster_id = connection.run_job_flow(
>   Name='emr123',
>   LogUri='s3://emr-spark-application/log.txt',
>   ReleaseLabel='emr-4.1.0',
>   Instances={
> 'InstanceGroups': [
> {
>   'Name': "Master nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'MASTER',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> },
> {
>   'Name': "Slave nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'CORE',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> }
> ],
> 'KeepJobFlowAliveWhenNoSteps': True,
> 'TerminationProtected': False
>   },
>   Applications=[{
>  'Name': 'Hadoop'
> }, {
>  'Name': 'Spark'
>   }],
>   JobFlowRole='EMR_EC2_DefaultRole',
>   ServiceRole='EMR_DefaultRole'
> )
> print (cluster_id['JobFlowId'])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3183) Potential Bug in utils/dag_processing/DagFileProcessorManager.max_runs_reached()

2018-10-18 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-3183:
---
Fix Version/s: 1.10.1

> Potential Bug in 
> utils/dag_processing/DagFileProcessorManager.max_runs_reached()
> 
>
> Key: AIRFLOW-3183
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3183
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Xiaodong DENG
>Assignee: Xiaodong DENG
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> In 
> [https://github.com/apache/incubator-airflow/blob/df7c16a3ce01625277dd2e5c4ce4ed096dcfbb40/airflow/utils/dag_processing.py#L581,]
>  the condition is to ensure the function will return False if any file's 
> run_count is smaller than max_run.
> But the operator used here is "!=". Instead, it should be "<".
> This is because in *DagFileProcessorManager*, there is no statement capping 
> run_count. It's possible that a file's run_count will 
> be bigger than max_run. In such a case, the max_runs_reached() method may fail 
> its purpose.
>  
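
A minimal sketch of the corrected check (attribute names follow the 1.10 
dag_processing.py code but are illustrative here):

{code}
def max_runs_reached(self):
    """Return True only once every file has been parsed at least max_runs times."""
    if self._max_runs == -1:  # unlimited runs
        return False
    for file_path in self._file_paths:
        # "<" instead of "!=": files that overshoot max_runs (nothing caps
        # run_count) should still count as having reached the limit.
        if self._run_count[file_path] < self._max_runs:
            return False
    return True
{code}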



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3227) Airflow is broken when deleting a Variable

2018-10-18 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654934#comment-16654934
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3227:


The path is {{/home/ubuntu/airflow/dags/loadairflow.py}}. That is outside of the 
Airflow install. There is no file called "loadairflow.py" in the Airflow source 
tree.
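
(For reference, a DAG file can avoid breaking at parse time when a Variable has 
been deleted by supplying a default. A generic sketch, not the reporter's actual 
loadairflow.py:)

{code}
from airflow.models import Variable

# Returns None instead of raising when the Variable has been removed, so the
# DAG file still parses and the "Broken DAG" banner goes away.
order_id = Variable.get("order_id_imported", default_var=None)
{code}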

> Airflow is broken when deleting a Variable
> --
>
> Key: AIRFLOW-3227
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3227
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Critical
>
> I created a Variable and used it with one of my DAGs.
> After a while I modified the DAG and it no longer requires this Variable.
> I deleted the Variable from the UI using the delete button and then Airflow 
> generates this message:
> {color:#242729}Broken DAG: [/home/ubuntu/airflow/dags/loadairflow.py] 
> u'Variable order_id_imported does not exist'{color}
>  
> Only when re-creating the Variable the message disappears.
> This message has nothing to do with my DAG. It's a BUG in loadairflow.py file
>  
> Please issue a fix for this. It forces me to keep having Variables I don't 
> need.
>  
> StackOverflow issue: 
> https://stackoverflow.com/questions/52856018/why-airflow-force-me-to-have-variable-which-i-dont-need



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3227) Airflow is broken when deleting a Variable

2018-10-18 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654899#comment-16654899
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3227:


{{loadairflow.py}} is in your dags/ folder, not part of airflow - this is your 
file.

> Airflow is broken when deleting a Variable
> --
>
> Key: AIRFLOW-3227
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3227
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Critical
>
> I created a Variable and used it with one of my DAGs.
> After a while I modified the DAG and it no longer requires this Variable.
> I deleted the Variable from the UI using the delete button and then Airflow 
> generates this message:
> {color:#242729}Broken DAG: [/home/ubuntu/airflow/dags/loadairflow.py] 
> u'Variable order_id_imported does not exist'{color}
>  
> Only when re-creating the Variable the message disappears.
> This message has nothing to do with my DAG. It's a BUG in loadairflow.py file
>  
> Please issue a fix for this. It forces me to keep having Variables I don't 
> need.
>  
> StackOverflow issue: 
> https://stackoverflow.com/questions/52856018/why-airflow-force-me-to-have-variable-which-i-dont-need



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2622) Add "confirm=False" option to SFTPOperator

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2622:
---
Summary: Add "confirm=False" option to SFTPOperator  (was: SFTPOperator 
cannot set confirm=False)

> Add "confirm=False" option to SFTPOperator
> --
>
> Key: AIRFLOW-2622
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2622
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: plugins
>Affects Versions: 1.8.0
>Reporter: David
>Assignee: David
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> Paramiko supports setting `confirm=False` when PUTting a file via SFTP, but the 
> SFTPOperator does not support this property.
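
For context, this is the underlying Paramiko call the operator would need to 
expose (a sketch with made-up host and paths):

{code}
import paramiko

transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)

# confirm=False skips the post-transfer stat() size check, which some SFTP
# servers (e.g. ones that move the file on upload) cannot satisfy.
sftp.put("/tmp/local.csv", "/upload/remote.csv", confirm=False)

sftp.close()
transport.close()
{code}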



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2622) SFTPOperator cannot set confirm=False

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2622:
---
Fix Version/s: 1.10.1

> SFTPOperator cannot set confirm=False
> -
>
> Key: AIRFLOW-2622
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2622
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: plugins
>Affects Versions: 1.8.0
>Reporter: David
>Assignee: David
>Priority: Minor
> Fix For: 2.0.0, 1.10.1
>
>
> Paramiko supports setting `confirm=False` when PUTting a file via SFTP, but the 
> SFTPOperator does not support this property.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2797) Add ability to create Google Dataproc cluster with custom image

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2797:
---
Fix Version/s: 1.10.1

Pulling in to 1.10.1 as this change is a dependency of AIRFLOW-2789 (well, sort 
of: it wouldn't be hard to unpick, but it seems nice to have this change in).

> Add ability to create Google Dataproc cluster with custom image
> ---
>
> Key: AIRFLOW-2797
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2797
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib, gcp, operators
>Affects Versions: 2.0.0
>Reporter: Jarosław Śmietanka
>Assignee: Jarosław Śmietanka
>Priority: Minor
> Fix For: 1.10.1
>
>
> In GCP, it is possible to create a Dataproc cluster with a [custom 
> image|https://cloud.google.com/dataproc/docs/guides/dataproc-images] that 
> includes the user's pre-installed packages. This significantly reduces the 
> startup time of the cluster.
>  
> Since I already have a code which does that, I volunteer to bring it to the 
> community :)  
> This improvement won't change many components and should not require 
> groundbreaking changes to DataprocClusterCreateOperator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-1762) Use key_file in create_tunnel()

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-1762:
---
Fix Version/s: 1.10.1

Pulling in to 1.10.1 as it is needed for AIRFLOW-3112

> Use key_file in create_tunnel()
> ---
>
> Key: AIRFLOW-1762
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1762
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.8.0, 1.9.0
>Reporter: Nathan McIntyre
>Assignee: Nathan McIntyre
>Priority: Major
>  Labels: patch
> Fix For: 2.0.0, 1.10.1
>
>
> In contrib/hooks/ssh_hook.py, the ssh command created by the create_tunnel() 
> method does not use the key_file attribute. This prevents the creation of 
> tunnels where a key file is required. 
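
Roughly, the fix is to add "-i <key_file>" to the generated ssh command when a 
key file is configured. A sketch of the intent (not the exact hook code; names 
are illustrative):

{code}
def build_tunnel_cmd(local_port, remote_host, remote_port,
                     username, ssh_host, key_file=None):
    # Equivalent of: ssh -L <local>:<host>:<remote> [-i key_file] user@ssh_host
    cmd = ["ssh", "-L", "{}:{}:{}".format(local_port, remote_host, remote_port)]
    if key_file:
        cmd += ["-i", key_file]
    cmd.append("{}@{}".format(username, ssh_host))
    return cmd
{code}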



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3226) Airflow DAGRun tests require a db reset between runs

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654059#comment-16654059
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3226:


The run_unit_tests.sh does a {{airflow resetdb}} so each run gets a fresh DB 
that way.

It would be better to have each test clear up after itself.

 

(Better yet, and on my long-term plan, but masses more work, would be to have 
each test run in a transaction where we can and just roll back at the end! This 
wouldn't work for all tests, but would cover some of the simpler ones.)
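
A rough sketch of the transaction-and-rollback idea in pytest form (illustrative 
only; this is not how the current suite is wired up):

{code}
import pytest

from airflow import settings


@pytest.fixture
def session():
    # Give each test its own session and roll back whatever it created, so
    # DagRuns from one test don't hit unique constraints in the next.
    session = settings.Session()
    try:
        yield session
    finally:
        session.rollback()
        session.close()
{code}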

> Airflow DAGRun tests require a db reset between runs
> 
>
> Key: AIRFLOW-3226
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3226
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.10.0
>Reporter: James Meickle
>Priority: Minor
>
> The Airflow DAGRun tests create DAG runs in the sqlite test database. These 
> runs have unique constraints on dag_id and execution_date/run_id. Since there 
> is no random factor to these in the tests, they only succeed on the first run.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-3046) ECS Operator mistakenly reports success when task is killed due to EC2 host termination

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-3046.

Resolution: Fixed

> ECS Operator mistakenly reports success when task is killed due to EC2 host 
> termination
> ---
>
> Key: AIRFLOW-3046
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3046
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib, operators
>Reporter: Dan MacTough
>Priority: Major
> Fix For: 1.10.1
>
>
> We have ECS clusters made up of EC2 spot fleets. Among other things, this 
> means hosts can be terminated on short notice. When this happens, all tasks 
> (and associated containers) get terminated, as well.
> We expect that when that happens for Airflow task instances using the ECS 
> Operator, those instances will be marked as failures and retried.
> Instead, they are marked as successful.
> As a result, the immediate downstream task fails, causing the scheduled DAG 
> run to fail.
> Here's an example of the Airflow log output when this happens:
> {noformat}
> [2018-09-12 01:02:02,712] {ecs_operator.py:112} INFO - ECS Task stopped, 
> check status: {'tasks': [{'taskArn': 
> 'arn:aws:ecs:us-east-1::task/32d43a1d-fbc7-4659-815d-9133bde11cdc',
>  'clusterArn': 'arn:aws:ecs:us-east-1::cluster/processing', 
> 'taskDefinitionArn': 
> 'arn:aws:ecs:us-east-1::task-definition/foobar-testing_dataEngineering_rd:76',
>  'containerInstanceArn': 
> 'arn:aws:ecs:us-east-1::container-instance/7431f0a6-8fc5-4eff-8196-32f77d286a61',
>  'overrides': {'containerOverrides': [{'name': 'foobar-testing', 'command': 
> ['./bin/generate-features.sh', '2018-09-11']}]}, 'lastStatus': 'STOPPED', 
> 'desiredStatus': 'STOPPED', 'cpu': '4096', 'memory': '6', 'containers': 
> [{'containerArn': 
> 'arn:aws:ecs:us-east-1::container/0d5cc553-f894-4f9a-b17c-9f80f7ce8d0a',
>  'taskArn': 
> 'arn:aws:ecs:us-east-1::task/32d43a1d-fbc7-4659-815d-9133bde11cdc',
>  'name': 'foobar-testing', 'lastStatus': 'RUNNING', 'networkBindings': [], 
> 'networkInterfaces': [], 'healthStatus': 'UNKNOWN'}], 'startedBy': 'Airflow', 
> 'version': 3, 'stoppedReason': 'Host EC2 (instance i-02cf23bbd5ae26194) 
> terminated.', 'connectivity': 'CONNECTED', 'connectivityAt': 
> datetime.datetime(2018, 9, 12, 0, 6, 30, 245000, tzinfo=tzlocal()), 
> 'pullStartedAt': datetime.datetime(2018, 9, 12, 0, 6, 32, 748000, 
> tzinfo=tzlocal()), 'pullStoppedAt': datetime.datetime(2018, 9, 12, 0, 6, 59, 
> 748000, tzinfo=tzlocal()), 'createdAt': datetime.datetime(2018, 9, 12, 0, 6, 
> 30, 245000, tzinfo=tzlocal()), 'startedAt': datetime.datetime(2018, 9, 12, 0, 
> 7, 0, 748000, tzinfo=tzlocal()), 'stoppingAt': datetime.datetime(2018, 9, 12, 
> 1, 2, 0, 91000, tzinfo=tzlocal()), 'stoppedAt': datetime.datetime(2018, 9, 
> 12, 1, 2, 0, 91000, tzinfo=tzlocal()), 'group': 
> 'family:foobar-testing_dataEngineering_rd', 'launchType': 'EC2', 
> 'attachments': [], 'healthStatus': 'UNKNOWN'}], 'failures': [], 
> 'ResponseMetadata': {'RequestId': '758c791f-b627-11e8-83f7-2b76f4796ed2', 
> 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Wed, 12 
> Sep 2018 01:02:02 GMT', 'content-type': 'application/x-amz-json-1.1', 
> 'content-length': '1412', 'connection': 'keep-alive', 'x-amzn-requestid': 
> '758c791f-b627-11e8-83f7-2b76f4796ed2'}, 'RetryAttempts': 0}}{noformat}
> I believe the function that checks whether the task is successful needs at 
> least one more check. 
> We are currently running a modified version of the ECS Operator that contains 
> the following {{_check_success_task}} function to address this failure 
> condition:
> {code}
> def _check_success_task(self):
> response = self.client.describe_tasks(
> cluster=self.cluster,
> tasks=[self.arn]
> )
> self.log.info('ECS Task stopped, check status: %s', response)
> if len(response.get('failures', [])) > 0:
> raise AirflowException(response)
> for task in response['tasks']:
> if 'terminated' in task.get('stoppedReason', '').lower():
> raise AirflowException('The task was stopped because the host 
> instance terminated: {}'.format(
> task.get('stoppedReason', '')))
> containers = task['containers']
> for container in containers:
> if container.get('lastStatus') == 'STOPPED' and \
> container['exitCode'] != 0:
> raise AirflowException(
> 'This task is not in success state {}'.format(task))
> elif container.get('lastStatus') == 'PENDING':
> raise Airflo

[jira] [Commented] (AIRFLOW-3211) Airflow losing track of running GCP Dataproc jobs upon Airflow restart

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654004#comment-16654004
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3211:


(Just a note that you shouldn't need to restart Airflow on deployment to get it 
to pick up new/changed dags)

> Airflow losing track of running GCP Dataproc jobs upon Airflow restart
> --
>
> Key: AIRFLOW-3211
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3211
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Julie Chien
>Assignee: Julie Chien
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.9.0, 1.10.0
>
>
> If Airflow restarts (say, due to deployments, system updates, or regular 
> machine restarts such as the weekly restarts in GCP App Engine) while it's 
> running a job on GCP Dataproc, it'll lose track of that job, mark the task as 
> failed, and eventually retry. However, the jobs may still be running on 
> Dataproc and maybe even finish successfully. So when Airflow retries and 
> reruns the job, the same job will run twice. This can result in issues like 
> delayed workflows, increased costs, and duplicate data. 
>   
>  To reproduce:
> Setup:
>  # Install Airflow.
>  # Set up a GCP Project with the Dataproc API enabled
>  # In the box that's running Airflow, {{pip install google-api-python-client 
> }}{{oauth2client}}
>  # Install this DAG in the Airflow instance: 
> https://github.com/GoogleCloudPlatform/python-docs-samples/blob/b80895ed88ba86fce223df27a48bf481007ca708/composer/workflows/quickstart.py
>  Set up the Airflow variables as instructed at the top of the file.
>  # Start the Airflow scheduler and webserver if they're not running already. 
> Kick off a run of the above DAG through the Airflow UI. Wait for the cluster 
> to spin up and the job to start running on Dataproc.
>  # While the job's running, kill the scheduler and webserver, and then start 
> them back up.
>  # Wait for Airflow to retry the task. Click on the cluster in Dataproc to 
> observe that the job will have been resubmitted, even though the first job is 
> still running without error.
>   
>  At Etsy, we've customized the Dataproc operators to allow for the new 
> Airflow task to pick up where the old one left off upon Airflow restarts, and 
> have been happily using our solution for the past 6 months. I'd like to 
> submit a PR to merge this change upstream.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3215) Creating EMR using python from airflow

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-3215.
--
Resolution: Not A Problem

> Creating EMR using python from airflow
> --
>
> Key: AIRFLOW-3215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3215
> Project: Apache Airflow
>  Issue Type: Task
>  Components: aws, DAG
>Affects Versions: 1.7.0
> Environment: Airflow with boto3 - connecting AWS -configure with 
> access and security 
>Reporter: Pandu
>Priority: Major
>
> I have problem with imports while creating EMR. 
> import boto3
> connection = boto3.client(
> 'emr'
> )
> cluster_id = connection.run_job_flow(
>   Name='emr123',
>   LogUri='s3://emr-spark-application/log.txt',
>   ReleaseLabel='emr-4.1.0',
>   Instances={
> 'InstanceGroups': [
> {
>   'Name': "Master nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'MASTER',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> },
> {
>   'Name': "Slave nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'CORE',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> }
> ],
> 'KeepJobFlowAliveWhenNoSteps': True,
> 'TerminationProtected': False
>   },
>   Applications=[{
>  'Name': 'Hadoop'
> }, {
>  'Name': 'Spark'
>   }],
>   JobFlowRole='EMR_EC2_DefaultRole',
>   ServiceRole='EMR_DefaultRole'
> )
> print (cluster_id['JobFlowId'])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3215) Creating EMR using python from airflow

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653505#comment-16653505
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3215:


Operators are just Python modules - so any process that makes them importable via 
{{python -c 'import mymodulenamehere'}} will work. How you do that is out of the 
scope of Airflow - look for a guide on Python packaging, perhaps?
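
For illustration, one common way to make such a helper module importable wherever 
Airflow runs is to package it and install it into the same environment. A generic 
Python packaging sketch with a hypothetical package name:

{code}
# setup.py for a hypothetical helper package
from setuptools import setup, find_packages

setup(
    name="telemetry-pipeline-utils",
    version="0.1.0",
    packages=find_packages(),
)
{code}

Installed with {{pip install -e .}} on the scheduler and workers, 
{{import telemetry_pipeline_utils}} would then resolve.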

> Creating EMR using python from airflow
> --
>
> Key: AIRFLOW-3215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3215
> Project: Apache Airflow
>  Issue Type: Task
>  Components: aws, DAG
>Affects Versions: 1.7.0
> Environment: Airflow with boto3 - connecting AWS -configure with 
> access and security 
>Reporter: Pandu
>Priority: Major
>
> I have problem with imports while creating EMR. 
> import boto3
> connection = boto3.client(
> 'emr'
> )
> cluster_id = connection.run_job_flow(
>   Name='emr123',
>   LogUri='s3://emr-spark-application/log.txt',
>   ReleaseLabel='emr-4.1.0',
>   Instances={
> 'InstanceGroups': [
> {
>   'Name': "Master nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'MASTER',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> },
> {
>   'Name': "Slave nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'CORE',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> }
> ],
> 'KeepJobFlowAliveWhenNoSteps': True,
> 'TerminationProtected': False
>   },
>   Applications=[{
>  'Name': 'Hadoop'
> }, {
>  'Name': 'Spark'
>   }],
>   JobFlowRole='EMR_EC2_DefaultRole',
>   ServiceRole='EMR_DefaultRole'
> )
> print (cluster_id['JobFlowId'])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3215) Creating EMR using python from airflow

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653491#comment-16653491
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3215:


The problem has nothing to do with EMR - it's {{ImportError: No module 
named telemetry_pipeline_utils}}. Where do you expect this module to come from?

(This is something in your code, and not part of Airflow, so we can't really 
help you much more.)

> Creating EMR using python from airflow
> --
>
> Key: AIRFLOW-3215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3215
> Project: Apache Airflow
>  Issue Type: Task
>  Components: aws, DAG
>Affects Versions: 1.7.0
> Environment: Airflow with boto3 - connecting AWS -configure with 
> access and security 
>Reporter: Pandu
>Priority: Major
>
> I have problem with imports while creating EMR. 
> import boto3
> connection = boto3.client(
> 'emr'
> )
> cluster_id = connection.run_job_flow(
>   Name='emr123',
>   LogUri='s3://emr-spark-application/log.txt',
>   ReleaseLabel='emr-4.1.0',
>   Instances={
> 'InstanceGroups': [
> {
>   'Name': "Master nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'MASTER',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> },
> {
>   'Name': "Slave nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'CORE',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> }
> ],
> 'KeepJobFlowAliveWhenNoSteps': True,
> 'TerminationProtected': False
>   },
>   Applications=[{
>  'Name': 'Hadoop'
> }, {
>  'Name': 'Spark'
>   }],
>   JobFlowRole='EMR_EC2_DefaultRole',
>   ServiceRole='EMR_DefaultRole'
> )
> print (cluster_id['JobFlowId'])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2338) Airflow requires Hive config

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2338:
---
Affects Version/s: (was: 1.9.0)
   1.10.0

> Airflow requires Hive config
> 
>
> Key: AIRFLOW-2338
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2338
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.10.0
>Reporter: John Arnold
>Priority: Minor
>
> Had working airflow, upgraded to latest from master branch, started getting 
> config exceptions:
> {code:java}
> [2018-04-16 23:52:11,732] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 [2018-04-16 23:52:11,731] {__init__.py:50} INFO - Using executor 
> CeleryExecutor
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 Traceback (most recent call last):
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File "/var/lib/celery/venv/bin/airflow", line 32, in 
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 args.func(args)
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
> 77, in wrapper
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 raise e
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
> 74, in wrapper
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 return f(*args, **kwargs)
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/airflow/bin/cli.py", line 
> 438, in run
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 conf.set(section, option, value)
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
>  line 1239, in set
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 super(ConfigParser, self).set(section, option, value)
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
>  line 921, in set
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 raise from_none(NoSectionError(section))
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 backports.configparser.NoSectionError: No section: 'hive'
> [2018-04-16 23:52:36,727] {logging_mixin.py:95} INFO - [2018-04-16 
> 23:52:36,726] \{jobs.py:2548} INFO - Task exited with return code 1
> {code}
> It's looking for a hive section in the airflow.cfg but my config doesn't have 
> one.
> Not sure what changed to make this break, but the default values should be 
> used / we should not error out on a missing section that isn't even needed or 
> used by current dags.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2338) Airflow requires Hive config

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2338.

Resolution: Duplicate

Should be fixed in 1.10.1

> Airflow requires Hive config
> 
>
> Key: AIRFLOW-2338
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2338
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: configuration
>Affects Versions: 1.10.0
>Reporter: John Arnold
>Priority: Minor
>
> Had working airflow, upgraded to latest from master branch, started getting 
> config exceptions:
> {code:java}
> [2018-04-16 23:52:11,732] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 [2018-04-16 23:52:11,731] {__init__.py:50} INFO - Using executor 
> CeleryExecutor
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 Traceback (most recent call last):
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File "/var/lib/celery/venv/bin/airflow", line 32, in 
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 args.func(args)
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
> 77, in wrapper
> [2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 raise e
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
> 74, in wrapper
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 return f(*args, **kwargs)
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/airflow/bin/cli.py", line 
> 438, in run
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 conf.set(section, option, value)
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
>  line 1239, in set
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 super(ConfigParser, self).set(section, option, value)
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 File 
> "/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
>  line 921, in set
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 raise from_none(NoSectionError(section))
> [2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
> login_task0 backports.configparser.NoSectionError: No section: 'hive'
> [2018-04-16 23:52:36,727] {logging_mixin.py:95} INFO - [2018-04-16 
> 23:52:36,726] \{jobs.py:2548} INFO - Task exited with return code 1
> {code}
> It's looking for a hive section in the airflow.cfg but my config doesn't have 
> one.
> Not sure what changed to make this break, but the default values should be 
> used / we should not error out on a missing section that isn't even needed or 
> used by current dags.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2338) Airflow requires Hive config

2018-10-17 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2338:
---
Description: 
Had working airflow, upgraded to latest from master branch, started getting 
config exceptions:
{code:java}
[2018-04-16 23:52:11,732] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 [2018-04-16 23:52:11,731] {__init__.py:50} INFO - Using executor 
CeleryExecutor
[2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 Traceback (most recent call last):
[2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File "/var/lib/celery/venv/bin/airflow", line 32, in 
[2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 args.func(args)
[2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
77, in wrapper
[2018-04-16 23:52:12,478] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 raise e
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
74, in wrapper
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 return f(*args, **kwargs)
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/airflow/bin/cli.py", line 
438, in run
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 conf.set(section, option, value)
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
 line 1239, in set
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 super(ConfigParser, self).set(section, option, value)
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
 line 921, in set
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 raise from_none(NoSectionError(section))
[2018-04-16 23:52:12,479] {base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 backports.configparser.NoSectionError: No section: 'hive'
[2018-04-16 23:52:36,727] {logging_mixin.py:95} INFO - [2018-04-16 
23:52:36,726] \{jobs.py:2548} INFO - Task exited with return code 1
{code}
It's looking for a hive section in the airflow.cfg but my config doesn't have 
one.

Not sure what changed to make this break, but the default values should be used 
/ we should not error out on a missing section that isn't even needed or used 
by current dags.

  was:
Had working airflow, upgraded to latest from master branch, started getting 
config exceptions:

[2018-04-16 23:52:11,732] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 [2018-04-16 23:52:11,731] \{__init__.py:50} INFO - Using executor 
CeleryExecutor
[2018-04-16 23:52:12,478] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 Traceback (most recent call last):
[2018-04-16 23:52:12,478] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File "/var/lib/celery/venv/bin/airflow", line 32, in 
[2018-04-16 23:52:12,478] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 args.func(args)
[2018-04-16 23:52:12,478] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
77, in wrapper
[2018-04-16 23:52:12,478] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 raise e
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/airflow/utils/cli.py", line 
74, in wrapper
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 return f(*args, **kwargs)
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/airflow/bin/cli.py", line 
438, in run
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 conf.set(section, option, value)
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/var/lib/celery/venv/lib/python3.6/site-packages/backports/configparser/__init__.py",
 line 1239, in set
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 super(ConfigParser, self).set(section, option, value)
[2018-04-16 23:52:12,479] \{base_task_runner.py:106} INFO - Job 147: Subtask 
login_task0 File 
"/

[jira] [Created] (AIRFLOW-3221) RBAC-date filter mangles input sub-second datetimes in JS

2018-10-17 Thread Ash Berlin-Taylor (JIRA)
Ash Berlin-Taylor created AIRFLOW-3221:
--

 Summary: RBAC-date filter mangles input sub-second datetimes in JS
 Key: AIRFLOW-3221
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3221
 Project: Apache Airflow
  Issue Type: Improvement
  Components: ui
Reporter: Ash Berlin-Taylor
 Fix For: 1.10.0


The JS used to generate the DateTime picker for the "Search"/filter actions for 
any datetime column will not accept sub-second precision - even if you 
manually type it in, the JS will "helpfully" parse and reformat the input by 
deleting the sub-second part.

To reproduce:
- Browse -> Task Instances
- Expand "Search"
- Add a filter on a date column (Start Date say)
- Type/paste {{2018-10-16 20:24:26.375717+01:00}} in the box 
- tab to move focus elsewhere
- See the value change to {{2018-10-17 09:44:32}}!! (note different day and 
time!)

If you manually edit the URL then the original input is accepted fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-3196) Filtering from admin views on any dates fails

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-3196.
--
Resolution: Duplicate

Working on a fix though, should be in 1.10.1

> Filtering from admin views on any dates fails
> -
>
> Key: AIRFLOW-3196
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3196
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.10.0
> Environment: Airflow 1.10
>Reporter: Emmanuel Brard
>Priority: Critical
>
> Any filtering on any date fields from all admin views fails with :
> {code}
> ---
> Node: airflow-web-95fd56cd6-csvh7
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1116, in _execute_context
> context = constructor(dialect, self, conn, *args)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 623, in _init_compiled
> param.append(processors[key](compiled_params[key]))
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/sql/type_api.py", 
> line 1078, in process
> return process_param(value, dialect)
>   File "/usr/local/lib/python3.6/site-packages/airflow/utils/sqlalchemy.py", 
> line 156, in process_bind_param
> raise ValueError('naive datetime is disallowed')
> ValueError: naive datetime is disallowed
> {code}
> It looks like DateTimeField from wtforms is not TZ aware in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/forms.py#L31
> Probably using 
> https://github.com/apache/incubator-airflow/blob/master/airflow/utils/timezone.py#L98
> will help.
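
One possible shape of the fix, assuming {{airflow.utils.timezone.make_aware}} is 
the helper referred to above (a sketch only, not the committed change):

{code}
from wtforms import DateTimeField

from airflow.utils import timezone


class UtcAwareDateTimeField(DateTimeField):
    """DateTimeField that attaches a timezone to the parsed value."""

    def process_formdata(self, valuelist):
        super(UtcAwareDateTimeField, self).process_formdata(valuelist)
        if self.data is not None and self.data.tzinfo is None:
            # Avoid "naive datetime is disallowed" when the value is bound.
            self.data = timezone.make_aware(self.data)
{code}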



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2799) Filtering UI objects by datetime is broken

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652487#comment-16652487
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2799:


Fixing the non-RBAC UI was easy; fixing RBAC was more of a pain. I have 80% of a 
solution, so I will open a PR soon.

> Filtering UI objects by datetime is broken 
> ---
>
> Key: AIRFLOW-2799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2799
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 1.10.0
> Environment: Debian Stretch, Python 3.5.3
>Reporter: Kevin Campbell
>Assignee: Ash Berlin-Taylor
>Priority: Major
> Fix For: 1.10.1
>
>
> On master (49fd23a3ee0269e2b974648f4a823c1d0b6c12ec) searching objects via 
> the user interface is broken for datetime fields.
> Create a new installation
>  Create a test dag (example_bash_operator)
>  Start webserver and scheduler
>  Enable dag
> On web UI, go to Browse > Task Instances
>  Search for task instances with execution_date greater than 5 days ago
>  You will get an exception
> {code:java}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: wave.diffractive.io
> ---
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py",
>  line 1116, in _execute_context
> context = constructor(dialect, self, conn, *args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in _init_compiled
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in 
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/sql/type_api.py",
>  line 1078, in process
> return process_param(value, dialect)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy_utc/sqltypes.py",
>  line 30, in process_bind_param
> raise ValueError('naive datetime is disallowed')
> ValueError: naive datetime is disallowed
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1982, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1614, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1517, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/_compat.py",
>  line 33, in reraise
> raise value
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1612, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1598, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask_admin/base.p

[jira] [Assigned] (AIRFLOW-2799) Filtering UI objects by datetime is broken

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor reassigned AIRFLOW-2799:
--

Assignee: Ash Berlin-Taylor

> Filtering UI objects by datetime is broken 
> ---
>
> Key: AIRFLOW-2799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2799
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 1.10.0
> Environment: Debian Stretch, Python 3.5.3
>Reporter: Kevin Campbell
>Assignee: Ash Berlin-Taylor
>Priority: Major
> Fix For: 1.10.1
>
>
> On master (49fd23a3ee0269e2b974648f4a823c1d0b6c12ec) searching objects via 
> the user interface is broken for datetime fields.
> Create a new installation
>  Create a test dag (example_bash_operator)
>  Start webserver and scheduler
>  Enable dag
> On web UI, go to Browse > Task Instances
>  Search for task instances with execution_date greater than 5 days ago
>  You will get an exception
> {code:java}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: wave.diffractive.io
> ---
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py",
>  line 1116, in _execute_context
> context = constructor(dialect, self, conn, *args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in _init_compiled
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in 
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/sql/type_api.py",
>  line 1078, in process
> return process_param(value, dialect)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy_utc/sqltypes.py",
>  line 30, in process_bind_param
> raise ValueError('naive datetime is disallowed')
> ValueError: naive datetime is disallowed
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1982, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1614, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1517, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/_compat.py",
>  line 33, in reraise
> raise value
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1612, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1598, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask_admin/base.py",
>  line 69, in inner
> return self._run_view(f, *args, **kwargs)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/py

[jira] [Commented] (AIRFLOW-2799) Filtering UI objects by datetime is broken

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652382#comment-16652382
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2799:


Also applies to the RBAC view.

> Filtering UI objects by datetime is broken 
> ---
>
> Key: AIRFLOW-2799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2799
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 1.10.0
> Environment: Debian Stretch, Python 3.5.3
>Reporter: Kevin Campbell
>Priority: Major
> Fix For: 1.10.1
>
>
> On master (49fd23a3ee0269e2b974648f4a823c1d0b6c12ec) searching objects via 
> the user interface is broken for datetime fields.
> Create a new installation
>  Create a test dag (example_bash_operator)
>  Start webserver and scheduler
>  Enable dag
> On web UI, go to Browse > Task Instances
>  Search for task instances with execution_date greater than 5 days ago
>  You will get an exception
> {code:java}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: wave.diffractive.io
> ---
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py",
>  line 1116, in _execute_context
> context = constructor(dialect, self, conn, *args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in _init_compiled
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in 
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/sql/type_api.py",
>  line 1078, in process
> return process_param(value, dialect)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy_utc/sqltypes.py",
>  line 30, in process_bind_param
> raise ValueError('naive datetime is disallowed')
> ValueError: naive datetime is disallowed
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1982, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1614, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1517, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/_compat.py",
>  line 33, in reraise
> raise value
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1612, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1598, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask_admin/base.py",
>  line 69, in inner
> return self._run_view(f, *args, **kwargs)
>   File 
> "/home/kev/.virtualenvs/airflow/local/l

[jira] [Updated] (AIRFLOW-1867) sendgrid fails on python3 with attachments

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-1867:
---
Fix Version/s: 1.10.1

> sendgrid fails on python3 with attachments
> --
>
> Key: AIRFLOW-1867
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1867
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Scott Kruger
>Priority: Minor
> Fix For: 1.10.1
>
>
> Sendgrid emails raise an exception on python 3 when attaching files due to 
> {{base64.b64encode}} returning {{bytes}} rather than {{unicode/string}} (see: 
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/utils/sendgrid.py#L69).
>   The fix is simple: decode the base64 data to `utf-8`.
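
The gist of the fix (a minimal sketch of the decode step, with a made-up file 
name):

{code}
import base64

with open("report.pdf", "rb") as f:
    # On Python 3, b64encode returns bytes; decode so the attachment content
    # is a str, as the rest of the sendgrid payload expects.
    content = base64.b64encode(f.read()).decode("utf-8")
{code}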



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2799) Filtering UI objects by datetime is broken

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor updated AIRFLOW-2799:
---
Affects Version/s: (was: 2.0.0)
   1.10.0
Fix Version/s: 1.10.1

> Filtering UI objects by datetime is broken 
> ---
>
> Key: AIRFLOW-2799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2799
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 1.10.0
> Environment: Debian Stretch, Python 3.5.3
>Reporter: Kevin Campbell
>Priority: Major
> Fix For: 1.10.1
>
>
> On master (49fd23a3ee0269e2b974648f4a823c1d0b6c12ec) searching objects via 
> the user interface is broken for datetime fields.
> Create a new installation
>  Create a test dag (example_bash_operator)
>  Start webserver and scheduler
>  Enable dag
> On web UI, go to Browse > Task Instances
>  Search for task instances with execution_date greater than 5 days ago
>  You will get an exception
> {code:java}
>   / (  ()   )  \___
>  /( (  (  )   _))  )   )\
>(( (   )()  )   (   )  )
>  ((/  ( _(   )   (   _) ) (  () )  )
> ( (  ( (_)   (((   )  .((_ ) .  )_
>( (  )(  (  ))   ) . ) (   )
>   (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
>   ( (  (   ) (  )   (  )) ) _)(   )  )  )
>  ( (  ( \ ) ((_  ( ) ( )  )   ) )  )) ( )
>   (  (   (  (   (_ ( ) ( _)  ) (  )  )   )
>  ( (  ( (  (  ) (_  )  ) )  _)   ) _( ( )
>   ((  (   )(( _)   _) _(_ (  (_ )
>(_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
>((__)\\||lll|l||///  \_))
> (   /(/ (  )  ) )\   )
>   (( ( ( | | ) ) )\   )
>(   /(| / ( )) ) ) )) )
>  ( ( _(|)_) )
>   (  ||\(|(|)|/|| )
> (|(||(||))
>   ( //|/l|||)|\\ \ )
> (/ / //  /|//\\  \ \  \ _)
> ---
> Node: wave.diffractive.io
> ---
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/base.py",
>  line 1116, in _execute_context
> context = constructor(dialect, self, conn, *args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in _init_compiled
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/engine/default.py",
>  line 649, in 
> for key in compiled_params
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy/sql/type_api.py",
>  line 1078, in process
> return process_param(value, dialect)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/sqlalchemy_utc/sqltypes.py",
>  line 30, in process_bind_param
> raise ValueError('naive datetime is disallowed')
> ValueError: naive datetime is disallowed
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1982, in wsgi_app
> response = self.full_dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1614, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1517, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/_compat.py",
>  line 33, in reraise
> raise value
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1612, in full_dispatch_request
> rv = self.dispatch_request()
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask/app.py",
>  line 1598, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File 
> "/home/kev/.virtualenvs/airflow/local/lib/python3.5/site-packages/flask_admin/base.py",
>  line 69, in inner
> return self._run_view(f, *args, **kwargs)
>   File 
> "/home/kev/.virtualenvs/

[jira] [Commented] (AIRFLOW-3196) Filtering from admin views on any dates fails

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652284#comment-16652284
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3196:


Possibly related to AIRFLOW-2799.

> Filtering from admin views on any dates fails
> -
>
> Key: AIRFLOW-3196
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3196
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 1.10.0
> Environment: Airflow 1.10
>Reporter: Emmanuel Brard
>Priority: Critical
>
> Any filtering on any date fields from all admin views fails with :
> {code}
> ---
> Node: airflow-web-95fd56cd6-csvh7
> ---
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
> line 1116, in _execute_context
> context = constructor(dialect, self, conn, *args)
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", 
> line 623, in _init_compiled
> param.append(processors[key](compiled_params[key]))
>   File "/usr/local/lib/python3.6/site-packages/sqlalchemy/sql/type_api.py", 
> line 1078, in process
> return process_param(value, dialect)
>   File "/usr/local/lib/python3.6/site-packages/airflow/utils/sqlalchemy.py", 
> line 156, in process_bind_param
> raise ValueError('naive datetime is disallowed')
> ValueError: naive datetime is disallowed
> {code}
> It looks like DateTimeField from wtforms is not TZ aware in 
> https://github.com/apache/incubator-airflow/blob/master/airflow/www/forms.py#L31
> Probably using 
> https://github.com/apache/incubator-airflow/blob/master/airflow/utils/timezone.py#L98
> will help.
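
As a rough sketch of that suggestion (assuming airflow.utils.timezone.make_aware is the helper linked above, and with an illustrative class name), the webserver form field could attach a timezone when it parses the submitted value:

{code}
# sketch only: make the wtforms field return a TZ-aware datetime so the
# UtcDateTime bind check no longer rejects the filter value
from wtforms import DateTimeField

from airflow.utils import timezone


class UtcAwareDateTimeField(DateTimeField):
    """DateTimeField that attaches the default timezone to the parsed value."""

    def process_formdata(self, valuelist):
        super(UtcAwareDateTimeField, self).process_formdata(valuelist)
        if self.data is not None and self.data.tzinfo is None:
            # the value typed in the UI filter is naive; make it aware before
            # it reaches the DB layer
            self.data = timezone.make_aware(self.data)
{code}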



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-2873) Improvements to Quick Start flow

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-2873.

   Resolution: Fixed
Fix Version/s: 1.10.0

> Improvements to Quick Start flow
> 
>
> Key: AIRFLOW-2873
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2873
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: configuration
>Affects Versions: 1.9.0
>Reporter: G. Geijteman
>Priority: Major
> Fix For: 1.10.0
>
>
> Thank you for developing Airflow!
> Having run through the [Quick 
> Start|https://airflow.incubator.apache.org/start.html], I've come across two 
> issues that I would like to highlight:
> {code:java}
> bash-3.2$ cd ~/project/airflow/
> bash-3.2$ export AIRFLOW_HOME=~/project/airflow
> bash-3.2$ python3 -m venv $AIRFLOW_HOME/venv
> bash-3.2$ source venv/bin/activate
> (venv) bash-3.2$ pip install --upgrade pip
> (venv) bash-3.2$ uname -a Darwin mac.local 17.7.0 Darwin Kernel Version 
> 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 
> x86_64
> (venv) bash-3.2$ python -V
> Python 3.6.5
> (venv) bash-3.2$ pip -V
> pip 18.0 from ~/project/airflow/venv/lib/python3.6/site-packages/pip (python 
> 3.6)
> (venv) bash-3.2$ pip install apache-airflow[redis,postgres] -U {code}
> Results in:
> {code:java}
> During handling of the above exception, another exception occurred:Traceback 
> (most recent call last):
> File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", 
> line 639, in set_extra
> fernet = get_fernet()
> File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", 
> line 103, in get_fernet
> raise AirflowException('Failed to import Fernet, it may not be installed')
> airflow.exceptions.AirflowException: Failed to import Fernet, it may not be 
> installed
> [2018-08-08 10:49:01,121]{models.py:643}ERROR - Failed to load fernet while 
> encrypting value, using non-encrypted value.
> Traceback (most recent call last):
> File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", 
> line 101, in get_fernet
> from cryptography.fernet import Fernet
> ModuleNotFoundError: No module named 'cryptography'{code}
> This is solved by:
> {code:java}
> (venv) bash-3.2$ pip install cryptography{code}
> *Proposed fix:*
> _Include the `cryptography` package in the setup / package requirements_
>  
> Having fixed that, the following issue occurs when trying to:
> {code:java}
> (venv) bash-3.2$ airflow initdb{code}
> Excerpt:
> {code:java}
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", 
> line 639, in set_extra
> fernet = get_fernet()
> File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", 
> line 107, in get_fernet
> raise AirflowException("Could not create Fernet object: {}".format(ve))
> airflow.exceptions.AirflowException: Could not create Fernet object: 
> Incorrect padding
> [2018-08-08 10:50:50,697]
> {models.py:643}
> ERROR - Failed to load fernet while encrypting value, using non-encrypted 
> value.
> Traceback (most recent call last):
> File "~/project/airflow/venv/lib/python3.6/site-packages/airflow/models.py", 
> line 105, in get_fernet
> return Fernet(configuration.get('core', 'FERNET_KEY').encode('utf-8'))
> File 
> "~/project/airflow/venv/lib/python3.6/site-packages/cryptography/fernet.py", 
> line 34, in _init_
> key = base64.urlsafe_b64decode(key)
> File 
> "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py",
>  line 133, in urlsafe_b64decode
> return b64decode(s)
> File 
> "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py",
>  line 87, in b64decode
> return binascii.a2b_base64(s)
> binascii.Error: Incorrect padding{code}
> After some googling, this leads to the conclusion that the 
> ~/project/airflow/airflow.cfg fernet_key field is not set to a correct 
> value.
> *Feature request:*
> _Have the setup automatically generate a valid fernet key for the user._
> The fact that this page exists: [https://bcb.github.io/airflow/fernet-key] 
> suggests this could easily be a part of the package.
> I understand that this project is in the incubator phase, but I would say that 
> a quick start that does not work as-is will discourage users from trying out 
> this project. Thank you for considering.
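
A minimal sketch of what "generate a valid fernet key" amounts to, assuming the cryptography package is installed (the env-var route is just the standard AIRFLOW__SECTION__KEY override):

{code}
# sketch: generate a url-safe base64-encoded 32-byte key and paste it into the
# fernet_key field of airflow.cfg, or export it as AIRFLOW__CORE__FERNET_KEY
from cryptography.fernet import Fernet

print(Fernet.generate_key().decode())
{code}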



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-1114) S3Hook doesn't support Minio

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-1114.

Resolution: Fixed

Fixed - we upgraded to boto3 across the board for AWS hooks in 1.9? or 1.10

> S3Hook doesn't support Minio
> 
>
> Key: AIRFLOW-1114
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1114
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: aws, hooks
>Affects Versions: 1.8.0
> Environment: I am running it from the following Docker image: 
> https://github.com/puckel/docker-airflow
>Reporter: Alessandro Cosentino
>Priority: Major
>
> I want to mock an S3 bucket with Minio for local testing when using the 
> S3Hook. If I specify a connection whose host url is {{localhost}}, the S3Hook 
> fails to connect to the bucket because it can't infer the region.
> I suspect this is due to a bug in boto: 
> https://github.com/boto/boto/issues/2624
> One solution would be to upgrade the S3Hook to boto3. If I get the green 
> light, I will make a PR for that.
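
For anyone landing here: with the boto3-based hook, pointing at a local Minio is mostly a question of passing an explicit endpoint. The snippet below is plain boto3, not the hook itself, and the URL and credentials are placeholders:

{code}
# sketch: talk to a local Minio by overriding endpoint_url; Minio largely
# ignores the region, so any value will do
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',
    aws_access_key_id='minio-access-key',
    aws_secret_access_key='minio-secret-key',
    region_name='us-east-1',
)
print(s3.list_buckets()['Buckets'])
{code}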



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1029) https://issues.apache.org/jira/browse/AIRFLOW

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1029.
--
Resolution: Cannot Reproduce

Think this has been fixed now. Please open a new issue if this is still 
happening against 1.10.0+

> https://issues.apache.org/jira/browse/AIRFLOW
> -
>
> Key: AIRFLOW-1029
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1029
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.9.0
>Reporter: Alessio Palma
>Priority: Major
>  Labels: amqp_compliance, rabbitmq, scheduler
> Attachments: PannelloAIRFLOW 2.png, image (1).png
>
>
> I'm using:
> AIRFLOW 1.8.0RC5
> ERLANG 19.2 
> RABBIT 3.6.7
> PYTHON 2.7
> When I start a DAG from the panel (see picture), the scheduler stops working.
> After some investigation, the problem arises here: 
> celery_executor.py: 
>  83 def sync(self):
>  84 
>  85 self.logger.debug(
>  86 "Inquiring about {} celery task(s)".format(len(self.tasks)))
>  87
>  88 for key, async in list(self.tasks.items()):
>  90 state = async.state < HERE 
> The Python stack trace says that the connection is closed; capturing some TCP 
> traffic I can see that the connection to RabbitMQ is closed (TCP FIN) 
> before a STOMP frame is sent, so RabbitMQ replies with TCP RST. (See picture 2: 
> 172.1.0.2 -> rabbitmq node, 172.1.0.1 -> airflow node.)
> This exception stops the scheduler.
> If you are using airflow-scheduler-failover-controller the scheduler is 
> restarted, but this is just a workaround and does not fix the problem at 
> the root. 
> Is it safe to trap the exception?
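
For reference, the kind of guard being asked about would look roughly like the sketch below, based on the 1.8-era snippet quoted above ("async" renamed, since it is a reserved word in newer Pythons). Whether swallowing the error is actually safe is exactly the open question here:

{code}
def sync(self):
    self.logger.debug(
        "Inquiring about {} celery task(s)".format(len(self.tasks)))
    for key, async_result in list(self.tasks.items()):
        try:
            state = async_result.state
        except Exception:
            # the broker connection was dropped (TCP FIN/RST); log it and let
            # the next heartbeat retry instead of killing the scheduler loop
            self.logger.exception("Failed to fetch celery state for %s", key)
            continue
        # ...the existing bookkeeping for `state` would stay as it was
{code}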



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-1151) Fix scripts execution in SparkSql hook

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-1151.
--
Resolution: Duplicate

> Fix scripts execution in SparkSql hook 
> ---
>
> Key: AIRFLOW-1151
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1151
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Affects Versions: 1.8.0
>Reporter: Giovanni Lanzani
>Priority: Major
>
> When using the SparkSqlOperator and submitting a file (ending with .sql 
> or .hql), a whitespace needs to be appended, otherwise a Jinja error will be 
> raised. However, the trailing whitespace confuses the hook, since those files then 
> no longer end with ".sql" or ".hql" but with ".sql " and ".hql ". This PR fixes this.
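
The fix described is essentially an extension check on a stripped value. A sketch (the self._sql and connection_cmd names are assumptions about the hook's internals):

{code}
# sketch: strip the templated value before deciding between "script file"
# and "inline statement", so a trailing space no longer hides the extension
sql = self._sql.strip()
if sql.endswith('.sql') or sql.endswith('.hql'):
    connection_cmd += ["-f", sql]   # pass the script as a file
else:
    connection_cmd += ["-e", sql]   # pass the statement inline
{code}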



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2999) S3_hook - add the ability to download file to local disk

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651618#comment-16651618
 ] 

Ash Berlin-Taylor commented on AIRFLOW-2999:


Mirroring the GCP download operator for S3 sounds like a sensible plan.

> S3_hook  - add the ability to download file to local disk
> -
>
> Key: AIRFLOW-2999
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2999
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: jack
>Priority: Major
>
> The [S3_hook 
> |https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/S3_hook.py#L177]
>  has get_key method that returns boto3.s3.Object it also has load_file method 
> which loads file from local file system to S3.
>  
> What it doesn't have is a method to download a file from S3 to the local file 
> system.
> Basically it should be something very simple: an extension to the get_key 
> method with a parameter for the destination on the local file system, plus code 
> that takes the boto3.s3.Object and saves it to disk. Note that it can be 
> more than one file if the user chooses a folder in S3.
>  
>  
>  
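
A sketch of the requested helper, built on the existing boto3-based get_key(); the function name and the single-file behaviour are illustrative, not the final API:

{code}
import os

from airflow.hooks.S3_hook import S3Hook


def download_from_s3(key, bucket_name, local_dir, aws_conn_id='aws_default'):
    """Fetch one S3 object and write it under local_dir, returning the path."""
    hook = S3Hook(aws_conn_id=aws_conn_id)
    obj = hook.get_key(key, bucket_name)                  # boto3.s3.Object
    dest = os.path.join(local_dir, os.path.basename(key))
    obj.download_file(dest)                               # streams the object to disk
    return dest
{code}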



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AIRFLOW-981) TreeView date axis shows dates into the future

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor closed AIRFLOW-981.
-
Resolution: Cannot Reproduce

Marking as cannot reproduce for now.

Please re-open this (or comment) if this is still an issue for anyone with 
1.10.0+

> TreeView date axis shows dates into the future
> --
>
> Key: AIRFLOW-981
> URL: https://issues.apache.org/jira/browse/AIRFLOW-981
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui, webserver
>Affects Versions: 1.7.1.3
>Reporter: Ruslan Dautkhanov
>Priority: Critical
> Attachments: Airflow - TreeView-2 weeks in the future.png
>
>
> Freshly installed Airflow.
> The example_twitter_dag Tree View shows a date scale from Mar 13 (yesterday, when 
> that job didn't even run) to March 27th and further (2+ weeks into the 
> future).
> So the tasks below that date scale are totally misaligned with the time 
> dimension.
> See screenshot below.
> !Airflow - TreeView-2 weeks in the future.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3215) Creating EMR using python from airflow

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651407#comment-16651407
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3215:


You haven't provided any error so we can't help you yet.

Also you have said this is against 1.7 - if so please try with a more recent 
version!

> Creating EMR using python from airflow
> --
>
> Key: AIRFLOW-3215
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3215
> Project: Apache Airflow
>  Issue Type: Task
>  Components: aws, DAG
>Affects Versions: 1.7.0
> Environment: Airflow with boto3 - connecting AWS -configure with 
> access and security 
>Reporter: Pandu
>Priority: Major
>
> I have a problem with imports while creating an EMR cluster. 
> import boto3
> connection = boto3.client(
> 'emr'
> )
> cluster_id = connection.run_job_flow(
>   Name='emr123',
>   LogUri='s3://emr-spark-application/log.txt',
>   ReleaseLabel='emr-4.1.0',
>   Instances={
> 'InstanceGroups': [
> {
>   'Name': "Master nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'MASTER',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> },
> {
>   'Name': "Slave nodes",
>   'Market': 'ON_DEMAND',
>   'InstanceRole': 'CORE',
>   'InstanceType': 'm1.large',
>   'InstanceCount': 1
> }
> ],
> 'KeepJobFlowAliveWhenNoSteps': True,
> 'TerminationProtected': False
>   },
>   Applications=[{
>  'Name': 'Hadoop'
> }, {
>  'Name': 'Spark'
>   }],
>   JobFlowRole='EMR_EC2_DefaultRole',
>   ServiceRole='EMR_DefaultRole'
> )
> print (cluster_id['JobFlowId'])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3209) return job id on bq operators

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651397#comment-16651397
 ] 

Ash Berlin-Taylor commented on AIRFLOW-3209:


If it's only useful in post_execute then storing it as an instance property 
sounds sensible.

If it would be useful as history or in downstream operators then it could be 
stored in XCom with a different key: from memory something like this 
{{self.xcom_push(self.job_id, key="bigquery_job_id")}} - (though don't call it 
just "job_id" as Airflow has a similar property/column already.)

> return job id on bq operators
> -
>
> Key: AIRFLOW-3209
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3209
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Ben Marengo
>Assignee: Ben Marengo
>Priority: Major
>
> i would like to be able to access the job_id in the post_execute()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AIRFLOW-843) Store task exceptions in context

2018-10-16 Thread Ash Berlin-Taylor (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ash Berlin-Taylor resolved AIRFLOW-843.
---
   Resolution: Fixed
Fix Version/s: 1.10.1

> Store task exceptions in context
> 
>
> Key: AIRFLOW-843
> URL: https://issues.apache.org/jira/browse/AIRFLOW-843
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Scott Kruger
>Priority: Minor
> Fix For: 1.10.1
>
>
> If a task encounters an exception during execution, it should store the 
> exception on the execution context so that other methods (namely 
> `on_failure_callback`) can access it.  This would help with custom error 
> integrations, e.g. Sentry.
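
A sketch of how a DAG could consume this once the exception is exposed in the context (the "exception" key is an assumption about how the fix surfaces it; the Sentry hand-off is replaced by a log line):

{code}
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def notify(context):
    exc = context.get('exception')
    # hand the exception to Sentry (or any tracker) here; just log it in this sketch
    print("Task {} failed with: {!r}".format(
        context['task_instance'].task_id, exc))


def will_fail():
    raise RuntimeError("boom")


dag = DAG('failure_callback_demo', start_date=datetime(2018, 1, 1),
          schedule_interval=None)

PythonOperator(
    task_id='fail_task',
    python_callable=will_fail,
    on_failure_callback=notify,
    dag=dag,
)
{code}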



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

