[GitHub] [airflow] ashb commented on a change in pull request #5979: [AIRFLOW-5373] Super fast pre-commit check for basic python2 compatib…

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5979: [AIRFLOW-5373] Super fast 
pre-commit check for basic python2 compatib…
URL: https://github.com/apache/airflow/pull/5979#discussion_r320254712
 
 

 ##
 File path: airflow/contrib/example_dags/example_qubole_operator.py
 ##
 @@ -198,7 +198,7 @@ def compare_result(ds, **kwargs):
 
 /** Computes an approximation to pi */
 object SparkPi {
-  def main(args: Array[String]) {
+  def main(args) {
 
 Review comment:
  Looks like https://pre-commit.com/#regular-expressions to exclude the whole 
file is the only way to do this one. A bit overly broad, but this file isn't 
changed very often, so that's probably okay. Alternatively, adjust the regex to 
not allow `) {` on the line (a trickier regex to get working).
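The second suggestion (tighten the regex rather than exclude the file) can be sketched with a negative lookahead. This is a hypothetical simplification, not the hook's actual pattern:

```python
import re

# Hypothetical simplification of the python2-compat check (the real hook's
# regex may differ): flag `def name(args)` signatures, but use a negative
# lookahead so Scala/Java-style lines ending in `) {` are not flagged.
PY_DEF = re.compile(r"def \w+\([^)]*\)(?!\s*\{)")

assert PY_DEF.search("  def compare_result(ds):") is not None  # Python: flagged
assert PY_DEF.search("  def main(args) {") is None             # Scala: skipped
```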


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5365) When switching between branches (master/v1-10-test) rebuild of image should not be needed for pre-commits

2019-09-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921401#comment-16921401
 ] 

ASF subversion and git services commented on AIRFLOW-5365:
--

Commit 6a213183b132b229908bd2b85c8abb7dc86e88d7 in airflow's branch 
refs/heads/v1-10-test from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=6a21318 ]

[AIRFLOW-5365] No need to do image rebuild when switching master/v1-10-test 
(#5972)


(cherry picked from commit 319b80437cf7e58d0ceecf9a58e336e14936b163)


> When switching between branches (master/v1-10-test) rebuild of image should 
> not be needed for pre-commits
> -
>
> Key: AIRFLOW-5365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5365
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0, 1.10.5
>Reporter: Jarek Potiuk
>Priority: Major
> Fix For: 1.10.5
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (AIRFLOW-5365) When switching between branches (master/v1-10-test) rebuild of image should not be needed for pre-commits

2019-09-03 Thread Jarek Potiuk (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Potiuk resolved AIRFLOW-5365.
---
Fix Version/s: 1.10.5
   Resolution: Fixed

> When switching between branches (master/v1-10-test) rebuild of image should 
> not be needed for pre-commits
> -
>
> Key: AIRFLOW-5365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5365
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0, 1.10.5
>Reporter: Jarek Potiuk
>Priority: Major
> Fix For: 1.10.5
>
>






[GitHub] [airflow] potiuk commented on a change in pull request #5979: [AIRFLOW-5373] Super fast pre-commit check for basic python2 compatib…

2019-09-03 Thread GitBox
potiuk commented on a change in pull request #5979: [AIRFLOW-5373] Super fast 
pre-commit check for basic python2 compatib…
URL: https://github.com/apache/airflow/pull/5979#discussion_r320250286
 
 

 ##
 File path: airflow/contrib/example_dags/example_qubole_operator.py
 ##
 @@ -198,7 +198,7 @@ def compare_result(ds, **kwargs):
 
 /** Computes an approximation to pi */
 object SparkPi {
-  def main(args: Array[String]) {
+  def main(args) {
 
 Review comment:
  Ah indeed. So I need to add a possibility to exclude such false 
positives. Will do soon.




[GitHub] [airflow] kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-52749
 
 
   > How about instead of a background thread (I'm wary of using threads in 
python) could we instead query the last modified time of the serialised dag on 
each request?
   > 
   > i.e. when asking for dag X we check if dag X has been updated in the db 
since we last loaded it?
   
   How about this -> 
https://github.com/kaxil/airflow/commit/830b83cfa845c69319061076bb8515c3fa99553c
   cc @coufon 
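   The per-request check described above could look roughly like the sketch 
below. Every name here (`DagCache`, `fetch_dag`, `fetch_last_updated`) is a 
hypothetical stand-in, not Airflow's actual API: compare the cached load time 
with the DB row's last-modified timestamp, and reload only when the row is newer.

```python
from datetime import datetime

# Sketch only (hypothetical names, not Airflow's API): cache a DAG per dag_id
# together with the timestamp it was loaded at, and refresh it on access
# whenever the DB reports a newer last-modified time.
class DagCache:
    def __init__(self, fetch_dag, fetch_last_updated):
        self._fetch_dag = fetch_dag                    # loads serialized DAG
        self._fetch_last_updated = fetch_last_updated  # reads its timestamp
        self._cache = {}                               # dag_id -> (dag, loaded_at)

    def get(self, dag_id):
        last_updated = self._fetch_last_updated(dag_id)
        cached = self._cache.get(dag_id)
        if cached is None or cached[1] < last_updated:
            self._cache[dag_id] = (self._fetch_dag(dag_id), last_updated)
        return self._cache[dag_id][0]
```

This trades one cheap timestamp query per request for never serving a stale DAG, avoiding the background-thread concerns raised above.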





[GitHub] [airflow] kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-52749
 
 
   > How about instead of a background thread (I'm wary of using threads in 
python) could we instead query the last modified time of the serialised dag on 
each request?
   > 
   > i.e. when asking for dag X we check if dag X has been updated in the db 
since we last loaded it?
   
   How about this -> 
https://github.com/kaxil/airflow/commit/830b83cfa845c69319061076bb8515c3fa99553c
   cc @coufon  @ashb 




[GitHub] [airflow] kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-52749
 
 
   > How about instead of a background thread (I'm wary of using threads in 
python) could we instead query the last modified time of the serialised dag on 
each request?
   > 
   > i.e. when asking for dag X we check if dag X has been updated in the db 
since we last loaded it?
   How about this -> 
https://github.com/kaxil/airflow/commit/d2ec5371ae987ff87f60ee6b0143eb66dd5dc1e7
 , 
https://github.com/kaxil/airflow/commit/399dd656427e5d46e863fc2e7c0dda3b788fd8ed
 , 
https://github.com/kaxil/airflow/commit/c6a53b34b372466bf1a6a6f906d736284a115f55?
   
   cc @coufon 




[GitHub] [airflow] kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
kaxil edited a comment on issue #5743: [AIRFLOW-5088][AIP-24] Persisting 
serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-52749
 
 
   > How about instead of a background thread (I'm wary of using threads in 
python) could we instead query the last modified time of the serialised dag on 
each request?
   > 
   > i.e. when asking for dag X we check if dag X has been updated in the db 
since we last loaded it?
   How about this -> 
https://github.com/kaxil/airflow/commit/d2ec5371ae987ff87f60ee6b0143eb66dd5dc1e7
 , 
https://github.com/kaxil/airflow/commit/399dd656427e5d46e863fc2e7c0dda3b788fd8ed
 , 
https://github.com/kaxil/airflow/commit/c6a53b34b372466bf1a6a6f906d736284a115f55?
   




[jira] [Updated] (AIRFLOW-5393) UI crash in the Ad Hoc Query menu

2019-09-03 Thread ivan de los santos (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ivan de los santos updated AIRFLOW-5393:

Description: 
Airflow UI will crash in the browser returning "Oops" message and the Traceback 
of the crashing error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2317)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" was never assigned: in fact, df is set inside a try / 
except block, and the except branch runs before df gets assigned.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, 
in wsgi_app
response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, 
in full_dispatch_request
rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, 
in handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
39, in reraise
raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, 
in full_dispatch_request
rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, 
in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
69, in inner
return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
368, in _run_view
return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 375, 
in view_func
return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, 
in wrapper
return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/views.py", line 
2318, in query
response=df.to_csv(index=False),
UnboundLocalError: local variable 'df' referenced before assignment
{code}
 

*Proposed solution*: check the *_error_* variable, which will be True because 
an exception "_(2006, "Unknown MySQL server host 'mysql' (2)")_" is raised and 
df is never assigned.

 
{code:python}
if csv:
    if not error:
        return Response(
            response=df.to_csv(index=False),
            status=200,
            mimetype="application/text")
{code}
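The failure mode is easy to reproduce outside Airflow. A minimal, self-contained sketch of the same pattern (`run_query` and `render_csv` are hypothetical stand-ins for the view's DB call and handler), including the `error` guard from the proposed fix:

```python
# Minimal reproduction of the bug pattern: df is only assigned inside the
# try block, so if the query raises, df stays unbound afterwards. Checking
# `error` first (as in the proposed fix) avoids referencing the unbound df.
def render_csv(run_query):
    error = False
    try:
        df = run_query()
    except Exception:
        error = True
    if not error:
        return df
    return "query failed"

assert render_csv(lambda: "a,b\n1,2") == "a,b\n1,2"
assert render_csv(lambda: 1 / 0) == "query failed"
```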
 

 

I am willing to work on this issue; I think it might already be fixed in master, though.

This is my first open issue.

 

Best regards,

Iván

  was:
Airflow UI will crash in the browser returning "Oops" message and the Traceback 
of the crashing error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2317)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" was never assigned: in fact, df is set inside a try / 
except block, and the except branch runs before df gets assigned.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, 
in wsgi_app
response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, 
in full_dispatch_request
rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, 
in handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
39, in reraise
raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, 
in full_dispatch_request
rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, 
in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
69, in inner
return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
368, in _run_view
return fn(self, *args, **kwargs)
  

[jira] [Updated] (AIRFLOW-5393) UI crash in the Ad Hoc Query menu

2019-09-03 Thread ivan de los santos (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ivan de los santos updated AIRFLOW-5393:

Description: 
Airflow UI will crash in the browser returning "Oops" message and the Traceback 
of the crashing error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2317)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" was never assigned: in fact, df is set inside a try / 
except block, and the except branch runs before df gets assigned.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, 
in wsgi_app
response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, 
in full_dispatch_request
rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, 
in handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
39, in reraise
raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, 
in full_dispatch_request
rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, 
in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
69, in inner
return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
368, in _run_view
return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 375, 
in view_func
return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, 
in wrapper
return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/views.py", line 
2318, in query
response=df.to_csv(index=False),
UnboundLocalError: local variable 'df' referenced before assignment
{code}
 

*Proposed solution*: check the *_error_* variable, which will be True because 
an exception "_(2006, "Unknown MySQL server host 'mysql' (2)")_" is raised and 
df is never assigned.

 
{code:python}
if csv:
    if not error:
        return Response(
            response=df.to_csv(index=False),
            status=200,
            mimetype="application/text")
{code}
 

 

I am willing to work on this issue; I think it might already be fixed in master, though.

This is my first open issue.

 

Best regards,

Iván

  was:
Airflow UI will crash in the browser returning "Oops" message and the Traceback 
of the crashing error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2317)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" was never assigned: in fact, df is set inside a try / 
except block, so the except branch is probably entered before df gets an 
assignment.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, 
in wsgi_app
response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, 
in full_dispatch_request
rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, 
in handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
39, in reraise
raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, 
in full_dispatch_request
rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, 
in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
69, in inner
return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
368, in _run_view
return fn(self, *

[jira] [Updated] (AIRFLOW-5393) UI crash in the Ad Hoc Query menu

2019-09-03 Thread ivan de los santos (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ivan de los santos updated AIRFLOW-5393:

Description: 
Airflow UI will crash in the browser returning "Oops" message and the Traceback 
of the crashing error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2317)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" was never assigned: in fact, df is set inside a try / 
except block, so the except branch is probably entered before df gets an 
assignment.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, 
in wsgi_app
response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, 
in full_dispatch_request
rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, 
in handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
39, in reraise
raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, 
in full_dispatch_request
rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, 
in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
69, in inner
return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
368, in _run_view
return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 375, 
in view_func
return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, 
in wrapper
return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/views.py", line 
2318, in query
response=df.to_csv(index=False),
UnboundLocalError: local variable 'df' referenced before assignment
{code}
*Proposed solution*: Return a message indicating that the query is empty.
{code:python}
if csv:
    if not error:
        return Response(
            response=df.to_csv(index=False),
            status=200,
            mimetype="application/text")
{code}
 

 

I am willing to work on this issue; I think it might already be fixed in master, though.

This is my first open issue.

 

Best regards,

Iván

  was:
Airflow UI will crash in the browser returning "Oops" message and the Traceback 
of the crashing error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2318)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" was never assigned: in fact, df is set inside a try / 
except block, so the except branch is probably entered before df gets an 
assignment.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, 
in wsgi_app
response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, 
in full_dispatch_request
rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, 
in handle_user_exception
reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
39, in reraise
raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, 
in full_dispatch_request
rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, 
in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
69, in inner
return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 
368, in _run_view
return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 375, 

[jira] [Created] (AIRFLOW-5394) Invalid schedule interval issues

2019-09-03 Thread Shreyash hisariya (Jira)
Shreyash hisariya created AIRFLOW-5394:
--

 Summary: Invalid schedule interval issues
 Key: AIRFLOW-5394
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5394
 Project: Apache Airflow
  Issue Type: Bug
  Components: DagRun, scheduler
Affects Versions: 1.10.2
Reporter: Shreyash hisariya


I am facing issues with Airflow's schedule interval. Since there is no 
documentation at all, it took me a few days to find that the cron expression 
accepts only 5 or 6 fields.

Even with 5 or 6 fields, the dag fails multiple times. 

*For example: Invalid Cron expression: [0 15 10 * * ?] is not accepted* 

The above cron is valid, but Airflow doesn't accept it.
 # Can you please put up documentation of what is valid and what is not? 
 # Why doesn't Airflow accept more than 6 fields?
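For context, Airflow hands the schedule_interval to the croniter library, which expects the classic 5-field (or 6-field) cron syntax; the Quartz-style `?` in the example above comes from a different cron dialect. A rough pre-flight check (a simplified sketch only, not croniter's actual validation):

```python
# Rough sanity check for a schedule_interval before handing it to Airflow
# (a simplified sketch: Airflow's real validation goes through croniter,
# which is stricter than a field count). The Quartz `?` placeholder from
# the report is flagged because classic cron has no such field.
def looks_like_classic_cron(expr):
    fields = expr.split()
    return len(fields) in (5, 6) and "?" not in fields

assert looks_like_classic_cron("0 15 10 * *")          # classic 5-field form
assert not looks_like_classic_cron("0 15 10 * * ?")    # Quartz form from the report
assert not looks_like_classic_cron("0 15 10 * * * *")  # too many fields
```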





[jira] [Updated] (AIRFLOW-5393) UI crash in the Ad Hoc Query menu

2019-09-03 Thread ivan de los santos (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ivan de los santos updated AIRFLOW-5393:

Environment: 
Linux
NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"

Airflow version 1.10.4

  was:
Operating system



> UI crash in the Ad Hoc Query menu
> -
>
> Key: AIRFLOW-5393
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5393
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.4
> Environment: Linux
> NAME="Ubuntu"
> VERSION="18.04.2 LTS (Bionic Beaver)"
> Airflow version 1.10.4
>Reporter: ivan de los santos
>Priority: Minor
>  Labels: beginner, easyfix, patch
> Attachments: Captura de pantalla de 2019-09-03 13-42-02.png
>
>
> Airflow UI will crash in the browser returning "Oops" message and the 
> Traceback of the crashing error.
>  
> *How to replicate*: 
>  # Launch airflow webserver -p 8080
>  # Go to the Airflow-UI
>  # Click on "Data Profiling"
>  # Select any connection to a database.
>  # Click on ".csv" button without writing any text on the query field.
>  # You will get an "oops" message with the Traceback.
>  
> *File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
> (Line 2318)
>  
> *Reasons of the problem*:
>  #  UnboundLocalError: local variable 'df' referenced before assignment
>  * This means "df" was never assigned: in fact, df is set inside a try / 
> except block, so the except branch is probably entered before df gets an 
> assignment.
> {code:java}
> Traceback (most recent call last):
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 
> 2446, in wsgi_app
> response = self.full_dispatch_request()
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 
> 1951, in full_dispatch_request
> rv = self.handle_user_exception(e)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 
> 1820, in handle_user_exception
> reraise(exc_type, exc_value, tb)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 
> 39, in reraise
> raise value
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 
> 1949, in full_dispatch_request
> rv = self.dispatch_request()
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 
> 1935, in dispatch_request
> return self.view_functions[rule.endpoint](**req.view_args)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", 
> line 69, in inner
> return self._run_view(f, *args, **kwargs)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", 
> line 368, in _run_view
> return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 
> 375, in view_func
> return f(*args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, 
> in wrapper
> return func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/www/views.py", line 
> 2318, in query
> response=df.to_csv(index=False),
> UnboundLocalError: local variable 'df' referenced before assignment
> {code}
> *Proposed solution*: Return a message indicating that the query is empty.
>  
>  
> I am willing to work on this issue if someone with more experience could 
> guide me on how the application is expected to behave.
> This is my first open issue.
>  
> Best regards,
> Iván





[jira] [Updated] (AIRFLOW-5393) UI crash in the Ad Hoc Query menu

2019-09-03 Thread ivan de los santos (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ivan de los santos updated AIRFLOW-5393:

Attachment: Captura de pantalla de 2019-09-03 13-42-02.png

> UI crash in the Ad Hoc Query menu
> -
>
> Key: AIRFLOW-5393
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5393
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.10.4
> Environment: Operating system
>Reporter: ivan de los santos
>Priority: Minor
>  Labels: beginner, easyfix, patch
> Attachments: Captura de pantalla de 2019-09-03 13-42-02.png
>
>
> Airflow UI will crash in the browser returning "Oops" message and the 
> Traceback of the crashing error.
>  
> *How to replicate*: 
>  # Launch airflow webserver -p 8080
>  # Go to the Airflow-UI
>  # Click on "Data Profiling"
>  # Select any connection to a database.
>  # Click on ".csv" button without writing any text on the query field.
>  # You will get an "oops" message with the Traceback.
>  
> *File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
> (Line 2318)
>  
> *Reasons of the problem*:
>  #  UnboundLocalError: local variable 'df' referenced before assignment
>  * This means "df" was never assigned: in fact, df is set inside a try / 
> except block, so the except branch is probably entered before df gets an 
> assignment.
> {code:java}
> Traceback (most recent call last):
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
>     response = self.full_dispatch_request()
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
>     rv = self.handle_user_exception(e)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
>     reraise(exc_type, exc_value, tb)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
>     raise value
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
>     rv = self.dispatch_request()
>   File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
>     return self.view_functions[rule.endpoint](**req.view_args)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
>     return self._run_view(f, *args, **kwargs)
>   File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
>     return fn(self, *args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 375, in view_func
>     return f(*args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/local/lib/python3.6/dist-packages/airflow/www/views.py", line 2318, in query
>     response=df.to_csv(index=False),
> UnboundLocalError: local variable 'df' referenced before assignment
> {code}
> *Proposed solution*: Return a message indicating that the query is empty.
>  
>  
> I am willing to work on this issue if someone with more experience could 
> guide me on how the application is expected to behave.
> This is my first open issue.
>  
> Best regards,
> Iván



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (AIRFLOW-5393) UI crash in the Ad Hoc Query menu

2019-09-03 Thread ivan de los santos (Jira)
ivan de los santos created AIRFLOW-5393:
---

 Summary: UI crash in the Ad Hoc Query menu
 Key: AIRFLOW-5393
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5393
 Project: Apache Airflow
  Issue Type: Bug
  Components: ui
Affects Versions: 1.10.4
 Environment: Operating system

Reporter: ivan de los santos


The Airflow UI crashes in the browser, returning an "Oops" message with the 
traceback of the error.

 

*How to replicate*: 
 # Launch airflow webserver -p 8080
 # Go to the Airflow-UI
 # Click on "Data Profiling"
 # Select any connection to a database.
 # Click on ".csv" button without writing any text on the query field.
 # You will get an "oops" message with the Traceback.

 

*File causing the problem*:  /python3.6/dist-packages/airflow/www/views.py 
(Line 2318)

 

*Reasons of the problem*:
 #  UnboundLocalError: local variable 'df' referenced before assignment

 * This means "df" may never be assigned: "df" is set inside a try / 
except block, so the except branch can run before "df" gets an 
assignment.

{code:java}
Traceback (most recent call last):
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/rde/.local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/rde/.local/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
    return self._run_view(f, *args, **kwargs)
  File "/home/rde/.local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
    return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/utils.py", line 375, in view_func
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/www/views.py", line 2318, in query
    response=df.to_csv(index=False),
UnboundLocalError: local variable 'df' referenced before assignment
{code}
*Proposed solution*: Return a message indicating that the query is empty.
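A minimal, self-contained sketch of the failure mode and the proposed guard (pure-Python stand-ins; the real handler in airflow/www/views.py reads the query result into a pandas DataFrame):

```python
def to_csv(sql):
    """Buggy pattern from views.py: 'df' is bound only inside the try,
    so an early exception leaves it undefined."""
    try:
        if not sql.strip():
            raise ValueError("empty query")
        df = [["a"], ["1"]]  # stands in for the real query result
    except ValueError:
        pass  # exception swallowed, but 'df' was never assigned
    return "\n".join(",".join(row) for row in df)  # UnboundLocalError here


def to_csv_fixed(sql):
    """Proposed behaviour: report the empty query up front."""
    if not sql.strip():
        return "Query is empty"
    df = [["a"], ["1"]]  # stands in for the real query result
    return "\n".join(",".join(row) for row in df)
```

Calling `to_csv("")` raises UnboundLocalError just like the traceback above, while `to_csv_fixed("")` returns the message instead.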

 

 

I am willing to work on this issue if someone with more experience could guide 
me on how the application is expected to behave.

This is my first open issue.

 

Best regards,

Iván





[jira] [Commented] (AIRFLOW-5129) Add typehint to gcp_dlp_hook.py

2019-09-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921354#comment-16921354
 ] 

ASF subversion and git services commented on AIRFLOW-5129:
--

Commit a9ba91579960d1a642a7e19b35ac91a84375fff6 in airflow's branch 
refs/heads/master from Ryan Yuan
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=a9ba915 ]

[AIRFLOW-5129] Add typehint to GCP DLP hook (#5980)



> Add typehint to gcp_dlp_hook.py
> ---
>
> Key: AIRFLOW-5129
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5129
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Minor
>






[jira] [Commented] (AIRFLOW-5129) Add typehint to gcp_dlp_hook.py

2019-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921353#comment-16921353
 ] 

ASF GitHub Bot commented on AIRFLOW-5129:
-

mik-laj commented on pull request #5980: [AIRFLOW-5129] Add typehint to GCP DLP 
hook
URL: https://github.com/apache/airflow/pull/5980
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add typehint to gcp_dlp_hook.py
> ---
>
> Key: AIRFLOW-5129
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5129
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: gcp
>Affects Versions: 1.10.3
>Reporter: Ryan Yuan
>Assignee: Ryan Yuan
>Priority: Minor
>






[GitHub] [airflow] mik-laj merged pull request #5980: [AIRFLOW-5129] Add typehint to GCP DLP hook

2019-09-03 Thread GitBox
mik-laj merged pull request #5980: [AIRFLOW-5129] Add typehint to GCP DLP hook
URL: https://github.com/apache/airflow/pull/5980
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] mik-laj commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG from the CLI

2019-09-03 Thread GitBox
mik-laj commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG 
from the CLI
URL: https://github.com/apache/airflow/pull/5975#discussion_r320224848
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -441,6 +442,33 @@ def set_is_paused(is_paused, args):
 print("Dag: {}, paused: {}".format(args.dag_id, str(is_paused)))
 
 
+def show_dag(args):
+    dag = get_dag(args)
+    dot = render_dag(dag)
+    if args.save:
+        filename, _, fileformat = args.save.rpartition('.')
+        dot.render(filename=filename, format=fileformat, cleanup=True)
+        print("File {} saved".format(args.save))
+    elif args.imgcat:
+        data = dot.pipe(format='png')
+        try:
+            proc = subprocess.Popen("imgcat", stdout=subprocess.PIPE, stdin=subprocess.PIPE)
+        except OSError as e:
+            if e.errno == errno.ENOENT:
+                raise AirflowException(
+                    "Failed to execute. Make sure the imgcat executables are on your systems \'PATH\'"
+                )
+            else:
+                raise
+        out, err = proc.communicate(data)
+        if out:
+            print(out.decode('utf-8'))
+        if err:
+            print(err.decode('utf-8'))
+    else:
+        print(dot.source)
 
 Review comment:
   After that, I thought about the problem that these logs exist at all. I think 
the CLI should be completely decoupled from Airflow's internals. Docker and 
Kubernetes work similarly. This is important if we want Airflow to be completely 
serverless and secure. If I understand correctly, the CLI currently always has 
full access to Airflow. This is on my list of ideas.




[GitHub] [airflow] mik-laj commented on a change in pull request #5944: [AIRFLOW-5362] Reorder imports

2019-09-03 Thread GitBox
mik-laj commented on a change in pull request #5944: [AIRFLOW-5362] Reorder 
imports
URL: https://github.com/apache/airflow/pull/5944#discussion_r320219308
 
 

 ##
 File path: setup.py
 ##
 @@ -268,6 +268,7 @@ def write_version(filename: str = os.path.join(*["airflow", "git_version"])):
     'click==6.7',
     'flake8>=3.6.0',
     'flake8-colors',
+    'flake8-isort',
 
 Review comment:
   Discussion about flake-isort in Airflow is available here:
   https://github.com/apache/airflow/pull/4892#issuecomment-472369397




[GitHub] [airflow] mik-laj commented on a change in pull request #5944: [AIRFLOW-5362] Reorder imports

2019-09-03 Thread GitBox
mik-laj commented on a change in pull request #5944: [AIRFLOW-5362] Reorder 
imports
URL: https://github.com/apache/airflow/pull/5944#discussion_r320219308
 
 

 ##
 File path: setup.py
 ##
 @@ -268,6 +268,7 @@ def write_version(filename: str = os.path.join(*["airflow", "git_version"])):
     'click==6.7',
     'flake8>=3.6.0',
     'flake8-colors',
+    'flake8-isort',
 
 Review comment:
   Discussion about flake-import-order in Airflow is available here:
   https://github.com/apache/airflow/pull/4892#issuecomment-472369397




[GitHub] [airflow] mik-laj edited a comment on issue #5975: [AIRFLOW-5368] Display DAG from the CLI

2019-09-03 Thread GitBox
mik-laj edited a comment on issue #5975: [AIRFLOW-5368] Display DAG from the CLI
URL: https://github.com/apache/airflow/pull/5975#issuecomment-527248123
 
 
   https://user-images.githubusercontent.com/12058428/64134169-c8171a80-cddb-11e9-9fd6-46f5548fbfde.png
   I updated PR:
   
   * Add task coloring according to the ui_color, ui_fgcolor properties 
   * Tasks are drawn using rounded rectangles
   * Rendering logic has been moved to a separate file. It can be useful when 
someone wants to generate automatic documentation




[jira] [Commented] (AIRFLOW-5088) To implement DAG JSON serialization and DB persistence for webserver scalability improvement

2019-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921339#comment-16921339
 ] 

ASF GitHub Bot commented on AIRFLOW-5088:
-

kaxil commented on pull request #5992: [AIRFLOW-5088][AIP-24][BackPort] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5992
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5088
 - 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   **Backport of https://github.com/apache/airflow/pull/5743 for v1-10-* 
branches **
   Based on #5701, this PR implements functionalities including writing 
serialized DAGs to DB in scheduler, reading DAGs from DB in webserver, 
controlled by [core] dagcached
   
   The goal is to decouple the webserver from the DAG folder; instead, it reads 
everything from the database.
   
   Rendering templates with functions is an exception: in that case the DAG has 
to be re-imported, because functions are stringified in the serialized DAG.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 



> To implement DAG JSON serialization and DB persistence for webserver 
> scalability improvement
> 
>
> Key: AIRFLOW-5088
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5088
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: DAG, webserver
>Affects Versions: 1.10.5
>Reporter: Zhou Fang
>Assignee: Zhou Fang
>Priority: Major
>
> Created this issue for starting to implement DAG serialization using JSON and 
> persistence in DB. Serialized DAG will be used in webserver for solving the 
> webserver scalability issue.
>  
> The implementation is based on AIP-24: 
> [https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler]
>  
>  





[GitHub] [airflow] kaxil opened a new pull request #5992: [AIRFLOW-5088][AIP-24][BackPort] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
kaxil opened a new pull request #5992: [AIRFLOW-5088][AIP-24][BackPort] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5992
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5088
 - 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   **Backport of https://github.com/apache/airflow/pull/5743 for v1-10-* 
branches **
   Based on #5701, this PR implements functionalities including writing 
serialized DAGs to DB in scheduler, reading DAGs from DB in webserver, 
controlled by [core] dagcached
   
   The goal is to decouple the webserver from the DAG folder; instead, it reads 
everything from the database.
   
   Rendering templates with functions is an exception: in that case the DAG has 
to be re-imported, because functions are stringified in the serialized DAG.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   




[GitHub] [airflow] mik-laj commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG from the CLI

2019-09-03 Thread GitBox
mik-laj commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG 
from the CLI
URL: https://github.com/apache/airflow/pull/5975#discussion_r320214775
 
 

 ##
 File path: airflow/gcp/example_dags/example_vision.py
 ##
 @@ -52,6 +52,7 @@
 
 import airflow
 from airflow import models
+from airflow.contrib.sensors.aws_glue_catalog_partition_sensor import 
AwsGlueCatalogPartitionSensor
 
 Review comment:
   Yes. Fixed.




[GitHub] [airflow] mik-laj commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG from the CLI

2019-09-03 Thread GitBox
mik-laj commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG 
from the CLI
URL: https://github.com/apache/airflow/pull/5975#discussion_r320214538
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -441,6 +442,33 @@ def set_is_paused(is_paused, args):
 print("Dag: {}, paused: {}".format(args.dag_id, str(is_paused)))
 
 
+def show_dag(args):
+    dag = get_dag(args)
+    dot = render_dag(dag)
+    if args.save:
+        filename, _, fileformat = args.save.rpartition('.')
+        dot.render(filename=filename, format=fileformat, cleanup=True)
+        print("File {} saved".format(args.save))
+    elif args.imgcat:
+        data = dot.pipe(format='png')
+        try:
+            proc = subprocess.Popen("imgcat", stdout=subprocess.PIPE, stdin=subprocess.PIPE)
+        except OSError as e:
+            if e.errno == errno.ENOENT:
+                raise AirflowException(
+                    "Failed to execute. Make sure the imgcat executables are on your systems \'PATH\'"
+                )
+            else:
+                raise
+        out, err = proc.communicate(data)
+        if out:
+            print(out.decode('utf-8'))
+        if err:
+            print(err.decode('utf-8'))
+    else:
+        print(dot.source)
 
 Review comment:
   It works. I checked it.
   ```
   airflow dags show example_gcp_vision_explicit_id --save a.dot
   ```




[GitHub] [airflow] kaxil commented on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
kaxil commented on issue #5743: [AIRFLOW-5088][AIP-24] Persisting serialized 
DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#issuecomment-52749
 
 
   > How about instead of a background thread (I'm wary of using threads in 
python) could we instead query the last modified time of the serialised dag on 
each request?
   > 
   > i.e. when asking for dag X we check if dag X has been updated in the db 
since we last loaded it?
   How about this -> 
https://github.com/kaxil/airflow/commit/d2ec5371ae987ff87f60ee6b0143eb66dd5dc1e7
 ?
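An in-memory sketch of the per-request staleness check being discussed (all names below are hypothetical stand-ins; the real implementation would compare the `last_updated` column of the `serialized_dag` table):

```python
import itertools

_clock = itertools.count()  # deterministic stand-in for row timestamps


class DagStore:
    """Hypothetical stand-in for the serialized_dag table."""

    def __init__(self):
        self._rows = {}  # dag_id -> (payload, last_updated)

    def write(self, dag_id, payload):
        self._rows[dag_id] = (payload, next(_clock))

    def last_updated(self, dag_id):
        return self._rows[dag_id][1]

    def read(self, dag_id):
        return self._rows[dag_id][0]


class CachingDagBag:
    """On each request, re-read a DAG only when the stored row is newer
    than the cached copy."""

    def __init__(self, store):
        self.store = store
        self._cache = {}  # dag_id -> (payload, stamp_seen)

    def get_dag(self, dag_id):
        stamp = self.store.last_updated(dag_id)
        cached = self._cache.get(dag_id)
        if cached is None or stamp > cached[1]:
            # Stale or missing: reload the payload and remember its stamp.
            self._cache[dag_id] = (self.store.read(dag_id), stamp)
        return self._cache[dag_id][0]
```

This avoids a background refresh thread: staleness is detected lazily, at the cost of one timestamp lookup per request.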
   




[GitHub] [airflow] ashb commented on a change in pull request #5944: [AIRFLOW-5362] Reorder imports

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5944: [AIRFLOW-5362] Reorder imports
URL: https://github.com/apache/airflow/pull/5944#discussion_r320202786
 
 

 ##
 File path: setup.py
 ##
 @@ -268,6 +268,7 @@ def write_version(filename: str = os.path.join(*["airflow", "git_version"])):
     'click==6.7',
     'flake8>=3.6.0',
     'flake8-colors',
+    'flake8-isort',
 
 Review comment:
   > License: GNU General Public License v2 (GPLv2) (GPL version 2)
   
   I can't remember what the outcome was on this for other devel-only changes. 
@mik-laj Do you remember?
   
   (Somewhat annoyingly isort itself is MIT)




[GitHub] [airflow] ashb commented on a change in pull request #5944: [AIRFLOW-5362] Reorder imports

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5944: [AIRFLOW-5362] Reorder imports
URL: https://github.com/apache/airflow/pull/5944#discussion_r320204446
 
 

 ##
 File path: airflow/_vendor/nvd3/__init__.py
 ##
 @@ -16,14 +16,14 @@
'scatterChart', 'discreteBarChart', 'multiBarChart']
 
 
+from . import ipynb
+from .cumulativeLineChart import cumulativeLineChart
 
 Review comment:
   Can you exclude anything under airflow/_vendor from changes please?
   
   Perhaps `skip=airflow/_vendor` in setup.cfg might do the trick? Should do 
given `skip = build,.tox,venv` is used in some of isort's examples.
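For illustration, the suggested exclusion might look like this in `setup.cfg` (assuming the isort settings live there; isort also reads `.isort.cfg` and `tox.ini`, and the exact `skip` matching semantics vary between isort versions):

```cfg
[isort]
skip = build,.tox,venv,airflow/_vendor
```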




[GitHub] [airflow] Fokko edited a comment on issue #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
Fokko edited a comment on issue #5990: [AIRFLOW-5390] Remove provide context
URL: https://github.com/apache/airflow/pull/5990#issuecomment-52746
 
 
   Thanks @ashb for thinking along. Appreciate it. I've added the tests to the 
suite.
   
   I think we can fix the one with the op_args by skipping the first 
`len(op_args)` number of arguments. This would not introduce any change in 
behavior. Similar to the arguments in the `kwargs`, we could give the keys in 
`op_kwargs` priority.
   
   `kwargs` is actually handled: 
https://github.com/apache/airflow/blob/master/airflow/operators/python_operator.py#L108-L109
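A rough sketch of that idea (a hypothetical helper, not the actual `PythonOperator` code): parameters already covered positionally by `op_args` are skipped, and keys in `op_kwargs` take priority over context keys:

```python
import inspect


def build_kwargs(python_callable, op_args, op_kwargs, context):
    """Pass a context value only for parameters that are not already
    covered by op_args (positionally) or op_kwargs (by name)."""
    sig = inspect.signature(python_callable)
    # Skip the first len(op_args) parameters: they are bound positionally.
    remaining = list(sig.parameters)[len(op_args):]
    kwargs = {
        name: context[name]
        for name in remaining
        if name in context and name not in op_kwargs
    }
    kwargs.update(op_kwargs)  # op_kwargs keys take priority
    return kwargs


def task(ds, execution_date, threshold=0):
    return (ds, execution_date, threshold)


context = {"ds": "2019-09-03", "execution_date": "2019-09-03T00:00:00"}
op_args = ["2019-09-04"]       # binds to 'ds' positionally
op_kwargs = {"threshold": 5}

kwargs = build_kwargs(task, op_args, op_kwargs, context)
# 'ds' is skipped (covered by op_args); execution_date comes from context.
result = task(*op_args, **kwargs)
```

With these inputs, no context value collides with `op_args` or `op_kwargs`, so behaviour stays backwards compatible.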




[GitHub] [airflow] Fokko commented on issue #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
Fokko commented on issue #5990: [AIRFLOW-5390] Remove provide context
URL: https://github.com/apache/airflow/pull/5990#issuecomment-52746
 
 
   Thanks @ashb for thinking along. Appreciate it. I've added the tests to the 
suite.
   
   I think we can fix the one with the op_args by skipping the first 
`len(op_args)` number of arguments. This would not introduce any change in 
behavior. Similar to the arguments in the `kwargs`, we could give the keys in 
`op_kwargs ` priority.




[jira] [Commented] (AIRFLOW-5365) When switching between branches (master/v1-10-test) rebuild of image should not be needed for pre-commits

2019-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921310#comment-16921310
 ] 

ASF GitHub Bot commented on AIRFLOW-5365:
-

ashb commented on pull request #5972: [AIRFLOW-5365] No need to do image 
rebuild when switching master/v1-1…
URL: https://github.com/apache/airflow/pull/5972
 
 
   
 



> When switching between branches (master/v1-10-test) rebuild of image should 
> not be needed for pre-commits
> -
>
> Key: AIRFLOW-5365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5365
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0, 1.10.5
>Reporter: Jarek Potiuk
>Priority: Major
>






[jira] [Commented] (AIRFLOW-5365) When switching between branches (master/v1-10-test) rebuild of image should not be needed for pre-commits

2019-09-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921311#comment-16921311
 ] 

ASF subversion and git services commented on AIRFLOW-5365:
--

Commit 319b80437cf7e58d0ceecf9a58e336e14936b163 in airflow's branch 
refs/heads/master from Jarek Potiuk
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=319b804 ]

[AIRFLOW-5365] No need to do image rebuild when switching master/v1-10-test 
(#5972)



> When switching between branches (master/v1-10-test) rebuild of image should 
> not be needed for pre-commits
> -
>
> Key: AIRFLOW-5365
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5365
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ci
>Affects Versions: 2.0.0, 1.10.5
>Reporter: Jarek Potiuk
>Priority: Major
>






[GitHub] [airflow] ashb merged pull request #5972: [AIRFLOW-5365] No need to do image rebuild when switching master/v1-1…

2019-09-03 Thread GitBox
ashb merged pull request #5972: [AIRFLOW-5365] No need to do image rebuild when 
switching master/v1-1…
URL: https://github.com/apache/airflow/pull/5972
 
 
   




[GitHub] [airflow] ashb commented on a change in pull request #5979: [AIRFLOW-5373] Super fast pre-commit check for basic python2 compatib…

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5979: [AIRFLOW-5373] Super fast 
pre-commit check for basic python2 compatib…
URL: https://github.com/apache/airflow/pull/5979#discussion_r320197482
 
 

 ##
 File path: airflow/contrib/example_dags/example_qubole_operator.py
 ##
 @@ -198,7 +198,7 @@ def compare_result(ds, **kwargs):
 
 /** Computes an approximation to pi */
 object SparkPi {
-  def main(args: Array[String]) {
+  def main(args) {
 
 Review comment:
   This one is a false positive -- this is Scala code :D
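To illustrate the false positive (the pattern below is a hypothetical simplification, not the hook's actual regex): a line-based check for Python 3 annotated signatures also matches this Scala signature, and the "trickier regex" option from the review is to reject lines whose signature is followed by a brace-delimited body:

```python
import re

# Hypothetical simplification of a check that flags python3-only
# annotated signatures such as "def main(args: int):".
PY3_ANNOTATION = re.compile(r"def \w+\(\w+(: [^)]+)+\)")

scala_line = "  def main(args: Array[String]) {"
assert PY3_ANNOTATION.search(scala_line)  # the false positive

# Additionally require that the signature is NOT followed by "{"
# (a Scala/Java body opener), via a negative lookahead.
PY3_ANNOTATION_STRICT = re.compile(r"def \w+\(\w+(: [^)]+)+\)(?!\s*\{)")

assert not PY3_ANNOTATION_STRICT.search(scala_line)  # Scala line ignored
assert PY3_ANNOTATION_STRICT.search("def main(args: int):")  # Python still caught
```

Excluding the whole file via pre-commit's `exclude` regex, as suggested, sidesteps the pattern entirely.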




[GitHub] [airflow] ssoto opened a new pull request #5991: Adds two companies to ussing Airflow section

2019-09-03 Thread GitBox
ssoto opened a new pull request #5991: Adds two companies to ussing Airflow 
section
URL: https://github.com/apache/airflow/pull/5991
 
 
   https://issues.apache.org/jira/browse/AIRFLOW-5392
   
   ### Description
   
   Adds two new companies that are using Apache Airflow
   
   ### Tests
   OK
   
   ### Commits
   Ok
   
   ### Documentation
   Ok
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[jira] [Created] (AIRFLOW-5392) README.md update

2019-09-03 Thread Sergio Soto (Jira)
Sergio Soto created AIRFLOW-5392:


 Summary: README.md update
 Key: AIRFLOW-5392
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5392
 Project: Apache Airflow
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.10.4
Reporter: Sergio Soto
Assignee: Sergio Soto


Add [Logitravel Group|https://www.logitravel.com/] and 
[Bluekiri|https://bluekiri.com/] to the list of companies that use Apache Airflow





[GitHub] [airflow] ashb commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] Persisting serialized DAG in DB for webserver scalability

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5743: [AIRFLOW-5088][AIP-24] 
Persisting serialized DAG in DB for webserver scalability
URL: https://github.com/apache/airflow/pull/5743#discussion_r320186899
 
 

 ##
 File path: airflow/models/serialized_dag.py
 ##
 @@ -0,0 +1,155 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Serialzed DAG table in database."""
+
+import hashlib
+from typing import Any, Dict, List, Optional, TYPE_CHECKING
+from sqlalchemy import Column, Index, Integer, String, Text, and_
+from sqlalchemy.sql import exists
+
+from airflow.models.base import Base, ID_LEN
+from airflow.utils import db, timezone
+from airflow.utils.sqlalchemy import UtcDateTime
+
+
+if TYPE_CHECKING:
+    from airflow.dag.serialization.serialized_dag import SerializedDAG  # noqa: F401, E501; # pylint: disable=cyclic-import
+    from airflow.models import DAG  # noqa: F401; # pylint: disable=cyclic-import
+
+
+class SerializedDagModel(Base):
+    """A table for serialized DAGs.
+
+    serialized_dag table is a snapshot of DAG files synchronized by scheduler.
+    This feature is controlled by:
+    [core] dagcached = False: enable this feature
+    [core] dagcached_min_update_interval = 30 (s):
+        serialized DAGs are updated in DB when a file gets processed by scheduler;
+        to reduce the DB write rate there is a minimal interval for updating serialized DAGs.
+    [scheduler] dag_dir_list_interval = 300 (s):
+        interval of deleting serialized DAGs in DB when the files are deleted;
+        a smaller interval such as 60 is suggested.
+
+    It is used by the webserver to load dagbags when dagcached=True. Because reading from
+    database is lightweight compared to importing from files, it solves the webserver
+    scalability issue.
+    """
+    __tablename__ = 'serialized_dag'
+
+    dag_id = Column(String(ID_LEN), primary_key=True)
+    fileloc = Column(String(2000))
+    # The max length of fileloc exceeds the limit of indexing.
+    fileloc_hash = Column(Integer)
+    data = Column(Text)
+    last_updated = Column(UtcDateTime)
+
+    __table_args__ = (
+        Index('idx_fileloc_hash', fileloc_hash, unique=False),
+    )
+
+    def __init__(self, dag):
+        from airflow.dag.serialization import Serialization
+
+        self.dag_id = dag.dag_id
+        self.fileloc = dag.full_filepath
+        self.fileloc_hash = SerializedDagModel.dag_fileloc_hash(self.fileloc)
+        self.data = Serialization.to_json(dag)
+        self.last_updated = timezone.utcnow()
+
+    @staticmethod
+    def dag_fileloc_hash(full_filepath: str) -> int:
+        """Hashing file location for indexing.
+
+        :param full_filepath: full filepath of DAG file
+        :return: hashed full_filepath
+        """
+        # Hashing is needed because the length of fileloc is 2000 as an Airflow convention,
+        # which is over the limit of indexing. If we can reduce the length of fileloc, then
+        # hashing is not needed.
+        return int(0x & int(
 
 Review comment:
   Sounds good, yes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
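The hunk above hashes `fileloc` because a 2000-character string column exceeds the database's index limit, and the archived diff truncates the mask constant. A minimal standalone sketch of the idea, assuming a SHA-1 digest masked down to 32 bits (the hash function and constant in the actual PR may differ):

```python
import hashlib

def dag_fileloc_hash(full_filepath: str) -> int:
    # Reduce an arbitrarily long file path to a small integer that fits in
    # an indexed Integer column; sha1 and the 32-bit mask are assumptions here.
    return 0xFFFFFFFF & int(hashlib.sha1(full_filepath.encode("utf-8")).hexdigest(), 16)

h = dag_fileloc_hash("/opt/airflow/dags/example_dag.py")
print(h)
```

Collisions are possible with any truncated hash, which is consistent with the index being declared `unique=False`: a lookup would still compare the full `fileloc` after narrowing by the hash.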


[GitHub] [airflow] ashb commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG from the CLI

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG 
from the CLI
URL: https://github.com/apache/airflow/pull/5975#discussion_r320185972
 
 

 ##
 File path: airflow/gcp/example_dags/example_vision.py
 ##
 @@ -52,6 +52,7 @@
 
 import airflow
 from airflow import models
+from airflow.contrib.sensors.aws_glue_catalog_partition_sensor import AwsGlueCatalogPartitionSensor
 
 Review comment:
   Unrelated change?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG from the CLI

2019-09-03 Thread GitBox
ashb commented on a change in pull request #5975: [AIRFLOW-5368] Display DAG 
from the CLI
URL: https://github.com/apache/airflow/pull/5975#discussion_r320185707
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -441,6 +442,33 @@ def set_is_paused(is_paused, args):
 print("Dag: {}, paused: {}".format(args.dag_id, str(is_paused)))
 
 
+def show_dag(args):
+    dag = get_dag(args)
+    dot = render_dag(dag)
+    if args.save:
+        filename, _, fileformat = args.save.rpartition('.')
+        dot.render(filename=filename, format=fileformat, cleanup=True)
+        print("File {} saved".format(args.save))
+    elif args.imgcat:
+        data = dot.pipe(format='png')
+        try:
+            proc = subprocess.Popen("imgcat", stdout=subprocess.PIPE, stdin=subprocess.PIPE)
+        except OSError as e:
+            if e.errno == errno.ENOENT:
+                raise AirflowException(
+                    "Failed to execute. Make sure the imgcat executables are on your systems \'PATH\'"
+                )
+            else:
+                raise
+        out, err = proc.communicate(data)
+        if out:
+            print(out.decode('utf-8'))
+        if err:
+            print(err.decode('utf-8'))
+    else:
+        print(dot.source)
 
 Review comment:
   Given this also prints logging to stdout from airflow startup it might be 
worth adding a way to save the dot contents to a file. (Unless 
`dot.render(format="dot")` already works for that?)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
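The `imgcat` branch above writes the PNG bytes to an external viewer's stdin and relays its output. A minimal sketch of that `Popen` pattern, substituting `cat` for `imgcat` (a hypothetical stand-in, so the sketch runs without iTerm2) and a byte string for `dot.pipe(format='png')`:

```python
import subprocess

data = b"fake-png-bytes"  # stands in for dot.pipe(format='png')

# Feed the bytes to the child's stdin and capture whatever it writes back;
# cat simply echoes, the way imgcat would consume and render the image.
proc = subprocess.Popen(["cat"], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
out, _ = proc.communicate(data)
print(out)
```

`communicate()` both sends the input and waits for the process to exit, which avoids the deadlock you can hit when writing to stdin and reading stdout manually on large payloads.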


[GitHub] [airflow] Fokko commented on a change in pull request #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
Fokko commented on a change in pull request #5990: [AIRFLOW-5390] Remove 
provide context
URL: https://github.com/apache/airflow/pull/5990#discussion_r320184548
 
 

 ##
 File path: airflow/operators/python_operator.py
 ##
 @@ -104,10 +97,21 @@ def execute(self, context):
                                   for k, v in airflow_context_vars.items()]))
         os.environ.update(airflow_context_vars)
 
-        if self.provide_context:
-            context.update(self.op_kwargs)
-            context['templates_dict'] = self.templates_dict
+        context.update(self.op_kwargs)
+        context['templates_dict'] = self.templates_dict
+
+        if {parameter for name, parameter
+                in signature(self.python_callable).parameters.items()
+                if str(parameter).startswith("**")}:
+            # If there is a **kwargs, **context or **_ then just pass everything.
             self.op_kwargs = context
+        else:
+            # If there is only, for example, an execution_date, then pass only these in :-)
+            self.op_kwargs = {
+                name: context[name] for name, parameter
+                in signature(self.python_callable).parameters.items()
+                if name in context  # If it isn't available on the context, then ignore
 
 Review comment:
   👍 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko commented on a change in pull request #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
Fokko commented on a change in pull request #5990: [AIRFLOW-5390] Remove 
provide context
URL: https://github.com/apache/airflow/pull/5990#discussion_r320184507
 
 

 ##
 File path: airflow/operators/python_operator.py
 ##
 @@ -104,10 +97,21 @@ def execute(self, context):
                                   for k, v in airflow_context_vars.items()]))
         os.environ.update(airflow_context_vars)
 
-        if self.provide_context:
-            context.update(self.op_kwargs)
-            context['templates_dict'] = self.templates_dict
+        context.update(self.op_kwargs)
+        context['templates_dict'] = self.templates_dict
+
+        if {parameter for name, parameter
+                in signature(self.python_callable).parameters.items()
+                if str(parameter).startswith("**")}:
 
 Review comment:
   Love it, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] ashb commented on issue #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
ashb commented on issue #5990: [AIRFLOW-5390] Remove provide context
URL: https://github.com/apache/airflow/pull/5990#issuecomment-527386507
 
 
   Another example: what happens if I do:
   
   ```python
   def fn(dag, **context):
   print(dag)
   
   PythonOperator(
   op_args=[1],
   python_callable=fn
   )
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] BasPH commented on a change in pull request #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
BasPH commented on a change in pull request #5990: [AIRFLOW-5390] Remove 
provide context
URL: https://github.com/apache/airflow/pull/5990#discussion_r320181214
 
 

 ##
 File path: airflow/operators/python_operator.py
 ##
 @@ -104,10 +97,21 @@ def execute(self, context):
                                   for k, v in airflow_context_vars.items()]))
         os.environ.update(airflow_context_vars)
 
-        if self.provide_context:
-            context.update(self.op_kwargs)
-            context['templates_dict'] = self.templates_dict
+        context.update(self.op_kwargs)
+        context['templates_dict'] = self.templates_dict
+
+        if {parameter for name, parameter
+                in signature(self.python_callable).parameters.items()
+                if str(parameter).startswith("**")}:
 
 Review comment:
   Bit shorter: `{param for param in sig.parameters.values() if 
str(param).startswith("**")}`
   
   or: `any(str(params).startswith("**") for params in sig.parameters.values())`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
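The hunk under review infers what to pass from the callable's signature. A standalone sketch of that logic, using the `any(...)` variant suggested above (the helper name and sample context here are illustrative, not the PR's final code):

```python
from inspect import signature

def determine_kwargs(python_callable, context):
    sig = signature(python_callable)
    # A **kwargs-style parameter means the callable accepts the whole context.
    if any(str(p).startswith("**") for p in sig.parameters.values()):
        return dict(context)
    # Otherwise pass only the context keys the callable explicitly names.
    return {name: context[name] for name in sig.parameters if name in context}

def fn(execution_date, ds=None):
    return execution_date

context = {"execution_date": "2019-09-03", "ds": "2019-09-03", "dag": "demo_dag"}
print(determine_kwargs(fn, context))  # dag is filtered out: fn never asked for it
```

A more explicit alternative to the string check is comparing `p.kind` against `inspect.Parameter.VAR_KEYWORD`, which avoids relying on `str(parameter)` formatting.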


[GitHub] [airflow] BasPH commented on a change in pull request #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
BasPH commented on a change in pull request #5990: [AIRFLOW-5390] Remove 
provide context
URL: https://github.com/apache/airflow/pull/5990#discussion_r320182021
 
 

 ##
 File path: airflow/operators/python_operator.py
 ##
 @@ -104,10 +97,21 @@ def execute(self, context):
                                   for k, v in airflow_context_vars.items()]))
         os.environ.update(airflow_context_vars)
 
-        if self.provide_context:
-            context.update(self.op_kwargs)
-            context['templates_dict'] = self.templates_dict
+        context.update(self.op_kwargs)
+        context['templates_dict'] = self.templates_dict
+
+        if {parameter for name, parameter
+                in signature(self.python_callable).parameters.items()
+                if str(parameter).startswith("**")}:
+            # If there is a **kwargs, **context or **_ then just pass everything.
             self.op_kwargs = context
+        else:
+            # If there is only, for example, an execution_date, then pass only these in :-)
+            self.op_kwargs = {
+                name: context[name] for name, parameter
+                in signature(self.python_callable).parameters.items()
+                if name in context  # If it isn't available on the context, then ignore
 
 Review comment:
   Same here: `name: context[name] for name in 
signature(self.python_callable).parameters.keys() if name in context`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] [airflow] Fokko commented on issue #5967: Enable authentication for druid query hook.

2019-09-03 Thread GitBox
Fokko commented on issue #5967: Enable authentication for druid query hook.
URL: https://github.com/apache/airflow/pull/5967#issuecomment-527383916
 
 
   @scrawfor Can you follow the process and create a ticket in Jira? This helps 
us keep track of the releases.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [airflow] Fokko merged pull request #5966: [AIRFLOW-5360] Type annotations for BaseSensorOperator

2019-09-03 Thread GitBox
Fokko merged pull request #5966: [AIRFLOW-5360] Type annotations for 
BaseSensorOperator
URL: https://github.com/apache/airflow/pull/5966
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (AIRFLOW-5360) Type annotations for BaseSensorOperator

2019-09-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921292#comment-16921292
 ] 

ASF subversion and git services commented on AIRFLOW-5360:
--

Commit f46b54a10ec35485e7f507e308b9b36232168170 in airflow's branch 
refs/heads/master from TobKed
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=f46b54a ]

[AIRFLOW-5360] Type annotations for BaseSensorOperator (#5966)



> Type annotations for BaseSensorOperator
> ---
>
> Key: AIRFLOW-5360
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5360
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.4
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (AIRFLOW-5360) Type annotations for BaseSensorOperator

2019-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921291#comment-16921291
 ] 

ASF GitHub Bot commented on AIRFLOW-5360:
-

Fokko commented on pull request #5966: [AIRFLOW-5360] Type annotations for 
BaseSensorOperator
URL: https://github.com/apache/airflow/pull/5966
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Type annotations for BaseSensorOperator
> ---
>
> Key: AIRFLOW-5360
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5360
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.10.4
>Reporter: Tobiasz Kedzierski
>Assignee: Tobiasz Kedzierski
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (AIRFLOW-5391) Clearing a task skipped by BranchPythonOperator will cause the task to execute

2019-09-03 Thread Qian Yu (Jira)
Qian Yu created AIRFLOW-5391:


 Summary: Clearing a task skipped by BranchPythonOperator will 
cause the task to execute
 Key: AIRFLOW-5391
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5391
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: 1.10.4
Reporter: Qian Yu


I tried this on 1.10.3 and 1.10.4; both have this issue.

E.g. in this example from the docs, branch_a executed and branch_false was skipped 
because of the branching condition. However, if someone clears branch_false, it will 
execute. !https://airflow.apache.org/_images/branch_good.png!

This behaviour is understandable given how BranchPythonOperator is implemented. 
BranchPythonOperator does not store its decision anywhere. It skips its own 
downstream tasks in the branch at runtime. So there's currently no way for 
branch_false to know it should be skipped without rerunning the branching task.

This is obviously counter-intuitive from the user's perspective. In this 
example, users would not expect branch_false to execute when they clear it 
because the branching task should have skipped it.

There are a few ways to improve this:

Option 1): Make downstream tasks skipped by BranchPythonOperator not clearable 
without also clearing the upstream BranchPythonOperator. In this example, if 
someone clears branch_false without clearing branching, branch_false should not 
execute.

Option 2): Make BranchPythonOperator store the result of its skip condition 
somewhere. Make downstream tasks check for this stored decision and skip 
themselves if they should have been skipped by the condition. This probably 
means the decision of BranchPythonOperator needs to be stored in the db.

 

[kevcampb|https://blog.diffractive.io/author/kevcampb/] described a workaround on 
his blog, and acknowledged that it is not perfect and that a better permanent fix 
is needed:

[https://blog.diffractive.io/2018/08/07/replacement-shortcircuitoperator-for-airflow/]

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (AIRFLOW-5390) Remove provide_context

2019-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921232#comment-16921232
 ] 

ASF GitHub Bot commented on AIRFLOW-5390:
-

Fokko commented on pull request #5990: [AIRFLOW-5390] Remove provide context
URL: https://github.com/apache/airflow/pull/5990
 
 
   Make sure you have checked _all_ steps below.
   
   I'm giving Apache Airflow training across Europe, and at every workshop that 
I provide, I find it a bit awkward to introduce the idea of the 
provide_context. Instead, I want to remove this thing and make it infer the 
variables automagically based on the signature of the Python callable. Less is 
more :-)
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-5390\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5390
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove provide_context
> --
>
> Key: AIRFLOW-5390
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5390
> Project: Apache Airflow
>  Issue Type: Task
>  Components: core
>Affects Versions: 1.10.4
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
> Fix For: 2.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[GitHub] [airflow] Fokko opened a new pull request #5990: [AIRFLOW-5390] Remove provide context

2019-09-03 Thread GitBox
Fokko opened a new pull request #5990: [AIRFLOW-5390] Remove provide context
URL: https://github.com/apache/airflow/pull/5990
 
 
   Make sure you have checked _all_ steps below.
   
   I'm giving Apache Airflow training across Europe, and at every workshop that 
I provide, I find it a bit awkward to introduce the idea of the 
provide_context. Instead, I want to remove this thing and make it infer the 
variables automagically based on the signature of the Python callable. Less is 
more :-)
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-5390\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-5390
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
 - In case you are proposing a fundamental code change, you need to create 
an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)).
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
 - If you implement backwards incompatible changes, please leave a note in 
the [Updating.md](https://github.com/apache/airflow/blob/master/UPDATING.md) so 
we can assign it to an appropriate release
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (AIRFLOW-5390) Remove provide_context

2019-09-03 Thread Fokko Driesprong (Jira)
Fokko Driesprong created AIRFLOW-5390:
-

 Summary: Remove provide_context
 Key: AIRFLOW-5390
 URL: https://issues.apache.org/jira/browse/AIRFLOW-5390
 Project: Apache Airflow
  Issue Type: Task
  Components: core
Affects Versions: 1.10.4
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[GitHub] [airflow] TobKed commented on issue #5980: [AIRFLOW-5129] Add typehint to GCP DLP hook

2019-09-03 Thread GitBox
TobKed commented on issue #5980: [AIRFLOW-5129] Add typehint to GCP DLP hook
URL: https://github.com/apache/airflow/pull/5980#issuecomment-527341514
 
 
   cc @mik-laj 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

