[jira] [Created] (AIRFLOW-3650) Fix flaky test in TestTriggerDag

2019-01-07 Thread Tao Feng (JIRA)
Tao Feng created AIRFLOW-3650:
-

 Summary: Fix flaky test in TestTriggerDag
 Key: AIRFLOW-3650
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3650
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Tao Feng
Assignee: Tao Feng






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] feng-tao commented on issue #4457: [AIRFLOW-3650] Skip running on mysql for the flaky test

2019-01-07 Thread GitBox
feng-tao commented on issue #4457: [AIRFLOW-3650] Skip running on mysql for the 
flaky test
URL: https://github.com/apache/airflow/pull/4457#issuecomment-452203627
 
 
   PTAL @kaxil @Fokko
   
   I think the issue is not in the test itself, but in the mysqlclient 
library. The session can't get the latest data from the ORM after this 
line (https://github.com/apache/airflow/blob/master/tests/www_rbac/test_views.py#L1473)
 finishes running (I opened a local pdb to confirm). I tried different 
mysqlclient versions, none of which work. To unblock, I suggest we skip the 
MySQL ORM for this test.
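A skip of the kind proposed could be sketched as follows. This is illustrative only: the `backend_is_mysql` helper, the test body, and the connection string are stand-ins, not Airflow's actual test configuration (which reads the SQLAlchemy URL from `airflow.configuration`).

```python
import unittest

# Stand-in for the SQLAlchemy URL Airflow reads from its configuration;
# hard-coded here so the sketch is self-contained.
SQL_ALCHEMY_CONN = "mysql://airflow:airflow@localhost/airflow"

def backend_is_mysql(conn_uri):
    """Return True when the configured metadata backend is MySQL."""
    return conn_uri.startswith("mysql")

class TestTriggerDag(unittest.TestCase):
    @unittest.skipIf(backend_is_mysql(SQL_ALCHEMY_CONN),
                     "session does not see fresh ORM data under mysqlclient")
    def test_trigger_dag_button(self):
        # exercise the /trigger endpoint and assert on the created DagRun
        pass
```

The decorator evaluates the backend check at class-definition time, so the test is reported as skipped (with the reason string) rather than failing flakily.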


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (AIRFLOW-3650) Fix flaky test in TestTriggerDag

2019-01-07 Thread Tao Feng (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Feng updated AIRFLOW-3650:
--
Description: The test_trigger_dag_button test fails for the MySQL ORM almost 
every time

> Fix flaky test in TestTriggerDag
> 
>
> Key: AIRFLOW-3650
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3650
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tao Feng
>Assignee: Tao Feng
>Priority: Major
>
> The test_trigger_dag_button test fails for the MySQL ORM almost every time





[jira] [Assigned] (AIRFLOW-3592) Logs cannot be viewed while in rescheduled state

2019-01-07 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann reassigned AIRFLOW-3592:


Assignee: Stefan Seelmann

> Logs cannot be viewed while in rescheduled state
> 
>
> Key: AIRFLOW-3592
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3592
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: webserver
>Affects Versions: 1.10.1
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.2, 2.0.0
>
>






[jira] [Assigned] (AIRFLOW-3591) Fix start date, end date, duration for rescheduled tasks

2019-01-07 Thread Stefan Seelmann (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann reassigned AIRFLOW-3591:


Assignee: Stefan Seelmann

> Fix start date, end date, duration for rescheduled tasks
> 
>
> Key: AIRFLOW-3591
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3591
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: webserver
>Affects Versions: 1.10.1
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 1.10.2, 2.0.0
>
>






[GitHub] seelmann commented on issue #4408: [AIRFLOW-3589] Visualize reschedule state in all views

2019-01-07 Thread GitBox
seelmann commented on issue #4408: [AIRFLOW-3589] Visualize reschedule state in 
all views
URL: https://github.com/apache/airflow/pull/4408#issuecomment-452194510
 
 
   @ashb @Fokko PTAL, ready to merge from my PoV.
   
   I tested in our test env with the Celery executor and added some tests for 
the changes in `jobs.py`.
   
   I decided to add a dedicated state `UP_FOR_RESCHEDULE`. It increases the 
complexity in `jobs.py` a bit, but with that change all the views just work 
without any code change.
   
   Other things to discuss:
   * Is the name `UP_FOR_RESCHEDULE` a good one, or do you have a better idea?
   * As the color I chose `turquoise`; not sure if it's a good choice? See the 
screenshots attached to the Jira.
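A hedged sketch of what the dedicated state plus a view color could look like, loosely modeled on `airflow.utils.state.State`; the class layout and the `color_for` helper are illustrative, not the exact upstream code:

```python
class State:
    """Minimal stand-in for Airflow's task-state constants."""
    UP_FOR_RETRY = "up_for_retry"
    UP_FOR_RESCHEDULE = "up_for_reschedule"  # dedicated state added by the PR

    # State-to-color mapping the web views render; turquoise is the color
    # proposed in the PR discussion.
    state_color = {
        UP_FOR_RETRY: "gold",
        UP_FOR_RESCHEDULE: "turquoise",
    }

def color_for(state):
    """Color the views use for a state; unknown states fall back to white."""
    return State.state_color.get(state, "white")
```

Because the views look states up in one mapping like this, adding the new constant plus a color entry is enough for every view to render it without further code changes.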
   




[jira] [Updated] (AIRFLOW-3649) Feature to add extra labels to kubernetes worker pods

2019-01-07 Thread Bo Blanton (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Blanton updated AIRFLOW-3649:

Priority: Minor  (was: Major)

> Feature to add extra labels to kubernetes worker pods
> -
>
> Key: AIRFLOW-3649
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3649
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Affects Versions: 1.10.1
>Reporter: Bo Blanton
>Priority: Minor
>
> For many systems and metrics emitters in k8s, pod labels are used to tag 
> metrics and other resource-utilization data.
> Since Airflow adds a fixed set of "reserved" labels for its own 
> identification of worker nodes, we wish to add some static labels for just 
> these purposes.
>  
>  





[jira] [Commented] (AIRFLOW-3649) Feature to add extra labels to kubernetes worker pods

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736807#comment-16736807
 ] 

ASF GitHub Bot commented on AIRFLOW-3649:
-

wyndhblb commented on pull request #4459: [AIRFLOW-3649] Feature to add extra 
labels to kubernetes worker pods
URL: https://github.com/apache/airflow/pull/4459
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR adds the ability to attach a static set of extra labels 
to worker pods for resource, metrics, and other usage tracking.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   
 



> Feature to add extra labels to kubernetes worker pods
> -
>
> Key: AIRFLOW-3649
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3649
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Affects Versions: 1.10.1
>Reporter: Bo Blanton
>Priority: Major
>
> For many systems and metrics emitters in k8s, pod labels are used to tag 
> metrics and other resource-utilization data.
> Since Airflow adds a fixed set of "reserved" labels for its own 
> identification of worker nodes, we wish to add some static labels for just 
> these purposes.
>  
>  





[GitHub] wyndhblb opened a new pull request #4459: [AIRFLOW-3649] Feature to add extra labels to kubernetes worker pods

2019-01-07 Thread GitBox
wyndhblb opened a new pull request #4459: [AIRFLOW-3649] Feature to add extra 
labels to kubernetes worker pods
URL: https://github.com/apache/airflow/pull/4459
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR adds the ability to attach a static set of extra labels 
to worker pods for resource, metrics, and other usage tracking.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   




[jira] [Created] (AIRFLOW-3649) Feature to add extra labels to kubernetes worker pods

2019-01-07 Thread Bo Blanton (JIRA)
Bo Blanton created AIRFLOW-3649:
---

 Summary: Feature to add extra labels to kubernetes worker pods
 Key: AIRFLOW-3649
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3649
 Project: Apache Airflow
  Issue Type: Improvement
  Components: kubernetes
Affects Versions: 1.10.1
Reporter: Bo Blanton


For many systems and metrics emitters in k8s, pod labels are used to tag 
metrics and other resource-utilization data.

Since Airflow adds a fixed set of "reserved" labels for its own identification 
of worker nodes, we wish to add some static labels for just these purposes.
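The label merge this issue asks for can be sketched like so. The helper and its names are hypothetical (the real KubernetesExecutor builds pod metadata elsewhere); the point is the precedence rule: reserved labels win on conflict so Airflow can still identify its workers.

```python
def build_pod_labels(reserved, extra):
    """Merge user-supplied static labels under Airflow's reserved ones.

    `extra` holds the operator's labels (e.g. for metrics tagging);
    `reserved` holds the labels Airflow needs to identify worker pods,
    which must not be overridden.
    """
    labels = dict(extra)      # start from the user's static labels
    labels.update(reserved)   # reserved keys take precedence on conflict
    return labels

# Example: tag worker pods for a metrics emitter; the spoofed dag_id in
# `extra` is discarded because the reserved copy wins.
pod_labels = build_pod_labels(
    reserved={"airflow-worker": "scheduler-1", "dag_id": "example_dag"},
    extra={"team": "data-eng", "cost-center": "42", "dag_id": "spoofed"},
)
```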

 

 





[GitHub] msumit commented on issue #3533: [AIRFLOW-161] New redirect route and extra links

2019-01-07 Thread GitBox
msumit commented on issue #3533: [AIRFLOW-161] New redirect route and extra 
links
URL: https://github.com/apache/airflow/pull/3533#issuecomment-452189430
 
 
   @ArgentFalcon maybe you want to update the screenshots and description of 
the PR as well?




[jira] [Commented] (AIRFLOW-3648) Default to gcp project id in connection for all gcp hooks/operators

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736754#comment-16736754
 ] 

ASF GitHub Bot commented on AIRFLOW-3648:
-

jmcarp commented on pull request #4458: [AIRFLOW-3648] Default to connection 
project id in gcp cloud sql.
URL: https://github.com/apache/airflow/pull/4458
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3648
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 



> Default to gcp project id in connection for all gcp hooks/operators
> ---
>
> Key: AIRFLOW-3648
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3648
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Josh Carp
>Assignee: Josh Carp
>Priority: Minor
>
> Some gcp hooks and operators use the project id specified in the connection 
> if no value is passed explicitly, and some don't. For consistency, all gcp 
> hooks and operators should make project id optional and default to the 
> project id in the connection.





[GitHub] jmcarp opened a new pull request #4458: [AIRFLOW-3648] Default to connection project id in gcp cloud sql.

2019-01-07 Thread GitBox
jmcarp opened a new pull request #4458: [AIRFLOW-3648] Default to connection 
project id in gcp cloud sql.
URL: https://github.com/apache/airflow/pull/4458
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3648
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[jira] [Created] (AIRFLOW-3648) Default to gcp project id in connection for all gcp hooks/operators

2019-01-07 Thread Josh Carp (JIRA)
Josh Carp created AIRFLOW-3648:
--

 Summary: Default to gcp project id in connection for all gcp 
hooks/operators
 Key: AIRFLOW-3648
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3648
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Josh Carp
Assignee: Josh Carp


Some gcp hooks and operators use the project id specified in the connection if 
no value is passed explicitly, and some don't. For consistency, all gcp hooks 
and operators should make project id optional and default to the project id in 
the connection.
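The fallback behavior being proposed can be sketched as follows. The class and attribute names are illustrative (the real GCP hooks read the project id out of the connection's extras); the sketch only shows the precedence: an explicitly passed project id wins, otherwise the connection's default is used.

```python
class StubConnection:
    """Stand-in for a GCP connection carrying a default project id."""
    def __init__(self, project_id=None):
        self.project_id = project_id

class GcpHookSketch:
    """Illustrative hook that resolves the effective GCP project id."""
    def __init__(self, connection):
        self.connection = connection

    def resolve_project_id(self, project_id=None):
        # Explicit argument wins; otherwise fall back to the connection.
        if project_id is not None:
            return project_id
        if self.connection.project_id is not None:
            return self.connection.project_id
        raise ValueError(
            "project_id was not passed and is not set on the connection")
```

With this rule applied uniformly, operators can omit `project_id` and inherit it from the connection, which is the consistency the issue asks for.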





[GitHub] jmcarp commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs for index view.

2019-01-07 Thread GitBox
jmcarp commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs 
for index view.
URL: https://github.com/apache/airflow/pull/4390#discussion_r245878602
 
 

 ##
 File path: airflow/www_rbac/compile_assets.sh
 ##
 @@ -23,6 +23,6 @@ if [ -d airflow/www_rbac/static/dist ]; then
 fi
 
 cd airflow/www_rbac/
-npm install
+# npm install
 
 Review comment:
   That's a typo, I'll revert it.




[GitHub] jmcarp commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs for index view.

2019-01-07 Thread GitBox
jmcarp commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs 
for index view.
URL: https://github.com/apache/airflow/pull/4390#discussion_r245878561
 
 

 ##
 File path: airflow/models/__init__.py
 ##
 @@ -240,6 +240,20 @@ def clear_task_instances(tis,
         dr.start_date = timezone.utcnow()
 
 
+def get_last_dagrun(dag_id, session, include_externally_triggered=False):
+    """
+    Returns the last dag run for a dag, None if there was none.
+    Last dag run can be any type of run eg. scheduled or backfilled.
+    Overridden DagRuns are ignored.
+    """
+    DR = DagRun
+    query = session.query(DR).filter(DR.dag_id == dag_id)
+    if not include_externally_triggered:
+        query = query.filter(DR.external_trigger == False)  # noqa
+    query = query.order_by(DR.execution_date.desc())
 
 Review comment:
   It looks like `dag_id` and `execution_date` columns already have indexes. 
Which indexes are missing?




[GitHub] feng-tao commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-07 Thread GitBox
feng-tao commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?
URL: https://github.com/apache/airflow/pull/4421#issuecomment-452174547
 
 
   @Fokko, regarding not having a migration script, I wonder what will happen 
in the following case: assume we have three tables, ``A``, ``B``, and ``C``. 
We modify table ``A`` and provide an Alembic migration script (version 1), 
then delete table ``B`` without a script, then modify table ``C`` with another 
Alembic script (version 2). In this case, will the migration (upgrade / 
downgrade) run successfully from version 1 to 2, or vice versa? If yes, I am 
+1 :)




[GitHub] feng-tao edited a comment on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-07 Thread GitBox
feng-tao edited a comment on issue #4421: [AIRFLOW-3468] Remove 
KnownEvent(Event)?
URL: https://github.com/apache/airflow/pull/4421#issuecomment-452174547
 
 
   @Fokko, regarding not having a migration script, I wonder what will happen 
in the following case: assume we have three tables, ``A``, ``B``, and ``C``. 
We modify table ``A`` and provide an Alembic migration script (version 1), 
then delete table ``B`` without a script, then modify table ``C`` with another 
Alembic script (version 2). In this case, will the migration (upgrade / 
downgrade) run successfully from version 1 to 2, or vice versa? If yes, I am 
+1 :)




[GitHub] feng-tao commented on a change in pull request #4457: [AIRFLOW-XXX] Fix TestTriggerDag flaky test

2019-01-07 Thread GitBox
feng-tao commented on a change in pull request #4457: [AIRFLOW-XXX] Fix 
TestTriggerDag flaky test
URL: https://github.com/apache/airflow/pull/4457#discussion_r245870276
 
 

 ##
 File path: tests/www_rbac/test_views.py
 ##
 @@ -1464,12 +1464,9 @@ def test_trigger_dag_button_normal_exist(self):
 
     def test_trigger_dag_button(self):
 
-        test_dag_id = "example_bash_operator"
+        test_dag_id = "example_python_operator"
 
         DR = models.DagRun
-        self.session.query(DR).delete()
-        self.session.commit()
-
         self.client.get('trigger?dag_id={}'.format(test_dag_id))
 
 Review comment:
   thanks. this pr is for testing only. I would like to see why the test fails 
particularly for MySQL ORM. 




[GitHub] XD-DENG commented on a change in pull request #4457: [AIRFLOW-XXX] Fix TestTriggerDag flaky test

2019-01-07 Thread GitBox
XD-DENG commented on a change in pull request #4457: [AIRFLOW-XXX] Fix 
TestTriggerDag flaky test
URL: https://github.com/apache/airflow/pull/4457#discussion_r245869601
 
 

 ##
 File path: tests/www_rbac/test_views.py
 ##
 @@ -1464,12 +1464,9 @@ def test_trigger_dag_button_normal_exist(self):
 
     def test_trigger_dag_button(self):
 
-        test_dag_id = "example_bash_operator"
+        test_dag_id = "example_python_operator"
 
         DR = models.DagRun
-        self.session.query(DR).delete()
-        self.session.commit()
-
         self.client.get('trigger?dag_id={}'.format(test_dag_id))
 
 Review comment:
   Hi @feng-tao, it may be good to change this line to
   ```python
   self.client.get('/trigger?dag_id={}'.format(test_dag_id))
   ```
   (adding the slash), even though it doesn't seem to affect much.




[GitHub] feng-tao opened a new pull request #4457: Fix TestTriggerDag flaky test

2019-01-07 Thread GitBox
feng-tao opened a new pull request #4457: Fix TestTriggerDag flaky test
URL: https://github.com/apache/airflow/pull/4457
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-XXX
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   test_trigger_dag_button seems to be failing pretty consistently for the 
MySQL ORM. See if this change fixes the issue.
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [ ] Passes `flake8`
   




[GitHub] feng-tao commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger

2019-01-07 Thread GitBox
feng-tao commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger
URL: https://github.com/apache/airflow/pull/4407#issuecomment-452156950
 
 
   Another failure in https://travis-ci.org/apache/airflow/jobs/476638427 for 
https://github.com/apache/airflow/pull/4436. The test seems to fail very 
consistently with the MySQL ORM.




[GitHub] jmcarp commented on issue #4436: [AIRFLOW-3631] Update flake8 and fix lint.

2019-01-07 Thread GitBox
jmcarp commented on issue #4436: [AIRFLOW-3631] Update flake8 and fix lint.
URL: https://github.com/apache/airflow/pull/4436#issuecomment-452143443
 
 
   Updated.




[GitHub] jmcarp commented on a change in pull request #4436: [AIRFLOW-3631] Update flake8 and fix lint.

2019-01-07 Thread GitBox
jmcarp commented on a change in pull request #4436: [AIRFLOW-3631] Update 
flake8 and fix lint.
URL: https://github.com/apache/airflow/pull/4436#discussion_r245854403
 
 

 ##
 File path: airflow/settings.py
 ##
 @@ -90,7 +90,7 @@ def timing(cls, stat, dt):
   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
 ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
  _/_/  |_/_/  /_//_//_/  \//|__/
- """
+ """  # noqa: W605
 
 Review comment:
   I took another look, and switching to a raw string would escape the 
backslash on the first line, which we don't want. I refactored slightly to 
avoid literal backslashes and to make sure the ASCII text lines up.
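For context, W605 fires on escape sequences Python does not define; both the raw-string and doubled-backslash spellings silence it and produce the identical value, but a raw string cannot end in a single backslash, which is one reason raw strings do not suit every block of ASCII art. A quick illustration (unrelated to the exact logo text):

```python
# "\d" is an invalid escape sequence (flake8 W605 / DeprecationWarning in
# CPython); the two accepted spellings below are the same two-character
# string: a backslash followed by 'd'.
doubled = "\\d"   # explicit backslash escape
raw = r"\d"       # raw string literal: backslashes stay literal

assert doubled == raw
assert len(raw) == 2

# Note: r"\" is a SyntaxError -- a raw string literal cannot end with an
# odd number of backslashes.
```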




[GitHub] kaxil commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
kaxil commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/4444#issuecomment-452132314
 
 
   @feng-tao I have included that commit as well.
   
   @dimberman I have spent some time today and cherry-picked and resolved some 
conflicts (with some help from this PR, thank you guys). It would be great if 
you could verify that everything that was needed is there, and then we will 
try to resolve any issues with tests.




[GitHub] feng-tao commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-07 Thread GitBox
feng-tao commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?
URL: https://github.com/apache/airflow/pull/4421#issuecomment-452119451
 
 
   but CI is failing.




[GitHub] feng-tao commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-07 Thread GitBox
feng-tao commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?
URL: https://github.com/apache/airflow/pull/4421#issuecomment-452119383
 
 
   @Fokko , I see your point. +1




[GitHub] codecov-io commented on issue #4399: [AIRFLOW-3594] Unify different License Header

2019-01-07 Thread GitBox
codecov-io commented on issue #4399: [AIRFLOW-3594] Unify different License 
Header
URL: https://github.com/apache/airflow/pull/4399#issuecomment-452114962
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=h1) 
Report
   > Merging 
[#4399](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/airflow/commit/11a36d17f22d89c97100c875cb01e8e6105a94cc?src=pr&el=desc)
 will **increase** coverage by `2.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/airflow/pull/4399/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##           master    #4399      +/-   ##
    ==========================================
    + Coverage   78.59%   80.62%   +2.02%     
    ==========================================
      Files         204      204              
      Lines       16453    19586    +3133     
    ==========================================
    + Hits        12932    15791    +2859     
    - Misses       3521     3795     +274
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=tree) | 
Coverage Δ | |
   |---|---|---|
   | 
[airflow/operators/hive\_to\_mysql.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV90b19teXNxbC5weQ==)
 | `80% <0%> (-20%)` | :arrow_down: |
   | 
[airflow/operators/hive\_to\_samba\_operator.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvaGl2ZV90b19zYW1iYV9vcGVyYXRvci5weQ==)
 | `82.75% <0%> (-17.25%)` | :arrow_down: |
   | 
[airflow/operators/jdbc\_operator.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvamRiY19vcGVyYXRvci5weQ==)
 | `85.71% <0%> (-14.29%)` | :arrow_down: |
   | 
[airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==)
 | `92.3% <0%> (-7.7%)` | :arrow_down: |
   | 
[airflow/operators/dagrun\_operator.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvZGFncnVuX29wZXJhdG9yLnB5)
 | `93.75% <0%> (-2.41%)` | :arrow_down: |
   | 
[airflow/utils/helpers.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5)
 | `82.75% <0%> (+0.14%)` | :arrow_up: |
   | 
[airflow/plugins\_manager.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9wbHVnaW5zX21hbmFnZXIucHk=)
 | `92.98% <0%> (+0.84%)` | :arrow_up: |
   | 
[airflow/jobs.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzLnB5)
 | `79.2% <0%> (+1.76%)` | :arrow_up: |
   | 
[airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=)
 | `94.81% <0%> (+2.21%)` | :arrow_up: |
   | 
[airflow/bin/cli.py](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5)
 | `67.41% <0%> (+2.39%)` | :arrow_up: |
   | ... and [5 
more](https://codecov.io/gh/apache/airflow/pull/4399/diff?src=pr&el=tree-more) 
| |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=footer). 
Last update 
[11a36d1...a396fd1](https://codecov.io/gh/apache/airflow/pull/4399?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Commented] (AIRFLOW-3645) Use a base_executor_config and merge operator level executor_config

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736462#comment-16736462
 ] 

ASF GitHub Bot commented on AIRFLOW-3645:
-

Mokubyow commented on pull request #4456: [AIRFLOW-3645] Add 
base_executor_config
URL: https://github.com/apache/airflow/pull/4456
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3645)
   
   ### Description
   
   - [x] Add a base_executor_config that merges any operator_level 
executor_config into itself. This helps to dry up KubernetesExecutor 
deployments that might need to pass an executor config to all operators.
   
   ### Tests
   
   - [x] My PR adds the following unit tests: `nosetests -v 
tests/utils/test_helpers.py:TestHelpers.test_dict_merge`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
  - All the public functions and classes in the PR contain docstrings 
that explain what they do
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 



> Use a base_executor_config and merge operator level executor_config
> ---
>
> Key: AIRFLOW-3645
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3645
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kyle Hamlin
>Assignee: Kyle Hamlin
>Priority: Major
> Fix For: 1.10.2
>
>
> It would be very useful to have a `base_executor_config` and merge the base 
> config with any operator level `executor_config`.
> I imagine referencing a python dict similar to how we reference a custom 
> logging_config
> *Example config*
> {code:java}
> [core]
> base_executor_config = config.base_executor_config.BASE_EXECUTOR_CONFIG
> {code}
> *Example base_executor_config*
> {code:java}
> BASE_EXECUTOR_CONFIG = {
> "KubernetesExecutor": {
> "image_pull_policy": "Always",
> "annotations": {
> "iam.amazonaws.com/role": "arn:aws:iam::"
> },
> "volumes": [
> {
> "name": "airflow-lib",
> "persistentVolumeClaim": {
> "claimName": "airflow-lib"
> }
> }
> ],
> "volume_mounts": [
> {
> "name": "airflow-lib",
> "mountPath": "/usr/local/airflow/lib",
> }
> ]
> }
> }
> {code}
> *Example operator*
> {code:java}
> run_this = PythonOperator(
> task_id='print_the_context',
> provide_context=True,
> python_callable=print_context,
> executor_config={
> "KubernetesExecutor": {
> "request_memory": "256Mi",
> "request_cpu": "100m",
> "limit_memory": "256Mi",
> "limit_cpu": "100m"
> }
> },
> dag=dag)
> {code}
> Then we'll want to have a dict deep-merge function that returns the 
> executor_config
> *Merge functionality*
> {code:java}
> import collections
> from airflow import conf
> from airflow.utils.module_loading import import_string
> def dict_merge(dct, merge_dct):
> """ Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
> updating only top-level keys, dict_merge recurses down into dicts nested
> to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
> ``dct``.
> :param dct: dict onto which the merge is executed
> :param merge_dct: dct merged into dct
> :return: dct
> """
> for k, v in merge_dct.items():
> if (k in dct and isinstance(dct[k], dict)
> and isinstance(merge_dct[k], collections.Mapping)):
> dict

[GitHub] Mokubyow opened a new pull request #4456: [AIRFLOW-3645] Add base_executor_config

2019-01-07 Thread GitBox
Mokubyow opened a new pull request #4456: [AIRFLOW-3645] Add 
base_executor_config
URL: https://github.com/apache/airflow/pull/4456
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3645)
   
   ### Description
   
   - [x] Add a base_executor_config that merges any operator_level 
executor_config into itself. This helps to dry up KubernetesExecutor 
deployments that might need to pass an executor config to all operators.
   
   ### Tests
   
   - [x] My PR adds the following unit tests: `nosetests -v 
tests/utils/test_helpers.py:TestHelpers.test_dict_merge`
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
  - All the public functions and classes in the PR contain docstrings 
that explain what they do
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[jira] [Commented] (AIRFLOW-3647) Contributed SparkSubmitOperator doesn't honor --archives configuration

2019-01-07 Thread Ken Melms (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736460#comment-16736460
 ] 

Ken Melms commented on AIRFLOW-3647:


I have the code for this ready to go - I just needed an issue to tie the PR to.

> Contributed SparkSubmitOperator doesn't honor --archives configuration
> --
>
> Key: AIRFLOW-3647
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3647
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
> Environment: Linux (RHEL 7)
> Python 3.5 (using a virtual environment)
> spark-2.1.3-bin-hadoop26
> Airflow 1.10.1
> CDH 5.14 Hadoop [Yarn] cluster (no end user / dev modifications allowed)
>Reporter: Ken Melms
>Priority: Minor
>  Labels: easyfix, newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The contributed SparkSubmitOperator has no ability to honor the spark-submit 
> configuration field "--archives", which is treated subtly differently than 
> "--files" or "--py-files" in that it will unzip the archive into the 
> application's working directory, and can optionally add an alias to the 
> unzipped folder so that you can refer to it elsewhere in your submission.
> EG:
> spark-submit  --archives=hdfs:user/someone/python35_venv.zip#PYTHON 
> --conf "spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYTHON/python35/bin/python3" 
> run_me.py  
> In our case - this behavior allows for multiple python virtual environments 
> to be sourced from HDFS without incurring the penalty of pushing the whole 
> python virtual env to the cluster each submission.  This solves (for us) 
> using python-based spark jobs on a cluster that the end user has no ability 
> to define the python modules in use.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-3647) Contributed SparkSubmitOperator doesn't honor --archives configuration

2019-01-07 Thread Ken Melms (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Melms updated AIRFLOW-3647:
---
Description: 
The contributed SparkSubmitOperator has no ability to honor the spark-submit 
configuration field "--archives", which is treated subtly differently than "--files" 
or "--py-files" in that it will unzip the archive into the application's working 
directory, and can optionally add an alias to the unzipped folder so that you 
can refer to it elsewhere in your submission.

EG:

spark-submit  --archives=hdfs:user/someone/python35_venv.zip#PYTHON --conf 
"spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYTHON/python35/bin/python3" 
run_me.py  

In our case - this behavior allows for multiple python virtual environments to 
be sourced from HDFS without incurring the penalty of pushing the whole python 
virtual env to the cluster each submission.  This solves (for us) using 
python-based spark jobs on a cluster that the end user has no ability to define 
the python modules in use.

 

  was:
The contributed SparkSubmitOperator has no ability to honor the spark-submit 
configuration field "--archives" which is treated subtly different than 
"--files" or "--py-files" in that it will unzip the archive into the 
application's working directory, and can optionally add an alias to the 
unzipped folder so that you can refer to it elsewhere in your submission.

EG:

spark-submit  --archives=hdfs:user/someone/python35_venv.zip#PYTHON --conf 
"spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYTHON/python35/bin/python3" 
run_me.py  



In our case - this behavior allows for multiple python virtual environments to 
be sourced from HDFS without incurring the penalty of pushing the whole python 
virtual env to the cluster each submission.  This solves (for us) using 
python-based spark jobs on a cluster that the end user has no ability to define 
the python modules in use.

 


> Contributed SparkSubmitOperator doesn't honor --archives configuration
> --
>
> Key: AIRFLOW-3647
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3647
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 1.10.1
> Environment: Linux (RHEL 7)
> Python 3.5 (using a virtual environment)
> spark-2.1.3-bin-hadoop26
> Airflow 1.10.1
> CDH 5.14 Hadoop [Yarn] cluster (no end user / dev modifications allowed)
>Reporter: Ken Melms
>Priority: Minor
>  Labels: easyfix, newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The contributed SparkSubmitOperator has no ability to honor the spark-submit 
> configuration field "--archives", which is treated subtly differently than 
> "--files" or "--py-files" in that it will unzip the archive into the 
> application's working directory, and can optionally add an alias to the 
> unzipped folder so that you can refer to it elsewhere in your submission.
> EG:
> spark-submit  --archives=hdfs:user/someone/python35_venv.zip#PYTHON 
> --conf "spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYTHON/python35/bin/python3" 
> run_me.py  
> In our case - this behavior allows for multiple python virtual environments 
> to be sourced from HDFS without incurring the penalty of pushing the whole 
> python virtual env to the cluster each submission.  This solves (for us) 
> using python-based spark jobs on a cluster that the end user has no ability 
> to define the python modules in use.
>  





[jira] [Created] (AIRFLOW-3647) Contributed SparkSubmitOperator doesn't honor --archives configuration

2019-01-07 Thread Ken Melms (JIRA)
Ken Melms created AIRFLOW-3647:
--

 Summary: Contributed SparkSubmitOperator doesn't honor --archives 
configuration
 Key: AIRFLOW-3647
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3647
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib
Affects Versions: 1.10.1
 Environment: Linux (RHEL 7)
Python 3.5 (using a virtual environment)
spark-2.1.3-bin-hadoop26
Airflow 1.10.1
CDH 5.14 Hadoop [Yarn] cluster (no end user / dev modifications allowed)

Reporter: Ken Melms


The contributed SparkSubmitOperator has no ability to honor the spark-submit 
configuration field "--archives", which is treated subtly differently than 
"--files" or "--py-files" in that it will unzip the archive into the 
application's working directory, and can optionally add an alias to the 
unzipped folder so that you can refer to it elsewhere in your submission.

EG:

spark-submit  --archives=hdfs:user/someone/python35_venv.zip#PYTHON --conf 
"spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYTHON/python35/bin/python3" 
run_me.py  



In our case - this behavior allows for multiple python virtual environments to 
be sourced from HDFS without incurring the penalty of pushing the whole python 
virtual env to the cluster each submission.  This solves (for us) using 
python-based spark jobs on a cluster that the end user has no ability to define 
the python modules in use.
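
The requested behavior can be illustrated with a standalone sketch. Note that 
`build_spark_submit_cmd` below is a hypothetical helper, not the actual 
SparkSubmitHook API; it only shows how an `--archives` flag could be appended 
alongside `--conf`, mirroring the existing `--files`/`--py-files` handling:

```python
def build_spark_submit_cmd(application, archives=None, conf=None):
    # Assemble a spark-submit command line; archives is an optional
    # comma-separated list of archive URIs (with optional #alias suffix).
    cmd = ["spark-submit"]
    if archives:
        cmd += ["--archives", archives]
    for key, value in (conf or {}).items():
        cmd += ["--conf", "{}={}".format(key, value)]
    cmd.append(application)
    return cmd

cmd = build_spark_submit_cmd(
    "run_me.py",
    archives="hdfs:user/someone/python35_venv.zip#PYTHON",
    conf={"spark.yarn.appMasterEnv.PYSPARK_PYTHON":
          "./PYTHON/python35/bin/python3"})
print(" ".join(cmd))
```

The `#PYTHON` suffix is the alias spark-submit uses as the unzipped folder 
name in the application working directory, which is what makes the 
`PYSPARK_PYTHON` path above resolvable on the executors.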

 





[GitHub] felipegasparini commented on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins for views and links.

2019-01-07 Thread GitBox
felipegasparini commented on issue #4036: [AIRFLOW-2744] Allow RBAC to accept 
plugins for views and links.
URL: https://github.com/apache/airflow/pull/4036#issuecomment-452113006
 
 
   We also need to fix the example in the documentation 
(https://airflow.apache.org/plugins.html#example). It is currently broken: the 
import fails, the view uses a wrong method name, and the templates path is missing.
   
   I made a sample project to make this plugin integration work: 
https://github.com/felipegasparini/airflow_plugin_rbac_test/blob/dbaa049a9996df275b1d90f74b93ffbf206bb1d5/airflow/plugins/test_plugin/test_plugin.py
   
   I will submit a PR to fix the doc later, but just posting it here since it 
may be useful for others.




[GitHub] feng-tao commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger

2019-01-07 Thread GitBox
feng-tao commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger
URL: https://github.com/apache/airflow/pull/4407#issuecomment-452107835
 
 
   @Fokko, looking at the recent commits, this is the one that modifies this part 
of the code. And CI does not always fail with this test (sometimes it passes, 
sometimes not). Hence I suspect this PR is the cause.
   
   And we are not sure whether CI was already broken when this PR was checked in, right?




[GitHub] Fokko commented on issue #4405: [AIRFLOW-3598] Add tests for MsSqlToHiveTransfer

2019-01-07 Thread GitBox
Fokko commented on issue #4405: [AIRFLOW-3598] Add tests for MsSqlToHiveTransfer
URL: https://github.com/apache/airflow/pull/4405#issuecomment-452107449
 
 
   Thanks!




[jira] [Work started] (AIRFLOW-3645) Use a base_executor_config and merge operator level executor_config

2019-01-07 Thread Kyle Hamlin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3645 started by Kyle Hamlin.

> Use a base_executor_config and merge operator level executor_config
> ---
>
> Key: AIRFLOW-3645
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3645
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kyle Hamlin
>Assignee: Kyle Hamlin
>Priority: Major
> Fix For: 1.10.2
>
>
> It would be very useful to have a `base_executor_config` and merge the base 
> config with any operator level `executor_config`.
> I imagine referencing a python dict similar to how we reference a custom 
> logging_config
> *Example config*
> {code:java}
> [core]
> base_executor_config = config.base_executor_config.BASE_EXECUTOR_CONFIG
> {code}
> *Example base_executor_config*
> {code:java}
> BASE_EXECUTOR_CONFIG = {
> "KubernetesExecutor": {
> "image_pull_policy": "Always",
> "annotations": {
> "iam.amazonaws.com/role": "arn:aws:iam::"
> },
> "volumes": [
> {
> "name": "airflow-lib",
> "persistentVolumeClaim": {
> "claimName": "airflow-lib"
> }
> }
> ],
> "volume_mounts": [
> {
> "name": "airflow-lib",
> "mountPath": "/usr/local/airflow/lib",
> }
> ]
> }
> }
> {code}
> *Example operator*
> {code:java}
> run_this = PythonOperator(
> task_id='print_the_context',
> provide_context=True,
> python_callable=print_context,
> executor_config={
> "KubernetesExecutor": {
> "request_memory": "256Mi",
> "request_cpu": "100m",
> "limit_memory": "256Mi",
> "limit_cpu": "100m"
> }
> },
> dag=dag)
> {code}
> Then we'll want to have a dict deep-merge function that returns the 
> executor_config
> *Merge functionality*
> {code:java}
> import collections
> from airflow import conf
> from airflow.utils.module_loading import import_string
> def dict_merge(dct, merge_dct):
> """ Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
> updating only top-level keys, dict_merge recurses down into dicts nested
> to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
> ``dct``.
> :param dct: dict onto which the merge is executed
> :param merge_dct: dct merged into dct
> :return: dct
> """
> for k, v in merge_dct.items():
> if (k in dct and isinstance(dct[k], dict)
> and isinstance(merge_dct[k], collections.Mapping)):
> dict_merge(dct[k], merge_dct[k])
> else:
> dct[k] = merge_dct[k]
> 
> return dct
> def get_executor_config(executor_config):
> """Try to import base_executor_config and merge it with provided
> executor_config.
> :param executor_config: operator level executor config
> :return: dict"""
> 
> try:
> base_executor_config = import_string(
> conf.get('core', 'base_executor_config'))
> merged_executor_config = dict_merge(
> base_executor_config, executor_config)
> return merged_executor_config
> except Exception:
> return executor_config
> {code}
> Finally, we'll want to call the get_executor_config function in the 
> `BaseOperator` possibly here: 
> https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L2348
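
The merge the issue describes can be sketched end to end. This is a standalone 
illustration with made-up config values, and it swaps the snippet's 
`collections.Mapping` for `collections.abc.Mapping`, since the former spelling 
was removed in Python 3.10:

```python
import collections.abc

def dict_merge(dct, merge_dct):
    """Recursively merge merge_dct into dct, descending into nested dicts
    instead of overwriting them wholesale (per the issue description)."""
    for k, v in merge_dct.items():
        if (k in dct and isinstance(dct[k], dict)
                and isinstance(v, collections.abc.Mapping)):
            dict_merge(dct[k], v)
        else:
            dct[k] = v
    return dct

# Illustrative base and operator-level configs (not a real deployment):
base = {"KubernetesExecutor": {"image_pull_policy": "Always",
                               "annotations": {"role": "base-role"}}}
override = {"KubernetesExecutor": {"request_memory": "256Mi",
                                   "annotations": {"team": "data-eng"}}}
merged = dict_merge(base, override)
# Operator-level keys are added, base keys survive, nested dicts are merged:
print(merged["KubernetesExecutor"]["annotations"])
# -> {'role': 'base-role', 'team': 'data-eng'}
```

This shows why a deep merge (rather than `dict.update`) matters here: a plain 
update would replace the whole `annotations` dict and drop the base role.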





[jira] [Assigned] (AIRFLOW-3645) Use a base_executor_config and merge operator level executor_config

2019-01-07 Thread Kyle Hamlin (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Hamlin reassigned AIRFLOW-3645:


Assignee: Kyle Hamlin

> Use a base_executor_config and merge operator level executor_config
> ---
>
> Key: AIRFLOW-3645
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3645
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Kyle Hamlin
>Assignee: Kyle Hamlin
>Priority: Major
> Fix For: 1.10.2
>
>
> It would be very useful to have a `base_executor_config` and merge the base 
> config with any operator level `executor_config`.
> I imagine referencing a python dict similar to how we reference a custom 
> logging_config
> *Example config*
> {code:java}
> [core]
> base_executor_config = config.base_executor_config.BASE_EXECUTOR_CONFIG
> {code}
> *Example base_executor_config*
> {code:java}
> BASE_EXECUTOR_CONFIG = {
> "KubernetesExecutor": {
> "image_pull_policy": "Always",
> "annotations": {
> "iam.amazonaws.com/role": "arn:aws:iam::"
> },
> "volumes": [
> {
> "name": "airflow-lib",
> "persistentVolumeClaim": {
> "claimName": "airflow-lib"
> }
> }
> ],
> "volume_mounts": [
> {
> "name": "airflow-lib",
> "mountPath": "/usr/local/airflow/lib",
> }
> ]
> }
> }
> {code}
> *Example operator*
> {code:java}
> run_this = PythonOperator(
> task_id='print_the_context',
> provide_context=True,
> python_callable=print_context,
> executor_config={
> "KubernetesExecutor": {
> "request_memory": "256Mi",
> "request_cpu": "100m",
> "limit_memory": "256Mi",
> "limit_cpu": "100m"
> }
> },
> dag=dag)
> {code}
> Then we'll want to have a dict deep-merge function that returns the 
> executor_config
> *Merge functionality*
> {code:java}
> import collections
> from airflow import conf
> from airflow.utils.module_loading import import_string
> def dict_merge(dct, merge_dct):
> """ Recursive dict merge. Inspired by :meth:``dict.update()``, instead of
> updating only top-level keys, dict_merge recurses down into dicts nested
> to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
> ``dct``.
> :param dct: dict onto which the merge is executed
> :param merge_dct: dct merged into dct
> :return: dct
> """
> for k, v in merge_dct.items():
> if (k in dct and isinstance(dct[k], dict)
> and isinstance(merge_dct[k], collections.Mapping)):
> dict_merge(dct[k], merge_dct[k])
> else:
> dct[k] = merge_dct[k]
> 
> return dct
> def get_executor_config(executor_config):
> """Try to import base_executor_config and merge it with provided
> executor_config.
> :param executor_config: operator level executor config
> :return: dict"""
> 
> try:
> base_executor_config = import_string(
> conf.get('core', 'base_executor_config'))
> merged_executor_config = dict_merge(
> base_executor_config, executor_config)
> return merged_executor_config
> except Exception:
> return executor_config
> {code}
> Finally, we'll want to call the get_executor_config function in the 
> `BaseOperator` possibly here: 
> https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L2348





[GitHub] Fokko commented on issue #4399: [AIRFLOW-3594] Unify different License Header

2019-01-07 Thread GitBox
Fokko commented on issue #4399: [AIRFLOW-3594] Unify different License Header
URL: https://github.com/apache/airflow/pull/4399#issuecomment-452106547
 
 
   @feluelle I've restarted the test, but due to the recent renaming, I'm not 
sure if the status of the CI will propagate properly. 




[GitHub] Fokko commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger

2019-01-07 Thread GitBox
Fokko commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger
URL: https://github.com/apache/airflow/pull/4407#issuecomment-452106154
 
 
   ```
    ======================================================================
    42) FAIL: test_trigger_dag_button (tests.www_rbac.test_views.TestTriggerDag)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "tests/www_rbac/test_views.py", line 1476, in test_trigger_dag_button
        self.assertIsNotNone(run)
    AssertionError: unexpectedly None
   ```
   Hmm, are you sure that this PR is the source?
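
   If the failure turns out to be backend-specific rather than a problem with 
this PR, one option discussed in the thread is to skip the assertion on MySQL. 
A minimal stdlib sketch of that guard follows; the environment variable stands 
in for Airflow's real `sql_alchemy_conn` setting, and the test body is a 
placeholder for the actual ORM lookup:

```python
import os
import unittest

def backend_is_mysql():
    # In Airflow this would come from conf.get('core', 'sql_alchemy_conn');
    # here an env var stands in for the connection string.
    conn = os.environ.get("AIRFLOW__CORE__SQL_ALCHEMY_CONN", "sqlite:///")
    return conn.startswith("mysql")

class TestTriggerDag(unittest.TestCase):
    @unittest.skipIf(backend_is_mysql(), "flaky on the mysql backend")
    def test_trigger_dag_button(self):
        run = object()  # placeholder for the DagRun fetched via the ORM
        self.assertIsNotNone(run)

result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestTriggerDag).run(result)
print(result.wasSuccessful())
```

A skipped test still counts as successful, so the suite stays green on MySQL 
while the assertion keeps running against the other backends.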




[GitHub] galak75 removed a comment on issue #4292: [AIRFLOW-2508] Handle non string types in Operators templatized fields

2019-01-07 Thread GitBox
galak75 removed a comment on issue #4292: [AIRFLOW-2508] Handle non string 
types in Operators templatized fields
URL: https://github.com/apache/airflow/pull/4292#issuecomment-445222367
 
 
   Everything went well on our fork (see 
https://travis-ci.org/VilledeMontreal/incubator-airflow/builds/464753497)
   But one build failed on Travis with the error below:
   ```
   No output has been received in the last 10m0s, this potentially indicates a 
stalled build or something wrong with the build itself.
   Check the details on how to adjust your build configuration on: 
https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
   The build has been terminated
   ```
   Could anyone restart the build on my PR please? I'm not able to do it, 
probably a question of permissions...




[GitHub] ArgentFalcon commented on issue #3533: [AIRFLOW-161] New redirect route and extra links

2019-01-07 Thread GitBox
ArgentFalcon commented on issue #3533: [AIRFLOW-161] New redirect route and 
extra links
URL: https://github.com/apache/airflow/pull/3533#issuecomment-452102290
 
 
   Oh yeah, I should finish this up. I have to pull some more changes that we 
did internally at Lyft that make it more versatile. 




[GitHub] Fokko commented on a change in pull request #4436: [AIRFLOW-3631] Update flake8 and fix lint.

2019-01-07 Thread GitBox
Fokko commented on a change in pull request #4436: [AIRFLOW-3631] Update flake8 
and fix lint.
URL: https://github.com/apache/airflow/pull/4436#discussion_r245818023
 
 

 ##
 File path: airflow/settings.py
 ##
 @@ -90,7 +90,7 @@ def timing(cls, stat, dt):
   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
 ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
  _/_/  |_/_/  /_//_//_/  \//|__/
- """
+ """  # noqa: W605
 
 Review comment:
   Yes, a raw string would be the way to go here :)
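
   A tiny standalone illustration of what W605 flags and why the raw string 
fixes it (string values chosen just for the example):

```python
# "\/" is not a valid escape sequence, so CPython keeps the backslash but
# emits a warning (flake8 reports it as W605). A raw string literal has the
# exact same value and is lint-clean, which is why it suits ASCII-art banners.
with_escape = " \/ \_ "   # triggers W605 under flake8
raw = r" \/ \_ "          # raw string: identical value, no warning
assert with_escape == raw
```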




[GitHub] ArgentFalcon commented on issue #3533: [AIRFLOW-161] New redirect route and extra links

2019-01-07 Thread GitBox
ArgentFalcon commented on issue #3533: [AIRFLOW-161] New redirect route and 
extra links
URL: https://github.com/apache/airflow/pull/3533#issuecomment-452102449
 
 
   Maybe I should just fix the commits and merge it and then make a new PR with 
more changes. I'll do that. 




[GitHub] feluelle commented on issue #4405: [AIRFLOW-3598] Add tests for MsSqlToHiveTransfer

2019-01-07 Thread GitBox
feluelle commented on issue #4405: [AIRFLOW-3598] Add tests for 
MsSqlToHiveTransfer
URL: https://github.com/apache/airflow/pull/4405#issuecomment-452102237
 
 
   Sure.




[GitHub] Fokko commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs for index view.

2019-01-07 Thread GitBox
Fokko commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs 
for index view.
URL: https://github.com/apache/airflow/pull/4390#discussion_r245817009
 
 

 ##
 File path: airflow/www_rbac/compile_assets.sh
 ##
 @@ -23,6 +23,6 @@ if [ -d airflow/www_rbac/static/dist ]; then
 fi
 
 cd airflow/www_rbac/
-npm install
+# npm install
 
 Review comment:
   Why is this commented out?




[GitHub] Fokko commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs for index view.

2019-01-07 Thread GitBox
Fokko commented on a change in pull request #4390: [AIRFLOW-3584] Use ORM DAGs 
for index view.
URL: https://github.com/apache/airflow/pull/4390#discussion_r245817248
 
 

 ##
 File path: airflow/models/__init__.py
 ##
 @@ -240,6 +240,20 @@ def clear_task_instances(tis,
         dr.start_date = timezone.utcnow()
 
 
 +def get_last_dagrun(dag_id, session, include_externally_triggered=False):
 +    """
 +    Returns the last dag run for a dag, None if there was none.
 +    Last dag run can be any type of run eg. scheduled or backfilled.
 +    Overridden DagRuns are ignored.
 +    """
 +    DR = DagRun
 +    query = session.query(DR).filter(DR.dag_id == dag_id)
 +    if not include_externally_triggered:
 +        query = query.filter(DR.external_trigger == False)  # noqa
 +    query = query.order_by(DR.execution_date.desc())
 
 Review comment:
   Shouldn't this query benefit from an index as well?
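For illustration, a sketch with SQLite's query planner (hypothetical table and index names, not Airflow's actual schema or migration): a composite index on `(dag_id, execution_date)` can serve both the equality filter and the `ORDER BY execution_date DESC` of the last-dag-run query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run ("
    "  id INTEGER PRIMARY KEY,"
    "  dag_id TEXT,"
    "  execution_date TEXT,"
    "  external_trigger INTEGER)"
)
# Composite index: equality lookup on dag_id, then an ordered scan over
# execution_date, so no separate sort step is needed.
conn.execute(
    "CREATE INDEX idx_dag_run_dag_id_exec_date "
    "ON dag_run (dag_id, execution_date)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM dag_run "
    "WHERE dag_id = ? AND external_trigger = 0 "
    "ORDER BY execution_date DESC LIMIT 1",
    ("example_dag",),
).fetchall()

# The plan should search the index instead of scanning the table and sorting.
print(any("idx_dag_run_dag_id_exec_date" in row[-1] for row in plan))
```

The same reasoning applies to MySQL and Postgres: without such an index, every "last dag run" lookup is a full scan plus a sort over all runs of the dag.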




[GitHub] Fokko commented on issue #4405: [AIRFLOW-3598] Add tests for MsSqlToHiveTransfer

2019-01-07 Thread GitBox
Fokko commented on issue #4405: [AIRFLOW-3598] Add tests for MsSqlToHiveTransfer
URL: https://github.com/apache/airflow/pull/4405#issuecomment-452100848
 
 
   @feluelle Can you rebase onto master?




[GitHub] Fokko commented on issue #4445: [AIRFLOW-3635] Fix incorrect logic in detele_dag (introduced in PR#4406)

2019-01-07 Thread GitBox
Fokko commented on issue #4445: [AIRFLOW-3635] Fix incorrect logic in 
detele_dag (introduced in PR#4406)
URL: https://github.com/apache/airflow/pull/4445#issuecomment-452100655
 
 
   Thanks for picking this up @XD-DENG 




[GitHub] Fokko commented on issue #4378: AIRFLOW-3573 - Remove DagStat table

2019-01-07 Thread GitBox
Fokko commented on issue #4378: AIRFLOW-3573 - Remove DagStat table
URL: https://github.com/apache/airflow/pull/4378#issuecomment-452100456
 
 
   @ffinfo PTAL




[GitHub] Fokko commented on issue #4: Create separate images with python2 and python3 support

2019-01-07 Thread GitBox
Fokko commented on issue #4: Create separate images with python2 and python3 
support
URL: https://github.com/apache/airflow-ci/pull/4#issuecomment-452100290
 
 
   This is something that we still want, but I don't have the time to look into 
this because it will probably break the master build.




[GitHub] Fokko commented on issue #4351: [AIRFLOW-3554] Remove contrib folder from code cov omit list

2019-01-07 Thread GitBox
Fokko commented on issue #4351: [AIRFLOW-3554] Remove contrib folder from code 
cov omit list
URL: https://github.com/apache/airflow/pull/4351#issuecomment-452098776
 
 
   I'm okay with this as well. The contrib code should have tests too.




[GitHub] Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure that the session is closed

2019-01-07 Thread GitBox
Fokko commented on a change in pull request #4298: [AIRFLOW-3478] Make sure 
that the session is closed
URL: https://github.com/apache/airflow/pull/4298#discussion_r245813920
 
 

 ##
 File path: airflow/bin/cli.py
 ##
 @@ -423,14 +418,11 @@ def unpause(args, dag=None):
 def set_is_paused(is_paused, args, dag=None):
     dag = dag or get_dag(args)
 
-    session = settings.Session()
-    dm = session.query(DagModel).filter(
-        DagModel.dag_id == dag.dag_id).first()
-    dm.is_paused = is_paused
-    session.commit()
 
 Review comment:
   I've restored the `.commit()` for now. I would like to work this Friday on 
setting the `expire_on_commit=True`: 
https://github.com/apache/airflow/blob/master/airflow/settings.py#L198
   
   It feels like we have a lot of connections to the database because they 
aren't properly closed.
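A minimal sketch of what `expire_on_commit` changes (a toy model, not Airflow's actual `DagModel`; assumes SQLAlchemy is installed): with `expire_on_commit=True`, the `sessionmaker` default, attribute access after a `commit()` re-reads from the database instead of serving possibly stale identity-map state.

```python
from sqlalchemy import Boolean, Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Dag(Base):  # simplified stand-in for Airflow's DagModel
    __tablename__ = "dag"
    id = Column(Integer, primary_key=True)
    is_paused = Column(Boolean, default=False)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, expire_on_commit=True)

session = Session()
dag = Dag(id=1, is_paused=False)
session.add(dag)
session.commit()  # expires `dag`: its attributes will be re-loaded on next access

# A second session flips the flag behind the first session's back.
other = Session()
other.query(Dag).filter_by(id=1).one().is_paused = True
other.commit()

# Because `dag` was expired on commit, this access refreshes from the DB and
# sees the new value; with expire_on_commit=False it could serve the old one.
print(dag.is_paused)
```

The trade-off is an extra SELECT per post-commit access, which is why some applications disable it; the leak Fokko describes is a separate issue of sessions never being closed and returned to the pool.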




[GitHub] Fokko commented on issue #4383: [AIRFLOW-3475] Move ImportError out of models.py

2019-01-07 Thread GitBox
Fokko commented on issue #4383: [AIRFLOW-3475] Move ImportError out of models.py
URL: https://github.com/apache/airflow/pull/4383#issuecomment-452097714
 
 
   @BasPH I've restarted the failed tests. Maybe do a rebase? It seems to fail 
on k8s.




[GitHub] Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed

2019-01-07 Thread GitBox
Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is 
closed
URL: https://github.com/apache/airflow/pull/4298#issuecomment-452096893
 
 
   Rebased :-)




[GitHub] Fokko commented on issue #4320: [AIRFLOW-3515] Remove the run_duration option

2019-01-07 Thread GitBox
Fokko commented on issue #4320: [AIRFLOW-3515] Remove the run_duration option
URL: https://github.com/apache/airflow/pull/4320#issuecomment-452096465
 
 
   I've rebased onto master and resolved the conflicts




[GitHub] Fokko commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?

2019-01-07 Thread GitBox
Fokko commented on issue #4421: [AIRFLOW-3468] Remove KnownEvent(Event)?
URL: https://github.com/apache/airflow/pull/4421#issuecomment-452095834
 
 
   Rebased. @feng-tao I've looked into the Alembic script, but it becomes quite 
nasty in my opinion. The upgrade will be a `DROP TABLE IF EXISTS`, and the 
downgrade will recreate the tables which aren't used in Airflow 2.0 anymore.
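The upgrade/downgrade shape described above can be sketched with plain SQL (a toy schema; the real pre-2.0 `known_event` tables had more columns, so this is only an illustration of the idempotent drop):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def upgrade():
    # Idempotent: succeeds whether or not the legacy table exists.
    conn.execute("DROP TABLE IF EXISTS known_event")

def downgrade():
    # Recreates a minimal stand-in of the legacy table (hypothetical columns).
    conn.execute(
        "CREATE TABLE known_event (id INTEGER PRIMARY KEY, label TEXT)"
    )

upgrade()     # no-op if the table is already gone
downgrade()   # table exists again
upgrade()     # dropped again, still no error
exists = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'known_event'"
).fetchone()
print(exists is None)
```

The "nasty" part is the downgrade: it must faithfully recreate tables that Airflow 2.0 no longer models anywhere in code.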




[GitHub] feng-tao commented on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins for views and links.

2019-01-07 Thread GitBox
feng-tao commented on issue #4036: [AIRFLOW-2744] Allow RBAC to accept plugins 
for views and links.
URL: https://github.com/apache/airflow/pull/4036#issuecomment-452093401
 
 
   @oliviersm199 , it seems that PluginRBACTest fails if re-enabled (it is 
currently disabled). Do you think you have time to fix the test?




[jira] [Comment Edited] (AIRFLOW-3292) `delete_dag` endpoint and cli commands don't delete on exact dag_id matching

2019-01-07 Thread Teresa Martyny (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736368#comment-16736368
 ] 

Teresa Martyny edited comment on AIRFLOW-3292 at 1/7/19 9:25 PM:
-

Sorry Ash, I just saw this response. We don't use subdags, so we were unaware 
of this naming convention. Adding a validation to prevent people from naming 
dags this way would be great. In the meantime, we have created a ticket on our 
end to rename our dags. Thanks for clarifying!


was (Author: teresamartyny):
Sorry Ash, I just saw this response. We don't use subdags, so we were unaware 
of this naming convention. Adding a validation to prevent people from namings 
dags this way would be great. In the meantime, we have created a ticket on our 
end to rename our dags. Thanks for clarifying!

> `delete_dag` endpoint and cli commands don't delete on exact dag_id matching
> 
>
> Key: AIRFLOW-3292
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3292
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api, cli
>Affects Versions: 1.10.0
>Reporter: Teresa Martyny
>Priority: Major
>
> If you have the following dag ids: `schema`, `schema.table1`, 
> `schema.table2`, `schema_replace`
> When you hit the delete_dag endpoint with the dag id: `schema`, it will 
> delete `schema`, `schema.table1`, and `schema.table2`, not just `schema`. 
> Underscores are fine so it doesn't delete `schema_replace`, but periods are 
> not.
> If this is expected behavior, clarifying that functionality in the docs would 
> be great, and then I can submit a feature request for the ability to use 
> regex for exact matching with this command and endpoint.
> Thanks!! 
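The matching behaviour described in this issue can be sketched as follows (a hypothetical helper, not Airflow's actual `delete_dag` code): ids of the form `<parent>.<child>` are treated as subdag-style children of `<parent>`, so `.` acts as a separator while `_` does not.

```python
def matches_for_delete(target, dag_ids):
    # A dag id matches if it is the target itself or a subdag-style child
    # of it ("<target>.<anything>"); underscore-joined ids are left alone.
    return [d for d in dag_ids if d == target or d.startswith(target + ".")]

dag_ids = ["schema", "schema.table1", "schema.table2", "schema_replace"]
print(matches_for_delete("schema", dag_ids))
```

This reproduces the report: deleting `schema` also matches `schema.table1` and `schema.table2`, but not `schema_replace`.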



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3292) `delete_dag` endpoint and cli commands don't delete on exact dag_id matching

2019-01-07 Thread Teresa Martyny (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736368#comment-16736368
 ] 

Teresa Martyny commented on AIRFLOW-3292:
-

Sorry Ash, I just saw this response. We don't use subdags, so we were unaware 
of this naming convention. Adding a validation to prevent people from naming 
dags this way would be great. In the meantime, we have created a ticket on our 
end to rename our dags. Thanks for clarifying!

> `delete_dag` endpoint and cli commands don't delete on exact dag_id matching
> 
>
> Key: AIRFLOW-3292
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3292
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api, cli
>Affects Versions: 1.10.0
>Reporter: Teresa Martyny
>Priority: Major
>
> If you have the following dag ids: `schema`, `schema.table1`, 
> `schema.table2`, `schema_replace`
> When you hit the delete_dag endpoint with the dag id: `schema`, it will 
> delete `schema`, `schema.table1`, and `schema.table2`, not just `schema`. 
> Underscores are fine so it doesn't delete `schema_replace`, but periods are 
> not.
> If this is expected behavior, clarifying that functionality in the docs would 
> be great, and then I can submit a feature request for the ability to use 
> regex for exact matching with this command and endpoint.
> Thanks!! 





[GitHub] feng-tao edited a comment on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
feng-tao edited a comment on issue #: [AIRFLOW-3281] Fix Kubernetes 
operator with git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452078774
 
 
   hey @kaxil , that's great news. I think we may need to revert this commit 
(https://github.com/apache/airflow/commit/b5b9287a75596a617557798f1286cf7b89c55350#diff-a7b22c07c43739c8eb0850a6fd6f7eb8)
 in the v1-10-test branch, as it is already included in the master branch, and 
cherry-pick the same commit from master. I think the original author created a 
separate commit for the v1.10.1 release. Once that commit is reverted, things 
should be much easier. And we should include this critical fix as well 
(https://github.com/apache/airflow/pull/4305/commits/bf45855e11e0cb80040615af19fe9138406cb52b)
 once that is resolved.




[GitHub] feng-tao commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
feng-tao commented on issue #: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452079131
 
 
   @kaxil, thanks for running the release :)




[GitHub] feng-tao edited a comment on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
feng-tao edited a comment on issue #: [AIRFLOW-3281] Fix Kubernetes 
operator with git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452078774
 
 
   hey @kaxil , that's great news. I think we may need to revert this commit 
(https://github.com/apache/airflow/commit/b5b9287a75596a617557798f1286cf7b89c55350#diff-a7b22c07c43739c8eb0850a6fd6f7eb8)
 in the v1-10-test branch, as it is already included in the master branch, and 
cherry-pick the same commit from master. I think the original author created a 
separate commit for the v1.10.1 release. Once that commit is reverted, things 
should be much easier.




[GitHub] feng-tao commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
feng-tao commented on issue #: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452078774
 
 
   hey @kaxil , that's great news. I think we may need to revert this commit 
(https://github.com/apache/airflow/commit/b5b9287a75596a617557798f1286cf7b89c55350#diff-a7b22c07c43739c8eb0850a6fd6f7eb8)
 in the v1-10-test branch, as it is already included in the master branch. I 
think the original author created a separate commit for the v1.10.1 release. 
Once that commit is reverted, things should be much easier.




[GitHub] kaxil commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
kaxil commented on issue #: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452076833
 
 
   Trying to resolve conflicts :D so that we can include 
https://github.com/apache/incubator-airflow/pull/3770 and your DAG-level access 
commit




[jira] [Commented] (AIRFLOW-3402) Set default kubernetes affinity and toleration settings in airflow.cfg

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736325#comment-16736325
 ] 

ASF GitHub Bot commented on AIRFLOW-3402:
-

kaxil commented on pull request #4454: [AIRFLOW-3402] Port PR #4247 to 1.10-test
URL: https://github.com/apache/airflow/pull/4454
 
 
   
 



> Set default kubernetes affinity and toleration settings in airflow.cfg
> --
>
> Key: AIRFLOW-3402
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3402
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: kubernetes
>Reporter: Kevin Pullin
>Assignee: Kevin Pullin
>Priority: Major
> Fix For: 1.10.2
>
>
> Currently airflow supports setting kubernetes `affinity` and `toleration` 
> configuration inside dags using either a `KubernetesExecutorConfig` 
> definition or using the `KubernetesPodOperator`.
> In order to reduce having to set and maintain this configuration in every 
> dag, it'd be useful to have the ability to set these globally in the 
> airflow.cfg file.  One use case is to force all kubernetes pods to run on a 
> particular set of dedicated airflow nodes, which requires both affinity rules 
> and tolerations.





[GitHub] kaxil closed pull request #4454: [AIRFLOW-3402] Port PR #4247 to 1.10-test

2019-01-07 Thread GitBox
kaxil closed pull request #4454: [AIRFLOW-3402] Port PR #4247 to 1.10-test
URL: https://github.com/apache/airflow/pull/4454
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index c8aa4061e7..a72604a536 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -630,6 +630,16 @@ gcp_service_account_keys =
 # It will raise an exception if called from a process not running in a kubernetes environment.
 in_cluster = True
 
+# Affinity configuration as a single line formatted JSON object.
+# See the affinity model for top-level key names (e.g. `nodeAffinity`, etc.):
+#   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.12/#affinity-v1-core
+affinity =
+
+# A list of toleration objects as a single line formatted JSON array
+# See:
+#   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.12/#toleration-v1-core
+tolerations =
+
 [kubernetes_node_selectors]
 # The Key-value pairs to be given to worker pods.
 # The worker pods will be scheduled to the nodes of the specified key-value pairs.
diff --git a/airflow/contrib/example_dags/example_kubernetes_executor.py b/airflow/contrib/example_dags/example_kubernetes_executor.py
index 1d9bb73043..d03e255ab3 100644
--- a/airflow/contrib/example_dags/example_kubernetes_executor.py
+++ b/airflow/contrib/example_dags/example_kubernetes_executor.py
@@ -32,6 +32,31 @@
     schedule_interval=None
 )
 
+affinity = {
+    'podAntiAffinity': {
+        'requiredDuringSchedulingIgnoredDuringExecution': [
+            {
+                'topologyKey': 'kubernetes.io/hostname',
+                'labelSelector': {
+                    'matchExpressions': [
+                        {
+                            'key': 'app',
+                            'operator': 'In',
+                            'values': ['airflow']
+                        }
+                    ]
+                }
+            }
+        ]
+    }
+}
+
+tolerations = [{
+    'key': 'dedicated',
+    'operator': 'Equal',
+    'value': 'airflow'
+}]
+
 
 def print_stuff():
     print("stuff!")
@@ -59,11 +84,14 @@ def use_zip_binary():
     executor_config={"KubernetesExecutor": {"image": "airflow/ci_zip:latest"}}
 )
 
-# Limit resources on this operator/task
+# Limit resources on this operator/task with node affinity & tolerations
 three_task = PythonOperator(
     task_id="three_task", python_callable=print_stuff, dag=dag,
     executor_config={
-        "KubernetesExecutor": {"request_memory": "128Mi", "limit_memory": "128Mi"}}
+        "KubernetesExecutor": {"request_memory": "128Mi",
+                               "limit_memory": "128Mi",
+                               "tolerations": tolerations,
+                               "affinity": affinity}}
 )
 
 start_task.set_downstream([one_task, two_task, three_task])
diff --git a/airflow/contrib/executors/kubernetes_executor.py b/airflow/contrib/executors/kubernetes_executor.py
index dd9cd3ec53..e06a5f47e1 100644
--- a/airflow/contrib/executors/kubernetes_executor.py
+++ b/airflow/contrib/executors/kubernetes_executor.py
@@ -16,6 +16,7 @@
 # under the License.
 
 import base64
+import json
 import multiprocessing
 from queue import Queue
 from dateutil import parser
@@ -40,7 +41,7 @@ class KubernetesExecutorConfig:
     def __init__(self, image=None, image_pull_policy=None, request_memory=None,
                  request_cpu=None, limit_memory=None, limit_cpu=None,
                  gcp_service_account_key=None, node_selectors=None, affinity=None,
-                 annotations=None, volumes=None, volume_mounts=None):
+                 annotations=None, volumes=None, volume_mounts=None, tolerations=None):
         self.image = image
         self.image_pull_policy = image_pull_policy
         self.request_memory = request_memory
@@ -53,16 +54,18 @@ def __init__(self, image=None, image_pull_policy=None, request_memory=None,
         self.annotations = annotations
         self.volumes = volumes
         self.volume_mounts = volume_mounts
+        self.tolerations = tolerations
 
     def __repr__(self):
         return "{}(image={}, image_pull_policy={}, request_memory={}, request_cpu={}, " \
                "limit_memory={}, limit_cpu={}, gcp_service_account_key={}, " \
                "node_selectors={}, affinity={}, annotations={}, volumes={}, " \
-               "volume_mounts={})" \
+               "volume_mounts={}, tolerations={})" \
             .format(KubernetesExecutorConfig.__name__, self.image, self.image_pull_policy,
                     self.request_memory, self.request_cpu, self.limit_memory,
                     self.limit_cpu

[GitHub] feluelle commented on a change in pull request #4455: [AIRFLOW-3519] Fix example http operator

2019-01-07 Thread GitBox
feluelle commented on a change in pull request #4455: [AIRFLOW-3519] Fix 
example http operator
URL: https://github.com/apache/airflow/pull/4455#discussion_r245791673
 
 

 ##
 File path: airflow/example_dags/example_http_operator.py
 ##
 @@ -92,7 +92,7 @@
 http_conn_id='http_default',
 endpoint='',
 request_params={},
-    response_check=lambda response: True if "Google" in response.content else False,
 
 Review comment:
   ..and I thought `"Google" in response.text` makes more sense (to me) and you 
do not need to do the byte conversion by yourself. :)




[GitHub] feng-tao commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
feng-tao commented on issue #: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452075293
 
 
   what is the issue @kaxil ?




[GitHub] feng-tao edited a comment on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
feng-tao edited a comment on issue #: [AIRFLOW-3281] Fix Kubernetes 
operator with git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452075293
 
 
   @kaxil, what is the issue?




[GitHub] kaxil edited a comment on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
kaxil edited a comment on issue #: [AIRFLOW-3281] Fix Kubernetes operator 
with git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452074777
 
 
   Sorry, I merged it and had to revert, which caused this piece to fail. We 
may well have to create a new PR.




[GitHub] kaxil commented on issue #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
kaxil commented on issue #: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/#issuecomment-452074777
 
 
   Sorry, I merged it and had to revert. Let me know when it is ready




[GitHub] kaxil commented on a change in pull request #4455: [AIRFLOW-3519] Fix example http operator

2019-01-07 Thread GitBox
kaxil commented on a change in pull request #4455: [AIRFLOW-3519] Fix example 
http operator
URL: https://github.com/apache/airflow/pull/4455#discussion_r245790529
 
 

 ##
 File path: airflow/example_dags/example_http_operator.py
 ##
 @@ -92,7 +92,7 @@
 http_conn_id='http_default',
 endpoint='',
 request_params={},
-    response_check=lambda response: True if "Google" in response.content else False,
 
 Review comment:
   Makes sense. Thanks @feluelle 




[GitHub] feluelle commented on a change in pull request #4455: [AIRFLOW-3519] Fix example http operator

2019-01-07 Thread GitBox
feluelle commented on a change in pull request #4455: [AIRFLOW-3519] Fix 
example http operator
URL: https://github.com/apache/airflow/pull/4455#discussion_r245789254
 
 

 ##
 File path: airflow/example_dags/example_http_operator.py
 ##
 @@ -92,7 +92,7 @@
 http_conn_id='http_default',
 endpoint='',
 request_params={},
-    response_check=lambda response: True if "Google" in response.content else False,
 
 Review comment:
   Yes. `response.content` is a byte string and not a decoded string like 
`response.text`.
   So either `b"Google" in response.content` would work or `"Google" in 
response.text` . See these Jira tickets: 
https://issues.apache.org/jira/browse/AIRFLOW-3519 and 
https://issues.apache.org/jira/browse/AIRFLOW-450
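The difference can be shown with a fake payload, no HTTP call needed (a sketch using stand-in variables; `requests` itself is not imported): `response.content` is raw bytes, while `response.text` is the decoded `str`.

```python
# Stand-ins for the two attributes a requests Response exposes:
content = b"<html>Google</html>"   # like response.content (bytes)
text = content.decode("utf-8")     # roughly what response.text does

print("Google" in text)      # str-in-str membership works
print(b"Google" in content)  # bytes-in-bytes also works

try:
    "Google" in content      # str-in-bytes raises TypeError on Python 3
except TypeError:
    print("TypeError")
```

So the original `"Google" in response.content` check silently broke on Python 3, and either operand must be converted to the other's type.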




[GitHub] kaxil closed pull request #4444: [AIRFLOW-3281] Fix Kubernetes operator with git-sync

2019-01-07 Thread GitBox
kaxil closed pull request #: [AIRFLOW-3281] Fix Kubernetes operator with 
git-sync
URL: https://github.com/apache/airflow/pull/
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):



 




[jira] [Commented] (AIRFLOW-3281) Kubernetes git sync implementation is broken

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736311#comment-16736311
 ] 

ASF GitHub Bot commented on AIRFLOW-3281:
-

kaxil commented on pull request #: [AIRFLOW-3281] Fix Kubernetes operator 
with git-sync
URL: https://github.com/apache/airflow/pull/
 
 
   
 



> Kubernetes git sync implementation is broken
> 
>
> Key: AIRFLOW-3281
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3281
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Riccardo Bini
>Assignee: Riccardo Bini
>Priority: Major
>
> The current implementation of git-sync is broken when Airflow is used with 
> Kubernetes.
> The init container doesn't share the volume with the airflow container, and 
> the path of the DAG folder doesn't account for the fact that git-sync 
> creates a symlink to the checked-out revision.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] benjamingregory edited a comment on issue #3055: [AIRFLOW-2125] Using binary package psycopg2-binary

2019-01-07 Thread GitBox
benjamingregory edited a comment on issue #3055: [AIRFLOW-2125] Using binary 
package psycopg2-binary
URL: https://github.com/apache/airflow/pull/3055#issuecomment-452071229
 
 
   @bern4rdelli @jgao54 @Fokko 
   
   Question as to why this was changed to `psycopg2-binary` given the 
following warning from 
http://initd.org/psycopg/docs/install.html#binary-install-from-pypi
   
   ```
   Note: The -binary package is meant for beginners to start playing with 
Python and PostgreSQL without the need to meet the build requirements. 
   
   If you are the maintainer of a published package depending on psycopg2 you 
shouldn’t use psycopg2-binary as a module dependency. For production use you 
are advised to use the source distribution.
   ```
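   For what it's worth, the advice in that note translates into a dependency 
layout along these lines — a sketch only, with the package name, pins, and 
extra name all hypothetical:

   ```python
   # Hypothetical dependency declaration for a published package, following
   # the psycopg2 advice quoted above: depend on the source distribution by
   # default, and expose the prebuilt wheel only as an opt-in extra for
   # local experimentation. Pins and the "binary" extra name are illustrative.
   INSTALL_REQUIRES = ["psycopg2>=2.7,<3"]
   EXTRAS_REQUIRE = {"binary": ["psycopg2-binary>=2.7,<3"]}

   # These lists would be passed to setuptools.setup(install_requires=...,
   # extras_require=...) in a real setup.py.
   ```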




[GitHub] benjamingregory commented on issue #3055: [AIRFLOW-2125] Using binary package psycopg2-binary

2019-01-07 Thread GitBox
benjamingregory commented on issue #3055: [AIRFLOW-2125] Using binary package 
psycopg2-binary
URL: https://github.com/apache/airflow/pull/3055#issuecomment-452071229
 
 
   @bern4rdelli @jgao54 @Fokko 
   
   Question as to why this was changed to `psycopg2-binary` given the 
following warning from 
http://initd.org/psycopg/docs/install.html#binary-install-from-pypi
   
   ```
   Note: The -binary package is meant for beginners to start playing with 
Python and PostgreSQL without the need to meet the build requirements. If you 
are the maintainer of a published package depending on psycopg2 you shouldn’t 
use psycopg2-binary as a module dependency. For production use you are advised 
to use the source distribution.
   ```




[GitHub] kaxil commented on a change in pull request #4455: [AIRFLOW-3519] Fix example http operator

2019-01-07 Thread GitBox
kaxil commented on a change in pull request #4455: [AIRFLOW-3519] Fix example 
http operator
URL: https://github.com/apache/airflow/pull/4455#discussion_r245784539
 
 

 ##
 File path: airflow/example_dags/example_http_operator.py
 ##
 @@ -92,7 +92,7 @@
 http_conn_id='http_default',
 endpoint='',
 request_params={},
-response_check=lambda response: True if "Google" in response.content else False,
 
 Review comment:
   response.content works as well. Does it error for you @feluelle ?
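   The Python 3 behaviour under discussion is easy to demonstrate with a 
stand-in object (`FakeResponse` below is a hypothetical substitute for a 
requests `Response`, not anything from the Airflow code base):

   ```python
   class FakeResponse:
       """Minimal stand-in for a requests.Response."""
       content = b"<html>Google</html>"  # raw bytes body
       text = "<html>Google</html>"      # decoded str body

   resp = FakeResponse()

   str_check = "Google" in resp.text        # str needle in str body: fine
   bytes_check = b"Google" in resp.content  # bytes needle in bytes body: fine

   # A str needle against the bytes body is what the example DAG did; on
   # Python 3 it raises "TypeError: a bytes-like object is required, not 'str'".
   try:
       "Google" in resp.content
       mixed_ok = True
   except TypeError:
       mixed_ok = False
   ```

   So both the `b'Google' in response.content` and the `"Google" in 
response.text` variants avoid the TypeError; only the original str-in-bytes 
check fails.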




[jira] [Created] (AIRFLOW-3646) Fix plugin manager test

2019-01-07 Thread Tao Feng (JIRA)
Tao Feng created AIRFLOW-3646:
-

 Summary: Fix plugin manager test
 Key: AIRFLOW-3646
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3646
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Tao Feng








[GitHub] feng-tao commented on issue #4399: [AIRFLOW-3594] Unify different License Header

2019-01-07 Thread GitBox
feng-tao commented on issue #4399: [AIRFLOW-3594] Unify different License Header
URL: https://github.com/apache/airflow/pull/4399#issuecomment-452035908
 
 
   PTAL @bolkedebruin 




[jira] [Commented] (AIRFLOW-3272) Create gRPC hook for creating generic grpc connection

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736141#comment-16736141
 ] 

ASF GitHub Bot commented on AIRFLOW-3272:
-

morgendave commented on pull request #4101: [AIRFLOW-3272] Add base grpc hook
URL: https://github.com/apache/airflow/pull/4101
 
 
   Make sure you have checked all steps below.
   
   Jira
 My PR addresses the following Airflow Jira issues and references them in 
the PR title. For example, "[AIRFLOW-3272] My Airflow PR"
   https://issues.apache.org/jira/browse/AIRFLOW-3272
   Description
 Add support for gRPC connection in airflow. 
   
   In Airflow there are use cases of calling gRPC services, so instead of 
creating the channel each time in a PythonOperator, there should be a basic 
GrpcHook to take care of it. The hook needs to take care of the authentication.
   
   Tests
 My PR adds the following unit tests OR does not need testing for this 
extremely good reason:
   Commits
 My commits all reference Jira issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "How to write a good git commit message":
   Subject is separated from body by a blank line
   Subject is limited to 50 characters (not including Jira issue reference)
   Subject does not end with a period
   Subject uses the imperative mood ("add", not "adding")
   Body wraps at 72 characters
   Body explains "what" and "why", not "how"
   Documentation
 In case of new functionality, my PR adds documentation that describes how 
to use it.
   When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   Code Quality
 Passes flake8
 



> Create gRPC hook for creating generic grpc connection
> -
>
> Key: AIRFLOW-3272
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3272
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Zhiwei Zhao
>Assignee: Zhiwei Zhao
>Priority: Minor
>
> Add support for gRPC connection in airflow. 
> In Airflow there are use cases of calling gRPC services, so instead of 
> creating the channel each time in a PythonOperator, there should be a basic 
> GrpcHook to take care of it. The hook needs to take care of the authentication.





[jira] [Created] (AIRFLOW-3645) Use a base_executor_config and merge operator level executor_config

2019-01-07 Thread Kyle Hamlin (JIRA)
Kyle Hamlin created AIRFLOW-3645:


 Summary: Use a base_executor_config and merge operator level 
executor_config
 Key: AIRFLOW-3645
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3645
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Kyle Hamlin
 Fix For: 1.10.2


It would be very useful to have a `base_executor_config` and merge the base 
config with any operator level `executor_config`.

I imagine referencing a Python dict, similar to how we reference a custom 
logging_config.

*Example config*
{code:java}
[core]
base_executor_config = config.base_executor_config.BASE_EXECUTOR_CONFIG
{code}
*Example base_executor_config*
{code:java}
BASE_EXECUTOR_CONFIG = {
    "KubernetesExecutor": {
        "image_pull_policy": "Always",
        "annotations": {
            "iam.amazonaws.com/role": "arn:aws:iam::"
        },
        "volumes": [
            {
                "name": "airflow-lib",
                "persistentVolumeClaim": {
                    "claimName": "airflow-lib"
                }
            }
        ],
        "volume_mounts": [
            {
                "name": "airflow-lib",
                "mountPath": "/usr/local/airflow/lib",
            }
        ]
    }
}
{code}
*Example operator*
{code:java}
run_this = PythonOperator(
    task_id='print_the_context',
    provide_context=True,
    python_callable=print_context,
    executor_config={
        "KubernetesExecutor": {
            "request_memory": "256Mi",
            "request_cpu": "100m",
            "limit_memory": "256Mi",
            "limit_cpu": "100m"
        }
    },
    dag=dag)
{code}
Then we'll want a dict deep-merge function that returns the merged 
executor_config.

*Merge functionality*
{code:java}
import collections

from airflow import conf
from airflow.utils.module_loading import import_string


def dict_merge(dct, merge_dct):
    """Recursive dict merge. Inspired by :meth:``dict.update()``: instead of
    updating only top-level keys, dict_merge recurses down into dicts nested
    to an arbitrary depth, updating keys. The ``merge_dct`` is merged into
    ``dct``.

    :param dct: dict onto which the merge is executed
    :param merge_dct: dct merged into dct
    :return: dct
    """
    for k, v in merge_dct.items():
        if (k in dct and isinstance(dct[k], dict)
                and isinstance(merge_dct[k], collections.Mapping)):
            dict_merge(dct[k], merge_dct[k])
        else:
            dct[k] = merge_dct[k]

    return dct


def get_executor_config(executor_config):
    """Try to import base_executor_config and merge it with the provided
    executor_config.

    :param executor_config: operator level executor config
    :return: dict
    """
    try:
        base_executor_config = import_string(
            conf.get('core', 'base_executor_config'))
        merged_executor_config = dict_merge(
            base_executor_config, executor_config)
        return merged_executor_config
    except Exception:
        return executor_config
{code}

Finally, we'll want to call the get_executor_config function in the 
`BaseOperator` possibly here: 
https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L2348
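As a sanity check of the proposed merge semantics, here is the dict_merge 
logic run standalone against a small base config and an operator-level 
override. The configs are made up for illustration, and 
`collections.abc.Mapping` is used so the sketch runs on current Python 
versions:

{code:java}
import collections.abc


def dict_merge(dct, merge_dct):
    """Recursively merge merge_dct into dct; operator-level keys win."""
    for k in merge_dct:
        if (k in dct and isinstance(dct[k], dict)
                and isinstance(merge_dct[k], collections.abc.Mapping)):
            dict_merge(dct[k], merge_dct[k])
        else:
            dct[k] = merge_dct[k]
    return dct


base = {"KubernetesExecutor": {"image_pull_policy": "Always",
                               "request_cpu": "50m"}}
override = {"KubernetesExecutor": {"request_cpu": "100m",
                                   "request_memory": "256Mi"}}
merged = dict_merge(base, override)
# merged keeps image_pull_policy from the base config, while the
# operator-level request_cpu overrides the base value and the
# operator-only request_memory is added.
{code}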





[jira] [Commented] (AIRFLOW-3272) Create gRPC hook for creating generic grpc connection

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736140#comment-16736140
 ] 

ASF GitHub Bot commented on AIRFLOW-3272:
-

morgendave commented on pull request #4101: [AIRFLOW-3272] Add base grpc hook
URL: https://github.com/apache/airflow/pull/4101
 
 
   
 



> Create gRPC hook for creating generic grpc connection
> -
>
> Key: AIRFLOW-3272
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3272
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Zhiwei Zhao
>Assignee: Zhiwei Zhao
>Priority: Minor
>
> Add support for gRPC connection in airflow. 
> In Airflow there are use cases of calling gRPC services, so instead of 
> creating the channel each time in a PythonOperator, there should be a basic 
> GrpcHook to take care of it. The hook needs to take care of the authentication.





[GitHub] morgendave closed pull request #4101: [AIRFLOW-3272] Add base grpc hook

2019-01-07 Thread GitBox
morgendave closed pull request #4101: [AIRFLOW-3272] Add base grpc hook
URL: https://github.com/apache/airflow/pull/4101
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/airflow/contrib/hooks/grpc_hook.py 
b/airflow/contrib/hooks/grpc_hook.py
new file mode 100644
index 00..46a9a0ca7e
--- /dev/null
+++ b/airflow/contrib/hooks/grpc_hook.py
@@ -0,0 +1,120 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import grpc
+from google import auth as google_auth
+from google.auth import jwt as google_auth_jwt
+from google.auth.transport import grpc as google_auth_transport_grpc
+from google.auth.transport import requests as google_auth_transport_requests
+
+from airflow.hooks.base_hook import BaseHook
+from airflow.exceptions import AirflowConfigException
+
+
+class GrpcHook(BaseHook):
+    """
+    General interaction with gRPC servers.
+
+    :param grpc_conn_id: The connection ID to use when fetching connection info.
+    :type grpc_conn_id: str
+    :param interceptors: a list of gRPC interceptor objects which would be applied
+        to the connected gRPC channel. None by default.
+    :type interceptors: a list of gRPC interceptors based on or extending the four
+        official gRPC interceptors, e.g. UnaryUnaryClientInterceptor,
+        UnaryStreamClientInterceptor, StreamUnaryClientInterceptor,
+        StreamStreamClientInterceptor.
+    :param custom_connection_func: The customized connection function to return a
+        gRPC channel.
+    :type custom_connection_func: python callable object that accepts the
+        connection as its only arg. Could be partial or lambda.
+    """
+
+    def __init__(self, grpc_conn_id, interceptors=None, custom_connection_func=None):
+        self.grpc_conn_id = grpc_conn_id
+        self.conn = self.get_connection(self.grpc_conn_id)
+        self.extras = self.conn.extra_dejson
+        self.interceptors = interceptors if interceptors else []
+        self.custom_connection_func = custom_connection_func
+
+    def get_conn(self):
+        base_url = self.conn.host
+
+        if self.conn.port:
+            base_url = base_url + ":" + str(self.conn.port)
+
+        auth_type = self._get_field("auth_type")
+
+        if auth_type == "NO_AUTH":
+            channel = grpc.insecure_channel(base_url)
+        elif auth_type == "SSL" or auth_type == "TLS":
+            credential_file_name = self._get_field("credential_pem_file")
+            creds = grpc.ssl_channel_credentials(open(credential_file_name).read())
+            channel = grpc.secure_channel(base_url, creds)
+        elif auth_type == "JWT_GOOGLE":
+            credentials, _ = google_auth.default()
+            jwt_creds = google_auth_jwt.OnDemandCredentials.from_signing_credentials(
+                credentials)
+            channel = google_auth_transport_grpc.secure_authorized_channel(
+                jwt_creds, None, base_url)
+        elif auth_type == "OATH_GOOGLE":
+            scopes = self._get_field("scopes").split(",")
+            credentials, _ = google_auth.default(scopes=scopes)
+            request = google_auth_transport_requests.Request()
+            channel = google_auth_transport_grpc.secure_authorized_channel(
+                credentials, request, base_url)
+        elif auth_type == "CUSTOM":
+            if not self.custom_connection_func:
+                raise AirflowConfigException(
+                    "Customized connection function not set, not able to "
+                    "establish a channel")
+            channel = self.custom_connection_func(self.conn)
+        else:
+            raise AirflowConfigException(
+                "auth_type not supported or not provided, channel cannot be "
+                "established, given value: %s" % str(auth_type))
+
+        if self.interceptors:
+            for interceptor in self.interceptors:
+                channel = grpc.intercept_channel(channel, interceptor)
+
+        return channel
+
+    def run(self, stub_class, call_func, streaming=False, data={}):
+        with self.get_conn() as channel:
+            stub = stub_class(channel)
+            try:
+                rpc_func = getattr(stub, call_func)
+                response = rpc_func(**

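The diff above is cut off inside `run()`, but the dispatch pattern it uses — 
instantiate the generated stub on the channel, then resolve the RPC method by 
name with `getattr` — can be sketched without any gRPC dependency. The stub 
class and method names below are invented for illustration and are not part of 
the PR:

```python
class EchoStub:
    """Stand-in for a grpc-generated stub; takes a channel like real stubs do."""
    def __init__(self, channel):
        self.channel = channel

    def hello(self, name):
        return "hello " + name


def run(stub_class, call_func, channel, **data):
    stub = stub_class(channel)
    rpc_func = getattr(stub, call_func)  # resolve the RPC method by name
    return rpc_func(**data)


result = run(EchoStub, "hello", channel=None, name="airflow")
# result == "hello airflow"
```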
[GitHub] morgendave opened a new pull request #4101: [AIRFLOW-3272] Add base grpc hook

2019-01-07 Thread GitBox
morgendave opened a new pull request #4101: [AIRFLOW-3272] Add base grpc hook
URL: https://github.com/apache/airflow/pull/4101
 
 
   Make sure you have checked all steps below.
   
   Jira
 My PR addresses the following Airflow Jira issues and references them in 
the PR title. For example, "[AIRFLOW-3272] My Airflow PR"
   https://issues.apache.org/jira/browse/AIRFLOW-3272
   Description
 Add support for gRPC connection in airflow. 
   
   In Airflow there are use cases of calling gRPC services, so instead of 
creating the channel each time in a PythonOperator, there should be a basic 
GrpcHook to take care of it. The hook needs to take care of the authentication.
   
   Tests
 My PR adds the following unit tests OR does not need testing for this 
extremely good reason:
   Commits
 My commits all reference Jira issues in their subject lines, and I have 
squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "How to write a good git commit message":
   Subject is separated from body by a blank line
   Subject is limited to 50 characters (not including Jira issue reference)
   Subject does not end with a period
   Subject uses the imperative mood ("add", not "adding")
   Body wraps at 72 characters
   Body explains "what" and "why", not "how"
   Documentation
 In case of new functionality, my PR adds documentation that describes how 
to use it.
   When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
   Code Quality
 Passes flake8




[GitHub] feng-tao commented on issue #4453: [AIRFLOW-XXX] Fix existing flake8 errors

2019-01-07 Thread GitBox
feng-tao commented on issue #4453: [AIRFLOW-XXX] Fix existing flake8 errors
URL: https://github.com/apache/airflow/pull/4453#issuecomment-452019385
 
 
   @yohei1126 , I think this test is pretty flaky.




[GitHub] feng-tao commented on issue #4453: [AIRFLOW-XXX] Fix existing flake8 errors

2019-01-07 Thread GitBox
feng-tao commented on issue #4453: [AIRFLOW-XXX] Fix existing flake8 errors
URL: https://github.com/apache/airflow/pull/4453#issuecomment-452019252
 
 
   @yohei1126 , not sure which one you are looking at, but the master CI 
passes(https://travis-ci.org/apache/airflow/builds/476197675). 
   
   The test failure comes from https://github.com/apache/airflow/pull/4407 
which I have discussed in the pr as well as the mailing list.




[GitHub] feng-tao commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger

2019-01-07 Thread GitBox
feng-tao commented on issue #4407: [AIRFLOW-3600] Remove dagbag from trigger
URL: https://github.com/apache/airflow/pull/4407#issuecomment-452019084
 
 
   @ffinfo , please let me know if you will investigate the issue. If not, I 
prefer to revert this change until the flaky test is fixed. What do you think 
@Fokko ?




[jira] [Commented] (AIRFLOW-3519) example_http_operator is failing due to

2019-01-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736031#comment-16736031
 ] 

ASF GitHub Bot commented on AIRFLOW-3519:
-

feluelle commented on pull request #4455: [AIRFLOW-3519] Fix example http 
operator
URL: https://github.com/apache/airflow/pull/4455
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3519
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR fixes the sensor in the example http operator that searches for a 
string in a bytes-like object called response.content.
   Now it searches in the decoded response object called response.text.
   
   **NOTE:** This PR also fixes issue 
https://issues.apache.org/jira/browse/AIRFLOW-450. This ticket was already 
marked as resolved but it actually wasn't.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 



> example_http_operator is failing due to 
> 
>
> Key: AIRFLOW-3519
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3519
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: Windows 10 professional edition.Apache airflow 
>Reporter: Arunachalam Ambikapathi
>Assignee: Felix Uellendall
>Priority: Minor
>
> When example_http_operator DAG is called from command line, 
>  ./airflow trigger_dag example_http_operator
> it was throwing error 
>  [2018-12-13 10:37:41,892] {logging_mixin.py:95} INFO -
> [2018-12-13 10:37:41,892] {http_hook.py:126} INFO - Sending 'GET' to url: 
> [https://www.google.com/]
> [2018-12-13 10:37:41,992] {logging_mixin.py:95} WARNING -
> /home/arun1/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:847: 
> InsecureRequestWarning: Unverified HTTPS request is being made. Adding 
> certificate verification is strongly advised. See: 
> [https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings]
>  InsecureRequestWarning)
> [2018-12-13 10:37:42,064] {models.py:1760}
> *ERROR - a bytes-like object is required, not 'str'*
> This may be because this was not tested on Python 3.5.
>  *Fix:*
>  I changed the dag to this and tested it is working.
> from 
> response_check=lambda response: True if "Google" in response.content else 
> False,
> to 
> response_check=lambda response: True if *b'Google'* in response.content else 
> False,
> Please apply this in the example it would help new users a lot.





[jira] [Created] (AIRFLOW-3644) AIP-8 Split Hooks/Operators out of core package and repository

2019-01-07 Thread Tim Swast (JIRA)
Tim Swast created AIRFLOW-3644:
--

 Summary: AIP-8 Split Hooks/Operators out of core package and 
repository
 Key: AIRFLOW-3644
 URL: https://issues.apache.org/jira/browse/AIRFLOW-3644
 Project: Apache Airflow
  Issue Type: Improvement
  Components: contrib, core
Reporter: Tim Swast


Based on discussion at 
http://mail-archives.apache.org/mod_mbox/airflow-dev/201809.mbox/%3c308670db-bd2a-4738-81b1-3f6fb312c...@apache.org%3E
 I believe separating hooks/operators into separate packages can benefit 
long-term maintainability of Apache Airflow by distributing maintenance and 
reducing the surface area of the core Airflow package.

AIP-8 draft: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303





[GitHub] feluelle opened a new pull request #4455: [AIRFLOW-3519] Fix example http operator

2019-01-07 Thread GitBox
feluelle opened a new pull request #4455: [AIRFLOW-3519] Fix example http 
operator
URL: https://github.com/apache/airflow/pull/4455
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references 
them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
 - https://issues.apache.org/jira/browse/AIRFLOW-3519
 - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI 
changes:
   
   This PR fixes the sensor in the example http operator that searches for a 
string in a bytes-like object called response.content.
   Now it searches in the decoded response object called response.text.
   
   **NOTE:** This PR also fixes issue 
https://issues.apache.org/jira/browse/AIRFLOW-450. This ticket was already 
marked as resolved but it actually wasn't.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
 - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   




[GitHub] Mokubyow edited a comment on issue #4309: [AIRFLOW-3504] Extend/refine the functionality of "/health" endpoint

2019-01-07 Thread GitBox
Mokubyow edited a comment on issue #4309: [AIRFLOW-3504] Extend/refine the 
functionality of "/health" endpoint
URL: https://github.com/apache/airflow/pull/4309#issuecomment-451993107
 
 
   Emmanual Bard brought up a good point about the scheduler health check. What 
you have only checks the last scheduled run, not the scheduler heartbeat which 
is what we really want to know. The query should look something like this:
   
   ```
   select max(latest_heartbeat) from job
   where job_type = 'SchedulerJob'
   and state = 'running'
   ```
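   To see what the suggested heartbeat query returns, here is a self-contained 
sketch against an in-memory SQLite table shaped like Airflow's `job` table 
(the column subset and sample rows are made up for illustration):

   ```python
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE job (job_type TEXT, state TEXT, latest_heartbeat TEXT)")
   conn.executemany("INSERT INTO job VALUES (?, ?, ?)", [
       ("SchedulerJob", "running", "2019-01-07 10:00:00"),
       ("SchedulerJob", "running", "2019-01-07 10:05:00"),
       ("LocalTaskJob", "running", "2019-01-07 11:00:00"),  # ignored: wrong job_type
       ("SchedulerJob", "success", "2019-01-07 12:00:00"),  # ignored: not running
   ])

   # The query from the comment above: newest heartbeat of a running scheduler.
   row = conn.execute(
       "SELECT MAX(latest_heartbeat) FROM job "
       "WHERE job_type = 'SchedulerJob' AND state = 'running'").fetchone()
   # row[0] -> "2019-01-07 10:05:00"
   ```

   The endpoint would then compare that timestamp against the configured 
heartbeat threshold to decide healthy/unhealthy.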




[GitHub] Mokubyow edited a comment on issue #4309: [AIRFLOW-3504] Extend/refine the functionality of "/health" endpoint

2019-01-07 Thread GitBox
Mokubyow edited a comment on issue #4309: [AIRFLOW-3504] Extend/refine the 
functionality of "/health" endpoint
URL: https://github.com/apache/airflow/pull/4309#issuecomment-451993107
 
 
   Emmanual Bard brought up a good point about the scheduler health check. What 
you have only checks the last scheduled run, not the scheduler heartbeat which 
is what we really want to know. The query should look something like this:
   
   ```
   select max(latest_heartbeat) from job
   where job_type = 'SchedulerJob'
   and state = 'running'
   ```




[GitHub] Mokubyow commented on issue #4309: [AIRFLOW-3504] Extend/refine the functionality of "/health" endpoint

2019-01-07 Thread GitBox
Mokubyow commented on issue #4309: [AIRFLOW-3504] Extend/refine the 
functionality of "/health" endpoint
URL: https://github.com/apache/airflow/pull/4309#issuecomment-451993107
 
 
   Emmanual Bard brought up a good point about the scheduler health check. What 
you have only checks the last scheduled run, not the scheduler heartbeat which 
is what we really want to know. The query should look something like this:
   
   ```
   select max(latest_heartbeat) from job
   where job_type = 'SchedulerJob'
   and state = 'running'
   ```




[jira] [Comment Edited] (AIRFLOW-3519) example_http_operator is failing due to

2019-01-07 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736002#comment-16736002
 ] 

Felix Uellendall edited comment on AIRFLOW-3519 at 1/7/19 4:04 PM:
---

I would rather change the response_check to return True if the string is in 
response.text instead of response.content,
because response.text is the decoded form of response.content. See 
http://docs.python-requests.org/en/master/user/quickstart/#response-content

What do you think [~Arun Ambikapathi] ?


was (Author: feluelle):
I would rather change the response_check to return True if it is in 
response.text instead of response.content.
Because response.text is the decoded response of response.content.

What do you think [~Arun Ambikapathi] ?

> example_http_operator is failing due to 
> 
>
> Key: AIRFLOW-3519
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3519
> Project: Apache Airflow
>  Issue Type: Bug
> Environment: Windows 10 professional edition.Apache airflow 
>Reporter: Arunachalam Ambikapathi
>Assignee: Felix Uellendall
>Priority: Minor
>
> When example_http_operator DAG is called from command line, 
>  ./airflow trigger_dag example_http_operator
> it was throwing error 
>  [2018-12-13 10:37:41,892]
> {logging_mixin.py:95}
> INFO - [2018-12-13 10:37:41,892]
> {http_hook.py:126}
> INFO - Sending 'GET' to url: [https://www.google.com/]
> [2018-12-13 10:37:41,992]
> {logging_mixin.py:95}
> WARNING - 
> /home/arun1/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:847: 
> InsecureRequestWarning: Unverified HTTPS request is being made. Adding 
> certificate verification is strongly advised. See: 
> [https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings]
>  InsecureRequestWarning)
>  [2018-12-13 10:37:42,064]
> {models.py:1760}
> *ERROR - a bytes-like object is required, not 'str'*
> This may be because the example was not tested on Python 3.5.
>  *Fix:*
>  I changed the dag as follows and tested that it works.
> from 
> response_check=lambda response: True if "Google" in response.content else 
> False,
> to 
> response_check=lambda response: True if *b'Google'* in response.content else 
> False,
> Please apply this to the example; it would help new users a lot.
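
For context, the error comes from Python 3's strict separation of `bytes` and `str`: `response.content` is raw bytes, so testing a `str` needle against it raises `TypeError`. A minimal sketch using a hypothetical `FakeResponse` stand-in (not a real `requests.Response`; only the two attributes used by the example DAG's response_check are modeled):

```python
# Minimal illustration of the bytes-vs-str pitfall behind this error.
# FakeResponse is a stand-in for requests.Response.
class FakeResponse:
    content = b"<title>Google</title>"   # raw bytes, as requests returns them
    text = content.decode("utf-8")       # decoded str

resp = FakeResponse()

# Broken on Python 3: mixing a str needle with a bytes haystack.
try:
    "Google" in resp.content
except TypeError as exc:
    print(exc)  # a bytes-like object is required, not 'str'

# Either fix works; checking response.text avoids hard-coding bytes.
check_bytes = lambda response: b"Google" in response.content
check_text = lambda response: "Google" in response.text
print(check_bytes(resp), check_text(resp))
```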



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-3519) example_http_operator is failing due to

2019-01-07 Thread Felix Uellendall (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736002#comment-16736002
 ] 

Felix Uellendall commented on AIRFLOW-3519:
---

I would rather change the response_check to return True if it is in 
response.text instead of response.content.
Because response.text is the decoded response of response.content.

What do you think [~Arun Ambikapathi] ?






[jira] [Assigned] (AIRFLOW-3519) example_http_operator is failing due to

2019-01-07 Thread Felix Uellendall (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Uellendall reassigned AIRFLOW-3519:
-

Assignee: Felix Uellendall






[jira] [Work started] (AIRFLOW-3601) update operators to BigQuery to support location

2019-01-07 Thread Yohei Onishi (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-3601 started by Yohei Onishi.
-
> update operators to BigQuery to support location
> 
>
> Key: AIRFLOW-3601
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3601
> Project: Apache Airflow
>  Issue Type: Task
>Affects Versions: 1.10.1
>Reporter: Yohei Onishi
>Assignee: Yohei Onishi
>Priority: Major
>
> Location support for BigQueryHook was merged in PR 4324: 
> [https://github.com/apache/incubator-airflow/pull/4324]
> The following operators need to be updated:
>  * bigquery_check_operator.py
>  ** BigQueryCheckOperator
>  * bigquery_get_data.py
>  ** BigQueryGetDataOperator
>  * bigquery_operator.py
>  ** BigQueryOperator
>  ** BigQueryCreateEmptyTableOperator
>  ** BigQueryCreateExternalTableOperator
>  ** BigQueryDeleteDatasetOperator
>  ** BigQueryCreateEmptyDatasetOperator
>  * bigquery_table_delete_operator.py
>  ** BigQueryTableDeleteOperator
>  * bigquery_to_bigquery.py
>  ** BigQueryToBigQueryOperator
>  * bigquery_to_gcs.py
>  ** BigQueryToCloudStorageOperator
>  * gcs_to_bq.py
>  ** GoogleCloudStorageToBigQueryOperator
>  * bigquery_sensor.py
>  ** BigQueryTableSensor
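
A rough sketch of what the update means for each listed operator: accept a `location` kwarg and forward it to the hook. The classes below are simplified hypothetical stand-ins, not the real Airflow implementations.

```python
# Hypothetical sketch of the change this issue asks for: each listed
# operator grows a `location` kwarg and forwards it to BigQueryHook
# (which gained location support in PR 4324). These classes are
# simplified stand-ins, not the real Airflow code.
class FakeBigQueryHook:
    """Stand-in for BigQueryHook after PR 4324."""
    def __init__(self, location=None):
        self.location = location  # region the hook will run jobs in

class FakeBigQueryGetDataOperator:
    """Stand-in showing the kwarg being threaded through to the hook."""
    def __init__(self, dataset_id, table_id, location=None):
        self.dataset_id = dataset_id
        self.table_id = table_id
        self.location = location  # new parameter, defaults to None

    def execute(self, context=None):
        # Forward the location so jobs run against the right region.
        hook = FakeBigQueryHook(location=self.location)
        return hook.location

op = FakeBigQueryGetDataOperator("my_dataset", "my_table",
                                 location="asia-northeast1")
print(op.execute())  # asia-northeast1
```

Defaulting `location=None` keeps the change backward compatible for existing DAGs that never pass a location.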





[GitHub] XD-DENG commented on a change in pull request #4436: [AIRFLOW-3631] Update flake8 and fix lint.

2019-01-07 Thread GitBox
XD-DENG commented on a change in pull request #4436: [AIRFLOW-3631] Update 
flake8 and fix lint.
URL: https://github.com/apache/airflow/pull/4436#discussion_r245611643
 
 

 ##
 File path: airflow/settings.py
 ##
 @@ -90,7 +90,7 @@ def timing(cls, stat, dt):
   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
 ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
  _/_/  |_/_/  /_//_//_/  \//|__/
- """
+ """  # noqa: W605
 
 Review comment:
   Got it. Overall it looks good to me (pinning flake8 to 3.5.0 was a temporary 
fix). Thanks!
   
   The only thing I'm still considering is how multi-line string literals like 
the `HEADER` here should be handled. I would like to leave that to the 
committers to decide.
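   
   For what it's worth, besides `# noqa: W605`, a common way to keep backslash-heavy string literals flake8-clean is a raw string, which leaves backslashes uninterpreted. A small sketch (the art here is made up for illustration):
   
   ```python
   # Two ways to keep a backslash-heavy string literal flake8-clean.
   # In an ordinary string, sequences such as "\_" are invalid escapes
   # (flake8 W605, and a DeprecationWarning on recent Pythons); either
   # escape each backslash or use a raw string literal.
   plain = "  \\_ escaped backslashes _/  "   # "\\" is a valid escape
   raw = r"  \_ raw string, no escapes _/  "  # raw literal, no escapes at all

   print(plain)
   print(raw)
   assert plain.count("\\") == 1 and raw.count("\\") == 1
   ```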
   
   



